†If this were a real database system, the array would be kept in secondary storage.
However, this data structure would waste a great deal of space.
†Note that there are probably many more enrollment records than course or student
records, so shrinking enrollment records shrinks the total space requirement almost as
‡It would in practice be useful to place fields like grade or status (credit or audit) in
enrollment records, but our original problem statement does not require this.
†We must have some way of identifying the type of records; we shall discuss a way
to do this momentarily.
†We assume this and is a conditional and; that is, the second operand is not evaluated unless the first is true.
Fig. 5.9
2. Most nodes will have many fewer than 27 children.
3. A leaf reached by an edge labeled $ cannot have any children, and may as well
not be there.


elementtype =

created to right of










child of




Fig. 5.19.
Now we can write the procedure INSERT, which calls

Show the successive values of the various
LCS algorithm of Fig. 5.29 with first string
Suppose we use 2-3 trees to implement the MERGE and SPLIT
a. Show the result of splitting the tree of Exercise 5.7 at 6.
b. Merge the tree of Exercise 5.7 with the tree consisting of leaves
for elements 10 and 11.
Some of the structures discussed in this chapter can be modified easily
a. Binary search trees. The "<" ordering applies to domain elements.
b. 2-3 trees. At interior nodes, place only the key field of domain
Show that in any subtree of a binary search tree, the minimum element is
To implement such a step, we need the notion of an
Fig. 6.15.

Fig. 6.16.
adjacency list version of
superior, at least for large sparse graphs.
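The adjacency-list version of the single-source shortest paths algorithm can be sketched in Python (the book's programs are in Pascal; a priority queue of (distance, vertex) pairs replaces the scan over all unsettled vertices):

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest paths over an adjacency-list digraph.

    adj[v] is a list of (w, cost) pairs.  A heap holds (distance, vertex)
    pairs, so each arc is relaxed in O(log n) time, which is superior for
    large sparse graphs.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float('inf')):
            continue                       # stale heap entry, skip it
        for w, c in adj[v]:
            if d + c < dist.get(w, float('inf')):
                dist[w] = d + c
                heapq.heappush(heap, (d + c, w))
    return dist
```

The graph below is a hypothetical example, not one from the text.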

To print out the intermediate vertices on the shortest path from vertex

Fig. 6.19.
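The all-pairs computation, together with the recursive procedure for printing the intermediate vertices on a shortest path, can be sketched in Python (a reconstruction, not the book's Pascal; `p[i][j]` records one intermediate vertex, as in the book's P matrix):

```python
INF = float('inf')

def floyd(cost):
    """All-pairs shortest paths with path recovery.

    Returns (a, p): a[i][j] is the cost of a shortest path from i to j,
    and p[i][j] is one intermediate vertex on that path (-1 if the
    shortest path is the arc (i, j) itself).
    """
    n = len(cost)
    a = [row[:] for row in cost]
    p = [[-1] * n for _ in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if a[i][k] + a[k][j] < a[i][j]:
                    a[i][j] = a[i][k] + a[k][j]
                    p[i][j] = k            # k is now an intermediate vertex
    return a, p

def path(p, q, r, out):
    """Append to out the intermediate vertices on the shortest path q -> r."""
    k = p[q][r]
    if k != -1:
        path(p, q, k, out)
        out.append(k)
        path(p, k, r, out)
```

The cost matrix in the test is illustrative only.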
and 0 otherwise. A is often called the
Example 6.10.
the digraph of Fig. 6.14.
The transitive closure can be computed using a procedure similar to
applying the following formula in the

This formula states that there is a path from
Fig. 6.20.
vertex numbered higher than
1. there is already a path from
higher than
2. there is a path from
- 1 and a path from
As before
computation with only one copy of the


Fig. 6.21.
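The transitive-closure computation described above, done with a single copy of the matrix, can be sketched in Python (the book's program is in Pascal):

```python
def warshall(adj):
    """Transitive closure of a boolean adjacency matrix.

    One copy of the matrix is updated in place: after pass k,
    c[i][j] is True iff there is a path from i to j whose intermediate
    vertices are all numbered k or lower.
    """
    n = len(adj)
    c = [row[:] for row in adj]            # work on a copy of A
    for k in range(n):
        for i in range(n):
            for j in range(n):
                c[i][j] = c[i][j] or (c[i][k] and c[k][j])
    return c
```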
Example 6.11.
Fig. 6.22.
The eccentricities of the vertices are
vertex eccentricity

Thus the center is vertex
1. First apply the procedure
shortest paths matrix
2. Find the maximum cost in each column
3. Find a vertex with minimum eccentricity. This is the center of
The running time of this process is dominated by the first step, which takes
time. Step (2) takes
Example 6.12.
maximum value in each column is shown below.
Fig. 6.23.
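The three-step center computation above can be sketched in Python (a sketch, not the book's Pascal; step 1 inlines Floyd's algorithm, `INF` marks absent arcs):

```python
INF = float('inf')

def center(cost):
    """Find a center of a digraph from its arc-cost matrix.

    Step 1: run Floyd's all-pairs shortest paths algorithm.
    Step 2: the maximum cost in column v is the eccentricity of v.
    Step 3: a vertex of minimum eccentricity is a center.
    """
    n = len(cost)
    a = [row[:] for row in cost]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                a[i][j] = min(a[i][j], a[i][k] + a[k][j])
    ecc = [max(a[i][v] for i in range(n)) for v in range(n)]
    return min(range(n), key=ecc.__getitem__), ecc
```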
depth-first search of an entire graph is

Fig. 6.24.
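The depth-first search procedure of Fig. 6.24 looks roughly as follows in Python (the book's version is in Pascal; here the tree arcs are collected rather than marked):

```python
def dfs_forest(adj):
    """Depth-first search of an entire digraph, collecting the tree arcs.

    adj maps each vertex to its adjacency list.  Every still-unvisited
    vertex is used as a new search root, so the tree arcs form a
    depth-first spanning forest.
    """
    visited = set()
    tree_arcs = []

    def dfs(v):
        visited.add(v)
        for w in adj[v]:
            if w not in visited:
                tree_arcs.append((v, w))   # arc leading to a new vertex
                dfs(w)

    for v in adj:
        if v not in visited:
            dfs(v)
    return tree_arcs
```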
Example 6.13.
Assuming that
adjacency list of
Fig. 6.25.
search remains at
unvisited vertices. The arcs leading to new vertices are called
In addition to the tree arcs, there are three other types of arcs defined by a depth-
first search of a directed graph. These are called back arcs, forward arcs, and cross
arcs. An arc such as
its ancestors in the spanning forest. Note that an arc from a vertex to itself is a back arc.
Arcs such as
neither an ancestor nor a descendant, are called cross arcs.
from left to right. This pattern is not accidental. Had the arc
on encountering arc
Fig. 6.26.
How do we distinguish among the four types of arcs? Clearly tree arcs are
special since they lead to unvisited vertices during the depth-first search. Suppose we

introduced in Section 3.1.
All descendants of a vertex
than or equal to the number assigned
forward arcs go from low-numbered to high-numbered vertices and back arcs go from high-numbered to low-numbered vertices.
All cross arcs go from high-numbered vertices to low-numbered vertices. To see
this, suppose that
Fig. 6.27.
Among other things, dags are useful in representing the syntactic structure of
Fig. 6.28.
Fig. 6.29.
Fig. 6.30.
Fig. 6.31.
dag so that if there is an arc from vertex
Topological sort can be easily accomplished by adding a print statement after
line (4) to the depth-first search procedure in Fig. 6.24:

The effect of calling
This technique works because there are no back arcs in a dag. Consider what
happens when depth-first search leaves a vertex
Fig. 6.32.
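The topological sort just described can be sketched in Python (the book modifies the Pascal procedure of Fig. 6.24; here the "print" after the recursive calls becomes an append, and the list is reversed at the end):

```python
def topological_sort(adj):
    """Topologically sort a dag given as {vertex: list of successors}.

    Each vertex is emitted when depth-first search leaves it for the
    last time, giving a reverse topological order; reversing the list
    puts the tail of every arc before its head.  This works because a
    dag has no back arcs.
    """
    visited = set()
    postorder = []

    def dfs(v):
        visited.add(v)
        for w in adj[v]:
            if w not in visited:
                dfs(w)
        postorder.append(v)        # the "print" after the recursive calls

    for v in adj:
        if v not in visited:
            dfs(v)
    postorder.reverse()
    return postorder
```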
Fig. 6.33.
Note that every vertex of a directed graph
in the
We shall now present an algorithm to find the strongly connected
Fig. 6.34.
components of a given directed graph
1. Perform a depth-first search of
Performing the depth-first search on
shown in Fig. 6.37. We begin with
precisely to the vertices of a tree in the spanning forest of the second depth-first
Suppose that in the depth-first search of
reach either
Fig. 6.35.
Fig. 6.36.
Fig. 6.37.
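The double depth-first search described above (the method the bibliographic notes attribute to Kosaraju) can be sketched in Python:

```python
def strong_components(adj):
    """Strongly connected components of a digraph.

    1. Depth-first search the digraph, listing vertices in order of
       completion.
    2. Reverse every arc.
    3. Search the reversed digraph, starting each new tree from the
       highest-numbered unvisited vertex; each tree of the second
       spanning forest is exactly one strong component.
    """
    finish = []
    visited = set()

    def dfs(v, graph, out):
        visited.add(v)
        for w in graph[v]:
            if w not in visited:
                dfs(w, graph, out)
        out.append(v)

    for v in adj:
        if v not in visited:
            dfs(v, adj, finish)

    rev = {v: [] for v in adj}             # the digraph with arcs reversed
    for v in adj:
        for w in adj[v]:
            rev[w].append(v)

    visited = set()
    components = []
    for v in reversed(finish):             # highest completion number first
        if v not in visited:
            comp = []
            dfs(v, rev, comp)
            components.append(sorted(comp))
    return components
```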
Now suppose
forest of
We conclude that in the search of
b. linked adjacency lists, and
c. adjacency lists represented as in Fig. 6.5.
a. Use the algorithm
the other vertices.
b. Use the algorithm
Fig. 6.38.
Describe a mathematical model for the following scheduling problem.
Given tasks
graph theory. Some books treating graph algorithms are Deo [1975], Even [1980],
The single-source shortest paths algorithm in Section 6.3 is from Dijkstra [1959].
The all-pairs shortest paths algorithm is from Floyd [1962] and the transitive closure
The strong components algorithm of Section 6.7 is similar to one suggested by
S. R. Kosaraju in 1978 (unpublished), and to one published by Sharir [1981]. Tarjan
Coffman [1976] contains many examples of how digraphs can be used to model
scheduling problems as in Exercise 6.2. Aho, Garey, and Ullman [1972] show that
Fig. 7.1.
that is, a connected induced subgraph that is not itself a proper subgraph of any other
Example 7.2
namely itself. Figure 7.2 is a graph with two connected components.
Fig. 7.2.
A (simple)
cannot be distinct; eventually, we find
because there are no loops from a vertex to itself, and we cannot have
we entered and left vertex
only one edge incident, and therefore conclude that such a vertex
Now consider the graph
Fig. 7.3.
Example 7.4.
A typical application for minimum-cost spanning trees occurs in the design of
Fig. 7.4.
Fig. 7.5.
step until

Fig. 7.7.
added to
Whenever we find another vertex
The time complexity of Prim's algorithm is
of the loop of lines (4)-(16) and each iteration of the loop takes
inner loops of lines (7)-(10) and (13)-(16). As

increasing cost. If the edge connects two vertices in two different connected
Example 7.5.
added to
2. FIND(
iteration of the while-loop, finding the least cost
Fig. 7.9.
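Kruskal's method, with edges examined in order of increasing cost and components maintained by FIND and MERGE, can be sketched in Python (a simple union-find stands in for the book's Pascal version):

```python
def kruskal(n, edges):
    """Minimum-cost spanning tree by Kruskal's algorithm.

    edges is a list of (cost, u, v) triples over vertices 0..n-1.
    Edges are examined in increasing cost order; an edge is accepted
    only if FIND shows its endpoints lie in different components, and
    MERGE then unites the two components.
    """
    parent = list(range(n))

    def find(x):                           # FIND with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                       # different components: accept
            parent[ru] = rv                # MERGE the two components
            tree.append((cost, u, v))
    return tree
```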
edge in
time in the worst case. The total time required to perform the MERGE and FIND

become either
versa, or
other directly, but one called the other indirectly (i.e.,
Example 7.6.
To follow a few steps of the search, the procedure
edge (
Fig. 7.11.
As for depth-first search, we can build a spanning forest when we perform a
breadth-first search. In this case, we consider edge (
that is not a tree edge is a cross edge, that is, it connects two vertices neither of which
The breadth-first search algorithm given in Fig. 7.12 inserts the tree edges into a
Fig. 7.13.
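The breadth-first search just described can be sketched in Python (a sketch of the queue-based procedure, not the book's Pascal; it returns the tree edges of the breadth-first spanning tree):

```python
from collections import deque

def bfs(adj, source):
    """Breadth-first search from source, returning the tree edges.

    Each vertex enters the queue exactly once, and each edge is
    examined a constant number of times, so the running time is
    O(max(n, e)), usually written O(e).
    """
    visited = {source}
    tree_edges = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in visited:           # (v, w) is a tree edge
                visited.add(w)
                tree_edges.append((v, w))
                queue.append(w)
    return tree_edges
```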
first search. Each vertex visited is placed in the queue once, so the body of the while
loop is executed once for each vertex. Each edge (
typical, we shall usually refer to the running time of breadth-first search as
Depth-first search and breadth-first search can be used as frameworks around
Fig. 7.14
b. A vertex
if there is some child
this case,
Fig. 7.15.
After computing
The time taken by the above algorithm on a graph of
Thus, the total time is
We can represent this situation by a graph as in Fig. 7.16 where the vertices are
Fig. 7.16.
The matching problem can be formulated in general terms as follows. Given a
edges (1, 6), (3, 7), and (4, 8). The path 2, 6, 1, 8, 4, 9 in Fig. 7.17(b) is an
Fig. 7.17.
The key observation is that
edges in
endpoint of at most one edge from
Fig. 7.18.
relative to
We can now outline our procedure to find a maximal matching
2. Find an augmenting path
3. Repeat step (2) until no further augmenting paths exist, at which point
maximal matching.
It remains only to show how to find an augmenting path relative to a matching
Fig. 7.19.
Fig. 7.20.
Show that an
† Unless otherwise stated, we shall assume an edge is always a pair of distinct vertices.
† Note that MERGE and FIND are defined slightly differently from Section 5.5, since
We shall use several criteria to evaluate the running time of an internal sorting
algorithm. The first and most common measure is the number of algorithm steps

Fig. 8.2.
Fig. 8.3.

keytype =
The bubblesort algorithm of Fig. 8.1 applied to the list of Fig. 8.3 sorts the list in
although according to the algorithm of Fig. 8.1, two additional passes are made.
Fig. 8.4.
Example 8.2.
passes of insertion sort for
Fig. 8.6.
record with the lowest key, among

select the smallest among
Thus lines (3) and (4) of Fig. 8.1 take at most
Hence for a fixed value of

Fig. 8.7.
Fig. 8.8.
takes at most
larger than
program takes at most
steps, where the term
bubblesort is
ever needed (i.e., the input happens to be already sorted), the test of line (3) is
Next consider the insertion sort of Fig. 8.5. The while-loop of lines (4-6) of Fig.
8.5 cannot take more than
that the for-loop of lines (2-6) takes at most
sum is
The reader may check that if the array is initially sorted in reverse order, then we
actually go around the while-loop of lines (4-6)
be shown that this smaller bound holds in the average case as well.
Finally, consider selection sort in Fig. 8.7. We may check that the inner for-loop
of lines (4-7) takes
by the algorithm is
seen easily to be
the worst case and the average case as well.
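The three simple sorting algorithms analyzed above can be sketched in Python (the book presents them in Pascal in Figs. 8.1, 8.5, and 8.7; these are sketches, not transcriptions):

```python
def bubblesort(a):
    """Bubble small keys toward the front by swapping adjacent pairs."""
    for i in range(len(a) - 1):
        for j in range(len(a) - 1, i, -1):
            if a[j] < a[j - 1]:
                a[j], a[j - 1] = a[j - 1], a[j]

def insertion_sort(a):
    """Insert a[i] into its place among the already-sorted a[0..i-1]."""
    for i in range(1, len(a)):
        key, j = a[i], i
        while j > 0 and a[j - 1] > key:
            a[j] = a[j - 1]
            j -= 1
        a[j] = key

def selection_sort(a):
    """Select the record with the lowest key among a[i..n-1] and swap
    it into position i."""
    for i in range(len(a) - 1):
        low = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[low] = a[low], a[i]
```

All three take time proportional to the square of the number of keys in the worst case.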
algorithms where records are copied, will take far more time than any of the other
three algorithms take time proportional to
in the next sections. The value of
internal sorting, has been given the name "quicksort." The essence of quicksort is to
sort an array
Example 8.4.
sort the sequence of integers 3, 1, 4, 1, 5, 9, 2, 6, 5, 3. In each case, we have chosen
the left of the others. To do this task, we introduce two cursors,
or to the right of
over any keys greater than or equal to the pivot. Note that our
Fig. 8.9.
selection of the pivot by
because one or the other of these will move past any given key), then

then begin
Fig. 8.12.
The above loop is awkward, since the test that terminates it is in the middle. To
right or wrong; it doesn't matter, as we assume no particular order for the keys

{ sort elements

Fig. 8.14.
To see why that statement is true, we must use a trick that comes up frequently in
In our case, the "items" are the elements from
element all the time spent by
amount, independent of
There are also the final two unsuccessful tests at lines (4), (6), and (8), that may
not be charged to any "item," but these represent only a constant amount and may be
Fig. 8.15.
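The quicksort under analysis can be sketched in Python (a best-effort reconstruction of the Pascal version: the pivot is the larger of the first two distinct keys, and two cursors sweep inward swapping misplaced keys):

```python
def find_pivot(a, i, j):
    """Return the larger of the first two distinct keys in a[i..j],
    or None if all keys there are equal."""
    first = a[i]
    for k in range(i + 1, j + 1):
        if a[k] > first:
            return a[k]
        if a[k] < first:
            return first
    return None

def partition(a, i, j, pivot):
    """Permute a[i..j] so keys < pivot precede keys >= pivot; return
    the first position of the high group."""
    left, right = i, j
    while left <= right:
        while a[left] < pivot:             # left cursor skips low keys
            left += 1
        while a[right] >= pivot:           # right cursor skips high keys
            right -= 1
        if left < right:
            a[left], a[right] = a[right], a[left]
    return left

def quicksort(a, i, j):
    """Sort a[i..j] in place."""
    pivot = find_pivot(a, i, j)
    if pivot is None:
        return                             # all keys equal: nothing to do
    k = partition(a, i, j, pivot)
    quicksort(a, i, k - 1)
    quicksort(a, k, j)
```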
of the depths is
which is
Equation (8.1) says that the average time taken by quicksort is the time,
outside of the recursive calls, plus the average time spent in recursive calls. The latter
time is expressed in (8.1) as the sum, over all possible
Our first task is to transform (8.1) so that the sum is simplified, and in fact, so
(8.1) takes on the form it would have had if we had picked a truly random pivot at
By replacing half the left side of (8.2) by the right side, we see
Applying (8.3) to (8.1), with
Next, we can apply (8.3) to (8.4), with
If we pick
greater than zero. The third term of (8.7) makes a negative contribution, so from (8.7)
we may claim that
In Chapters 4 and 5 we presented several data structures, such as 2-3 trees,
property to the heap, we call
We now see how lines (4 - 6) of Fig. 8.16 are to be done. Selecting the minimum
at line (4) is easy; it is always in

Fig. 8.17.

Hence, the number of iterations of the while-loop of
Fig. 8.18, (8.8) says that each call to

{ sorts array

Fig. 8.18.
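The heapsort of Fig. 8.18 can be sketched in Python (a sketch with a max-heap kept in the array itself; the book's Pascal uses a pushdown procedure to restore the heap property):

```python
def heapsort(a):
    """Sort list a in place.

    pushdown restores the heap property below position i; the heap is
    built from the bottom up, then the maximum is repeatedly swapped
    to the end of the shrinking heap.
    """
    def pushdown(i, n):
        while 2 * i + 1 < n:               # while position i has a child
            child = 2 * i + 1
            if child + 1 < n and a[child + 1] > a[child]:
                child += 1                 # pick the larger child
            if a[i] >= a[child]:
                break
            a[i], a[child] = a[child], a[i]
            i = child

    n = len(a)
    for i in range(n // 2 - 1, -1, -1):    # build the initial heap
        pushdown(i, n)
    for end in range(n - 1, 0, -1):        # repeatedly remove the maximum
        a[0], a[end] = a[end], a[0]
        pushdown(0, end)
```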
lines (3 - 5) iterates
initially in

This code calculates where record
Fig. 8.19.
We can now write a program to binsort arbitrary collections of records, where the
{ push
Fig. 8.20.
the proper data structure for bins is used. In particular, if
The loop of lines (1 - 2) of Fig. 8.20, which places the records in bins, takes
time, since the INSERT operation of line (2) requires constant time, the insertion
If pointers to the ends of lists do not exist, then in line (4) we must spend time
running down to the end of
Example 8.6.
1. We sort the
help at all, but it is essential. We use
For example, suppose
10, the bin for integer
shows the placement of our list into bins. Notice that integers appear in bins in the
Now, we concatenate the bins in order, producing the list

0, 1, 81, 64, 4, 25, 36, 16, 9, 49 (8.10)
from Fig. 8.21(a). If we use the linked list data structure with pointers to list ends,
then the placement of the
The integers on the list created by concatenating the bins are redistributed into
bins, but using a different bin selection strategy. Now place integer
For our example, Fig. 8.21(b) shows the list (8.10) distributed into bins, with
going into bin [
To see why this algorithm works, we have only to notice that when several
integers are placed in one bin, as 0, 1, 4, and 9 were placed in bin 0, they must be in
Fig. 8.21.
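The two-pass binsort of Example 8.6 generalizes to sorting on each digit, least significant first; a Python sketch (the book's version works on Pascal lists with end pointers):

```python
def radix_sort(a, digits, base=10):
    """Binsort the integers in a on each digit, least significant first.

    After pass k the list is sorted on its low-order k digits.  Placing
    each record at the end of its bin keeps every pass stable, so the
    final concatenation of bins is fully sorted.
    """
    for k in range(digits):
        bins = [[] for _ in range(base)]
        divisor = base ** k
        for x in a:
            bins[(x // divisor) % base].append(x)
        a = [x for b in bins for x in b]   # concatenate the bins in order
    return a
```

Applied to the squares of Example 8.6: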
two-digit numbers in base
strategy works. Consider the integers
not possible, so we may assume
the second pass, so

keytype =
or an array of elements of the same type, as in

keytype =
We shall assume from here that keytype consists of
1900..1999. In (8.12),
to be sorted in this way, we never copy a record. We just move records from one list

{ sorts list
of types
procedure uses
clear bins }

Fig. 8.22.
As before, for concatenation to be done quickly, we need pointers to the end of
number of different values of type
the loop of lines (6 - 7) takes
Example 8.7.
generalize Example 8.6 and view keys as base-
which, since
As another example, if keys are character strings of length
character strings is also
constant, or even
radix sort not be
which is
the parent plus either the fact
Fig. 8.23.
Now consider what happens if the initial data is such that we reach node (2). We
where it does not.
If the algorithm reaches the status of node (5), it will end, since it has already
level 1, 4 nodes at level 2, and in general, 2
number of leaves of a tree with no nodes at levels higher than
way, a binary tree with
Fig. 8.24.
Figure 8.24(a) cannot be the smallest counterexample, because the tree rooted at
then the trees rooted at
That is, the average depth of leaves in
The reader can check that when
show that (8.13) has a minimum when
We leave this proof as an exercise to the reader skilled in differential calculus.
Granted that (8.13) has a minimum value of log
we refer to this problem as "finding the
(finding the minimum),
Certain cases of the problem are quite easy to solve in linear time. For example,
finding the minimum of
mentioned in connection with heapsort, if
by building a heap, which takes
3. If
Eventually we find that
As with quicksort, the function
worst case. For example, suppose we are looking for the first element, but by bad
luck, the pivot is always the highest available key. However, on the average,
Using techniques from the next chapter, we can show that the solution to (8.14) is
element, nor greater than the
most nine-tenths of the array, then this variant of
The trick to finding a good pivot is contained in the following two steps.
1. Divide the
This pivot is far from the extremes, provided that not too many records have the
pivot for key.
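The pivot-finding steps above, combined with the recursive selection procedure, can be sketched in Python (a sketch of the worst-case linear method with groups of 5, not the book's Pascal; here k counts from 0):

```python
def select(a, k):
    """Return the k-th smallest element (k from 0) of the list a.

    The pivot is the median of the medians of groups of 5, which
    guarantees each recursive call discards a constant fraction of the
    elements, giving a worst-case linear-time bound.
    """
    if len(a) <= 5:
        return sorted(a)[k]
    # median of each group of 5, then the median of those medians
    medians = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2]
               for i in range(0, len(a), 5)]
    pivot = select(medians, len(medians) // 2)
    low = [x for x in a if x < pivot]
    high = [x for x in a if x > pivot]
    if k < len(low):
        return select(low, k)
    if k >= len(a) - len(high):
        return select(high, k - (len(a) - len(high)))
    return pivot                           # k falls among keys equal to pivot
```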
elements out of groups of 5 is greater than at least
the five of which it is the middle. If
we may check that the chosen pivot is less than or equal to at least
elements, so for
The term
from line (6), and
We shall show that
now appreciate that the "magic number" 5, the size of the groups in line (5), and our
make to avoid infinite loops when there is a sequence of equal elements?
b. Show that the modified quicksort has average-case running time
Here are eight integers: 1, 7, 3, 2, 0, 5, 0, 8. Sort them using (a)
bubblesort, (b) insertion sort, and (c) selection sort.
Here are sixteen integers: 22, 36, 6, 79, 26, 45, 75, 13, 31, 62, 27, 76, 33,
The procedure
b. Show that if
were swapped), then these two elements remain sorted in pass
Consider the following algorithm
integers: If the elements
Suppose we have a sorted array of strings
improved bound for lines (1 - 2) does not imply an improved bound for
whole; all the time is taken in lines (3 - 5).
† Note that a sequence ranging from 1 to 0 (or more generally, from
† But in this case, if log
several pragmatic reasons for doing so. First, since most algorithms are written in a
1. In the case that
the term 2
breaking of list
upper bound on the time taken by
Observe that (9.1) applies only when
only if
we may suppose that


To begin our proof, assume


From (9.2), with
We thus see that
these two constraints. For example, choose


In other words,
Two observations about Example 9.1 are in order. If we assume that
may succeed!
As we discussed,

Similarly, we could substitute


for any
of Recurrences
are the same). This expression is
terminology. The homogeneous solution is the exact solution when
driving function
On the other hand, the second term of (9.16) represents the cost of creating the
subproblems and combining their results. We call this term the
solution by more than
as the homogeneous solution, or grows faster by at most log
particular solution grows as log
It is important to recognize that when searching for improvements in an
However, for certain common functions
homogeneous solution, and again the particular solution exceeds the homogeneous.
those in Case (2).
Example 9.3.
In Equation (3),
particular solution is of the same order as
above about
Equation (2) we have
particular solution, and therefore

The homogeneous solution, if
we can easily show the homogeneous solution is less than
Example 9.5.

The homogeneous solution is easily seen to be
greater than the homogeneous, is also the value of
Write recurrence equations for the time and space complexity of the
following algorithm, assuming

Solve the following recurrences, where
Give tight big-oh and big-omega bounds on
Solve the following recurrences by guessing a solution and checking
Check your answers to Exercise 9.5 by solving the recurrences by
Generalize Exercise 9.6 by solving all recurrences of the form

in terms of
Suppose in Exercise 9.7 that
does the solution to
Show that the number of different orders in which to multiply a
sequence of
Show that
Show that the number of comparisons required to sort
mergesort is given by

where [

Show that the number of Boolean functions of
the recurrence

Solve for
Show that the number of binary trees of height

Show that
[1980] contain additional material on the solution of recurrences. Aho and Sloane
[1973] show that many nonlinear recurrences of the form
have a solution of the form
It should be emphasized, however, that there are problems, such as the NP-
Fig. 10.1.
make the only legal move not involving the smallest disk.
The above algorithm is concise, and correct, but it is hard to understand why it
works, and hard to invent on the spur of the moment. Consider instead the following
1 smallest disks from peg
that disk from
algorithm for multiplication of
school involves computing
if we count single bit or digit multiplications and additions as one step. One divide-
and-conquer approach to integer multiplication would break each of
Fig. 10.2.
The product of

If we evaluate
multiplications of (
(multiplication by 2
write the following recurrence for

Using reasoning like that in Example 9.4, we can take the constant
so the driving function
particular solutions are both
In the case that formula (10.1) is used to multiply integers, the asymptotic

whose solution is
Fig. 10.3. Note that lines (8)-(11) are performed by copying bits, and the
multiplication by 2
line (16) simply introduces the proper sign into the result.
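The three-multiplication scheme can be sketched in Python (a sketch under the usual splitting identity; the base-case cutoff of 16 is an arbitrary choice, not from the text):

```python
def karatsuba(x, y):
    """Multiply integers using three half-size products instead of four.

    With A, B the high and low halves of X and C, D those of Y,
    XY = AC * 2^(2h) + ((A-B)(D-C) + AC + BD) * 2^h + BD,
    since (A-B)(D-C) + AC + BD = AD + BC.
    """
    if x < 16 or y < 16:                   # small (or negative): multiply directly
        return x * y
    n = max(x.bit_length(), y.bit_length())
    half = n // 2
    a, b = x >> half, x & ((1 << half) - 1)   # high and low halves of x
    c, d = y >> half, y & ((1 << half) - 1)   # high and low halves of y
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    cross = karatsuba(a - b, d - c) + ac + bd  # equals ad + bc
    return (ac << (2 * half)) + (cross << half) + bd
```

The shifts correspond to the multiplications by powers of 2 in the text, which cost only bit copying.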


attempted to teach it in elementary school, students would not learn to multiply.
We have now simplified the problem. All that remains is to have lower-numbered
players play high-numbered players. This is easily accomplished by having players 1
be able to generalize the ideas of this figure to provide a schedule for 2
Fig. 10.4.
equal or nearly equal subproblems is a crucial factor in obtaining good performance.
Frequently, however, there are only a polynomial number of subproblems, and
thus we must be solving some subproblem many times. If instead we keep track of
previous chapter that
We have thus proven an exponential upper bound on the time taken by the
recursive computation of
in fact, we can show it is
more slowly than 2
recursive calculation of
The problem with the recursive calculation is that we wind up computing the same
Fig. 10.5.
Fig. 10.6.
that dominates the

approach. Since
programming to the recursive approach under essentially any circumstances.
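The contrast above, between the exponential recursion and the table-filling approach of Fig. 10.7, can be sketched in Python for the "world series odds" example (a reconstruction assuming each team wins any game with probability 1/2; P(i, j) is the probability that the first team wins when it needs i more games and its opponent needs j):

```python
def odds_recursive(i, j):
    """Direct recursion: the same subproblems are recomputed
    exponentially often."""
    if i == 0:
        return 1.0
    if j == 0:
        return 0.0
    return (odds_recursive(i - 1, j) + odds_recursive(i, j - 1)) / 2

def odds_table(i, j):
    """Dynamic programming: fill an (i+1) x (j+1) table of the same
    values once, in O(i * j) time."""
    p = [[0.0] * (j + 1) for _ in range(i + 1)]
    for b in range(1, j + 1):
        p[0][b] = 1.0                      # first team has already won
    for a in range(1, i + 1):
        for b in range(1, j + 1):
            p[a][b] = (p[a - 1][b] + p[a][b - 1]) / 2
    return p[i][j]
```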
lengths of the chords (
As well as being interesting in its own right, the triangulation problem has a
Fig. 10.8.
of the points defining the surface.
Each triangle defines a plane in a 3-space, and since a minimum triangulation was
found, the triangles are expected to be very small. It is easy to find the direction of a
Before proceeding with the dynamic programming solution to the triangulation
would bound a region that was not a triangle.
To begin searching for a minimum triangulation, we pick two adjacent vertices,
minimum triangulation, there must be a vertex
chords or edges in the triangulation. We must therefore consider how good a
Each choice of
polygons formed by one chord and the edges in the original polygon from one end of
Fig. 10.9.
Next, we must find minimum triangulations for the polygons of Fig. 10.9(a) and
sides, (
an exponential-time algorithm.
However, by considering the triangle that involves the chord (
have to consider polygons more than one of whose sides are chords of the original
polygon. Fact 2 tells us that, in the minimal triangulation, the chord in the
other vertices. For example, if we select
beginning at
must consider the following three options.
1. We may pick vertex
3. For some
Fig. 10.10.
If we use the obvious recursive algorithm implied by the above rules to solve
subproblem of size
subproblems of size three or less directly and count only calls on subproblems of size
four or more. Thus the number of subproblems to be solved is exponential in
solving subproblem

Fig. 10.11
The distances we need are calculated from the coordinates of the vertices as:

(since these are polygon edges, not chords, and are present "for free")

The three sums above are 38.09, 38.52, and 43.97, respectively. We may conclude
that the minimum cost of the subproblem
was smallest, we know that to achieve this minimum we must utilize the subproblems
There is a useful trick for filling out the table of Fig. 10.11 according to the
of the element being computed. The second pair is (a) next to the bottom of the
Fig. 10.12.
immediately give us the triangulation itself. What we need, for each entry, is the
the value of
Example 10.3.
entire problem of Fig. 10.8, comes from the terms for
The minimum cost for
be solved, but
assumed when
have size less than or equal to three and therefore cost 0. The chord (
introduced, with a cost of 15.65.
we expect that
then discover that
Fig. 10.14.
However, our "greedy" selection of
view. It turns out that the path
distance of 3 for
Fig. 10.15.
The TSP has a number of practical applications. As its name implies, it can be used
now have one path,
Fig. 10.16.
The leaves of the tree correspond to board positions where there is no move, either
First we assign values to the leaves. Say the game is tic-tac-toe. Then a leaf is
corresponds to a board position where it is player 1's move, then the value is the
Example 10.6.
that are wins for
bottom up is not feasible. Chess programs work, in essence, by building for each
termination. We wish to construct its game tree and evaluate the root. We could
The space to store the tree can be prohibitively large, but by being careful we need
never store more than one path, from the root to some node, at any one time. In Fig.

it cannot affect the value of
It is thus possible in the situation of Fig. 10.18, to skip consideration of the
children of
Fig. 10.18.
skipping or "pruning" nodes involves the notion of final and tentative values for
1. If all the children of a node
tentative value of
2. If a max node
Continuing our example,
proceeds to
to evaluate.
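The tentative-value rules above can be sketched in Python (a sketch of alpha-beta game-tree evaluation; the tree representation and the value/children callbacks are illustrative, not from the text):

```python
def search(node, alpha, beta, is_max, value, children):
    """Evaluate a game tree with alpha-beta pruning.

    value(node) gives the payoff at a leaf; children(node) lists the
    positions reachable in one move.  alpha and beta are the tentative
    values inherited from the ancestors; once a node's tentative value
    crosses the bound, its remaining children cannot affect the value
    of any ancestor and are skipped.
    """
    kids = children(node)
    if not kids:
        return value(node)
    if is_max:
        v = alpha
        for c in kids:
            v = max(v, search(c, v, beta, False, value, children))
            if v >= beta:
                break                      # prune: min ancestor won't allow v
        return v
    v = beta
    for c in kids:
        v = min(v, search(c, alpha, v, True, value, children))
        if v <= alpha:
            break                          # prune: max ancestor won't allow v
    return v
```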
children into two groups -- those that have a particular edge and those that do not. For
In Fig. 10.20 we have chosen to consider the edges in lexicographic order (
Fig. 10.20.
of the two tour edges adjacent to node
For example, consider the TSP instance in Fig. 10.21. Unlike the instance in Fig.
10.15, the distance measure for edges is not Euclidean; that is, it bears no relation to
Fig. 10.21.
selected, even if its cost is lowest.
Thus, if we are constrained to include edge (
for node a are (
2. If including (
included does not raise the lower bound, but excluding it raises the bound to 18.5,
since the two cheapest legal edges for nodes
The next edge in lexicographic order is (
we can infer that neither (
Fig. 10.23(
The time taken by the algorithm of Example 10.11 on a graph of
depends on the number of times we need to improve the solution. Just testing that no
Fig. 10.23.
globally optimal
In practice, we may not find a globally optimal solution as suggested in Fig. 10.24,
since the number of locally optimal solutions may be enormous. However, we may at
Note that we cannot
Fig. 10.24.
To find a locally optimal tour, we start with a random tour, and consider all pairs of
nonadjacent edges, such as (
Example 10.12.
10.26(a). We might replace (
Fig. 10.25.
We can generalize 2-opting to
edges and reconnect the remaining pieces in any order so that the result is a tour. Note
Fig. 10.27.
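The 2-opting procedure can be sketched in Python (a sketch: each interchange deletes two nonadjacent tour edges and reverses the segment between them; the distance matrix in the test is illustrative):

```python
import itertools

def tour_length(tour, dist):
    """Total length of a closed tour under distance matrix dist."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def two_opt(tour, dist):
    """Improve a tour by 2-opting until it is locally optimal.

    Deleting edges (t[i], t[i+1]) and (t[j], t[j+1]) and reconnecting
    the pieces reverses the segment t[i+1..j]; any interchange that
    shortens the tour is kept, and the process repeats until no
    interchange helps.
    """
    tour = tour[:]
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(tour)), 2):
            if j - i < 2 or (i == 0 and j == len(tour) - 1):
                continue                   # the two edges share an endpoint
            new = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
            if tour_length(new, dist) < tour_length(tour, dist):
                tour, improved = new, True
    return tour
```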
It is easy to check that, for fixed
we need to consider if there are
considerably higher than this, however, since we could make many local
verify this formula by computing the costs before and after the interchange
2. Take a package
Fig. 10.28.
and a similar recurrence for
We can recursively define the number of combinations of
a. Give a recursive function to compute
c. Give a dynamic programming algorithm to
table generally known as Pascal's triangle.
d. What is the running time of your answer to (
a function of
One way to compute the number of combinations of
is to calculate (
a. What is the worst case running time of this algorithm as a
function of
b. Is it possible to compute the "World Series Odds" function
perform this computation?
a. Rewrite the odds calculation of Fig. 10.7 to take into account
the fact that the first team has a probability
given game.
b. If the Dodgers have won one game and the Yankees two, but
the Dodgers have a .6 probability of winning any given game,
The odds calculation of Fig. 10.7 requires
algorithm to use only
Prove that Equation (10.4) results in exactly
Find a minimal triangulation for a regular octagon, assuming distances
separated by blanks whose ideal width is
Suppose we are given
binary search tree. Suppose that
find an element will be for
the average cost of a lookup is
the node holding
we can find a binary search tree that minimizes the lookup cost. Find a
dynamic programming algorithm to do so. What is the running time of
elements beginning with
For what values of coins does the greedy change-making algorithm of
a. Write the recursive triangulation algorithm
discussed in Section 10.2.
b. Show that the recursive algorithm results in
exactly 3
started on a problem of size s
Describe a greedy algorithm for
a. The one-dimensional package placement
b. The paragraphing problem (Exercise 10.11).
Give an example where your algorithm does not produce an optimal
Give a nonrecursive version of the tree search algorithm of Fig. 10.17.
Consider a game tree in which there are six marbles, and players 1 and
2 take turns picking from one to three marbles. The player who takes
†Note that we need not consider all
is immaterial. We may therefore consider only those permutations that begin with 1.
required for each child. For example, the reader should be able to modify the greedy
‡We could start with some heuristically found solution, say the greedy one, although
that would not affect this example. The greedy solution for Fig. 10.21 has cost 21.
†Do not be fooled by the picture of Fig. 10.25. True, if lengths of edges are distances
in the plane, then the dashed edges in Fig. 10.25 must be longer than those they
Algorithms for External Storage
represented by a root block pointing to 1024 blocks at an intermediate level, each of
which points to 1024 leaf blocks holding a part of the file, and so on.
The basic operation on files is to bring a single block to a buffer in main
memory; a buffer is simply a reserved area of main memory whose size is the same
We can now see the rationale behind the rules for reading Pascal files. Each file
is stored in a sequence of blocks, with a whole number of records in each block.
Similarly, we can view the Pascal file-writing process as one of creating a file in
a buffer. As records are "written" into the file, they are placed in the buffer for that
times we read a block into main memory or write a block onto secondary storage.
Fig. 11.1.
The basic step of a merge sort on files is to begin with two files, say
organized into runs of length
2. at most one of
3. the one with a tail has at least as many runs as the other.
Then it is a simple process to read one run from each of
append the resulting run of length 2
organized into runs of length 2

or reached the end of the file of

from runs of length one. We shall save considerable time if we begin with a pass
For example, if we have a million records, it would take 20 passes through the
data to sort starting with runs of length one. If, however, we can hold 10,000 records
Fig. 11.3.
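The basic merge step just described, done on in-memory lists standing in for files, can be sketched in Python:

```python
def merge_runs(a, b):
    """Merge two sorted runs into one sorted run.

    This is the basic step of an external merge sort: repeatedly take
    the smaller front record of the two runs and append it to the
    output run; when one run is exhausted, the tail of the other is
    appended.
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])                      # at most one run has a tail
    out.extend(b[j:])
    return out
```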
such as merging. We should therefore expect that if there is only one channel over
that we are done in the minimum amount of time. To see what can go wrong,
files are organized into runs of some length much larger than the size of a block, so
from file
in memory. There may not be space to hold all these blocks in main memory, and
To avoid these problems, we consider the keys of the last records in the last
exhausted, we naturally read next from the other file. However, if a run is not
the entire list will be sorted. As log
the number of times we read each record. Moreover, if
used for input files, and
1. In one pass, when runs from each of
file, when it becomes the output file, is filled with runs of a certain length. It
2. Each pass produces files of a different length. Since each of the files loaded
with runs on the previous
Such a merge-sorting process is called a
Example 11.2.
length 1. Records from
length 1 from
of length 3, and are placed on
previous pass.
The run length sequence 1, 1, 2, 3, 5, 8, 13, 21, . . . is the sequence of Fibonacci numbers.
This sequence is generated by the recurrence relation
the "golden ratio" (
Fig. 11.4.
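The Fibonacci pattern of run counts in the three-file polyphase sort can be sketched in Python (a sketch; each pass merges all the runs of the smaller file with an equal number from the larger):

```python
def polyphase_runs(passes):
    """Run counts on the two input files of a three-file polyphase sort.

    Starting from one run on each input file, each pass leaves the
    former larger file's excess as one input and fills the output file,
    so the counts follow the Fibonacci recurrence F(k) = F(k-1) + F(k-2).
    """
    a, b = 1, 1                  # runs on the two input files
    counts = [(a, b)]
    for _ in range(passes):
        a, b = b, a + b          # the Fibonacci recurrence
        counts.append((a, b))
    return counts
```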
bottleneck for two reasons.
1. As we have seen, if we have many disk or tape units available, we may speed
input/output sufficiently that the time to do the merge exceeds input time or
beginning of each stage the
The conditions under which merging can be done in parallel with reading are
are always 4
the two runs, there must be
keys equal to or less than
Fig. 11.5.
The question of which file to read from next is trivial. Usually, since two buffers
second, in the range 1..
Fig. 11.6.
We shall maintain, of course, the property that allows us to merge in parallel
two files, as indicated in Fig. 11.6. We call a configuration that obeys the property
empty and
on the assumption that Fig. 11.6 is safe, show that the configuration will be safe after
Fig. 11.7.
To see that Fig. 11.7(a) is safe, consider two cases. First, if
newly-read block
equal to or less than min(
to use the file reading and writing primitives such as found in Pascal. In this
Fig. 11.8.
the record with key
Fig. 11.9.
binary search
key value
the first pair in that block. If
block), we use linear search to see if
on block
With binary search we need examine only [log
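The lookup just described, binary search over the sparse index followed by a linear scan of the selected block, can be sketched as follows. This is an in-memory Python sketch with invented names, not the book's file-based Pascal: the index holds the first key of each block, both kept in sorted order.

```python
import bisect

def lookup(index_keys, blocks, key):
    """index_keys[i] is the key of the first record on blocks[i];
    both lists are sorted. Return the matching record or None."""
    # binary search: rightmost block whose first key is <= key
    i = bisect.bisect_right(index_keys, key) - 1
    if i < 0:
        return None                   # key precedes every block
    # linear search within the one candidate block
    for record_key, record in blocks[i]:
        if record_key == key:
            return record
    return None
```

With index `[1, 10, 20]` and blocks `[[(1,'a'),(5,'b')], [(10,'c')], [(20,'d'),(25,'e')]]`, looking up 5 examines only the first block and returns `'b'`.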
distribute the records to blocks, in that order. We may choose to pack as many as
Suppose we have a sorted file of records that are stored on blocks
from the file system. The new record is inserted in this new block, and the new block
for the new block in the index file.
With this organization we use the dense index to find the location in the main
file of a record with a given key. To insert a new record, we keep track of the last
If we used a binary search tree of
then it would require log
Fig. 11.10.
We can view a B-tree as a hierarchical index in which each node occupies a
The keys within a node are in sorted order so
subtree pointed to by
to by
subtree pointed to by
There are several ways to organize the leaves. Here we shall assume that the
main file is stored only in the leaves. Each leaf is assumed to occupy one block.
as its two children. This is the only situation in which a node may have fewer than
adjust the keys and pointers in
children of
11.12. Here, the block containing 10 is discarded. Its parent now has only two
Fig. 11.11.
Fig. 11.12.
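The search path through the hierarchical index can be sketched as follows, again in Python rather than the book's Pascal, with a node layout of our own choosing: interior nodes hold sorted separator keys and child pointers, and the records of the main file live only in the leaves, as the text assumes. A separator key is taken to equal the smallest key in the subtree to its right.

```python
class Node:
    def __init__(self, keys, children=None, records=None):
        self.keys = keys              # sorted keys within the node
        self.children = children      # child pointers; None at a leaf
        self.records = records        # records; None at an interior node

def btree_search(node, key):
    """Descend from the root to the one leaf that could hold `key`."""
    while node.children is not None:
        # follow the subtree whose key range covers `key`
        i = 0
        while i < len(node.keys) and key >= node.keys[i]:
            i += 1
        node = node.children[i]
    for k, r in zip(node.keys, node.records):
        if k == key:
            return r
    return None
```

Since each node occupies one block, the number of blocks read is the height of the tree.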
Suppose we use a three-file polyphase sort, where at the
create a file with
on one of the files and none on the other two. Explain why each of the
following must be true
to be the lengths of runs on the two initially
the two initial files.
a Fibonacci sequence.
What additional condition must be added to those of Exercise 11.9 to
a. with initial runs of length one (i.e.,
b. running for
than one allowed.

Generalize Exercises 11.9 and 11.10 to polyphase sorts with more than
Show that:
a. Any external sorting algorithm that uses only one
tape as external storage must take
use as external storage.
Suppose we have an external file of directed arcs
directed acyclic graph. Assume that there is not enough space in
Write a program to find the
a. a sparse-indexed file
b. a B-tree file
Assume that it takes
node of an
milliseconds to process each node in internal memory. If there are
record. Therefore, the total time taken to find a given record in the tree
milliseconds. Make reasonable estimates for the values of
A B*-tree is a B-tree in which each interior node is at least 2/3 full
When the key of a record is a string of characters, we can save space
Suppose that the operations on a certain file are insertions and
creates a cycle in an existing component, add the edge and remove the
Suppose we have a file containing a sequence of positive and negative
of any such subsequence.
Secondary index selection, of which Exercise 11.21 is a simplification, is
discussed by Lum and Ling [1970] and Schkolnick [1975]. B-trees originated with
Information about Exercise 11.12, one- and two-tape sorting, can be found in
after the previous runs were read and the last 4b records from these runs are being
† This strategy is the simplest of a number of responses that can be made to the
situation where a block has to be split. Some other choices, providing higher average
shared by different objects that grow and shrink in arbitrary ways. For example, we
We have chosen to represent character strings by pointers to blocks of memory in
the heap. These blocks have their first 2 bytes (the number 2 is a typical value that
Fig. 12.1.
In the four examples above, we can see differences along at least two orthogonal
Fig. 12.2.
Fig. 12.3.
consuming garbage collection must be performed. This garbage collection step is

atomtype = { some appropriate type;


Fig. 12.4.
We assume there is an array of cells, taking up most of the memory, and some

To mark the cells accessible from
or not, by running down the array
{ If current cell was marked, do nothing. Otherwise, mark

Fig. 12.5.
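The straightforward marking procedure can be sketched as follows. This is a Python rendering of the idea, not the book's Pascal of Fig. 12.5: each cell is a pair of fields, each either another cell's index or None (standing for an atom or null pointer); we recursively mark everything reachable from the roots, and a final sweep down the array returns the unmarked cells to available space.

```python
def mark(cells, marked, i):
    """Recursively mark cell i and every cell reachable from it."""
    if i is None or marked[i]:
        return                        # atom/null, or already marked: do nothing
    marked[i] = True
    left, right = cells[i]
    mark(cells, marked, left)
    mark(cells, marked, right)

def collect(cells, roots):
    """Mark from all roots, then sweep: return indices of garbage cells."""
    marked = [False] * len(cells)
    for r in roots:
        mark(cells, marked, r)
    return [i for i in range(len(cells)) if not marked[i]]
```

The "already marked" test is what keeps the recursion from looping forever on cyclic structures. Note that this simple version uses the run-time stack, which is exactly the space problem the pointer-reversal algorithm discussed below avoids.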
memory be linked in a single chain extending from
Fortunately, there is a clever algorithm, known as the
Example 12.3.
we depth-first search this structure, we visit (1), (2), (3) and (4), in that order. Figure
Fig. 12.6.
There are three basic steps to performing the depth-first search. They are:
Fig. 12.7.

Fig. 12.8.

{ test if left field is atom or null pointer }

Fig. 12.9.
The reader should observe that in the program of Fig. 12.9, we always know
Fig. 12.11.
In order that empty blocks may be found when they are needed to hold new data,
1. Each block is sufficiently large to hold
a. A
computer system) of the block,
b. A pointer (to link the block in available space), plus
length, a full/empty bit with a value of 0, indicating emptiness of the block, a
pointer to the next available block, and empty space.
3. A block holding data has, from the left, a count, a full/empty bit with value 1
indicating the block is in use, and the data itself.
One interesting consequence of the above assumptions is that blocks must be
Fig. 12.12.
3. Keep the available list sorted by position. Then the empty block to the left is
found when we insert the newly emptied block onto the list, and we have only
As with the merger of a newly empty block with the block to its right, the first
2. Use a doubly linked available space list each time a block becomes unused,
Example 12.6.
Choosing a block in which to place the new data is not so easy, since there are
conflicting goals for such strategies. We desire, on one hand, to be able to quickly
the available list from the beginning until we come to a block of size
Fig. 12.14.
examine the entire available list to find that block of size at least
size is as little greater than
Some observations about these strategies can be made. Best-fit is considerably
not speed up first-fit appreciably, and in fact may slow it down if the statistics of
700, and the request for 400 came in, we would place it where it fit best, that is, in
On the other hand, there are situations where, starting with the list 600, 500, 1000,
700 again, best-fit would fail, while first-fit would succeed without storage
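The two strategies being contrasted can be sketched as follows, in Python with names of our own. Each free block is represented only by its size; a successful allocation shrinks the chosen block in place and returns its index.

```python
def first_fit(blocks, n):
    """Scan the available list from the beginning; take the first
    block of size at least n."""
    for i, size in enumerate(blocks):
        if size >= n:
            blocks[i] -= n
            return i
    return None

def best_fit(blocks, n):
    """Examine the entire available list; take the block whose size
    exceeds n by as little as possible."""
    best = None
    for i, size in enumerate(blocks):
        if size >= n and (best is None or size < blocks[best]):
            best = i
    if best is not None:
        blocks[best] -= n
    return best
```

With free blocks of sizes 600, 500, and 1000, a request for 400 goes into the 500-block under best-fit but into the 600-block under first-fit, illustrating how the two strategies fragment the heap differently.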
for a new datum, we choose an available block of that size

Difficulties arise when no empty blocks of the desired size
we find a block of size
see the way in which the choices of values for the

† Equation (12.1) applies when i
view the heap as a single block of size
which blocks of size
"buddies" of size 2
it is easy to find the buddy of a block of size 2
multiple of 2
Example 12.8.
Figure 12.15 shows the Fibonacci buddy system used in a heap of size 55, with
Fig. 12.15.
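In the Fibonacci buddy system of Example 12.8, the permitted sizes are the Fibonacci numbers, and a block of size F(k) splits into buddies of sizes F(k-1) and F(k-2). A minimal sketch of that split rule (our own helper, not from the book):

```python
def fib_split(size):
    """Given a permitted (Fibonacci) block size F(k), return the sizes
    of the two buddies it splits into: (F(k-2), F(k-1))."""
    a, b = 1, 2                       # consecutive permitted sizes
    while b < size:
        a, b = b, a + b
    if b != size:
        raise ValueError("not a permitted (Fibonacci) block size")
    return b - a, a
```

So a block of size 13 splits into buddies of sizes 5 and 8, either of which may be split again recursively until a block of the requested size appears.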
In a
case. If no blocks in either of these sizes exist, we may create one by applying this
splitting strategy recursively for size
There is a small catch, however. In a
Rather we must use the block whole, if no smaller block is available. This problem
acceptable since blocks of size 1 (byte or word, perhaps) could be too small to hold a
the case, we can combine the resulting block with its buddy, if that buddy is empty,
The exponential buddy system makes the locating of buddies especially easy. If
buddy of size
as the block of size
information can be maintained easily when we merge two empty buddies. The reader
may check that the information can be maintained when we split an empty block of
If we maintain all this information, and link the available lists in both directions,
Fig. 12.16.
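The ease of locating buddies in the exponential system comes from the address arithmetic: a block of size 2^i begins at an address that is a multiple of 2^i, so its buddy's address differs from its own only in bit i, and one exclusive-or finds it. A sketch of that calculation (the function name is ours):

```python
def buddy_address(addr, i):
    """Address of the buddy of the size-2**i block starting at addr.
    Blocks of size 2**i always begin on multiples of 2**i."""
    assert addr % (1 << i) == 0, "block not aligned to its size"
    return addr ^ (1 << i)            # flip bit i of the address
```

For example, the two size-8 buddies carved from the size-16 block at address 16 sit at 16 and 24, and `buddy_address(24, 3)` recovers 16. This is the step that makes merging two empty buddies back into one block cheap.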
There are some simple solutions to this problem if we allocate a little extra space

We follow each pointer to some block

to move block
which takes time proportional to the amount of the heap in use, will likely dominate
where pointer
Fig. 12.18.
i. allocate a block of 120 bytes
ii. allocate a block of 70 bytes
Give sequences of requests that can be satisfied if we use
a. first fit but not best fit
b. best fit but not first fit.
16 on a heap of size 16. If we request a block of size
we must allocate a block of size 2
portion of the block, if any, cannot be used to satisfy any other request.
we first find a block of size 2
block of size 2
and so on. If we find ourselves looking for a free block of size 32, we
fail and cannot satisfy the request. For the purposes of this question, we
than 16, such that the last request cannot be satisfied. For example,
consider the sequence 5, 5, 5. The first request causes the initial block
Find a sequence
Design a storage allocation scheme for a situation in which memory is
allocated and freed in blocks of lengths one and two. Give bounds on
pointer, a count and a full/empty bit if they are to be chained to an available list.
† Since empty blocks must hold pointers (and, as we shall see, other information as
well) we do not really start the sequence of permitted sizes at 1, but rather at some
† Of course, if there are no empty blocks of size
block of size
out of space and must reorganize the heap as in the next section.
‡ Incidentally, it is convenient to think of the blocks of sizes
block of size
† As in the previous section, we must assume that one bit of each block is reserved to
organization of information,"
Boruvka, O. [1926]. "On a minimal problem,"
Brooks, F. P. [1974].
Ninth Annual ACM Symp. on Theory of Computing
Cheriton, D., and R. E. Tarjan [1976]. "Finding minimum spanning trees,"
Cocke, J., and F. E. Allen [1976]. "A program data flow analysis procedure,"
Coffman, E. G. (ed.) [1976].
and Sons, New York.
Comer, D. [1979]. "The ubiquitous B-tree,"
Cooley, J. M., and J. W. Tukey [1965]. "An algorithm for the machine calculation of
complex Fourier series,"
DBTG [1971].
Demers, A., and J. Donahue [1979]. "Revised report on RUSSELL," TR 79-389,
Dept. of Computer Science, Cornell Univ., Ithaca, N. Y.
Deo, N. [1975].
Deutsch, L. P., and D. G. Bobrow [1966]. "An efficient incremental automatic
garbage collector,"
Dijkstra, E. W. [1959]. "A note on two problems in connexion with graphs,"
Numerische Mathematik
Edmonds, J. [1965]. "Paths, trees, and flowers,"
Even, S., and O. Kariv [1975]. "An O(
general graphs,"
Farber, D., R. E. Griswold, and I. Polonsky [1964]. "SNOBOL, a string manipulation
Fischer, M. J. [1972]. "Efficiency of equivalence algorithms," in
Computer Computations
Hall, Englewood Cliffs, N. J.
Greene, D. H., and D. E. Knuth [1983].
Birkhauser, Boston, Mass.
Gudes, E., and S. Tsur [1980]. "Experiments with B-tree reorganization,"
SIGMOD Symposium on Management of Data
traveling salesman problem,"
Lin, S., and B. W. Kernighan [1973]. "A heuristic algorithm for the traveling
salesman problem,"
Liskov, B., A. Snyder, R. Atkinson, and C. Scaffert [1977]. "Abstraction
mechanisms in CLU,"
Liu, C. L. [1968].
Lueker, G. S. [1980]. "Some techniques for solving recurrences,"
maximum matching in general graphs,"
Foundations of Computer Science
Moenck, R., and A. B. Borodin [1972]. "Fast modular transforms via division,"
IEEE Thirteenth Annual Symp. on Switching and Automata Theory
Morris, F. L. [1978]. "A time- and space-efficient garbage compaction algorithm,"
Comm. ACM
Morris, R. [1968]. "Scatter storage techniques,"
Moyles, D. M., and G. L. Thompson [1969]. "Finding a minimum equivalent graph
of a digraph,"
Nicholls, J. E. [1975].
Addison-Wesley, Reading, Mass.
Nievergelt, J. [1974]. "Binary search trees and file organization,"
Papadimitriou, C. H., and K. Steiglitz [1982].
Algorithms and Complexity
Parker, D. S. Jr. [1980]. "Conditions for optimality of the Huffman algorithm,"
J. Computing
Perl, Y., A. Itai, and H. Avni [1978]. "Interpolation search -- a log log
Comm. ACM
Hall, Englewood Cliffs, N. J.
Pratt, V. R. [1979].
ALPHARD: defining and specifying iteration and generators,"
Shell, D. L. [1959]. "A high-speed sorting procedure,"
Shell, D. L. [1971]. "Optimizing the polyphase sort,"
Weinberg, G. M. [1971].
N. Y.
Weiner, P. [1973]. "Linear pattern matching algorithms,"
Annual Symp. on Switching and Automata Theory
Wexelblat, R. L. (ed.) [1981].
N. Y.
Wiederhold, G. [1982].
Williams, J. W. J. [1964]. "Algorithm 232: Heapsort,"
Wirth, N. [1973].
Englewood Cliffs, N. J.
Wirth, N. [1976].
Englewood Cliffs, N. J.
Wulf, W. A., M. Shaw, P. Hilfinger, and L. Flon [1981].
Computer Science
Yao, A. C. [1975]. "An O(|E| log log |V|) algorithm for finding minimum spanning
