
Jersey

†If this were a real database system, the array would be kept in secondary storage.

However, this data structure would waste a great deal of space.

†Note that there are probably many more enrollment records than course or student

records, so shrinking enrollment records shrinks the total space requirement almost as

‡It would in practice be useful to place fields like grade or status (credit or audit) in enrollment records, but our original problem statement does not require this.

†We must have some way of identifying the type of records; we shall discuss a way to do this momentarily.

†We assume this "and" is a conditional (short-circuit) "and."

Fig. 5.9

2. Most nodes will have many fewer than 27 children.

3. A leaf reached by an edge labeled $ cannot have any children, and may as well not be there.

elementtype =

created to right of

:=

{

nil

pback

lowback

lowback

pback

nil

lowback

child of

nil

pback

pback

lowback

Fig. 5.19.

Now we can write the procedure INSERT, which calls

Show the successive values of the various

LCS algorithm of Fig. 5.29 with first string

string

5.10

Suppose we use 2-3 trees to implement the MERGE and SPLIT

a. Show the result of splitting the tree of Exercise 5.7 at 6.

b. Merge the tree of Exercise 5.7 with the tree consisting of leaves

for elements 10 and 11.

5.11

Some of the structures discussed in this chapter can be modified easily

a. Binary search trees. The "<" ordering applies to domain elements.

b. 2-3 trees. At interior nodes, place only the key field of domain

elements.

5.12

Show that in any subtree of a binary search tree, the minimum element is

5.13

To implement such a step, we need the notion of an

Fig. 6.15.

of

Fig. 6.16.

adjacency list version of

superior, at least for large sparse graphs.

of

To print out the intermediate vertices on the shortest path from vertex

j

Fig. 6.19.

and 0 otherwise. A is often called the

Example 6.10.

the digraph of Fig. 6.14.

The transitive closure can be computed using a procedure similar to

applying the following formula in the

j

1

This formula states that there is a path from

Fig. 6.20.

vertex numbered higher than

1. there is already a path from

higher than

2. there is a path from

- 1 and a path from

1.

As before

computation with only one copy of the

]

Fig. 6.21.
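
The computation sketched above (Fig. 6.19 through Fig. 6.21) is Warshall's algorithm: starting from the adjacency matrix, allow successively higher-numbered vertices as intermediate vertices on paths. The following is a minimal sketch of that idea in Python rather than the book's Pascal; the function and variable names are mine, not the text's.

    # A minimal sketch of Warshall's algorithm: closure[i][j] becomes true once
    # there is a path from i to j whose intermediate vertices are numbered k or lower.
    def transitive_closure(adjacency):
        n = len(adjacency)
        closure = [row[:] for row in adjacency]        # copy of the adjacency matrix
        for k in range(n):                             # allow vertex k as an intermediate
            for i in range(n):
                for j in range(n):
                    if closure[i][k] and closure[k][j]:
                        closure[i][j] = True
        return closure

    # Example: arcs 0->1 and 1->2 imply a path from 0 to 2 in the closure.
    A = [[False, True, False],
         [False, False, True],
         [False, False, False]]
    print(transitive_closure(A)[0][2])                 # prints True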

The

Example 6.11.

Fig. 6.22.

The eccentricities of the vertices are

vertex eccentricity

Thus the center is vertex

1. First apply the procedure

shortest paths matrix

2. Find the maximum cost in each column

vertex

3. Find a vertex with minimum eccentricity. This is the center of

The running time of this process is dominated by the first step, which takes

time. Step (2) takes

Example 6.12.

maximum value in each column is shown below.

Fig. 6.23.

depth-first search of an entire graph is

Fig. 6.24.

Example 6.13.

Assuming that

adjacency list of

Fig. 6.25.

search remains at

unvisited vertices. The arcs leading to new vertices are called

In addition to the tree arcs, there are three other types of arcs defined by a depth-first search of a directed graph. These are called back arcs, forward arcs, and cross arcs. An arc such as

its ancestors in the spanning forest. Note that an arc from a vertex to itself is a back

Arcs such as

neither an ancestor nor a descendant, are called

from left to right. This pattern is not accidental. Had the arc

G

on encountering arc

Fig. 6.26.

How do we distinguish among the four types of arcs? Clearly tree arcs are special since they lead to unvisited vertices during the depth-first search. Suppose we

introduced in Section 3.1.

All descendants of a vertex

than or equal to the number assigned

dfnumber

forward arcs go from low-numbered to high-numbered vertices and back arcs go from high-numbered to low-numbered vertices.

All cross arcs go from high-numbered vertices to low-numbered vertices. To see

this, suppose that

before

Fig. 6.27.
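
As an illustration of the numbering argument above, here is a hedged sketch in Python (the book's procedures are in Pascal) that assigns depth-first numbers and classifies every arc as a tree, forward, back, or cross arc; the names are mine, not the text's.

    # Classify arcs during depth-first search using depth-first numbers:
    #   tree arc:    leads to an unvisited vertex
    #   forward arc: to an already-visited descendant (higher dfnumber)
    #   back arc:    to an ancestor that is still active (lower or equal dfnumber)
    #   cross arc:   to a fully explored vertex with a lower dfnumber
    def classify_arcs(graph):
        dfnumber, active, kinds = {}, set(), {}
        counter = [0]

        def dfs(v):
            counter[0] += 1
            dfnumber[v] = counter[0]
            active.add(v)
            for w in graph[v]:
                if w not in dfnumber:
                    kinds[(v, w)] = 'tree'
                    dfs(w)
                elif dfnumber[w] > dfnumber[v]:
                    kinds[(v, w)] = 'forward'
                elif w in active:
                    kinds[(v, w)] = 'back'
                else:
                    kinds[(v, w)] = 'cross'
            active.remove(v)

        for v in graph:
            if v not in dfnumber:
                dfs(v)
        return kinds

    # Example digraph containing one arc of each kind.
    g = {1: [2, 3], 2: [4], 3: [], 4: [1, 3], 5: [4]}
    print(classify_arcs(g))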

Among other things, dags are useful in representing the syntactic structure of

Fig. 6.28.

Fig. 6.29.

Fig. 6.30.

Fig. 6.31.

dag so that if there is an arc from vertex

Topological sort can be easily accomplished by adding a print statement after line (4) to the depth-first search procedure in Fig. 6.24:
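
The print statement itself is not reproduced in this copy; the following Python sketch shows the same idea: emit each vertex only after all of its successors have been searched, which lists the vertices of a dag in reverse topological order.

    # A sketch in Python (the book's procedure is Pascal and works on the
    # structures of Fig. 6.24): output each vertex after all of its successors
    # have been searched, producing a reverse topological order of a dag.
    def topsort(graph):
        visited, order = set(), []

        def dfs(v):
            visited.add(v)
            for w in graph[v]:
                if w not in visited:
                    dfs(w)
            order.append(v)          # this append plays the role of the print statement

        for v in graph:
            if v not in visited:
                dfs(v)
        return order                 # reverse topological order; reverse it if desired

    dag = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}
    print(topsort(dag)[::-1])        # a topological order, e.g. ['a', 'c', 'b', 'd']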

The effect of calling

This technique works because there are no back arcs in a dag. Consider what happens when depth-first search leaves a vertex

Fig. 6.32.

Fig. 6.33.

Note that every vertex of a directed graph

in the

We shall now present an algorithm to find the strongly connected

Fig. 6.34.

components of a given directed graph

1. Perform a depth-first search of

Performing the depth-first search on

shown in Fig. 6.37. We begin with

precisely to the vertices of a tree in the spanning forest of the second depth-first

Suppose that in the depth-first search of

reach either

Fig. 6.35.

Fig. 6.36.

Fig. 6.37.
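
For reference, here is a minimal Python sketch of the two-pass strong-components algorithm this section outlines: number the vertices by the order in which the first depth-first search finishes them, reverse every arc, and search the reversed digraph starting from the highest-numbered unvisited vertex; the trees of the second search are the strongly connected components. The names used below are mine, not the book's.

    # A minimal sketch of the two-pass (Kosaraju-style) strong components algorithm.
    def strong_components(graph):
        finish_order = []
        visited = set()

        def dfs(v, g, collect):
            visited.add(v)
            for w in g[v]:
                if w not in visited:
                    dfs(w, g, collect)
            collect.append(v)

        for v in graph:                       # pass 1: order vertices by completion time
            if v not in visited:
                dfs(v, graph, finish_order)

        reverse = {v: [] for v in graph}      # build G_r: every arc v->w becomes w->v
        for v in graph:
            for w in graph[v]:
                reverse[w].append(v)

        visited.clear()
        components = []
        for v in reversed(finish_order):      # pass 2: highest-numbered vertex first
            if v not in visited:
                comp = []
                dfs(v, reverse, comp)
                components.append(comp)
        return components

    g = {1: [2], 2: [3], 3: [1, 4], 4: []}
    print(strong_components(g))               # the strong components: {1, 2, 3} and {4}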

both

Now suppose

forest of

than

We conclude that in the search of

v

b. linked adjacency lists, and

c. adjacency lists represented as in Fig. 6.5.

a. Use the algorithm

the other vertices.

b. Use the algorithm

Fig. 6.38.

Describe a mathematical model for the following scheduling problem.

Given tasks

graph theory. Some books treating graph algorithms are Deo [1975], Even [1980],

The single-source shortest paths algorithm in Section 6.3 is from Dijkstra [1959].

The all-pairs shortest paths algorithm is from Floyd [1962] and the transitive closure

The strong components algorithm of Section 6.7 is similar to one suggested by R. Kosaraju in 1978 (unpublished), and to one published by Sharir [1981]. Tarjan

Coffman [1976] contains many examples of how digraphs can be used to model scheduling problems as in Exercise 6.2. Aho, Garey, and Ullman [1972] show that

Fig. 7.1.

A

that is, a connected induced subgraph that is not itself a proper subgraph of any other

Example 7.2

namely itself. Figure 7.2 is a graph with two connected components.

Fig. 7.2.

A (simple)

cannot be distinct; eventually, we find

because there are no loops from a vertex to itself, and we cannot have

we entered and left vertex

v

only one edge incident, and therefore conclude that such a vertex

exists.

Now consider the graph

Fig. 7.3.

v

Example 7.4.

tree.

A typical application for minimum-cost spanning trees occurs in the design of

Fig. 7.4.

Thus,

Fig. 7.5.

step until

Fig. 7.7.

added to

Whenever we find another vertex

be

The time complexity of Prim's algorithm is

of the loop of lines (4)-(16) and each iteration of the loop takes

inner loops of lines (7)-(10) and (13)-(16). As

increasing cost. If the edge connects two vertices in two different connected

Example 7.5.

added to

2. FIND(

iteration of the while-loop, finding the least cost

Fig. 7.9.
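
The steps being analyzed here, examining edges in order of increasing cost and using FIND and MERGE to maintain the connected components, form Kruskal's algorithm. As a hedged illustration only (Python, with a deliberately simple FIND and MERGE rather than the tree structures of Section 5.5), the whole method fits in a few lines:

    # Kruskal's algorithm: accept an edge when FIND shows its endpoints lie in
    # different components, then MERGE those components.
    def kruskal(vertices, edges):
        parent = {v: v for v in vertices}

        def find(v):                           # FIND: the name of v's component
            while parent[v] != v:
                v = parent[v]
            return v

        tree = []
        for cost, u, v in sorted(edges):       # edges as (cost, u, v), cheapest first
            ru, rv = find(u), find(v)
            if ru != rv:                       # different components: the edge joins them
                parent[ru] = rv                # MERGE
                tree.append((u, v, cost))
        return tree

    edges = [(1, 'a', 'b'), (4, 'a', 'c'), (2, 'b', 'c'), (5, 'c', 'd')]
    print(kruskal(['a', 'b', 'c', 'd'], edges))
    # [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 5)]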

edge in

time in the worst case. The total time required to perform the MERGE and FIND

become either

1.

versa, or

2.

other directly, but one called the other indirectly (i.e.,

Example 7.6.

To follow a few steps of the search, the procedure

edge (

Fig. 7.11.

As for depth-first search, we can build a spanning forest when we perform a

breadth-first search. In this case, we consider edge (

that is not a tree edge is a cross edge, that is, it connects two vertices neither of which

The breadth-first search algorithm given in Fig. 7.12 inserts the tree edges into a

Fig. 7.13.
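
A minimal sketch of the breadth-first search being analyzed, in Python rather than the Pascal of Fig. 7.12; it records the tree edges of the breadth-first spanning forest as it goes.

    # Breadth-first search: each vertex enters the queue exactly once, and every
    # edge leading to an unvisited vertex becomes a tree edge.
    from collections import deque

    def bfs(graph, start):
        visited = {start}
        tree_edges = []
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in visited:
                    visited.add(w)
                    tree_edges.append((v, w))   # (v, w) becomes a tree edge
                    queue.append(w)
        return tree_edges

    g = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
    print(bfs(g, 1))                             # [(1, 2), (1, 3), (2, 4)]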

first search. Each vertex visited is placed in the queue once, so the body of the while

loop is executed once for each vertex. Each edge (

is

typical, we shall usually refer to the running time of breadth-first search as

Depth-first search and breadth-first search can be used as frameworks around

Fig. 7.14

b. A vertex

if there is some child

this case,

Fig. 7.15.

After computing

with

The time taken by the above algorithm on a graph of

O

Thus, the total time is

We can represent this situation by a graph as in Fig. 7.16 where the vertices are

Fig. 7.16.

The matching problem can be formulated in general terms as follows. Given a

edges (1, 6), (3, 7), and (4, 8). The path 2, 6, 1, 8, 4, 9 in Fig. 7.17(b) is an

Fig. 7.17.

The key observation is that

Suppose

edges in

endpoint of at most one edge from

Fig. 7.18.

from

M

relative to

We can now outline our procedure to find a maximal matching

(

2. Find an augmenting path

3. Repeat step (2) until no further augmenting paths exist, at which point we have a maximal matching (a sketch of this outline appears below).
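
As promised above, here is a hedged sketch of this outline for bipartite graphs such as the one in Fig. 7.16, written in Python; augment looks for an augmenting path from an unmatched vertex and flips the matched and unmatched edges along it. The function names are mine, not the book's.

    # Repeatedly look for an augmenting path relative to the current matching M
    # and augment along it; stop when no augmenting path exists.
    def maximal_matching(graph, left):
        match = {}                                    # right vertex -> matched left vertex

        def augment(u, seen):
            # try to find an augmenting path starting at left vertex u
            for v in graph[u]:
                if v in seen:
                    continue
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = u                      # flip the edges along the path
                    return True
            return False

        for u in left:
            augment(u, set())
        return [(u, v) for v, u in match.items()]

    # left vertices 1..3, right vertices 'a'..'c'
    g = {1: ['a', 'b'], 2: ['a'], 3: ['b', 'c']}
    print(maximal_matching(g, [1, 2, 3]))             # all three left vertices are matched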

It remains only to show how to find an augmenting path relative to a matching

Fig. 7.19.

Suppose

Fig. 7.20.

Show that an

† Unless otherwise stated, we shall assume an edge is always a pair of distinct vertices.

† Note that MERGE and FIND are defined slightly differently from Section 5.5, since

C

We shall use several criteria to evaluate the running time of an internal sorting

algorithm. The first and most common measure is the number of algorithm steps

programs:

{

Fig. 8.2.

Fig. 8.3.

keytype =

The bubblesort algorithm of Fig. 8.1 applied to the list of Fig. 8.3 sorts the list in

although according to the algorithm of Fig. 8.1, two additional passes are made.

Fig. 8.4.

Example 8.2.

passes of insertion sort for

Fig. 8.6.

record with the lowest key, among

select the smallest among

Thus lines (3) and (4) of Fig. 8.1 take at most

Hence for a fixed value of

Fig. 8.7.

Fig. 8.8.

takes at most

larger than

program takes at most

steps, where the term

bubblesort is

ever needed (i.e., the input happens to be already sorted), the test of line (3) is

executed

Next consider the insertion sort of Fig. 8.5. The while-loop of lines (4-6) of Fig.

8.5 cannot take more than

A

that the for-loop of lines (2-6) takes at most

sum is

The reader may check that if the array is initially sorted in reverse order, then we

actually go around the while-loop of lines (4-6)

be shown that this smaller bound holds in the average case as well.

Finally, consider selection sort in Fig. 8.7. We may check that the inner for-loop

of lines (4-7) takes

by the algorithm is

seen easily to be

the worst case and the average case as well.

algorithms where records are copied, will take far more time than any of the other

three algorithms take time proportional to

in the next sections. The value of

internal sorting, has been given the name "quicksort." The essence of quicksort is to

sort an array

Example 8.4.

sort the sequence of integers 3, 1, 4, 1, 5, 9, 2, 6, 5, 3. In each case, we have chosen

the left of the others. To do this task, we introduce two cursors,

Initially,

or to the right of

1.

over any keys greater than or equal to the pivot. Note that our

Fig. 8.9.

selection of the pivot by

2.

partitioned

3.

because one or the other of these will move past any given key), then

(1)

then begin

Fig. 8.12.

swap

The above loop is awkward, since the test that terminates it is in the middle. To

right or wrong; it doesn't matter, as we assume no particular order for the keys

{ sort elements

pivot

Fig. 8.14.
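
A compact Python sketch of the quicksort just described may help. One reasonable choice of pivot, consistent with the discussion here, is the larger of the first two distinct keys found, so a subarray of equal keys is simply left alone; two cursors then sweep toward each other, exchanging keys so that every key less than the pivot ends up to the left of every key greater than or equal to it.

    # A sketch of quicksort with the pivot chosen as the larger of the first two
    # distinct keys; partition leaves a[lo..k-1] < pivot <= a[k..hi].
    def quicksort(a, lo, hi):
        if hi <= lo:
            return
        pivot = findpivot(a, lo, hi)
        if pivot is None:                      # all keys equal: nothing to do
            return
        k = partition(a, lo, hi, pivot)
        quicksort(a, lo, k - 1)                # keys less than the pivot
        quicksort(a, k, hi)                    # keys greater than or equal to the pivot

    def findpivot(a, lo, hi):
        for i in range(lo + 1, hi + 1):
            if a[i] != a[lo]:
                return max(a[i], a[lo])        # larger of the first two distinct keys
        return None

    def partition(a, lo, hi, pivot):
        l, r = lo, hi
        while l <= r:
            while a[l] < pivot:                # sweep right over keys below the pivot
                l += 1
            while a[r] >= pivot:               # sweep left over keys at or above it
                r -= 1
            if l < r:
                a[l], a[r] = a[r], a[l]
        return l

    data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
    quicksort(data, 0, len(data) - 1)
    print(data)                                # [1, 1, 2, 3, 3, 4, 5, 5, 6, 9]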

To see why that statement is true, we must use a trick that comes up frequently in

In our case, the "items" are the elements from

element all the time spent by

amount, independent of

There are also the final two unsuccessful tests at lines (4), (6), and (8), that may

not be charged to any "item," but these represent only a constant amount and may be

Fig. 8.15.

of the depths is

which is

n

Equation (8.1) says that the average time taken by quicksort is the time,

outside of the recursive calls, plus the average time spent in recursive calls. The latter

time is expressed in (8.1) as the sum, over all possible

Our first task is to transform (8.1) so that the sum is simplified, and in fact, so

(8.1) takes on the form it would have had if we had picked a truly random pivot at

By replacing half the left side of (8.2) by the right side, we see

Applying (8.3) to (8.1), with

Next, we can apply (8.3) to (8.4), with

If we pick

greater than zero. The third term of (8.7) makes a negative contribution, so from (8.7)

we may claim that

In Chapters 4 and 5 we presented several data structures, such as 2-3 trees,

A

property to the heap, we call

We now see how lines (4 - 6) of Fig. 8.16 are to be done. Selecting the minimum

at line (4) is easy; it is always in

Fig. 8.17.

Surely,

Hence, the number of iterations of the while-loop of

log(

Since

Fig. 8.18, (8.8) says that each call to

{ sorts array

Fig. 8.18.
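
For comparison, here is a compact heapsort sketch in Python built around a push-down step that restores the heap property. Details differ from the book's figures (a 0-based array and a max-heap are used here so the array ends up in ascending order), so treat it as an illustration of the idea only.

    # Heapsort: establish the heap property, then repeatedly move the maximum to
    # the end of the array and push the new root down to restore the heap.
    def pushdown(a, first, last):
        r = first
        while 2 * r + 1 <= last:
            child = 2 * r + 1
            if child + 1 <= last and a[child + 1] > a[child]:
                child += 1                     # pick the larger child
            if a[r] >= a[child]:
                break
            a[r], a[child] = a[child], a[r]
            r = child

    def heapsort(a):
        n = len(a)
        for i in range(n // 2 - 1, -1, -1):    # establish the heap property
            pushdown(a, i, n - 1)
        for last in range(n - 1, 0, -1):       # repeatedly remove the maximum
            a[0], a[last] = a[last], a[0]
            pushdown(a, 0, last - 1)

    data = [5, 9, 2, 7, 1, 8]
    heapsort(data)
    print(data)                                # [1, 2, 5, 7, 8, 9]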

lines (3 - 5) iterates

initially in

This code calculates where record

takes

Fig. 8.19.

We can now write a program to binsort arbitrary collections of records, where the

{ push

Fig. 8.20.

the proper data structure for bins is used. In particular, if

O

The loop of lines (1 - 2) of Fig. 8.20, which places the records in bins, takes

time, since the INSERT operation of line (2) requires constant time, the insertion

If pointers to the ends of lists do not exist, then in line (4) we must spend time

running down to the end of

Example 8.6.

1. We sort the

help at all, but it is essential. We use

For example, suppose

0

10, the bin for integer

shows the placement of our list into bins. Notice that integers appear in bins in the

Now, we concatenate the bins in order, producing the list

0, 1, 81, 64, 4, 25, 36, 16, 9, 49 (8.10)

from Fig. 8.21(a). If we use the linked list data structure with pointers to list ends,

then the placement of the

The integers on the list created by concatenating the bins are redistributed into

bins, but using a different bin selection strategy. Now place integer

For our example, Fig. 8.21(b) shows the list (8.10) distributed into bins, with

going into bin [

To see why this algorithm works, we have only to notice that when several

integers are placed in one bin, as 0, 1, 4, and 9 were placed in bin 0, they must be in

Fig. 8.21.
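
The two passes of Example 8.6 can be written out directly. The sketch below (Python) starts from the squares in increasing order rather than the book's initial ordering, so the intermediate list after the first pass differs from (8.10), but the final result is the same sorted list.

    # Two-pass binsort: distribute the integers into ten bins by the low-order
    # digit and concatenate, then distribute by the high-order digit and
    # concatenate again. Appending keeps each bin in arrival order, which is
    # what makes the second pass come out fully sorted.
    def two_pass_binsort(values):               # values assumed to lie in 0..99
        for digit in (1, 10):                   # pass 1: units digit, pass 2: tens digit
            bins = [[] for _ in range(10)]
            for v in values:
                bins[(v // digit) % 10].append(v)
            values = [v for b in bins for v in b]   # concatenate the bins in order
        return values

    squares = [i * i for i in range(10)]        # 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
    print(two_pass_binsort(squares))            # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]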

two-digit numbers in base

strategy works. Consider the integers

not possible, so we may assume

the second pass, so

keytype =

or an array of elements of the same type, as in

keytype =

We shall assume from here that keytype consists of

types

1900..1999. In (8.12),

to be sorted in this way, we never copy a record. We just move records from one list

{ sorts list

of types

procedure uses

of

clear bins }

(4)

where

r

highest

Fig. 8.22.

As before, for concatenation to be done quickly, we need pointers to the end of

number of different values of type

the loop of lines (6 - 7) takes

Example 8.7.

generalize Example 8.6 and view keys as base-

which, since

As another example, if keys are character strings of length

s

character strings is also

constant, or even

radix sort not be

log

which is

Comparisons

the parent plus either the fact

Fig. 8.23.

Now consider what happens if the initial data is such that we reach node (2). We

where it does not.

If the algorithm reaches the status of node (5), it will end, since it has already

moved

level 1, 4 nodes at level 2, and in general, 2

number of leaves of a tree with no nodes at levels higher than

way, a binary tree with

Fig. 8.24.

Figure 8.24(a) cannot be the smallest counterexample, because the tree rooted at

then the trees rooted at

That is, the average depth of leaves in

T

Since

The reader can check that when

show that (8.13) has a minimum when

We leave this proof as an exercise to the reader skilled in differential calculus.

Granted that (8.13) has a minimum value of log

k

we refer to this problem as "finding the

(finding the minimum),

and

Certain cases of the problem are quite easy to solve in linear time. For example,

finding the minimum of

mentioned in connection with heapsort, if

by building a heap, which takes

groups:

keys

3. If

select

Eventually we find that

As with quicksort, the function

worst case. For example, suppose we are looking for the first element, but by bad

luck, the pivot is always the highest available key. However, on the average,

Using techniques from the next chapter, we can show that the solution to (8.14) is

element, nor greater than the

most nine-tenths of the array, then this variant of

The trick to finding a good pivot is contained in the following two steps.

1. Divide the

This pivot is far from the extremes, provided that not too many records have the

pivot for key.

elements out of groups of 5 is greater than at least

exceeds

the five of which it is the middle. If

we may check that the chosen pivot is less than or equal to at least

elements, so for

(3)

The term

from line (6), and

We shall show that

now appreciate that the "magic number" 5, the size of the groups in line (5), and our

make to avoid infinite loops when there is a sequence of equal elements?

b. Show that the modified quicksort has average-case running time

8.1

Here are eight integers: 1, 7, 3, 2, 0, 5, 0, 8. Sort them using (a)

bubblesort, (b) insertion sort, and (c) selection sort.

8.2

Here are sixteen integers: 22, 36, 6, 79, 26, 45, 75, 13, 31, 62, 27, 76, 33,

8.3

The procedure

Shellsort

b. Show that if

were swapped), then these two elements remain sorted in pass

1.

Consider the following algorithm

integers: If the elements

Suppose we have a sorted array of strings

improved bound for lines (1 - 2) does not imply an improved bound for

whole; all the time is taken in lines (3 - 5).

† Note that a sequence ranging from 1 to 0 (or more generally, from

x

† But in this case, if log

Techniques

several pragmatic reasons for doing so. First, since most algorithms are written in a

1. In the case that

the term 2

breaking of list

upper bound on the time taken by

Observe that (9.1) applies only when

only if

we may suppose that

b

To begin our proof, assume

2

From (9.2), with

We thus see that

³

these two constraints. For example, choose

on

c

In other words,

Two observations about Example 9.1 are in order. If we assume that

O

may succeed!

As we discussed,

c

2

Similarly, we could substitute

relationship

2

for any

of Recurrences

equivalently

are the same). This expression is

mergesort

a

terminology. The homogeneous solution is the exact solution when

driving function

On the other hand, the second term of (9.16) represents the cost of creating the

subproblems and combining their results. We call this term the

solution by more than

as the homogeneous solution, or grows faster by at most log

particular solution grows as log

It is important to recognize that when searching for improvements in an

However, for certain common functions

homogeneous solution, and again the particular solution exceeds the homogeneous.

those in Case (2).

Example 9.3.

1.

2.

d

T

In Equation (3),

O

particular solution is of the same order as

above about

Equation (2) we have

particular solution, and therefore

The homogeneous solution, if

we can easily show the homogeneous solution is less than

O

increasing

and

of

Example 9.5.

The homogeneous solution is easily seen to be

log

greater than the homogeneous, is also the value of

Exercises

Write recurrence equations for the time and space complexity of the

following algorithm, assuming

Solve the following recurrences, where

satisfies:

a.

b.

Give tight big-oh and big-omega bounds on

a.

b.

c.

d.

Solve the following recurrences by guessing a solution and checking

a.

T

b.

T

9.6

Check your answers to Exercise 9.5 by solving the recurrences by

Generalize Exercise 9.6 by solving all recurrences of the form

in terms of

*9.8

Suppose in Exercise 9.7 that

does the solution to

Show that the number of different orders in which to multiply a

sequence of

Show that

Show that the number of comparisons required to sort

mergesort is given by

where [

integer

9.13

Show that the number of Boolean functions of

the recurrence

Solve for

**9.14

Show that the number of binary trees of height

recurrence

Show that

[1980] contain additional material on the solution of recurrences. Aho and Sloane

[1973] show that many nonlinear recurrences of the form

have a solution of the form

n

Techniques

It should be emphasized, however, that there are problems, such as the NP-

Fig. 10.1.

make the only legal move not involving the smallest disk.

The above algorithm is concise, and correct, but it is hard to understand why it works, and hard to invent on the spur of the moment. Consider instead the following

1 smallest disks from peg

that disk from

algorithm for multiplication of

school involves computing

if we count single bit or digit multiplications and additions as one step. One divide-and-conquer approach to integer multiplication would break each of

Fig. 10.2.

The product of

BD

If we evaluate

multiplications of (

(multiplication by 2

write the following recurrence for

multiply

Using reasoning like that in Example 9.4, we can take the constant

so the driving function

particular solutions are both

In the case that formula (10.1) is used to multiply integers, the asymptotic

whose solution is

3

Fig. 10.3. Note that lines (8)-(11) are performed by copying bits, and the

multiplication by 2

line (16) simply introduces the proper sign into the result.
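
A hedged sketch of the three-multiplication scheme in Python may make the recursion concrete. It assumes n is a power of two and that x and y fit in n bits; the identity used is XY = AC*2^n + [(A - B)(D - C) + AC + BD]*2^(n/2) + BD, with the sign of the middle product tracked separately.

    # Divide-and-conquer multiplication with three recursive products:
    # split x = A*2^(n/2) + B and y = C*2^(n/2) + D and recurse on AC, BD,
    # and |A - B| * |D - C|.
    def mult(x, y, n):
        if n <= 1:                              # one-bit numbers: multiply directly
            return x * y
        half = n // 2                           # n is assumed to be a power of two
        a, b = x >> half, x & ((1 << half) - 1)
        c, d = y >> half, y & ((1 << half) - 1)
        ac = mult(a, c, half)
        bd = mult(b, d, half)
        middle = mult(abs(a - b), abs(d - c), half)
        if (a < b) != (d < c):                  # the two differences have opposite signs
            middle = -middle
        return (ac << n) + ((middle + ac + bd) << half) + bd

    print(mult(13, 11, 4))                      # 143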

{

2

attempted to teach it in elementary school, students would not learn to multiply.

We have now simplified the problem. All that remains is to have lower-numbered

players play high-numbered players. This is easily accomplished by having players 1

be able to generalize the ideas of this figure to provide a schedule for 2

any

Fig. 10.4.

equal or nearly equal subproblems is a crucial factor in obtaining good performance.

Frequently, however, there are only a polynomial number of subproblems, and

thus we must be solving some subproblem many times. If instead we keep track of

previous chapter that

We have thus proven an exponential upper bound on the time taken by the

recursive computation of

i

in fact, we can show it is

more slowly than 2

recursive calculation of

The problem with the recursive calculation is that we wind up computing the same

P

Fig. 10.5.

Fig. 10.6.

that dominates the

approach. Since

programming to the recursive approach under essentially any circumstances.
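
As an illustration of the difference, assume the odds function computed in Fig. 10.7: P(i, j) is the probability that a team needing i more wins beats a team needing j more wins when each game is an even bet, so P(0, j) = 1 for j > 0, P(i, 0) = 0 for i > 0, and otherwise P(i, j) = (P(i-1, j) + P(i, j-1))/2. Filling in a table, as in the sketch below (Python), touches each entry once, whereas the direct recursion recomputes the same entries exponentially often.

    # Dynamic programming for the odds function: fill the (i+1) by (j+1) table
    # row by row instead of recomputing subproblems recursively.
    def odds(i, j):
        p = [[0.0] * (j + 1) for _ in range(i + 1)]
        for col in range(1, j + 1):
            p[0][col] = 1.0                     # the first team has already won
        for row in range(1, i + 1):
            for col in range(1, j + 1):
                p[row][col] = (p[row - 1][col] + p[row][col - 1]) / 2
        return p[i][j]

    print(odds(2, 3))                           # 0.6875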

lengths of the chords (

As well as being interesting in its own right, the triangulation problem has a

Fig. 10.8.

of the points defining the surface.

Each triangle defines a plane in a 3-space, and since a minimum triangulation was

found, the triangles are expected to be very small. It is easy to find the direction of a

Before proceeding with the dynamic programming solution to the triangulation

v

would bound a region that was not a triangle.

To begin searching for a minimum triangulation, we pick two adjacent vertices,

say

minimum triangulation, there must be a vertex

chords or edges in the triangulation. We must therefore consider how good a

Each choice of

polygons formed by one chord and the edges in the original polygon from one end of

Fig. 10.9.

Next, we must find minimum triangulations for the polygons of Fig. 10.9(a) and

sides, (

an exponential-time algorithm.

However, by considering the triangle that involves the chord (

have to consider polygons more than one of whose sides are chords of the original

polygon. Fact 2 tells us that, in the minimal triangulation, the chord in the

other vertices. For example, if we select

beginning at

(

must consider the following three options.

1. We may pick vertex

(

(

3. For some

Fig. 10.10.

If we use the obvious recursive algorithm implied by the above rules to solve

subproblem of size

subproblems of size three or less directly and count only calls on subproblems of size

four or more. Thus the number of subproblems to be solved is exponential in

solving subproblem

where

v

(

v

Fig. 10.11

The distances we need are calculated from the coordinates of the vertices as:

D

(since these are polygon edges, not chords, and are present "for free")

The three sums above are 38.09, 38.52, and 43.97, respectively. We may conclude

that the minimum cost of the subproblem

was smallest, we know that to achieve this minimum we must utilize the subproblems

v

There is a useful trick for filling out the table of Fig. 10.11 according to the

of the element being computed. The second pair is (a) next to the bottom of the

Fig. 10.12.
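
A sketch of the table-filling computation in Python follows, under the cost convention used in this section: only chords contribute to the cost of a triangulation, since the edges of the polygon are present for free. The indexing is by pairs of vertices rather than by the (vertex, size) pairs of Fig. 10.11, and the names are mine, not the book's.

    # Minimum-cost triangulation of a convex polygon, counting only chord lengths.
    from math import dist

    def min_triangulation(vertices):            # vertices listed in order around the polygon
        n = len(vertices)

        def chord(a, b):
            # length of (v_a, v_b), or 0 if it is an edge of the polygon
            if abs(a - b) == 1 or abs(a - b) == n - 1:
                return 0.0
            return dist(vertices[a], vertices[b])

        # cost[i][j]: cheapest triangulation of the sub-polygon v_i ... v_j,
        # given that the side (v_i, v_j) is already accounted for by the caller
        cost = [[0.0] * n for _ in range(n)]
        for span in range(2, n):
            for i in range(0, n - span):
                j = i + span
                cost[i][j] = min(
                    cost[i][k] + cost[k][j] + chord(i, k) + chord(k, j)
                    for k in range(i + 1, j)
                )
        return cost[0][n - 1]

    # A unit square needs a single diagonal chord of length sqrt(2).
    print(min_triangulation([(0, 0), (1, 0), (1, 1), (0, 1)]))   # about 1.414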

immediately give us the triangulation itself. What we need, for each entry, is the

because

and

the value of

Example 10.3.

entire problem of Fig. 10.8, comes from the terms for

problem

v

(

The minimum cost for

problem

and

be solved, but

v

assumed when

have size less than or equal to three and therefore cost 0. The chord (

introduced, with a cost of 15.65.

we expect that

then discover that

Fig. 10.14.

However, our "greedy" selection of

view. It turns out that the path

distance of 3 for

tours.

Fig. 10.15.

The TSP has a number of practical applications. As its name implies, it can be used

(

now have one path,

Fig. 10.16.

The leaves of the tree correspond to board positions where there is no move, either

First we assign values to the leaves. Say the game is tic-tac-toe. Then a leaf is

corresponds to a board position where it is player 1's move, then the value is the

Example 10.6.

that are wins for

bottom up is not feasible. Chess programs work, in essence, by building for each

termination. We wish to construct its game tree and evaluate the root. We could

The space to store the tree can be prohibitively large, but by being careful we need

never store more than one path, from the root to some node, at any one time. In Fig.

it cannot affect the value of

It is thus possible, in the situation of Fig. 10.18, to skip consideration of the children of

Fig. 10.18.

skipping or "pruning" nodes involves the notion of final and tentative values for

1. If all the children of a node

tentative value of

2. If a max node

Continuing our example,

proceeds to

to evaluate.
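
A hedged sketch of the pruning rule in Python: the tentative value of a partially evaluated max (respectively min) node is carried along as a bound, often called alpha (respectively beta), and a child whose value can no longer affect the final value of its parent is cut off without being explored.

    # Minimax with pruning on a game tree given as nested lists: a leaf is a
    # payoff for player 1, an interior node is the list of its children, and
    # levels alternate between max and min nodes.
    def evaluate(node, maximizing=True, alpha=float('-inf'), beta=float('inf')):
        if not isinstance(node, list):          # a leaf: its value is final
            return node
        if maximizing:
            value = float('-inf')
            for child in node:
                value = max(value, evaluate(child, False, alpha, beta))
                alpha = max(alpha, value)
                if value >= beta:               # the min player above will never allow this
                    break
            return value
        else:
            value = float('inf')
            for child in node:
                value = min(value, evaluate(child, True, alpha, beta))
                beta = min(beta, value)
                if value <= alpha:              # the max player above will never allow this
                    break
            return value

    tree = [[3, 5], [2, [8, 1]]]                # the root is a max node
    print(evaluate(tree))                       # 3, and the subtree [8, 1] is never explored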

children into two groups -- those that have a particular edge and those that do not. For

In Fig. 10.20 we have chosen to consider the edges in lexicographic order (

(

Fig. 10.20.

of the two tour edges adjacent to node

nodes

For example, consider the TSP instance in Fig. 10.21. Unlike the instance in Fig.

10.15, the distance measure for edges is not Euclidean; that is, it bears no relation to

Fig. 10.21.

selected, even if its cost is lowest.

Thus, if we are constrained to include edge (

for node a are (

2. If including (

included does not raise the lower bound, but excluding it raises the bound to 18.5,

since the two cheapest legal edges for nodes

The next edge in lexicographic order is (

we can infer that neither (

Fig. 10.23(

The time taken by the algorithm of Example 10.11 on a graph of

depends on the number of times we need to improve the solution. Just testing that no

Fig. 10.23.

globally optimal

In practice, we may not find a globally optimal solution as suggested in Fig. 10.24,

since the number of locally optimal solutions may be enormous. However, we may at

from

Note that we cannot

Fig. 10.24.

connect

To find a locally optimal tour, we start with a random tour, and consider all pairs of

nonadjacent edges, such as (

Example 10.12.

10.26(a). We might replace (

Fig. 10.25.
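
A hedged sketch of 2-opting in Python: as long as some pair of nonadjacent edges can be removed and the two pieces of the tour reconnected the other way at a saving, make the exchange. The tour that remains is locally optimal, though not necessarily shortest.

    # 2-opt local search for the traveling salesman problem.
    from math import dist

    def tour_length(points, tour):
        return sum(dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
                   for i in range(len(tour)))

    def two_opt(points, tour):
        improved = True
        while improved:
            improved = False
            n = len(tour)
            for i in range(n - 1):
                for j in range(i + 2, n):
                    if i == 0 and j == n - 1:
                        continue                 # these two edges share a vertex
                    # reverse the segment between the two removed edges
                    new_tour = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
                    if tour_length(points, new_tour) < tour_length(points, tour):
                        tour, improved = new_tour, True
        return tour

    pts = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(two_opt(pts, [0, 1, 2, 3]))            # [0, 1, 3, 2], a non-crossing tour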

We can generalize 2-opting to

edges and reconnect the remaining pieces in any order so that the result is a tour. Note

Fig. 10.27.

It is easy to check that, for fixed

we need to consider if there are

n

considerably higher than this, however, since we could make many local

L

verify this formula by computing the costs before and after the interchange

2. Take a package

Fig. 10.28.

L

L

and a similar recurrence for

We can recursively define the number of combinations of

of

a. Give a recursive function to compute

of

c. Give a dynamic programming algorithm to

compute

table generally known as Pascal's triangle.

d. What is the running time of your answer to (

a function of

10.6

One way to compute the number of combinations of

is to calculate (

a. What is the worst case running time of this algorithm as a

function of

b. Is it possible to compute the "World Series Odds" function

j

perform this computation?

a. Rewrite the odds calculation of Fig. 10.7 to take into account

the fact that the first team has a probability

given game.

b. If the Dodgers have won one game and the Yankees two, but

the Dodgers have a .6 probability of winning any given game,

10.8

The odds calculation of Fig. 10.7 requires

algorithm to use only

*10.9

Prove that Equation (10.4) results in exactly

10.10

Find a minimal triangulation for a regular octagon, assuming distances

10.11

The

l

separated by blanks whose ideal width is

Suppose we are given

x

binary search tree. Suppose that

find an element will be for

the average cost of a lookup is

the node holding

we can find a binary search tree that minimizes the lookup cost. Find a

dynamic programming algorithm to do so. What is the running time of

elements beginning with

**10.13

For what values of coins does the greedy change-making algorithm of

10.14

a. Write the recursive triangulation algorithm

discussed in Section 10.2.

b. Show that the recursive algorithm results in

exactly 3

started on a problem of size s

10.15

Describe a greedy algorithm for

a. The one-dimensional package placement

problem.

b. The paragraphing problem (Exercise 10.11).

Give an example where your algorithm does not produce an optimal

10.16

Give a nonrecursive version of the tree search algorithm of Fig. 10.17.

Consider a game tree in which there are six marbles, and players 1 and

2 take turns picking from one to three marbles. The player who takes

†Note that we need not consider all

is immaterial. We may therefore consider only those permutations that begin with 1.

required for each child. For example, the reader should be able to modify the greedy

‡We could start with some heuristically found solution, say the greedy one, although that would not affect this example. The greedy solution for Fig. 10.21 has cost 21.

†Do not be fooled by the picture of Fig. 10.25. True, if lengths of edges are distances

in the plane, then the dashed edges in Fig. 10.25 must be longer than those they

Algorithms for External Storage

represented by a root block pointing to 1024 blocks at an intermediate level, each of which points to 1024 leaf blocks holding a part of the file, and so on.

The basic operation on files is to bring a single block to a

memory; a buffer is simply a reserved area of main memory whose size is the same

We can now see the rationale behind the rules for reading Pascal files. Each file is stored in a sequence of blocks, with a whole number of records in each block.

Similarly, we can view the Pascal file-writing process as one of creating a file in

a buffer. As records are "written" into the file, they are placed in the buffer for that

times we read a block into main memory or write a block onto secondary storage.

Fig. 11.1.

The basic step of a merge sort on files is to begin with two files, say

organized into runs of length

2. at most one of

3. the one with a tail has at least as many runs as the other.

Then it is a simple process to read one run from each of

append the resulting run of length 2

organized into runs of length 2

or reached the end of the file of

winner

from runs of length one. We shall save considerable time if we begin with a pass

For example, if we have a million records, it would take 20 passes through the

data to sort starting with runs of length one. If, however, we can hold 10,000 records

Fig. 11.3.
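
One pass of the basic merge step can be sketched as follows (Python, with lists standing in for the files): runs of length k are read from corresponding positions of the two input files, merged into runs of length 2k, and the output runs are distributed alternately onto two new files.

    # One pass of external merge sorting, with lists standing in for files.
    from heapq import merge          # merges two sorted sequences

    def merge_pass(f1, f2, k):
        out1, out2, outputs = [], [], []
        i = 0
        while i < len(f1) or i < len(f2):
            run = list(merge(f1[i:i + k], f2[i:i + k]))   # one run of length up to 2k
            outputs.append(run)
            i += k
        for idx, run in enumerate(outputs):               # alternate the two output files
            (out1 if idx % 2 == 0 else out2).extend(run)
        return out1, out2

    # two files holding sorted runs of length 2
    g1, g2 = merge_pass([3, 7, 2, 9], [1, 5, 4, 8], 2)
    print(g1, g2)                    # [1, 3, 5, 7] and [2, 4, 8, 9]: runs of length 4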

such as merging. We should therefore expect that if there is only one channel over

that we are done in the minimum amount of time. To see what can go wrong,

files are organized into runs of some length much larger than the size of a block, so

from file

in memory. There may not be space to hold all these blocks in main memory, and

To avoid these problems, we consider the keys of the last records in the last

exhausted, we naturally read next from the other file. However, if a run is not

the entire list will be sorted. As log

the number of times we read each record. Moreover, if

used for input files, and

1. In one pass, when runs from each of

1

file, when it becomes the output file, is filled with runs of a certain length. It

2. Each pass produces files of a different length. Since each of the files loaded

with runs on the previous

Such a merge-sorting process is called a polyphase sort.

Example 11.2.

length 1. Records from

f

length 1 from

of length 3, and are placed on

previous pass.

The run length sequence 1, 1, 2, 3, 5, 8, 13, 21, . . . is the Fibonacci sequence.

This sequence is generated by the recurrence relation

2

the "golden ratio" (

Fig. 11.4.

bottleneck for two reasons.

1. As we have seen, if we have many disk or tape units available, we may speed

input/output sufficiently that the time to do the merge exceeds input time or

beginning of each stage the

The conditions under which merging can be done in parallel with reading are

are always 4

the two runs, there must be

keys equal to or less than

min(

Fig. 11.5.

The question of which file to read from next is trivial. Usually, since two buffers

second, in the range 1..

Fig. 11.6.

We shall maintain, of course, the property that allows us to merge in parallel

two files, as indicated in Fig. 11.6. We call a configuration that obeys the property

empty and

on the assumption that Fig. 11.6 is safe, show that the configuration will be safe after

Fig. 11.7.

To see that Fig. 11.7(a) is safe, consider two cases. First, if

newly-read block

equal to or less than min(

k

and

to use the file reading and writing primitives such as found in Pascal. In this

Fig. 11.8.

the record with key

Fig. 11.9.

binary search

key value

the first pair in that block. If

If

block), we use linear search to see if

on block

With binary search we need examine only [log

distribute the records to blocks, in that order. We may choose to pack as many as

Suppose we have a sorted file of records that are stored on blocks

,

from the file system. The new record is inserted in this new block, and the new block

for the new block in the index file.

With this organization we use the dense index to find the location in the main

file of a record with a given key. To insert a new record, we keep track of the last

If we used a binary search tree of

then it would require log

Fig. 11.10.

We can view a B-tree as a hierarchical index in which each node occupies a

(

The keys within a node are in sorted order so

subtree pointed to by

to by

subtree pointed to by

There are several ways to organize the leaves. Here we shall assume that the main file is stored only in the leaves. Each leaf is assumed to occupy one block.

as its two children. This is the only situation in which a node may have fewer than

adjust the keys and pointers in

children of

11.12. Here, the block containing 10 is discarded. Its parent now has only two

Fig. 11.11.

Fig. 11.12.

Suppose we use a three-file polyphase sort, where at the

create a file with

on one of the files and none on the other two. Explain why each of the

following must be true

a.

to be the lengths of runs on the two initially

b.

³

the two initial files.

c.

a Fibonacci sequence.

*11.10

What additional condition must be added to those of Exercise 11.9 to

a. with initial runs of length one (i.e.,

b. running for

than one allowed.

1

**11.11

Generalize Exercises 11.9 and 11.10 to polyphase sorts with more than

**11.12

Show that:

a. Any external sorting algorithm that uses only one

tape as external storage must take

sort

b.

use as external storage.

Suppose we have an external file of directed arcs

directed acyclic graph. Assume that there is not enough space in

Write a program to find the

a. a sparse-indexed file

b. a B-tree file

11.18

Assume that it takes

node of an

milliseconds to process each node in internal memory. If there are

record. Therefore, the total time taken to find a given record in the tree

(log

milliseconds. Make reasonable estimates for the values of

*11.19

A B*-tree is a B-tree in which each interior node is at least 2/3 full

*11.20

When the key of a record is a string of characters, we can save space

Suppose that the operations on a certain file are insertions and

creates a cycle in an existing component, add the edge and remove the

11.24

Suppose we have a file containing a sequence of positive and negative

subsequence

of any such subsequence.

Secondary index selection, of which Exercise 11.21 is a simplification, is

discussed by Lum and Ling [1970] and Schkolnick [1975]. B-trees originated with

Information about Exercise 11.12, one- and two-tape sorting, can be found in

after the previous runs were read and the last 4b records from these runs are being

† This strategy is the simplest of a number of responses that can be made to the

situation where a block has to be split. Some other choices, providing higher average

shared by different objects that grow and shrink in arbitrary ways. For example, we

atom

We have chosen to represent character strings by pointers to blocks of memory in

the heap. These blocks have their first 2 bytes (the number 2 is a typical value that

Fig. 12.1.

In the four examples above, we can see differences along at least two orthogonal

Fig. 12.2.

Fig. 12.3.

consuming garbage collection must be performed. This garbage collection step is

atomtype = { some appropriate type;

atomtype);

Fig. 12.4.

We assume there is an array of cells, taking up most of the memory, and some

To mark the cells accessible from

or not, by running down the array

{ If current cell was marked, do nothing. Otherwise, mark

(3)

Fig. 12.5.
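
A hedged sketch of the marking step in Python: each cell has left and right fields that hold either an atom or a pointer to another cell, and every cell reachable from the given starting cell is marked. The dictionary representation below is mine; a real collector would set a mark bit stored in the cell itself.

    # Recursive marking of the cells reachable from cell i.
    def mark(cells, marked, i):
        if i is None or marked[i]:      # null pointer, or cell already marked
            return
        marked[i] = True
        for field in ('left', 'right'):
            value = cells[i][field]
            if isinstance(value, int):  # a pointer to another cell (atoms are strings here)
                mark(cells, marked, value)

    cells = [
        {'left': 1, 'right': 2},             # cell 0 points to cells 1 and 2
        {'left': 'atom-a', 'right': None},
        {'left': 0, 'right': 'atom-b'},      # a cycle back to cell 0
        {'left': 'atom-c', 'right': None},   # cell 3 is unreachable garbage
    ]
    marked = [False] * len(cells)
    mark(cells, marked, 0)
    print(marked)                       # [True, True, True, False]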

memory be linked in a single chain extending from

Fortunately, there is a clever algorithm, known as the

Algorithm

back

Example 12.3.

we depth-first search this structure, we visit (1), (2), (3) and (4), in that order. Figure

Fig. 12.6.

There are three basic steps to performing the depth-first search. They are:

1.

Fig. 12.7.

Fig. 12.8.

{ test if left field is atom or null pointer }

Fig. 12.9.

The reader should observe that in the program of Fig. 12.9, we always know

Fig. 12.11.

In order that empty blocks may be found when they are needed to hold new data,

1. Each block is sufficiently large to hold

a. A

computer system) of the block,

b. A pointer (to link the block in available space), plus

length, a full/empty bit with a value of 0, indicating emptiness of the block, a

pointer to the next available block, and empty space.

3. A block holding data has, from the left, a count, a full/empty bit with value 1 indicating the block is in use, and the data itself.

One interesting consequence of the above assumptions is that blocks must be

Fig. 12.12.

3. Keep the available list sorted by position. Then the empty block to the left is

found when we insert the newly emptied block onto the list, and we have only

As with the merger of a newly empty block with the block to its right, the first

2. Use a doubly linked available space list each time a block becomes unused,

q

Example 12.6.

Choosing a block in which to place the new data is not so easy, since there are

conflicting goals for such strategies. We desire, on one hand, to be able to quickly

the available list from the beginning until we come to a block of size

Fig. 12.14.

c

2.

examine the entire available list to find that block of size at least

size is as little greater than

Some observations about these strategies can be made. Best-fit is considerably

not speed up first-fit appreciably, and in fact may slow it down if the statistics of

700, and the request for 400 came in, we would place it where it fit best, that is, in

On the other hand, there are situations where, starting with the list 600, 500, 1000,

700 again, best-fit would fail, while first-fit would succeed without storage

for a new datum, we choose an available block of that size

Difficulties arise when no empty blocks of the desired size

we find a block of size

s

see the way in which the choices of values for the

s

† Equation (12.1) applies when i

view the heap as a single block of size

which blocks of size

"buddies" of size 2

it is easy to find the buddy of a block of size 2

2

multiple of 2

Example 12.8.

Figure 12.15 shows the Fibonacci buddy system used in a heap of size 55, with

Fig. 12.15.

size

In a

size

case. If no blocks in either of these sizes exist, we may create one by applying this

splitting strategy recursively for size

There is a small catch, however. In a

size

Rather we must use the block whole, if no smaller block is available. This problem

acceptable since blocks of size 1 (byte or word, perhaps) could be too small to hold a

the case, we can combine the resulting block with its buddy, if that buddy is empty,

The exponential buddy system makes the locating of buddies especially easy. If

buddy of size

as the block of size

information can be maintained easily when we merge two empty buddies. The reader

may check that the information can be maintained when we split an empty block of

If we maintain all this information, and link the available lists in both directions,

Fig. 12.16.

There are some simple solutions to this problem if we allocate a little extra space

(1)

(5)

We follow each pointer to some block

to move block

which takes time proportional to the amount of the heap in use, will likely dominate

where pointer

Fig. 12.18.

i. allocate a block of 120 bytes

ii. allocate a block of 70 bytes

Give sequences of requests that can be satisfied if we use

a. first fit but not best fit

b. best fit but not first fit.

16 on a heap of size 16. If we request a block of size

we must allocate a block of size 2

portion of the block, if any, cannot be used to satisfy any other request.

we first find a block of size 2

block of size 2

and so on. If we find ourselves looking for a free block of size 32, we

fail and cannot satisfy the request. For the purposes of this question, we

than 16, such that the last request cannot be satisfied. For example,

consider the sequence 5, 5, 5. The first request causes the initial block

Find a sequence

Design a storage allocation scheme for a situation in which memory is

allocated and freed in blocks of lengths one and two. Give bounds on

pointer, a count and a full/empty bit if they are to be chained to an available list.

† Since empty blocks must hold pointers (and, as we shall see, other information as

well) we do not really start the sequence of permitted sizes at 1, but rather at some

† Of course, if there are no empty blocks of size

block of size

out of space and must reorganize the heap as in the next section.

‡ Incidentally, it is convenient to think of the blocks of sizes

block of size

† As in the previous section, we must assume that one bit of each block is reserved to

organization of information,"

Boruvka, O. [1926]. "On a minimal problem,"

Spolecnosti

Brooks, F. P. [1974].

Ninth Annual ACM Symp. on Theory of Computing

Cheriton, D., and R. E. Tarjan [1976]. "Finding minimum spanning trees,"

Computing

Cocke, J., and F. E. Allen [1976]. "A program data flow analysis procedure,"

ACM

Coffman, E. G. (ed.) [1976].

and Sons, New York.

Comer, D. [1979]. "The ubiquitous B-tree,"

Cooley, J. M., and J. W. Tukey [1965]. "An algorithm for the machine calculation of

complex Fourier series,"

DBTG [1971].

York.

Demers, A., and J. Donahue [1979]. "Revised report on RUSSELL," TR 79-389, Dept. of Computer Science, Cornell Univ., Ithaca, N. Y.

Deo, N. [1975].

Science

Deutsch, L. P., and D. G. Bobrow [1966]. "An efficient incremental automatic garbage collector,"

Dijkstra, E. W. [1959]. "A note on two problems in connexion with graphs," Numerische Mathematik

Edmonds, J. [1965]. "Paths, trees, and flowers,"

Even, S., and O. Kariv [1975]. "An O(

general graphs,"

Science

Farber, D., R. E. Griswold, and I. Polonsky [1964]. "SNOBOL, a string manipulation language,"

Fischer, M. J. [1972]. "Efficiency of equivalence algorithms," in

Computer Computations

Hall, Englewood Cliffs, N. J.

Greene, D. H., and D. E. Knuth [1983].

Birkhauser, Boston, Mass.

Gudes, E., and S. Tsur [1980]. "Experiments with B-tree reorganization,"

SIGMOD Symposium on Management of Data

traveling salesman problem,"

Lin, S., and B. W. Kernighan [1973]. "A heuristic algorithm for the traveling salesman problem,"

Liskov, B., A. Snyder, R. Atkinson, and C. Scaffert [1977]. "Abstraction mechanisms in CLU,"

Liu, C. L. [1968].

Lueker, G. S. [1980]. "Some techniques for solving recurrences,"

Surveys

maximum matching in general graphs,"

Foundations of Computer Science

Moenck, R., and A. B. Borodin [1972]. "Fast modular transforms via division,"

IEEE Thirteenth Annual Symp. on Switching and Automata Theory

Morris, F. L. [1978]. "A time- and space-efficient garbage compaction algorithm," Comm. ACM

Morris, R. [1968]. "Scatter storage techniques,"

Moyles, D. M., and G. L. Thompson [1969]. "Finding a minimum equivalent graph of a digraph,"

Nicholls, J. E. [1975].

Addison-Wesley, Reading, Mass.

Nievergelt, J. [1974]. "Binary search trees and file organization,"

6:

Papadimitriou, C. H., and K. Steiglitz [1982].

Algorithms and Complexity

Parker, D. S. Jr. [1980]. "Conditions for optimality of the Huffman algorithm,"

J. Computing

Perl, Y., A. Itai, and H. Avni [1978]. "Interpolation search -- a log log

Comm. ACM

Hall, Englewood Cliffs, N. J.

Pratt, V. R. [1979].

ALPHARD: defining and specifying iteration and generators,"

Shell, D. L. [1959]. "A high-speed sorting procedure,"

Shell, D. L. [1971]. "Optimizing the polyphase sort,"

719.

Weinberg, G. M. [1971].

N. Y.

Weiner, P. [1973]. "Linear pattern matching algorithms,"

Annual Symp. on Switching and Automata Theory

Wexelblat, R. L. (ed.) [1981].

N. Y.

Wiederhold, G. [1982].

Williams, J. W. J. [1964]. "Algorithm 232: Heapsort,"

348.

Wirth, N. [1973].

Englewood Cliffs, N. J.

Wirth, N. [1976].

Englewood Cliffs, N. J.

Wulf, W. A., M. Shaw, P. Hilfinger, and L. Flon [1981].

Computer Science

Yao, A. C. [1975]. "An O(|E| log log |V|) algorithm for finding minimum spanning trees,"