Solutions for Chapter 8: Sorting in Linear Time

Solution to Exercise 8.1-3

If the sort runs in linear time for $m$ input permutations, then the height $h$ of the portion of the decision tree consisting of the $m$ corresponding leaves and their ancestors is linear.

Use the same argument as in the proof of Theorem 8.1 to show that this is impossible for $m = n!/2$, $n!/n$, or $n!/2^n$.

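We have $2^h \ge m$, which gives us $h \ge \lg m$. For all the possible $m$'s given here, $\lg m = \Omega(n \lg n)$, hence $h = \Omega(n \lg n)$. In particular,

$$
\begin{aligned}
\lg \frac{n!}{2} &= \lg n! - 1 \ge n \lg n - n \lg e - 1 , \\
\lg \frac{n!}{n} &= \lg n! - \lg n \ge n \lg n - n \lg e - \lg n , \\
\lg \frac{n!}{2^n} &= \lg n! - n \ge n \lg n - n \lg e - n .
\end{aligned}
$$

Solution to Exercise 8.1-4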

Let $S$ be a sequence of $n$ elements divided into $n/k$ subsequences each of length $k$, where all of the elements in any subsequence are larger than all of the elements of a preceding subsequence and smaller than all of the elements of a succeeding subsequence.

Claim

Any comparison-based sorting algorithm to sort $S$ must take $\Omega(n \lg k)$ time in the worst case.

Proof First notice that, as pointed out in the hint, we cannot prove the lower bound by multiplying together the lower bounds for sorting each subsequence. That would only prove that there is no faster algorithm that sorts the subsequences independently. This is not what we are asked to prove; we cannot introduce any extra assumptions.

Now, consider the decision tree of height $h$ for any comparison sort for $S$. Since the elements of each subsequence can be in any order, any of the $k!$ permutations can correspond to the final sorted order of a subsequence. And, since there are $n/k$ such subsequences, each of which can be in any order, there are $(k!)^{n/k}$ permutations of $S$ that could correspond to the sorting of some input order. Thus, any decision tree for sorting $S$ must have at least $(k!)^{n/k}$ leaves. Since a binary tree of height $h$ has no more than $2^h$ leaves, we must have $2^h \ge (k!)^{n/k}$ or $h \ge \lg((k!)^{n/k})$. We therefore obtain

$$
\begin{aligned}
h &\ge \lg\bigl((k!)^{n/k}\bigr) \\
  &= (n/k) \lg(k!) \\
  &\ge (n/k) \lg\bigl((k/2)^{k/2}\bigr) \\
  &= (n/2) \lg(k/2) .
\end{aligned}
$$

The third line comes from $k!$ having its $k/2$ largest terms being at least $k/2$ each. (We implicitly assume here that $k$ is even. We could adjust with floors and ceilings if $k$ were odd.)

Since there exists at least one path in any decision tree for sorting $S$ that has length at least $(n/2) \lg(k/2)$, the worst-case running time of any comparison-based sorting algorithm for $S$ is $\Omega(n \lg k)$.
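As a quick sanity check on the chain of inequalities, the following Python sketch compares the exact leaf-count bound $(n/k) \lg(k!)$ with the simplified bound $(n/2) \lg(k/2)$ for a few even values of $k$. The function names are illustrative and not part of the original solution.

```python
import math

def exact_height_bound(n, k):
    """lg((k!)^(n/k)) = (n/k) * lg(k!): the minimum decision-tree height."""
    return (n / k) * math.log2(math.factorial(k))

def simplified_height_bound(n, k):
    """(n/2) * lg(k/2): the weaker closed form used in the derivation (k even)."""
    return (n / 2) * math.log2(k / 2)

# The simplified bound should never exceed the exact one.
for n, k in [(16, 2), (64, 8), (1024, 32)]:
    assert exact_height_bound(n, k) >= simplified_height_bound(n, k)
```

The check only illustrates the inequality on sample values; the proof above is what establishes it in general.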

Solution to Exercise 8.2-3

[The following solution also answers Exercise 8.2-2.]

Notice that the correctness argument in the text does not depend on the order in which $A$ is processed. The algorithm is correct no matter what order is used! But the modified algorithm is not stable. As before, in the final for loop an element equal to one taken from $A$ earlier is placed before the earlier one (i.e., at a lower index position) in the output array $B$. The original algorithm was stable because an element taken from $A$ later started out with a lower index than one taken earlier. But in the modified algorithm, an element taken from $A$ later started out with a higher index than one taken earlier.

In particular, the algorithm still places the elements with value $k$ in positions $C[k-1]+1$ through $C[k]$, but in the reverse order of their appearance in $A$.
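The effect of the loop direction can be seen in a small Python sketch. It is a 0-indexed adaptation of counting sort that carries a label with each key so that ties are visible; the `reverse_scan` flag and the labels are assumptions added for illustration.

```python
def counting_sort(records, k, reverse_scan=True):
    """Sort (key, label) pairs by key in 0..k.

    reverse_scan=True mirrors the original final loop (last element first);
    reverse_scan=False is the modified loop (first element first).
    """
    count = [0] * (k + 1)
    for key, _ in records:
        count[key] += 1
    for i in range(1, k + 1):
        count[i] += count[i - 1]            # count[i] = number of keys <= i
    out = [None] * len(records)
    order = reversed(records) if reverse_scan else records
    for key, label in order:
        out[count[key] - 1] = (key, label)  # convert 1-based position to 0-based
        count[key] -= 1
    return out

data = [(2, "a"), (1, "b"), (2, "c"), (1, "d")]
stable = counting_sort(data, 2, reverse_scan=True)
modified = counting_sort(data, 2, reverse_scan=False)
```

Both results are sorted by key, but in `modified` the records with equal keys appear in the reverse of their input order.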

Solution to Exercise 8.2-4

Compute the $C$ array as is done in counting sort. The number of integers in the range $[a \,..\, b]$ is $C[b] - C[a-1]$, where we interpret $C[-1]$ as 0.
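A minimal Python sketch of this preprocessing and query follows, assuming keys are integers in $0 \,..\, k$ and $0 \le a \le b \le k$. The function names are illustrative.

```python
def preprocess(values, k):
    """Build C with C[i] = number of values <= i, in Theta(n + k) time."""
    c = [0] * (k + 1)
    for v in values:
        c[v] += 1
    for i in range(1, k + 1):
        c[i] += c[i - 1]
    return c

def count_in_range(c, a, b):
    """Number of values in [a..b] in O(1) time, treating C[-1] as 0."""
    below = c[a - 1] if a > 0 else 0
    return c[b] - below

c = preprocess([1, 3, 3, 5, 0], 5)
in_1_to_3 = count_in_range(c, 1, 3)   # counts the values 1, 3, 3
```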

Solution to Exercise 8.3-2

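Insertion sort is stable. When inserting $A[j]$ into the sorted sequence $A[1 \,..\, j-1]$, we do it the following way: compare $A[j]$ to $A[i]$, starting with $i = j-1$ and going down to $i = 1$. Continue as long as $A[j] < A[i]$. Because the comparison is strict, $A[j]$ never moves past an element equal to it.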

Merge sort as defined is stable, because when two elements compared are equal, the tie is broken by taking the element from array $L$, which keeps them in the original order.

Heapsort and quicksort are not stable.

One scheme that makes a sorting algorithm stable is to store the index of each element (the element’s place in the original ordering) with the element. When comparing two elements, compare them by their values and break ties by their indices.

Additional space requirements: For $n$ elements, their indices are $1, \ldots, n$. Each can be written in $\lg n$ bits, so together they take $O(n \lg n)$ additional space.

Additional time requirements: The worst case is when all elements are equal. The asymptotic time does not change because we add a constant amount of work to each comparison.
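The index-tagging scheme can be sketched in Python as follows. Selection sort stands in here as a simple unstable algorithm; it is an assumption chosen for brevity, not one of the algorithms named in the exercise, and the helper names are illustrative.

```python
def selection_sort(items, less):
    """In-place selection sort; the long-distance swap makes it unstable."""
    n = len(items)
    for i in range(n - 1):
        m = i
        for j in range(i + 1, n):
            if less(items[j], items[m]):
                m = j
        items[i], items[m] = items[m], items[i]
    return items

def stable_selection_sort(records, key):
    """Tag each record with its original index and break ties on that index."""
    tagged = [(key(r), idx, r) for idx, r in enumerate(records)]
    selection_sort(tagged, lambda x, y: (x[0], x[1]) < (y[0], y[1]))
    return [r for _, _, r in tagged]

records = [(2, "a"), (2, "b"), (1, "c")]
plain = selection_sort(list(records), lambda x, y: x[0] < y[0])
tagged_result = stable_selection_sort(records, key=lambda r: r[0])
```

In `plain` the two records with key 2 come out swapped, while `tagged_result` keeps them in input order at the cost of one extra index per element.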

Solution to Exercise 8.3-3

Basis: If $d = 1$, there's only one digit, so sorting on that digit sorts the array.

Inductive step: Assuming that radix sort works for $d-1$ digits, we'll show that it works for $d$ digits.

Radix sort sorts separately on each digit, starting from digit 1. Thus, radix sort of $d$ digits, which sorts on digits $1, \ldots, d$, is equivalent to radix sort of the low-order $d-1$ digits followed by a sort on digit $d$. By our induction hypothesis, the sort of the low-order $d-1$ digits works, so just before the sort on digit $d$, the elements are in order according to their low-order $d-1$ digits.

The sort on digit $d$ will order the elements by their $d$th digit. Consider two elements, $a$ and $b$, with $d$th digits $a_d$ and $b_d$ respectively.

• If $a_d < b_d$, the sort will put $a$ before $b$, which is correct, since $a < b$ regardless of the low-order digits.

• If $a_d > b_d$, the sort will put $a$ after $b$, which is correct, since $a > b$ regardless of the low-order digits.

• If $a_d = b_d$, the sort will leave $a$ and $b$ in the same order they were in, because it is stable. But that order is already correct, since the correct order of $a$ and $b$ is determined by the low-order $d-1$ digits when their $d$th digits are equal, and the elements are already sorted by their low-order $d-1$ digits.

If the intermediate sort were not stable, it might rearrange elements whose dth digits were equal—elements that were in the right order after the sort on their lower-order digits.
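The role of stability can be demonstrated with a short Python sketch. It uses decimal digits and Python's built-in stable `sorted` as the intermediate sort; the "unstable" variant is simulated by reversing the list before each pass so that ties come out in reversed order. Both choices are assumptions made for illustration.

```python
def radix_sort(nums, d, stable=True):
    """LSD radix sort on d decimal digits with a stable or tie-reversing pass."""
    out = list(nums)
    for i in range(d):
        digit = lambda x, i=i: (x // 10 ** i) % 10
        if stable:
            out = sorted(out, key=digit)            # ties keep their order
        else:
            out = sorted(reversed(out), key=digit)  # ties come out reversed
    return out

good = radix_sort([12, 11], 2, stable=True)
bad = radix_sort([12, 11], 2, stable=False)
```

With the tie-reversing pass, the sort on the second digit undoes the order established by the first digit, so `bad` is not sorted even though every individual pass correctly sorts on its own digit.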

Solution to Exercise 8.3-4

Treat the numbers as 2-digit numbers in radix $n$. Each digit ranges from 0 to $n-1$. Sort these 2-digit numbers with radix sort.

There are 2 calls to counting sort, each taking $\Theta(n+n) = \Theta(n)$ time, so that the total time is $\Theta(n)$.
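A minimal Python sketch of the two passes is shown below, assuming the input holds $n$ integers in the range $0$ to $n^2 - 1$. The function names are illustrative.

```python
def counting_sort_by_digit(nums, base, exp):
    """Stable counting sort on the digit (x // exp) % base."""
    count = [0] * base
    for x in nums:
        count[(x // exp) % base] += 1
    for i in range(1, base):
        count[i] += count[i - 1]
    out = [0] * len(nums)
    for x in reversed(nums):               # reverse scan keeps the pass stable
        digit = (x // exp) % base
        count[digit] -= 1
        out[count[digit]] = x
    return out

def sort_two_digit_radix_n(nums):
    """Sort n integers from 0..n^2-1 as 2-digit numbers in radix n."""
    n = len(nums)
    if n <= 1:
        return list(nums)
    by_low_digit = counting_sort_by_digit(nums, n, 1)
    return counting_sort_by_digit(by_low_digit, n, n)

result = sort_two_digit_radix_n([15, 0, 7, 3, 12])   # n = 5, values below 25
```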

Solution to Exercise 8.4-2

The worst-case running time for the bucket-sort algorithm occurs when the assumption of uniformly distributed input does not hold. If, for example, all the input ends up in the first bucket, then in the insertion sort phase it needs to sort all the input, which takes $O(n^2)$ time.

A simple change that will preserve the linear expected running time and make the worst-case running time $O(n \lg n)$ is to use a worst-case $O(n \lg n)$-time algorithm like merge sort instead of insertion sort when sorting the buckets.
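The change can be sketched in Python as follows, assuming inputs in $[0, 1)$. Python's built-in `sorted` stands in for merge sort here; it is a different algorithm, but it also runs in $O(m \lg m)$ worst-case time on a bucket of $m$ elements.

```python
def bucket_sort(values):
    """Bucket sort for values in [0, 1) with an O(m lg m) sort per bucket."""
    n = len(values)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for v in values:
        # Clamp the index to guard against floating-point rounding near 1.0.
        buckets[min(int(n * v), n - 1)].append(v)
    out = []
    for bucket in buckets:
        out.extend(sorted(bucket))   # replaces the insertion-sort step
    return out

skewed = bucket_sort([0.03, 0.01, 0.02])   # all three land in the first bucket
```

Even when every element falls into one bucket, as in `skewed`, the total work is bounded by a single $O(n \lg n)$ sort.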

Solution to Problem 8-1

a. For a comparison algorithm $A$ to sort, no two input permutations can reach the same leaf of the decision tree, so there must be at least $n!$ leaves reached in $T_A$, one for each possible input permutation. Since $A$ is a deterministic algorithm, it must always reach the same leaf when given a particular permutation as input, so at most $n!$ leaves are reached (one for each permutation). Therefore exactly $n!$ leaves are reached, one for each input permutation.

These $n!$ leaves will each have probability $1/n!$, since each of the $n!$ possible permutations is the input with the probability $1/n!$. Any remaining leaves will have probability 0, since they are not reached for any input.

Without loss of generality, we can assume for the rest of this problem that paths leading only to 0-probability leaves aren't in the tree, since they cannot affect the running time of the sort. That is, we can assume that $T_A$ consists of only the $n!$ leaves labeled $1/n!$ and their ancestors.

b. If $k > 1$, then the root of $T$ is not a leaf. This implies that all of $T$'s leaves are leaves in $LT$ and $RT$. Since every leaf at depth $h$ in $LT$ or $RT$ has depth $h+1$ in $T$, $D(T)$ must be the sum of $D(LT)$, $D(RT)$, and $k$, the total number of leaves. To prove this last assertion, let $d_T(x)$ be the depth of node $x$ in tree $T$. Then,

$$
\begin{aligned}
D(T) &= \sum_{x \in \mathrm{leaves}(T)} d_T(x) \\
     &= \sum_{x \in \mathrm{leaves}(LT)} d_T(x) + \sum_{x \in \mathrm{leaves}(RT)} d_T(x) \\
     &= \sum_{x \in \mathrm{leaves}(LT)} \bigl(d_{LT}(x) + 1\bigr) + \sum_{x \in \mathrm{leaves}(RT)} \bigl(d_{RT}(x) + 1\bigr) \\
     &= \sum_{x \in \mathrm{leaves}(LT)} d_{LT}(x) + \sum_{x \in \mathrm{leaves}(RT)} d_{RT}(x) + \sum_{x \in \mathrm{leaves}(T)} 1 \\
     &= D(LT) + D(RT) + k .
\end{aligned}
$$
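The identity $D(T) = D(LT) + D(RT) + k$ can be checked on a concrete tree with a small Python sketch. The tree encoding (a leaf is `None`, an internal node is a `(left, right)` pair) and the function names are assumptions made for illustration.

```python
def leaf_count(t):
    """Number of leaves; a leaf is None, an internal node is (left, right)."""
    return 1 if t is None else leaf_count(t[0]) + leaf_count(t[1])

def external_path_length(t, depth=0):
    """D(T): the sum of the depths of all leaves of t."""
    if t is None:
        return depth
    left, right = t
    return (external_path_length(left, depth + 1)
            + external_path_length(right, depth + 1))

# A 5-leaf tree: left subtree has 2 leaves, right subtree has 3.
T = ((None, None), (None, (None, None)))
LT, RT = T
k = leaf_count(T)
lhs = external_path_length(T)
rhs = external_path_length(LT) + external_path_length(RT) + k
```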

c. To show that $d(k) = \min_{1 \le i \le k-1} \{d(i) + d(k-i) + k\}$ we will show separately that

$$ d(k) \le \min_{1 \le i \le k-1} \{d(i) + d(k-i) + k\} $$

and

$$ d(k) \ge \min_{1 \le i \le k-1} \{d(i) + d(k-i) + k\} . $$

• To show that $d(k) \le \min_{1 \le i \le k-1} \{d(i) + d(k-i) + k\}$, we need only show that $d(k) \le d(i) + d(k-i) + k$, for $i = 1, 2, \ldots, k-1$. For any $i$ from 1 to $k-1$ we can find trees $RT$ with $i$ leaves and $LT$ with $k-i$ leaves such that $D(RT) = d(i)$ and $D(LT) = d(k-i)$. Construct $T$ such that $RT$ and $LT$ are the right and left subtrees of $T$'s root respectively. Then

$$
\begin{aligned}
d(k) &\le D(T) && \text{(by definition of $d$ as min $D(T)$ value)} \\
     &= D(RT) + D(LT) + k && \text{(by part (b))} \\
     &= d(i) + d(k-i) + k && \text{(by choice of $RT$ and $LT$)} .
\end{aligned}
$$

• To show that $d(k) \ge \min_{1 \le i \le k-1} \{d(i) + d(k-i) + k\}$, we need only show that $d(k) \ge d(i) + d(k-i) + k$, for some $i$ in $\{1, 2, \ldots, k-1\}$. Take the tree $T$ with $k$ leaves such that $D(T) = d(k)$, let $RT$ and $LT$ be $T$'s right and left subtree, respectively, and let $i$ be the number of leaves in $RT$. Then $k-i$ is the number of leaves in $LT$ and

$$
\begin{aligned}
d(k) &= D(T) && \text{(by choice of $T$)} \\
     &= D(RT) + D(LT) + k && \text{(by part (b))} \\
     &\ge d(i) + d(k-i) + k && \text{(by definition of $d$ as min $D(T)$ value)} .
\end{aligned}
$$

Neither $i$ nor $k-i$ can be 0 (and hence $1 \le i \le k-1$), since if one of these were 0, either $RT$ or $LT$ would contain all $k$ leaves of $T$, and that $k$-leaf subtree would have a $D$ equal to $D(T) - k$ (by part (b)), contradicting the choice of $T$ as the $k$-leaf tree with the minimum $D$.

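d. Let $f_k(i) = i \lg i + (k-i) \lg(k-i)$. To find the value of $i$ that minimizes $f_k$, find the $i$ for which the derivative of $f_k$ with respect to $i$ is 0:

$$
\begin{aligned}
f_k'(i) &= \frac{d}{di} \left( \frac{i \ln i + (k-i) \ln(k-i)}{\ln 2} \right) \\
        &= \frac{\ln i + 1 - \ln(k-i) - 1}{\ln 2} \\
        &= \frac{\ln i - \ln(k-i)}{\ln 2}
\end{aligned}
$$

is 0 at $i = k/2$. To verify this is indeed a minimum (not a maximum), check that the second derivative of $f_k$ is positive at $i = k/2$:

$$
\begin{aligned}
f_k''(i) &= \frac{d}{di} \left( \frac{\ln i - \ln(k-i)}{\ln 2} \right) \\
         &= \frac{1}{\ln 2} \left( \frac{1}{i} + \frac{1}{k-i} \right) . \\
f_k''(k/2) &= \frac{1}{\ln 2} \left( \frac{2}{k} + \frac{2}{k} \right) \\
           &= \frac{1}{\ln 2} \cdot \frac{4}{k} \\
           &> 0 \quad \text{since } k > 1 .
\end{aligned}
$$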

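Now we use substitution to prove $d(k) = \Omega(k \lg k)$. The base case of the induction is satisfied because $d(1) \ge 0 = c \cdot 1 \cdot \lg 1$ for any constant $c$. For the inductive step we assume that $d(i) \ge c\,i \lg i$ for $1 \le i \le k-1$, where $c$ is some constant to be determined.

$$
\begin{aligned}
d(k) &= \min_{1 \le i \le k-1} \{d(i) + d(k-i) + k\} \\
     &\ge \min_{1 \le i \le k-1} \{c\,(i \lg i + (k-i) \lg(k-i)) + k\} \\
     &= \min_{1 \le i \le k-1} \{c\,f_k(i) + k\} \\
     &= c \left( \frac{k}{2} \lg \frac{k}{2} + \left(k - \frac{k}{2}\right) \lg \left(k - \frac{k}{2}\right) \right) + k \\
     &= c\,k \lg \frac{k}{2} + k \\
     &= c\,(k \lg k - k) + k \\
     &= c\,k \lg k + (k - ck) \\
     &\ge c\,k \lg k \quad \text{if } c \le 1 ,
\end{aligned}
$$

and so $d(k) = \Omega(k \lg k)$.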
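As a numerical cross-check, the recurrence from part (c) can be evaluated directly and compared against $k \lg k$, which corresponds to taking $c = 1$. This Python sketch uses memoized recursion; it is an illustration for small $k$, not part of the proof.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def d(k):
    """Minimum external path length over binary trees with k leaves."""
    if k == 1:
        return 0
    return min(d(i) + d(k - i) + k for i in range(1, k))

# Compare d(k) with the lower bound k lg k (small tolerance for float error).
bound_holds = all(d(k) >= k * math.log2(k) - 1e-9 for k in range(1, 65))
```

For powers of 2 the bound is tight in this check; for example, `d(8)` evaluates to 24, which equals $8 \lg 8$.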

e. Using the result of part (d) and the fact that $T_A$ (as modified in our solution to part (a)) has $n!$ leaves, we can conclude that

$$ D(T_A) \ge d(n!) = \Omega(n! \lg(n!)) . $$

$D(T_A)$ is the sum of the decision-tree path lengths for sorting all input permutations, and the path lengths are proportional to the run time. Since the $n!$ permutations have equal probability $1/n!$, the expected time to sort $n$ random elements (1 input permutation) is the total time for all permutations divided by $n!$:

$$ \frac{\Omega(n! \lg(n!))}{n!} = \Omega(\lg(n!)) = \Omega(n \lg n) . $$

f. We will show how to modify a randomized decision tree (algorithm) to define a deterministic decision tree (algorithm) that is at least as good as the randomized one in terms of the average number of comparisons.

At each randomized node, pick the child with the smallest subtree (the subtree with the smallest average number of comparisons on a path to a leaf). Delete all the other children of the randomized node and splice out the randomized node itself.

The deterministic algorithm corresponding to this modified tree still works, because the randomized algorithm worked no matter which path was taken from each randomized node.

The average number of comparisons for the modified algorithm is no larger than the average number for the original randomized tree, since we discarded the higher-average subtrees in each case. In particular, each time we splice out a randomized node, we leave the overall average less than or equal to what it was, because

• the same set of input permutations reaches the modified subtree as before, but those inputs are handled in average time less than or equal to what it was before, and

• the rest of the tree is unmodified.

The randomized algorithm thus takes at least as much time on average as the corresponding deterministic one. (We've shown that the expected running time for a deterministic comparison sort is $\Omega(n \lg n)$, hence the expected time for a randomized comparison sort is also $\Omega(n \lg n)$.)

Solution to Problem 8-3

a. The usual, unadorned radix sort algorithm will not solve this problem in the required time bound. The number of passes, $d$, would have to be the number of digits in the largest integer. Suppose that there are $m$ integers; we always have $m \le n$. In the worst case, we would have one integer with $n/2$ digits and $n/2$ integers with one digit each. We assume that the range of a single digit is constant. Therefore, we would have $d = n/2$ and $m = n/2 + 1$, and so the running time would be $\Theta(dm) = \Theta(n^2)$.

Let us assume without loss of generality that all the integers are positive and have no leading zeros. (If there are negative integers or 0, deal with the positive numbers, negative numbers, and 0 separately.) Under this assumption, we can observe that integers with more digits are always greater than integers with fewer digits. Thus, we can first sort the integers by number of digits (using counting sort), and then use radix sort to sort each group of integers with the same length. Noting that each integer has between 1 and $n$ digits, let $m_i$ be the number of integers with $i$ digits, for $i = 1, 2, \ldots, n$. Since there are $n$ digits altogether, we have $\sum_{i=1}^{n} i \cdot m_i = n$.

It takes $O(n)$ time to compute how many digits all the integers have and, once the numbers of digits have been computed, it takes $O(m+n) = O(n)$ time to group the integers by number of digits. To sort the group of $m_i$ integers with $i$ digits by radix sort takes $\Theta(i \cdot m_i)$ time. The time to sort all groups, therefore, is

$$ \sum_{i=1}^{n} \Theta(i \cdot m_i) = \Theta\!\left( \sum_{i=1}^{n} i \cdot m_i \right) = \Theta(n) . $$
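The grouping idea can be sketched in Python as follows, assuming positive decimal integers with no leading zeros. The function names are illustrative, and Python's arbitrary-precision arithmetic means digit extraction is not strictly constant time, so the sketch shows the structure of the algorithm rather than the exact $O(n)$ bound.

```python
def radix_sort_fixed_width(nums, width):
    """LSD radix sort of integers that all have `width` decimal digits."""
    for pos in range(width):
        buckets = [[] for _ in range(10)]
        for x in nums:
            buckets[(x // 10 ** pos) % 10].append(x)  # append keeps passes stable
        nums = [x for bucket in buckets for x in bucket]
    return nums

def sort_by_digit_groups(nums):
    """Group by number of digits, then radix sort within each group."""
    if not nums:
        return []
    lengths = [len(str(x)) for x in nums]
    groups = [[] for _ in range(max(lengths) + 1)]
    for x, length in zip(nums, lengths):
        groups[length].append(x)        # shorter integers are always smaller
    out = []
    for width, group in enumerate(groups):
        if group:
            out.extend(radix_sort_fixed_width(group, width))
    return out

ordered = sort_by_digit_groups([305, 7, 42, 1000, 9, 41, 300])
```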

b. One way to solve this problem is by a radix sort from right to left. Since the strings have varying lengths, however, we have to pad out all strings that are shorter than the longest string. The padding is on the right end of the string, and it's with a special character that is lexicographically less than any other character (e.g., in C, the character '\0' with ASCII value 0). Of course, we don't have to actually change any string; if we want to know the $j$th character of a string whose length is $k$, then if $j > k$, the $j$th character is the pad character.

Unfortunately, this scheme does not always run in the required time bound. Suppose that there are $m$ strings and that the longest string has $d$ characters. In the worst case, one string has $n/2$ characters and, before padding, $n/2$ strings have one character each. As in part (a), we would have $d = n/2$ and $m = n/2 + 1$. We still have to examine the pad characters in each pass of radix sort, even if we don't actually create them in the strings. Assuming that the range of a single character is constant, the running time of radix sort would be $\Theta(dm) = \Theta(n^2)$.

To solve the problem in $O(n)$ time, we use the property that, if the first letter of string $x$ is lexicographically less than the first letter of string $y$, then $x$ is lexicographically less than $y$, regardless of the lengths of the two strings. We take advantage of this property by sorting the strings on the first letter, using counting sort. We take an empty string as a special case and put it first. We gather together all strings with the same first letter as a group. Then we recurse, within each group, based on each string with the first letter removed.
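The recursion can be sketched in Python as follows. Two choices here are assumptions made for the sketch rather than the solution's exact procedure: a position index `pos` is passed down instead of physically removing the first letter, and a dictionary of buckets stands in for the counting sort on one character.

```python
def sort_strings(strings, pos=0):
    """Sort strings that agree on their first `pos` characters."""
    if len(strings) <= 1:
        return list(strings)
    exhausted = []                 # strings with no character at `pos` go first
    buckets = {}
    for s in strings:
        if len(s) == pos:
            exhausted.append(s)
        else:
            buckets.setdefault(s[pos], []).append(s)
    out = exhausted
    for ch in sorted(buckets):     # constant-size alphabet assumed
        out.extend(sort_strings(buckets[ch], pos + 1))
    return out

words = sort_strings(["ab", "a", "b", "", "abc", "ba", "a"])
```

Each string takes part in at most one bucketing step per character plus one more as an exhausted string, which matches the counting argument in the analysis. The recursion depth equals the length of the longest string, so very long strings would need an iterative version in Python.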

The correctness of this algorithm is straightforward. Analyzing the running time is a bit trickier. Let us count the number of times that each string is sorted by a call of counting sort. Suppose that the $i$th string, $s_i$, has length $l_i$. Then $s_i$ is sorted by at most $l_i + 1$ counting sorts. (The "+1" is because it may have to be sorted as an empty string at some point; for example, ab and a end up in the same group in the first pass and are then ordered based on b and the empty string in the second pass. The string a is sorted a number of times equal to its length, 1, plus one more time.) A call of counting sort on $t$ strings takes $\Theta(t)$ time (remembering that the number of different characters on which we are sorting is a constant). Thus, the total time for all calls of counting sort is

$$
\begin{aligned}
O\!\left( \sum_{i=1}^{m} (l_i + 1) \right) &= O\!\left( \sum_{i=1}^{m} l_i + m \right) \\
  &= O(n + m) \\
  &= O(n) ,
\end{aligned}
$$

where the second line follows from $\sum_{i=1}^{m} l_i = n$, and the last line is because $m \le n$.
