

is a function nvcmp that compares two Nameval items by calling strcmp on their string components, ignoring their values:

    /* nvcmp: compare two Nameval names */
    int nvcmp(const void *va, const void *vb)
    {
        const Nameval *a, *b;

        a = (const Nameval *) va;
        b = (const Nameval *) vb;
        return strcmp(a->name, b->name);
    }

This is analogous to scmp but differs because the strings are stored as members of a structure.

The clumsiness of providing the key means that bsearch provides less leverage than qsort. A good general-purpose sort routine takes a page or two of code, while binary search is not much longer than the code it takes to interface to bsearch. Nevertheless, it's a good idea to use bsearch instead of writing your own. Over the years, binary search has proven surprisingly hard for programmers to get right.

The standard C++ library has a generic algorithm called sort that guarantees O(n log n) behavior. The code is easier because it needs no casts or element sizes, and it does not require an explicit comparison function for types that have an order relation.

    int arr[N];

    sort(arr, arr+N);

The C++ library also has generic binary search routines, with similar notational advantages.

Exercise 2-1. Quicksort is most naturally expressed recursively. Write it iteratively and compare the two versions. (Hoare describes how hard it was to work out quicksort iteratively, and how neatly it fell into place when he did it recursively.)

2.4 A Java Quicksort

The situation in Java is different. Early releases had no standard sort function, so we needed to write our own. More recent versions do provide a sort function, however, which operates on classes that implement the Comparable interface, so we can now ask the library to sort for us. But since the techniques are useful in other situations, in this section we will work through the details of implementing quicksort in Java.
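As a brief illustration of the library route, the following sketch sorts an array with java.util.Arrays.sort. The Version class, its fields, and LibrarySortDemo are hypothetical names invented for this example, and the generic form Comparable<Version> assumes a Java release with generics; only Arrays.sort and Comparable come from the standard library.

```java
import java.util.Arrays;

// Version: hypothetical example type with a natural ordering
class Version implements Comparable<Version> {
    final int major, minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // order by major number, then by minor number
    public int compareTo(Version other) {
        if (major != other.major)
            return Integer.compare(major, other.major);
        return Integer.compare(minor, other.minor);
    }

    public String toString() {
        return major + "." + minor;
    }
}

class LibrarySortDemo {
    // sortedVersions: return a sorted copy, leaving the input untouched
    static Version[] sortedVersions(Version[] in) {
        Version[] out = in.clone();
        Arrays.sort(out);           // the library calls Version.compareTo
        return out;
    }

    public static void main(String[] args) {
        Version[] v = { new Version(2, 1), new Version(1, 9), new Version(2, 0) };
        System.out.println(Arrays.toString(sortedVersions(v)));
    }
}
```

The class supplies the ordering once, in compareTo, and every library routine that needs to compare Version objects can then use it.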

38 ALGORITHMS AND DATA STRUCTURES CHAPTER

It's easy to adapt a quicksort for each type we might want to sort, but it is more instructive to write a generic sort that can be called for any kind of object, more in the style of the qsort interface.

One big difference from C or C++ is that in Java it is not possible to pass a comparison function to another function; there are no function pointers. Instead we create an interface whose sole content is a function that compares two Objects. For each data type to be sorted, we then create a class with a member function that implements the interface for that data type. We pass an instance of that class to the sort function, which in turn uses the comparison function within the class to compare elements.

We begin by defining an interface named Cmp that declares a single member, a comparison function cmp that compares two Objects:

    interface Cmp {
        int cmp(Object x, Object y);
    }

Then we can write comparison functions that implement this interface; for example, this class defines a function that compares Integers:

    // Icmp: Integer comparison
    class Icmp implements Cmp {
        public int cmp(Object o1, Object o2)
        {
            int i1 = ((Integer) o1).intValue();
            int i2 = ((Integer) o2).intValue();
            if (i1 < i2)
                return -1;
            else if (i1 == i2)
                return 0;
            else
                return 1;
        }
    }

and this compares Strings:

    // Scmp: String comparison
    class Scmp implements Cmp {
        public int cmp(Object o1, Object o2)
        {
            String s1 = (String) o1;
            String s2 = (String) o2;
            return s1.compareTo(s2);
        }
    }

We can sort only types that are derived from Object with this mechanism; it cannot be applied to the basic types like int or double. This is why we sort Integers rather than ints.
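To see how such comparison objects are used before turning to the sort itself, here is a minimal sketch that passes them to a generic helper. The CmpDemo class and its max method are hypothetical, introduced only for illustration; Cmp, Icmp, and Scmp are restated in compact form so that the example is self-contained.

```java
interface Cmp {
    int cmp(Object x, Object y);
}

// Icmp: Integer comparison
class Icmp implements Cmp {
    public int cmp(Object o1, Object o2) {
        int i1 = ((Integer) o1).intValue();
        int i2 = ((Integer) o2).intValue();
        if (i1 < i2)
            return -1;
        else if (i1 == i2)
            return 0;
        else
            return 1;
    }
}

// Scmp: String comparison
class Scmp implements Cmp {
    public int cmp(Object o1, Object o2) {
        return ((String) o1).compareTo((String) o2);
    }
}

class CmpDemo {
    // max: return the largest element of a non-empty array under cmp
    static Object max(Object[] v, Cmp cmp) {
        Object best = v[0];
        for (int i = 1; i < v.length; i++)
            if (cmp.cmp(v[i], best) > 0)
                best = v[i];
        return best;
    }

    public static void main(String[] args) {
        Integer[] nums = { 3, 17, 5 };      // Integers, not ints
        String[] words = { "pear", "apple", "zebra" };
        System.out.println(max(nums, new Icmp()));
        System.out.println(max(words, new Scmp()));
    }
}
```

The helper never learns what type it is handling; all knowledge of the element type lives in the comparison object, which is the same division of labor the sort function relies on.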


With these components, we can now translate the C quicksort function into Java and have it call the comparison function from a Cmp object passed in as an argument. The most significant change is the use of indices left and right, since Java does not have pointers into arrays.

    // Quicksort.sort: quicksort v[left]..v[right]
    static void sort(Object[] v, int left, int right, Cmp cmp)
    {
        int i, last;

        if (left >= right)      // nothing to do
            return;
        swap(v, left, rand(left, right));   // move pivot elem
        last = left;                        //   to v[left]
        for (i = left+1; i <= right; i++)   // partition
            if (cmp.cmp(v[i], v[left]) < 0)
                swap(v, ++last, i);
        swap(v, left, last);                // restore pivot elem
        sort(v, left, last-1, cmp);         // recursively sort
        sort(v, last+1, right, cmp);        //   each part
    }

Quicksort.sort uses cmp to compare a pair of objects, and calls swap as before to interchange them.

    // Quicksort.swap: swap v[i] and v[j]
    static void swap(Object[] v, int i, int j)
    {
        Object temp;

        temp = v[i];
        v[i] = v[j];
        v[j] = temp;
    }

Random number generation is done by a function that produces a random integer in the range left to right inclusive:

    static Random rgen = new Random();

    // Quicksort.rand: return random integer in [left,right]
    static int rand(int left, int right)
    {
        return left + Math.abs(rgen.nextInt())%(right-left+1);
    }

We compute the absolute value, using Math.abs, because Java's random number generator returns negative integers as well as positive.
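The Math.abs approach has one corner case worth knowing about: Math.abs(Integer.MIN_VALUE) is itself negative, so on very rare occasions the result can fall below left. A minimal alternative sketch, assuming a Java release in which java.util.Random provides nextInt(int bound), avoids both the absolute value and the remainder. The class name RandRange is hypothetical.

```java
import java.util.Random;

class RandRange {
    static Random rgen = new Random();

    // rand: return random integer in [left, right], assuming left <= right
    // and that right-left+1 does not overflow an int
    static int rand(int left, int right) {
        // nextInt(bound) returns a value in [0, bound)
        return left + rgen.nextInt(right - left + 1);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++)
            System.out.println(rand(3, 7));
    }
}
```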

The functions sort, swap, and rand, and the generator object rgen are the members of a class Quicksort.

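To call Quicksort.sort on a String array, we would write:

    String[] sarr = new String[n];

    // fill n elements of sarr...

    Quicksort.sort(sarr, 0, sarr.length-1, new Scmp());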

This calls sort with a string-comparison object created for the occasion.
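Putting the pieces of this section together gives a complete program. The sketch below is an assembly under stated assumptions rather than the original listing: QuicksortDemo and the sample strings are invented, and rand uses Random.nextInt(bound) in place of the Math.abs form.

```java
import java.util.Arrays;
import java.util.Random;

interface Cmp {
    int cmp(Object x, Object y);
}

// Scmp: String comparison
class Scmp implements Cmp {
    public int cmp(Object o1, Object o2) {
        return ((String) o1).compareTo((String) o2);
    }
}

class Quicksort {
    static Random rgen = new Random();

    // sort: quicksort v[left]..v[right] using cmp
    static void sort(Object[] v, int left, int right, Cmp cmp) {
        if (left >= right)                       // nothing to do
            return;
        swap(v, left, rand(left, right));        // move pivot to v[left]
        int last = left;
        for (int i = left + 1; i <= right; i++)  // partition
            if (cmp.cmp(v[i], v[left]) < 0)
                swap(v, ++last, i);
        swap(v, left, last);                     // restore pivot
        sort(v, left, last - 1, cmp);            // recursively sort
        sort(v, last + 1, right, cmp);           //   each part
    }

    // swap: swap v[i] and v[j]
    static void swap(Object[] v, int i, int j) {
        Object temp = v[i];
        v[i] = v[j];
        v[j] = temp;
    }

    // rand: random integer in [left, right]; nextInt(bound) is a
    // substitution for the Math.abs form described in the text
    static int rand(int left, int right) {
        return left + rgen.nextInt(right - left + 1);
    }
}

class QuicksortDemo {
    public static void main(String[] args) {
        String[] sarr = { "pear", "fig", "apple", "kiwi" };
        Quicksort.sort(sarr, 0, sarr.length - 1, new Scmp());
        System.out.println(Arrays.toString(sarr));
    }
}
```

Sorting a different type requires only a different Cmp class; Quicksort itself does not change.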


Exercise 2-2. Our Java quicksort does a fair amount of type conversion as items are cast from their original type (like Integer) to Object and back again. Experiment with a version of Quicksort.sort that uses the specific type being sorted, to estimate what penalty is incurred by type conversions.

2.5 O-Notation

We've described the amount of work to be done by a particular algorithm in terms of n, the number of elements in the input. Searching unsorted data can take time proportional to n; if we use binary search on sorted data, the time will be proportional to log n. Sorting times might be proportional to n^2 or n log n.
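The difference between n and log n can be observed directly by counting how many elements each kind of search examines. The following is a minimal sketch; the class SearchCost and its probes counter are hypothetical names introduced for the illustration.

```java
class SearchCost {
    static int probes;      // elements examined by the last search

    // linear: index of key in v, or -1 if absent
    static int linear(int[] v, int key) {
        probes = 0;
        for (int i = 0; i < v.length; i++) {
            probes++;
            if (v[i] == key)
                return i;
        }
        return -1;
    }

    // binary: index of key in sorted v, or -1 if absent
    static int binary(int[] v, int key) {
        probes = 0;
        int lo = 0, hi = v.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
            probes++;
            if (v[mid] < key)
                lo = mid + 1;
            else if (v[mid] > key)
                hi = mid - 1;
            else
                return mid;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] v = new int[1024];
        for (int i = 0; i < v.length; i++)
            v[i] = i;                       // already sorted
        linear(v, 1023);
        System.out.println("linear probes: " + probes);
        binary(v, 1023);
        System.out.println("binary probes: " + probes);
    }
}
```

Searching for the last of 1024 elements takes 1024 probes with the linear search, while the binary search needs at most 11, since each probe halves the remaining range.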

We need a way to make such statements more precise, while at the same time abstracting away details like the CPU speed and the quality of the compiler (and the programmer). We want to compare running times and space requirements of algorithms independently of programming language, compiler, machine architecture, processor speed, system load, and other complicating factors.

There is a standard notation for this idea, called "O-notation." Its basic parameter is n, the size of a problem instance, and the complexity or running time is expressed as a function of n. The "O" is for order, as in "Binary search is O(log n); it takes on the order of log n steps to search an array of n items." The notation O(f(n)) means that, once n gets large, the running time is proportional to at most f(n), for example, O(n^2) or O(n log n). Asymptotic estimates like this are valuable for theoretical analyses and very helpful for gross comparisons of algorithms, but details may make a difference in practice. For example, a low-overhead O(n^2) algorithm may run faster than a high-overhead O(n log n) algorithm for small values of n, but inevitably, if n gets large enough, the algorithm with the slower-growing functional behavior will be faster.
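The point about overheads can be made concrete with a small sketch. The two cost models below are hypothetical: the constants 2 and 50 are invented purely to stand for a low-overhead quadratic algorithm and a high-overhead n log n algorithm, and the class name Crossover is likewise an assumption.

```java
class Crossover {
    // hypothetical cost models; the constants are invented for illustration
    static double quadraticCost(int n) {
        return 2.0 * n * n;
    }

    static double nLogNCost(int n) {
        return 50.0 * n * (Math.log(n) / Math.log(2));   // 50 n log2(n)
    }

    // crossover: smallest n in [2, limit] at which the n log n model
    // becomes cheaper than the quadratic model, or -1 if there is none
    static int crossover(int limit) {
        for (int n = 2; n <= limit; n++)
            if (nLogNCost(n) < quadraticCost(n))
                return n;
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("n log n wins from n = " + crossover(100000));
    }
}
```

Under these made-up constants the quadratic model is cheaper for small inputs, but beyond the crossover point the n log n model is cheaper and the gap keeps widening.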

We must also distinguish between worst-case and expected behavior. It's hard to define "expected," since it depends on assumptions about what kinds of inputs will be given. We can usually be precise about the worst case, although that may be misleading. Quicksort's worst-case run-time is O(n^2) but the expected time is O(n log n). By choosing the pivot element carefully each time, we can reduce the probability of quadratic or O(n^2) behavior to essentially zero; in practice, a well-implemented quicksort usually runs in O(n log n) time.
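The effect of pivot choice can be measured by counting comparisons. The sketch below is an illustration under assumptions, not a tuned implementation: it sorts ints rather than Objects to stay short, and the class PivotCost, its comparisons counter, and the randomPivot flag are hypothetical names. It runs the same partitioning scheme on already-sorted input, once always using the first element as the pivot and once using a random pivot.

```java
import java.util.Random;

class PivotCost {
    static long comparisons;            // comparisons made by the last run
    static Random rgen = new Random();

    // sort: quicksort v[left]..v[right]; randomPivot selects the strategy
    static void sort(int[] v, int left, int right, boolean randomPivot) {
        if (left >= right)
            return;
        if (randomPivot)                // otherwise the pivot stays v[left]
            swap(v, left, left + rgen.nextInt(right - left + 1));
        int last = left;
        for (int i = left + 1; i <= right; i++) {
            comparisons++;
            if (v[i] < v[left])
                swap(v, ++last, i);
        }
        swap(v, left, last);
        sort(v, left, last - 1, randomPivot);
        sort(v, last + 1, right, randomPivot);
    }

    static void swap(int[] v, int i, int j) {
        int t = v[i];
        v[i] = v[j];
        v[j] = t;
    }

    // count: comparisons needed to sort the already-sorted array 0..n-1
    static long count(int n, boolean randomPivot) {
        int[] v = new int[n];
        for (int i = 0; i < n; i++)
            v[i] = i;
        comparisons = 0;
        sort(v, 0, n - 1, randomPivot);
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("first-element pivot: " + count(n, false));
        System.out.println("random pivot:        " + count(n, true));
    }
}
```

With the first element as pivot, sorted input makes every partition maximally lopsided, so the count is exactly n(n-1)/2; with a random pivot the count varies from run to run but is typically a small multiple of n log n.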


These are the most important cases:

    Notation      Name          Example
    O(1)          constant      array index
    O(log n)      logarithmic   binary search
    O(n)          linear        string comparison
    O(n log n)    n log n       quicksort
    O(n^2)        quadratic     simple sorting methods
    O(n^3)        cubic         matrix multiplication
    O(2^n)        exponential   set partitioning

Accessing an item in an array is a constant-time or O(1) operation. An algorithm that eliminates half the input at each stage, like binary search, will generally take O(log n). Comparing two n-character strings with strcmp is O(n). The traditional matrix multiplication algorithm takes O(n^3), since each element of the output is the result of multiplying n pairs and adding them up, and there are n^2 elements in each matrix.
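The cubic cost of traditional matrix multiplication is visible in the shape of the code: three nested loops, each running n times. The following is a minimal sketch for square matrices; the class MatMul and its multiplications counter are hypothetical names added for the illustration.

```java
class MatMul {
    static long multiplications;    // scalar multiplications in the last call

    // multiply: return the product of two n-by-n matrices
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        multiplications = 0;
        for (int i = 0; i < n; i++)             // n^2 output elements...
            for (int j = 0; j < n; j++) {
                double sum = 0;
                for (int k = 0; k < n; k++) {   // ...each needing n products
                    sum += a[i][k] * b[k][j];
                    multiplications++;
                }
                c[i][j] = sum;
            }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = { { 1, 2 }, { 3, 4 } };
        double[][] b = { { 5, 6 }, { 7, 8 } };
        multiply(a, b);
        System.out.println("multiplications for n=2: " + multiplications);
    }
}
```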

Exponential-time algorithms are often the result of evaluating all possibilities: there are 2^n subsets of a set of n items, so an algorithm that requires looking at all subsets will be exponential or O(2^n). Exponential algorithms are generally too expensive unless n is very small, since adding one item to the problem doubles the running time. Unfortunately there are many problems, such as the famous "Traveling Salesman Problem," for which only exponential algorithms are known. When that is the case, algorithms that find approximations to the best answer are often substituted.
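A brute-force search over subsets shows where the doubling comes from. The sketch below counts the subsets of a small array that add up to a target by examining every subset, representing each one as a bit mask. The problem, the class Subsets, and the method name are chosen for illustration only and are not taken from the text.

```java
class Subsets {
    // countWithSum: count subsets of items whose elements add up to target,
    // by examining every one of the 2^n subsets (n must be below 31 here)
    static int countWithSum(int[] items, int target) {
        int n = items.length;
        int count = 0;
        for (int mask = 0; mask < (1 << n); mask++) {   // one mask per subset
            int sum = 0;
            for (int i = 0; i < n; i++)
                if ((mask & (1 << i)) != 0)             // item i is in this subset
                    sum += items[i];
            if (sum == target)
                count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] items = { 1, 2, 3, 4, 5 };
        System.out.println(countWithSum(items, 5));
    }
}
```

The outer loop runs 2^n times, so each additional item adds one more bit to the mask and doubles the number of subsets examined.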