
7.5 A note on performance

If you have used associative arrays in other languages, those arrays were probably implemented in terms of a data structure called a hash table. Hash tables can be very fast, but they have compensating disadvantages:

• For each key type, someone must supply a hash function, which computes an appropriate integer value from the value of the key.

• A hash table's performance is exquisitely sensitive to the details of the hash function.

• There is usually no easy way to retrieve the elements of a hash table in a useful order.

C++ associative containers are hard to implement in terms of hash tables, because they promise properties that hash tables do not readily offer:

• The key type needs only the < operator or equivalent comparison function.

• The time to access an associative-container element with a given key is logarithmic in the total number of elements in that container, regardless of the keys' values.

• Associative-container elements are always kept sorted by key.

In other words, although C++ associative containers will typically be slightly slower than the best hash-table data structures, they perform much better than naive data structures, their performance does not require their users to design good hash functions, and they are more convenient than hash tables because of their automatic ordering. If you're generally familiar with associative data structures, you might want to know that C++ libraries typically use a balanced self-adjusting tree structure to implement associative containers.
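To make the first and third properties concrete, here is a minimal sketch built around a hypothetical key type, `Version`, that supplies only `operator<`. No hash function is involved, and walking the map from `begin()` to `end()` visits the keys in ascending order regardless of the order in which they were inserted. The helper `names_in_key_order` is also made up for this illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical key type for illustration: it defines only operator<,
// which is all that std::map requires of its keys by default.
struct Version {
    int major;
    int minor;
};

bool operator<(const Version& a, const Version& b)
{
    if (a.major != b.major)
        return a.major < b.major;
    return a.minor < b.minor;
}

// Inserts entries out of order, then walks the map from begin() to end().
// Because the map keeps its elements sorted by key, the values come back
// in ascending version order.
std::vector<std::string> names_in_key_order()
{
    std::map<Version, std::string> releases;
    releases[Version{2, 0}] = "two";
    releases[Version{1, 5}] = "one-five";
    releases[Version{1, 0}] = "one";

    std::vector<std::string> result;
    for (std::map<Version, std::string>::const_iterator it = releases.begin();
         it != releases.end(); ++it)
        result.push_back(it->second);
    return result;
}
```

Calling `names_in_key_order()` yields "one", "one-five", "two", in that order.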

If you really want hash tables, many C++ implementations provide them as extensions. However, because they were not part of standard C++ when this book was written, they are beyond its scope; the C++11 standard later added hash-based containers such as std::unordered_map. Although no standard can be ideal for every purpose, the standard associative containers are more than adequate for most applications.

7.6 Details

The do while statement is similar to the while statement (§2.3.1/19), except that the test is at the end. The general form of the statement is

do statement while (condition);

The statement is executed first, after which the condition and statement are executed alternately until the condition is false.
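Because the test comes last, the body of a do while always runs at least once. This small sketch (the function name `count_down` is hypothetical) shows the difference: even when the condition is false from the start, one value is recorded.

```cpp
#include <vector>

// Records n, then keeps decrementing and recording while n stays positive.
// The condition is checked only after the body, so the body runs at least
// once: count_down(0) still records the single value 0.
std::vector<int> count_down(int n)
{
    std::vector<int> seen;
    do {
        seen.push_back(n);
        --n;
    } while (n > 0);
    return seen;
}
```

With an ordinary while loop and the same condition, `count_down(0)` would return an empty vector.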

Value-initialization: Accessing a map element that doesn't yet exist creates an element with a value of V(), where V is the type of the values stored in the map. Such an element is said to be value-initialized. §9.5/164 explains the details of value-initialization; the most important aspect is that built-in types are initialized to 0.
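Value-initialization is what makes the familiar word-counting idiom work without any explicit setup. In this sketch (the name `count_words` is hypothetical), the first `counters[w]` for a new word creates an `int` element equal to 0, which `++` then increments.

```cpp
#include <map>
#include <string>
#include <vector>

// Counts occurrences of each word. Indexing the map with a word it has not
// seen yet inserts a new element whose int value is value-initialized to 0,
// so ++ starts counting from zero.
std::map<std::string, int> count_words(const std::vector<std::string>& words)
{
    std::map<std::string, int> counters;
    for (std::vector<std::string>::size_type i = 0; i != words.size(); ++i)
        ++counters[words[i]];
    return counters;
}
```

Note the side effect: merely reading `counters["z"]` for an absent key also inserts an element with value 0, which grows the map.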

rand() is a function that yields a random integer in the range [0, RAND_MAX]. Both rand and RAND_MAX are defined in <cstdlib>.
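The numbers rand() produces are pseudo-random: the companion function srand, also declared in <cstdlib>, sets the starting point of the sequence, and reseeding with the same value replays the same sequence. The sketch below (the helper `draw` is hypothetical) relies on that to make results reproducible, and every value it collects lies in [0, RAND_MAX].

```cpp
#include <cstdlib>
#include <vector>

// Seeds rand() with seed, then collects count values from it. Each value is
// in [0, RAND_MAX]; calling draw again with the same seed restarts the same
// pseudo-random sequence and so returns the same values.
std::vector<int> draw(unsigned seed, int count)
{
    std::srand(seed);
    std::vector<int> values;
    for (int i = 0; i != count; ++i)
        values.push_back(std::rand());
    return values;
}
```

The actual values depend on the library implementation; only the range and the repeatability for a given seed are guaranteed.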

pair<K, V> is a simple type whose objects hold pairs of values. Access to these data values is through their names, first and second respectively.
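A pair is a convenient way for a function to return two related values at once. In this sketch, the hypothetical function `split_setting` breaks a string of the form "key=value" at the first '=' and returns both halves; the caller reaches them through `first` and `second`.

```cpp
#include <string>
#include <utility>

// Splits "key=value" at the first '='. The key is returned in first and the
// value in second. If there is no '=', the whole string becomes the key and
// the value is empty.
std::pair<std::string, std::string> split_setting(const std::string& s)
{
    std::string::size_type eq = s.find('=');
    if (eq == std::string::npos)
        return std::make_pair(s, std::string());
    return std::make_pair(s.substr(0, eq), s.substr(eq + 1));
}
```

For example, `split_setting("color=blue")` yields a pair whose `first` is "color" and whose `second` is "blue".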

map<K, V> is an associative array with key type K and value type V. The elements of a map are key-value pairs, which are maintained in key order to allow efficient access of elements by key. The iterators on maps are bidirectional (§8.2.5/148). Dereferencing a map iterator yields a value of type pair<const K, V>. The map operations include:

map<K, V> m;

Creates a new empty map, with keys of type const K and values of type V.

map<K, V, P> m(cmp);

Creates a new empty map with keys of type const K and values of type V that uses the predicate cmp, of type P, to determine the order of the elements.

m[k]

Indexes the map using a key, k, of type K, and returns an lvalue of type V. If there is no entry for the given key, a new value-initialized element is created and inserted into the map with this key. Because using [] to access a map might create a new element, [] is not allowed on a const map.

m.begin() m.end()

Return iterators that can be used to access the elements of a map. Note that dereferencing one of these iterators yields a key-value pair, not just a value.

m.find(k)

Returns an iterator referring to the element with key k, or m.end() if no such element exists.

For a map<K, V> and an associated iterator p, the following apply:

p->first

Yields an lvalue of type const K that is the key for the element p denotes.

p->second

Yields an lvalue of type V that is the value part of the element that p denotes.
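The operations above fit together in a short sketch. It assumes a hypothetical case-insensitive predicate, `less_nocase`. A map ordered by a custom predicate has to name the predicate's type as its third template argument (here a function-pointer type), because a plain map<K, V> always orders with std::less<K>. The hypothetical `lookup` helper uses `find` rather than `[]`, since `[]` is not allowed on a const map.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical predicate: compares strings ignoring letter case, so a map
// that uses it treats "Apple" and "apple" as the same key.
bool less_nocase(const std::string& a, const std::string& b)
{
    std::string::size_type n = a.size() < b.size() ? a.size() : b.size();
    for (std::string::size_type i = 0; i != n; ++i) {
        int ca = std::tolower(static_cast<unsigned char>(a[i]));
        int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb)
            return ca < cb;
    }
    return a.size() < b.size();
}

// The predicate's type is the map's third template argument.
typedef std::map<std::string, int,
                 bool (*)(const std::string&, const std::string&)> nocase_map;

// Looks key up in a const map with find, which returns m.end() if the key
// is absent; p->second is the value part of the element p refers to.
int lookup(const nocase_map& m, const std::string& key, int fallback)
{
    nocase_map::const_iterator p = m.find(key);
    return p == m.end() ? fallback : p->second;
}
```

A map of this type is created as `nocase_map m(less_nocase);`. After `m["Apple"] = 3;` and `m["apple"] += 1;` the map still holds one element, whose key (`m.begin()->first`) is "Apple" and whose value is 4.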

Exercises

7-0. Compile, execute, and test the programs in this chapter.

7-1. Extend the program from §7.2/124 to produce its output sorted by occurrence count. That is, the output should group all the words that occur once, followed by those that occur twice, and so on.
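One possible approach, sketched under the assumption that the word counts are already held in a map<string, int>: invert that map so each occurrence count becomes a key. The name `group_by_count` is hypothetical, and the output loop of the original program is not reproduced here.

```cpp
#include <map>
#include <string>
#include <vector>

// Regroups word counts by count. Because a map keeps its keys sorted, the
// result is ordered by occurrence count: words seen once come first, then
// words seen twice, and so on. Within each group the words remain in
// alphabetical order, since the input map is visited in key order.
std::map<int, std::vector<std::string> >
group_by_count(const std::map<std::string, int>& counters)
{
    std::map<int, std::vector<std::string> > groups;
    for (std::map<std::string, int>::const_iterator it = counters.begin();
         it != counters.end(); ++it)
        groups[it->second].push_back(it->first);
    return groups;
}
```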

7-2. Extend the program in §4.2.3/64 to assign letter grades by ranges:

A    90-100
B    80-89.99...
C    70-79.99...
D    60-69.99...
F    < 60

The output should list how many students fall into each category.
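A sketch of the grading and counting steps, assuming the final grades have already been computed as doubles on a 0-100 scale. The names `letter_grade` and `tally` are hypothetical, and the surrounding program from §4.2.3 is not reproduced.

```cpp
#include <map>
#include <vector>

// Maps a numeric grade to a letter using the ranges above; each test is the
// lower bound of a range, so 89.99 is a B and 90 is an A.
char letter_grade(double g)
{
    if (g >= 90) return 'A';
    if (g >= 80) return 'B';
    if (g >= 70) return 'C';
    if (g >= 60) return 'D';
    return 'F';
}

// Counts how many grades fall into each letter category. A letter seen for
// the first time gets a value-initialized count of 0 before ++ runs.
std::map<char, int> tally(const std::vector<double>& grades)
{
    std::map<char, int> counts;
    for (std::vector<double>::size_type i = 0; i != grades.size(); ++i)
        ++counts[letter_grade(grades[i])];
    return counts;
}
```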

7-3. The cross-reference program from §7.3/126 could be improved: As it stands, if a word occurs more than once on the same input line, the program will report that line multiple times. Change the code so that it detects multiple occurrences of the same line number and inserts the line number only once.
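One possible fix, sketched under the assumption that the cross-reference table is a map from each word to a vector<int> of line numbers, filled one input line at a time in increasing line order. Under that assumption, a repeat on the same line can only match the most recently recorded number, so checking `back()` is enough. The helper name `record` is hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Adds line_number to word's list unless it is already the last entry.
// Because lines are processed in increasing order, that single check
// suppresses duplicates from repeated occurrences on one line.
void record(std::map<std::string, std::vector<int> >& ret,
            const std::string& word, int line_number)
{
    std::vector<int>& lines = ret[word];
    if (lines.empty() || lines.back() != line_number)
        lines.push_back(line_number);
}
```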

7-4. The output produced by the cross-reference program will be ungainly if the input file is large. Rewrite the program to break up the output if the lines get too long.

7-5. Reimplement the grammar program using a list as the data structure in which we build the sentence.

7-6. Reimplement the gen_sentence program using two vectors: One will hold the fully unwound, generated sentence, and the other will hold the rules and will be used as a stack. Do not use any recursive calls.

7-7. Change the driver for the cross-reference program so that it writes "line" if there is only one line and "lines" otherwise.

7-8. Change the cross-reference program to find all the URLs in a file, and write all the lines on which each distinct URL occurs.

7-9. (difficult) The implementation of nrand in §7.4.4/135 will not work for arguments greater than RAND_MAX. Usually, this restriction is no problem, because RAND_MAX is often the largest possible integer anyway. Nevertheless, there are implementations under which RAND_MAX is much smaller than the largest possible integer. For example, it is not uncommon for RAND_MAX to be 32767 ($2^{15} - 1$) and the largest possible integer to be 2147483647 ($2^{31} - 1$). Reimplement nrand so that it works well for all values of n.
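One possible approach, offered as a sketch rather than the book's solution: treat successive rand() results as digits in base RAND_MAX + 1, combining as many as needed to cover a range of at least n values. Then split that range into n equal buckets and reject draws that land past the last full bucket, so every result in [0, n) is equally likely. The name `nrand_any` is hypothetical. The sketch assumes a 32-bit int, so that the combined range always fits in unsigned long long.

```cpp
#include <cstdlib>
#include <stdexcept>

// Returns a uniformly distributed integer in [0, n) for any positive int n,
// including n > RAND_MAX. Assumes int is at most 32 bits wide, so the
// combined range below cannot overflow a 64-bit unsigned long long.
int nrand_any(int n)
{
    if (n <= 0)
        throw std::domain_error("Argument to nrand_any must be positive");

    const unsigned long long limit = static_cast<unsigned long long>(n);
    const unsigned long long base =
        static_cast<unsigned long long>(RAND_MAX) + 1;

    // Number of rand() "digits" needed so that [0, range) has at least n values.
    unsigned long long range = 1;
    int digits = 0;
    while (range < limit) {
        range *= base;
        ++digits;
    }

    // Each of the n results corresponds to exactly bucket_size combined values;
    // combined values beyond n * bucket_size are rejected and redrawn.
    const unsigned long long bucket_size = range / limit;
    unsigned long long r;
    do {
        unsigned long long value = 0;
        for (int i = 0; i != digits; ++i)
            value = value * base + static_cast<unsigned long long>(std::rand());
        r = value / bucket_size;
    } while (r >= limit);

    return static_cast<int>(r);
}
```

For n no larger than RAND_MAX this uses a single rand() call per attempt, like the original bucket technique; larger n simply consume more calls per draw.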
