Iterator Categories, Operations, and Traits

Definition 10.5. An object type is a uniform method of storing and retrieving values of a given value type from a particular object when given its address.

10.5 Iterator Categories, Operations, and Traits

There are several kinds of iterators, which we call iterator categories. Here are the most important:

• Input iterators support one-directional traversal, but only once, as is found in single-pass algorithms. The canonical model of an input iterator is the position in an input stream. Bytes are coming over the wire and we can process them one at a time, but once they are processed, they are gone. In particular, with input iterators i == j does not imply ++i == ++j; for example, if you’ve already consumed a character from an input stream, you can’t consume the same character again with a different iterator. Keep in mind that just because an algorithm only requires input iterators does not mean it is limited to operating on input streams.

• Forward iterators also support only one-directional traversal, but this traversal can be repeated as needed, as in multi-pass algorithms. The canonical model of a forward iterator is the position in a singly linked list.[9]

[9] We assume that link structure of the list is not modified as it is traversed.

• Bidirectional iterators support bidirectional traversal, repeated as needed (i.e., they also can be used in multi-pass algorithms). The canonical model of a bidirectional iterator is the position in a doubly linked list.

Bidirectional iterators have an invertible successor function: if an element x has a successor y, then y has a predecessor.

• Random-access iterators support random-access algorithms; that is, they allow access to any element in constant time (both far and fast). The canonical model is the position in an array.
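The canonical models above can be checked against the tags the standard library reports through std::iterator_traits. The following is a minimal sketch assuming C++11 or later; the names Category, is_multi_pass, and check_canonical_models are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Hypothetical alias (not a standard name): the category tag reported for I.
template <typename I>
using Category = typename std::iterator_traits<I>::iterator_category;

// Position in an input stream: input iterator.
static_assert(std::is_same<Category<std::istream_iterator<int>>,
                           std::input_iterator_tag>::value, "input");
// Position in a singly linked list: forward iterator.
static_assert(std::is_same<Category<std::forward_list<int>::iterator>,
                           std::forward_iterator_tag>::value, "forward");
// Position in a doubly linked list: bidirectional iterator.
static_assert(std::is_same<Category<std::list<int>::iterator>,
                           std::bidirectional_iterator_tag>::value, "bidirectional");
// Position in an array: random-access iterator.
static_assert(std::is_same<Category<int*>,
                           std::random_access_iterator_tag>::value, "random access");

// Hypothetical helper: true when I is at least a forward iterator,
// that is, when it can be used in multi-pass algorithms.
template <typename I>
constexpr bool is_multi_pass() {
    return std::is_base_of<std::forward_iterator_tag, Category<I>>::value;
}

// Usage: stream positions are single-pass; container positions are multi-pass.
inline void check_canonical_models() {
    assert(!is_multi_pass<std::istream_iterator<int>>());
    assert(is_multi_pass<std::forward_list<int>::iterator>());
    assert(is_multi_pass<std::vector<int>::iterator>());
}
```

The tags form an inheritance chain in the standard library, which is why a single is_base_of test is enough to separate single-pass from multi-pass iterators.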

In addition, there is another common iterator category that behaves differently from the others:

• Output iterators support alternating successor (++) and dereference (*) operations, but the results of dereferencing an output iterator can appear only on the left-hand side of an assignment operator, and they provide no equality function. The canonical model of an output iterator is the position in an output stream. We can’t define equality because we can’t even get to the elements once they’ve been output.
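To make the alternating pattern concrete, here is a small sketch that writes through an output iterator. It assumes C++11; write_squares and squares_as_text are hypothetical helpers, and std::ostream_iterator stands in for the position in an output stream.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: writes 1*1, 2*2, ..., n*n through an output iterator.
// Each step is one assignment through *out followed by one ++out. The loop is
// controlled by the count n, because output iterators provide no equality.
template <typename OutputIt>
OutputIt write_squares(OutputIt out, int n) {
    for (int k = 1; k <= n; ++k) {
        *out = k * k;  // the dereference appears only on the left of '='
        ++out;         // successor
    }
    return out;
}

// Usage: send the values to a string stream, separated by spaces.
inline std::string squares_as_text(int n) {
    std::ostringstream os;
    write_squares(std::ostream_iterator<int>(os, " "), n);
    return os.str();
}
```

Calling squares_as_text(3) yields the text "1 4 9 " (the delimiter follows every element, including the last).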

While the iterators described so far are the only ones included in C++, other useful iterator concepts also exist:

• Linked iterators work in situations where the successor function is mutable (for example, a linked list where the link structure is modified).

• Segmented iterators are for cases where the data is stored in noncontiguous segments, each containing contiguous sequences. std::deque, a data structure that is implemented as a segmented array, would immediately benefit; instead of needing each successor operation to check whether the end of the segment has been reached, a “top level” iterator could find the next segment and know its bounds, while the “bottom level” iterator could iterate through that segment.
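The two-level idea behind segmented iterators can be sketched with nested loops. This is only an illustration under stated assumptions: a vector of vectors stands in for the segmented storage, sum_segmented and segmented_demo are hypothetical names, and nothing here reflects how any particular std::deque implementation is written.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: sums values stored in noncontiguous segments.
// The outer ("top level") loop moves from segment to segment; the inner
// ("bottom level") loop walks one contiguous segment whose bounds are known
// up front, so its successor step never has to ask "did the segment end?".
inline long sum_segmented(const std::vector<std::vector<int>>& segments) {
    long total = 0;
    for (auto seg = segments.begin(); seg != segments.end(); ++seg) {
        const int* first = seg->data();
        const int* last = first + seg->size();
        for (; first != last; ++first) {
            total += *first;
        }
    }
    return total;
}

// Usage: three segments, one of them empty.
inline void segmented_demo() {
    std::vector<std::vector<int>> segments = {{1, 2}, {}, {3}};
    assert(sum_segmented(segments) == 6);
}
```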

Iterators like these can easily be implemented. Just because a concept is not built into the language does not mean it’s not useful. In general, STL should be viewed as a set of well-chosen examples, not an exhaustive collection of all useful concepts, data structures and algorithms.

* * *

A simple but important thing we may want to do is find the distance between two iterators. For an input iterator, we might write our distance() function like this:


template <InputIterator I>
DifferenceType<I> distance(I f, I l, std::input_iterator_tag) {
    // precondition: valid_range(f, l)
    DifferenceType<I> n(0);
    while (f != l) {
        ++f;
        ++n;
    }
    return n;
}

There are three notable things about this code: the use of the type function DifferenceType, the use of the iterator tag argument, and the precondition. We’ll discuss all of these soon, but before we do, let’s compare this to a different implementation, one that’s optimized for random-access iterators:


template <RandomAccessIterator I>
DifferenceType<I> distance(I f, I l, std::random_access_iterator_tag) {
    // precondition: valid_range(f, l)
    return l - f;
}

Since we have random access, we don’t have to repeatedly increment (and count) from one iterator to the other; we can just use a constant time operation—subtraction—to ﬁnd the distance.

The difference type of an iterator is an integral type that is large enough to encode the largest possible range. For example, if our iterators were pointers, the difference type in C++ could be ptrdiff_t. But in general we don’t know in advance which type the iterator will be, so we need a type function to get the difference type. Although C++ does not have a general mechanism for type functions, STL iterators have a special set of attributes known as iterator traits, one of which gives us the difference type. The complete set of iterator traits is

• value_type
• reference
• pointer
• difference_type
• iterator_category

We’ve mentioned value_type before; it returns the type of the values pointed to by the iterator. The reference and pointer traits are rarely used in current architectures,[10] but the others are very important.

[10] Earlier versions of the Intel processor architecture included different types for shorter and longer pointers, so it was important to know which to use for a given iterator. Today, if the value type of an iterator is T, the pointer iterator trait would normally be T*.
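As an illustration of the five traits, the sketch below spells them out for a plain pointer and then uses two of them in a generic function. It assumes C++11; PtrTraits, count_equal, and traits_demo are hypothetical names chosen for this example (count_equal mirrors what std::count does).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <type_traits>

// The five iterator traits for the pointer type int*.
using PtrTraits = std::iterator_traits<int*>;
static_assert(std::is_same<PtrTraits::value_type, int>::value, "value_type");
static_assert(std::is_same<PtrTraits::reference, int&>::value, "reference");
static_assert(std::is_same<PtrTraits::pointer, int*>::value, "pointer");
static_assert(std::is_same<PtrTraits::difference_type, std::ptrdiff_t>::value,
              "difference_type");
static_assert(std::is_same<PtrTraits::iterator_category,
                           std::random_access_iterator_tag>::value,
              "iterator_category");

// Hypothetical helper: counts the elements of [f, l) that equal x.
// value_type names what the iterator points to; difference_type is the
// integral type big enough to hold the count.
template <typename I>
typename std::iterator_traits<I>::difference_type
count_equal(I f, I l, const typename std::iterator_traits<I>::value_type& x) {
    typename std::iterator_traits<I>::difference_type n(0);
    while (f != l) {
        if (*f == x) ++n;
        ++f;
    }
    return n;
}

// Usage: the same function works for list iterators and for pointers.
inline void traits_demo() {
    std::list<int> xs = {3, 1, 3, 2};
    assert(count_equal(xs.begin(), xs.end(), 3) == 2);
}
```

The repeated typename std::iterator_traits<I>::... spelling in the signature shows how verbose direct trait access is.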

Since the syntax for accessing iterator traits is rather verbose, we’ll implement our own type function for accessing difference_type, with the using construct of C++11. (See Appendix C for more information about using.)


template <InputIterator I>
using DifferenceType = typename std::iterator_traits<I>::difference_type;

This gives us the DifferenceType type function used in the earlier code.

The iterator trait iterator_category returns a tag type representing the kind of iterator we’re dealing with. Objects of these tag types contain no data. As we did for DifferenceType, we define the following type function:


template <InputIterator I>
using IteratorCategory = typename std::iterator_traits<I>::iterator_category;

Now we can return to the use of the iterator tag argument in the distance functions. The iterator tags shown in the examples (input_iterator_tag and random_access_iterator_tag) are possible values of the iterator category trait, so by including them as arguments, we are distinguishing the type signature of the two function implementations. (We will see more examples of this in Chapter 11.) This allows us to perform category dispatch on the distance function; that is, we can write a general form of the function for any iterator category, and the fastest one will be invoked:


template <InputIterator I>
DifferenceType<I> distance(I f, I l) {
    return distance(f, l, IteratorCategory<I>());
}

Note that the third argument is actually a constructor call creating an instance of the appropriate type, because we cannot pass types to functions. When the client calls distance(), it uses the two-argument version shown here. That function then invokes the implementation that matches the iterator category. This dispatch happens at compile time and the general function is inline, so there is literally no performance penalty for choosing the right version of the function.
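Putting the pieces of this section together gives a version that compiles as ordinary C++11. The sketch below makes a few assumptions of its own: it spells the template parameters with typename (the concept names in the text are documentation only), and it wraps everything in a hypothetical namespace distance_sketch with qualified calls, so that argument-dependent lookup does not also find std::distance.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

namespace distance_sketch {  // hypothetical namespace for this illustration

template <typename I>  // I models InputIterator
using DifferenceType = typename std::iterator_traits<I>::difference_type;

template <typename I>  // I models InputIterator
using IteratorCategory = typename std::iterator_traits<I>::iterator_category;

// Linear version: counts successor steps from f to l.
template <typename I>  // I models InputIterator
DifferenceType<I> distance(I f, I l, std::input_iterator_tag) {
    // precondition: valid_range(f, l)
    DifferenceType<I> n(0);
    while (f != l) {
        ++f;
        ++n;
    }
    return n;
}

// Constant-time version: plain subtraction.
template <typename I>  // I models RandomAccessIterator
DifferenceType<I> distance(I f, I l, std::random_access_iterator_tag) {
    // precondition: valid_range(f, l)
    return l - f;
}

// Category dispatch: the tag object selects the overload at compile time.
template <typename I>  // I models InputIterator
DifferenceType<I> distance(I f, I l) {
    return distance_sketch::distance(f, l, IteratorCategory<I>());
}

}  // namespace distance_sketch

// Usage: a list iterator takes the counting loop, a vector iterator subtracts.
inline void distance_demo() {
    std::list<int> xs = {1, 2, 3};
    std::vector<int> v(5);
    assert(distance_sketch::distance(xs.begin(), xs.end()) == 3);
    assert(distance_sketch::distance(v.begin(), v.end()) == 5);
}
```

A list iterator carries std::bidirectional_iterator_tag, which derives from std::input_iterator_tag, so it binds to the linear overload; for a vector iterator both overloads are viable, and the exact tag match wins.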

The use of tag types as arguments to distinguish versions of the function may seem redundant, since we already specified different concepts in the templates. However, recall that our use of concepts serves only as documentation for the programmer; current C++ compilers don’t know anything about concepts. Once concepts are added to the language, the arcane iterator category tag mechanism will no longer be needed.

10.6 Ranges

A range is a way of specifying a contiguous sequence of elements. Ranges can be either semi-open or closed;[11] a closed range [i, j] includes items i and j, while a semi-open range [i, j) includes i but ends just before j. It turns out that semi-open ranges are the most convenient for defining interfaces. This is because algorithms that operate on sequences of n elements need to be able to refer to n + 1 positions. For example, there are n + 1 places to insert a new item: before the first element, between any two elements, or after the last element. Also, semi-open ranges, unlike closed ranges, can describe an empty range. Furthermore, a semi-open empty range can be specified at any position; it provides more information than a simple “nil” or empty list.

[11] In mathematics, there are also open ranges, but they are less useful in programming, so we do not include them here.

A range can be specified in one of two ways: a bounded range has two iterators (one pointing to the beginning and one pointing just past the end), while a counted range has an iterator pointing to the beginning and an integer n indicating how many items are included. This gives us four kinds of ranges altogether:

• bounded semi-open (two iterators): [i, j)
• bounded closed (two iterators): [i, j]
• counted semi-open (iterator and integer): [i, n)
• counted closed (iterator and integer): [i, n]

(A closed counted range must have n > 0.) As we shall see, there are different situations where bounded or counted ranges are preferable.
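The difference between bounded and counted ranges shows up directly in how a loop is controlled. Here is a minimal sketch assuming C++11, with hypothetical helpers sum_bounded, sum_counted, and range_demo; both take semi-open ranges, so each can also describe an empty range at any position.

```cpp
#include <cassert>
#include <iterator>

// Bounded semi-open range [f, l): the loop stops when f reaches l.
template <typename I>
typename std::iterator_traits<I>::value_type sum_bounded(I f, I l) {
    typename std::iterator_traits<I>::value_type total{};
    while (f != l) {
        total += *f;
        ++f;
    }
    return total;
}

// Counted semi-open range [f, n): the loop stops when the count runs out,
// so no second iterator and no iterator comparison are needed.
template <typename I, typename N>
typename std::iterator_traits<I>::value_type sum_counted(I f, N n) {
    typename std::iterator_traits<I>::value_type total{};
    while (n != N(0)) {
        total += *f;
        ++f;
        --n;
    }
    return total;
}

// Usage: the same four elements, described both ways, plus empty ranges.
inline void range_demo() {
    int a[] = {1, 2, 3, 4};
    assert(sum_bounded(a, a + 4) == 10);
    assert(sum_counted(a, 4) == 10);
    assert(sum_bounded(a + 2, a + 2) == 0);  // empty range at position 2
    assert(sum_counted(a + 2, 0) == 0);      // empty range at position 2
}
```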

While mathematical texts index sequences from 1, computer scientists start from 0, and we will use the latter convention for our ranges. Interestingly, although 0-based indexing in computer science was initially used as a way to indicate the offset in memory, this convention turns out to be more natural regardless of implementation, since it means that for a sequence with n elements, the indices are in the range [0,n) and any iteration is bounded by the length.

* * *

Now we can return to the third notable feature of our distance functions: the valid_range precondition. It would be nice if we could have a valid_range function that returned true if the range specified by the two iterators was valid and false otherwise, but unfortunately, it’s not possible to implement such a function. For example, if two iterators each represent cells in a linked list, we have no way of knowing if there’s a path from one to the other. But even if we’re dealing with simple pointers, we still cannot compute valid_range: there is no way in C or C++ to determine if two pointers point to a single contiguous block of memory; there might be gaps in the middle.

So we can’t write a valid_range function, but we can still use it as a precondition. Instead of guaranteeing the correct behavior in code, we’ll use axioms that, if satisfied, ensure that our distance function will behave as intended. Specifically, we postulate the following two axioms:

$$\text{container}(c) \Rightarrow \text{valid}(\text{begin}(c), \text{end}(c))$$

$$\text{valid}(x, y) \land x \neq y \Rightarrow \text{valid}(\text{successor}(x), y)$$

The first axiom says that if it’s a container, the range from begin() to end() is valid. The second axiom says that if [x,y) is a nonempty valid range, then the range [successor(x),y) is also valid. All STL-style containers, as well as C++ arrays, must obey these axioms. This allows us to prove the algorithms correct. For example, if you go back to our original distance function for input iterators in Section 10.5, you’ll see that the second axiom ensures that if we start with a valid range, we’ll still have one each time through the loop.

* * *

In addition to the successor (++) and distance operations, it’s useful to have a way to move an iterator by several positions at once. We call this function advance. As before, we’ll implement two versions, one for input iterators:


template <InputIterator I>
void advance(I& x, DifferenceType<I> n, std::input_iterator_tag) {
    while (n) {
        --n;
        ++x;
    }
}

and another for random access iterators:


template <RandomAccessIterator I>
void advance(I& x, DifferenceType<I> n, std::random_access_iterator_tag) {
    x += n;
}

We’ll also provide a top-level function for doing the dispatch:


template <InputIterator I>
void advance(I& x, DifferenceType<I> n) {
    advance(x, n, IteratorCategory<I>());
}
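For reference, the three advance functions can be assembled into one compilable unit. As a sketch under stated assumptions: the code targets C++11, uses typename in place of the documentation-only concept names, and lives in a hypothetical namespace advance_sketch with qualified calls so that it cannot be confused with std::advance.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

namespace advance_sketch {  // hypothetical namespace for this illustration

template <typename I>  // I models InputIterator
using DifferenceType = typename std::iterator_traits<I>::difference_type;

template <typename I>  // I models InputIterator
using IteratorCategory = typename std::iterator_traits<I>::iterator_category;

// Linear version: n successor steps.
// Precondition: n >= 0, since this loop can only move forward.
template <typename I>  // I models InputIterator
void advance(I& x, DifferenceType<I> n, std::input_iterator_tag) {
    while (n) {
        --n;
        ++x;
    }
}

// Constant-time version: one jump.
template <typename I>  // I models RandomAccessIterator
void advance(I& x, DifferenceType<I> n, std::random_access_iterator_tag) {
    x += n;
}

// Top-level function: dispatches on the iterator category at compile time.
template <typename I>  // I models InputIterator
void advance(I& x, DifferenceType<I> n) {
    advance_sketch::advance(x, n, IteratorCategory<I>());
}

}  // namespace advance_sketch

// Usage: step through a list, jump through a vector.
inline void advance_demo() {
    std::list<int> xs = {10, 20, 30, 40};
    std::list<int>::iterator it = xs.begin();
    advance_sketch::advance(it, 2);
    assert(*it == 30);

    std::vector<int> v = {1, 2, 3, 4};
    std::vector<int>::iterator p = v.begin();
    advance_sketch::advance(p, 3);
    assert(*p == 4);
}
```

Because DifferenceType<I> is a non-deduced context, I is deduced from the iterator argument alone, and an int count such as 2 is simply converted to the iterator's difference type.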

In document AStepanov From Mathematics to Generic Programming (Page 143-146)