Inserting and erasing elements

If your program is breaking mysteriously, look for places where you hold onto an iterator while adding more objects to a vector: if the addition forces a reallocation, every existing iterator into the vector becomes invalid. You’ll need to get a new iterator after adding elements, or use operator[ ] instead for element selection. If you combine this observation with an awareness of the potential expense of adding new objects to a vector, you may conclude that the safest way to use one is to fill it up all at once (ideally, knowing first how many objects you’ll need) and then just use it (without adding more objects) elsewhere in the program. This is the way vector has been used in the book up to this point.

You may observe that using vector as the “basic” container in the earlier chapters of this book may not be the best choice in all cases. This is a fundamental issue in containers, and in data structures in general: the “best” choice varies according to the way the container is used. The reason vector has been the “best” choice up until now is that it looks a lot like an array, and was thus familiar and easy for you to adopt. But from now on it’s also worth thinking about other issues when choosing containers.

The vector is most efficient if:

1. You reserve( ) the correct amount of storage at the beginning so the vector never has to reallocate.

2. You only add and remove elements from the back end.

It is possible to insert and erase elements from the middle of a vector using an iterator, but the following program demonstrates what a bad idea it is:

//: C04:VectorInsertAndErase.cpp
// Erasing an element from a vector
#include "Noisy.h"
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator> // back_inserter, ostream_iterator
using namespace std;

int main() {
  vector<Noisy> v;
  v.reserve(11);
  cout << "11 spaces have been reserved" << endl;
  generate_n(back_inserter(v), 10, NoisyGen());
  ostream_iterator<Noisy> out(cout, " ");
  cout << endl;
  copy(v.begin(), v.end(), out);
  cout << "Inserting an element:" << endl;
  vector<Noisy>::iterator it =
    v.begin() + v.size() / 2; // Middle
  v.insert(it, Noisy());
  cout << endl;
  copy(v.begin(), v.end(), out);
  cout << "\nErasing an element:" << endl;
  // Cannot use the previous value of it:
  it = v.begin() + v.size() / 2;
  v.erase(it);
  cout << endl;
  copy(v.begin(), v.end(), out);
  cout << endl;
} ///:~

When you run the program you’ll see that the call to reserve( ) really does only allocate storage; no constructors are called. The generate_n( ) call is pretty busy: each call to NoisyGen::operator( ) results in a construction, a copy-construction (into the vector) and a destruction of the temporary. But when an object is inserted into the middle of the vector, it must shove everything down to maintain the linear array and, since there is enough space, it does this with the assignment operator (if the argument of reserve( ) were 10 instead of 11, it would have to reallocate storage). When an object is erased from the vector, the assignment operator is once again used to move everything up to cover the place that is being erased (notice that this requires that the assignment operator properly clean up the lvalue). Lastly, the object on the end of the array is destroyed.

You can imagine how enormous the overhead can become when objects are inserted into and removed from the middle of a vector, if the number of elements is large and the objects are complicated. It’s obviously a practice to avoid.

deque

The deque (double-ended-queue, pronounced “deck”) is the basic sequence container optimized for adding and removing elements from either end. It also allows for reasonably fast random access: it has an operator[ ] like vector. However, it does not have vector’s constraint of keeping everything in a single sequential block of memory. Instead, deque uses multiple blocks of sequential storage (keeping track of all the blocks and their order in a mapping structure). For this reason the overhead for a deque to add or remove elements at either end is very low. In addition, it never needs to copy and destroy contained objects during a new storage allocation (as vector does), so it is far more efficient than vector if you are adding an unknown quantity of objects. This means that vector is the best choice only if you have a pretty good idea of how many objects you need. In addition, many of the programs shown earlier in this book that use vector and push_back( ) might be more efficient with a deque.

The interface to deque is only slightly different from that of vector (deque has push_front( ) and pop_front( ) while vector does not, for example), so converting code from using vector to using deque is almost trivial. Consider StringVector.cpp, which can be changed to use deque by replacing the word “vector” with “deque” everywhere. The following program adds parallel deque operations to the vector operations in StringVector.cpp, and performs timing comparisons:

//: C04:StringDeque.cpp
// Converted from StringVector.cpp
#include "../require.h"
#include <string>
#include <deque>
#include <vector>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <ctime>
#include <algorithm> // copy
#include <cstddef>   // size_t
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream in(argv[1]);
  assure(in, argv[1]);
  vector<string> vstrings;
  deque<string> dstrings;
  string line;
  // Time reading into vector:
  clock_t ticks = clock();
  while(getline(in, line))
    vstrings.push_back(line);
  ticks = clock() - ticks;
  cout << "Read into vector: " << ticks << endl;
  // Repeat for deque:
  ifstream in2(argv[1]);
  assure(in2, argv[1]);
  ticks = clock();
  while(getline(in2, line))
    dstrings.push_back(line);
  ticks = clock() - ticks;
  cout << "Read into deque: " << ticks << endl;
  // Now compare indexing:
  ticks = clock();
  for(size_t i = 0; i < vstrings.size(); i++) {
    ostringstream ss;
    ss << i;
    vstrings[i] = ss.str() + ": " + vstrings[i];
  }
  ticks = clock() - ticks;
  cout << "Indexing vector: " << ticks << endl;
  ticks = clock();
  for(size_t j = 0; j < dstrings.size(); j++) {
    ostringstream ss;
    ss << j;
    dstrings[j] = ss.str() + ": " + dstrings[j];
  }
  ticks = clock() - ticks;
  cout << "Indexing deque: " << ticks << endl;
  // Compare iteration:
  ofstream tmp1("tmp1.tmp"), tmp2("tmp2.tmp");
  ticks = clock();
  copy(vstrings.begin(), vstrings.end(),
    ostream_iterator<string>(tmp1, "\n"));
  ticks = clock() - ticks;
  cout << "Iterating vector: " << ticks << endl;
  ticks = clock();
  copy(dstrings.begin(), dstrings.end(),
    ostream_iterator<string>(tmp2, "\n"));
  ticks = clock() - ticks;
  cout << "Iterating deque: " << ticks << endl;
} ///:~

Knowing now what you do about the inefficiency of adding things to vector because of storage reallocation, you may expect dramatic differences between the two. However, on a 1.7-megabyte text file one compiler’s program produced the following (measured in platform- and compiler-specific clock ticks, not seconds):

Read into vector: 8350
Read into deque: 7690
Indexing vector: 2360
Indexing deque: 2480
Iterating vector: 2470
Iterating deque: 2410

A different compiler and platform roughly agreed with this. It’s not so dramatic, is it? This points out some important issues:

1. We (programmers) are typically very bad at guessing where inefficiencies occur in our programs.

2. Efficiency comes from a combination of effects – here, reading the lines in and converting them to strings may dominate over the cost of the vector vs. deque.

3. The string class is probably fairly well-designed in terms of efficiency.

Of course, this doesn’t mean you shouldn’t use a deque rather than a vector when you know that an uncertain number of objects will be pushed onto the end of the container. On the contrary, you should, when you’re tuning for performance. But you should also be aware that performance issues are usually not where you think they are, and the only way to know for sure where your bottlenecks are is by testing. Later in this chapter there will be a more “pure” comparison of performance between vector, deque and list.

In document Thinking in C++ (Page 178-182)