Introduction to Generic Data Structures and Standard Template Library (STL)

11.1 INTRODUCTION AND OBJECTIVES

Chapter 10 introduced the template mechanism in C++ and we created a number of simple template classes to show how the syntax is used. In this chapter we give an overview of the Standard Template Library (STL). STL is a library of C++ templates for commonly occurring data structures (such as lists and vectors), algorithms (for example, sorting, searching and extracting information) and functionality for navigating in data structures. It is part of the ISO C++ standard and is not specific to a particular vendor. Thus, code that you write using STL with a C++ compiler from vendor A should also compile and run with a standard-conforming C++ compiler from vendor B. Furthermore, the components in STL have been designed and implemented with performance in mind. There is little point in writing your own components to compete with STL; a better idea is to use STL as the basis for your own financial engineering applications.

STL components can be used as foundation classes. These are the data structures that one sees in computer science and algebra books, for example:

• Lists and vectors (linear sequences of data)
• Maps and multimaps (key-value pairs, like telephone books)
• Sets and multisets
• Specially adapted containers such as stacks and queues

You can use these containers in a myriad of ways in your applications. For example, they can be:

• Used as member data in your classes
• Used to hold heterogeneous data (for example, by storing pointers to a common base class)
• Nested inside one another
• Instantiated with your own specific data types and used directly in applications

Furthermore, STL has many powerful algorithms that allow you to perform operations on containers and collections of data:

• Inserting data into a container
• Searching for data in a container
• Replacing data in a container
• Merging and combining containers
• Sorting a container
• Set-like operations (set union, intersection and difference)

Finally, STL iterators allow us to ‘bind’ containers and algorithms. Iterators are similar to traditional pointers in C, but their added value is that they permit access to the data in a container without the client having to know how the container is implemented. In this sense we view an iterator as a kind of Mediator (in the sense of Gamma et al., 1995).


In this chapter we give an overview of one subset of STL functionality. In particular, we discuss the list<T> container and how to use algorithms and iterators in conjunction with it.

After having read this chapter you will have gained an understanding of the following fundamental issues:

• Coming to terms with STL syntax
• Representative STL functionality and how to use it in simple applications
• Creating data structures that will be useful in option pricing applications

The following chapters expand on these issues. A more detailed discussion of template programming, STL and applications to QF can be found in Duffy (2004).

11.2 COMPLEXITY ANALYSIS

Before we discuss linear and nonlinear data structures in detail, we introduce a number of concepts that have to do with the efficiency of algorithms acting on data structures. This is a neglected topic in much of the modern literature but it is important to know what the time and space efficiency issues will be before we choose a certain data structure for use in an application. Conversely, assuming we know what our efficiency requirements are, how do we choose the ‘optimal’ data structure that fits the bill as it were?

We need some measure of the cost of an algorithm and an indication of what the total effort will be. To this end, we use so-called logical units that express a relationship between the size n of a data container and the amount of time t required to process the data. In general, it is difficult to find an exact, analytical formula for this relationship and we must then resort to approximate techniques. In many cases, however, the approximate formula is sufficiently close to the exact formula, especially for an algorithm that processes large amounts of data, that is, when n becomes very large. This measure of efficiency is called asymptotic complexity; it is obtained by disregarding those terms of the function expressing the efficiency of an algorithm that become insignificant as n grows. For example, consider the function:

$$ f(n) = n^2 + 100n + \log n + 1000 \qquad (11.1) $$

For small values of n the fourth term (the constant 1000) is the largest. When n reaches the value 100, however, the first and second terms are equal (both are 10,000) and vie for first place. When n becomes even larger, the first term in (11.1) predominates and we say that the response time is quadratic.

We now define a notation that specifies asymptotic complexity, the so-called ‘big-O’ notation.

Definition 1: A function f(n) is O(g(n)) (where g(n) is a given function) if there exist positive numbers c and N such that:

$$ f(n) \le c\,g(n) \quad \text{for all } n \ge N \qquad (11.2) $$

This inequality states that c g(n) is an upper bound for f(n) once n ≥ N; alternatively, we can say that f grows at most as fast as g in the long term. In general, inequality (11.2) is an existence result only and it does not tell us how to calculate c and N.

The logarithmic function is one of the most important functions when evaluating the efficiency of algorithms. In general, an algorithm is considered to be good if its complexity is of the order of the logarithmic function for large n. Other common complexity classes are listed in Table 11.1.

Table 11.1 Classes of algorithms

Class          Complexity
Constant       O(1)
Logarithmic    O(log n)
Linear         O(n)
n log n        O(n log n)
Quadratic      O(n²)
Cubic          O(n³)
Exponential    O(2ⁿ)

We now introduce notation that gives lower bounds for the complexity of an algorithm, in contrast to the big-O notation, which gives upper bounds.

Definition 2: The function f(n) is Ω(g(n)) (where g(n) is a given function) if there exist positive numbers c and N such that:

$$ f(n) \ge c\,g(n) \quad \text{for all } n \ge N \qquad (11.3) $$

The only difference between (11.2) and (11.3) is the direction of the inequality.
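As a worked example based on equation (11.1), a lower bound for f can be read off directly because every term is non-negative for n ≥ 1 (assuming the logarithm has a base greater than 1, so that log 1 = 0):

```latex
f(n) = n^2 + 100n + \log n + 1000 \;\ge\; n^2
\quad \text{for all } n \ge 1,
\qquad \text{because } 100n > 0,\; \log n \ge 0,\; 1000 > 0 .
```

So (11.3) holds with g(n) = n², c = 1 and N = 1, that is, f(n) is Ω(n²). Since f(n) is also O(n²), its growth is exactly quadratic.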

11.2.1 Examples of complexities

An algorithm is called constant if its execution time is independent of the number of elements in the data container; we denote this by O(1). Similarly, an algorithm is called logarithmic if its execution time is O(log n). In general, we would like to determine the execution time in milliseconds for a given value of n, but this result is machine-dependent.

An example is displayed in Table 11.2 for the case n = 10,000 on a CPU that executes one million operations per second. Thus, if you know the complexity of an algorithm, you can get a rough idea of how long it will take to execute.

This concludes our example. Actually finding the asymptotic complexity for an algorithm is outside the scope of this book and we content ourselves with giving the estimates that we use, in particular in conjunction with the STL.

Table 11.2 Execution times for n = 10⁴ (1 second = 10⁶ μsec = 10³ msec)

Class          Number of operations    Execution time
Constant       1                       1 μsec
Logarithmic    13.3                    13 μsec
Linear         10⁴                     10 msec
n log n        133 × 10³               133 msec
Quadratic      10⁸                     1.7 min
Cubic          10¹²                    11.6 days
Exponential    10³⁰¹⁰                  ‘long’ (forever)

Logarithms are taken to base 2.

Before we go on to the next section, we finish with a couple of concepts that you come across when dealing with complexity classes. We define the class P to consist of those decision problems that can be solved on a deterministic sequential machine in an amount of time that is a polynomial function of the size of the input. The class NP (non-deterministic polynomial time) consists of all those decision problems whose positive solutions can be verified in polynomial time given the right information, or equivalently, whose solution can be found in polynomial time on a non-deterministic machine. In complexity theory, the NP-complete problems are the most difficult problems in NP, in the sense that they are the ones most likely not to be in P. An example of an NP-complete problem is the subset sum problem: given a finite set of integers, determine whether any non-empty subset of them sums to zero.

11.3 AN INTRODUCTION TO DATA STRUCTURES

Before we discuss the programming details of the STL we think that it is a good idea to introduce data structures from a Computer Science perspective. A good understanding of the theoretical underpinnings will help you find which STL functionality you need for your QF applications.

11.3.1 Lists

A list is an example of a sequential container. This means that its elements are arranged in a linear sequence (although, in a linked list, not necessarily in contiguous memory). We distinguish between several kinds of linked list:

• Singly linked list
• Doubly linked list (supported in STL)
• Circular list

In document Introduction to C++ for Financial Engineers An Object-Oriented Approach (The Wil.pdf (Page 186-189)