Chapter 6 A Simple 3D Vector Class

6.3 Design Decisions

6.3.13 A Word on Optimization

If you have seen vector classes elsewhere, this vector class may seem very simple in comparison. Many vector classes found on the Internet and in other sources are much more complicated, usually for the sake of “optimization.” There are two reasons to keep our vector class as simple as possible:

• First, this book is intended to teach you how to program 3D math. All unnecessary complexities were removed so that there would be as few impediments to your understanding as possible.

• Second, and more importantly, we have not been convinced that such so-called “optimizations” actually increase execution performance significantly.

Let’s elaborate just a bit more on this second point. A famous programming rule of thumb states, “95% of the time is spent in 5% of the code.” In other words, to speed up execution, you must find the bottleneck and optimize it. Another famous quote related to optimization is, “Premature optimization is the root of all evil.” Optimizing code that isn’t the bottleneck complicates the code without appreciable benefit.

In the video game industry, performance is critical, and there are frequently times when vector processing can be an execution bottleneck. In these cases, optimization is necessary and extremely effective. However, almost invariably the way to get the biggest performance boost is to perform the operations using a special platform-specific processing unit. Unfortunately, idiosyncrasies of the vector processing unit (such as alignment restrictions or lack of a dot product operation) frequently make it impossible to design the vector class to take advantage of the vector processing unit in all situations. A large part of optimizing vector math code is to completely rearrange the code in order to exploit superscalar architecture, or perform an entire batch of operations in parallel on a separate processor. No amount of tweaking of the vector class can accomplish these high-level optimizations. Even when optimizing some vector math in an inner loop, hand-tuned assembly on sequences of vector operations will be much faster than the fastest compiler-generated code could possibly be, no matter how well the vector class is “optimized.”

These observations have caused us to divide vector math code (and code in general) into two categories. The first category contains the majority of code. Not much time is actually spent in this code, and therefore, optimization will not provide huge performance gains. The second category is the minority of code in which optimization is actually effective and often necessary. Rearranging the data structures or writing hand-tuned assembly is almost always significantly faster than compiler-generated code, no matter how well organized the vector class. (Notice that the above discussion applies to low-level optimization at the assembly instruction level and does not necessarily apply to higher-level optimizations, such as using better algorithms.)

The bottom line: if you’re seriously concerned about speed, optimizing your vector class will only speed up code that should be written in assembly. Also, it won’t make it as fast as the assembly would be. It will speed up the code elsewhere, but, unfortunately, not much time is spent in that code, so the gains will be relatively small. In our opinion, such small gains are not worth the added complexities to the vector class.

At the same time, “optimizing” a vector class usually results in considerable complications to the design of the vector class. This extra complexity can take its toll on compile times, your sanity, and (depending on how the optimizations were written) the aesthetic quality of code written using the vector class.

There are two specific optimizations that we should comment on in more detail. The first is fixed-point math, and the second is returning references to temporary variables.

Back in “the old days” (a few years ago), floating-point math was considerably slower than integer math on consumer systems. Most notably, floating-point multiplication was slow. Programmers used fixed-point math in order to circumvent this problem. If you are unfamiliar with fixed-point math, the basic idea behind the technique is that a number is stored with a fixed number of fractional bits. For example, there might be eight fractional bits, which would mean a number would be stored times 256. So the number 3.25 would be stored as 3.25×256 = 832. Except in a few special cases, fixed-point math is an optimization technique of the past. Today’s processors can not only perform floating-point math in the same number of cycles as integer math, but they also have dedicated vector processors to perform floating-point vector math. Use floating-point math in your vector class.
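As a rough illustration of the storage scheme just described, here is a minimal sketch that assumes a 32-bit integer with eight fractional bits. The names Fixed8, kFracBits, toFixed, toFloat, and fixedMul are hypothetical and are not part of our vector class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed-point format with 8 fractional bits:
// the stored integer is the real value times 256.
typedef std::int32_t Fixed8;
const int kFracBits = 8;

// Convert a float to fixed point (truncates toward zero).
inline Fixed8 toFixed(float v) {
    return static_cast<Fixed8>(v * (1 << kFracBits));
}

// Convert a fixed-point value back to float.
inline float toFloat(Fixed8 f) {
    return static_cast<float>(f) / (1 << kFracBits);
}

// Multiply two fixed-point values. The raw product carries 16
// fractional bits, so it is computed in 64 bits and then shifted
// back down by 8. (Addition and subtraction need no adjustment.)
inline Fixed8 fixedMul(Fixed8 a, Fixed8 b) {
    std::int64_t wide = static_cast<std::int64_t>(a) * b;
    return static_cast<Fixed8>(wide >> kFracBits);
}
```

With this layout, toFixed(3.25f) yields 832, matching the arithmetic above. The extra shift after every multiplication is part of the bookkeeping that floating-point hardware makes unnecessary.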

In our vector class, many of the vector operations (such as addition, subtraction, and multiplication by a scalar) have been coded to return an actual object of type Vector3. Depending on the compiler, this can be implemented in a variety of different ways. Returning a class object always results in at least one constructor call (according to the C++ standard). We have tried to code our functions with the constructor call in the actual return statement so that the compiler doesn’t generate any “extra” constructor calls. Unfortunately, it is true that returning a class object can have performance implications.
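The idea of placing the constructor call in the return statement can be sketched as follows. The Vector3 shown here is a minimal stand-in written for this example (with zero-initializing default constructor as an assumption), not the full class.

```cpp
#include <cassert>

// Minimal stand-in for the vector class, just enough to show the idea.
class Vector3 {
public:
    float x, y, z;
    Vector3() : x(0.0f), y(0.0f), z(0.0f) {}
    Vector3(float nx, float ny, float nz) : x(nx), y(ny), z(nz) {}
};

// The constructor call sits directly in the return statement, so the
// compiler can build the result in place rather than copying a local.
inline Vector3 operator+(const Vector3 &a, const Vector3 &b) {
    return Vector3(a.x + b.x, a.y + b.y, a.z + b.z);
}

// Same pattern for multiplication by a scalar.
inline Vector3 operator*(const Vector3 &a, float k) {
    return Vector3(a.x * k, a.y * k, a.z * k);
}
```

Under C++17 and later, a return statement of this form constructs the result directly in the caller's storage, so no copy or move constructor runs. Older compilers are allowed, but not required, to do the same.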

However, beware of a special optimization “trick” that is often used to avoid these constructor calls. The basic idea is to maintain a pool of temporary class objects. Instead of returning an actual class object, the result of the function is computed into a temporary object, and then a reference to this temporary is returned. It usually looks something like this:

    // Maintain a pool of temporary objects
    int nextTemp;
    Vector3 tempList[256];

    // Get a pointer to the next temporary object
    inline Vector3 *nextTempVector() {

        // Advance pointer and loop it
        nextTemp = (nextTemp + 1) & 255;

        // Return pointer to the slot
        return &tempList[nextTemp];
    }

    // Now rather than returning an actual class object,
    // we can return a reference to a temporary. For example,
    // the addition operator could be implemented like this:
    const Vector3 &operator +(const Vector3 &a, const Vector3 &b) {

        // Snag a temp.
        Vector3 *result = nextTempVector();

        // Compute the result.
        result->x = a.x + b.x;
        result->y = a.y + b.y;
        result->z = a.z + b.z;

        // Return reference. No constructor calls!
        return *result;
    }

    // Now we can add and subtract vectors using the same
    // natural syntax, just as before. Vector expressions
    // with multiple additions and subtractions work, provided
    // that no more than 256 temporaries are used in the same
    // expression. (A very reasonable restriction.)
    Vector3 a, b;
    Vector3 c = a + b;

At first glance, this appears to be a great idea. The overhead of maintaining the index variable is usually less than the compiler’s overhead of copy constructors and returning temporary objects. Overall, performance is increased (slightly).

There’s just one problem. Our simple system of using a looping index variable assumes a great deal about the lifetime of temporary objects. In other words, we assume that when we create a temporary object, we won’t need that object again by the time 256 more temporaries have been created. For simple vector expressions, this is usually not a problem. The problem comes when we pass these references into functions. A temporary passed into a function should not expire before the completion of the function. For example:

    // Add an offset to every vector in a list
    // (for example, the vertices of a triangle mesh)
    void bias(
        const Vector3 inputList[],
        int n,
        const Vector3 &offset,
        Vector3 outputList[]
    ) {
        for (int i = 0 ; i < n ; ++i) {
            outputList[i] = inputList[i] + offset;
        }
    }

    // Elsewhere in the code...
    void foo() {
        const int n = 512;
        Vector3 *before = new Vector3[n];
        Vector3 *after = new Vector3[n];

        // … (Compute the bounding box into min and max)
        Vector3 min, max;

        // Let’s recenter the model about its centroid
        // YIKES! But this doesn’t work because our temporary
        // (min + max) / 2.0f gets trampled inside the function!
        bias(before, n, (min + max) / 2.0f, after);

        // ...
    }

Of course, this example is a bit contrived in order to illustrate the problem in as few lines of code as possible, but certainly the problem does arise in practice. No matter how big we make our pool, we are in danger of having a bug, since even the simplest function may result in millions of temporary objects being created. The basic problem is that evaluation of the lifespan of a temporary must be done at compile time (by the compiler), not at run time.
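To watch the failure happen, the pool can be shrunk so that it wraps around after a handful of additions. The sketch below is an assumption-laden demonstration, not production code: the pool holds 4 slots instead of 256, Vector3 is a bare stand-in struct, and kPoolSize and lastBiasedX are hypothetical names introduced for this example.

```cpp
#include <cassert>

// Bare stand-in for the vector class (public x, y, z).
struct Vector3 {
    float x, y, z;
};

// Deliberately tiny pool (4 slots rather than 256) so the index wraps
// quickly. The size must be a power of two for the mask to work.
const int kPoolSize = 4;
int nextTemp = 0;
Vector3 tempList[kPoolSize];

inline Vector3 *nextTempVector() {
    nextTemp = (nextTemp + 1) & (kPoolSize - 1);
    return &tempList[nextTemp];
}

// The "optimized" operator: returns a reference into the pool.
const Vector3 &operator+(const Vector3 &a, const Vector3 &b) {
    Vector3 *result = nextTempVector();
    result->x = a.x + b.x;
    result->y = a.y + b.y;
    result->z = a.z + b.z;
    return *result;
}

// Every iteration consumes one pool slot. Once the index wraps around,
// the slot that offset refers to is overwritten with a sum.
void bias(const Vector3 inputList[], int n,
          const Vector3 &offset, Vector3 outputList[]) {
    for (int i = 0; i < n; ++i) {
        outputList[i] = inputList[i] + offset;
    }
}

// Biases eight copies of (1, 1, 1) by the temporary (1,1,1) + (1,1,1)
// and returns the x component of the last output.
float lastBiasedX() {
    const int n = 8;
    Vector3 before[n];
    Vector3 after[n];
    for (int i = 0; i < n; ++i) {
        before[i].x = before[i].y = before[i].z = 1.0f;
    }
    Vector3 lo = {1.0f, 1.0f, 1.0f};
    Vector3 hi = {1.0f, 1.0f, 1.0f};
    // lo + hi lives in the pool; bias() receives a reference to that slot.
    bias(before, n, lo + hi, after);
    return after[n - 1].x;
}
```

Every output should be (3, 3, 3), yet lastBiasedX() returns 4.0f: on the fourth addition inside bias the pool index wraps onto the slot holding the offset, and every later element is biased by the corrupted value. A 256-slot pool fails in exactly the same way; it just takes a longer loop.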

Bottom line: keep your classes simple. In the very few cases where the overhead of constructor calls and the like is a significant problem, write the code in hand-tuned C++ or assembly. Also, you can write your function to accept a pointer where the return value should be placed, rather than actually returning a class object. However, don’t complicate 100% of the code in order to optimize 2% of it.
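The pointer-based alternative mentioned above might look like the following sketch. The helper name addVectors is hypothetical, and Vector3 is again a bare stand-in struct rather than the full class.

```cpp
#include <cassert>

// Bare stand-in for the vector class (public x, y, z).
struct Vector3 {
    float x, y, z;
};

// Hypothetical helper: writes a + b into *result instead of returning
// a class object. result may alias a or b, because each component is
// read before the matching component is written.
inline void addVectors(const Vector3 &a, const Vector3 &b, Vector3 *result) {
    result->x = a.x + b.x;
    result->y = a.y + b.y;
    result->z = a.z + b.z;
}

// Inner-loop use: no temporaries are constructed, and the caller owns
// every object involved, so nothing can be trampled behind its back.
void bias(const Vector3 inputList[], int n,
          const Vector3 &offset, Vector3 outputList[]) {
    for (int i = 0; i < n; ++i) {
        addVectors(inputList[i], offset, &outputList[i]);
    }
}
```

The call syntax is less natural than a + b, which is why it makes sense to reserve this form for the few loops where profiling shows the constructor overhead matters.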

Chapter 7 Introduction to Matrices

Matrices are of fundamental importance in 3D math, where they are primarily used to describe the relationship between two coordinate spaces. They do this by defining a computation to transform vectors from one coordinate space to another.

This chapter introduces the theory and application of matrices. It is divided into two main sections.

• Section 7.1 discusses some of the basic properties and operations of matrices strictly from a mathematical perspective. (More matrix operations are discussed in Chapter 9.)

7.1 Matrix — A Mathematical Definition

In linear algebra, a matrix is a rectangular grid of numbers arranged into rows and columns. Recalling our earlier definition of vector as a one-dimensional array of numbers, a matrix may likewise be defined as a two-dimensional array of numbers. (The two in “two-dimensional array” comes from the fact that there are rows and columns, and it should not be confused with 2D vectors or matrices.) A vector is an array of scalars, and a matrix is an array of vectors.

7.1.1 Matrix Dimensions and Notation

Just as we defined the dimension of a vector by counting how many numbers it contained, we will define the size of a matrix by counting how many rows and columns it contains. An r×c matrix (read “r by c”) has r rows and c columns. Here is an example of a 4×3 matrix:
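For illustration, a 4×3 matrix (four rows, three columns) might look like the following. The entries are arbitrary and were chosen only to show the layout:

$$
\begin{bmatrix}
4 & 0 & 12 \\
-5 & 2 & 3 \\
12 & -1 & -1 \\
7 & 18 & 0
\end{bmatrix}
$$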

This 4×3 matrix illustrates the standard notation for writing matrices. We arrange the numbers in a grid, surrounded by square brackets. Some authors surround the grid of numbers with parentheses rather than brackets, and others use straight vertical lines. We will reserve the vertical-line notation for an entirely separate concept related to matrices, the determinant of a matrix. (We will discuss determinants in Section 9.1.)

As we mentioned in Section 5.2, we will represent a matrix variable with uppercase letters in boldface, for example: M, A, R. When we wish to refer to the individual elements within a matrix, we use subscript notation, usually with the corresponding lowercase letter in italics. This is shown below for a 3×3 matrix:
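Written out with this subscript convention, the elements of a 3×3 matrix M are:

$$
\mathbf{M} =
\begin{bmatrix}
m_{11} & m_{12} & m_{13} \\
m_{21} & m_{22} & m_{23} \\
m_{31} & m_{32} & m_{33}
\end{bmatrix}
$$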

$m_{ij}$ denotes the element in M at row i and column j. Matrices use 1-based indices, so the first row and column are numbered one. For example, $m_{12}$ (read “m one two,” not “m twelve”) is the element in the first row, second column. Notice that this is different from the C programming language, which uses 0-based array indices. A matrix does not have a column 0 or row 0. This difference in indexing can cause some confusion if using actual C arrays to define matrices. (This is one reason we won’t use arrays to define matrices in our code.)
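One way to sidestep the indexing mismatch is to name the members after the mathematical subscripts. The sketch below is only an illustration of that idea; Matrix3x3 and elementAt are hypothetical names chosen for this example.

```cpp
#include <cassert>

// Hypothetical 3x3 matrix whose members are named after the 1-based
// mathematical subscripts, so m12 in code is the element written
// m_12 on paper (first row, second column).
struct Matrix3x3 {
    float m11, m12, m13;
    float m21, m22, m23;
    float m31, m32, m33;
};

// With a raw C array, every index must be shifted down by one:
// the element at row i, column j (1-based) lives at a[i - 1][j - 1].
inline float elementAt(const float a[3][3], int i, int j) {
    return a[i - 1][j - 1];
}
```

With named members there is no index arithmetic to get wrong. With the raw array, forgetting the shift silently reads the wrong element (or reads past the end of the array).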

7.1.2 Square Matrices

Matrices with the same number of rows as columns are called square matrices and are of particular importance. In this book, we will be interested in 2×2, 3×3, and 4×4 matrices.

The diagonal elements of a square matrix are those elements where the row and column index are the same. For example, the diagonal elements of the 3×3 matrix M are $m_{11}$, $m_{22}$, and $m_{33}$. The other elements are non-diagonal elements. The diagonal elements form the diagonal of the matrix:
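Marking the diagonal elements of the 3×3 matrix M with boxes makes the diagonal easy to see:

$$
\mathbf{M} =
\begin{bmatrix}
\boxed{m_{11}} & m_{12} & m_{13} \\
m_{21} & \boxed{m_{22}} & m_{23} \\
m_{31} & m_{32} & \boxed{m_{33}}
\end{bmatrix}
$$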

If all non-diagonal elements in a matrix are zero, then the matrix is a diagonal matrix. For example:
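One such matrix, with diagonal entries chosen arbitrarily for illustration, is:

$$
\begin{bmatrix}
3 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & -5
\end{bmatrix}
$$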

In document [Wordware] 3D Math Primer for Graphics and Game Development (Page 93-99)