Chapter 9 More on Matrices
9.3 Orthogonal Matrices
9.3.3 Orthogonalizing a Matrix
Sometimes we encounter a matrix that is slightly out of orthogonality. We may have acquired bad data from an external source, or we may have accumulated floating-point error (which is called “matrix creep”). In these situations, we would like to orthogonalize the matrix, resulting in a matrix that has mutually perpendicular unit vector axes and is (hopefully) as close to the original matrix as possible.
The standard algorithm for constructing a set of orthogonal basis vectors (the rows of a matrix) is known as Gram-Schmidt orthogonalization. The basic idea is to go through the rows of the matrix in order. For each row vector, we subtract off the portion of that vector that is parallel to the preceding rows, which must result in a perpendicular vector.
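Let’s look at the 3×3 case as an example. As before, let r1, r2, and r3 stand for the rows of a 3×3 matrix M. Then an orthogonal set of row vectors, r'1, r'2, and r'3, can be computed according to the following algorithm:

$$
\begin{aligned}
\mathbf{r}'_1 &= \mathbf{r}_1 \\
\mathbf{r}'_2 &= \mathbf{r}_2 - \frac{\mathbf{r}_2 \cdot \mathbf{r}'_1}{\mathbf{r}'_1 \cdot \mathbf{r}'_1}\,\mathbf{r}'_1 \\
\mathbf{r}'_3 &= \mathbf{r}_3 - \frac{\mathbf{r}_3 \cdot \mathbf{r}'_1}{\mathbf{r}'_1 \cdot \mathbf{r}'_1}\,\mathbf{r}'_1 - \frac{\mathbf{r}_3 \cdot \mathbf{r}'_2}{\mathbf{r}'_2 \cdot \mathbf{r}'_2}\,\mathbf{r}'_2
\end{aligned}
$$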
The vectors r'1, r'2, and r'3 are now mutually perpendicular, and so they are an orthogonal basis. However, they may not necessarily be unit vectors. We need an orthonormal basis to form an orthogonal matrix, and so we must normalize the vectors. (Again, the terminology can be confusing. Please see the note at the end of the previous section.) Notice that if we normalize the vectors as we go, rather than in a second pass, then we can avoid all of the divisions in the projection terms, since each denominator r'i · r'i is then equal to one.
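The procedure described above, with normalization done as we go, can be sketched in a few lines of Python. This is a minimal illustration rather than a production routine: the function names (`dot`, `normalize`, `gram_schmidt_rows`) are made up for this example, matrices are plain lists of rows, and the input rows are assumed to be linearly independent so that no row collapses to zero length.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Return v scaled to unit length (assumes v is not the zero vector)."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def gram_schmidt_rows(m):
    """Orthonormalize the rows of m in order, normalizing as we go.

    Every row already in `basis` has unit length, so the projection of r
    onto such a row q is simply dot(r, q) * q, with no division by dot(q, q).
    """
    basis = []
    for r in m:
        adjusted = list(r)
        for q in basis:
            scale = dot(r, q)  # component of the original row parallel to q
            adjusted = [a - scale * y for a, y in zip(adjusted, q)]
        basis.append(normalize(adjusted))
    return basis
```

Passing in a matrix that has drifted slightly away from orthogonality returns rows that are mutually perpendicular unit vectors to within floating-point precision. The first row keeps its direction and is only rescaled.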
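The Gram-Schmidt algorithm is biased, depending on the order in which the basis vectors are listed. As an obvious example, r1 never changes. A variation on the algorithm that is not biased toward any particular axis is to not attempt to completely orthogonalize the entire matrix in one pass. We select some small fraction k, and instead of subtracting off all of the projection, we only subtract off k of it. We also subtract the projection on the original axis, not the adjusted one. In this way, the order in which we perform the operations does not matter and we have no dimensional bias. This algorithm is summarized below:

$$
\begin{aligned}
\mathbf{r}'_1 &= \mathbf{r}_1 - k\,\frac{\mathbf{r}_1 \cdot \mathbf{r}_2}{\mathbf{r}_2 \cdot \mathbf{r}_2}\,\mathbf{r}_2 - k\,\frac{\mathbf{r}_1 \cdot \mathbf{r}_3}{\mathbf{r}_3 \cdot \mathbf{r}_3}\,\mathbf{r}_3 \\
\mathbf{r}'_2 &= \mathbf{r}_2 - k\,\frac{\mathbf{r}_2 \cdot \mathbf{r}_1}{\mathbf{r}_1 \cdot \mathbf{r}_1}\,\mathbf{r}_1 - k\,\frac{\mathbf{r}_2 \cdot \mathbf{r}_3}{\mathbf{r}_3 \cdot \mathbf{r}_3}\,\mathbf{r}_3 \\
\mathbf{r}'_3 &= \mathbf{r}_3 - k\,\frac{\mathbf{r}_3 \cdot \mathbf{r}_1}{\mathbf{r}_1 \cdot \mathbf{r}_1}\,\mathbf{r}_1 - k\,\frac{\mathbf{r}_3 \cdot \mathbf{r}_2}{\mathbf{r}_2 \cdot \mathbf{r}_2}\,\mathbf{r}_2
\end{aligned}
$$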
One iteration of this algorithm results in a set of basis vectors that are slightly more orthogonal than the original vectors, but perhaps not completely orthogonal. By repeating this procedure multiple times, we can eventually converge on an orthogonal basis. Selecting an appropriately small value for k (say, ¼) and iterating a sufficient number of times (say, ten) gets us fairly close. Then, we can use the standard Gram-Schmidt algorithm to guarantee a perfectly orthogonal basis.
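As a rough illustration of the unbiased variation, the sketch below applies the partial subtraction to every row, always projecting onto the original (unadjusted) rows of the current pass. The names `relax_step`, `relax`, and `max_off_diagonal` are hypothetical helpers introduced only for this example, and the defaults k = ¼ and ten iterations are simply the values suggested in the text.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def relax_step(rows, k=0.25):
    """One unbiased pass: from each row, subtract the fraction k of its
    projection onto every other row. All projections use the rows as they
    were at the start of the pass, so the processing order does not matter."""
    result = []
    for i, r in enumerate(rows):
        adjusted = list(r)
        for j, other in enumerate(rows):
            if i == j:
                continue
            scale = k * dot(r, other) / dot(other, other)
            adjusted = [a - scale * o for a, o in zip(adjusted, other)]
        result.append(adjusted)
    return result

def relax(rows, k=0.25, iterations=10):
    """Repeat relax_step a fixed number of times."""
    for _ in range(iterations):
        rows = relax_step(rows, k)
    return rows

def max_off_diagonal(rows):
    """Largest |r_i . r_j| over distinct rows; zero for an orthogonal set."""
    n = len(rows)
    return max(abs(dot(rows[i], rows[j]))
               for i in range(n) for j in range(i + 1, n))
```

Comparing `max_off_diagonal` before and after a call to `relax` shows how far the rows have moved toward mutual perpendicularity. The result is not normalized and is only approximately orthogonal, so a final standard Gram-Schmidt pass is still needed.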
9.4 4×4 Homogenous Matrices
Up until now, we have used only 2D and 3D vectors. In this section, we will introduce 4D vectors and the so-called “homogenous” coordinate. There is nothing mysterious or magical about 4D vectors and matrices (and no, the fourth coordinate in this case isn’t “time”). As we will see, 4D vectors and 4×4 matrices are nothing more than a notational convenience for what are simple 3D operations.
Equation 9.9: Gram-Schmidt orthogonalization of 3D basis vectors
9.4.1 4D Homogenous Space
As was mentioned in Section 4.1.3, 4D vectors have four components, with the first three components being the standard x, y, and z components. The fourth component in a 4D vector is w (because they ran out of letters in the alphabet!) and is sometimes referred to as the homogenous coordinate.
To understand how the standard physical 3D space is extended into 4D, let’s first examine homogenous coordinates in 2D, which are of the form (x, y, w). Imagine the standard 2D plane existing in 3D at the plane w = 1. So the physical 2D point (x, y) is represented in homogenous space as (x, y, 1). For any point that is not in the plane w = 1 (and for which w ≠ 0), we can compute the corresponding 2D point by projecting the point onto the plane w = 1, which is done by dividing by w. So the homogenous coordinate (x, y, w) is mapped to the physical 2D point (x/w, y/w). This is shown in Figure 9.2.
Thus, for any given physical 2D point (x, y), there are an infinite number of corresponding points in homogenous space. All of them are of the form (kx, ky, k), provided that k ≠ 0. These points form a line through the homogenous origin.
When w = 0 the division is undefined and there is no corresponding physical point in 2D space. However, we can interpret a 2D homogenous point of the form (x, y, 0) as a “point at infinity,” which defines a direction rather than a location. There is more on this in the next section.
The same basic idea applies to 4D coordinates. The physical 3D points can be thought of as living in the “plane” in 4D at w = 1. A 4D point is of the form (x, y, z, w), and we project a 4D point onto this “plane” to yield the corresponding physical 3D point (x/w, y/w, z/w). When w = 0 the 4D point represents a “point at infinity,” which defines a direction rather than a location.
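The projection by division works the same way in 2D and 3D, so it can be illustrated with a single small function. This is a sketch under a few assumptions: `to_physical` is a name invented for this example, points are plain tuples with w stored last, and returning `None` for w = 0 is just one possible way to signal a point at infinity.

```python
def to_physical(p):
    """Project a homogenous point (x, y, w) or (x, y, z, w) onto w = 1.

    Returns the physical point with w dropped, or None when w == 0,
    since a point at infinity has no physical location.
    """
    *coords, w = p
    if w == 0:
        return None
    return tuple(c / w for c in coords)

# Every (kx, ky, k) with k != 0 lands on the same physical 2D point.
for k in (1, 2, -4, 0.5):
    print(k, to_physical((1.5 * k, -2 * k, k)))

# The same function handles 4D points and points at infinity.
print(to_physical((3, 6, 9, 3)))
print(to_physical((1, 2, 3, 0)))
```

All four iterations of the loop produce the same physical point, which is the "line through the homogenous origin" idea in code form.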
Homogenous coordinates and projection by division by w are interesting, but why on earth would we want to use 4D space? There are two primary reasons for using 4D vectors and 4×4 matrices. The first reason, which we will discuss in the next section, is actually nothing more than a notational convenience.
Figure 9.2: Projecting homogenous coordinates onto the plane w = 1 in 2D
9.4.2 4×4 Translation Matrices
Recall from Section 8.8.1 that a 3×3 transformation matrix represents a linear transformation that does not contain translation. Due to the nature of matrix multiplication, the zero vector is always transformed into the zero vector, and, therefore, any transformation that can be represented by a matrix multiplication cannot contain translation. This is unfortunate, since matrix multiplication and inversion are very convenient tools for composing complicated transformations out of simple ones and manipulating nested coordinate space relationships. It would be nice if we could find a way to somehow extend the standard 3×3 transformation matrix to be able to handle transformations with translation. 4×4 matrices provide a mathematical “kludge” that allows us to do this.
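Assume for the moment that w is always one. Thus, the standard 3D vector [x, y, z] will always be represented in 4D as [x, y, z, 1]. Any 3×3 transformation matrix can be represented in 4D as shown below:

$$
\begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{bmatrix}
\Longrightarrow
\begin{bmatrix} m_{11} & m_{12} & m_{13} & 0 \\ m_{21} & m_{22} & m_{23} & 0 \\ m_{31} & m_{32} & m_{33} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
$$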
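When we multiply a 4D vector of the form [x, y, z, 1] by a matrix of the form shown above, we get the same result as the standard 3×3 case, only the result in this case is a 4D vector with w = 1:

$$
\begin{bmatrix} x & y & z & 1 \end{bmatrix}
\begin{bmatrix} m_{11} & m_{12} & m_{13} & 0 \\ m_{21} & m_{22} & m_{23} & 0 \\ m_{31} & m_{32} & m_{33} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} xm_{11} + ym_{21} + zm_{31} & xm_{12} + ym_{22} + zm_{32} & xm_{13} + ym_{23} + zm_{33} & 1 \end{bmatrix}
$$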
Now for the interesting part. In 4D, we can also express translation as a matrix multiplication, something we were unable to do in 3D:

$$
\begin{bmatrix} x & y & z & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \Delta x & \Delta y & \Delta z & 1 \end{bmatrix}
=
\begin{bmatrix} x + \Delta x & y + \Delta y & z + \Delta z & 1 \end{bmatrix}
$$

Equation 9.10: Using a 4×4 matrix to perform translation in 3D

It is important to understand that matrix multiplication is still a linear transformation, even in 4D. Matrix multiplication cannot represent “translation” in 4D, and the 4D zero vector will always be transformed back into the 4D zero vector. The reason this trick works to translate points in 3D is that we are actually shearing 4D space. (Compare Equation 9.10 with the shear matrices from Section 8.6.) The “plane” in 4D that corresponds to physical 3D space does not pass through the origin in 4D. Thus, when we shear 4D space, we are able to translate in 3D.
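The shearing behavior is easy to see numerically. The following sketch uses the row-vector convention of this chapter; `vec_mat` and `translation` are hypothetical helper names, and the offsets (4, 5, 6) are arbitrary example values.

```python
def vec_mat(v, m):
    """Row vector times matrix: result[j] = sum over i of v[i] * m[i][j]."""
    return [sum(v[i] * m[i][j] for i in range(len(v)))
            for j in range(len(m[0]))]

def translation(dx, dy, dz):
    """4x4 translation matrix for row vectors (offsets in the bottom row)."""
    return [
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [dx, dy, dz, 1],
    ]

T = translation(4, 5, 6)

print(vec_mat([1, 2, 3, 1], T))  # [5, 7, 9, 1]: a point in the w = 1 "plane" is translated
print(vec_mat([0, 0, 0, 0], T))  # [0, 0, 0, 0]: the 4D zero vector stays put
print(vec_mat([1, 2, 3, 2], T))  # [9, 12, 15, 2]: the offset scales with w, which is a shear
```

In the last case the shift applied to x, y, and z is twice as large because w = 2. After dividing by w, the result is (4.5, 6, 7.5), which is the physical point (0.5, 1, 1.5) moved by (4, 5, 6).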
Let’s examine what happens when we perform a transformation without translation followed by a transformation with only translation. Let R be a rotation matrix. (In fact, R could possibly contain other 3D linear transformations, but for now, let’s assume R only contains rotation.) Let T be a translation matrix of the form in Equation 9.10.
We could then rotate and translate a point v to compute a new point v' as follows:

$$
\mathbf{v}' = \mathbf{v}\mathbf{R}\mathbf{T}
$$
Remember that the order of transformations is important, and since we have chosen to use row vectors, the order of transformations coincides with the order in which the matrices are multiplied (from left to right). We are rotating first and then translating.
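Just as with 3×3 matrices, we can concatenate the two matrices into a single transformation matrix, which we’ll assign to the matrix M:

$$
\mathbf{M} = \mathbf{R}\mathbf{T}, \qquad
\mathbf{v}' = \mathbf{v}\mathbf{R}\mathbf{T} = \mathbf{v}(\mathbf{R}\mathbf{T}) = \mathbf{v}\mathbf{M}
$$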
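Let’s examine the contents of M:

$$
\mathbf{M} = \mathbf{R}\mathbf{T} =
\begin{bmatrix} r_{11} & r_{12} & r_{13} & 0 \\ r_{21} & r_{22} & r_{23} & 0 \\ r_{31} & r_{32} & r_{33} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \Delta x & \Delta y & \Delta z & 1 \end{bmatrix}
=
\begin{bmatrix} r_{11} & r_{12} & r_{13} & 0 \\ r_{21} & r_{22} & r_{23} & 0 \\ r_{31} & r_{32} & r_{33} & 0 \\ \Delta x & \Delta y & \Delta z & 1 \end{bmatrix}
$$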
Notice that the upper 3×3 portion of M contains the rotation portion, and the bottom row contains the translation portion. The rightmost column (for now) will be [0, 0, 0, 1]^T. Applying this information in reverse, we can take any 4×4 matrix and separate it into a linear transformation portion and a translation portion. We can express this succinctly by assigning the translation vector [Δx, Δy, Δz] to the vector t:

$$
\mathbf{M} = \begin{bmatrix} \mathbf{R} & \mathbf{0} \\ \mathbf{t} & 1 \end{bmatrix}
$$
Note: For the moment, we are assuming that the rightmost column is always [0, 0, 0, 1]^T. We will begin to encounter situations where this is not the case in Section 9.4.4.
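To make the concatenation and the reverse decomposition concrete, here is a small sketch in the row-vector convention. The helper names (`mat_mul`, `vec_mat`, `split_affine`) are invented for this example, R is a hand-written 90-degree rotation about the z-axis chosen so that all values stay exact, and T translates by the arbitrary offsets (4, 5, 6).

```python
def mat_mul(a, b):
    """Matrix product a * b for matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v)))
            for j in range(len(m[0]))]

def split_affine(m):
    """Separate a 4x4 matrix whose right column is [0, 0, 0, 1]^T into its
    upper 3x3 linear portion and its translation row t."""
    linear = [row[:3] for row in m[:3]]
    t = m[3][:3]
    return linear, t

# Rotation by 90 degrees about z, extended into 4D (row-vector convention).
R = [[0, 1, 0, 0],
     [-1, 0, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]

# Translation by (4, 5, 6).
T = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [4, 5, 6, 1]]

M = mat_mul(R, T)  # rotate first, then translate

v = [1, 2, 3, 1]
print(vec_mat(vec_mat(v, R), T))  # [2, 6, 9, 1]
print(vec_mat(v, M))              # [2, 6, 9, 1], same result with one multiply
print(split_affine(M))            # ([[0, 1, 0], [-1, 0, 0], [0, 0, 1]], [4, 5, 6])
```

Multiplying in the other order, `mat_mul(T, R)`, gives a different bottom row, because the translation would then be rotated as well.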
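Let’s see what happens with the so-called “points at infinity” when w = 0. Multiplying by a “standard” 3×3 linear transformation matrix extended into 4D (a transformation that does not contain translation), we get:

$$
\begin{bmatrix} x & y & z & 0 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & 0 \\ r_{21} & r_{22} & r_{23} & 0 \\ r_{31} & r_{32} & r_{33} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} xr_{11} + yr_{21} + zr_{31} & xr_{12} + yr_{22} + zr_{32} & xr_{13} + yr_{23} + zr_{33} & 0 \end{bmatrix}
$$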
In other words, when we transform a point-at-infinity vector of the form [x, y, z, 0] by a transformation matrix containing rotation, scale, etc., the expected transformation occurs. The result is another point-at-infinity vector of the form [x', y', z', 0].
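When we transform a point-at-infinity vector by a transformation that does contain translation, we get the following result:

$$
\begin{bmatrix} x & y & z & 0 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & 0 \\ r_{21} & r_{22} & r_{23} & 0 \\ r_{31} & r_{32} & r_{33} & 0 \\ \Delta x & \Delta y & \Delta z & 1 \end{bmatrix}
=
\begin{bmatrix} xr_{11} + yr_{21} + zr_{31} & xr_{12} + yr_{22} + zr_{32} & xr_{13} + yr_{23} + zr_{33} & 0 \end{bmatrix}
$$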
Notice that the result is the same (i.e., no translation occurs). In other words, the w component of a 4D vector can be used to selectively “switch off” the translation portion of a 4×4 matrix. This is useful because some vectors represent “locations” and should be translated, and other vectors represent “directions,” such as surface normals, and should not be translated. In a geometric sense, we can think of the first type of data as “points” and the second type of data as “vectors.”
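The “switch” can be demonstrated by pushing the same three numbers through one matrix twice, once with w = 1 and once with w = 0. In this sketch, `transform_point` and `transform_direction` are hypothetical convenience wrappers, and M is an example matrix that combines a 90-degree rotation about z with a translation by (4, 5, 6).

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v)))
            for j in range(len(m[0]))]

def transform_point(p, m):
    """Treat p as a location: append w = 1 so translation applies."""
    return vec_mat(list(p) + [1], m)[:3]

def transform_direction(d, m):
    """Treat d as a direction: append w = 0 so translation is switched off."""
    return vec_mat(list(d) + [0], m)[:3]

# Rotation by 90 degrees about z in the upper 3x3, translation in the bottom row.
M = [[0, 1, 0, 0],
     [-1, 0, 0, 0],
     [0, 0, 1, 0],
     [4, 5, 6, 1]]

print(transform_point([1, 2, 3], M))      # [2, 6, 9]: rotated, then translated
print(transform_direction([1, 2, 3], M))  # [-2, 1, 3]: rotated only
```

The two results differ by exactly the translation (4, 5, 6), which is what the w component controls.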
One reason why 4×4 matrices are useful is that a 4×4 transformation matrix can contain translation. When we use 4×4 matrices solely for this purpose, the rightmost column of the matrix will always be [0, 0, 0, 1]^T. Since this is the case, why don’t we just drop the column and use a 4×3 matrix? According to linear algebra rules, 4×3 matrices are undesirable for several reasons:
• We cannot multiply a 4×3 matrix by another 4×3 matrix.
• We cannot invert a 4×3 matrix, since the matrix is not square.
• When we multiply a 4D vector by a 4×3 matrix, the result is a 3D vector.
Strict adherence to linear algebra rules forces us to add the fourth column. Of course, in our code, we are not bound by linear algebra rules. In Section 11.5, we will write a 4×3 matrix class that is useful for representing transformations that contain translation. This matrix class doesn’t explicitly store the fourth column.
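To give a feel for how code can sidestep the fourth column, here is a rough sketch that stores only four rows of three values and treats the missing column as an implied [0, 0, 0, 1]^T. It is not the class from Section 11.5; the function names `compose` and `transform_point` and the list-of-rows layout are assumptions made purely for illustration.

```python
def compose(a, b):
    """Concatenate two 4x3 transforms (apply a first, then b).

    Rows 0-2 hold the linear portion and row 3 holds the translation.
    The result matches the full 4x4 product with the implied fourth
    column [0, 0, 0, 1]^T left out.
    """
    result = []
    for i in range(4):
        row = [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        if i == 3:
            # The implied 1 in a's bottom-right corner picks up b's translation.
            row = [row[j] + b[3][j] for j in range(3)]
        result.append(row)
    return result

def transform_point(p, m):
    """Transform a 3D point by a 4x3 transform, with an implied w = 1."""
    return [sum(p[k] * m[k][j] for k in range(3)) + m[3][j] for j in range(3)]

# Rotation by 90 degrees about z, with no translation.
R = [[0, 1, 0], [-1, 0, 0], [0, 0, 1], [0, 0, 0]]
# Translation by (4, 5, 6), with an identity linear portion.
T = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [4, 5, 6]]

M = compose(R, T)
print(M)                              # [[0, 1, 0], [-1, 0, 0], [0, 0, 1], [4, 5, 6]]
print(transform_point([1, 2, 3], M))  # [2, 6, 9]
```

Two 4×3 arrays cannot be multiplied under the strict rules, yet `compose` produces the same twelve numbers that the full 4×4 product would, so nothing is lost by leaving the constant column out of storage.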