The year 1965 was a good one for the discovery of important algorithms. In that year, Cooley and Tukey [101] introduced the fast Fourier transform (FFT) algorithm and Golub and Kahan [149] their method for calculating the singular-value decomposition (SVD).
We have just seen that an $N$ by $N$ Hermitian matrix $H$ can be written in terms of its eigenvalues and eigenvectors as $H = ULU^\dagger$ or as
$$H = \sum_{n=1}^{N} \lambda_n u^n (u^n)^\dagger.$$
The singular value decomposition (SVD) is a similar result that applies to any rectangular matrix $A$. It is an important tool in image compression and pseudo-inversion.
5.5.1 Defining the SVD
Let $A$ be any $M$ by $N$ complex matrix. In presenting the SVD of $A$ we shall assume that $N \geq M$; the SVD of $A^\dagger$ will come from that of $A$. Let $Q = A^\dagger A$ and $P = AA^\dagger$; we assume, reasonably, that $P$, the smaller of the two matrices, is invertible, so all the eigenvalues $\lambda_1, ..., \lambda_M$ of $P$ are positive. We let the eigenvalue/eigenvector decomposition of $P$ be $P = ULU^\dagger$, where $\{u^1, ..., u^M\}$ are orthonormal eigenvectors of $P$ and $Pu^m = \lambda_m u^m$.
From $PU = UL$ or $AA^\dagger U = UL$ it follows that $A^\dagger AA^\dagger U = A^\dagger UL$. Therefore, the $M$ columns of $W = A^\dagger U$ are eigenvectors of $Q$ corresponding to the eigenvalues $\lambda_m$; since $Pu^m = AA^\dagger u^m$ is not the zero vector, $A^\dagger u^m$ cannot be the zero vector either. But the columns of $W$ do not have norm one: since $W^\dagger W = U^\dagger AA^\dagger U = L$, the $m$th column of $W$ has norm $\sqrt{\lambda_m}$. To normalize these columns we replace them with the $M$ columns of $A^\dagger UL^{-1/2}$, which are orthonormal eigenvectors of $Q$.
Ex. 5.12 Show that the nonzero eigenvalues of $Q = A^\dagger A$ and $P = AA^\dagger$ are the same.
Let $Z$ be the $N$ by $N$ matrix whose first $M$ columns are those of the matrix $A^\dagger UL^{-1/2}$ and whose remaining $N - M$ columns are any mutually orthogonal norm-one vectors that are all orthogonal to each of the first $M$ columns; note that this gives us $Z^\dagger Z = I$.
Let $\Sigma$ be the $M$ by $N$ matrix with diagonal entries $\Sigma_{mm} = \sqrt{\lambda_m}$, for $m = 1, ..., M$, and whose remaining entries are zero. The nonzero entries of $\Sigma$, the $\sqrt{\lambda_m}$, are called the singular values of $A$. The singular value decomposition (SVD) of $A$ is $A = U\Sigma Z^\dagger$. The SVD of $A^\dagger$ is $A^\dagger = Z\Sigma^T U^\dagger$.
Ex. 5.13 Show that $U\Sigma Z^\dagger$ equals $A$.
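The construction above can be sketched numerically. The following is a minimal illustration in Python with NumPy, not a recommended way to compute an SVD; the function name `svd_via_eig` is hypothetical, and the code assumes, as in the text, that $N \geq M$ and that $P = AA^\dagger$ is invertible. The extra $N - M$ columns of $Z$ are taken from a complete QR factorization, which is just one convenient way to obtain an orthonormal completion.

```python
import numpy as np

def svd_via_eig(A):
    """Build U, Sigma, Z with A = U @ Sigma @ Z^dagger from the
    eigendecomposition of P = A A^dagger.
    Assumes A is M by N with N >= M and P invertible (hypothetical helper)."""
    M, N = A.shape
    P = A @ A.conj().T
    # eigh returns real eigenvalues in ascending order and orthonormal eigenvectors.
    lam, U = np.linalg.eigh(P)
    # First M columns of Z: the columns of A^dagger U L^{-1/2}.
    # Broadcasting divides column m by sqrt(lam[m]).
    W = (A.conj().T @ U) / np.sqrt(lam)
    # Complete W to an orthonormal basis: the last N - M columns of the
    # full Q factor are orthonormal and orthogonal to the columns of W.
    Q_full, _ = np.linalg.qr(W, mode="complete")
    Z = np.hstack([W, Q_full[:, M:]])
    # Sigma is M by N with sqrt(lam[m]) on the main diagonal.
    Sigma = np.zeros((M, N))
    Sigma[np.arange(M), np.arange(M)] = np.sqrt(lam)
    return U, Sigma, Z
```

Because `eigh` orders the eigenvalues from smallest to largest, the singular values on the diagonal of `Sigma` appear in ascending order here, whereas library SVD routines usually return them in descending order.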
We have assumed, for convenience, that none of the eigenvalues $\lambda_m$, $m = 1, ..., M$, are zero. If this is not true, we can obtain the SVD of $A$ simply by modifying the definition of $L^{-1/2}$ to have $1/\sqrt{\lambda_m}$ on the main diagonal if $\lambda_m$ is not zero, and zero if it is. To show that $U\Sigma Z^\dagger = A$ now we need to use the fact that $Pu^m = 0$ implies that $A^\dagger u^m = 0$. To see this, note that
$$0 = Pu^m = AA^\dagger u^m$$
implies that
$$0 = (u^m)^\dagger Pu^m = (u^m)^\dagger AA^\dagger u^m = \|A^\dagger u^m\|^2.$$
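As an example of the singular-value decomposition, consider the matrix $A$, whose SVD is given by
$$A = \begin{bmatrix} 4 & 8 & 8 \\ 3 & 6 & 6 \end{bmatrix} = \begin{bmatrix} 4/5 & 3/5 \\ 3/5 & -4/5 \end{bmatrix} \begin{bmatrix} 15 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1/3 & 2/3 & 2/3 \\ 2/3 & -2/3 & 1/3 \\ 2/3 & 1/3 & -2/3 \end{bmatrix},$$
which can also be written in dyad form as
$$A = 15 \begin{bmatrix} 4/5 \\ 3/5 \end{bmatrix} \begin{bmatrix} 1/3 & 2/3 & 2/3 \end{bmatrix}.$$
It is just a coincidence that, in this example, the matrices $U$ and $Z$ are symmetric. The SVD of $A^T$ is then
$$A^T = \begin{bmatrix} 4 & 3 \\ 8 & 6 \\ 8 & 6 \end{bmatrix} = \begin{bmatrix} 1/3 & 2/3 & 2/3 \\ 2/3 & -2/3 & 1/3 \\ 2/3 & 1/3 & -2/3 \end{bmatrix} \begin{bmatrix} 15 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 4/5 & 3/5 \\ 3/5 & -4/5 \end{bmatrix}.$$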
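The factors in this example are easy to check numerically. The short NumPy sketch below multiplies them out, confirms that $U$ and $Z$ are orthogonal, and compares the singular values with those returned by `np.linalg.svd`; the variable names are chosen here for illustration only.

```python
import numpy as np

A = np.array([[4.0, 8.0, 8.0],
              [3.0, 6.0, 6.0]])
U = np.array([[4.0, 3.0],
              [3.0, -4.0]]) / 5.0
Sigma = np.array([[15.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])
Z = np.array([[1.0, 2.0, 2.0],
              [2.0, -2.0, 1.0],
              [2.0, 1.0, -2.0]]) / 3.0

# U and Z are orthogonal, and the three factors multiply back to A.
factors_orthogonal = (np.allclose(U.T @ U, np.eye(2))
                      and np.allclose(Z.T @ Z, np.eye(3)))
product_matches = np.allclose(U @ Sigma @ Z.T, A)

# Dyad form: A = 15 * u^1 (z^1)^T, using the first columns of U and Z.
dyad_matches = np.allclose(15.0 * np.outer(U[:, 0], Z[:, 0]), A)

# Singular values computed by the library, in descending order.
singular_values = np.linalg.svd(A, compute_uv=False)
print(factors_orthogonal, product_matches, dyad_matches, singular_values)
```

Note that the second singular value is zero in this example, so $P = AA^T$ is not invertible and the modified definition of $L^{-1/2}$ applies.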
Ex. 5.14 If $H$ is a Hermitian matrix, its eigenvalue/eigenvector decomposition $H = ULU^\dagger$ need not be its SVD. Illustrate this point for the real symmetric matrix
$$\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$$
Using the SVD of $A$ we can write $A$ as a sum of dyads:
$$A = \sum_{m=1}^{M} \sqrt{\lambda_m}\, u^m (z^m)^\dagger, \tag{5.8}$$
where $z^m$ denotes the $m$th column of the matrix $Z$.
In image processing, matrices such as $A$ are used to represent discrete two-dimensional images, with the entries of $A$ corresponding to the grey level or color at each pixel. It is common to find that most of the $M$ singular values of $A$ are nearly zero, so that $A$ can be written approximately as a sum of far fewer than $M$ dyads; this leads to SVD image compression. Such compression is helpful when many images are being transmitted, as, for example, when pictures of the surface of Mars are sent back to Earth.
Figures 5.1 and 5.2 illustrate what can be achieved with SVD compression. In both figures the original is in the upper left. It is a 128 by 128 digitized image, so $M = 128$. In the images that follow, the number of terms retained in the sum in Equation (5.8) is, first, 2, then 4, 6, 8, 10, 20 and finally 30. The full sum has 128 terms, remember. In Figure 5.1 the text is nearly readable using only 10 terms, and certainly could be made perfectly readable with suitable software, so storing just this compressed image would be acceptable. In Figure 5.2, an image of a satellite, we get a fairly good idea of the general shape of the object from the beginning, with only two terms.
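Truncating the sum in Equation (5.8) can be sketched in a few lines of NumPy. The example below is only illustrative: the 128 by 128 test image is synthetic (a smooth pattern plus noise, an assumption made here, not one of the images in the figures), and the helper names `truncated_svd` and `storage_ratio` are hypothetical. It relies on `np.linalg.svd` returning the singular values in descending order, so that the first `k` dyads are the `k` largest terms of the sum.

```python
import numpy as np

def truncated_svd(A, k):
    """Sum of the k dyads of A having the largest singular values."""
    U, s, Zh = np.linalg.svd(A, full_matrices=False)
    # Scale the first k columns of U by the singular values, then
    # multiply by the first k rows of Z^dagger.
    return (U[:, :k] * s[:k]) @ Zh[:k, :]

def storage_ratio(M, N, k):
    """Numbers stored for k dyads (k singular values, k vectors u^m,
    k vectors z^m) relative to the M*N entries of the full image."""
    return k * (M + N + 1) / (M * N)

# Synthetic 128 by 128 "image": smooth pattern plus mild noise (assumption).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 128)
image = (np.outer(np.sin(4 * x), np.cos(3 * x))
         + 0.01 * rng.standard_normal((128, 128)))

for k in (2, 4, 6, 8, 10, 20, 30):
    approx = truncated_svd(image, k)
    # Relative error in the Frobenius norm.
    rel_err = np.linalg.norm(image - approx) / np.linalg.norm(image)
    print(k, round(storage_ratio(128, 128, k), 3), rel_err)
```

The Frobenius-norm error of the `k`-term approximation equals the square root of the sum of the squares of the discarded singular values, which is why rapidly decaying singular values make the compression effective.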
Ex. 5.15 Suppose that $M = N$ and $A$ is invertible. Show that we can write
$$A^{-1} = \sum_{m=1}^{M} \left(\sqrt{\lambda_m}\right)^{-1} z^m (u^m)^\dagger.$$
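The formula in Ex. 5.15 can be checked numerically, although such a check is of course not a proof. The sketch below assumes a square invertible matrix and uses `np.linalg.svd` to supply $U$, the singular values, and $Z^\dagger$; the function name `inverse_from_svd` is hypothetical.

```python
import numpy as np

def inverse_from_svd(A):
    """Inverse of a square invertible matrix as a sum of dyads
    (1/sigma_m) z^m (u^m)^dagger, written as Z diag(1/sigma) U^dagger."""
    U, s, Zh = np.linalg.svd(A)
    Z = Zh.conj().T
    # Broadcasting divides column m of Z by the m-th singular value.
    return (Z / s) @ U.conj().T

# Hypothetical test matrix: random entries plus a multiple of the identity.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)
A_inv = inverse_from_svd(A)
print(np.allclose(A_inv @ A, np.eye(4)))
```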
5.5.2 An Application in Space Exploration
The Galileo was deployed from the space shuttle Atlantis on October 18, 1989. After a detour around Venus and back past Earth to pick up gravity-assisted speed, Galileo headed for Jupiter. Its mission included a study of Jupiter's moon Europa, and the plan was to send back one high-resolution photo per minute, at a rate of 134 KB per second, via a huge high-gain antenna. When the time came to open the antenna, it stuck. Without the pictures, the mission would be a failure.
There was a much smaller low-gain antenna on board, but the best transmission rate was going to be ten bits per second. All that could be done from Earth was to reprogram an old on-board computer to compress the pictures prior to transmission. The problem was that pictures could be taken much faster than they could be transmitted to Earth; some way to store them prior to transmission was key. The original designers of the software had long since retired, but the engineers figured out a way to introduce state-of-the-art image compression algorithms into the computer. It happened that there was an ancient reel-to-reel storage device on board that was there only to serve as a backup for storing atmospheric data. Using this device and the compression methods, the engineers saved the mission [16].
5.5.3 A Theorem on Real Normal Matrices
Consider the real square matrix
$$S = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}.$$
Since
$$S^TS = SS^T = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix},$$
$S$ is a real normal matrix. The eigenvalues of $S$ are complex, $S$ is not symmetric, and the eigenvalues of $S^TS$ are not distinct. In contrast, we have the following theorem, which we prove using the SVD.
Let $N$ be a real square matrix that is normal; that is, $N^TN = NN^T$.
Theorem 5.4 If $N$ is a real normal matrix and all the eigenvalues of $N^TN$ are distinct, then $N$ is symmetric.
Proof: Let $Q = N^TN$. Since $Q$ is real, symmetric, and non-negative definite, there is an orthogonal matrix $O$ such that $QO = NN^TO = OD^2$, with $D \geq 0$ and $D^2$ the diagonal matrix whose diagonal entries are the eigenvalues of $Q = N^TN$. We shall want to be able to assume that the entries of $D$ are all positive, which requires a bit of explanation.
We replace the matrix $N$ with the new matrix $N + \alpha I$, where $\alpha > 0$ is selected so that the matrix $(N + \alpha I)(N + \alpha I)^T$ has only positive eigenvalues. We can do this because
$$(N + \alpha I)(N + \alpha I)^T = NN^T + \alpha(N + N^T) + \alpha^2 I;$$
the first and third matrices have only non-negative eigenvalues and the second one has only real ones, so a large enough $\alpha$ can be found. Now we can prove the theorem for the new matrix $N + \alpha I$, showing that it is symmetric. But it then follows that the matrix $N$ must also be symmetric.
Now we continue with the proof, assuming that $D > 0$. The columns of $Z = N^TOD^{-1}$ are then orthonormal eigenvectors of $N^TN$ and the SVD of $N$ is $N = ODZ^T$.
Since $N$ is normal, we have $N^TNO = OD^2$, and
$$ZD^2 = N^TNZ = OD^2O^TZ,$$
so that
$$O^TZD^2 = D^2O^TZ.$$
It follows from Exercise 3.7 that $O^TZ = B$ is diagonal. From $Z = OB$ and
$$N = ODZ^T = ODB^TO^T = ODBO^T = OCO^T,$$
where $C = DB$ is diagonal, it follows that $N^T = N$.
This proof illustrates a use of the SVD of $N$, but the theorem can be proved using the eigenvector diagonalization of the normal matrix $N$ itself. Note that the characteristic polynomial of $N$ has real coefficients, so its roots occur in conjugate pairs. If $N$ has a complex root $\lambda$, then both $\lambda$ and $\overline{\lambda}$ are eigenvalues of $N$. It follows that $|\lambda|^2$ is an eigenvalue of $N^TN$ with multiplicity at least two. Consequently, if $N^TN$ has no repeated eigenvalues, then every eigenvalue of $N$ is real. Using $U^\dagger NU = D$, with $D$ real and diagonal, we get $N = UDU^\dagger$, so that $N^\dagger = UDU^\dagger = N$. Therefore $N$ is real and Hermitian, and so is symmetric.
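The role of the distinct-eigenvalue hypothesis can be seen numerically with the matrix $S$ from the start of this subsection. The NumPy sketch below is only an illustration; the helper names and the tolerance `tol` are choices made here, and the symmetric matrix `T` is a hypothetical second example.

```python
import numpy as np

def is_normal(N):
    """True if the real square matrix N satisfies N^T N = N N^T."""
    return np.allclose(N.T @ N, N @ N.T)

def gram_eigenvalues_distinct(N, tol=1e-9):
    """True if the eigenvalues of N^T N are pairwise separated by more than tol."""
    lam = np.linalg.eigvalsh(N.T @ N)   # ascending order
    return bool(np.all(np.diff(lam) > tol))

# The matrix S from the text: normal, with repeated eigenvalues of S^T S,
# and not symmetric, so the theorem's hypothesis fails.
S = np.array([[1.0, -1.0],
              [1.0, 1.0]])
print(is_normal(S), gram_eigenvalues_distinct(S), np.allclose(S, S.T))
print(np.linalg.eigvals(S))            # a complex-conjugate pair

# A hypothetical symmetric example, which is automatically normal.
T = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(is_normal(T), gram_eigenvalues_distinct(T), np.allclose(T, T.T))
```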
5.5.4 The Golub-Kahan Algorithm
We have obtained the SVD of $A$ using the eigenvectors and eigenvalues of the Hermitian matrices $Q = A^\dagger A$ and $P = AA^\dagger$; for large matrices, this is not an efficient way to get the SVD. The Golub-Kahan algorithm [149] calculates the SVD of $A$ without forming the matrices $P$ and $Q$.
A matrix $A$ is bi-diagonal if the only non-zero entries occur on the main diagonal and the first diagonal above the main one. Any matrix can be reduced to bi-diagonal form by multiplying the matrix first on the left by a succession of Householder matrices, and then on the right by another succession of Householder matrices. The $QR$ factorization is easier to calculate when the matrix involved is bi-diagonal.
The Golub-Kahan algorithm for calculating the SVD of $A$ involves first reducing $A$ to a matrix $B$ in bi-diagonal form and then applying a variant of the $QR$ factorization.
Using Householder matrices, we get unitary matrices $U_0$ and $Z_0$ such that $A = U_0BZ_0^\dagger$, where $B$ is bi-diagonal. Then we find the SVD of $B$,
$$B = \tilde{U}\Sigma\tilde{Z}^\dagger,$$
using $QR$ factorization. Finally, the SVD for $A$ itself is
$$A = U_0\tilde{U}\Sigma\tilde{Z}^\dagger Z_0^\dagger.$$
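The two-stage structure just described can be sketched for real matrices as follows. This is a minimal illustration under stated assumptions, not the Golub-Kahan algorithm as published: the first stage alternates left and right Householder reflections, column by column, to reach bi-diagonal form, and the second stage simply calls `np.linalg.svd` on $B$ as a stand-in for the $QR$-based iteration. The function names are hypothetical.

```python
import numpy as np

def householder(x):
    """Unit vector v such that (I - 2 v v^T) x is a multiple of e_1.
    Returns the zero vector (identity reflection) if x is zero."""
    v = np.array(x, dtype=float)
    norm_x = np.linalg.norm(v)
    if norm_x == 0.0:
        return v
    # Add the norm with the sign of x[0] to avoid cancellation.
    v[0] += np.copysign(norm_x, v[0])
    return v / np.linalg.norm(v)

def bidiagonalize(A):
    """Return U0, B, Z0 with A = U0 @ B @ Z0.T, where U0 and Z0 are
    orthogonal and B is upper bi-diagonal (real matrices only)."""
    B = np.array(A, dtype=float)
    m, n = B.shape
    U0, Z0 = np.eye(m), np.eye(n)
    for k in range(min(m, n)):
        # Left reflection: zero out column k below the main diagonal.
        v = householder(B[k:, k])
        B[k:, :] -= 2.0 * np.outer(v, v @ B[k:, :])
        U0[:, k:] -= 2.0 * np.outer(U0[:, k:] @ v, v)
        # Right reflection: zero out row k beyond the first superdiagonal.
        if k < n - 2:
            w = householder(B[k, k + 1:])
            B[:, k + 1:] -= 2.0 * np.outer(B[:, k + 1:] @ w, w)
            Z0[:, k + 1:] -= 2.0 * np.outer(Z0[:, k + 1:] @ w, w)
    return U0, B, Z0

def svd_two_stage(A):
    """SVD of A via bi-diagonalization followed by an SVD of B.
    The second stage uses np.linalg.svd as a stand-in for the QR-based step."""
    U0, B, Z0 = bidiagonalize(A)
    U_tilde, s, Z_tilde_h = np.linalg.svd(B)
    # A = U0 U~ Sigma Z~^T Z0^T
    return U0 @ U_tilde, s, Z_tilde_h @ Z0.T
```

Each reflection is applied both to $B$ and to the accumulated factor, so the identity $A = U_0BZ_0^T$ holds after every step; a production implementation would instead store the reflection vectors and use blocked updates.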
Ever since the publication of the Golub-Kahan algorithm, there have been efforts to improve both the accuracy and the speed of the method. The improvements announced in [118] and [119] won for their authors the 2009 SIAM Activity Group on Linear Algebra Prize.