4.3 General Projection Methods
4.3.1 Orthogonal Projection Methods
Let $A$ be an $n \times n$ complex matrix and $\mathcal{K}$ be an $m$-dimensional subspace of $\mathbb{C}^n$. As a notational convention we will denote by the same symbol $A$ the matrix and the linear application in $\mathbb{C}^n$ that it represents. We consider the eigenvalue problem: find $u$ belonging to $\mathbb{C}^n$ and $\lambda$ belonging to $\mathbb{C}$ such that
$$Au = \lambda u. \tag{4.16}$$
An orthogonal projection technique onto the subspace $\mathcal{K}$ seeks an approximate eigenpair $\tilde\lambda, \tilde u$ to the above problem, with $\tilde\lambda$ in $\mathbb{C}$ and $\tilde u$ in $\mathcal{K}$, such that the following Galerkin condition is satisfied:
$$A\tilde u - \tilde\lambda \tilde u \perp \mathcal{K}, \tag{4.17}$$
or, equivalently,
$$(A\tilde u - \tilde\lambda \tilde u, v) = 0, \quad \forall v \in \mathcal{K}. \tag{4.18}$$
Assume that some orthonormal basis $\{v_1, v_2, \ldots, v_m\}$ of $\mathcal{K}$ is available and denote by $V$ the matrix with column vectors $v_1, v_2, \ldots, v_m$. Then we can solve the approximate problem numerically by translating it into this basis. Letting
$$\tilde u = V y, \tag{4.19}$$
equation (4.18) becomes
$$(AVy - \tilde\lambda V y, v_j) = 0, \quad j = 1, \ldots, m.$$
Therefore, $y$ and $\tilde\lambda$ must satisfy
$$B_m y = \tilde\lambda y \tag{4.20}$$
with
$$B_m = V^H A V.$$
If we denote by $A_m$ the linear transformation of rank at most $m$ defined by $A_m = \mathcal{P}_{\mathcal{K}} A \mathcal{P}_{\mathcal{K}}$, then we observe that the restriction of this operator to the subspace $\mathcal{K}$ is represented by the matrix $B_m$ with respect to the basis $V$. The following procedure, known as the Rayleigh-Ritz procedure, computes the Galerkin approximations to the eigenvalues/eigenvectors of $A$ numerically.
ALGORITHM 4.5 Rayleigh-Ritz Procedure:
1. Compute an orthonormal basis $\{v_i\}_{i=1,\ldots,m}$ of the subspace $\mathcal{K}$. Let $V = [v_1, v_2, \ldots, v_m]$.
2. Compute $B_m = V^H A V$.
3. Compute the eigenvalues of $B_m$ and select the $k$ desired ones $\tilde\lambda_i$, $i = 1, 2, \ldots, k$, where $k \le m$.
4. Compute the eigenvectors $y_i$, $i = 1, \ldots, k$, of $B_m$ associated with $\tilde\lambda_i$, $i = 1, \ldots, k$, and the corresponding approximate eigenvectors of $A$, $\tilde u_i = V y_i$, $i = 1, \ldots, k$.
The above process only requires basic linear algebra computations. The numerical solution of the $m \times m$ eigenvalue problem in steps 3 and 4 can be treated by standard library subroutines such as those in EISPACK. Another important note is that in step 4 one can replace eigenvectors by Schur vectors to get approximate Schur vectors $\tilde u_i$ instead of approximate eigenvectors. Schur vectors $y_i$ can be obtained in a numerically stable way and, in general, eigenvectors are more sensitive to rounding errors than are Schur vectors.
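The four steps of the Rayleigh-Ritz procedure can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `rayleigh_ritz`, the argument `S` (any full-column-rank matrix whose columns span the subspace), the random test data, and the rule of keeping the `k` Ritz values with largest real part are all choices made for this sketch and are not prescribed by the algorithm.

```python
import numpy as np

def rayleigh_ritz(A, S, k):
    """Rayleigh-Ritz approximations from the column span of S.

    A : (n, n) array.
    S : (n, m) array with full column rank; its columns span K (assumption).
    k : number of Ritz pairs to return, k <= m.
    Returns (ritz_values, ritz_vectors). The k values with largest real part
    are kept, which is an arbitrary selection rule chosen for this sketch.
    """
    # Step 1: orthonormal basis V of K via a reduced QR factorization.
    V, _ = np.linalg.qr(S)
    # Step 2: projected matrix B_m = V^H A V.
    B = V.conj().T @ A @ V
    # Steps 3-4: eigenpairs of B_m, then lift the eigenvectors: u~ = V y.
    theta, Y = np.linalg.eig(B)
    order = np.argsort(-theta.real)[:k]
    return theta[order], V @ Y[:, order]

rng = np.random.default_rng(0)
n, m = 50, 8
A = rng.standard_normal((n, n))
S = rng.standard_normal((n, m))
ritz_vals, ritz_vecs = rayleigh_ritz(A, S, k=3)

# Galerkin condition (4.17): each residual A u~ - lambda~ u~ is orthogonal to K,
# so its coordinates in the basis V should vanish up to rounding.
V, _ = np.linalg.qr(S)
R = A @ ritz_vecs - ritz_vecs * ritz_vals
galerkin_defect = np.linalg.norm(V.conj().T @ R)
print(galerkin_defect)
```

The residuals themselves are generally not small for a random subspace; only their components inside the subspace vanish, which is exactly what the Galerkin condition requires.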
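We can reformulate orthogonal projection methods in terms of projection operators as follows. Defining $\mathcal{P}_{\mathcal{K}}$ to be the orthogonal projector onto the subspace $\mathcal{K}$, the Galerkin condition (4.17) can be rewritten as
$$\mathcal{P}_{\mathcal{K}}(A\tilde u - \tilde\lambda \tilde u) = 0, \quad \tilde\lambda \in \mathbb{C}, \ \tilde u \in \mathcal{K}$$
or,
$$\mathcal{P}_{\mathcal{K}} A \tilde u = \tilde\lambda \tilde u, \quad \tilde\lambda \in \mathbb{C}, \ \tilde u \in \mathcal{K}. \tag{4.21}$$
Note that we have replaced the original problem (4.16) by an eigenvalue problem for the linear transformation $\mathcal{P}_{\mathcal{K}} A|_{\mathcal{K}}$ which is from $\mathcal{K}$ to $\mathcal{K}$. Another formulation of the above equation is
$$\mathcal{P}_{\mathcal{K}} A \mathcal{P}_{\mathcal{K}} \tilde u = \tilde\lambda \tilde u, \quad \tilde\lambda \in \mathbb{C}, \ \tilde u \in \mathbb{C}^n \tag{4.22}$$
which involves the natural extension $A_m = \mathcal{P}_{\mathcal{K}} A \mathcal{P}_{\mathcal{K}}$ of the linear operator $A'_m = \mathcal{P}_{\mathcal{K}} A|_{\mathcal{K}}$ to the whole space. In addition to the eigenvalues and eigenvectors of $A'_m$, $A_m$ has zero as a trivial eigenvalue with every vector of the orthogonal complement of $\mathcal{K}$ being an eigenvector. Equation (4.21) will be referred to as the Galerkin approximate problem.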
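The relation between the projected matrix $V^H A V$ and the extended operator $\mathcal{P}_{\mathcal{K}} A \mathcal{P}_{\mathcal{K}}$ can be checked numerically. The sketch below assumes a small random real matrix and a random subspace; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 12, 4
A = rng.standard_normal((n, n))
V, _ = np.linalg.qr(rng.standard_normal((n, m)))   # orthonormal basis of K
P = V @ V.T                                        # orthogonal projector onto K

B_m = V.T @ A @ V          # m x m matrix of the restricted operator
A_m = P @ A @ P            # n x n extension to the whole space

eig_B = np.linalg.eigvals(B_m)
eig_Am = np.linalg.eigvals(A_m)

# Largest distance from an eigenvalue of B_m to the spectrum of A_m:
# every eigenvalue of B_m should reappear among those of A_m.
match = max(np.min(np.abs(eig_Am - t)) for t in eig_B)

# A_m maps any vector of the orthogonal complement of K to zero.
x = rng.standard_normal(n)
x_perp = x - P @ x
null_defect = np.linalg.norm(A_m @ x_perp)
print(match, null_defect)
```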
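The following proposition examines what happens in the particular case when the subspace $\mathcal{K}$ is invariant under $A$.
Proposition 4.3 If $\mathcal{K}$ is invariant under $A$ then every approximate eigenvalue / (right) eigenvector pair obtained from the orthogonal projection method onto $\mathcal{K}$ is exact.
Proof. An approximate eigenpair $\tilde\lambda, \tilde u$ is defined by
$$\mathcal{P}_{\mathcal{K}}(A\tilde u - \tilde\lambda \tilde u) = 0,$$
where $\tilde u$ is a nonzero vector in $\mathcal{K}$ and $\tilde\lambda \in \mathbb{C}$. If $\mathcal{K}$ is invariant under $A$ then $A\tilde u$ belongs to $\mathcal{K}$ and therefore $\mathcal{P}_{\mathcal{K}} A \tilde u = A \tilde u$. Then the above equation becomes
$$A\tilde u - \tilde\lambda \tilde u = 0,$$
showing that the pair $\tilde\lambda, \tilde u$ is exact.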
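Proposition 4.3 is easy to observe numerically. In the following sketch the invariant subspace is built, as an assumption of the example, from the span of a few computed eigenvectors of a random matrix; the full residuals of the resulting approximate eigenpairs then vanish up to rounding.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 30, 4
A = rng.standard_normal((n, n))

# Build an invariant subspace K: the span of m eigenvectors of A.
_, X = np.linalg.eig(A)
V, _ = np.linalg.qr(X[:, :m])          # orthonormal basis of K

# Orthogonal projection method onto K.
B = V.conj().T @ A @ V
theta, Y = np.linalg.eig(B)
U = V @ Y                              # approximate eigenvectors (unit norm)

# Full residuals A u~ - lambda~ u~, not only their projection onto K.
residual_norms = np.linalg.norm(A @ U - U * theta, axis=0)
print(residual_norms.max())
```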
An important quantity for the convergence properties of projection methods is the distance $\|(I - \mathcal{P}_{\mathcal{K}})u\|_2$ of the exact eigenvector $u$, supposed of norm 1, from the subspace $\mathcal{K}$. This quantity plays a key role in the analysis of projection methods. First, it is clear that the eigenvector $u$ cannot be well approximated from $\mathcal{K}$ if $\|(I - \mathcal{P}_{\mathcal{K}})u\|_2$ is not small because we have
$$\|\tilde u - u\|_2 \ge \|(I - \mathcal{P}_{\mathcal{K}})u\|_2.$$
The fundamental quantity $\|(I - \mathcal{P}_{\mathcal{K}})u\|_2$ can also be interpreted as the sine of the acute angle between the eigenvector $u$ and the subspace $\mathcal{K}$. It is also the gap between the space $\mathcal{K}$ and the linear span of $u$. The following theorem establishes an upper bound for the residual norm of the exact eigenpair with respect to the approximate operator $A_m$, using this angle.
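Theorem 4.3 Let $\gamma = \|\mathcal{P}_{\mathcal{K}} A (I - \mathcal{P}_{\mathcal{K}})\|_2$. Then the residual norms of the pairs $\lambda, \mathcal{P}_{\mathcal{K}} u$ and $\lambda, u$ for the linear operator $A_m$ satisfy respectively
$$\|(A_m - \lambda I)\mathcal{P}_{\mathcal{K}} u\|_2 \le \gamma \|(I - \mathcal{P}_{\mathcal{K}})u\|_2 \tag{4.23}$$
$$\|(A_m - \lambda I)u\|_2 \le \sqrt{|\lambda|^2 + \gamma^2}\, \|(I - \mathcal{P}_{\mathcal{K}})u\|_2. \tag{4.24}$$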
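Proof. For the first inequality we use the definition of $A_m$ to get
$$\begin{aligned}
\|(A_m - \lambda I)\mathcal{P}_{\mathcal{K}} u\|_2 &= \|\mathcal{P}_{\mathcal{K}}(A - \lambda I)(u - (I - \mathcal{P}_{\mathcal{K}})u)\|_2 \\
&= \|\mathcal{P}_{\mathcal{K}}(A - \lambda I)(I - \mathcal{P}_{\mathcal{K}})u\|_2 \\
&= \|\mathcal{P}_{\mathcal{K}}(A - \lambda I)(I - \mathcal{P}_{\mathcal{K}})(I - \mathcal{P}_{\mathcal{K}})u\|_2 \\
&\le \gamma \|(I - \mathcal{P}_{\mathcal{K}})u\|_2.
\end{aligned}$$
As for the second inequality we simply notice that
$$\begin{aligned}
(A_m - \lambda I)u &= (A_m - \lambda I)\mathcal{P}_{\mathcal{K}} u + (A_m - \lambda I)(I - \mathcal{P}_{\mathcal{K}})u \\
&= (A_m - \lambda I)\mathcal{P}_{\mathcal{K}} u - \lambda (I - \mathcal{P}_{\mathcal{K}})u.
\end{aligned}$$
Using the previous inequality and the fact that the two vectors on the right-hand side are orthogonal to each other we get
$$\begin{aligned}
\|(A_m - \lambda I)u\|_2^2 &= \|(A_m - \lambda I)\mathcal{P}_{\mathcal{K}} u\|_2^2 + |\lambda|^2 \|(I - \mathcal{P}_{\mathcal{K}})u\|_2^2 \\
&\le (\gamma^2 + |\lambda|^2)\, \|(I - \mathcal{P}_{\mathcal{K}})u\|_2^2,
\end{aligned}$$
which completes the proof.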
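Both residual bounds of Theorem 4.3 can be checked on a small example. The sketch below assumes a random nonsymmetric matrix, picks one of its computed eigenpairs, and builds a subspace that nearly contains the eigenvector by perturbing it; these choices, and all variable names, are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 40, 6
A = rng.standard_normal((n, n))

# Exact eigenpair (lam, u) with ||u||_2 = 1.
w, X = np.linalg.eig(A)
lam, u = w[0], X[:, 0] / np.linalg.norm(X[:, 0])

# A subspace K that nearly contains u: perturbed u plus random directions.
S = np.column_stack([u + 1e-3 * rng.standard_normal(n),
                     rng.standard_normal((n, m - 1))])
V, _ = np.linalg.qr(S)
P = V @ V.conj().T                     # orthogonal projector onto K
I = np.eye(n)

A_m = P @ A @ P
gamma = np.linalg.norm(P @ A @ (I - P), 2)
dist = np.linalg.norm((I - P) @ u)     # distance of u from K

lhs_423 = np.linalg.norm((A_m - lam * I) @ (P @ u))
lhs_424 = np.linalg.norm((A_m - lam * I) @ u)
rhs_423 = gamma * dist
rhs_424 = np.sqrt(abs(lam) ** 2 + gamma ** 2) * dist
print(lhs_423, rhs_423, lhs_424, rhs_424)
```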
Note that $\gamma$ is bounded from above by $\|A\|_2$. A good approximation can therefore be achieved by the projection method in case the distance $\|(I - \mathcal{P}_{\mathcal{K}})u\|_2$ is small, provided the approximate eigenproblem is well conditioned. Unfortunately, in contrast with the Hermitian case, the fact that the residual norm is small does not in any way guarantee that the eigenpair is accurate, because of potential difficulties related to the conditioning of the eigenvalue.
If we translate the inequality (4.23) into matrix form by expressing everything in an orthonormal basis $V$ of $\mathcal{K}$, we would write $\mathcal{P}_{\mathcal{K}} = V V^H$ and immediately obtain
$$\|(V^H A V - \lambda I)V^H u\|_2 \le \gamma \|(I - V V^H)u\|_2,$$
which shows that $\lambda$ can be considered as an approximate eigenvalue for $B_m = V^H A V$ with residual of the order of $\|(I - \mathcal{P}_{\mathcal{K}})u\|_2$. If we scale the vector $V^H u$ to make it of 2-norm unity, and denote the result by $y_u$, we can rewrite the above inequality as
$$\|(V^H A V - \lambda I)y_u\|_2 \le \gamma \frac{\|(I - \mathcal{P}_{\mathcal{K}})u\|_2}{\|\mathcal{P}_{\mathcal{K}} u\|_2} \equiv \gamma \tan\theta(u, \mathcal{K}).$$
The above inequality gives a more explicit relation between the residual norm and the angle between $u$ and the subspace $\mathcal{K}$.
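The matrix form of the bound works entirely with the small matrix $V^H A V$ and the scaled coordinate vector of the projected eigenvector. A minimal sketch, again assuming a random test matrix and a subspace obtained by perturbing a computed eigenvector (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 40, 6
A = rng.standard_normal((n, n))
w, X = np.linalg.eig(A)
lam, u = w[0], X[:, 0] / np.linalg.norm(X[:, 0])

S = np.column_stack([u + 1e-2 * rng.standard_normal(n),
                     rng.standard_normal((n, m - 1))])
V, _ = np.linalg.qr(S)

B = V.conj().T @ A @ V                  # B_m = V^H A V
c = V.conj().T @ u                      # coordinates of P_K u in the basis V
y_u = c / np.linalg.norm(c)             # scaled to unit 2-norm

cos_theta = np.linalg.norm(c)           # ||P_K u||_2
sin_theta = np.linalg.norm(u - V @ c)   # ||(I - P_K) u||_2
# ||P_K A (I - P_K)||_2 equals ||V^H A (I - V V^H)||_2 since V is orthonormal.
gamma = np.linalg.norm(V.conj().T @ A @ (np.eye(n) - V @ V.conj().T), 2)

residual = np.linalg.norm(B @ y_u - lam * y_u)
bound = gamma * sin_theta / cos_theta   # gamma * tan(theta(u, K))
print(residual, bound)
```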
4.3.2 The Hermitian Case
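The approximate eigenvalues computed from orthogonal projection methods in the particular case where the matrix $A$ is Hermitian satisfy strong optimality properties which follow from the Min-Max principle and the Courant characterization seen in Chapter 1. These properties follow by observing that $(A_m x, x)$ is the same as $(Ax, x)$ when $x$ runs in the subspace $\mathcal{K}$. Thus, if we label the eigenvalues decreasingly, i.e., $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$, we have
$$\tilde\lambda_1 = \max_{x \in \mathcal{K},\, x \ne 0} \frac{(\mathcal{P}_{\mathcal{K}} A \mathcal{P}_{\mathcal{K}} x, x)}{(x, x)} = \max_{x \in \mathcal{K},\, x \ne 0} \frac{(\mathcal{P}_{\mathcal{K}} A x, \mathcal{P}_{\mathcal{K}} x)}{(x, x)} = \max_{x \in \mathcal{K},\, x \ne 0} \frac{(Ax, x)}{(x, x)}. \tag{4.25}$$
This is because $\mathcal{P}_{\mathcal{K}} x = x$ for any element in $\mathcal{K}$. Similarly, we can show that
$$\tilde\lambda_m = \min_{x \in \mathcal{K},\, x \ne 0} \frac{(Ax, x)}{(x, x)}.$$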
More generally, we have the following result.
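Proposition 4.4 The $i$-th largest approximate eigenvalue of a Hermitian matrix $A$, obtained from an orthogonal projection method onto a subspace $\mathcal{K}$, satisfies
$$\tilde\lambda_i = \max_{\substack{S \subseteq \mathcal{K} \\ \dim(S) = i}} \ \min_{x \in S,\, x \ne 0} \frac{(Ax, x)}{(x, x)}. \tag{4.26}$$
As an immediate consequence we obtain the following corollary.
Corollary 4.1 For $i = 1, 2, \ldots, m$ the following inequality holds
$$\lambda_i \ge \tilde\lambda_i. \tag{4.27}$$
Proof. This is because
$$\tilde\lambda_i = \max_{\substack{S \subseteq \mathcal{K} \\ \dim(S) = i}} \ \min_{x \in S,\, x \ne 0} \frac{(Ax, x)}{(x, x)} \le \max_{\substack{S \subseteq \mathbb{C}^n \\ \dim(S) = i}} \ \min_{x \in S,\, x \ne 0} \frac{(Ax, x)}{(x, x)} = \lambda_i.$$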
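The inequality of Corollary 4.1 and the extremal characterizations of the largest and smallest approximate eigenvalues can be observed on a random Hermitian matrix. The sketch below is an illustration only; the test matrix, the random subspace and the variable names are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 60, 10
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2                     # Hermitian test matrix

V, _ = np.linalg.qr(rng.standard_normal((n, m)))
B = V.conj().T @ A @ V
B = (B + B.conj().T) / 2                     # symmetrize against rounding

# eigvalsh returns ascending order; flip to label the eigenvalues decreasingly.
lam = np.linalg.eigvalsh(A)[::-1]
lam_tilde = np.linalg.eigvalsh(B)[::-1]

gaps = lam[:m] - lam_tilde                   # lambda_i - lambda~_i, i = 1..m

# Rayleigh quotient of an arbitrary vector of K: it lies between the
# smallest and the largest approximate eigenvalue.
x = V @ rng.standard_normal(m)
rq = np.real(np.vdot(x, A @ x) / np.vdot(x, x))
print(gaps.min(), lam_tilde[-1], rq, lam_tilde[0])
```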
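A similar argument based on the Courant characterization results in the following theorem.
Theorem 4.4 The approximate eigenvalue $\tilde\lambda_i$ and the corresponding eigenvector $\tilde u_i$ are such that
$$\tilde\lambda_1 = \frac{(A\tilde u_1, \tilde u_1)}{(\tilde u_1, \tilde u_1)} = \max_{x \in \mathcal{K},\, x \ne 0} \frac{(Ax, x)}{(x, x)}$$
and for $i > 1$:
$$\tilde\lambda_i = \frac{(A\tilde u_i, \tilde u_i)}{(\tilde u_i, \tilde u_i)} = \max_{\substack{x \in \mathcal{K},\, x \ne 0, \\ \tilde u_1^H x = \ldots = \tilde u_{i-1}^H x = 0}} \frac{(Ax, x)}{(x, x)}. \tag{4.28}$$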
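One may suspect that the general bounds seen earlier for non-Hermitian matrices may be improved for the Hermitian case. This is indeed the case. We begin by proving the following lemma.
Lemma 4.1 Let $A$ be a Hermitian matrix and $u$ an eigenvector of $A$ associated with the eigenvalue $\lambda$. Then the Rayleigh quotient $\mu \equiv \mu_A(\mathcal{P}_{\mathcal{K}} u)$ satisfies the inequality
$$|\lambda - \mu| \le \|A - \lambda I\|_2 \frac{\|(I - \mathcal{P}_{\mathcal{K}})u\|_2^2}{\|\mathcal{P}_{\mathcal{K}} u\|_2^2}. \tag{4.29}$$
Proof. From the equality
$$(A - \lambda I)\mathcal{P}_{\mathcal{K}} u = (A - \lambda I)(u - (I - \mathcal{P}_{\mathcal{K}})u) = -(A - \lambda I)(I - \mathcal{P}_{\mathcal{K}})u$$
and the fact that $A$ is Hermitian we get
$$|\lambda - \mu| = \left| \frac{((A - \lambda I)\mathcal{P}_{\mathcal{K}} u, \mathcal{P}_{\mathcal{K}} u)}{(\mathcal{P}_{\mathcal{K}} u, \mathcal{P}_{\mathcal{K}} u)} \right| = \left| \frac{((A - \lambda I)(I - \mathcal{P}_{\mathcal{K}})u, (I - \mathcal{P}_{\mathcal{K}})u)}{(\mathcal{P}_{\mathcal{K}} u, \mathcal{P}_{\mathcal{K}} u)} \right|.$$
The result follows from a direct application of the Cauchy-Schwarz inequality.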
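Assuming as usual that the eigenvalues are labeled decreasingly, and letting $\mu_1 = \mu_A(\mathcal{P}_{\mathcal{K}} u_1)$, we can get from (4.25) that
$$0 \le \lambda_1 - \tilde\lambda_1 \le \lambda_1 - \mu_1 \le \|A - \lambda_1 I\|_2 \frac{\|(I - \mathcal{P}_{\mathcal{K}})u_1\|_2^2}{\|\mathcal{P}_{\mathcal{K}} u_1\|_2^2}.$$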
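The chain of inequalities for the largest eigenvalue can be verified numerically. The following sketch assumes a real symmetric test matrix and a subspace built around a perturbed copy of the dominant eigenvector; the perturbation size 0.05 and the variable names are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 50, 8
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                            # real symmetric, hence Hermitian

lam, U = np.linalg.eigh(A)                   # ascending eigenvalues
lam1, u1 = lam[-1], U[:, -1]                 # largest eigenvalue, unit eigenvector

S = np.column_stack([u1 + 0.05 * rng.standard_normal(n),
                     rng.standard_normal((n, m - 1))])
V, _ = np.linalg.qr(S)

Pu = V @ (V.T @ u1)                          # P_K u_1
mu1 = (Pu @ A @ Pu) / (Pu @ Pu)              # Rayleigh quotient of P_K u_1
lam1_tilde = np.linalg.eigvalsh(V.T @ A @ V)[-1]   # largest approximate eigenvalue

bound = (np.linalg.norm(A - lam1 * np.eye(n), 2)
         * np.linalg.norm(u1 - Pu) ** 2 / np.linalg.norm(Pu) ** 2)
print(lam1 - lam1_tilde, lam1 - mu1, bound)
```

The bound is quadratic in the distance of the eigenvector from the subspace, which is why the eigenvalue error is typically much smaller than the eigenvector error in the Hermitian case.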
A similar result can be shown for the smallest eigenvalue. We can extend this inequality to the other eigenvalues at the price of a little complication in the equations. In what follows we will denote by $\tilde Q_i$ the sum of the spectral projectors associated with the approximate eigenvalues $\tilde\lambda_1, \tilde\lambda_2, \ldots, \tilde\lambda_{i-1}$. For any given vector $x$, $(I - \tilde Q_i)x$ will be the vector obtained by orthogonalizing $x$ against the first $i-1$ approximate eigenvectors. We consider a candidate vector of the form $(I - \tilde Q_i)\mathcal{P}_{\mathcal{K}} u_i$ in an attempt to use an argument similar to the one for the largest eigenvalue. This is a vector obtained by projecting $u_i$ onto the subspace $\mathcal{K}$ and then stripping it of its components in the first $i-1$ approximate eigenvectors.
Lemma 4.2 Let $\tilde Q_i$ be the sum of the spectral projectors associated with the approximate eigenvalues $\tilde\lambda_1, \tilde\lambda_2, \ldots, \tilde\lambda_{i-1}$ and define $\mu_i = \mu_A(x_i)$, where
$$x_i = \frac{(I - \tilde Q_i)\mathcal{P}_{\mathcal{K}} u_i}{\|(I - \tilde Q_i)\mathcal{P}_{\mathcal{K}} u_i\|_2}.$$
Then
$$|\lambda_i - \mu_i| \le \|A - \lambda_i I\|_2 \frac{\|\tilde Q_i u_i\|_2^2 + \|(I - \mathcal{P}_{\mathcal{K}})u_i\|_2^2}{\|(I - \tilde Q_i)\mathcal{P}_{\mathcal{K}} u_i\|_2^2}. \tag{4.30}$$
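Proof. To simplify notation we set $\alpha = 1/\|(I - \tilde Q_i)\mathcal{P}_{\mathcal{K}} u_i\|_2$. Then we write
$$(A - \lambda_i I)x_i = (A - \lambda_i I)(x_i - \alpha u_i),$$
and proceed as in the previous case to get
$$|\lambda_i - \mu_i| = |((A - \lambda_i I)x_i, x_i)| = |((A - \lambda_i I)(x_i - \alpha u_i), (x_i - \alpha u_i))|.$$
Applying the Cauchy-Schwarz inequality to the above equation, we get
$$|\lambda_i - \mu_i| \le \|A - \lambda_i I\|_2 \|x_i - \alpha u_i\|_2^2.$$
We can rewrite $\|x_i - \alpha u_i\|_2^2$ as
$$\begin{aligned}
\|x_i - \alpha u_i\|_2^2 &= \alpha^2 \|(I - \tilde Q_i)\mathcal{P}_{\mathcal{K}} u_i - u_i\|_2^2 \\
&= \alpha^2 \|(I - \tilde Q_i)(\mathcal{P}_{\mathcal{K}} u_i - u_i) - \tilde Q_i u_i\|_2^2.
\end{aligned}$$
Using the orthogonality of the two vectors inside the norm bars, this equality becomes
$$\begin{aligned}
\|x_i - \alpha u_i\|_2^2 &= \alpha^2 \left( \|(I - \tilde Q_i)(\mathcal{P}_{\mathcal{K}} u_i - u_i)\|_2^2 + \|\tilde Q_i u_i\|_2^2 \right) \\
&\le \alpha^2 \left( \|(I - \mathcal{P}_{\mathcal{K}})u_i\|_2^2 + \|\tilde Q_i u_i\|_2^2 \right).
\end{aligned}$$
This establishes the desired result.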
The vector $x_i$ has been constructed in such a way that it is orthogonal to all previous approximate eigenvectors $\tilde u_1, \ldots, \tilde u_{i-1}$. We can therefore exploit the Courant characterization (4.28) to prove the following result.
Theorem 4.5 Let $\tilde Q_i$ be the sum of the spectral projectors associated with the approximate eigenvalues $\tilde\lambda_1, \tilde\lambda_2, \ldots, \tilde\lambda_{i-1}$. Then the error between the $i$-th exact and approximate eigenvalues $\lambda_i$ and $\tilde\lambda_i$ is such that
$$0 \le \lambda_i - \tilde\lambda_i \le \|A - \lambda_i I\|_2 \frac{\|\tilde Q_i u_i\|_2^2 + \|(I - \mathcal{P}_{\mathcal{K}})u_i\|_2^2}{\|(I - \tilde Q_i)\mathcal{P}_{\mathcal{K}} u_i\|_2^2}. \tag{4.31}$$
Proof. By (4.28) and the fact that $x_i$ belongs to $\mathcal{K}$ and is orthogonal to the first $i-1$ approximate eigenvectors we immediately get
$$0 \le \lambda_i - \tilde\lambda_i \le \lambda_i - \mu_i.$$
The result follows from the previous lemma.
We point out that the above result is valid for $i = 1$, provided we define $\tilde Q_1 = 0$. The quantity $\|\tilde Q_i u_i\|_2$ represents the cosine of the acute angle between $u_i$ and the span of the previous approximate eigenvectors. In the ideal situation this should be zero. In addition, we should mention that the error bound is semi-a-priori, since it requires knowledge of the previous approximate eigenvectors in order to get an idea of the quantity $\|\tilde Q_i u_i\|_2$.
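The bound of Theorem 4.5 can be evaluated for the first few eigenvalues once the approximate eigenvectors are available. The sketch below assumes a real symmetric matrix, so that the spectral projector onto the previous approximate eigenvectors is simply the orthogonal projector onto their span; the subspace, the perturbation size and all names are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, k = 50, 10, 3
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

lam, U = np.linalg.eigh(A)
lam, U = lam[::-1], U[:, ::-1]               # eigenvalues labeled decreasingly

# Subspace roughly containing the k leading eigenvectors.
S = np.column_stack([U[:, :k] + 0.05 * rng.standard_normal((n, k)),
                     rng.standard_normal((n, m - k))])
V, _ = np.linalg.qr(S)

theta, Y = np.linalg.eigh(V.T @ A @ V)
theta, Ut = theta[::-1], (V @ Y)[:, ::-1]    # approximate pairs, decreasing order

errors, bounds = [], []
for i in range(k):                           # i = 0 corresponds to lambda_1
    u = U[:, i]
    Pu = V @ (V.T @ u)                       # P_K u_i
    prev = Ut[:, :i]                         # previous approximate eigenvectors
    Qu = prev @ (prev.T @ u)                 # Q~_i u_i (zero vector when i = 0)
    QPu = prev @ (prev.T @ Pu)               # Q~_i P_K u_i
    num = np.linalg.norm(Qu) ** 2 + np.linalg.norm(u - Pu) ** 2
    den = np.linalg.norm(Pu - QPu) ** 2
    bounds.append(np.linalg.norm(A - lam[i] * np.eye(n), 2) * num / den)
    errors.append(lam[i] - theta[i])

print(list(zip(errors, bounds)))
```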
We now turn our attention to the eigenvectors.
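Theorem 4.6 Let $\gamma = \|\mathcal{P}_{\mathcal{K}} A (I - \mathcal{P}_{\mathcal{K}})\|_2$, and consider any eigenvalue $\lambda$ of $A$ with associated eigenvector $u$. Let $\tilde\lambda$ be the approximate eigenvalue closest to $\lambda$ and $\delta$ the distance between $\lambda$ and the set of approximate eigenvalues other than $\tilde\lambda$. Then there exists an approximate eigenvector $\tilde u$ associated with $\tilde\lambda$ such that
$$\sin[\theta(u, \tilde u)] \le \sqrt{1 + \frac{\gamma^2}{\delta^2}}\ \sin[\theta(u, \mathcal{K})]. \tag{4.32}$$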
Proof.
[Figure 4.1: Projections of the eigenvector $u$ onto $\mathcal{K}$ and then onto $\tilde u$.]
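Let us define the two vectors
$$v = \frac{\mathcal{P}_{\mathcal{K}} u}{\|\mathcal{P}_{\mathcal{K}} u\|_2} \quad \text{and} \quad w = \frac{(I - \mathcal{P}_{\mathcal{K}})u}{\|(I - \mathcal{P}_{\mathcal{K}})u\|_2} \tag{4.33}$$
and denote by $\phi$ the angle between $u$ and $\mathcal{P}_{\mathcal{K}} u$, as defined by $\cos\phi = \|\mathcal{P}_{\mathcal{K}} u\|_2$. Then, clearly
$$u = v\cos\phi + w\sin\phi,$$
which, upon multiplying both sides by $(A - \lambda I)$, leads to
$$(A - \lambda I)v\cos\phi + (A - \lambda I)w\sin\phi = 0.$$
We now project both sides onto $\mathcal{K}$, and take the norms of the resulting vectors to obtain
$$\|\mathcal{P}_{\mathcal{K}}(A - \lambda I)v\|_2 \cos\phi = \|\mathcal{P}_{\mathcal{K}}(A - \lambda I)w\|_2 \sin\phi. \tag{4.34}$$
For the right-hand side note that
$$\|\mathcal{P}_{\mathcal{K}}(A - \lambda I)w\|_2 = \|\mathcal{P}_{\mathcal{K}}(A - \lambda I)(I - \mathcal{P}_{\mathcal{K}})w\|_2 = \|\mathcal{P}_{\mathcal{K}} A (I - \mathcal{P}_{\mathcal{K}})w\|_2 \le \gamma. \tag{4.35}$$
For the left-hand side, we decompose $v$ further as
$$v = \tilde u\cos\omega + z\sin\omega,$$
in which $\tilde u$ is a unit vector from the eigenspace associated with $\tilde\lambda$, $z$ is a unit vector in $\mathcal{K}$ that is orthogonal to $\tilde u$, and $\omega$ is the acute angle between $v$ and $\tilde u$. We then obtain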
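$$\begin{aligned}
\mathcal{P}_{\mathcal{K}}(A - \lambda I)v &= \mathcal{P}_{\mathcal{K}}(A - \lambda I)[\cos\omega\, \tilde u + \sin\omega\, z] \\
&= (\tilde\lambda - \lambda)\cos\omega\, \tilde u + \sin\omega\, \mathcal{P}_{\mathcal{K}}(A - \lambda I)z.
\end{aligned} \tag{4.36}$$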
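The eigenvalues of the restriction of $\mathcal{P}_{\mathcal{K}}(A - \lambda I)$ to the orthogonal complement of $\tilde u$ in $\mathcal{K}$ are $\tilde\lambda_j - \lambda$, for $j = 1, 2, \ldots, m$, and $\tilde\lambda_j \ne \tilde\lambda$. Therefore, since $z$ is orthogonal to $\tilde u$, we have
$$\|\mathcal{P}_{\mathcal{K}}(A - \lambda I)z\|_2 \ge \delta > 0. \tag{4.37}$$
The two vectors in the right-hand side of (4.36) are orthogonal and by (4.37),
$$\begin{aligned}
\|\mathcal{P}_{\mathcal{K}}(A - \lambda I)v\|_2^2 &= |\tilde\lambda - \lambda|^2 \cos^2\omega + \sin^2\omega\, \|\mathcal{P}_{\mathcal{K}}(A - \lambda I)z\|_2^2 \\
&\ge \delta^2 \sin^2\omega.
\end{aligned} \tag{4.38}$$
To complete the proof we refer to Figure 4.1. The projection of $u$ onto $\tilde u$ is the projection onto $\tilde u$ of the projection of $u$ onto $\mathcal{K}$. Its length is $\cos\phi\cos\omega$ and as a result the sine of the angle $\theta$ between $u$ and $\tilde u$ is given by
$$\begin{aligned}
\sin^2\theta &= 1 - \cos^2\phi\cos^2\omega \\
&= 1 - \cos^2\phi\,(1 - \sin^2\omega) \\
&= \sin^2\phi + \sin^2\omega\cos^2\phi.
\end{aligned} \tag{4.39}$$
Combining (4.34), (4.35), (4.38) we obtain that
$$\sin\omega\cos\phi \le \frac{\gamma}{\delta}\sin\phi,$$
which together with (4.39) yields the desired result.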
This is a rather remarkable result given that it is so general. It tells us among other things that the only condition we need in order to guarantee that a projection method will deliver a good approximation in the Hermitian case is that the angle between the exact eigenvector and the subspace $\mathcal{K}$ be sufficiently small.
As a consequence of the above result we can establish bounds on eigenvalues that are somewhat simpler than those of Theorem 4.5. This results from the following proposition.
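The eigenvector bound obtained by combining (4.34), (4.35), (4.38) and (4.39), namely $\sin\theta(u,\tilde u) \le \sqrt{1 + \gamma^2/\delta^2}\,\sin\phi$ with $\phi$ the angle between $u$ and the subspace, can be checked on an example. The sketch assumes a real symmetric matrix whose approximate eigenvalues are simple, so that the approximate eigenvector is unique up to sign; the test data and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 50, 8
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

lam_all, U = np.linalg.eigh(A)
lam, u = lam_all[-1], U[:, -1]               # target eigenpair, ||u||_2 = 1

S = np.column_stack([u + 0.05 * rng.standard_normal(n),
                     rng.standard_normal((n, m - 1))])
V, _ = np.linalg.qr(S)
P = V @ V.T

theta, Y = np.linalg.eigh(V.T @ A @ V)
j = np.argmin(np.abs(theta - lam))           # approximate eigenvalue closest to lam
u_tilde = V @ Y[:, j]
delta = np.min(np.abs(np.delete(theta, j) - lam))   # distance to the other ones
gamma = np.linalg.norm(P @ A @ (np.eye(n) - P), 2)

sin_phi = np.linalg.norm(u - P @ u)          # sine of the angle between u and K
# Sine of the angle between u and u~: norm of u minus its projection on u~.
sin_theta = np.linalg.norm(u - u_tilde * (u_tilde @ u))
bound = np.sqrt(1.0 + (gamma / delta) ** 2) * sin_phi
print(sin_phi, sin_theta, bound)
```

Since the approximate eigenvector lies in the subspace, the computed angle can never be smaller than the angle between the exact eigenvector and the subspace; the factor involving the gap $\delta$ measures how much larger it may be.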
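Proposition 4.5 The eigenvalues $\lambda$ and $\tilde\lambda$ in Theorem 4.6 are such that
$$|\lambda - \tilde\lambda| \le \|A - \lambda I\|_2 \sin^2\theta(u, \tilde u). \tag{4.40}$$
Proof. We start with the simple observation that $\tilde\lambda - \lambda = ((A - \lambda I)\tilde u, \tilde u)$. Letting $\alpha = (u, \tilde u) = \cos\theta(u, \tilde u)$ we can write
$$\tilde\lambda - \lambda = ((A - \lambda I)(\tilde u - \alpha u), \tilde u) = ((A - \lambda I)(\tilde u - \alpha u), \tilde u - \alpha u).$$
The result follows immediately by taking absolute values, exploiting the Cauchy-Schwarz inequality, and observing that $\|\tilde u - \alpha u\|_2 = \sin\theta(u, \tilde u)$.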
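For readers who want the intermediate steps of this proof spelled out, the following worked derivation is one way to fill them in. It assumes, as the proof implicitly does, that $u$ and $\tilde u$ have unit 2-norm and that $\tilde u$ is scaled by a unit phase so that $\alpha = (u, \tilde u)$ is real and nonnegative.

```latex
% Assumptions of this sketch: \|u\|_2 = \|\tilde u\|_2 = 1 and
% \alpha = (u, \tilde u) = (\tilde u, u) is real and nonnegative.
\begin{align*}
(A - \lambda I)u = 0
  &\;\Longrightarrow\;
  ((A - \lambda I)\tilde u, \tilde u)
  = ((A - \lambda I)(\tilde u - \alpha u), \tilde u), \\
A = A^H,\ \lambda \in \mathbb{R}
  &\;\Longrightarrow\;
  ((A - \lambda I)(\tilde u - \alpha u), \alpha u)
  = (\tilde u - \alpha u, \alpha (A - \lambda I) u) = 0, \\
\|\tilde u - \alpha u\|_2^2
  &= 1 - 2\alpha^2 + \alpha^2
  = 1 - \cos^2\theta(u, \tilde u)
  = \sin^2\theta(u, \tilde u).
\end{align*}
```

The first line justifies replacing $\tilde u$ by $\tilde u - \alpha u$ in the first argument, the second justifies doing the same in the second argument, and the third gives the norm used in the final Cauchy-Schwarz step.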