
2.9 Computation of Eigenvalues and Eigenvectors

2.9.1 The General Eigenvalue Problem

Almost all eigenvalue algorithms consist of two phases. The first is usually a preprocessing phase, which can be carried out in a finite number of steps and which transforms the original matrix into a more structured form while preserving some or most of its properties. It is used primarily to speed up the subsequent computation. The second phase, which would in theory require an infinite number of steps but is stopped once the result has converged to near machine accuracy, reveals the eigenvalues.

Before presenting the actual algorithms, we recall some standard results from eigenvalue perturbation theory.

Theorem 2.9.1 (Bauer-Fike). Let $A \in \mathrm{Mat}_m(\mathbb{C})$ be diagonalisable, let $\delta A \in \mathrm{Mat}_m(\mathbb{C})$ be arbitrary, and let $1 \le p \le \infty$. Furthermore, let $VDV^{-1}$ be an eigendecomposition of $A$, where $V \in \mathrm{GL}_m(\mathbb{C})$ and $D \in \mathrm{Mat}_m(\mathbb{C})$ is a diagonal matrix (compare Definition 2.5.18). If $\mu$ is an eigenvalue of $A + \delta A$, then there exists an eigenvalue $\nu$ of $A$ such that the inequality

$$|\nu - \mu| \le \kappa_p(V)\,\|\delta A\|_p = \|V\|_p\,\|V^{-1}\|_p\,\|\delta A\|_p$$

holds.

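Proof. Let $\mu \in \Lambda(A + \delta A)$. We first consider the case $\mu \in \Lambda(A)$. Here we can choose $\nu = \mu$, so the theorem is trivially true. From now on, let us assume that $\mu \notin \Lambda(A)$. This means that $\det(D - \mu I_m) \neq 0$ and $\det(A + \delta A - \mu I_m) = 0$. Then

$$0 = \det(A + \delta A - \mu I_m) = \det(V^{-1})\det(A + \delta A - \mu I_m)\det(V) = \det(D + V^{-1}\delta A V - \mu I_m) = \det(D - \mu I_m)\,\det\big((D - \mu I_m)^{-1}V^{-1}\delta A V + I_m\big).$$

Hence $\det\big((D - \mu I_m)^{-1}V^{-1}\delta A V + I_m\big) = 0$ must hold, so

$$-1 \in \Lambda\big((D - \mu I_m)^{-1}V^{-1}\delta A V\big).$$

By Theorem 2.3.65, we know that

$$1 \le \|(D - \mu I_m)^{-1}V^{-1}\delta A V\|_p \le \|(D - \mu I_m)^{-1}\|_p\,\|V^{-1}\|_p\,\|\delta A\|_p\,\|V\|_p = \|(D - \mu I_m)^{-1}\|_p\,\|\delta A\|_p\,\kappa_p(V).$$

Now, since $(D - \mu I_m)^{-1}$ is a diagonal matrix with the diagonal entries $(\nu - \mu)^{-1}$, $\nu \in \Lambda(A)$, it follows easily from the definition of the induced matrix norm $\|\cdot\|_p$ that

$$\|(D - \mu I_m)^{-1}\|_p = \max\big\{\|(D - \mu I_m)^{-1}x\|_p \;\big|\; x \in \mathbb{C}^m \text{ with } \|x\|_p = 1\big\} = \max_{\nu \in \Lambda(A)} \frac{1}{|\nu - \mu|} = \frac{1}{\min_{\nu \in \Lambda(A)} |\nu - \mu|}.$$

Therefore, we can conclude that $\min_{\nu \in \Lambda(A)} |\nu - \mu| \le \kappa_p(V)\,\|\delta A\|_p$, which proves the theorem.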

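Example 2.9.2. Let us consider again Example 2.7.13 and briefly recall the setting. The eigenvalues of the matrix

$$A = \begin{pmatrix} 1 & 1000 \\ 0.001 & 1 \end{pmatrix}$$

are $(0, 2)$ and the eigenvalues of the matrix

$$\tilde{A} = \begin{pmatrix} 1 & 1000 \\ 0 & 1 \end{pmatrix}$$

are $(1, 1)$.

We note that the matrix $A$ is diagonalisable (it has two distinct eigenvalues), and we can thus apply Theorem 2.9.1. Furthermore, $\tilde{A} = A + \delta A$ with

$$\delta A = \begin{pmatrix} 0 & 0 \\ -0.001 & 0 \end{pmatrix}.$$

We compute an eigendecomposition $VDV^{-1} = A$ of $A$; note that $\tilde{A}$ itself is not diagonalisable. Choosing unit eigenvectors as the columns of $V$ and $p = 2$, we obtain $\kappa_2(V) = 1000$ and $\|\delta A\|_2 = 0.001$. Hence, by the theorem of Bauer-Fike, for every eigenvalue $\mu$ of $\tilde{A}$ there exists an eigenvalue $\nu$ of $A$ such that $|\nu - \mu| \le 1$. This result is in line with the actual eigenvalues of $A$ and $\tilde{A}$; in fact, the bound is attained.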
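The numbers in Example 2.9.2 can be reproduced with a short NumPy sketch (our own illustration; the variable names are arbitrary). It takes unit-norm eigenvectors as the columns of $V$, evaluates the Bauer-Fike bound for $p = 2$ and compares it with the actual distances between the eigenvalues of $\tilde{A}$ and $A$.

```python
import numpy as np

# Matrices from Example 2.9.2: A is diagonalisable, A_tilde = A + dA is not.
A = np.array([[1.0, 1000.0], [0.001, 1.0]])
dA = np.array([[0.0, 0.0], [-0.001, 0.0]])
A_tilde = A + dA

# Eigendecomposition A = V D V^{-1}; numpy returns unit-norm eigenvectors as columns of V.
eigvals_A, V = np.linalg.eig(A)
kappa = np.linalg.cond(V, 2)            # kappa_2(V) = ||V||_2 ||V^{-1}||_2
bound = kappa * np.linalg.norm(dA, 2)   # Bauer-Fike bound for p = 2

# For every eigenvalue mu of A_tilde: distance to the nearest eigenvalue of A.
eigvals_At = np.linalg.eigvals(A_tilde)
dists = [np.min(np.abs(eigvals_A - mu)) for mu in eigvals_At]

print(np.sort(eigvals_A.real), kappa, bound, max(dists))
```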

Remark 2.9.3. We know by Theorem 2.5.22 that a normal matrix $A$ can be unitarily diagonalised. This means that in the theorem we can choose $V$ unitary, so that $\|V\|_2 = \|V^{-1}\|_2 = \kappa_2(V) = 1$, and consequently we obtain

$$|\nu - \mu| \le \|\delta A\|_2.$$
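A minimal NumPy check of Remark 2.9.3, assuming a random Hermitian (hence normal) matrix $A$ and a random, non-Hermitian perturbation $\delta A$ (seed, size and scaling are arbitrary choices of this sketch): every eigenvalue of $A + \delta A$ should lie within $\|\delta A\|_2$ of some eigenvalue of $A$.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 6
B = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
A = (B + B.conj().T) / 2   # Hermitian, hence normal and unitarily diagonalisable
# Arbitrary (non-Hermitian) perturbation of small norm
dA = 1e-3 * (rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)))

nu = np.linalg.eigvalsh(A)       # eigenvalues of A (real)
mu = np.linalg.eigvals(A + dA)   # eigenvalues of the perturbed matrix (complex in general)

# Largest distance from an eigenvalue of A + dA to the nearest eigenvalue of A
worst = max(np.min(np.abs(nu - mu_j)) for mu_j in mu)
norm_dA = np.linalg.norm(dA, 2)
print(f"max distance = {worst:.2e}, ||dA||_2 = {norm_dA:.2e}")
```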

Following [6, Subsection 7.2.4], we now present a theorem concerning the sensitivity of eigenspaces and eigenvectors to perturbations of the input data. This result will play an important role in Chapter 4, when we analyse the stability of the computed solutions of the ABM and the extended ABM algorithm.

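Definition 2.9.4. Let $A \in \mathrm{Mat}_m(\mathbb{C})$ and $B \in \mathrm{Mat}_n(\mathbb{C})$. Then we define the separation between both matrices as

$$\operatorname{sep}(A, B) = \min_{X \in \mathrm{Mat}_{m,n}(\mathbb{C}) \setminus \{0_{m,n}\}} \frac{\|AX - XB\|_F}{\|X\|_F}.$$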
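For small matrices, $\operatorname{sep}(A, B)$ can be evaluated directly. Using the column-major vectorisation identity $\operatorname{vec}(AX - XB) = (I_n \otimes A - B^T \otimes I_m)\operatorname{vec}(X)$ together with $\|X\|_F = \|\operatorname{vec}(X)\|_2$, the separation is the smallest singular value of an $mn \times mn$ matrix. The following sketch (the function name `sep` is our own) assumes NumPy and is only practical for small $m$ and $n$, since the Kronecker matrix grows quadratically.

```python
import numpy as np


def sep(A: np.ndarray, B: np.ndarray) -> float:
    """sep(A, B) = min over X != 0 of ||AX - XB||_F / ||X||_F (Definition 2.9.4).

    Uses vec(AX - XB) = (I_n kron A - B^T kron I_m) vec(X) with column-major vec,
    so sep(A, B) is the smallest singular value of this mn x mn matrix.
    """
    m, n = A.shape[0], B.shape[0]
    K = np.kron(np.eye(n), A) - np.kron(B.T, np.eye(m))
    return float(np.linalg.svd(K, compute_uv=False).min())


# Diagonal case: sep equals the smallest distance between eigenvalues, here |2 - 5| = 3
print(sep(np.diag([1.0, 2.0]), np.diag([5.0, 7.0])))
```

In particular, $\operatorname{sep}(A, B) = 0$ exactly when $A$ and $B$ share an eigenvalue, which is why the theorem below requires $\operatorname{sep}(T_{11}, T_{22}) > 0$.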

Definition 2.9.5. Let $S_1$ and $S_2$ be linear subspaces of $\mathbb{C}^m$ such that $\dim(S_1) = \dim(S_2)$. Let furthermore $P_1$ be the orthogonal projector onto $S_1$ and let $P_2$ be the orthogonal projector onto $S_2$. Then we let

$$\operatorname{dist}(S_1, S_2) = \|P_1 - P_2\|_2$$

and call it the distance between $S_1$ and $S_2$.
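A direct NumPy transcription of Definition 2.9.5, as a sketch: the hypothetical helper `subspace_dist` takes matrices whose columns span $S_1$ and $S_2$ (assumed to have full column rank), builds the orthogonal projectors from reduced QR factorisations and returns the spectral norm of their difference. For two lines in $\mathbb{R}^2$ enclosing the angle $\theta$, the distance equals $\sin\theta$.

```python
import numpy as np


def subspace_dist(S1: np.ndarray, S2: np.ndarray) -> float:
    """dist(S_1, S_2) = ||P_1 - P_2||_2 (Definition 2.9.5).

    S1 and S2 are m x k matrices with full column rank k whose columns span the subspaces.
    """
    Q1, _ = np.linalg.qr(S1)    # orthonormal basis of S_1
    Q2, _ = np.linalg.qr(S2)    # orthonormal basis of S_2
    P1 = Q1 @ Q1.conj().T       # orthogonal projector onto S_1
    P2 = Q2 @ Q2.conj().T       # orthogonal projector onto S_2
    return float(np.linalg.norm(P1 - P2, 2))


theta = np.pi / 6
line1 = np.array([[1.0], [0.0]])
line2 = np.array([[np.cos(theta)], [np.sin(theta)]])
print(subspace_dist(line1, line2))  # sin(pi/6), i.e. approximately 0.5
```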

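Theorem 2.9.6. Let $A \in \mathrm{Mat}_m(\mathbb{C})$ and let

$$Q^*AQ = \begin{pmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{pmatrix}$$

be a Schur decomposition of $A$ with $Q = \begin{pmatrix} Q_1 & Q_2 \end{pmatrix}$ (compare Definition 2.5.9). The involved matrices have the following dimensions: $Q_1 \in \mathrm{Mat}_{m,r}(\mathbb{C})$, $Q_2 \in \mathrm{Mat}_{m,m-r}(\mathbb{C})$, $T_{11} \in \mathrm{Mat}_{r,r}(\mathbb{C})$, $T_{12} \in \mathrm{Mat}_{r,m-r}(\mathbb{C})$, $T_{22} \in \mathrm{Mat}_{m-r,m-r}(\mathbb{C})$. Now let $\delta A \in \mathrm{Mat}_m(\mathbb{C})$ be an arbitrary matrix, which we partition via $Q$ in the same way as $A$, such that we obtain

$$Q^*\delta A Q = \begin{pmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix}.$$

If $\operatorname{sep}(T_{11}, T_{22}) > 0$ and

$$\|\delta A\|_2 \left(1 + \frac{5\,\|T_{12}\|_F}{\operatorname{sep}(T_{11}, T_{22})}\right) \le \frac{\operatorname{sep}(T_{11}, T_{22})}{5},$$

then there exists $P \in \mathrm{Mat}_{m-r,r}(\mathbb{C})$ with

$$\|P\|_2 \le \frac{4\,\|E_{21}\|_2}{\operatorname{sep}(T_{11}, T_{22})}$$

such that the columns of $\hat{Q}_1 = (Q_1 + Q_2P)(I_r + P^*P)^{-1/2}$ form an orthonormal basis for an invariant subspace of $A + \delta A$. Additionally,

$$\operatorname{dist}\big(\operatorname{im}(Q_1), \operatorname{im}(\hat{Q}_1)\big) \le \frac{4\,\|E_{21}\|_2}{\operatorname{sep}(T_{11}, T_{22})}$$

holds. Note that the matrix $(I_r + P^*P)^{-1/2}$ is the inverse of the square root of the Hermitian positive definite matrix $I_r + P^*P$ (see [9, Subsection 4.2.10]).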

Proof. Compare [6, Theorem 7.2.4 and Corollary 7.2.5].
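The orthonormality of the columns of $\hat{Q}_1$ holds for any $P$: since $Q_1^*Q_2 = 0$, we have $(Q_1 + Q_2P)^*(Q_1 + Q_2P) = I_r + P^*P$. The small NumPy sketch below (random data and sizes are our own choices) checks this, computing the inverse square root of $I_r + P^*P$ from its eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(1)
m, r = 5, 2

# Random unitary Q = [Q1 Q2] and an arbitrary P in Mat_{m-r,r}(C)
Z = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
Q, _ = np.linalg.qr(Z)
Q1, Q2 = Q[:, :r], Q[:, r:]
P = rng.standard_normal((m - r, r)) + 1j * rng.standard_normal((m - r, r))

# (I_r + P^*P)^{-1/2} via the eigendecomposition of this Hermitian positive definite matrix
M = np.eye(r) + P.conj().T @ P
w, U = np.linalg.eigh(M)
M_inv_sqrt = U @ np.diag(w ** -0.5) @ U.conj().T

Q1_hat = (Q1 + Q2 @ P) @ M_inv_sqrt
print(np.allclose(Q1_hat.conj().T @ Q1_hat, np.eye(r)))  # True: orthonormal columns
```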

In case a subspace is one-dimensional, it is possible to give the following more specialised result.

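Corollary 2.9.7. If we let $r = 1$ and $T_{11} = \lambda$ in the setting of Theorem 2.9.6, we obtain the inequality

$$\operatorname{dist}\big(\operatorname{im}(q_1), \operatorname{im}(\hat{q}_1)\big) \le \frac{4\,\|E_{21}\|_2}{\sigma_{\min}(T_{22} - \lambda I_{m-1})}.$$

Proof. The claim follows from Theorem 2.9.6 together with the observation that, for row vectors $x \in \mathrm{Mat}_{1,m-1}(\mathbb{C})$,

$$\operatorname{sep}(T_{11}, T_{22}) = \min_{x \neq 0_{m-1}} \frac{\|T_{11}x - xT_{22}\|_F}{\|x\|_F} = \min_{x \neq 0_{m-1}} \frac{\|x(\lambda I_{m-1} - T_{22})\|_F}{\|x\|_F} = \min_{x \neq 0_{m-1}} \frac{\|x(T_{22} - \lambda I_{m-1})\|_F}{\|x\|_F}.$$

As $x$ is a vector, its Frobenius norm coincides with its Euclidean norm, and by homogeneity we can assume w.l.o.g. that $\|x\|_F = 1$, so we obtain

$$\operatorname{sep}(T_{11}, T_{22}) = \min_{\|x\|_2 = 1} \|x(T_{22} - \lambda I_{m-1})\|_2.$$

If $x^*$ is a left singular vector of $T_{22} - \lambda I_{m-1}$ associated with a minimal singular value, the expression $\|x(T_{22} - \lambda I_{m-1})\|_2$ becomes minimal and we arrive at

$$\operatorname{sep}(T_{11}, T_{22}) = \sigma_{\min}(T_{22} - \lambda I_{m-1}).$$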
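The following NumPy sketch illustrates Corollary 2.9.7 on a prescribed Schur form (the matrix $T$, the seed and the perturbation size $10^{-6}$ are our own choices). It computes $\operatorname{sep}(\lambda, T_{22})$ both from Definition 2.9.4 in Kronecker form and as $\sigma_{\min}(T_{22} - \lambda I_{m-1})$, checks the hypothesis of Theorem 2.9.6, and compares the bound with the actual distance between the eigenvector spans before and after the perturbation.

```python
import numpy as np

rng = np.random.default_rng(2)
# Prescribed Schur form T with T11 = lambda = 5 and T22 having eigenvalues 1, 2, 3
T = np.array([[5.0, 1.0, 0.5, 0.2],
              [0.0, 1.0, 1.0, 0.3],
              [0.0, 0.0, 2.0, 1.0],
              [0.0, 0.0, 0.0, 3.0]])
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # real orthogonal Q = [q1 Q2]
A = Q @ T @ Q.T                                    # so Q^T A Q = T is a Schur decomposition
lam, T12, T22 = T[0, 0], T[0:1, 1:], T[1:, 1:]
q1, Q2 = Q[:, 0], Q[:, 1:]

# sep(lambda, T22): Kronecker form of Definition 2.9.4 versus sigma_min from Corollary 2.9.7
K = np.kron(np.eye(3), np.array([[lam]])) - np.kron(T22.T, np.eye(1))
sep_def = np.linalg.svd(K, compute_uv=False).min()
sep_cor = np.linalg.svd(T22 - lam * np.eye(3), compute_uv=False).min()

# Small perturbation: check the hypothesis of Theorem 2.9.6, then evaluate the bound
dA = 1e-6 * rng.standard_normal((4, 4))
E21 = Q2.T @ dA @ q1
assert np.linalg.norm(dA, 2) * (1 + 5 * np.linalg.norm(T12) / sep_cor) <= sep_cor / 5
bound = 4 * np.linalg.norm(E21) / sep_cor

# Eigenvector of A + dA belonging to the eigenvalue closest to lambda
w, X = np.linalg.eig(A + dA)
q1_hat = X[:, np.argmin(np.abs(w - lam))]
q1_hat = q1_hat / np.linalg.norm(q1_hat)
actual = np.linalg.norm(np.outer(q1, q1) - np.outer(q1_hat, q1_hat.conj()), 2)
print(sep_def, sep_cor, actual, bound)
```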

This result shows that the stability of computing eigenspaces and eigenvectors with respect to perturbations in the input data depends mainly on the separation $\operatorname{sep}(T_{11}, T_{22})$ of the diagonal blocks of the Schur form, together with the norm of the perturbation.

We will not discuss in detail the algorithms which can be used to compute the eigenvalues and/or eigenvectors of a general matrix. Of course, they can also be applied to Hermitian matrices; however, more efficient algorithms exist for this special case. Today, the most widely used algorithm for computing all eigenvalues (and eigenvectors) of a dense unstructured matrix is the QR algorithm; if the input matrix is Hermitian, it is the Divide and Conquer algorithm.

The QR Algorithm

The basic idea behind the QR algorithm is to compute the QR decomposition $A = QR$ of a matrix and to multiply the factors in reverse order, $RQ$. Since $RQ = Q^*(QR)Q = Q^*AQ$, the product $RQ$ is unitarily similar to $A$, and under suitable assumptions (see Theorem 2.9.10) repeating this step makes the entries below the diagonal decrease in magnitude. The procedure is repeated until the entries below the diagonal are smaller than a specified tolerance; in this way, a Schur decomposition of the matrix is computed. To accelerate the process, the input matrix is first transformed via similarity transformations to so-called upper Hessenberg form, which is as close to upper triangular form as can be achieved with a finite number of computation steps.

Definition 2.9.8. [Hessenberg form]

A square matrix $A \in \mathrm{Mat}_m(\mathbb{C})$ is said to be in upper Hessenberg form if it has only zero entries below the first subdiagonal. Analogously, we say a square matrix $A \in \mathrm{Mat}_m(\mathbb{C})$ is in lower Hessenberg form if it has only zero entries above the first superdiagonal.
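Both structural conditions of Definition 2.9.8 are easy to test with NumPy's triangular masks; the helper names and the optional tolerance below are our own.

```python
import numpy as np


def is_upper_hessenberg(A: np.ndarray, tol: float = 0.0) -> bool:
    """True if all entries below the first subdiagonal are (numerically) zero."""
    return bool(np.all(np.abs(np.tril(A, -2)) <= tol))


def is_lower_hessenberg(A: np.ndarray, tol: float = 0.0) -> bool:
    """True if all entries above the first superdiagonal are (numerically) zero."""
    return bool(np.all(np.abs(np.triu(A, 2)) <= tol))


H = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [0, 9, 1, 2],
              [0, 0, 3, 4]])
print(is_upper_hessenberg(H), is_lower_hessenberg(H), is_lower_hessenberg(H.T))  # True False True
```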

Algorithm 7: Householder Hessenberg reduction
Input: A square matrix $A \in \mathrm{Mat}_m(\mathbb{C})$
Output: A matrix in upper Hessenberg form which is unitarily similar to $A$, and the basic reflectors
1 for $i := 1$ to $m-2$ do
      // Householder reflectors are computed via Algorithm 3
2     $v_i := \mathrm{householder}(A_{i+1:m,\,i})$;
3     if $v_i \neq 0_{m-i}$ then
4         $A_{i+1:m,\,i:m} := \left(I_{m-i} - 2\frac{v_i v_i^*}{v_i^* v_i}\right) A_{i+1:m,\,i:m}$;
5         $A_{1:m,\,i+1:m} := A_{1:m,\,i+1:m} \left(I_{m-i} - 2\frac{v_i v_i^*}{v_i^* v_i}\right)$;
6     end
7 end
8 return $(A, v_1, \dots, v_{m-2})$;
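A minimal NumPy sketch of Algorithm 7 follows, using 0-based indices. Algorithm 3 is not reproduced here, so the helper `householder` is a hypothetical stand-in that uses the common choice $v = x + e^{\mathrm{i}\arg(x_1)}\|x\|_2 e_1$; the original Algorithm 3 may differ in details such as scaling. Both reflector applications are carried out as rank-one updates instead of forming $I - 2vv^*/(v^*v)$ explicitly.

```python
import numpy as np


def householder(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for Algorithm 3.

    Returns v such that (I - 2 v v^* / (v^* v)) x is a multiple of e_1,
    using v = x + exp(i*arg(x_1)) * ||x|| * e_1; returns 0 if x = 0.
    """
    v = np.array(x, dtype=complex)
    norm_x = np.linalg.norm(v)
    if norm_x == 0.0:
        return v
    phase = v[0] / abs(v[0]) if v[0] != 0 else 1.0
    v[0] += phase * norm_x
    return v


def hessenberg_reduction(A: np.ndarray):
    """Unitary similarity reduction to upper Hessenberg form (Algorithm 7, 0-based)."""
    A = np.array(A, dtype=complex)
    m = A.shape[0]
    vs = []
    for i in range(m - 2):
        v = householder(A[i + 1:, i])                   # line 2
        vs.append(v)
        if np.any(v != 0):                              # line 3
            beta = 2.0 / np.vdot(v, v).real
            # line 4: reflector from the left on rows i+1, ..., m-1
            A[i + 1:, i:] -= beta * np.outer(v, v.conj() @ A[i + 1:, i:])
            # line 5: same reflector from the right on columns i+1, ..., m-1
            A[:, i + 1:] -= beta * np.outer(A[:, i + 1:] @ v, v.conj())
    return A, vs


rng = np.random.default_rng(0)
A0 = rng.standard_normal((6, 6))
H, _ = hessenberg_reduction(A0)
print(np.max(np.abs(np.tril(H, -2))))  # of the order of machine precision
```

Applying each reflector as a rank-one update keeps the cost of one step at $O(m^2)$, so the whole reduction costs $O(m^3)$.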

Theorem 2.9.9. Given a matrix $A \in \mathrm{Mat}_m(\mathbb{C})$, Algorithm 7 returns a matrix in upper Hessenberg form which is unitarily similar to $A$. Additionally, the algorithm is backward stable.

Proof. This follows directly from the properties of the Householder reflectors, which are constructed in such a way that they introduce zeros below the first subdiagonal when applied to $A$ from the left in line 4. The application of the same reflector from the right in line 5, which makes sure that the transformation is in fact a similarity transformation, only combines the columns $i+1, \dots, m$ and therefore leaves the zeros already introduced untouched. The algorithm is backward stable because all applied similarity transformations are unitary.

Now that we know how to transform a given matrix to upper Hessenberg form it is possible to state a basic version of the QR algorithm.

Algorithm 8: Basic QR algorithm
Input: A square matrix $A \in \mathrm{Mat}_m(\mathbb{C})$, error tolerance $\varepsilon \in \mathbb{R}^+$
Output: The $m$ eigenvalues of $A$
1 $[H, U] := \mathrm{HessenbergReduction}(A)$ (e.g. via Algorithm 7);
2 while $\sum_{i=1}^{m} \sum_{j=i+1}^{m} |H_{j,i}| > \varepsilon$ do
3     $[Q, R] := \mathrm{QRDecomposition}(H)$ (e.g. via Algorithm 4);
4     $H := RQ$;
5 end
6 return $(H_{1,1}, \dots, H_{m,m})$;
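The basic QR algorithm can be sketched in NumPy as follows. As assumptions of this sketch, the Hessenberg reduction in line 1 is delegated to `scipy.linalg.hessenberg` and the QR decomposition in line 3 to `numpy.linalg.qr` (instead of Algorithms 7 and 4), and an iteration limit `max_iter` guards against non-convergence. The accumulated matrix `Z` corresponds to $UQ$ in Theorem 2.9.10.

```python
import numpy as np
from scipy.linalg import hessenberg


def basic_qr(A: np.ndarray, eps: float = 1e-10, max_iter: int = 10_000):
    """Unshifted QR iteration following Algorithm 8.

    Returns the diagonal of the final H (eigenvalue estimates), H itself and the
    accumulated unitary matrix Z with A = Z H Z^*.
    """
    H, Z = hessenberg(np.asarray(A, dtype=complex), calc_q=True)  # line 1: A = Z H Z^*
    for _ in range(max_iter):
        if np.abs(np.tril(H, -1)).sum() <= eps:  # line 2: stopping criterion
            break
        Q, R = np.linalg.qr(H)                   # line 3
        H = R @ Q                                # line 4: H <- Q^* H Q
        Z = Z @ Q                                # accumulate the similarity transformations
    else:
        raise RuntimeError("QR iteration did not converge within max_iter steps")
    return np.diag(H).copy(), H, Z


rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = V @ np.diag([4.0, 2.0, 1.0, -0.5]) @ V.T  # eigenvalues with distinct absolute values
eigs, H, Z = basic_qr(A)
print(np.sort(eigs.real))
```

The test matrix deliberately satisfies the assumption of Theorem 2.9.10; with eigenvalues of equal absolute value (e.g. a complex conjugate pair of a real matrix) this unshifted version would stall and raise the error.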

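Theorem 2.9.10. Let $A \in \mathrm{Mat}_m(\mathbb{C})$ be such that $A$ has no two distinct eigenvalues with equal absolute value. The Basic QR algorithm computes a Schur decomposition of $A$ such that $A = (UQ)H(UQ)^*$, where $U \in \mathrm{Mat}_m(\mathbb{C})$ is the unitary matrix computed in line 1, $Q \in \mathrm{Mat}_m(\mathbb{C})$ is the unitary product of all factors $Q$ computed in line 3, and $H \in \mathrm{Mat}_m(\mathbb{C})$ is (up to the tolerance $\varepsilon$) an upper triangular matrix. Thus $H$ is unitarily similar to $A$ and reveals the eigenvalues of $A$ on its diagonal.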

Proof. We will only sketch why this algorithm produces a sequence of matrices $H_i$ which essentially converges (i.e. the diagonal entries converge, while the entries above the diagonal may differ by factors of modulus one in each iteration) to an upper triangular matrix which is similar to $A$. A full proof can be found in [13, Section 11]. First, the matrix $A$ is transformed into Hessenberg form; the resulting matrix $H$ is unitarily similar to $A$. The central steps of the algorithm are computing the QR decomposition of $H$ and then multiplying both factors in reverse order. Each computed matrix $H$ is unitarily similar to $A$, as $H_i = R_{i-1}Q_{i-1} = Q_{i-1}^* H_{i-1} Q_{i-1}$, with $H_i$, $Q_i$, and $R_i$ denoting the values of $H$, $Q$, and $R$ during the $i$-th iteration of the while loop.

Theorem 2.9.11. The Basic QR algorithm is backward stable.

Proof. The claim follows essentially from the fact that only unitary transformations are used to compute the Hessenberg form of the input matrix, followed by a sequence of unitary similarity transformations to compute the solution. A more detailed analysis can, for instance, be found in [6, Subsection 7.5.6].

In practice, more advanced versions of the algorithm are used, which achieve faster convergence by applying shifts during each step of the computation and by computing the QR decomposition only implicitly in each iteration step. One such algorithm, the Francis QR algorithm, is discussed and analysed in [6, Subsection 7.5.6]. The algorithm requires about $10m^3$ flops if only the eigenvalues are desired and the unitary transformations are not accumulated; otherwise it requires about $25m^3$ flops. Additionally, state-of-the-art implementations of the QR algorithm (e.g. the one in LAPACK [17]) converge to a Schur decomposition for essentially all input matrices $A \in \mathrm{Mat}_m(\mathbb{C})$.
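In practice, one calls a library routine instead. As an illustration, SciPy's `scipy.linalg.schur` wraps LAPACK's Schur decomposition drivers (`?gees`), which to our understanding are based on an implicitly shifted QR algorithm; the sketch below only checks the properties of the returned complex Schur form on a random matrix (seed and size are arbitrary).

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

# Complex Schur decomposition A = Z T Z^* computed by LAPACK via scipy.linalg.schur
T, Z = schur(A, output="complex")

print(np.allclose(Z @ T @ Z.conj().T, A))   # reconstruction of A
print(np.allclose(np.tril(T, -1), 0))       # T is upper triangular
print(np.diag(T))                           # eigenvalues of A on the diagonal of T
```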

Power Iteration

The technique of power iteration is rarely applied directly, but the ideas underlying it form the basis for more advanced techniques such as inverse iteration. It computes an eigenvector associated with the eigenvalue of largest absolute value of a matrix $A$.

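Let us denote the eigenvalues of a matrix $A \in \mathrm{Mat}_m(\mathbb{C})$ by $\lambda_1, \dots, \lambda_m$ and let us further assume without loss of generality that $|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_m|$. The algorithm only works properly if $|\lambda_1| > |\lambda_2|$, and it converges quickly only if $|\lambda_1|$ is considerably larger than $|\lambda_2|$.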

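Algorithm 9: Power Iteration
Input: A matrix $A \in \mathrm{Mat}_m(\mathbb{C})$, $n \in \mathbb{N}$
Output: An eigenvector estimate corresponding to the eigenvalue $\lambda_1$ of $A$
1 $v^{(0)} :=$ random vector in $\mathbb{C}^m \setminus \{0_m\}$;
2 $v^{(0)} := v^{(0)} / \|v^{(0)}\|$;
3 for $i := 1$ to $n$ do
4     $\hat{v}^{(i)} := A v^{(i-1)}$;
5     $v^{(i)} := \hat{v}^{(i)} / \|\hat{v}^{(i)}\|$;
6 end
7 return $v^{(n)}$;

Theorem 2.9.12. Let $|\lambda_1| > |\lambda_2|$. If the initial guess $v^{(0)}$ has a component in the direction of $q_1$, where $q_1$ is a unit eigenvector associated with $\lambda_1$, then this algorithm produces a sequence $v^{(i)}$ of eigenvector estimates for $q_1$. Up to factors of modulus one, the iterates $v^{(i)}$ converge linearly to $q_1$, with a rate of convergence determined by the ratio $|\lambda_2/\lambda_1|$.

Proof. We only present the proof for the case that $A$ is diagonalisable, as this is the situation most relevant to us. A proof for general matrices can be found in [16, Theorem 4.1]. Let $q_1, \dots, q_m$ be a basis of $\mathbb{C}^m$ consisting of unit eigenvectors of $A$ with corresponding eigenvalues $\lambda_1, \dots, \lambda_m$, where $|\lambda_1| > |\lambda_2| \ge \dots \ge |\lambda_m|$. Then we can express $v^{(0)}$ as a linear combination of these basis vectors such that

$$v^{(0)} = \sum_{k=1}^{m} c_k q_k$$

with $c_k \in \mathbb{C}$ and, by assumption, $c_1 \neq 0$. Now, for some constant $n_i > 0$ which arises because of the normalisation in each iteration, we obtain

$$v^{(i)} = n_i A^i v^{(0)} = n_i \sum_{k=1}^{m} c_k \lambda_k^i q_k = n_i \lambda_1^i \left(c_1 q_1 + \sum_{k=2}^{m} c_k \left(\frac{\lambda_k}{\lambda_1}\right)^i q_k\right).$$

A direct consequence of this equation is that our eigenvector estimate converges linearly to a multiple of $q_1$, with a rate depending on the ratio $|\lambda_2/\lambda_1|$.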

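Remark 2.9.13. For sufficiently large values of $i$ we obtain

$$v^{(i)} - e^{\mathrm{i}\theta_i} q_1 \in O\left(\left|\frac{\lambda_2}{\lambda_1}\right|^i\right),$$

where, if the unit vector $q_1$ is chosen such that $c_1 > 0$, we have $e^{\mathrm{i}\theta_i} = \left(\frac{\lambda_1}{|\lambda_1|}\right)^i$. For instance, if $\lambda_1$ is real and positive, this means that $v^{(i)}$ converges to $q_1$.

Remark 2.9.14. If $A \in \mathrm{Mat}_m(\mathbb{R})$, the condition $|\lambda_1| > |\lambda_2|$ implies that $\lambda_1 \in \mathbb{R}$.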

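Proof. Let $A \in \mathrm{Mat}_m(\mathbb{R})$ and let us assume that $\lambda_1 \in \mathbb{C} \setminus \mathbb{R}$ is a non-real eigenvalue of $A$. Since the characteristic polynomial of $A$ has real coefficients, $\overline{\lambda_1} \neq \lambda_1$ is then another eigenvalue of $A$ with $|\overline{\lambda_1}| = |\lambda_1|$. Because of the ordering of the eigenvalues, this implies $|\lambda_2| = |\lambda_1|$, which contradicts our assumption.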

More details about the algorithm and its applications can, for example, be found in [5, Lecture 27] or in [6, Section 8.2.1].
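Algorithm 9 translates almost line by line into NumPy. The sketch below (function name, test matrix and seeds are our own choices) uses a symmetric matrix with $\lambda_1 = 10$ and $|\lambda_2/\lambda_1| = 0.5$, so 100 iterations are more than enough; convergence is measured by $|q_1^* v^{(n)}|$, which ignores the unimodular factor $e^{\mathrm{i}\theta_i}$ from Remark 2.9.13.

```python
import numpy as np


def power_iteration(A: np.ndarray, n: int, rng=None) -> np.ndarray:
    """Power iteration following Algorithm 9: estimate of the eigenvector for lambda_1."""
    rng = np.random.default_rng() if rng is None else rng
    m = A.shape[0]
    v = rng.standard_normal(m) + 1j * rng.standard_normal(m)  # line 1: random start vector
    v /= np.linalg.norm(v)                                     # line 2
    for _ in range(n):
        w = A @ v                                              # line 4
        v = w / np.linalg.norm(w)                              # line 5: normalise
    return v


rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q @ np.diag([10.0, 5.0, 2.0, 1.0, 0.5]) @ Q.T  # lambda_1 = 10, |lambda_2 / lambda_1| = 0.5
v = power_iteration(A, n=100, rng=np.random.default_rng(1))
q1 = Q[:, 0]
print(abs(np.vdot(q1, v)))      # close to 1: v equals q1 up to a unimodular factor
print(np.vdot(v, A @ v).real)   # Rayleigh quotient, close to lambda_1 = 10
```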

Inverse Iteration

Power iteration has two significant shortcomings. First, it is only capable of finding the eigenvector corresponding to the eigenvalue of largest absolute value; second, its convergence rate largely depends on the ratio $|\lambda_2/\lambda_1|$. We will now discuss how Algorithm 9 can be modified to provide an effective method to determine the eigenvectors of a matrix if good estimates of the eigenvalues are known in advance. This method is particularly useful if only a subset of the eigenvectors is needed, for instance, the eigenvector corresponding to the smallest eigenvalue.

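Let $A \in \mathrm{Mat}_m(\mathbb{C})$ be nonsingular and let $\lambda \in \mathbb{C}$ be an eigenvalue estimate for $\tilde{\lambda} \in \Lambda(A)$ such that $\lambda \notin \Lambda(A)$. First, we observe that $\det(A - \lambda I_m) \neq 0$, which implies that the matrix $A - \lambda I_m$ is invertible. Then the eigenvectors of $A$ associated with $\tilde{\lambda}$ are the same as the eigenvectors of $(A - \lambda I_m)^{-1}$ which correspond to the eigenvalue $(\tilde{\lambda} - \lambda)^{-1}$ of $(A - \lambda I_m)^{-1}$. In order to prove this claim, let $x \in \mathbb{C}^m$ be an eigenvector of $A$ which is associated with the eigenvalue $\tilde{\lambda}$. Then the equation $Ax = \tilde{\lambda}x$ holds and additionally

$$(A - \lambda I_m)^{-1}(A - \lambda I_m)x = x \iff (A - \lambda I_m)^{-1}(\tilde{\lambda} - \lambda)x = x \iff (A - \lambda I_m)^{-1}x = (\tilde{\lambda} - \lambda)^{-1}x.$$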
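This spectral mapping can be checked numerically. In the small sketch below (test matrix and shift are our own choices), $x$ is an eigenvector of $A$ for $\tilde{\lambda} = 2$, the shift is $\lambda = 2.1$, and we solve a linear system rather than forming $(A - \lambda I_m)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag([1.0, 2.0, 3.0, 10.0]) @ Q.T
lam = 2.1                                      # estimate of the eigenvalue lambda~ = 2

x = Q[:, 1]                                    # eigenvector of A for lambda~ = 2
y = np.linalg.solve(A - lam * np.eye(4), x)    # y = (A - lam I)^{-1} x, inverse never formed
print(np.allclose(y, x / (2.0 - lam)))         # True: eigenvalue (lambda~ - lam)^{-1} = -10
```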

As we know, the rate of convergence of power iteration is about $|\lambda_2/\lambda_1|$. Let $\tilde{\lambda}$ be the eigenvalue of $A$ closest to $\lambda$ and let $\lambda'$ be the second closest one. If $\lambda$ is a good estimate of $\tilde{\lambda}$, then $(\tilde{\lambda} - \lambda)^{-1}$ is by far the eigenvalue of largest absolute value of $(A - \lambda I_m)^{-1}$: the relevant ratio becomes $|\tilde{\lambda} - \lambda| / |\lambda' - \lambda|$, which is small, and this accelerates convergence considerably. So the essential idea behind inverse iteration is to apply power iteration to the inverse of $A - \lambda I_m$. Note that the matrix $(A - \lambda I_m)^{-1}$ is not computed explicitly. Instead, we solve a system of linear equations in each round, which is cheaper. The matrix $A - \lambda I_m$ becomes more ill-conditioned the closer $\lambda$ gets to an exact eigenvalue of $A$. Fortunately, though, the resulting error has a dominant component in the direction of the true eigenvector. A more detailed explanation of this behaviour can be found in [5, Lecture 27 and Algorithm 27.2].

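Algorithm 10: Inverse Iteration
Input: A matrix $A \in \mathrm{Mat}_m(\mathbb{C})$, an eigenvalue estimate $\lambda \in \mathbb{C}$ of $A$, $n \in \mathbb{N}$
Output: An eigenvector estimate corresponding to the eigenvalue of $A$ closest to $\lambda$
1 $v^{(0)} :=$ random vector in $\mathbb{C}^m \setminus \{0_m\}$;
2 $v^{(0)} := v^{(0)} / \|v^{(0)}\|$;
3 for $i := 1$ to $n$ do
4     Solve $(A - \lambda I_m)\hat{v}^{(i)} = v^{(i-1)}$ for $\hat{v}^{(i)}$;
5     $v^{(i)} := \hat{v}^{(i)} / \|\hat{v}^{(i)}\|$;
6 end
7 return $v^{(n)}$;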

In practice, the input matrix is first transformed to Hessenberg form, for example via Algorithm 7. As this is a common preprocessing technique that is also used in the QR algorithm, the Hessenberg form is usually readily available at no additional cost.

If the input matrix is in Hessenberg form, the cost of inverse iteration is in $O(m^2)$ per eigenvector (for a small, fixed number $n$ of iterations), which is essentially the cost of solving the system of linear equations in line 4 (compare [6, Section 7.6.1]).
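A minimal sketch of Algorithm 10, assuming SciPy is available: $A - \lambda I_m$ is LU-factorised once with `scipy.linalg.lu_factor`, and each iteration only performs the triangular solves of `scipy.linalg.lu_solve`. Note that this dense factorisation costs $O(m^3)$ and does not exploit Hessenberg structure; a Hessenberg-specific solver would be needed to reach the $O(m^2)$ cost stated above. Function name, test matrix and seeds are our own choices.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve


def inverse_iteration(A: np.ndarray, lam: complex, n: int, rng=None) -> np.ndarray:
    """Inverse iteration following Algorithm 10 (dense LU factorisation computed once)."""
    rng = np.random.default_rng() if rng is None else rng
    m = A.shape[0]
    lu_piv = lu_factor(np.asarray(A, dtype=complex) - lam * np.eye(m))
    v = rng.standard_normal(m) + 1j * rng.standard_normal(m)  # line 1
    v /= np.linalg.norm(v)                                     # line 2
    for _ in range(n):
        w = lu_solve(lu_piv, v)                                # line 4: (A - lam I) w = v
        v = w / np.linalg.norm(w)                              # line 5
    return v


rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = Q @ np.diag([1.0, 2.0, 3.0, 4.0, 5.0, 10.0]) @ Q.T
v = inverse_iteration(A, lam=2.1, n=20, rng=np.random.default_rng(1))
print(abs(np.vdot(Q[:, 1], v)))  # close to 1: converged to the eigenvector for eigenvalue 2
```

Here the convergence factor is $|2 - 2.1| / |3 - 2.1| \approx 0.11$, so a handful of iterations already suffices; 20 is a generous choice.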

Reduction to (upper) bidiagonal form

Before we start, we first state what we mean by a bidiagonal matrix.

Definition 2.9.15. [Bidiagonal matrix]

We say a matrix $A \in \mathrm{Mat}_m(\mathbb{C})$ is upper bidiagonal if all its entries below the diagonal and above the first superdiagonal are zero. Analogously, a matrix $A \in \mathrm{Mat}_m(\mathbb{C})$ is called lower bidiagonal if all its entries above the diagonal and below the first subdiagonal are zero.
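As with the Hessenberg forms, Definition 2.9.15 can be tested with triangular masks; the helpers below are our own illustrative names and also accept rectangular matrices.

```python
import numpy as np


def is_upper_bidiagonal(A: np.ndarray, tol: float = 0.0) -> bool:
    """True if all entries below the diagonal and above the first superdiagonal vanish."""
    return bool(np.all(np.abs(np.tril(A, -1)) <= tol) and np.all(np.abs(np.triu(A, 2)) <= tol))


def is_lower_bidiagonal(A: np.ndarray, tol: float = 0.0) -> bool:
    """True if all entries above the diagonal and below the first subdiagonal vanish."""
    return is_upper_bidiagonal(A.T, tol)


B = np.array([[1, 2, 0],
              [0, 3, 4],
              [0, 0, 5]])
print(is_upper_bidiagonal(B), is_lower_bidiagonal(B), is_lower_bidiagonal(B.T))  # True False True
```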

The following algorithm reduces a matrix to upper bidiagonal form and is used, for instance, as a first preprocessing step inside the Golub-Kahan algorithm for the computation of an SVD of a matrix. Given a matrix $A \in \mathrm{Mat}_{m,n}(\mathbb{C})$ with $m \ge n$, the algorithm computes a sequence of $2n - 2$ Householder transformations $U_1, V_1, U_2, V_2, \dots, V_{n-2}, U_{n-1}, U_n$ and applies them alternately from the left and from the right to $A$ such that

$$U_B^* A V_B = (U_1 \cdots U_n)^* A (V_1 \cdots V_{n-2})$$

is upper bidiagonal, with $U_i \in \mathrm{Mat}_m(\mathbb{C})$ and $V_i \in \mathrm{Mat}_n(\mathbb{C})$.

Algorithm 11: Golub-Kahan Bidiagonalisation
Input: A matrix $A \in \mathrm{Mat}_{m,n}(\mathbb{C})$
Output: The matrix $A$ in upper bidiagonal form and the generating Householder reflectors
1 for $i := 1$ to $n$ do
2     $u_i := \mathrm{householder}(A_{i:m,\,i})$ (e.g. via Algorithm 3);
3     if $u_i \neq 0_{m-i+1}$ then $A_{i:m,\,i:n} := \left(I_{m-i+1} - 2\frac{u_i u_i^*}{u_i^* u_i}\right) A_{i:m,\,i:n}$;
4     if $i < n - 1$ then
5         $v_i := \mathrm{householder}\big((A_{i,\,i+1:n})^*\big)$;
6         if $v_i \neq 0_{n-i}$ then $A_{i:m,\,i+1:n} := A_{i:m,\,i+1:n} \left(I_{n-i} - 2\frac{v_i v_i^*}{v_i^* v_i}\right)$;
7     end
8 end
9 return $(A, u_1, \dots, u_n, v_1, \dots, v_{n-2})$;
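A NumPy sketch of Algorithm 11 (0-based indices, $m \ge n$ assumed). As for Algorithm 7, `householder` is a hypothetical stand-in for Algorithm 3 based on $v = x + e^{\mathrm{i}\arg(x_1)}\|x\|_2 e_1$, and the reflectors are applied as rank-one updates. The right reflector is built from the conjugated row $(A_{i,\,i+1:n})^*$, so that the row is mapped onto a multiple of $e_1^T$.

```python
import numpy as np


def householder(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for Algorithm 3: v with (I - 2vv^*/(v^*v)) x = alpha e_1 (0 if x = 0)."""
    v = np.array(x, dtype=complex)
    norm_x = np.linalg.norm(v)
    if norm_x == 0.0:
        return v
    phase = v[0] / abs(v[0]) if v[0] != 0 else 1.0
    v[0] += phase * norm_x
    return v


def golub_kahan_bidiag(A: np.ndarray):
    """Reduction to upper bidiagonal form following Algorithm 11 (0-based indices, m >= n)."""
    A = np.array(A, dtype=complex)
    m, n = A.shape
    us, vs = [], []
    for i in range(n):
        u = householder(A[i:, i])                                  # line 2
        us.append(u)
        if np.any(u != 0):                                         # line 3: left reflector
            A[i:, i:] -= (2.0 / np.vdot(u, u).real) * np.outer(u, u.conj() @ A[i:, i:])
        if i < n - 2:                                              # line 4 in 0-based form
            v = householder(A[i, i + 1:].conj())                   # line 5
            vs.append(v)
            if np.any(v != 0):                                     # line 6: right reflector
                A[i:, i + 1:] -= (2.0 / np.vdot(v, v).real) * np.outer(A[i:, i + 1:] @ v, v.conj())
    return A, us, vs


rng = np.random.default_rng(0)
A0 = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))
B, us, vs = golub_kahan_bidiag(A0)
print(np.abs(np.tril(B, -1)).max(), np.abs(np.triu(B, 2)).max())  # both of machine-precision order
print(np.allclose(np.linalg.svd(B, compute_uv=False),
                  np.linalg.svd(A0, compute_uv=False)))           # True: singular values preserved
print(len(us), len(vs))                                           # 4 2, i.e. n and n - 2 reflectors
```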

In document Berechnung und Anwendungen Approximativer Randbasen (Page 63-72)