Bounds Based on Eigenvalue Inclusion Sets

5.4 Acceleration Techniques

6.1.1 Bounds Based on Eigenvalue Inclusion Sets

where κ(T ) := kT k kT⁻¹k is the condition number of T , p^(`) denotes the `th derivative of p and the estimate kMk ≤ n maxi,j|mi,j| has been used. We always assume that the columns of T are chosen to have unit norm. Assuming A is diagonalizable, which we shall do in the sequel without (essential) loss of generality, (6.1) simplifies to

kp(A)k ≤ κ(T ) kpkΛ(A), (6.2)

where Λ(A) is the spectrum of A and we shall denote bykfkΩthe maximum of a continuous function f on a compact set Ω⊂ C.

Another approach makes use of the representation of p(A) as a contour integral (see Kato (1980, Section I.5.6))

p(A) = 1 2πi

p(ζ)(ζI − A)⁻¹ dζ,

where Γ is a Jordan curve (or a collection of Jordan curves) which contains Λ(A) in its interior. This approach has the advantage of readily generalizing to bounded operators on an infinite dimensional space. Taking norms yields

kp(A)k ≤ `(Γ) 2π max

ζ∈Γ k(ζI − A)⁻¹k kpkΓ, (6.3) where `(Γ) denotes the arc length of Γ.

A third approach for bounding kp(A)k based on the Schur decomposition of A may be found in Golub & van Loan (1996, Chapter 11).

A vast simplification occurs when A is a normal, and hence unitarily similar to a diagonal matrix: In this case κ(T ) = 1 and (6.2) further simplifies to

kp(A)k = kpkΛ(A). (6.4)

The quantity κ(T ) is actually a measure of how far A departs from normality—in particu-lar, κ(T ) = 1 if and only if A is normal. In the normal case both the contour integral and the Schur form bounds reduce to (6.4), whereas in the general case they contain measures of the non-normality of A other than κ(T ).

6.1.1 Bounds Based on Eigenvalue Inclusion Sets

For the MR residual vectors (6.2) together with the minimization property of the MR approximation result in

krm^MRk

kr⁰k ≤ κ(T ) kp^MRm kΛ(A) ≤ κ(T ) min

p∈Pm

p(0)=1

kpkΛ(A), m = 1, 2, . . . , L. (6.5)

If A is normal, i.e., if κ(T ) = 1, the right hand side of (6.5) is a standard problem of approximation theory: Among all polynomials p of degree at most m with p(0) = 1 determine p^∗_m which deviates least from zero on a compact subset Ω of the complex plane (which in our case happens to be the discrete set Λ(A)), i.e.,

kp^∗mkΩ = min

p∈Pm

p(0)=1

kpkΩ. (6.6)

This problem has a solution which is uniquely determined if 06∈ Ω and if the cardinality of Ω exceeds the polynomial degree m. Asymptotically, the numberskp^∗mkΩ behave as γ^m as m→ ∞, and the number

γ := lim

m→∞kp^∗mk^1/mΩ

is called the asymptotic convergence factor of Ω. It can be shown that γ < 1 if 06∈ Ω and if Ω does not surround the origin. We observe that this asymptotic behavior implies that bounds of this type lead to linear convergence rate bounds.

If A is normal it can be shown (Greenbaum & Trefethen 1994, Greenbaum & Gurvits 1994, Joubert 1994b) that the bound (6.5) is sharp in the sense that for any m, 1 ≤ m ≤ L, there exists an initial residual r₀, which may depend on m, such that equality is attained.

In this sense the convergence behavior of the MR method is completely determined by the eigenvalue distribution of A when A is normal.

If A is nonnormal and κ(T ) is not too large, one can expect that this statement, in essence, will still hold, although in this case (6.5) is no longer sharp, as counterexamples by Faber, Joubert, Knill & Manteuffel (1996) and Toh (1997) demonstrate. Greenbaum

& Trefethen (1994) have shown that the (matrix) polynomial approximation problem of minimizingkp(A)k subject to p(0) = 1 is also uniquely solvable in the general, nonnormal case. An algorithm for computing these optimal polynomials is given by Toh & Trefethen (1998).

An immediate consequence of (6.5) is that krm^MRk

kr0k ≤ κ(T ) min

p∈Pm

p(0)=1

kpkΩ, m = 1, 2, . . . , L (6.7)

holds for any compact set Ω ⊂ C which contains the eigenvalues of A. This is the key to most of the known bounds for MR methods with respect to Krylov spaces: Select a compact set Ω, Λ(A) ⊆ Ω, and a suitable polynomial p ∈ Pm with p(0) = 1, then κ(T )kpkΩ (or kpkΩ in the normal case) is an upper bound for krm^MRk/kr0k. Probably the best known example is the standard estimate for the conjugate gradient method, which is an OR method with respect to (·, ·) but an MR method with respect to the inner product (·, ·)A⁻¹ := (A⁻¹·, ·) if A is Hermitian and positive definite. If k · kA := (A·, ·)^1/2 and

This follows directly from (6.7) if the set Ω is chosen to be the smallest closed interval [λ_min(A), λ_max(A)] which contains Λ(A) and p to be a suitably scaled and transformed

Chebyshev polynomial of degree m. Analogous bounds are available for more complicated spectral inclusion sets Ω⊂ C, but this requires tools from conformal mapping and complex approximation theory in order to bound kp^∗mkΩ of (6.6) (cf. Eiermann, Niethammer &

Varga (1985), Driscoll, Toh & Trefethen (1998)). As a more recent example we quote from Liesen (1998) that, for Ω = Ω(φ) := {ζ = e^iθ : φ/2≤ θ ≤ 2π − φ/2}, there holds

kp^∗mkΩ ≤ 4

γ^m− 1, where γ := 1

cos(φ/4). (6.8)

The determination of suitable spectral inclusion sets is a difficult task in itself. For Krylov subspace methods, the same Krylov space used as the correction space for the linear system may also be used as a Krylov projection method for finding approximate eigenvalues, from which inclusion sets may then be constructed. This is the approach used by so-called hybrid methods, in which spectral information is gathered in an initial phase of a Krylov subspace iteration. Once this information is deemed sufficient—after, say, m steps—a spectral inclusion set Ω is constructed and a residual polynomial p^∗ which solves (6.6) is used to generate further iterates which satisfy r_m+j = (p(A))^jr_m. For examples of hybrid methods see Nachtigal, Reichel & Trefethen (1992) and Starke & Varga (1993) as well as the references therein.

All bounds for krm^MRk described above, i.e., (6.5) and (6.7), are based on spectral information available for A. There are, however, matrices which show that, in general, the spectrum has no influence on MR convergence behavior. Indeed, Greenbaum, Strakoˇs

& Ptak (1996) show that for any nonincreasing finite sequence of positive real numbers ρ₀ ≥ ρ1 ≥ · · · ≥ ρn−1 and any choice of (not necessarily distinct) nonzero complex numbers λ1, λ2, . . . , λn, one can construct a matrix A ∈ Cⁿ^×n and an initial residual r0

with Λ(A) = {λ1, λ₂, . . . , λ_n} and krm^MRk = ρm (m = 0, 1, . . . , n− 1). We illustrate this result by one of their striking examples: Any matrix A in Frobenius form,

A = be arbitrarily prescribed. If we choose b and r₀ such that r₀ = u₁ is the first unit vector, then, for m = 1, 2, . . . , n − 1, the approximation space AKm(A, r₀) is the span of the unit vectors u₂, u₃, . . . , u_m. The best approximation to r₀ from this space is obviously the null vector leading to kr0k = kr1^MRk = · · · = kr_n−1^MRk = 1 independently of the chosen spectrum. In general it is therefore impossible to predict the convergence behavior of the MR method (and of any other Krylov subspace method) on the basis of the eigen-value distribution of A alone. Although this fact has been emphasized in several recent papers it is still a widespread but nonetheless incorrect belief that spectral properties of the coefficient matrix (i.e., without any additional assumptions on its departure from normality) determine the speed of convergence of Krylov subspace methods. If A is far from normal, i.e., if κ(T ) is huge, the bound (6.5), although still valid, becomes useless because it vastly overestimates krm^MRk/kr0k, which we know to be always less than one.

The common interpretation of (6.5) in the sense that two matrices with identical spectra produce a similar MR behavior is incorrect, as the above example vividly illustrates. A systematic investigation of the class of matrices with identical MR behavior (given the same r₀) is undertaken by Greenbaum & Strakoˇs (1994). An example of a problem arising in applications where the spectrum is misleading for the prediction of MR behavior may be found in Ernst (2000).

In document Minimal and orthogonal residual methods and their generalizations for solving linear operator equations (Page 114-117)