
5.4 Acceleration Techniques

5.4.4 Optimal Truncation

The acceleration techniques of Sections 5.4.1 and 5.4.2 are based on restarting an MR iteration once the correction space has reached a given dimension m, and they attempt to compensate for the attendant loss of information by augmenting or preconditioning. The methods discussed in this section are related to the former in that they also attempt to retain information contained in the current correction space—in this case orthogonality constraints—which is deemed most useful for convergence.

In place of restarting, the basic scheme underlying this class of methods is a truncated MR iteration, in which, as soon as the correction space has reached a maximal dimension m, only a subset of the most recent m basis vectors of the correction space is retained, or equivalently, one or more of these basis vectors is periodically discarded during the iteration. Such a scheme for selectively discarding subspaces rather than individual basis vectors is proposed by de Sturler (1999) as a truncation scheme for his GCRO algorithm.

This selection process, however, does not rely on spectral or invariant subspace information, but rather on angles between subspaces.
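For orientation, the simplest truncation rule (always drop the oldest basis vector) can be sketched as follows. This is a truncated GCR iteration used purely as an illustration of a truncated MR scheme; it is not de Sturler's angle-based selection, and the function name `truncated_gcr`, the test matrix, and all parameter values are assumptions made for this sketch.

```python
import numpy as np

def truncated_gcr(A, b, m, tol=1e-10, maxit=500):
    """Truncated GCR sketch: keep at most m directions, drop the oldest."""
    x = np.zeros_like(b)
    r = b.copy()
    S, AS = [], []                       # retained directions s_j and images A s_j
    history = [np.linalg.norm(r)]
    for _ in range(maxit):
        s, As = r.copy(), A @ r
        # Orthogonalize A s against the retained images (modified Gram-Schmidt).
        for sj, Asj in zip(S, AS):
            h = np.vdot(Asj, As)
            s -= h * sj
            As -= h * Asj
        nrm = np.linalg.norm(As)         # assumes no breakdown (nrm > 0)
        s, As = s / nrm, As / nrm
        alpha = np.vdot(As, r)           # minimizes ||r - alpha * A s||
        x += alpha * s
        r -= alpha * As
        history.append(np.linalg.norm(r))
        S.append(s)
        AS.append(As)
        if len(S) > m:                   # truncation: discard the oldest direction
            S.pop(0)
            AS.pop(0)
        if history[-1] <= tol * history[0]:
            break
    return x, history

# Hypothetical test problem: a well-conditioned nonsymmetric matrix.
rng = np.random.default_rng(0)
n = 50
A = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
x, history = truncated_gcr(A, b, m=5)
```

Replacing the `pop(0)` rule by a rule that chooses which directions to drop is exactly where a selection scheme such as the one described next would enter.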

To discard a subspace of dimension $\ell$, de Sturler's subspace selection scheme compares two approximation spaces $\mathcal{W}_1$ and $\mathcal{W}_2$ associated with correction spaces $\mathcal{C}_1$ and $\mathcal{C}_2$. It assumes the availability of an orthonormal basis $W_m^{(1)} = [w_1^{(1)}, \dots, w_m^{(1)}]$ of $\mathcal{W}_1$, an arbitrary basis $\widehat{W}_k^{(2)} = [\widehat{w}_1^{(2)}, \dots, \widehat{w}_k^{(2)}]$ of $\mathcal{W}_2$, as well as a factorization

$$\bigl(I - W_m^{(1)} [W_m^{(1)}]^*\bigr)\, \widehat{W}_k^{(2)} = Z_k R$$

with $Z_k = [z_1, \dots, z_k]$, $Z_k^* Z_k = I_k$, and $R \in \mathbb{C}^{k \times k}$ nonsingular and upper triangular. After computing the singular value decomposition

$$[W_m^{(1)}]^* \widehat{W}_k^{(2)} \bigl(Z_k^* \widehat{W}_k^{(2)}\bigr)^{-1} = X \Xi \widehat{Y}^H \qquad (5.6)$$

the subspace of $\mathcal{W}_1$ to be retained is chosen as that spanned by the vectors $W_m^{(1)} [x_1 \cdots x_\ell]$, where the vectors $x_j$ are the left singular vectors associated with the $\ell$ largest singular values. The following theorem relates this choice to the results of Section 3.4.

Theorem 5.4.3. With the above notation and under the assumption $\mathcal{W}_1 \cap \mathcal{W}_2 = \{0\}$, the singular values appearing in (5.6) are the cotangents of the canonical angles between the spaces $\mathcal{W}_1$ and $\mathcal{W}_2$.

Proof. Let $W_k^{(2)}$ denote an orthonormal basis of $\mathcal{W}_2$ such that $\widehat{W}_k^{(2)} = W_k^{(2)} S$ with a nonsingular matrix $S \in \mathbb{C}^{k \times k}$. Then the cosines of the canonical angles between $\mathcal{W}_1$ and $\mathcal{W}_2$ are the singular values of $[W_m^{(1)}]^* W_k^{(2)}$, and we write the associated singular value decomposition as $[W_m^{(1)}]^* W_k^{(2)} = X \Gamma Y^H$ with a diagonal matrix $\Gamma \in \mathbb{R}^{m \times k}$ and unitary matrices $X \in \mathbb{C}^{m \times m}$ and $Y \in \mathbb{C}^{k \times k}$. From

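$$\begin{aligned}
Z_k R &= \bigl(I - W_m^{(1)} [W_m^{(1)}]^*\bigr)\, \widehat{W}_k^{(2)} \\
      &= \bigl(I - (W_m^{(1)} X)(W_m^{(1)} X)^*\bigr)\, (W_k^{(2)} Y)\, Y^H S \\
      &= \bigl[(W_k^{(2)} Y) - (W_m^{(1)} X)\Gamma\bigr]\, Y^H S
\end{aligned}$$

we obtain $Z_k = \bigl[(W_k^{(2)} Y) - (W_m^{(1)} X)\Gamma\bigr] Y^H S R^{-1}$ and therefore, defining the diagonal matrix $\Sigma \in \mathbb{R}^{k \times k}$ by $I_k - \Gamma^H \Gamma = \Sigma^2$, there results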

$$I_k = Z_k^* Z_k = (S R^{-1})^H Y \Sigma^2 Y^H (S R^{-1}) = (\Sigma Y^H S R^{-1})^H (\Sigma Y^H S R^{-1}),$$

which reveals that the $k \times k$ matrix $\Sigma Y^H S R^{-1}$ is also unitary. Note that, in view of $\mathcal{W}_1 \cap \mathcal{W}_2 = \{0\}$, none of the cosines in $\Gamma$ is equal to one, hence $\Sigma$ is nonsingular. Now, inserting

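$$[W_m^{(1)}]^* \widehat{W}_k^{(2)} = [W_m^{(1)}]^* W_k^{(2)} S = X \Gamma Y^H S,$$

$$Z_k^* \widehat{W}_k^{(2)} = (S R^{-1})^H Y \bigl[(W_k^{(2)} Y)^* - \Gamma^H (W_m^{(1)} X)^*\bigr] W_k^{(2)} S = (S R^{-1})^H Y \Sigma^2 Y^H S,$$

we can express the singular value decomposition (5.6) as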

$$[W_m^{(1)}]^* \widehat{W}_k^{(2)} \bigl(Z_k^* \widehat{W}_k^{(2)}\bigr)^{-1} = X (\Gamma \Sigma^{-1}) (\Sigma Y^H S R^{-1}),$$

which reveals that its singular values are indeed the cotangents of the canonical angles between $\mathcal{W}_1$ and $\mathcal{W}_2$.

The proof also shows that the left singular vectors of (5.6) coincide with those of $[W_m^{(1)}]^* W_k^{(2)}$, hence the selection scheme discards that subspace of $\mathcal{W}_1$ which lies at the largest canonical angles with $\mathcal{W}_2$. As shown in Section 3.4, this choice yields the greatest possible residual reduction when replacing the approximation space $\mathcal{W}_1 + \mathcal{W}_2$ by $\widetilde{\mathcal{W}}_1 + \mathcal{W}_2$ with $\widetilde{\mathcal{W}}_1$ a subspace of $\mathcal{W}_1$ of dimension $\dim \mathcal{W}_1 - k$.
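The statement of Theorem 5.4.3 can be checked numerically. The following is a minimal NumPy sketch under stated assumptions, not de Sturler's implementation: the function name `select_retained_subspace`, the random real test matrices, and the sizes `n`, `m`, `k`, `ell` are all hypothetical choices made for illustration. It forms the factorization and the SVD (5.6), then computes the cotangents of the canonical angles independently for comparison.

```python
import numpy as np

def select_retained_subspace(W1, W2_hat, ell):
    """Angle-based selection sketch.

    W1     : n x m matrix with orthonormal columns (basis of W_1)
    W2_hat : n x k matrix with linearly independent columns (basis of W_2)
    ell    : number of directions of W_1 to retain
    Returns the retained n x ell basis and the singular values of (5.6).
    """
    # (I - W1 W1^*) W2_hat = Z R with Z^* Z = I and R upper triangular
    Z, R = np.linalg.qr(W2_hat - W1 @ (W1.conj().T @ W2_hat))
    # Matrix whose SVD is taken in (5.6)
    M = (W1.conj().T @ W2_hat) @ np.linalg.inv(Z.conj().T @ W2_hat)
    X, xi, _ = np.linalg.svd(M)          # singular values in descending order
    return W1 @ X[:, :ell], xi

rng = np.random.default_rng(0)
n, m, k = 12, 4, 3
W1, _ = np.linalg.qr(rng.standard_normal((n, m)))
W2_hat = rng.standard_normal((n, k))
W1_kept, xi = select_retained_subspace(W1, W2_hat, ell=2)

# Independent computation: cosines of the canonical angles, then cotangents.
Q2, _ = np.linalg.qr(W2_hat)
cosines = np.linalg.svd(W1.T @ Q2, compute_uv=False)
cotangents = cosines / np.sqrt(1.0 - cosines**2)
```

For generic random data the two spaces intersect trivially, so `xi` and `cotangents` should agree to rounding error, and the columns of `W1_kept` span the directions of $\mathcal{W}_1$ at the smallest canonical angles with $\mathcal{W}_2$.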

De Sturler applies this scheme to a GMRES cycle of length $m$ in order to determine which directions of the $s$-dimensional Krylov subspace $\mathcal{K}_s(A, r_0)$, $s < m$, are most important for convergence in the sense that, were one to maintain orthogonality against these directions upon restarting after the first $s$ steps, this would most accelerate the residual reduction. The subspaces to be compared are thus $A\mathcal{K}_s(A, r_0)$ and $A\mathcal{K}_{m-s}(A, r_s)$.

The subspace comparison in this case is particularly inexpensive, as both spaces lie in $\mathcal{K}_m(A, r_0)$, for which the Arnoldi process has computed an orthonormal basis. Hence, the angle computations can be performed in the coordinate space with respect to this basis, and therefore involve only small matrices.
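A rough sketch of such a coordinate-space computation is given below. It is an illustration under assumptions, not the GCROT implementation: the helper names `arnoldi` and `coordinate_cosines` are hypothetical, and the coordinates are taken with respect to all $m + 1$ Arnoldi vectors produced by $m$ steps, so that the image $A^{m-s} r_s$ can also be represented. Only the $(m+1) \times m$ Hessenberg matrix enters the angle computation.

```python
import numpy as np

def arnoldi(A, r0, m):
    """m Arnoldi steps (modified Gram-Schmidt): A V[:, :m] = V H."""
    n = r0.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]    # assumes no breakdown
    return V, H

def coordinate_cosines(H, beta, s):
    """Cosines of the canonical angles between A K_s(A, r0) and
    A K_{m-s}(A, r_s), using only the small Hessenberg matrix H."""
    m = H.shape[1]
    C1 = H[:, :s]                        # A V_s = V H[:, :s]
    rhs = np.zeros(m + 1)
    rhs[0] = beta                        # coordinates of r0
    y, *_ = np.linalg.lstsq(C1, rhs, rcond=None)
    c = rhs - C1 @ y                     # coordinates of the MR residual r_s
    cols = []
    for _ in range(m - s):
        c = H @ c[:m]                    # coordinates of A times the previous vector
        c = c / np.linalg.norm(c)        # rescale; the span is unchanged
        cols.append(c)
    Q1, _ = np.linalg.qr(C1)
    Q2, _ = np.linalg.qr(np.column_stack(cols))
    return np.linalg.svd(Q1.T @ Q2, compute_uv=False)

# Hypothetical test problem (real arithmetic).
rng = np.random.default_rng(1)
n, m, s = 40, 6, 3
A = 2.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
r0 = rng.standard_normal(n)
V, H = arnoldi(A, r0, m)
cosines = coordinate_cosines(H, np.linalg.norm(r0), s)
```

Because the Arnoldi vectors are orthonormal, angles between coordinate vectors equal angles between the corresponding vectors in the full space, which is why no matrix of size $n$ appears inside `coordinate_cosines`.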

This idea is applied to the inner GMRES iteration of GCRO in order to identify a subspace of the inner approximation space as important for convergence, and to select an orthonormal basis of that subspace (rather than just the residual update as in GCRO) to be added to the outer approximation space. Similarly, de Sturler uses this device in order to truncate the outer approximation space by comparing the angles between the inner and outer approximation spaces. For details of this truncated GCRO algorithm, known as GCROT, we refer to (de Sturler 1999).

Chapter 6 Convergence

In this chapter we turn to the question of convergence of Krylov subspace MR and OR methods for solving linear equations. In the finite-dimensional case convergence in a finite number of steps is assured due to the finite termination property of MR and OR methods.

The interesting question here is how fast the errors or residuals decrease below a tolerance sufficiently small for a specific application.

We begin with an overview of the standard techniques for deriving residual and error bounds, which are based on eigenvalues, pseudoeigenvalues, or the field of values of A. We then proceed to show how the angles formulation of Chapter 2 allows a simple derivation of the most important bounds which have been derived in the literature, and give an example where a bound on the angles allows us to obtain superlinear convergence. We conclude with a discussion of matrices which generate the same MR and OR approximations and the role of singular values.

6.1 Convergence Bounds Based on Polynomials

In Section 4.3 we saw that the residual of any approximate solution of (1.1) taken from the shifted Krylov space $x_0 + \mathcal{K}_m(A, r_0)$ has the representation

$$r = p(A) r_0, \qquad p \in \mathcal{P}_m, \quad p(0) = 1.$$

This link between residuals and polynomials has inspired the search for bounds on the residual norm which are derived from analytic properties of the associated polynomials as functions defined on the complex plane. The influence of the initial residual is usually suppressed in view of $\|p(A) r_0\| \le \|p(A)\| \, \|r_0\|$, in which case the issue simplifies to bounding $\|p(A)\|$ for all $p \in \mathcal{P}_m$ normalized by $p(0) = 1$. Since the error $e := A^{-1} b - x$ possesses the identical representation $e = p(A) e_0$ in terms of the initial error $e_0$, bounding $\|p(A)\|$ immediately also leads to bounds on the error reduction, although a polynomial which makes $\|p(A) r_0\|$ small obviously need not do the same for $\|p(A) e_0\|$.
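The polynomial representation can be made concrete with a small experiment. The sketch below is illustrative only: the test matrix, the sizes, and the helper `poly_of_matrix` are assumptions, and the MR approximation is computed by a plain least-squares solve over a power basis rather than by a production Krylov solver. It recovers the coefficients of the residual polynomial and evaluates $p(A)$ explicitly.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 30, 5
A = 2.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
x0 = rng.standard_normal(n)
r0 = b - A @ x0

# Power basis of K_m(A, r0): columns r0, A r0, ..., A^{m-1} r0
K = np.empty((n, m))
K[:, 0] = r0
for j in range(1, m):
    K[:, j] = A @ K[:, j - 1]

# MR approximation x = x0 + K c minimizes ||r0 - A K c||
c, *_ = np.linalg.lstsq(A @ K, r0, rcond=None)
x = x0 + K @ c
r = b - A @ x

# Residual polynomial p(z) = 1 - c_0 z - ... - c_{m-1} z^m, so p(0) = 1
coeffs = np.concatenate(([1.0], -c))     # ascending powers of z

def poly_of_matrix(coeffs, A):
    """Evaluate p(A) by Horner's rule for ascending coefficients."""
    eye = np.eye(A.shape[0])
    P = coeffs[-1] * eye
    for a in coeffs[-2::-1]:
        P = A @ P + a * eye
    return P

pA = poly_of_matrix(coeffs, A)
x_exact = np.linalg.solve(A, b)
e0 = x_exact - x0                        # initial error
e = x_exact - x                          # error of the MR approximation
```

In this setting `r` should agree with `pA @ r0` and `e` with `pA @ e0` up to rounding, while `np.linalg.norm(pA, 2) * np.linalg.norm(r0)` gives the (generally pessimistic) bound on `np.linalg.norm(r)` used in the text.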

In this section we describe the residual bounds which can be derived from this polynomial representation, restricting our considerations once more to MR approximations.

Although some of the results which follow can be extended to the general case where $A$ is a linear operator on a possibly infinite-dimensional Hilbert space $\mathcal{H}$, we will assume throughout this section that $\mathcal{H}$ is of finite dimension $n$, i.e., that $A$ is a nonsingular matrix in $\mathbb{C}^{n \times n}$. This, of course, implies the existence of a finite termination index $L \le n$.


Nearly all polynomial bounds involve reducing the minimization of $\|p(A)\|$ to a scalar problem. The most popular approach is to employ the Jordan canonical form of $A$: if the characteristic polynomial of $A$ is given by $c_A(\zeta) = \prod_{j=1}^{k} (\zeta - \lambda_j)^{n_j}$ and $T$ is any similarity transformation which transforms $A$ to its Jordan canonical form $J_A = T^{-1} A T$, then

$$\|p(A)\| \le \kappa(T) \max_{1 \le j \le k} n_j \max_{0 \le \ell \le n_j - 1} |p^{(\ell)}(\lambda_j)|$$
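As a small illustration, consider the special case of a diagonalizable matrix with distinct eigenvalues, where every $n_j = 1$, only $\ell = 0$ occurs, and the bound reduces to $\|p(A)\| \le \kappa(T) \max_j |p(\lambda_j)|$. The sketch below checks this reduced inequality numerically; the test matrix and the polynomial $p(z) = (1 - z/2)^3$ are hypothetical choices made only for this example, and the general Jordan case is not covered.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
A = 2.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)

# Eigendecomposition A T = T diag(lam); eigenvalues are assumed distinct,
# which holds generically for a random matrix.
lam, T = np.linalg.eig(A)

def p_scalar(z):
    """Hypothetical polynomial with p(0) = 1: p(z) = (1 - z/2)^3."""
    return (1.0 - z / 2.0) ** 3

# The same polynomial evaluated at the matrix A
p_of_A = np.linalg.matrix_power(np.eye(n) - A / 2.0, 3)

lhs = np.linalg.norm(p_of_A, 2)                              # ||p(A)||_2
rhs = np.linalg.cond(T, 2) * np.max(np.abs(p_scalar(lam)))   # kappa(T) max |p(lambda_j)|
```

Comparing `lhs` with `rhs` shows how much slack the condition number of the eigenvector matrix introduces; for strongly nonnormal matrices this factor can make the bound very loose.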

In document Minimal and orthogonal residual methods and their generalizations for solving linear operator equations (Page 111-114)