Krylov Subspace Methods Based on Non-Orthogonal

4.2 MR and OR Approximations from Krylov Subspaces

4.2.2 Krylov Subspace Methods Based on Non-Orthogonal

The previous section showed how several well-known Krylov subspace algorithms for the iterative solution of (1.1) which use an orthogonal basis of Vm = Km(A, r₀) result from specializing the basic algorithms of Chapters 2 and 3 to Krylov subspaces as correction spaces. In the same manner, we show in this section how Krylov subspace methods such as QMR/BCG and TFQMR/CGS, which use a non-orthogonal basis of the Krylov space, naturally fit into the framework of MR/OR approximations based on non-orthogonal bases

Algorithm 4.2.3 GMRES and FOM for Krylov spaces.

1 r₀ := b − Ax0, β :=kr0k, v1 := r₀/β

2 for j := 1 to m

3 w := Av_m

4 for i = 1 to j

5 h_i,j := (w , v_i)

6 w := w − hi,jv_i

7 end

8 h_j+1,j :=kw k

9 v_j+1 := w /h_j+1,j

10 end

11 Compute y_m^MRto minimize kβu1^(m+1)− eH_myk

12 Compute y_m^OR to solve H_my = βu₁^(m)

13 x_m^MR:= x₀+ V_my_m^MR

14 x_m^OR:= x₀+ V_my_m^OR

discussed in Section 2.3.3.

To this end, let v₁, v₂, . . . , v_L ∈ H be any set of linearly independent vectors such that {v1, v₂, . . . , v_m} form a basis of Vm = Km(A, r₀) for m = 1, . . . , L (in particular, r₀ = βv₁ for some 06= β ∈ C). This basis leads naturally to a (quasi)-minimal residual as well as to a (quasi-) orthogonal residual method. Since, for every 1 ≤ m ≤ L, Avm ∈ span{v1, v₂, . . . , v_m+1}, where we set vL+1 = 0, there exists an upper Hessenberg matrix He_m ∈ C^(m+1)×m such that—just as in (2.39) and in (4.3)—

AV_m = V_mH_m+ [0, . . . , 0, η_m+1,mv_m+1] = V_m+1He_m. (4.10) Just as in the Arnoldi decomposition (4.3) in the case of an orthogonal basis, the m-th column of eH_m contains the coefficients of Av_m ∈ Km+1(A, r₀) with respect to the basis vectors v₁, . . . , v_m+1. We are thus in the situation of Section 2.3.3, with Vm = Km(A, r₀) andWm= AVm. Defining the auxiliary inner products (·, ·)V on VL and (·, ·)v

on the coordinate space C^L as in (2.68) and (2.67), respectively, we obtain the QMR approximation

x_m^QMR:= x₀+ V_my_m^QMR= x₀+ v_m^QMR

of the solution A⁻¹b by requiring that Av_m^QMR be the MR approximation to r₀ with respect to the inner product (·, ·)V. Just as in Section 2.3.3, the coefficient vector y_m^QMR ∈ C^m is characterized as the unique solution of the least-squares problem (2.44). Merely translating Theorems 2.3.9 and 2.3.10 to Krylov subspace notation, we obtain

Theorem 4.2.5. The QMR iterates are the MR iterates with respect to the inner product (·, ·)V:

kb − Axm^QMRk^V = min

x∈x0+K^m(A,r0)kb − Ax k^V. (4.11)

In terms of the original norm in H , the MR and QMR residuals may be bounded by krm^MRk ≤ krm^QMRk ≤p

κ₂(M_m)krm^MRk, (4.12) in which κ2(Mm) denotes the (Euclidean) condition number of the (Hermitian positive definite) matrix M_m.

Theorem 4.2.6. The residual vectors of the QMR iterates are characterized by r_m^QMR⊥ Um,

where Um = {v = Vm+1y ∈ Km+1(A, r₀) : y = M_m⁻¹He_mz with z ∈ C^m} is an m-dimensional subspace of Km+1(A, r₀) (for m = L, there holds UL = KL(A, r₀)) and orthogonality is understood with respect to the original inner product (·, ·) on H . Conse-quently,

r_m^QMR = I− P_W^U_m^m r₀, where P_W^U^m

m denotes the oblique projection onto Wm= AKm(A, r₀) orthogonal to Um. Again drawing from the abstract results in Section 2.3.3, the corresponding QOR approximations in the Krylov subspace case are defined by

x_m^QOR = x₀+ V_my_m^QOR, where the coordinate vector y_m^QOR ∈ C^m is determined by

0 = (βu1− eHmy_m^QOR, uj)2 (j = 1, 2, . . . , m), i.e., Hmy_m^QOR= βu1 (4.13) under the usual assumption that H_m is nonsingular. In terms of the inner product (·, ·)V

on KL(A, r₀) the QOR approximations are characterized by r_m^QOR= b − Axm^QOR ⊥^V Km(A, r0),

i.e., the QOR iterates are the OR iterates with respect to the inner product (·, ·)V. Simi-larly to Theorem 2.3.10 there holds

Theorem 4.2.7. The residual vectors of the QOR iterates satisfy r_m^QOR ⊥ Tm,

where Tm = {v = V^m+1y ∈ K^m+1(A, r0) : y = M_m⁻¹[z^T 0]^T with z ∈ C^m} is an m-dimensional subspace of Km+1(A, r₀) (for m = L, there holds TL = KL(A, r₀)) and orthogonality is understood with respect to the original inner product (·, ·) on H . Conse-quently,

r_m^QOR= I− P_W^T_m^m r₀, where P_W^T^m

m denotes the oblique projection onto Wm = AKm(A, r₀) orthogonal to Tm.

The QMR and QOR iterates being identified as MR and OR iterates with respect to the inner product (·, ·)V implies that the assertions of Corollaries 4.2.1 and 4.2.4 are valid for any pair of QMR/QOR methods. The crucial angles ϕ_m of (4.8), however, are now defined with respect to (·, ·)V.

The motivation for using non-orthogonal bases comes from the potential savings in storage and computation when using algorithms for generating a basis ofKm(A, r₀) which lead to a Hessenberg matrix in relation (4.10) with a small number of bands. A tridiag-onal Hessenberg matrix can be achieved using the non-Hermitian Lanczos process. The non-Hermitian variant of the Lanczos process is almost as economical as its Hermitian counterpart, requiring in addition the generation of a basis of a Krylov space with respect to the adjoint A^∗ and thus storage for several additional vectors and, what can be expen-sive, multiplication by A^∗ at each step. The result are two biorthogonal, in general not orthogonal, bases. The non-Hermitian Lanczos process may break down before the Krylov space becomes stationary, and in finite arithmetic this leads to numerical instabilities in the case on near-breakdowns. This problem is addressed by so-called look-ahead tech-niques for the non-Hermitian Lanczos process (cf. Freund, Gutknecht & Nachtigal (1993), Gutknecht (1997) and the references therein), which result in a pair of block-biorthogonal bases and a Hessenberg matrix which is as close to tridiagonal form as possible while maintaining stability.

Remark 4.2.8. We have mentioned that Krylov subspace QMR/QOR methods have the advantage of being able to work with an arbitrary basis of the Krylov space, in particular also with a non-orthogonal basis. As the bounds (4.12) show, how close the QMR iterates come to also minimizing the residual in the original norm depends on the Euclidean condition number of the Gram matrices M_m, i.e., on the largest and smallest singular value of V_m = [v₁, . . . , v_m] (interpreted as an operator from C^m to Km(A, r₀)). The largest singular value of V_m is bounded by √

m if the basis vectors are normalized with respect to the original norm. When the look-ahead Lanczos algorithm is used to generate the basis, bounds on the smallest singular value may be obtained depending on the look-ahead strategy being followed.

Example: QMR/BCG and TFQMR/CGS

We now consider two particular examples of Krylov subspace QMR/QOR algorithms, the QMR method of Freund & Nachtigal (1991) and the TFQMR and CGS methods due to Freund (Freund 1993, Freund 1994) and Sonneveld (1989), respectively. As another QMR/QOR pair, we mention the QMRBICGSTAB method of Chan, Gallopoulos, Si-moncini, Szeto & Tong (1994) and BICGSTAB developed by van der Vorst (1992) and Gutknecht (1993b).

The QMR algorithm of Freund and Nachtigal proceeds exactly as described above, with the basis of the Krylov space generated by the look-ahead Lanczos algorithm. The QOR counterpart of Freund and Nachtigal’s QMR is the BCG algorithm, the iterates of which are characterized by

x^BCG ∈ x⁰+Km(A, r0), r_m^BCG ⊥ K^m(A^∗,re0).

The algorithm proceeds by generating a basis of the Krylov space Km(A^∗,er₀), whereer₀ is an arbitrary starting vector, simultaneously with a basis V_m of Km(A, r₀) in such a way

that these two bases are biorthogonal. If y ∈ C^m denotes the coefficient vector of the BCG approximation with respect to V_m, then the biorthogonality requirement implies

r_m^BCG = V_m+1(βu₁− eH_my )⊥ Km(A^∗,re₀)

⇔ rm^BCG⊥V Vm

⇔ Hmy = βu₁.

The last equality identifies x^BCG as the m-th QOR iterate.

To treat the TFQMR/CGS pair, recall that the residual of any Krylov subspace ap-proximation can be expressed as r_m = φ_m(A)r₀ in terms of a polynomial φ_m of degree m satisfying φ_m(0) = 1. Sonneveld (1989) defined the CGS iterate x_m^CGS ∈ x0+K2m(A, r₀) such that

r_m^CGS = [φm(A)]²r0, where r_m^BCG = φm(A)r0.

It is shown in Freund (1993) that x_m^CGS = x0+ Y2mz2m for a coefficient vector z2m ∈ C^2m, whereK2m(A, r₀) = span{y1, . . . , y_2m}. Moreover, there exists a sequence {wn} such that r₀ = βw₁ and

AY_n= W_n+1He_n, n = 1, . . . , 2L.

In terms of this sequence, there holds

r_m^CGS = W_2m+1

βu₁− eH_2mz_2m , where H_2mz_2m= βu₁, i.e.,

r_m^CGS ⊥W span{w1, . . . , w_2m} = AK2m(A, r₀),

which identifies CGS as an OR method. The corresponding MR method is Freund’s transpose-free QMR method TFQMR (cf. Freund (1993),Freund (1994)), the iterates of which are defined as

x_n^TFQMR = x₀+ Y_nz_n, n = 1, . . . , 2L, where the coefficient vector z_n∈ Cⁿ solves the least-squares problem

kβu1− eH_nz_nk2 → min

z∈Cⁿ. In other words,

krn^TFQMRkW = min

x∈x0+Kn(A,r0)kb − Ax kW. Comparison of Residuals

The availability of a sequence of vectors{ev_j}j≥1 which is biorthogonal to the Krylov basis {vj} permits a convenient representation of the QOR residual via (2.48). If

(v_j,ve_k) = δ_jkd_j, j, k = 1, . . . , L,

and if both sequences are normalized to one, then, since the MR-residual at step m− 1 lies in Wm−1 ⊂ Vm, we have r_m^MR₋₁ = Pm

j=1(r_m^MR₋₁,ve_j/d_j)v_j. Inserting r − w = rm^MR−1 in (2.48) then yields

r_m^QOR = (r_m−1^MR,wb_m+1)_Vv_m+1 = v_m+1

j=1

g_j

d_j(r_m−1^MR,ev_j)v_j,

which, after taking norms and applying the Cauchy-Schwarz inequality, becomes krm^QORk ≤ krm^MR−1k

j=1

|g^j|

|dj|.

This bound is a slightly improved version of a bound by Hochbruck & Lubich (1998).

In document Minimal and orthogonal residual methods and their generalizations for solving linear operator equations (Page 65-70)