j+1(AMj, B) we use once again the rank-preserving assumption of the variable preconditioner. It holds that
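\[
\mathcal{Z}_{j+1} = \operatorname{span}\{\, M_1 B,\; M_2 A M_1 B,\; \dots,\; M_{j+1} A M_j \cdots A M_1 B \,\}
\]
and
\[
\mathcal{K}_{j+1}(A\mathcal{M}_j, B) = \operatorname{range}(B) + A\mathcal{M}_j \mathcal{K}_j(A\mathcal{M}_j, B) = \operatorname{range}(B) + A\mathcal{Z}_j = \operatorname{span}\{\, B,\; A M_1 B,\; A M_2 A M_1 B,\; \dots,\; A M_j \cdots A M_1 B \,\}.
\]
Because of the rank-preserving assumption on the variable preconditioner, we conclude that $\dim(\mathcal{Z}_{j+1}) = \dim(\mathcal{K}_{j+1}(A\mathcal{M}_j, B))$, and therefore
\[
\dim\bigl(\operatorname{range}(B) \cap A\mathcal{M}_j \mathcal{K}_j(A\mathcal{M}_j, B)\bigr) = p + s_j - \dim \mathcal{K}_{j+1}(A\mathcal{M}_j, B) = p + s_j - s_{j+1} = p - p_j.
\]
Again from the nonsingularity of $A$, we obtain
\[
p - p_j = \dim\bigl(\operatorname{range}(A^{-1}B) \cap \mathcal{M}_j \mathcal{K}_j(A\mathcal{M}_j, B)\bigr) = \dim\bigl(\operatorname{range}(X_*) \cap \mathcal{Z}_j\bigr),
\]
where $X_*$ denotes the exact solution of $AX = B$, finalizing the proof.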
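The dimension count in the proof above relies on the standard formula for the dimension of a sum of two subspaces. As a short worked step, write $U = \operatorname{range}(B)$ and $W = A\mathcal{M}_j \mathcal{K}_j(A\mathcal{M}_j, B)$ (the names $U$ and $W$ are introduced here only for illustration), and assume $\dim U = p$, $\dim W = s_j$ and $s_{j+1} = s_j + p_j$:

```latex
\begin{align*}
\dim(U \cap W) &= \dim U + \dim W - \dim(U + W) \\
               &= p + s_j - \dim \mathcal{K}_{j+1}(A\mathcal{M}_j, B) \\
               &= p + s_j - (s_j + p_j) \\
               &= p - p_j .
\end{align*}
```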
Although the unpreconditioned case shown in Proposition 2.3.3 can be found in other publications (cf. [103]), Proposition 2.4.5 is new to the best of our knowledge. This result will be used in Section 3.6.
For a detailed study of variable preconditioners for iterative solvers, we recommend [107, §9.4] and [113, 105, 27, 23].
2.5 The Block Arnoldi Algorithm
As mentioned previously, looking for an approximate solution inside the (preconditioned) block Krylov subspace is a common strategy. However, the (block) Krylov basis can be very ill-conditioned if it is built naïvely according to its definition (2.3.3), and it is desirable to construct an orthonormal basis of $\mathcal{K}_j(A, B)$ (or $\mathcal{K}_j(A\mathcal{M}_j, B)$) for stability reasons. We refer to [78] for an in-depth study of the conditioning of Gram-Schmidt-based algorithms for generating orthonormal bases and of their stability in finite precision arithmetic.
We assume that the reader is familiar with variable preconditioners and the preconditioned Arnoldi algorithm (as in [107, p. 256]). We now introduce the block flexible Arnoldi method (Algorithm 2.5.1) and the block flexible Arnoldi iteration (Algorithm 2.5.2), which are commonly used not only for generating a stable orthonormal basis of a block Krylov subspace, but also in a number of applications such as solving eigenvalue problems (see [79, 122] for instance).
Remark 2.5.1. Algorithm 2.5.2 is presented such that it removes linearly dependent columns of $S$ (if any; cf. line 5 of Algorithm 2.5.2), thus ensuring that $\mathcal{V}_{j+1}$ has full rank: a rank-revealing QR (RRQR) algorithm [17] can be used to determine both the deficiency $n_j$ and the decomposition $S\Pi_c = QT$ (with $\Pi_c$ denoting a column permutation matrix). In the literature, removing the linearly dependent columns of $S$ is called “Arnoldi deflation” [64].
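The deflation described in Remark 2.5.1 can be illustrated with a small sketch. The code below is not the RRQR algorithm of [17]; it is a hand-rolled column-pivoted modified Gram-Schmidt that stands in for a library rank-revealing QR, and the function name `deflating_qr` and the threshold `tol` are hypothetical choices made only for this example. It returns a factorization of the permuted block together with the detected deficiency.

```python
import numpy as np

def deflating_qr(S, tol=1e-10):
    """Column-pivoted modified Gram-Schmidt: S[:, perm] ~= Q @ T.

    Returns Q (orthonormal columns), T, the pivot order perm and the
    deficiency n_def. `tol` is an assumed relative threshold.
    """
    W = np.array(S, dtype=complex)       # working copy, updated in place
    n, m = W.shape
    perm = np.arange(m)
    Q = np.zeros((n, m), dtype=complex)
    T = np.zeros((m, m), dtype=complex)
    scale = np.linalg.norm(W)            # Frobenius norm of the input block
    r = 0                                # numerical rank found so far
    for k in range(m):
        norms = np.linalg.norm(W[:, k:], axis=0)
        piv = k + int(np.argmax(norms))  # remaining column of largest norm
        if norms[piv - k] <= tol * scale:
            break                        # remaining columns are dependent
        # Bring the pivot column to position k (in W, T and perm).
        W[:, [k, piv]] = W[:, [piv, k]]
        T[:, [k, piv]] = T[:, [piv, k]]
        perm[[k, piv]] = perm[[piv, k]]
        T[k, k] = np.linalg.norm(W[:, k])
        Q[:, k] = W[:, k] / T[k, k]
        # Remove the new direction from the columns still to be processed.
        T[k, k + 1:] = Q[:, k].conj() @ W[:, k + 1:]
        W[:, k + 1:] -= np.outer(Q[:, k], T[k, k + 1:])
        r += 1
    return Q[:, :r], T[:r, :], perm, m - r

# Usage: the third column is a combination of the first two.
rng = np.random.default_rng(0)
S = rng.standard_normal((8, 3))
S[:, 2] = S[:, 0] + 2.0 * S[:, 1]
Q, T, perm, n_def = deflating_qr(S)
```

In a production code one would more likely call a LAPACK-backed pivoted QR; the explicit loop is used here only to keep the example dependent on NumPy alone.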
As discussed later, a rank deficiency of $S$ characterizes a breakdown in the block Arnoldi procedure. We will show in Section 3.6 that this behaviour is rare in practice because it means that a linear combination of $p - p_j$ exact solutions has been found. Thus it is more realistic to consider that the relations $n_j = 0$ and $p_j = p$ hold for every iteration $j$. Consequently a standard QR decomposition based on modified
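Algorithm 2.5.1: Block flexible Arnoldi
1 Compute the QR decomposition $B = V_1 \Lambda_0$, obtaining $n_0 = \operatorname{null}(B) = 0$, $p_0 = \operatorname{rank}(B) = p$, $V_1 \in \mathbb{C}^{n \times p_0}$ and $\Lambda_0 \in \mathbb{C}^{p_0 \times p}$;
2 Define $s_0 = 0$, and $\mathcal{V}_1 = V_1$;
3 for $j = 1, \dots$ do
4     Apply one block flexible Arnoldi iteration (Algorithm 2.5.2)
5 end for

Algorithm 2.5.2: Block flexible Arnoldi iteration: completion of $\mathcal{Z}_j \in \mathbb{C}^{n \times s_j}$, $\mathcal{V}_{j+1} \in \mathbb{C}^{n \times (s_j + p_j)}$, $\mathcal{H}_j \in \mathbb{C}^{(s_j + p_j) \times s_j}$ with $V_i, Z_i \in \mathbb{C}^{n \times p_{i-1}}$ for $1 \le i \le j$, such that $(\mathcal{V}_{j+1})^H \mathcal{V}_{j+1} = I$
1 $Z_j = M_j V_j$;
2 $S = A Z_j$;
3 $H_j = \mathcal{V}_j^H S$, where $H_j \in \mathbb{C}^{(s_{j-1} + p_{j-1}) \times p_{j-1}}$;
4 $S = S - \mathcal{V}_j H_j$;
5 Compute the QR decomposition $S = V_{j+1} H_{j+1,j}$, obtaining $n_j = \operatorname{null}(S)$, $p_j = p_{j-1} - n_j$, $V_{j+1} \in \mathbb{C}^{n \times p_j}$ and $H_{j+1,j} \in \mathbb{C}^{p_j \times p_{j-1}}$;
6 Define $s_j = s_{j-1} + p_{j-1}$, $k_{j+1} = p_j$;
7 Define $\mathcal{Z}_j = [\, Z_1 \; \dots \; Z_j \,]$, $\mathcal{V}_{j+1} = [\, V_1 \; \dots \; V_{j+1} \,]$;
8 Define $\mathcal{H}_j = \begin{bmatrix} \mathcal{H}_{j-1} & H_j \\ 0_{p_j \times s_{j-1}} & H_{j+1,j} \end{bmatrix}$, or $\mathcal{H}_1 = \begin{bmatrix} H_1 \\ H_{2,1} \end{bmatrix}$ if $j = 1$;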
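To make the bookkeeping of Algorithms 2.5.1 and 2.5.2 concrete, the following NumPy sketch follows the eight steps literally under the simplifying assumption $n_j = 0$ (no deflation), so a plain reduced QR is used in step 5. The function name `block_flexible_arnoldi`, the representation of each preconditioner as a Python callable, and the diagonal test preconditioners are assumptions made for illustration only. The usage lines build a small random problem on which the relation $A\mathcal{Z}_j = \mathcal{V}_{j+1}\mathcal{H}_j$, which follows from steps 2 to 5, can be checked.

```python
import numpy as np

def block_flexible_arnoldi(A, B, precs):
    """Run len(precs) iterations; precs[j-1] applies M_j to a block.

    Assumes no rank deficiency (n_j = 0), so no deflation is performed.
    """
    V1, Lam0 = np.linalg.qr(B)                 # Algorithm 2.5.1, line 1
    V_blocks, Z_blocks = [V1], []
    H = np.zeros((V1.shape[1], 0), dtype=complex)   # script H, grows per step
    for M in precs:
        Vcal = np.hstack(V_blocks)             # script V_j = [V_1 ... V_j]
        Zj = M(V_blocks[-1])                   # step 1: Z_j = M_j V_j
        S = A @ Zj                             # step 2
        Hj = Vcal.conj().T @ S                 # step 3
        S = S - Vcal @ Hj                      # step 4
        Vnext, Hsub = np.linalg.qr(S)          # step 5 (reduced QR)
        # Step 8: border the previous script H with the new block column.
        top = np.hstack([H, Hj])
        bottom = np.hstack([np.zeros((Hsub.shape[0], H.shape[1])), Hsub])
        H = np.vstack([top, bottom])
        V_blocks.append(Vnext)                 # step 7
        Z_blocks.append(Zj)
    return np.hstack(Z_blocks), np.hstack(V_blocks), H, Lam0

# Usage on a small random problem with a different preconditioner per step.
rng = np.random.default_rng(0)
n, p, m = 12, 2, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, p))
diags = [np.diag(1.0 + rng.random(n)) for _ in range(m)]
precs = [lambda X, D=D: D @ X for D in diags]
Z, V, H, Lam0 = block_flexible_arnoldi(A, B, precs)
```

The single projection in steps 3 and 4 is kept as in the algorithm statement; it is not a numerically robust orthogonalization.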
Remark 2.5.2. Steps 3 and 4 of Algorithm 2.5.2 amount to the orthogonalization of the basis, which may lack stability if not performed properly. Indeed, Algorithm 2.5.2 presents only a naïve version for the sake of clarity, and a more robust scheme is recommended for this orthogonalization when implementing the method in practice. Examples include CGS2 (classical Gram-Schmidt with reorthogonalization), BMGS (block modified Gram-Schmidt), and Ruhe’s variant of BMGS [104]. We refer to [59] and [78, Chapter 1] for an in-depth study of the stability of these methods.
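As an illustration of one of the options named in Remark 2.5.2, the sketch below replaces steps 3 and 4 of Algorithm 2.5.2 by classical Gram-Schmidt applied twice (CGS2). The function name `cgs2` and the test data are hypothetical; the sketch assumes that the columns of the existing basis are already orthonormal.

```python
import numpy as np

def cgs2(Vcal, S):
    """Orthogonalize the block S against the orthonormal columns of Vcal.

    Two classical Gram-Schmidt passes; returns the accumulated
    coefficients H and the orthogonalized block.
    """
    H1 = Vcal.conj().T @ S      # first pass (steps 3 and 4 as stated)
    S = S - Vcal @ H1
    H2 = Vcal.conj().T @ S      # second pass removes the residual components
    S = S - Vcal @ H2
    return H1 + H2, S

# Usage: an orthonormal basis from a QR factorization and a random block.
rng = np.random.default_rng(0)
Vcal, _ = np.linalg.qr(rng.standard_normal((20, 5)))
S0 = rng.standard_normal((20, 2))
H, S_perp = cgs2(Vcal, S0)
```

The sum `H1 + H2` plays the role of $H_j$, so that the original block is recovered as `Vcal @ H + S_perp`.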
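Proposition 2.5.3. After $j$ iterations of Algorithm 2.5.1, it holds that
\[
\operatorname{range}(\mathcal{V}_j) = \mathcal{K}_j(A\mathcal{M}_j, B), \qquad \operatorname{range}(\mathcal{Z}_j) = \mathcal{M}_j \mathcal{K}_j(A\mathcal{M}_j, B) \tag{2.5.1}
\]
for some nonsingular matrix $\mathcal{M}_j \in \mathbb{C}^{n \times n}$ representing the action of the variable preconditioner up to iteration $j$. Moreover, $\mathcal{V}_j \in \mathbb{C}^{n \times s_j}$ is a full rank orthonormal matrix, and $\mathcal{Z}_j \in \mathbb{C}^{n \times s_j}$ has full rank.
Proof. It is easy to infer from Algorithm 2.5.2 that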
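\[
V_j H_{j,j-1} = (I - \mathcal{V}_{j-1}\mathcal{V}_{j-1}^H) A M_{j-1} V_{j-1} \tag{2.5.2}
\]
for every $j \ge 2$, and because $V_j H_{j,j-1}$ arises from an economic QR decomposition, it always holds that
\[
\operatorname{range}(V_j H_{j,j-1}) = \operatorname{range}(V_j)
\]
for every $j \ge 2$. From line 1 of Algorithm 2.5.1 we find that $\operatorname{range}(V_1) = \operatorname{range}(B)$. We then prove that
\[
\operatorname{range}(V_j) = \operatorname{range}(A M_{j-1} \cdots A M_1 B) - \operatorname{range}(\mathcal{V}_{j-1}) \tag{2.5.3}
\]
for every $j \ge 2$. From (2.5.2), we find that $V_2 H_{2,1} = (I - \mathcal{V}_1 \mathcal{V}_1^H) A M_1 V_1$, and thus
\begin{align*}
\operatorname{range}(V_2) &= \operatorname{range}(V_2 H_{2,1}) \\
&= \operatorname{range}\bigl((I - \mathcal{V}_1 \mathcal{V}_1^H) A M_1 V_1\bigr) \\
&= \operatorname{range}\bigl((I - \mathcal{V}_1 \mathcal{V}_1^H) A M_1 B\bigr) \\
&= \operatorname{range}(A M_1 B) - \operatorname{range}(\mathcal{V}_1).
\end{align*}
Assuming that (2.5.3) holds for $j - 1$, we find that
\begin{align*}
\operatorname{range}(V_j) &= \operatorname{range}(V_j H_{j,j-1}) \\
&= \operatorname{range}\bigl((I - \mathcal{V}_{j-1} \mathcal{V}_{j-1}^H) A M_{j-1} V_{j-1}\bigr) \\
&= \operatorname{range}\bigl((I - \mathcal{V}_{j-1} \mathcal{V}_{j-1}^H) A M_{j-1} \cdots A M_1 B\bigr) \\
&= \operatorname{range}(A M_{j-1} \cdots A M_1 B) - \operatorname{range}(\mathcal{V}_{j-1}),
\end{align*}
proving (2.5.3) by induction. Applying this result for every $j$ shows that
\begin{align*}
\operatorname{range}(\mathcal{V}_j) &= \operatorname{range}\bigl([\, V_1 \; V_2 \; \dots \; V_j \,]\bigr) \\
&= \operatorname{range}(V_1) + \operatorname{range}(V_2) + \dots + \operatorname{range}(V_j) \\
&= \operatorname{range}(B) + \operatorname{range}(A M_1 B) + \operatorname{range}(A M_2 A M_1 B) + \dots + \operatorname{range}(A M_{j-1} \cdots A M_1 B) \\
&= \operatorname{span}\{\, B,\; A M_1 B,\; \dots,\; A M_{j-1} \cdots A M_1 B \,\}.
\end{align*}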
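In the very same fashion, noticing that $Z_j = M_j V_j$ by definition (and using the rank-preserving assumption of the variable preconditioner; see Definition 2.4.1), we obtain that
\[
\operatorname{range}(\mathcal{Z}_j) = \operatorname{span}\{\, M_1 B,\; M_2 A M_1 B,\; \dots,\; M_j A M_{j-1} \cdots A M_1 B \,\}.
\]
From Theorem 2.4.3 we know that there is always a nonsingular matrix $\mathcal{M}_j$ such that
\[
\operatorname{range}(\mathcal{Z}_j) = \operatorname{span}\{\, M_1 B,\; M_2 A M_1 B,\; \dots,\; M_j A M_{j-1} \cdots A M_1 B \,\} = \mathcal{M}_j \mathcal{K}_j(A\mathcal{M}_j, B).
\]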
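To show that $\operatorname{range}(\mathcal{V}_j) = \mathcal{K}_j(A\mathcal{M}_j, B)$ we use the proof of Theorem 2.4.3. To satisfy both equalities in (2.5.1) at once, a possibility is to find a nonsingular matrix $\mathcal{M}_j \in \mathbb{C}^{n \times n}$ such that all the equalities in (2.4.6) hold as well as
\begin{align*}
A\mathcal{M}_j B &= A M_1 B, \\
A\mathcal{M}_j A\mathcal{M}_j B &= A M_2 A M_1 B, \\
A\mathcal{M}_j (A\mathcal{M}_j)^2 B &= A M_3 A M_2 A M_1 B, \\
&\;\;\vdots \\
A\mathcal{M}_j (A\mathcal{M}_j)^{j-2} B &= A M_{j-1} A \cdots M_3 A M_2 A M_1 B. \tag{2.5.4}
\end{align*}
Using the nonsingularity of $A$, and multiplying every equation in (2.5.4) from the left by $A^{-1}$, we verify that all the conditions in (2.5.4) are already contained in (2.4.6), and thus any nonsingular matrix $\mathcal{M}_j \in \mathbb{C}^{n \times n}$ satisfying (2.4.6) also satisfies (2.5.4).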
To finalize the proof, we highlight that $\mathcal{V}_j$ is orthonormal and full rank by construction, and that $\mathcal{Z}_j$ has full rank because of the assumption of a rank-preserving variable preconditioner.
To the best of our knowledge, Proposition 2.5.3 is the first proof that the block flexible Arnoldi algorithm indeed generates a basis of a block Krylov subspace, and it is one of the contributions of this thesis. Even for $p = 1$, we are unaware of such a demonstration in the flexible case, although for a fixed preconditioner (or the unpreconditioned case) this result is well known.
Remark 2.5.4. We recall that we represent the application of the flexible preconditioner $M_j(\cdot)$ on $V_j$ by $M_j V_j$, that is
\[
M_j(V_j) = M_j V_j. \tag{2.5.5}
\]
However, in a general scenario
\[
M_j(B) \neq M_j B. \tag{2.5.6}
\]
In Proposition 2.5.3 we just clarify that there always exists a linear operator $\mathcal{M}_j$ such that the referred subspaces are Krylov subspaces.
Even though $\mathcal{V}_j$ is an orthonormal basis of the block Krylov subspace, $\mathcal{Z}_j$ is not an orthonormal basis of the correction subspace proposed in Section 2.4, though it is considered a reliable and stable basis [107, 112]. We also note that applying the block Arnoldi algorithm is not equivalent to applying the Arnoldi algorithm $p$ times, because the latter would generate $p$ orthonormal bases of $p$ different subspaces, and these bases need not be orthogonal to each other. Block methods therefore require extra computational effort. However, they can greatly improve convergence by using information from all subspaces simultaneously. There are also computational gains in massively parallel computing environments, which we detail at the end of Chapter 3.
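The difference between one block run and $p$ separate Arnoldi runs can be seen on a small example. The sketch below is a simplified, unpreconditioned single-vector Arnoldi with two Gram-Schmidt passes (the function name `arnoldi` and the random test data are assumptions for illustration, and real arithmetic is used for brevity). Each run yields an orthonormal basis of its own Krylov subspace, but nothing orthogonalizes the two bases against each other, so their cross Gram matrix is in general nonzero.

```python
import numpy as np

def arnoldi(A, b, m):
    """Orthonormal basis of span{b, A b, ..., A^(m-1) b} (real case)."""
    V = np.zeros((A.shape[0], m))
    V[:, 0] = b / np.linalg.norm(b)
    for k in range(1, m):
        w = A @ V[:, k - 1]
        for _ in range(2):                       # two Gram-Schmidt passes
            w = w - V[:, :k] @ (V[:, :k].T @ w)
        V[:, k] = w / np.linalg.norm(w)
    return V

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 10))
B = rng.standard_normal((10, 2))
V_a = arnoldi(A, B[:, 0], 3)     # basis built from the first right-hand side
V_b = arnoldi(A, B[:, 1], 3)     # basis built from the second right-hand side
cross = V_a.T @ V_b              # not the zero matrix in general
```

In the block algorithm, by contrast, every new block is orthogonalized against all previous columns, so the concatenated basis has orthonormal columns.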