Deflation by Preconditioning - Acceleration Techniques

5.4 Acceleration Techniques

5.4.2 Deflation by Preconditioning

The next class of methods also attempts to utilize spectral information gained in prior restart cycles to accelerate convergence. Instead of augmenting the Krylov space, the same information is used here to construct a sequence of preconditioners, which can be improved as more accurate spectral information becomes available. The first such approach was introduced by Erhel et al. (1996).

To motivate this approach, assume U is an orthonormal basis of an A-invariant subspace 𝒰 of dimension k, i.e.,

AU = U A_U, A_U ∈ ℂ^{k×k}.

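Note that A_U is the specific representation of the orthogonal section A_𝒰 with respect to the basis U. Denoting by U_⊥ an orthonormal basis of the orthogonal complement 𝒰^⊥, we can represent the action of A as

\[
A \begin{bmatrix} U & U_\perp \end{bmatrix}
= \begin{bmatrix} U & U_\perp \end{bmatrix}
\begin{bmatrix} A_U & U^* A U_\perp \\ O & U_\perp^* A U_\perp \end{bmatrix}.
\]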

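Under the assumption that k is small, it is feasible to solve systems involving A_U directly, and thus to precondition by M defined as

\[
M \begin{bmatrix} U & U_\perp \end{bmatrix}
= \begin{bmatrix} U & U_\perp \end{bmatrix}
\begin{bmatrix} A_U & O \\ O & I_{n-k} \end{bmatrix}
\tag{5.3}
\]

at each step of the iteration. The resulting right-preconditioned operator is given by

\[
A M^{-1} \begin{bmatrix} U & U_\perp \end{bmatrix}
= \begin{bmatrix} U & U_\perp \end{bmatrix}
\begin{bmatrix} I_k & U^* A U_\perp \\ O & U_\perp^* A U_\perp \end{bmatrix},
\quad\text{i.e.,}\quad
A M^{-1} = P_{\mathcal{U}} + A P_{\mathcal{U}^\perp}.
\tag{5.4}
\]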
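As a small numerical illustration of (5.3) and (5.4), the following NumPy sketch builds a test matrix with a known invariant subspace and checks the identity AM⁻¹ = P_𝒰 + AP_𝒰⊥. The helper names `build_example` and `deflation_preconditioner`, the problem sizes, and the random test matrix are assumptions made only for this example; real arithmetic is used, so transposes stand in for conjugate transposes.

```python
import numpy as np

def build_example(n=8, k=2, seed=0):
    """Return A, U, U_perp such that span(U) is A-invariant (A U = U A_U)."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthogonal Q = [U, U_perp]
    U, U_perp = Q[:, :k], Q[:, k:]
    T = np.zeros((n, n))                               # block upper triangular
    T[:k, :k] = 0.1 * np.eye(k) + 0.01 * rng.standard_normal((k, k))       # A_U
    T[:k, k:] = rng.standard_normal((k, n - k))                            # U^* A U_perp
    T[k:, k:] = np.eye(n - k) + 0.1 * rng.standard_normal((n - k, n - k))  # U_perp^* A U_perp
    return Q @ T @ Q.T, U, U_perp

def deflation_preconditioner(A, U):
    """M as in (5.3): A_U = U^* A U on span(U), identity on the orthogonal complement."""
    A_U = U.T @ A @ U
    P_U = U @ U.T
    return U @ A_U @ U.T + (np.eye(A.shape[0]) - P_U)

A, U, U_perp = build_example()
n = A.shape[0]
M = deflation_preconditioner(A, U)
P_U = U @ U.T
P_perp = np.eye(n) - P_U

# Invariance A U = U A_U and the identity (5.4), both up to rounding error
err_invariance = np.linalg.norm(A @ U - U @ (U.T @ A @ U))
err_54 = np.linalg.norm(A @ np.linalg.inv(M) - (P_U + A @ P_perp))
print(err_invariance, err_54)
```

In an actual solver one would not form M⁻¹ explicitly; applying M⁻¹ to a vector only requires a k×k solve with A_U plus projections, which is what makes the approach feasible for small k.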

We now compare this preconditioning scheme with Morgan’s method of augmenting the Krylov space K_m(A, r₀) by the A-invariant subspace 𝒰.

Theorem 5.4.2. Let r_m^M denote the MR residual with respect to the correction space 𝒰 + K_m(A, r₀), where 𝒰 is an A-invariant subspace, and let r_m^E denote the MR residual with respect to the correction space K_m(AM⁻¹, r₀) resulting from preconditioning A from the right by M as defined in (5.3). Then there holds

\[
0 = \|P_{\mathcal{U}} r_m^M\| \le \|P_{\mathcal{U}} r_m^E\|
\quad\text{and}\quad
\|P_{\mathcal{U}^\perp} r_m^M\| \le \|P_{\mathcal{U}^\perp} r_m^E\|,
\tag{5.5}
\]

and therefore ‖r_m^M‖ ≤ ‖r_m^E‖. If, in addition, 𝒰^⊥ is also A-invariant, then P_𝒰 r₀ = 0 implies r_m^E = r_m^M.

Proof. The left set of inequalities in (5.5) follows from P_𝒰 r_m^M = 0, which is a restatement of the fact that augmenting with an invariant subspace 𝒰 eliminates 𝒰 from the residual (Lemma 5.2.3).

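We next recall that A_𝒰⊥ = P_𝒰⊥ A P_𝒰⊥ is the orthogonal section of A onto 𝒰^⊥ (cf. the remark following Lemma 4.3.3). Since r_m^E = r₀ − AM⁻¹c for some c ∈ K_m(AM⁻¹, r₀), we obtain using (5.4)

\[
P_{\mathcal{U}^\perp} r_m^E
= P_{\mathcal{U}^\perp} r_0 - P_{\mathcal{U}^\perp} A M^{-1} c
= P_{\mathcal{U}^\perp} r_0 - P_{\mathcal{U}^\perp} A P_{\mathcal{U}^\perp} c
= P_{\mathcal{U}^\perp} r_0 - A_{\mathcal{U}^\perp} P_{\mathcal{U}^\perp} c.
\]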

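Moreover, AM⁻¹𝒰 = 𝒰 together with Lemma 4.3.3 yields

\[
P_{\mathcal{U}^\perp} c \in P_{\mathcal{U}^\perp} K_m(A M^{-1}, r_0)
= K_m(P_{\mathcal{U}^\perp} A M^{-1}, P_{\mathcal{U}^\perp} r_0)
= K_m(A_{\mathcal{U}^\perp}, P_{\mathcal{U}^\perp} r_0).
\]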

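The last two statements show that P_𝒰⊥ r_m^E is of the form P_𝒰⊥ r₀ − A_𝒰⊥ c̃ with c̃ ∈ K_m(A_𝒰⊥, P_𝒰⊥ r₀). On the other hand, by Proposition 5.2.3, there holds

\[
\|r_m^M\| = \min_{c \in K_m(A_{\mathcal{U}^\perp}, P_{\mathcal{U}^\perp} r_0)}
\|P_{\mathcal{U}^\perp} r_0 - A_{\mathcal{U}^\perp} c\|,
\]

i.e., ‖r_m^M‖ is the minimum over all expressions of this form. Since P_𝒰 r_m^M = 0 gives ‖r_m^M‖ = ‖P_𝒰⊥ r_m^M‖, this yields the right inequality of (5.5).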

Next, assuming A𝒰^⊥ = 𝒰^⊥, (5.4) implies AM⁻¹r₀ = A_𝒰⊥ r₀ for r₀ ∈ 𝒰^⊥, and thus K_m(AM⁻¹, r₀) = K_m(A_𝒰⊥, P_𝒰⊥ r₀), which shows that in this case both methods minimize over the same subspace, hence r_m^E = r_m^M.

We note that the assumption P_𝒰 r₀ = 0 is not restrictive, as it can be enforced by adding the correction U A_U⁻¹ U^∗ r₀ to x₀, and the preconditioner is built upon the premise that A_U is easily invertible. However, since P_𝒰 r₀ = 0 by no means implies that P_𝒰 r_m^E = 0 for m > 0, it cannot be guaranteed that ‖r_m^E‖ = ‖r_m^M‖ even for such a special choice of initial residual unless A𝒰^⊥ = 𝒰^⊥. In the finite-dimensional case, the condition that 𝒰^⊥ be invariant whenever 𝒰 is invariant (i.e., that all invariant subspaces also reduce A) is a characterization for A to be normal. Hence, these two approaches are equivalent when A is normal and 𝒰 is invariant.
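The statement of Theorem 5.4.2 and the remark on normal A can be checked on a small dense example. The sketch below computes both MR residuals by dense least squares rather than by GMRES, which is only sensible for tiny problems; the helper names `krylov_basis`, `mr_residual` and `compare`, the sizes n, k, m, and the test matrices are assumptions of this example, and real arithmetic is used throughout.

```python
import numpy as np

def krylov_basis(B, r, m):
    """Orthonormal basis of K_m(B, r); vectors are normalized before each product."""
    cols = [r / np.linalg.norm(r)]
    for _ in range(m - 1):
        w = B @ cols[-1]
        cols.append(w / np.linalg.norm(w))
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return Q

def mr_residual(B, r0, C):
    """Residual r0 - B C y, with y minimizing the 2-norm over the span of C."""
    y, *_ = np.linalg.lstsq(B @ C, r0, rcond=None)
    return r0 - B @ (C @ y)

def compare(A, U, r0, m):
    """Return r_m^M (augmentation) and r_m^E (right preconditioning by M from (5.3))."""
    n = A.shape[0]
    P_U = U @ U.T
    P_perp = np.eye(n) - P_U
    M = U @ (U.T @ A @ U) @ U.T + P_perp
    AMinv = A @ np.linalg.inv(M)
    r_M = mr_residual(A, r0, np.column_stack([U, krylov_basis(A, r0, m)]))
    r_E = mr_residual(AMinv, r0, krylov_basis(AMinv, r0, m))
    return r_M, r_E, P_U, P_perp

rng = np.random.default_rng(1)
n, k, m = 10, 2, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
U = Q[:, :k]
r0 = rng.standard_normal(n)

# Non-normal A for which span(U) is invariant (block upper triangular in the basis Q)
T = np.zeros((n, n))
T[:k, :k] = np.diag([0.05, 0.1])
T[:k, k:] = rng.standard_normal((k, n - k))
T[k:, k:] = (np.diag(np.linspace(1.0, 2.0, n - k))
             + np.triu(0.2 * rng.standard_normal((n - k, n - k)), 1))
A = Q @ T @ Q.T
r_M, r_E, P_U, P_perp = compare(A, U, r0, m)
print(np.linalg.norm(r_M), np.linalg.norm(r_E))

# Normal (here symmetric) A with P_U r0 = 0: both residuals should coincide
d = np.concatenate([[0.05, 0.1], np.linspace(1.0, 2.0, n - k)])
A_sym = Q @ np.diag(d) @ Q.T
r_Mn, r_En, _, _ = compare(A_sym, U, P_perp @ r0, m)
print(np.linalg.norm(r_Mn - r_En))
```

The first pair of printed norms illustrates the inequality ‖r_m^M‖ ≤ ‖r_m^E‖ for one non-normal example; it is not a proof, and the gap between the two values depends on the chosen matrix and starting residual.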

The availability of an (exactly) A-invariant subspace 𝒰, on the other hand, is an assumption that can rarely be satisfied in practice. For a non-invariant 𝒰 one can nonetheless still define the preconditioner as in (5.3), where now A_U is defined as A_U := U^∗AU, resulting in

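\[
A M^{-1} \begin{bmatrix} U & U_\perp \end{bmatrix}
= \begin{bmatrix} U & U_\perp \end{bmatrix}
\begin{bmatrix} I & U^* A U_\perp \\ U_\perp^* A U A_U^{-1} & U_\perp^* A U_\perp \end{bmatrix},
\]

based on the heuristic that U_⊥^∗AU A_U⁻¹ will be small whenever 𝒰 is nearly A-invariant.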
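The heuristic can be illustrated by perturbing an exactly invariant subspace and measuring the (2,1) block. This is a sketch under assumptions: the function `coupling_norm`, the perturbation sizes, and the random test matrix are hypothetical choices for this example, and the perturbation is a generic random one rather than the Ritz-vector approximation used in practice.

```python
import numpy as np

def coupling_norm(A, U):
    """Spectral norm of U_perp^* A U A_U^{-1}, where A_U := U^* A U and U is orthonormal."""
    k = U.shape[1]
    Q_full, _ = np.linalg.qr(U, mode="complete")
    U_perp = Q_full[:, k:]                      # orthonormal basis of span(U)^perp
    A_U = U.T @ A @ U
    return np.linalg.norm(U_perp.T @ A @ U @ np.linalg.inv(A_U), 2)

rng = np.random.default_rng(0)
n, k = 8, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
U_exact = Q[:, :k]
T = np.zeros((n, n))                            # block upper triangular: span(U_exact) invariant
T[:k, :k] = 0.1 * np.eye(k) + 0.01 * rng.standard_normal((k, k))
T[:k, k:] = rng.standard_normal((k, n - k))
T[k:, k:] = np.eye(n - k) + 0.1 * rng.standard_normal((n - k, n - k))
A = Q @ T @ Q.T

E = rng.standard_normal((n, k))                 # fixed perturbation direction
norms = {}
for eps in (0.0, 1e-6, 1e-3):
    U_eps, _ = np.linalg.qr(U_exact + eps * E)  # re-orthonormalized perturbed basis
    norms[eps] = coupling_norm(A, U_eps)
    print(eps, norms[eps])
```

Note that the block also contains the factor A_U⁻¹, so how small it is depends on the eigenvalues being deflated as well as on the quality of the subspace.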

In Erhel et al. (1996) such nearly A-invariant spaces are obtained as the span of selected Ritz or harmonic Ritz vectors determined from Krylov spaces generated during previous cycles.

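Baglama et al. (1998) propose a similar algorithm, which preconditions by (5.3) from the left, leading to the preconditioned operator

\[
M^{-1} A \begin{bmatrix} U & U_\perp \end{bmatrix}
= \begin{bmatrix} U & U_\perp \end{bmatrix}
\begin{bmatrix} I & A_U^{-1} U^* A U_\perp \\ O & U_\perp^* A U_\perp \end{bmatrix},
\quad\text{or}\quad
M^{-1} A = P_{\mathcal{U}} + A P_{\mathcal{U}^\perp} + (A^{-1} - I) P_{\mathcal{U}} A P_{\mathcal{U}^\perp},
\]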

where we have again assumed that we are in the idealized case where 𝒰 is exactly A-invariant. The MR correction of the left-preconditioned system is the solution of the minimization problem

\[
\|M^{-1} r_m^B\| = \min\{\|M^{-1}(r_0 - A M^{-1} c)\| : c \in K_m(A M^{-1}, r_0)\}
\]

(cf. Section 4.6).

From (5.3), it is evident that

\[
M^{-1} = A^{-1} P_{\mathcal{U}} + P_{\mathcal{U}^\perp}
\]

and, consequently, if A𝒰 = 𝒰,

\[
P_{\mathcal{U}^\perp} M^{-1} v = P_{\mathcal{U}^\perp} v \quad\text{for all } v.
\]

These are the essential ingredients for showing that Theorem 5.4.2 holds in exactly the same way with r_m^B in place of r_m^E. The construction of an approximately invariant subspace 𝒰 is accomplished by Baglama et al. (1998) by employing the IRA process (cf. Section 4.3.3).
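The identities used for the left-preconditioned variant can be verified numerically in the idealized, exactly invariant case. The following sketch is an illustration only; the test matrix, its size, and the variable names are assumptions of this example, and real arithmetic is used.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
U = Q[:, :k]
T = np.zeros((n, n))                            # block upper triangular: span(U) is A-invariant
T[:k, :k] = 0.1 * np.eye(k) + 0.01 * rng.standard_normal((k, k))
T[:k, k:] = rng.standard_normal((k, n - k))
T[k:, k:] = np.eye(n - k) + 0.1 * rng.standard_normal((n - k, n - k))
A = Q @ T @ Q.T

I = np.eye(n)
P_U = U @ U.T
P_perp = I - P_U
M = U @ (U.T @ A @ U) @ U.T + P_perp            # M as in (5.3)
M_inv = np.linalg.inv(M)
A_inv = np.linalg.inv(A)

# M^{-1} = A^{-1} P_U + P_perp
err_Minv = np.linalg.norm(M_inv - (A_inv @ P_U + P_perp))
# M^{-1} A = P_U + A P_perp + (A^{-1} - I) P_U A P_perp
err_left = np.linalg.norm(M_inv @ A - (P_U + A @ P_perp + (A_inv - I) @ P_U @ A @ P_perp))
# P_perp M^{-1} = P_perp
err_proj = np.linalg.norm(P_perp @ M_inv - P_perp)
print(err_Minv, err_left, err_proj)
```

All three printed values should be at the level of rounding error for this example.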

Kharchenko & Yeremin (1995) suggest another adaptive right preconditioner M̃: after each GMRES cycle, the Ritz values and the corresponding left² and right Ritz vectors of A with respect to K_m are extracted. The aim is to obtain a preconditioner such that the extremal eigenvalues of A, which are approximated by the Ritz values, are translated to one (or at least to a small cluster around one) in the transition from A to AM̃⁻¹.

² Left Ritz vectors are defined by A^∗z̃_j − θ̄_j z̃_j ⊥ K_m and can be obtained from the left eigenvectors of H_m.

The extremal Ritz values are partitioned into, say, k subsets Θ_j of nearby Ritz values. For each Θ_j, a rank-one transformation of the form I + v_j ṽ_j^∗ is constructed, where v_j and ṽ_j are linear combinations of the associated right and left Ritz vectors. These linear combinations are chosen to translate simultaneously all Ritz values of Θ_j into a small cluster around one, while satisfying certain stability criteria. One preconditioning step now consists of successive multiplication by these rank-one matrices, i.e.,

\[
\tilde{M}^{-1} = (I + v_1 \tilde{v}_1^*) \cdots (I + v_k \tilde{v}_k^*) = I + V_k \tilde{V}_k^*,
\qquad
V_k = \begin{bmatrix} v_1 & \dots & v_k \end{bmatrix},
\quad
\tilde{V}_k = \begin{bmatrix} \tilde{v}_1 & \dots & \tilde{v}_k \end{bmatrix}.
\]

For the last equality we have made use of the fact that ṽ_j^∗v_i = 0 for i ≠ j, since all eigenvalues of H_m have geometric multiplicity one. Note that, if Θ_j has a small diameter and the Ritz values contained in Θ_j are good approximations of eigenvalues of A, then v_j and ṽ_j are approximate right and left eigenvectors of A. Moreover, the implementation described in Kharchenko & Yeremin (1995) ensures that the diagonal matrix D := Ṽ_k^∗V_k ∈ ℂ^{k×k} is nonsingular.
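The collapse of the product of rank-one updates into a single rank-k update relies only on the biorthogonality ṽ_j^∗v_i = 0 for i ≠ j. The sketch below checks this for generic vectors; it does not reproduce the Ritz-vector construction of Kharchenko & Yeremin, and the random V, the diagonal entries `d`, and the use of the pseudoinverse to obtain a biorthogonal set are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 7, 3
V = rng.standard_normal((n, k))                  # columns v_1, ..., v_k
d = np.array([0.5, -2.0, 3.0])                   # prescribed diagonal of D

# pinv(V) @ V = I_k for full column rank V, so scaling column j of pinv(V)^T by d_j
# gives vectors v~_j with v~_j^T v_i = d_j if i == j and 0 otherwise.
V_tilde = np.linalg.pinv(V).T * d

D = V_tilde.T @ V                                # should equal diag(d)

# Successive multiplication by the rank-one updates I + v_j v~_j^T
product = np.eye(n)
for j in range(k):
    product = product @ (np.eye(n) + np.outer(V[:, j], V_tilde[:, j]))

rank_k_form = np.eye(n) + V @ V_tilde.T
print(np.linalg.norm(product - rank_k_form))
```

Without the biorthogonality, the product would pick up cross terms such as v_i (ṽ_i^∗ v_j) ṽ_j^∗ and the two forms would differ.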

To compare this approach with the preconditioners presented thus far, we choose biorthonormal bases U and Ũ of 𝒰 := span{v₁, …, v_k} and 𝒰̃ := span{ṽ₁, …, ṽ_k} such that U^∗U = I, which are given, e.g., by

\[
U = V_k \tilde{S}^{-1} \quad\text{and}\quad \tilde{U} := \tilde{V}_k D^{-H} \tilde{S}^{H},
\]

with S̃ any matrix that satisfies V_k^∗V_k = S̃^H S̃. In this notation the preconditioner M̃ is given by

\[
\tilde{M}^{-1} = I + U \tilde{S} D \tilde{S}^{-1} \tilde{U}^* = I + U S \tilde{U}^*,
\qquad S := \tilde{S} D \tilde{S}^{-1}.
\]

We let U_⊥ denote an orthonormal basis of 𝒰^⊥ and make the idealizing assumptions that both 𝒰 and 𝒰̃ are invariant with respect to A and A^∗, respectively, i.e.,

\[
A U = U A_U \quad\text{and}\quad \tilde{U}^* A = A_U \tilde{U}^*,
\]

and that the eigenvalues corresponding to 𝒰 (respectively 𝒰̃) are translated exactly to 1.

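Substituting this in the definition of the preconditioner, we obtain using the biorthonormality of U and Ũ,

\[
\tilde{M}^{-1} \begin{bmatrix} U & U_\perp \end{bmatrix}
= \begin{bmatrix} U & U_\perp \end{bmatrix}
\begin{bmatrix} I + S & S \tilde{U}^* U_\perp \\ O & I_{n-k} \end{bmatrix}
\]

and

\[
A \tilde{M}^{-1} \begin{bmatrix} U & U_\perp \end{bmatrix}
= \begin{bmatrix} U & U_\perp \end{bmatrix}
\begin{bmatrix} A_U (I + S) & A_U S \tilde{U}^* U_\perp + U^* A U_\perp \\ O & U_\perp^* A U_\perp \end{bmatrix}.
\]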

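In addition, our assumptions imply A_U(I + S) = I, i.e., S = A_U⁻¹ − I and A_U S Ũ^∗ = (I − A_U)Ũ^∗ = Ũ^∗(I − A), resulting in

\[
A \tilde{M}^{-1} \begin{bmatrix} U & U_\perp \end{bmatrix}
= \begin{bmatrix} U & U_\perp \end{bmatrix}
\begin{bmatrix} I & \tilde{U}^* (I - A) U_\perp + U^* A U_\perp \\ O & U_\perp^* A U_\perp \end{bmatrix}.
\]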

This leads to

\[
A \tilde{M}^{-1} = P_{\mathcal{U}} + P_{\mathcal{U}}^{\tilde{\mathcal{U}}^\perp} (I - A) P_{\mathcal{U}^\perp} + A P_{\mathcal{U}^\perp}
\]

as the analogue to (5.4), where P_𝒰^{𝒰̃⊥} = UŨ^∗ denotes the oblique projection onto 𝒰 along 𝒰̃^⊥. Thus, in view of P_𝒰⊥AM̃⁻¹ = A_𝒰⊥, the statement made in Theorem 5.4.2 holds also for this preconditioning approach.
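The analogue of (5.4) can be checked numerically under the idealizing assumptions above. In the sketch below, exact right and left eigenvectors of a diagonalizable test matrix play the role of the invariant subspaces; the construction of A from a prescribed eigenvector matrix `X`, the chosen eigenvalues, and all variable names are assumptions of this example, and real arithmetic is used.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 8, 2
X = np.eye(n) + 0.1 * rng.standard_normal((n, n))       # right eigenvectors (columns)
lam = np.concatenate([[0.05, 0.1], np.linspace(1.0, 2.0, n - k)])
X_inv = np.linalg.inv(X)                                # rows are left eigenvectors
A = X @ np.diag(lam) @ X_inv

U, R = np.linalg.qr(X[:, :k])             # orthonormal basis of the right invariant subspace
U_tilde = (R @ X_inv[:k, :]).T            # chosen so that U_tilde^T U = I and U_tilde^T A = A_U U_tilde^T
A_U = U.T @ A @ U
S = np.linalg.inv(A_U) - np.eye(k)        # from A_U (I + S) = I
M_tilde_inv = np.eye(n) + U @ S @ U_tilde.T

I = np.eye(n)
P_U = U @ U.T
P_perp = I - P_U
P_obl = U @ U_tilde.T                     # oblique projection onto span(U)

lhs = A @ M_tilde_inv
rhs = P_U + P_obl @ (I - A) @ P_perp + A @ P_perp
err_biorth = np.linalg.norm(U_tilde.T @ U - np.eye(k))
err_left_inv = np.linalg.norm(U_tilde.T @ A - A_U @ U_tilde.T)
err_identity = np.linalg.norm(lhs - rhs)
err_section = np.linalg.norm(P_perp @ lhs - P_perp @ A @ P_perp)
print(err_biorth, err_left_inv, err_identity, err_section)
```

The last quantity checks the relation P_𝒰⊥AM̃⁻¹ = A_𝒰⊥ that the text uses to carry Theorem 5.4.2 over to this preconditioner.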

In document Minimal and orthogonal residual methods and their generalizations for solving linear operator equations (Page 105-109)