Krylov methods: Conjugate Gradients - Iterative inverse and regularization methods

3.1 Iterative inverse and regularization methods

3.1.5 Krylov methods: Conjugate Gradients

To make the iteration scheme more efficient, the Conjugate Gradients method searches in a different direction at each step. This is achieved by imposing $A$-orthogonality on any two different ($i \neq j$) searching vectors $\mu_i$ and $\mu_j$,

$$\langle \mu_j | \mu_i \rangle_A \equiv \langle A\mu_j | \mu_i \rangle = 0, \tag{3.25}$$

which are then said to be conjugate. In the preconditioned case, the searching vectors are multiplied by $M$, so that the conjugacy has to be formulated in the following way:

$$\langle M\mu_j | M\mu_i \rangle_A = 0.$$

3. NUMERICAL METHOD

The iteration scheme is given by substituting the residuals in eq. (3.15) by the new searching vectors $\mu_j$:

$$\psi_{j+1} = \psi_j + \tau_j M \mu_j. \tag{3.26}$$

By subtracting $\psi^*$ we obtain an equation for the errors,

$$\eta_{j+1} = \eta_j + \tau_j M \mu_j. \tag{3.27}$$

Taking into account the relation between the residuals and the errors,

$$\xi_{j+1} = -A\eta_{j+1}, \tag{3.28}$$

we can derive the recurrence formula for the residuals,

$$\xi_{j+1} = -A(\eta_j + \tau_j M \mu_j) = \xi_j - \tau_j A M \mu_j. \tag{3.29}$$

Here again, expression (3.16) has to be used periodically with the feedback of $\psi_j$ to avoid the accumulation of floating-point roundoff error. The optimal length of the step is found by minimizing the quadratic form

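$$0 = \frac{dQ_A}{d\tau_j}(\psi_{j+1}) = -\langle \xi_{j+1} | M\mu_j \rangle = \langle \eta_{j+1} | M\mu_j \rangle_A. \tag{3.30}$$

Substituting expression (3.27) in (3.30) we then obtain

$$\tau_j = -\frac{\langle \eta_j | M\mu_j \rangle_A}{\langle M\mu_j | M\mu_j \rangle_A} = \frac{\langle \xi_j | M\mu_j \rangle}{\langle M\mu_j | M\mu_j \rangle_A}. \tag{3.31}$$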

It can be shown that this formula is equivalent to the following expression

$$\tau_j = \frac{\langle \xi_j | M\xi_j \rangle}{\langle M\mu_j | M\mu_j \rangle_A}, \tag{3.32}$$

using $\langle \xi_j | M\mu_j \rangle = \langle \xi_j | M\xi_j \rangle$ (see appendix B.1).
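The step-length formula can be checked numerically. The following is a minimal sketch, not the implementation used in this work: the 3x3 matrix `A`, the diagonal preconditioner `M`, the right-hand side `b` and all function names are hypothetical, and the residual is assumed to be $\xi = b - A\psi$ (which is consistent with eq. (3.28) for $b = A\psi^*$; eq. (3.16) itself is not reproduced in this section).

```python
def dot(u, v):
    """Euclidean inner product <u|v>."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Apply a dense operator (list of rows) to a vector."""
    return [dot(row, v) for row in A]


def line_search_step(A, M, b, psi, mu):
    """One update psi -> psi + tau * M mu, with tau taken from eq. (3.31)."""
    xi = [bi - ai for bi, ai in zip(b, matvec(A, psi))]  # assumed residual b - A psi
    m_mu = matvec(M, mu)                                 # M mu_j
    a_m_mu = matvec(A, m_mu)                             # A M mu_j
    tau = dot(xi, m_mu) / dot(m_mu, a_m_mu)              # eq. (3.31)
    psi_new = [p + tau * d for p, d in zip(psi, m_mu)]   # eq. (3.26)
    xi_new = [r - tau * q for r, q in zip(xi, a_m_mu)]   # eq. (3.29)
    return tau, psi_new, xi_new, m_mu


# Hypothetical test problem (not taken from the source).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
M = [[0.25, 0.0, 0.0], [0.0, 1.0 / 3.0, 0.0], [0.0, 0.0, 0.5]]
b = [1.0, 2.0, 3.0]
psi0 = [0.0, 0.0, 0.0]
mu0 = [1.0, 2.0, 3.0]  # first searching vector: the initial residual b - A psi0

tau, psi1, xi1, m_mu0 = line_search_step(A, M, b, psi0, mu0)

# Optimality condition (3.30): the new residual is orthogonal to M mu_j.
orthogonality = dot(xi1, m_mu0)

# The recurrence (3.29) should agree with the residual recomputed from psi1.
xi1_direct = [bi - ai for bi, ai in zip(b, matvec(A, psi1))]
```

Recomputing the residual directly from $\psi_{j+1}$, as in the last line, is the kind of periodic refresh that the text recommends against roundoff accumulation; in a real solver it would be done only every few iterations because it costs an extra $A$ operation.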

To generate $A$-orthogonal searching vectors one could think of Gram-Schmidt conjugation

$$\mu_j = \xi_j + \sum_{k=0}^{j-1} \beta_{jk}\,\mu_k. \tag{3.33}$$

Here it was assumed that the residuals $\xi_j$ form a set of linearly independent vectors (see appendix B.1). The expression for the factors $\beta_{jk}$ can be derived by invoking $A$-orthogonality of the preconditioned searching vectors, $\langle M\mu_j | M\mu_i \rangle_A = 0$ for $i < j$. One obtains the following formula for the factors

$$\beta_{ji} = -\frac{\langle M\xi_j | M\mu_i \rangle_A}{\langle M\mu_i | M\mu_i \rangle_A}, \tag{3.35}$$

where $i < j$ according to eq. (3.33).¹
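As an illustration of eqs. (3.33) and (3.35), the sketch below conjugates a set of linearly independent vectors, which stand in for the residuals $\xi_j$. It is a minimal example under stated assumptions: `A` is a hypothetical symmetric positive definite matrix, `M` a hypothetical diagonal preconditioner, and the unit vectors are used only because they are trivially independent.

```python
def dot(u, v):
    """Euclidean inner product <u|v>."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Apply a dense operator (list of rows) to a vector."""
    return [dot(row, v) for row in A]


def conjugate_directions(A, M, vectors):
    """Gram-Schmidt conjugation following eqs. (3.33) and (3.35).

    Every previous searching vector is kept in memory, together with
    M mu_k and A M mu_k, so storage grows with the iteration count.
    """
    directions, m_dirs, am_dirs = [], [], []
    for xi in vectors:
        m_xi = matvec(M, xi)
        mu = list(xi)
        for mu_k, m_mu_k, am_mu_k in zip(directions, m_dirs, am_dirs):
            # beta_jk = -<M xi_j | M mu_k>_A / <M mu_k | M mu_k>_A, eq. (3.35)
            beta = -dot(m_xi, am_mu_k) / dot(m_mu_k, am_mu_k)
            mu = [m + beta * d for m, d in zip(mu, mu_k)]
        m_mu = matvec(M, mu)
        directions.append(mu)
        m_dirs.append(m_mu)
        am_dirs.append(matvec(A, m_mu))
    return directions, m_dirs, am_dirs


# Hypothetical test problem (not taken from the source).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
M = [[0.25, 0.0, 0.0], [0.0, 1.0 / 3.0, 0.0], [0.0, 0.0, 0.5]]
unit_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

directions, m_dirs, am_dirs = conjugate_directions(A, M, unit_vectors)

# Largest |<M mu_i | M mu_j>_A| over i != j; it should vanish up to roundoff.
max_offdiag = max(
    abs(dot(m_dirs[i], am_dirs[j]))
    for i in range(3) for j in range(3) if i != j
)
```

The three lists that grow inside `conjugate_directions` make the storage requirement of the full Gram-Schmidt sum explicit.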

This method seems to require too much memory, as apparently all previous searching vectors must be stored to calculate the new one. However, only one $\beta$-factor remains in the sum in eq. (3.33), as we show in appendix B.1.3. Hence, Gram-Schmidt orthogonalization can be simplified to the following expression

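$$\mu_{j+1} = \xi_{j+1} + \beta_{j+1}\,\mu_j, \tag{3.36}$$

where

$$\beta^{\mathrm{EXP}}_{j+1} \equiv \beta_{j+1} \equiv \beta_{j+1,j} = -\frac{\langle M\xi_{j+1} | M\mu_j \rangle_A}{\langle M\mu_j | M\mu_j \rangle_A}, \tag{3.37}$$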

with EXP meaning expensive, since the numerator of $\beta$ apparently requires an extra $A$ operation. This additional operation can be saved by reusing the vector $AM\mu_j$ from the computation of $\tau_j$, or with alternative methods (see appendix B.1), like the Fletcher-Reeves method (Fletcher & Reeves, 1964)

$$\beta^{\mathrm{FR}}_{j+1} = \frac{\langle \xi_{j+1} | M\xi_{j+1} \rangle}{\langle \xi_j | M\xi_j \rangle}, \tag{3.38}$$

the Polak-Ribière formula (Polak & Ribière, 1969)

$$\beta^{\mathrm{PR}}_{j+1} = \frac{\langle \xi_{j+1} | M(\xi_{j+1} - \xi_j) \rangle}{\langle \xi_j | M\xi_j \rangle}, \tag{3.39}$$

or the Hestenes-Stiefel expression (Hestenes & Stiefel, 1952)

$$\beta^{\mathrm{HS}}_{j+1} = -\frac{\langle \xi_{j+1} | M(\xi_{j+1} - \xi_j) \rangle}{\langle \mu_j | M(\xi_{j+1} - \xi_j) \rangle}. \tag{3.40}$$

However, $\beta^{\mathrm{EXP}}$ turns out to be a very efficient scheme, which behaves far more stably than the rest (see section 4). Since the $\beta$-formulae (eqs. 3.37-3.40) are mathematically equivalent, one could think of combining them in a single scheme that finds numerically different solutions. However, this kind of hybrid scheme remains to be thoroughly studied.
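The mathematical equivalence of the four $\beta$-formulae can be illustrated on a small system. The sketch below is not the code used in this work: the function `conjugate_gradients`, its parameters, the test matrix and the diagonal preconditioner are all hypothetical. It assumes that $A$ and $M$ are symmetric positive definite, that the iteration starts from $\psi_0 = 0$ with a non-zero right-hand side, and that the residual is $\xi = b - A\psi$.

```python
def dot(u, v):
    """Euclidean inner product <u|v>."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Apply a dense operator (list of rows) to a vector."""
    return [dot(row, v) for row in A]


def conjugate_gradients(A, M, b, beta_rule="EXP", max_steps=None, tol=1e-12):
    """Preconditioned CG sketch built from eqs. (3.26), (3.29), (3.32), (3.36)."""
    n = len(b)
    max_steps = n if max_steps is None else max_steps
    psi = [0.0] * n
    xi = list(b)             # residual b - A psi for psi = 0 (assumed convention)
    mu = list(xi)            # first searching vector is the initial residual
    m_xi = matvec(M, xi)
    for _ in range(max_steps):
        m_mu = matvec(M, mu)
        am_mu = matvec(A, m_mu)                             # A M mu_j, reused below
        denom = dot(m_mu, am_mu)                            # <M mu_j | M mu_j>_A
        tau = dot(xi, m_xi) / denom                         # eq. (3.32)
        psi = [p + tau * d for p, d in zip(psi, m_mu)]      # eq. (3.26)
        xi_new = [r - tau * q for r, q in zip(xi, am_mu)]   # eq. (3.29)
        if dot(xi_new, xi_new) ** 0.5 <= tol:
            break
        m_xi_new = matvec(M, xi_new)
        m_diff = [new - old for new, old in zip(m_xi_new, m_xi)]  # M (xi_{j+1} - xi_j)
        if beta_rule == "EXP":
            beta = -dot(m_xi_new, am_mu) / denom            # eq. (3.37)
        elif beta_rule == "FR":
            beta = dot(xi_new, m_xi_new) / dot(xi, m_xi)    # eq. (3.38)
        elif beta_rule == "PR":
            beta = dot(xi_new, m_diff) / dot(xi, m_xi)      # eq. (3.39)
        elif beta_rule == "HS":
            beta = -dot(xi_new, m_diff) / dot(mu, m_diff)   # eq. (3.40)
        else:
            raise ValueError("unknown beta rule: " + beta_rule)
        mu = [r + beta * d for r, d in zip(xi_new, mu)]     # eq. (3.36)
        xi, m_xi = xi_new, m_xi_new
    return psi


# Hypothetical test problem with a known solution (not taken from the source).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
M = [[0.25, 0.0, 0.0], [0.0, 1.0 / 3.0, 0.0], [0.0, 0.0, 0.5]]
psi_star = [1.0, -2.0, 3.0]
b = matvec(A, psi_star)

solutions = {rule: conjugate_gradients(A, M, b, rule)
             for rule in ("EXP", "FR", "PR", "HS")}
errors = {rule: max(abs(p - s) for p, s in zip(sol, psi_star))
          for rule, sol in solutions.items()}
```

In this sketch the EXP rule reuses `am_mu`, the vector $AM\mu_j$ already needed for $\tau_j$, so it costs no additional $A$ operation per iteration. On such a tiny, well-conditioned problem the four rules are expected to agree up to roundoff; it says nothing about their relative stability on large problems.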

Formula (3.36) shows that new searching vectors are built from a linear combination of the current residual and the previous searching vector. Since each subsequent residual is given by a linear combination of the previous residual and the $A$ operator applied to the searching vector, the manifold where the solution is being searched is spanned by the residuals and the so-called Krylov space. The latter is built by applying the $A$ operator to the basis vector successively. In this manifold, curved quadratic forms appear to be spherical and thus the searching process becomes more effective. It is possible to derive the Conjugate Gradients method by minimizing the $A$-norm of the error, $\min \|\eta\|_A$ (see e.g. Marchuk, 1982). In this sense an optimal solution to the inverse problem can be found even if no unique solution exists. Conjugate Gradients works even if the operator $A$ is not positive definite (for a discussion see e.g. Shewchuk, 1994). It can easily be shown that, in exact arithmetic, Conjugate Gradients converges in at most $n$ steps, with $n$ being the number of pixels/vector columns (see e.g. Shewchuk, 1994).

¹ Note that the sign of $\beta$ depends on the definition of the Gram-Schmidt conjugation. An alternative definition with the negation of the residuals would cancel the minus sign in eq. (3.35). The sign of $\beta$ can be regarded as a free parameter.
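The termination in at most $n$ steps and the decrease of the $A$-norm of the error can be observed on a small example. The sketch below is a hedged illustration only: it uses plain Conjugate Gradients without preconditioning ($M$ equal to the identity) and the Fletcher-Reeves factor, on a hypothetical 4x4 symmetric positive definite matrix with a known solution `psi_star`. All names are assumptions made for this example.

```python
def dot(u, v):
    """Euclidean inner product <u|v>."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Apply a dense operator (list of rows) to a vector."""
    return [dot(row, v) for row in A]


def a_norm(A, v):
    """A-norm sqrt(<v|v>_A); clipped at zero to guard against roundoff."""
    return max(dot(v, matvec(A, v)), 0.0) ** 0.5


def cg_error_history(A, psi_star):
    """Run n steps of plain CG and record ||eta_j||_A before and after each step."""
    n = len(psi_star)
    b = matvec(A, psi_star)      # right-hand side built from the known solution
    psi = [0.0] * n
    xi = list(b)                 # residual b - A psi at psi = 0 (assumed convention)
    mu = list(xi)
    history = [a_norm(A, [p - s for p, s in zip(psi, psi_star)])]
    for _ in range(n):
        rr = dot(xi, xi)
        if rr == 0.0:            # already converged exactly
            break
        a_mu = matvec(A, mu)
        tau = rr / dot(mu, a_mu)                       # eq. (3.32) with M = identity
        psi = [p + tau * d for p, d in zip(psi, mu)]
        xi = [r - tau * q for r, q in zip(xi, a_mu)]
        beta = dot(xi, xi) / rr                        # eq. (3.38) with M = identity
        mu = [r + beta * d for r, d in zip(xi, mu)]
        history.append(a_norm(A, [p - s for p, s in zip(psi, psi_star)]))
    return history


# Hypothetical 4x4 test problem (not taken from the source).
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
psi_star = [1.0, 2.0, 3.0, 4.0]

history = cg_error_history(A, psi_star)
```

`history` holds the error norm at the start and after each of the $n = 4$ steps. For this matrix the values should decrease from step to step and the last one should be at roundoff level, in line with the statement that the method minimizes $\|\eta\|_A$ over a growing Krylov space.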

In document Kitaura Joyanes, Francisco Shu (2007): Cosmic cartography: Bayesian reconstruction of the cosmological large-scale structure. Dissertation, LMU München: Fakultät für Physik (Page 57-60)