3.1 Iterative inverse and regularization methods
3.1.5 Krylov methods: Conjugate Gradients
To make the iteration scheme more efficient, the Conjugate Gradients method searches in a different direction at each step. This is achieved by imposing $A$-orthogonality on any two different ($i \neq j$) searching vectors $\mu_i$ and $\mu_j$,
\[ \langle \mu_j | \mu_i \rangle_A \equiv \langle A\mu_j | \mu_i \rangle = 0, \qquad (3.25) \]
which are then said to be conjugate. In the preconditioned case, the searching vectors are multiplied by $M$, so that the conjugacy has to be formulated in the following way:
3. NUMERICAL METHOD
The iteration scheme is obtained by replacing the residuals in eq. (3.15) with the new searching vectors $\{\mu_j\}$:
\[ \psi_{j+1} = \psi_j + \tau_j M\mu_j. \qquad (3.26) \]
Subtracting $\psi^*$, we obtain an equation for the errors,
\[ \eta_{j+1} = \eta_j + \tau_j M\mu_j. \qquad (3.27) \]
Taking into account the relation between the residuals and the errors,
\[ \xi_{j+1} = -A\eta_{j+1}, \qquad (3.28) \]
we can derive the recurrence formula for the residuals,
\[ \xi_{j+1} = -A(\eta_j + \tau_j M\mu_j) = \xi_j - \tau_j AM\mu_j. \qquad (3.29) \]
Here again, expression (3.16) has to be used periodically with the feedback of $\psi_j$ to avoid the accumulation of floating-point roundoff error. The optimal step length is found by minimizing the quadratic form:
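\[ 0 = \frac{dQ_A}{d\tau_j}(\psi_{j+1}) = -\langle \xi_{j+1} | M\mu_j \rangle = \langle \eta_{j+1} | M\mu_j \rangle_A. \qquad (3.30) \]
Substituting expression (3.27) into (3.30), we then obtain
\[ \tau_j = -\frac{\langle \eta_j | M\mu_j \rangle_A}{\langle M\mu_j | M\mu_j \rangle_A} = \frac{\langle \xi_j | M\mu_j \rangle}{\langle M\mu_j | M\mu_j \rangle_A}. \qquad (3.31) \]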
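The substitution step can be written out explicitly. The lines below are a sketch that uses only eqs. (3.27), (3.28) and (3.30), and assumes a real inner product that is linear in its first argument:

```latex
\begin{align*}
0 &= \langle \eta_{j+1} | M\mu_j \rangle_A
   = \langle \eta_j + \tau_j M\mu_j \,|\, M\mu_j \rangle_A \\
  &= \langle \eta_j | M\mu_j \rangle_A
   + \tau_j \langle M\mu_j | M\mu_j \rangle_A , \\
\langle \eta_j | M\mu_j \rangle_A
  &= \langle A\eta_j | M\mu_j \rangle
   = -\langle \xi_j | M\mu_j \rangle .
\end{align*}
```

Solving the second line for $\tau_j$ gives the first form of the step length, and inserting the third line gives the second form.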
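It can be shown that formula (3.31) is equivalent to the expression
\[ \tau_j = \frac{\langle \xi_j | M\xi_j \rangle}{\langle M\mu_j | M\mu_j \rangle_A}, \qquad (3.32) \]
using $\langle \xi_j | M\mu_j \rangle = \langle \xi_j | M\xi_j \rangle$ (see appendix B.1).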
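The update rules and the step length can be illustrated with a single iteration in plain Python. This is only a minimal sketch under stated assumptions: the 2x2 matrix `A`, the diagonal (Jacobi-type) preconditioner `M`, the right-hand side `b` and the helper names are made up for illustration, and the linear system is assumed to be $A\psi^* = b$, so that the directly recomputed residual is $b - A\psi$.

```python
def dot(x, y):
    """Euclidean inner product <x|y> of two real vectors."""
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    """Apply the matrix A (list of rows) to the vector x."""
    return [dot(row, x) for row in A]


# Hypothetical test problem (not taken from the text).
A = [[4.0, 1.0], [1.0, 3.0]]              # symmetric positive definite
M = [[0.25, 0.0], [0.0, 1.0 / 3.0]]       # assumed diagonal preconditioner
b = [1.0, 2.0]

psi = [0.0, 0.0]                          # starting estimate
xi = [bi - ri for bi, ri in zip(b, matvec(A, psi))]   # residual b - A psi
mu = list(xi)                             # first searching vector = first residual

m_mu = matvec(M, mu)                      # M mu_j
am_mu = matvec(A, m_mu)                   # A M mu_j, reused below

# Step length: <xi_j|M mu_j> / <M mu_j|M mu_j>_A with <x|y>_A = <A x|y>.
tau = dot(xi, m_mu) / dot(am_mu, m_mu)

psi_next = [p + tau * d for p, d in zip(psi, m_mu)]    # update of the estimate
xi_next = [r - tau * a for r, a in zip(xi, am_mu)]     # recurrence for the residual

# Optimality condition: the new residual is orthogonal to M mu_j.
optimality = dot(xi_next, m_mu)

# Direct recomputation of the residual, as one would do periodically
# to limit the accumulation of round-off error.
xi_direct = [bi - ri for bi, ri in zip(b, matvec(A, psi_next))]
```

With this choice of step length, `optimality` vanishes up to round-off, and `xi_next` agrees with `xi_direct` to the same accuracy.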
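To generate $A$-orthogonal searching vectors, one could think of Gram-Schmidt conjugation,
\[ \mu_j = \xi_j + \sum_{k=0}^{j-1} \beta_{jk}\,\mu_k. \qquad (3.33) \]
Here it is assumed that the residuals $\{\xi_j\}$ form a set of linearly independent vectors (see appendix B.1). The expression for the factors $\beta_{jk}$ can be derived by imposing $A$-orthogonality in eq. (3.33):
\[ \begin{aligned}
\langle M\mu_j | M\mu_i \rangle_A &= \langle M\xi_j | M\mu_i \rangle_A + \sum_{k=0}^{j-1} \beta_{jk} \langle M\mu_k | M\mu_i \rangle_A \\
0 &= \langle M\xi_j | M\mu_i \rangle_A + \beta_{ji} \langle M\mu_i | M\mu_i \rangle_A.
\end{aligned} \qquad (3.34) \]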
One obtains the following formula for the factors:
\[ \beta_{ji} = -\frac{\langle M\xi_j | M\mu_i \rangle_A}{\langle M\mu_i | M\mu_i \rangle_A}, \qquad (3.35) \]
where $i < j$ according to eq. (3.33).¹
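The full Gram-Schmidt conjugation, which stores all previous searching vectors, can be sketched as follows. The matrix `A`, the diagonal preconditioner `M` and the function names are hypothetical, and the unit vectors used as input merely stand in for a set of linearly independent residuals.

```python
def dot(x, y):
    """Euclidean inner product <x|y> of two real vectors."""
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    """Apply the matrix A (list of rows) to the vector x."""
    return [dot(row, x) for row in A]


def a_inner(A, x, y):
    """A-inner product <x|y>_A = <A x|y>."""
    return dot(matvec(A, x), y)


def gram_schmidt_conjugate(A, M, vectors):
    """Build searching vectors mu_j = xi_j + sum_k beta_jk mu_k whose
    preconditioned images M mu_j are mutually A-orthogonal."""
    mus = []
    for xi in vectors:
        m_xi = matvec(M, xi)
        mu = list(xi)
        for mu_i in mus:                  # every stored vector with i < j
            m_mu_i = matvec(M, mu_i)
            beta = -a_inner(A, m_xi, m_mu_i) / a_inner(A, m_mu_i, m_mu_i)
            mu = [m + beta * p for m, p in zip(mu, mu_i)]
        mus.append(mu)
    return mus


# Hypothetical 3x3 symmetric positive definite operator and preconditioner.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
M = [[0.25, 0.0, 0.0], [0.0, 1.0 / 3.0, 0.0], [0.0, 0.0, 0.5]]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

mus = gram_schmidt_conjugate(A, M, basis)
m_mus = [matvec(M, mu) for mu in mus]

# Largest |<M mu_j|M mu_i>_A| over all pairs with i != j.
worst = max(abs(a_inner(A, m_mus[j], m_mus[i]))
            for j in range(3) for i in range(3) if i != j)
```

The inner loop over `mus` is what makes this variant costly in memory and operations: each new searching vector needs every earlier one.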
This method seems to require too much memory, as apparently all previous searching vectors must be stored to calculate the new one. However, only one $\beta$-factor remains in the sum in eq. (3.33), as we show in appendix B.1.3. Hence, Gram-Schmidt orthogonalization can be simplified to the following expression:
\[ \mu_{j+1} = \xi_{j+1} + \beta_{j+1}\,\mu_j, \qquad (3.36) \]
where
\[ \beta^{\mathrm{EXP}}_{j+1} \equiv \beta_{j+1} \equiv \beta_{j+1,\,j} = -\frac{\langle M\xi_{j+1} | M\mu_j \rangle_A}{\langle M\mu_j | M\mu_j \rangle_A}, \qquad (3.37) \]
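with EXP meaning expensive, since the numerator of $\beta$ apparently requires an extra $A$ operation. This additional operation can be saved by reusing the vector $AM\mu_j$ already computed for $\tau_j$, or with alternative methods (see appendix B.1), such as the Fletcher-Reeves method (Fletcher & Reeves, 1964),
\[ \beta^{\mathrm{FR}}_{j+1} = \frac{\langle \xi_{j+1} | M\xi_{j+1} \rangle}{\langle \xi_j | M\xi_j \rangle}, \qquad (3.38) \]
the Polak-Ribière formula (Polak & Ribière, 1969),
\[ \beta^{\mathrm{PR}}_{j+1} = \frac{\langle \xi_{j+1} | M(\xi_{j+1} - \xi_j) \rangle}{\langle \xi_j | M\xi_j \rangle}, \qquad (3.39) \]
or the Hestenes-Stiefel expression (Hestenes & Stiefel, 1952),
\[ \beta^{\mathrm{HS}}_{j+1} = -\frac{\langle \xi_{j+1} | M(\xi_{j+1} - \xi_j) \rangle}{\langle \mu_j | M(\xi_{j+1} - \xi_j) \rangle}. \qquad (3.40) \]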
However, $\beta^{\mathrm{EXP}}$ turns out to be a very efficient scheme, which behaves far more stably than the rest (see section 4). Since the $\beta$-formulae (eqs. 3.37-3.40) are mathematically equivalent, one could think of combining them in a single scheme that finds numerically different solutions. However, this kind of hybrid scheme remains to be thoroughly studied.
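To make the four $\beta$-formulae concrete, the following sketch runs the preconditioned iteration with a selectable rule on a small test system. It is an illustration under assumptions, not the implementation studied in section 4: the function names, the 3x3 matrix, the diagonal preconditioner and the tolerance are hypothetical, and both `A` and `M` are assumed symmetric, so that the stored vector $AM\mu_j$ can be reused in the EXP rule.

```python
def dot(x, y):
    """Euclidean inner product <x|y> of two real vectors."""
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    """Apply the matrix A (list of rows) to the vector x."""
    return [dot(row, x) for row in A]


def beta_factor(rule, M, xi, xi_new, mu, m_mu, am_mu):
    """Return beta_{j+1} for the chosen rule (EXP, FR, PR or HS)."""
    m_xi_new = matvec(M, xi_new)
    diff = [n - o for n, o in zip(xi_new, xi)]        # xi_{j+1} - xi_j
    m_diff = matvec(M, diff)
    if rule == "EXP":   # reuses A M mu_j, so no extra A operation is needed
        return -dot(m_xi_new, am_mu) / dot(m_mu, am_mu)
    if rule == "FR":    # Fletcher-Reeves
        return dot(xi_new, m_xi_new) / dot(xi, matvec(M, xi))
    if rule == "PR":    # Polak-Ribiere
        return dot(xi_new, m_diff) / dot(xi, matvec(M, xi))
    if rule == "HS":    # Hestenes-Stiefel
        return -dot(xi_new, m_diff) / dot(mu, m_diff)
    raise ValueError("unknown rule: " + rule)


def pcg(A, M, b, rule="EXP", tol=1e-12):
    """Preconditioned Conjugate Gradients, at most len(b) steps."""
    psi = [0.0] * len(b)
    xi = list(b)                      # residual b - A psi for psi = 0
    mu = list(xi)                     # first searching vector
    for _ in range(len(b)):
        m_mu = matvec(M, mu)
        am_mu = matvec(A, m_mu)
        tau = dot(xi, m_mu) / dot(m_mu, am_mu)
        psi = [p + tau * d for p, d in zip(psi, m_mu)]
        xi_new = [r - tau * a for r, a in zip(xi, am_mu)]
        if dot(xi_new, xi_new) ** 0.5 < tol:          # converged
            break
        beta = beta_factor(rule, M, xi, xi_new, mu, m_mu, am_mu)
        mu = [r + beta * d for r, d in zip(xi_new, mu)]
        xi = xi_new
    return psi


# Hypothetical test system (not taken from the text).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
M = [[0.25, 0.0, 0.0], [0.0, 1.0 / 3.0, 0.0], [0.0, 0.0, 0.5]]
b = [1.0, 2.0, 3.0]

solutions = {rule: pcg(A, M, b, rule) for rule in ("EXP", "FR", "PR", "HS")}
residual_norms = {
    rule: max(abs(bi - ri) for bi, ri in zip(b, matvec(A, psi)))
    for rule, psi in solutions.items()
}
```

On such a small, well-conditioned system all four rules are expected to agree to round-off level; differences in stability of the kind reported in section 4 would only show up on larger or ill-conditioned problems, which this sketch does not attempt to reproduce.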
Formula (3.36) shows that new searching vectors are built from a linear combination of the current residual and the previous searching vector. Since each subsequent residual is a linear combination of the previous residual and the $A$ operator applied to the searching vector, the manifold in which the solution is searched is spanned by the residuals and the so-called Krylov space. The latter is built by applying the $A$ operator to the basis vector successively. In this manifold, curved quadratic forms appear to be spherical and thus the searching process becomes more effective. It is possible to derive the Conjugate Gradients method by minimizing the $A$-norm of the error, $\min \|\eta\|_A$ (see e.g. Marchuk, 1982). In this sense, an optimal solution to the inverse problem can be found even if no unique solution exists. Conjugate Gradients works even if the operator $A$ is not positive definite (for a discussion see e.g. Shewchuk, 1994). It can easily be shown that Conjugate Gradients converges in at most $n$ steps, with $n$ being the number of pixels/vector columns (see e.g. Shewchuk, 1994).

¹ Note that the sign of $\beta$ depends on the definition of the Gram-Schmidt conjugation. An alternative definition with the negation of the residuals would cancel the minus sign in eq. (3.35). The sign of $\beta$ can be regarded as a free parameter.
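The finite termination property and the minimization of the $A$-norm of the error can be checked numerically on a small example. The sketch below makes several assumptions that are not part of the text: it uses the unpreconditioned case ($M$ equal to the identity), the Fletcher-Reeves factor, a hypothetical 4x4 symmetric positive definite matrix, and a known solution `psi_star` from which the right-hand side is constructed.

```python
def dot(x, y):
    """Euclidean inner product <x|y> of two real vectors."""
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    """Apply the matrix A (list of rows) to the vector x."""
    return [dot(row, x) for row in A]


def error_a_norm(A, psi, psi_star):
    """A-norm of the error eta = psi - psi_star."""
    eta = [p - s for p, s in zip(psi, psi_star)]
    return max(0.0, dot(matvec(A, eta), eta)) ** 0.5


def cg_error_history(A, psi_star):
    """Run n unpreconditioned CG steps and record ||eta||_A after each one."""
    n = len(psi_star)
    b = matvec(A, psi_star)           # right-hand side with known solution
    psi = [0.0] * n
    xi = list(b)                      # residual b - A psi for psi = 0
    mu = list(xi)
    history = [error_a_norm(A, psi, psi_star)]
    for _ in range(n):
        a_mu = matvec(A, mu)
        tau = dot(xi, mu) / dot(mu, a_mu)
        psi = [p + tau * d for p, d in zip(psi, mu)]
        xi_new = [r - tau * a for r, a in zip(xi, a_mu)]
        history.append(error_a_norm(A, psi, psi_star))
        if dot(xi_new, xi_new) == 0.0:                # exact convergence
            break
        beta = dot(xi_new, xi_new) / dot(xi, xi)      # Fletcher-Reeves, M = identity
        mu = [r + beta * d for r, d in zip(xi_new, mu)]
        xi = xi_new
    return history


# Hypothetical tridiagonal, diagonally dominant test matrix (n = 4).
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
psi_star = [1.0, -2.0, 3.0, -4.0]

errors = cg_error_history(A, psi_star)
```

The entries of `errors` should decrease from step to step and reach round-off level after at most four steps, in line with the bound of $n$ steps quoted above; in floating-point arithmetic the final value is small but generally not exactly zero.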