Deflation Based Krylov Subspace Methods for Sequences of Linear Systems


Dissertation

approved by the Department of Mathematics and Natural Sciences of the Bergische Universität Wuppertal

for the award of the academic degree of

Doctor of Natural Sciences (Dr. rer. nat.)

by

Dipl.-Math. Nemanja Božović

Reviewers: Prof. Dr. Andreas Frommer, Prof. Dr. Matthias Bolten

Examination committee: Prof. Dr. Andreas Frommer, Prof. Dr. Matthias Bolten, Prof. Dr. Michael Günther, Assoc. Prof. Vladimir Kostić

Dissertation submitted on: 08.06.2017

Date of defense: 21.07.2017


urn:nbn:de:hbz:468-20170913-142254-9

[http://nbn-resolving.de/urn/resolver.pl?urn=urn%3Anbn%3Ade%3Ahbz%3A468-20170913-142254-9]


Acknowledgements

I would like to thank all people who helped me, in one way or another, to write down this thesis.

First, I would like to thank my advisor, Prof. Andreas Frommer. Over the years he found time on numerous occasions to answer my questions, he provided me with much useful advice, and we had countless fruitful discussions. In addition, I thank Prof. Matthias Bolten, who acted as my second advisor. Despite my frequent knocking on his door, he was always patient enough to help me with any issues I had and to suggest different insights and ways to tackle problems. It was a pleasure working with them and, in general, in the Applied Computer Science group at Bergische Universität Wuppertal. Among all colleagues from the group, I would especially like to thank Sebastian Birk, Matthias Rottmann and Karsten Kahl, who helped me either through advice and exchanges of ideas or by providing codes and matrices for different applications. Prof. Francesco Knechtli and Björn Leder from the physics department also deserve to be mentioned, since without the joint seminars and their lectures and talks it would have been much harder to understand the basics of lattice QCD.

I had the opportunity to attend many conferences, summer schools and workshops, where I met many interesting people, and I am grateful for that. Among these people, I would like to thank Prof. Mike Peardon from Trinity College Dublin, who helped me work out gauge fixing from the mathematical point of view, as well as Graham Moir (at that time also at TCD), who was always open to any lattice QCD related question I had.

I would like to express my deepest gratitude to my former mentor and advisor, Prof. Ljiljana Cvetković, who was probably the most influential person throughout my academic career. Her inspiring and “out of the ordinary” lectures immediately raised my interest in Applied Numerical Linear Algebra and made my choice of a field for future research quite straightforward. I am also thankful to Vladimir Kostić, who was responsible for the exercise course at that time, for his “can do” attitude and for his help and support whenever we met.

I would also like to thank Sebastian Birk and his family for being there for us as friends, as well as for helping us with many matters made difficult by our lack of knowledge of the German language. Moreover, I would especially like to thank Dmitry Shcherbakov for being a true friend over the years, and hopefully beyond the Ph.D.

Next, I would like to thank my parents, Vesna and Milan, for their moral support, not only over the past years, but throughout my whole academic career.

Last but not least, I dedicate this thesis to my lovely wife Milana and my two sons, Dušan and Filip, who endured with me through thick and thin. Their unwavering support at moments of weakness helped me overcome all problems and finish my work.


1 Introduction

Many applications in computational science and engineering require the solution of a sequence of slowly changing linear systems

$$A^{(i)} x^{(i)} = b^{(i)}, \qquad A^{(i)} \in \mathbb{C}^{n \times n}, \quad b^{(i)} \in \mathbb{C}^{n}, \quad i = 1, \dots, m. \tag{1.1}$$

In this context, slowly changing means that the matrices and right-hand sides change only slightly from one system to the next, as will be demonstrated for some particular applications in Chapter 5. In our work, we focus on solving (1.1) by exploiting the idea of recycling, i.e. keeping a carefully chosen subspace between systems with the goal of reducing the cost of subsequent systems. This is not a new concept; it has been introduced and utilized by many authors over the last two decades.

Advances in technology led to a substantial growth in the size of the linear systems, e.g., lattice QCD calculations tend to have hundreds of millions of unknowns. In this case direct methods fail due to excessive storage or computational time requirements, and iterative methods become the only feasible option. The most popular choice is the class of Krylov subspace methods. Not taking into account the “closeness” of the matrices, there is a huge variety of methods to choose from for solving each system in (1.1) separately. Some well-known Krylov solvers are CG for Hermitian matrices and GMRES, GCR etc. in the non-Hermitian case. However, even these methods encounter problems due to their excessive storage requirements. A straightforward remedy is to use the restarted or truncated version of these methods. Nonetheless, this is not always the best option, since restarted and truncated methods often experience slow convergence and even stagnation.

Enhancing the robustness of restarted and truncated methods was, and still is, an ongoing task. Improving on restarted GMRES resulted in many advanced methods that are based on two concepts, deflation and augmentation, which play an important role throughout this thesis. The idea behind deflation is to remove the smallest eigenvalues from the spectrum, which should lead to better convergence of the method, whereas augmentation simply means enlarging the current subspace with a carefully chosen subspace from the previous cycle. As will be shown in Section 3.5, these two concepts go together naturally. The way in which they are employed gives rise to different methods. The GMRES-E and GMRES-DR methods are algebraically equivalent, since they use the same subspace for the augmentation, i.e. the eigenspace corresponding to the smallest eigenvalues, but they are based on different strategies. Another example is the “loose” GMRES method, where one aims to keep a few error approximations which in some sense represent the previously built subspace. On the other hand, truncation might lead to poor convergence, since we keep only a certain number of vectors that we orthogonalize against in the Arnoldi process. Improving on truncated methods leads to more efficient methods like GCRO, which is by construction a truncated method. Its inner orthogonalization scheme, which is of utmost importance throughout this thesis, nevertheless seems to be quite effective. Furthermore, de Sturler proposed the concept of optimal truncation, in which we choose the best possible subspace to keep, thus developing the GCROT method.

Even though some of these methods can be easily modified for solving (1.1), they all exhibit certain flaws. Nonetheless, combining some of these methods, i.e. combining ideas, concepts and frameworks, leads to elegant and efficient methods that recycle a judiciously selected subspace between systems (cycles) and use it to reduce the cost of subsequent systems in the ensemble. The GCRO-DR method uses the framework of the GCRO method, i.e. it has an inner/outer scheme. However, it performs deflation in the same way as GMRES-DR. To clarify further, in the outer method, i.e. GCR, one computes the approximate eigenspace, which is later used by the inner method, i.e. GMRES, for building the augmented space and deflating the smallest eigenvalues. An important detail is that augmentation and deflation correspond to performing the Arnoldi process with the operator $(I - CC^H)A$ within GMRES, where $C$ is the approximate eigenspace and $I - CC^H$ is the orthogonal projector. In recent years, one more method for solving (1.1), named “loose” GCRO-DR (LGCRO-DR), was proposed; it was developed by straightforwardly incorporating the idea of recycling a few error approximations into GCRO-DR.

Our research is mainly based on the work of M. Gutknecht. In [Gut12] he compares two techniques, namely deflated GMRES and truly deflated GMRES, without giving any details about the subspaces that we aim to recycle. Choosing the right harmonic Ritz vectors as the recycle subspace, deflated GMRES basically corresponds to the GCRO-DR method. On the other hand, “true” deflation means deflating both the left and the right eigenspaces, and hence represents a theoretically better approach. In order to do that, one has to use the operator $(I - C\tilde{C}^H)A$ within Arnoldi instead of $(I - CC^H)A$, where the range of $\tilde{C}$ is spanned by left harmonic Ritz vectors, which have to be computed additionally. Therefore, on top of Gutknecht’s theory, we propose a cheap way of computing $\tilde{C}$, thus leading to a new method which represents the truly deflated technique. We named the method Left-Right Deflated GMRES (LRDGMRES). The idea behind “loose” methods fits naturally into our method and gives rise to the “loose” LRDGMRES method.

Deflated and augmented methods usually work better when used together with preconditioning. Often, the reliability of iterative methods for various applications depends more on the choice of the preconditioner than on the acceleration technique employed. For this reason, preconditioning found its way into this thesis. Since the choice of the preconditioner depends on the application considered, we will discuss this topic further for the lattice QCD application, which was of greatest interest to us due to our involvement in the projects Marie Curie Initial Training Network STRONGnet and SFB/Transregio 55 Hadronenphysik mit Gitter-QCD.

1.1 Outline

This thesis is organised as follows.

In Chapter 2 we gather definitions and results that are scattered throughout the literature. The first three sections contain basic properties of projectors, projection methods and Krylov subspaces; even if the reader is familiar with these terms, we suggest skimming through these sections for the sake of notation. We would like to point in particular to Section 2.4, where we introduce the concept of left Ritz and left harmonic Ritz vectors, which, to our knowledge, is not known from the literature, and prove some canonical properties they satisfy.

In Chapter 3 we introduce two mathematically equivalent Krylov subspace methods, GMRES and GCR, for the solution of nonsymmetric linear systems and discuss further modifications of these methods. In practice, both methods are used in restarted or truncated form. In Section 3.5 we explain the concepts of augmentation and deflation and draw a connection between them. Furthermore, we present different techniques for using them, and we describe the resulting methods GMRES-E (Section 3.5.2), GMRES-DR (Section 3.5.3) and LGMRES (Section 3.5.4). Moreover, we describe the inner/outer scheme utilized by GMRESR (Section 3.3) and GCRO (Section 3.4), and later on by more advanced methods in Chapter 4. Since augmented and deflated methods often unfold their full potential when used with preconditioning, we briefly describe right preconditioning in Section 3.6, which we revisit in Chapter 5 for a particular application in lattice QCD.

Chapter 4 contains the main contribution of this thesis. In Section 4.1 we build on the GCRO framework by introducing the concept of optimal truncation. In Section 4.2 we give a detailed description of the GCRO-DR method, which was the starting point of our research, as well as two convergence results. Further, in Section 4.3 we show two ways of including the error approximations in the recycling process, which is the only difference between LGCRO-DR and GCRO-DR. Finally, in Section 4.4.1 we work out the details of our method LRDGMRES. Moreover, we propose two results for cheaply obtaining the left harmonic Ritz vectors and discuss some disadvantages of our method when compared to GCRO-DR. In Section 4.4.2, the last section of this chapter, we briefly explain the LLRDGMRES method.

We compare the four methods described in Chapter 4 for various applications in Chapter 5. We consider in Section 5.1 an application which results in a sequence of symmetric matrices, only to demonstrate that GCRO-DR and LRDGMRES are equivalent in the Hermitian case. In Section 5.2 we construct a 3 × 3 example whose purpose is to show the power of the approach which utilizes oblique projections. Further, we compare the methods for one nonsymmetric system from a fluid dynamics application (Section 5.3) and a sequence of nonsymmetric systems arising in the Korringa-Kohn-Rostoker method in solid-state physics. The application of greatest interest to us was lattice QCD. We present results for five consecutive systems arising from the hybrid Monte Carlo integration in Section 5.5. Moreover, we apply the red-black multiplicative Schwarz method as a preconditioner, and show results in Section 5.5.1. Finally, Section 5.5.3 describes how diagonal similarity transformations of the matrices in a sequence can be exploited in lattice QCD to improve all four methods.

In Chapter 6 we give some final remarks and conclusions about the work done within this thesis and discuss some further plans.


2 Basic Concepts

In this chapter we gather some basic definitions and results that are scattered throughout the literature and that are useful for the remainder of the thesis. Projectors play an important role in numerical linear algebra, as well as in this thesis, and therefore, we give a brief overview of basic properties. Most of the iterative techniques covered in later chapters utilize a projection process. Thus, we describe a basic projection step in its general form and present some theory we found useful. Next, we also briefly recall the theory behind Krylov subspaces, as well as the Arnoldi method. While all of the results regarding projections and Krylov subspaces are known and widely used, the last section of this chapter is different, as its definitions and results are not widely used in the literature. The concept of Ritz and harmonic Ritz pairs can be found in many books and papers, but they usually refer to the right pairs. In Section 2.4 we define, in addition, the left pairs and prove some useful properties they satisfy.

2.1 Projectors

Definition 2.1. A linear operator $P : \mathbb{C}^n \to \mathbb{C}^n$ is called a projector if $P^2 = P$.

It follows immediately from the definition that if $P$ is a projector, then so is $I - P$, and the following relations hold,

$$\operatorname{Ker}(P) = \operatorname{Ran}(I - P), \qquad \operatorname{Ran}(P) = \operatorname{Ker}(I - P).$$

The next two lemmas show that each projector is uniquely characterized by two subspaces, its range and its null space. For proofs we refer to, e.g., [Saa03].

Lemma 2.2. The space $\mathbb{C}^n$ can be decomposed as the direct sum

$$\mathbb{C}^n = \operatorname{Ker}(P) \oplus \operatorname{Ran}(P).$$

Lemma 2.3. Every pair of subspaces $M$ and $S$ which forms a direct sum of $\mathbb{C}^n$ defines a unique projector $P$ such that $\operatorname{Ran}(P) = M$ and $\operatorname{Ker}(P) = S$. The associated projector $P$ maps an element $x$ of $\mathbb{C}^n$ into the $M$-component $x_1$ in the unique decomposition $x = x_1 + x_2$, $x_1 \in M$, $x_2 \in S$.


It is said that the projector $P$ projects onto the subspace $M$ and along the subspace $S$. In the literature, projectors are usually defined through the orthogonal complement $L = S^{\perp}$ of the subspace $S$. The following conditions define the projector $P$ onto $M$ and orthogonal to $L$:

$$Px \in M, \tag{2.1}$$

$$x - Px \perp L. \tag{2.2}$$

The following lemma gives conditions under which it is possible to define such a projector. The proof follows immediately from Lemma 2.3 with $S = L^{\perp}$.

Lemma 2.4. Given two subspaces $M$ and $L$ of the same dimension $m$, the following two conditions are mathematically equivalent.

1. No nonzero vector of $M$ is orthogonal to $L$.

2. For any $x \in \mathbb{C}^n$ there is a unique vector $Px$ which satisfies (2.1) and (2.2).

Next, we consider matrix representations of projectors. Let us assume that the columns 𝑣_𝑖 and 𝑤_𝑖 of 𝑉 and 𝑊 form orthonormal bases for subspaces 𝑀 and 𝐿, respectively. Since 𝑃 𝑥 ∈ 𝑀 , it can be written as

𝑃 𝑥 = 𝑉 𝑦. (2.3)

The constraint (2.2) is equivalent to the condition $\langle x - Vy, w_j \rangle = 0$, $j = 1, \dots, m$. This can be rewritten in the matrix form

$$W^H (x - Vy) = 0.$$

The previous equation yields the expression for computing $y$,

$$W^H x = W^H V y \quad \Leftrightarrow \quad y = (W^H V)^{-1} W^H x. \tag{2.4}$$

From (2.3) and (2.4) we get the matrix representation of the projector $P$,

$$P = V (W^H V)^{-1} W^H.$$

Under the assumptions of Lemma 2.4, the nonsingularity of the matrix $W^H V$ is guaranteed. In case the two bases are biorthogonal, i.e. $W^H V = I$, we have, as a special case, the following representation of $P$,

$$P = V W^H.$$
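As a numerical illustration of the representation $P = V (W^H V)^{-1} W^H$, the following NumPy sketch builds the projector for randomly chosen bases and checks $P^2 = P$ together with the conditions (2.1) and (2.2). The function name `oblique_projector`, the random test data and the dimensions are assumptions made for this illustration only.

```python
import numpy as np


def oblique_projector(V, W):
    """Return P = V (W^H V)^{-1} W^H (onto Ran(V), orthogonal to Ran(W))."""
    WhV = W.conj().T @ V
    # Solve (W^H V) X = W^H instead of forming the inverse explicitly.
    return V @ np.linalg.solve(WhV, W.conj().T)


rng = np.random.default_rng(0)
n, m = 6, 2  # illustrative sizes (assumption)
V = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
W = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

P = oblique_projector(V, W)

idempotent = np.allclose(P @ P, P)                        # P^2 = P
in_range = np.allclose(P @ V, V)                          # P is the identity on M
residual_orth = np.allclose(W.conj().T @ (x - P @ x), 0)  # condition (2.2)
```

Solving with $W^H V$ rather than inverting it is a common design choice; for biorthogonal bases the solve becomes trivial and the code reduces to $P = V W^H$.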


We distinguish two different classes of projectors. In the case when the subspace $L$ is equal to the subspace $M$, it is said that $P$ is the orthogonal projector onto $M$. A projector that is not orthogonal is called oblique. In order to provide the condition under which a projector is orthogonal, we have to define the adjoint $P^H$ of the projector $P$ and consider some of the properties of the adjoint.

Definition 2.5. The mapping $P^H$ is the adjoint of $P$ if

$$(P^H x, y) = (x, P y), \qquad \forall x, \ \forall y.$$

It is easily shown that $P^H$ is also a projector,

$$((P^H)^2 x, y) = (P^H x, P y) = (x, P^2 y) = (x, P y) = (P^H x, y).$$

The following relations

$$\operatorname{Ker}(P^H) = \operatorname{Ran}(P)^{\perp}, \qquad \operatorname{Ker}(P) = \operatorname{Ran}(P^H)^{\perp}$$

hold as a consequence of Definition 2.5 and lead to this important result, see [Saa03].

Proposition 2.6. A projector is orthogonal if and only if it is Hermitian.

We conclude this section with a few basic properties of orthogonal projectors.

Lemma 2.7. Let $P$ be an orthogonal projector. Then the two vectors $Px$ and $(I - P)x$ are orthogonal and the following holds

$$\|x\|_2^2 = \|Px\|_2^2 + \|(I - P)x\|_2^2. \tag{2.5}$$

This is just a consequence of Pythagoras’ theorem. It follows directly from (2.5) that $\|Px\|_2^2 \le \|x\|_2^2$, i.e. $\|Px\|_2^2 / \|x\|_2^2 \le 1$. In addition, the value 1 is reached for any element in $\operatorname{Ran}(P)$. Thus, $\|P\|_2 = 1$, unless $P = 0$.

Remark 2.8. An orthogonal projector has only two eigenvalues, 0 or 1. Vectors in the range of 𝑃 are eigenvectors associated with the eigenvalue 1, and vectors in the null space of 𝑃 are the eigenvectors corresponding to the eigenvalue 0.

Geometrically, the orthogonal projection of a vector $x \in \mathbb{C}^n$ onto the subspace $M$ is the point of $M$ closest to $x$, as formulated in the next theorem, see, e.g., [Saa03].


Theorem 2.9. Let $P$ be the orthogonal projector onto a subspace $M$. Then for any given vector $x \in \mathbb{C}^n$, the following is true:

$$\min_{y \in M} \|x - y\|_2 = \|x - Px\|_2.$$

It is possible to reformulate this result in the form of necessary and sufficient conditions such that we can determine the best approximation to a given vector.

Corollary 2.10. Let a subspace $M$ and a vector $x \in \mathbb{C}^n$ be given and let $y^* = Px$. Then

$$\min_{y \in M} \|x - y\|_2 = \|x - y^*\|_2,$$

if and only if the following two conditions are satisfied,

$$y^* \in M, \qquad x - y^* \perp M.$$
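The properties of orthogonal projectors collected above (Proposition 2.6, Lemma 2.7, Theorem 2.9) can be checked numerically. The sketch below is a minimal real-valued illustration; the random subspace, the dimensions and the number of sampled competitors are assumptions chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 3  # illustrative sizes (assumption)

# Orthonormal basis Q of a random m-dimensional subspace M, obtained by QR.
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
P = Q @ Q.T                      # orthogonal projector onto M (real case)
x = rng.standard_normal(n)
y_star = P @ x

is_hermitian = np.allclose(P, P.T)                       # Proposition 2.6
pythagoras = np.isclose(
    x @ x, y_star @ y_star + (x - y_star) @ (x - y_star)
)                                                        # identity (2.5)
norm_P = np.linalg.norm(P, 2)                            # spectral norm of P

# Theorem 2.9: no sampled point of M is closer to x than y_star.
best = np.linalg.norm(x - y_star)
others = [np.linalg.norm(x - Q @ rng.standard_normal(m)) for _ in range(200)]
is_closest = all(d >= best - 1e-12 for d in others)
```

Sampling random points of $M$ does not prove the minimality, of course; it only illustrates the statement of Theorem 2.9 on one instance.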

2.2 Projection Methods

The main subject of the thesis is solving sequences of linear systems. Let us first consider solving a single linear system

$$Ax = b, \qquad A \in \mathbb{C}^{n \times n}, \quad x, b \in \mathbb{C}^n. \tag{2.6}$$

Most of the existing iterative methods for solving (2.6) utilize a projection process. The main idea of a projection method is to extract an approximate solution to the problem (2.6) from a subspace $\mathcal{K} \subseteq \mathbb{C}^n$, which is called the search subspace. Since we usually want to exploit the knowledge of an initial guess $x_0$, the solution is sought in the affine space $x_0 + \mathcal{K}$. If the dimension of the subspace $\mathcal{K}$ is $m$, then, in general, $m$ constraints must be imposed in order to uniquely extract such an approximation. Typically, the residual vector $r = b - Ax$ is constrained to be orthogonal to another subspace $\mathcal{L}$, which is called the subspace of constraints. This framework is well known as the Petrov-Galerkin approach. In the special case when $\mathcal{L} = \mathcal{K}$, it is often called the Galerkin approach.

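A basic projection technique onto the subspace $\mathcal{K}$ and orthogonal to $\mathcal{L}$, as described above, can be defined as follows:

$$\text{Find } \tilde{x} \in x_0 + \mathcal{K} \text{ such that } b - A\tilde{x} \perp \mathcal{L}. \tag{2.7}$$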

Writing ̃𝑥 = 𝑥₀+𝛿, where 𝛿 ∈ 𝒦, the orthogonality condition can be rewritten as

𝑏 − 𝐴 ̃𝑥 ⟂ ℒ ⇔ 𝑏 − 𝐴(𝑥₀+ 𝛿) ⟂ ℒ ⇔ 𝑟₀− 𝐴𝛿 ⟂ ℒ,

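where $r_0 = b - Ax_0$ is the initial residual. This leads us to the basic projection step:

$$\tilde{x} = x_0 + \delta, \qquad \delta \in \mathcal{K}, \tag{2.8}$$

$$\langle r_0 - A\delta, w \rangle = 0, \qquad \forall w \in \mathcal{L}. \tag{2.9}$$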

Most of the iterative methods use a succession of such projections, where, typically, in each step a new pair of subspaces 𝒦 and ℒ is used and the new initial guess 𝑥₀ is equal to the most recent approximation obtained from the previous projection step.

We distinguish two classes of projection techniques: orthogonal and oblique. In an orthogonal projection method, the subspace $\mathcal{L}$ is equal to the subspace $\mathcal{K}$, while in an oblique projection method they are different and may be completely unrelated.

Figure 2.1: Orthogonal and oblique projectors

Figure 2.1 (reproduced from [Saa03]) illustrates $P_{\mathcal{K}}$, the orthogonal projector onto the subspace $\mathcal{K}$, and $Q_{\mathcal{K}}^{\mathcal{L}}$, the oblique projector onto $\mathcal{K}$ orthogonally to $\mathcal{L}$, i.e.

$$P_{\mathcal{K}} x \in \mathcal{K}, \qquad x - P_{\mathcal{K}} x \perp \mathcal{K},$$

$$Q_{\mathcal{K}}^{\mathcal{L}} x \in \mathcal{K}, \qquad x - Q_{\mathcal{K}}^{\mathcal{L}} x \perp \mathcal{L}.$$

Choosing different subspaces ℒ gives rise to many different algorithms. Throughout this thesis, we will focus on the case ℒ = 𝐴𝒦, where 𝒦 is a Krylov subspace, which we define in the next section.


Geometrically, the orthogonality condition (2.9) for the case $\mathcal{L} = A\mathcal{K}$ means that the vector $A\delta$ is the orthogonal projection of the vector $r_0$ onto the subspace $A\mathcal{K}$. Hence, the following holds.

Proposition 2.11. Let $A$ be an arbitrary square matrix and assume that $\mathcal{L} = A\mathcal{K}$. Then, the following holds:

1. A vector $\tilde{x}$ is the result of an oblique projection method onto $\mathcal{K}$ orthogonally to $\mathcal{L}$ with the starting vector $x_0$ if and only if it minimizes the 2-norm of the residual vector $b - Ax$ over $x \in x_0 + \mathcal{K}$, i.e. iff

$$R(\tilde{x}) = \min_{x \in x_0 + \mathcal{K}} R(x), \tag{2.10}$$

where $R(x) \equiv \|b - Ax\|_2$.

2. Let $\tilde{r} = b - A\tilde{x}$ be the residual associated with the approximate solution $\tilde{x}$. Then

$$\tilde{r} = (I - P) r_0, \tag{2.11}$$

where $P$ denotes the orthogonal projector onto the subspace $A\mathcal{K}$. Consequently

$$\|\tilde{r}\|_2 \le \|r_0\|_2. \tag{2.12}$$

Proof. 1. It follows from Corollary 2.10 that for a vector $\tilde{x}$ to be the minimizer of $R(x)$ it is necessary and sufficient that $b - A\tilde{x}$ be orthogonal to all vectors of the form $Ay$, where $y$ belongs to $\mathcal{K}$, i.e.

$$\langle b - A\tilde{x}, Ay \rangle = 0, \qquad \forall y \in \mathcal{K},$$

which is exactly the Petrov-Galerkin condition that defines the approximate solution $\tilde{x}$.

2. The inequality (2.12) is an immediate consequence of the equation (2.10), while the equation (2.11) follows from the fact that the vector $A\delta$ in (2.9) is the orthogonal projection of the vector $r_0$ onto the subspace $A\mathcal{K}$.

Methods relying on Proposition 2.11 are known as residual projection methods.

We complete this section with the matrix representation of the expression for the approximate solution of (2.6). Let the columns of the matrices $V$ and $W$ form bases of $\mathcal{K}$ and $\mathcal{L}$, respectively. The approximate solution can be written as

$$\tilde{x} = x_0 + V y.$$

Then, the orthogonality condition (2.9) can be rewritten as

$$W^T A V y = W^T r_0.$$

Under the assumption that the matrix $W^T A V$ is nonsingular, we obtain the expression for the approximate solution $\tilde{x}$ of (2.6)

$$\tilde{x} = x_0 + V (W^T A V)^{-1} W^T r_0.$$

The matrix $W^T A V$ does not have to be nonsingular, i.e. the assertions of Lemma 2.4 are not necessarily fulfilled, even when the matrix $A$ is nonsingular. Conditions under which the nonsingularity of the matrix $W^T A V$ is guaranteed are discussed next, see [Saa03].

Proposition 2.12. Let $A$, $\mathcal{K}$ and $\mathcal{L}$ satisfy either one of the following conditions:

1. $A$ is positive definite and $\mathcal{L} = \mathcal{K}$, or

2. $A$ is nonsingular and $\mathcal{L} = A\mathcal{K}$.

Then the matrix $W^T A V$ is nonsingular for any bases $V$ and $W$ of $\mathcal{K}$ and $\mathcal{L}$, respectively.
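The matrix form of the projection step and the minimal residual property of Proposition 2.11 for $\mathcal{L} = A\mathcal{K}$ can be illustrated by a small NumPy sketch. The helper `projection_step`, the shifted random test matrix and the random search space are assumptions made for this example; the conjugate transpose is used in the code, which coincides with the transpose for the real data chosen here.

```python
import numpy as np


def projection_step(A, b, x0, V, W):
    """One projection step: x = x0 + V y with W^H (r0 - A V y) = 0."""
    r0 = b - A @ x0
    y = np.linalg.solve(W.conj().T @ A @ V, W.conj().T @ r0)
    return x0 + V @ y


rng = np.random.default_rng(2)
n, m = 10, 4                                     # illustrative sizes (assumption)
A = rng.standard_normal((n, n)) + n * np.eye(n)  # nonsingular test matrix (assumption)
b = rng.standard_normal(n)
x0 = np.zeros(n)
V = rng.standard_normal((n, m))                  # basis of a search space K
W = A @ V                                        # basis of L = A K

x_tilde = projection_step(A, b, x0, V, W)
r0 = b - A @ x0
r_tilde = b - A @ x_tilde

# Reference: least-squares minimiser of ||r0 - A V y||_2 over y.
y_ls, *_ = np.linalg.lstsq(A @ V, r0, rcond=None)
x_ls = x0 + V @ y_ls

same_as_least_squares = np.allclose(x_tilde, x_ls)
residual_not_larger = np.linalg.norm(r_tilde) <= np.linalg.norm(r0) + 1e-12
petrov_galerkin = np.allclose(W.T @ r_tilde, 0)
```

With $W = AV$ the small system solved in `projection_step` is the normal equations of the least-squares problem, which is why the two computed approximations agree.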

2.3 Krylov Subspaces

Since the 1950s, splitting methods have been widely used to solve (2.6). The basic idea of splitting methods is to decompose the matrix as $A = M - N$, where $M$ is nonsingular. The solution is then obtained iteratively via the recurrence

$$x_{m+1} = M^{-1} N x_m + M^{-1} b, \qquad m = 0, 1, \dots, \tag{2.13}$$

where $x_0$ is an arbitrary vector. Convergence is not guaranteed for arbitrary splittings. The following theorem provides the necessary and sufficient condition for convergence.

Theorem 2.13. The iteration (2.13) converges to the solution $x^* = A^{-1} b$ of the system for any starting vector $x_0$ and any right-hand side $b$ if and only if $\rho(M^{-1} N) < 1$.

In practice, one does not want to compute the spectral radius of a matrix, since this can be very expensive. Instead, one uses the following result, which utilizes the inequality $\rho(M^{-1} N) \le \|M^{-1} N\|$.

Corollary 2.14. Let $\|M^{-1} N\| < 1$ for some operator norm. Then the iteration (2.13) converges to the solution $x^* = A^{-1} b$ of the system for any starting vector $x_0$ and any right-hand side $b$.
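To make the recurrence (2.13) and the two convergence criteria concrete, the following sketch runs a splitting iteration with the Jacobi choice $M = \operatorname{diag}(A)$. The Jacobi splitting, the small diagonally dominant test matrix and the number of steps are assumptions chosen for illustration; the text above does not prescribe a particular splitting.

```python
import numpy as np


def splitting_iteration(M, N, b, x0, steps):
    """Iterate x_{m+1} = M^{-1} N x_m + M^{-1} b for the splitting A = M - N."""
    x = x0.copy()
    for _ in range(steps):
        x = np.linalg.solve(M, N @ x + b)
    return x


# Strictly diagonally dominant test matrix (assumption), so Jacobi converges.
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
M = np.diag(np.diag(A))          # Jacobi choice: M = diag(A)
N = M - A

G = np.linalg.solve(M, N)                 # iteration matrix M^{-1} N
rho = max(abs(np.linalg.eigvals(G)))      # spectral radius (Theorem 2.13)
norm_inf = np.linalg.norm(G, np.inf)      # an operator norm (Corollary 2.14)

x = splitting_iteration(M, N, b, np.zeros(3), steps=60)
x_exact = np.linalg.solve(A, b)
```

For this matrix the norm criterion of Corollary 2.14 already guarantees convergence, so the spectral radius is computed only to illustrate the inequality $\rho(M^{-1}N) \le \|M^{-1}N\|$.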

Today, splitting methods are used mostly as preconditioners or as smoothers in multigrid methods and more advanced iterative techniques are used to tackle (2.6). The most common of these advanced techniques are Krylov subspace methods, in which the subspace 𝒦 in (2.7) is a Krylov subspace.

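Definition 2.15. Let $A \in \mathbb{C}^{n \times n}$ and $r \in \mathbb{C}^n$. A Krylov subspace of dimension $m$ is defined as

$$\mathcal{K}_m(A, r) = \operatorname{span}\{r, Ar, \dots, A^{m-1} r\} = \{x \in \mathbb{C}^n : x = p_{m-1}(A) r, \ p_{m-1} \in \Pi_{m-1}\}. \tag{2.14}$$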

It follows from the definition that the sequence $\mathcal{K}_1(A, r), \dots, \mathcal{K}_m(A, r)$ of Krylov subspaces is nested. Another important property is that the dimension of $\mathcal{K}_m$ cannot grow arbitrarily. To investigate this further we must recall the definition of the minimal polynomial and the grade.

Definition 2.16. The minimal polynomial of a vector $r$ with respect to the matrix $A \in \mathbb{C}^{n \times n}$ is the nonzero monic polynomial $\chi$ of lowest degree such that $\chi(A) r = 0$. The degree, $\gamma$, of the minimal polynomial is called the grade of $r$ (with respect to $A$).

The following proposition determines the dimension of 𝒦_𝑚 in general. For the proof consult [Saa03].

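Proposition 2.17. Let $A \in \mathbb{C}^{n \times n}$, $r \in \mathbb{C}^n$ and let $\gamma$ be the grade of $r$. Then:

a) $\mathcal{K}_\gamma$ is invariant under $A$ and $\mathcal{K}_m = \mathcal{K}_\gamma$ for all $m \ge \gamma$.

b) The Krylov subspace $\mathcal{K}_m$ is of dimension $m$ if and only if $\gamma \ge m$. Therefore, $\dim(\mathcal{K}_m) = \min\{m, \gamma\}$.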
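Proposition 2.17 can be observed on a tiny example in which the grade is known by construction. In the sketch below, the diagonal matrix, the vector $r$ and the helper `krylov_dimension` (which uses a numerical rank) are assumptions made for illustration.

```python
import numpy as np


def krylov_dimension(A, r, m):
    """Numerical dimension of K_m(A, r) = span{r, A r, ..., A^{m-1} r}."""
    vectors = [r]
    for _ in range(m - 1):
        vectors.append(A @ vectors[-1])
    return int(np.linalg.matrix_rank(np.column_stack(vectors)))


A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
# r has components along only three eigenvectors of A, so its grade is 3.
r = np.array([1.0, 0.0, 1.0, 0.0, 1.0])

dims = [krylov_dimension(A, r, m) for m in range(1, 6)]
```

Here `dims` is expected to follow $\min\{m, \gamma\}$ with $\gamma = 3$, i.e. the dimension stops growing once $m$ reaches the grade. The explicit power basis is used only because the example is tiny; it is not a numerically sound basis for larger $m$.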

The reason for Krylov subspaces being such an important concept in numerical linear algebra lies in the fact that $A^{-1} r$, the action of the inverse of a non-singular matrix $A \in \mathbb{C}^{n \times n}$ on a vector $r$, can be expressed as a polynomial $p$ in $A$ of degree $\gamma - 1$ applied to $r$, with $\gamma$ the grade of $r$. It follows from Definition 2.16 that

$$\chi_\gamma(A) r_0 = 0,$$

which further implies

$$A^{-1} r_0 = q_{\gamma-1}(A) r_0,$$

where

$$q_{\gamma-1}(t) = -\frac{\chi_\gamma(t) - \chi_\gamma(0)}{\chi_\gamma(0)\, t}$$

is a polynomial of degree $\gamma - 1$, because $\chi_\gamma(t) - \chi_\gamma(0)$ is divisible by $t$, and $\chi_\gamma(0) \neq 0$, since $A$ is a non-singular matrix. Thus, the solution of the system (2.6) can be computed as

$$x^* = x_0 + A^{-1} r_0 = x_0 + q_{\gamma-1}(A) r_0.$$

On the other hand, from (2.14), we see that the iterates extracted from the Krylov subspace are exactly of this form, i.e.

$$A^{-1} b \approx x_m = x_0 + p_{m-1}(A) r_0,$$

for some polynomial $p_{m-1}$ of degree at most $m - 1$.

Due to the properties of the power iteration, the vectors $r, Ar, \dots, A^{m-1} r$ become almost linearly dependent and are not a good choice for the basis of the Krylov subspace $\mathcal{K}_m$. Therefore, methods based on Krylov subspaces usually involve some orthogonalization scheme. Considering that we will be dealing with nonsymmetric systems, we will describe the Arnoldi process.

2.3.1 The Arnoldi Process

The Arnoldi process [Arn51] is an orthogonalization procedure which builds up matrices of Hessenberg form. It turns out that the eigenvalues of the Hessenberg matrix are good approximations to some eigenvalues of the original matrix, which leads to an efficient algorithm for approximating the eigenvalues of large sparse matrices. The Arnoldi method is based on the Gram-Schmidt algorithm. We will describe the version that uses modified Gram-Schmidt, a more stable algorithm than classical Gram-Schmidt. There are other versions, such as Householder Arnoldi [Wal88], but we will not discuss them here.

Most of the Krylov subspace methods described in the following chapters will be based on the Arnoldi method. Therefore, we give here some important basic properties of the process. The Arnoldi process is described as Algorithm 2.1. It computes a sequence of orthonormal vectors 𝑣₁, 𝑣₂, … such that 𝑣₁, … , 𝑣_𝑚 is an orthonormal basis of 𝒦_𝑚(𝐴, 𝑣₁).

Proposition 2.18. Assume that Algorithm 2.1 does not stop before the $m$th step, i.e. $h_{k+1,k} \neq 0$ for $k = 1, \dots, m-1$. Then the vectors $v_1, v_2, \dots, v_m$ form an orthonormal basis of the Krylov subspace $\mathcal{K}_m(A, v_1)$.


Algorithm 2.1: Arnoldi Process

Input : $A \in \mathbb{C}^{n \times n}$ system matrix
        $v_1 \in \mathbb{C}^n$ starting vector
        $m$ number of basis vectors to build

Output: $\{v_1, v_2, \dots, v_m\}$ orthonormal basis of $\mathcal{K}_m(A, v_1)$

 1  $\beta = \|v_1\|_2$
 2  $v_1 = v_1 / \beta$
 3  for $k = 1, 2, \dots, m$ do
 4      $w = A v_k$
 5      for $i = 1, 2, \dots, k$ do    // modified Gram-Schmidt
 6          $h_{i,k} = \langle w, v_i \rangle$
 7          $w = w - h_{i,k} v_i$
 8      $h_{k+1,k} = \|w\|_2$
 9      if $h_{k+1,k} = 0$ then
10          set $m = k$ and stop
11      $v_{k+1} = w / h_{k+1,k}$

Proposition 2.19. Let $V_m$ be the $n \times m$ matrix with column vectors $v_1, \dots, v_m$, $\bar{H}_m$ the $(m + 1) \times m$ Hessenberg matrix whose nonzero entries $h_{ij}$ are defined by Algorithm 2.1 and $H_m$ the matrix obtained from $\bar{H}_m$ by deleting its last row. Then the following relations hold:

$$A V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T = V_{m+1} \bar{H}_m, \tag{2.15}$$

$$V_m^T A V_m = H_m.$$

The relation (2.15) is known as the Arnoldi relation.
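The Arnoldi process and the relations of Proposition 2.19 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a production implementation: the function name `arnoldi`, the random complex test data and the exact-zero breakdown test are choices made for this example, and for complex data the conjugate transpose $V_m^H$ takes the role of $V_m^T$.

```python
import numpy as np


def arnoldi(A, v1, m):
    """Arnoldi process with modified Gram-Schmidt.

    Without breakdown, returns V (n x (m+1)) and H_bar ((m+1) x m) with
    A V[:, :m] = V H_bar. On breakdown after k steps, returns V (n x k)
    and a square H (k x k) with A V = V H.
    """
    n = v1.shape[0]
    V = np.zeros((n, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    V[:, 0] = v1 / np.linalg.norm(v1)
    for k in range(m):
        w = A @ V[:, k]
        for i in range(k + 1):
            H[i, k] = np.vdot(V[:, i], w)    # <w, v_i> = v_i^H w
            w = w - H[i, k] * V[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] == 0:                 # breakdown: invariant subspace found
            return V[:, :k + 1], H[:k + 1, :k + 1]
        V[:, k + 1] = w / H[k + 1, k]
    return V, H


rng = np.random.default_rng(3)
n, m = 12, 5  # illustrative sizes (assumption)
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
v1 = rng.standard_normal(n) + 1j * rng.standard_normal(n)

V, H_bar = arnoldi(A, v1, m)
Vm, Hm = V[:, :m], H_bar[:m, :]

arnoldi_relation = np.allclose(A @ Vm, V @ H_bar)            # relation (2.15)
orthonormal = np.allclose(V.conj().T @ V, np.eye(m + 1))
projected = np.allclose(Vm.conj().T @ A @ Vm, Hm)
ritz_values = np.linalg.eigvals(Hm)                          # eigenvalues of H_m
```

In floating-point arithmetic an exact zero in the breakdown test is rarely hit; a practical code would typically compare against a small tolerance instead, which is left out here to stay close to Algorithm 2.1.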

Algorithm 2.1 may break down if ℎ_𝑘+1,𝑘 = 0 in line 8. In this case 𝑣_𝑘+1 cannot be computed and the algorithm stops. The following proposition gives the conditions under which this occurs.

Proposition 2.20. The Arnoldi process breaks down at step 𝑘, i.e. ℎ_𝑘+1,𝑘= 0 in line 8 of Algorithm 2.1 if and only if the grade of 𝑣₁ is 𝑘. Moreover, in this case the subspace 𝒦_𝑘 is invariant under 𝐴.

The eigenvalues $\lambda_i$, $i = 1, \dots, m$, of the resulting Hessenberg matrix $H_m$ are good approximations to some of the eigenvalues of the matrix $A$. They are known as the Ritz values of the matrix $A$ w.r.t. $\mathcal{K}_m$. The corresponding Ritz vectors are $V_m y_i$, $i = 1, \dots, m$, where $y_i$, $i = 1, \dots, m$, are the eigenvectors of the matrix $H_m$ belonging to the eigenvalues $\lambda_i$, $i = 1, \dots, m$. In practice, it is usually expensive to compute the eigenvalues of $A$. Thus, we rather compute approximations to the wanted eigenvalues, i.e. the Ritz values or some variants. Hence, we dedicate the next section to some general definitions and properties of Ritz and harmonic Ritz pairs.

2.4 Ritz and Harmonic Ritz Pairs

Definition 2.21. Let $A \in \mathbb{C}^{n \times n}$ and let $\mathcal{K} \subseteq \mathbb{C}^n$ be any subspace. Then $y \in \mathcal{K}$ is a right Ritz vector of $A$ with respect to $\mathcal{K}$ with Ritz value $\lambda$ if

$$Ay - \lambda y \perp \mathcal{K}. \tag{2.16}$$

The concept of a left Ritz pair is not widely used in the literature. Note that a left eigenpair $(\mu, z)$ satisfies $z^H A - \mu z^H = 0 \Leftrightarrow A^H z - \bar{\mu} z = 0$. Thus, the following definition of a left Ritz pair comes naturally.

Definition 2.22. Let $A \in \mathbb{C}^{n \times n}$ and let $\mathcal{K} \subseteq \mathbb{C}^n$ be any subspace. Then $z \in \mathcal{K}$ is a left Ritz vector of $A$ with respect to $\mathcal{K}$ with Ritz value $\mu$ if

$$A^H z - \bar{\mu} z \perp \mathcal{K}. \tag{2.17}$$

The following lemma shows that with Definition 2.22 certain useful properties are satisfied. Let us first denote by $\Lambda = \{\lambda_1, \dots, \lambda_j\}$ the set of all right Ritz values and by $M = \{\mu_1, \dots, \mu_j\}$ the set of all left Ritz values of the matrix $A$ w.r.t. the same subspace $\mathcal{K}$, where $j$ is the dimension of the subspace $\mathcal{K}$.

Lemma 2.23. Let (𝜆, 𝑦) be a right and (𝜇, 𝑧) be a left Ritz pair of 𝐴 w.r.t. the same subspace 𝒦. Then, if 𝜆 ≠ 𝜇, we have ⟨𝑦, 𝑧⟩ = 0. In addition, Λ = 𝑀 .

Proof. We first prove that $\Lambda = M$. Let us assume that the columns of $V \in \mathbb{C}^{n \times j}$ form an orthonormal basis of $\mathcal{K}$, and let $y = Vu$. Then (2.16) is equivalent to

$$V^H A V u = \lambda V^H V u = \lambda u.$$

This shows that the right Ritz vectors $y$ are of the form $y = Vu$ with $u$ an eigenvector of $V^H A V$, and $\lambda$ the corresponding eigenvalue. Similarly, with $z = Vw$, (2.17) is equivalent to

$$V^H A^H V w = \bar{\mu} V^H V w = \bar{\mu} w,$$

which shows that the left Ritz vectors $z$ are of the form $z = Vw$ with $w$ an eigenvector of $V^H A^H V$ and $\bar{\mu}$ the corresponding eigenvalue. We have

$$M = \overline{\operatorname{spec}(V^H A^H V)} = \operatorname{spec}((V^H A^H V)^H) = \operatorname{spec}(V^H A V) = \Lambda,$$

which proves $M = \Lambda$.

Now let 𝜆 ≠ 𝜇 with (𝜆, 𝑦) a right and (𝜇, 𝑧) a left Ritz pair. Then, by (2.16) we have

⟨𝐴𝑦, 𝑧⟩ = ⟨𝜆𝑦 + 𝑠, 𝑧⟩ with 𝑠 ∈ 𝒦^⊥,

and thus

⟨𝐴𝑦, 𝑧⟩ = ⟨𝜆𝑦, 𝑧⟩ = 𝜆⟨𝑦, 𝑧⟩. (2.18)

Similarly, using (2.17) we get

⟨𝑦, 𝐴^𝐻𝑧⟩ = ⟨𝑦, 𝜇̄𝑧 + 𝑡⟩ with 𝑡 ∈ 𝒦^⊥,

and thus

⟨𝑦, 𝐴^𝐻𝑧⟩ = ⟨𝑦, 𝜇̄𝑧⟩ = 𝜇⟨𝑦, 𝑧⟩. (2.19)

From (2.18) and (2.19) we get

𝜆⟨𝑦, 𝑧⟩ = ⟨𝐴𝑦, 𝑧⟩ = ⟨𝑦, 𝐴^𝐻𝑧⟩ = 𝜇⟨𝑦, 𝑧⟩,

and thus ⟨𝑦, 𝑧⟩ = 0 if 𝜆 ≠ 𝜇.
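Both statements of Lemma 2.23 can be observed numerically. The sketch below is an illustration only: the random matrix, the random subspace and all variable names are assumptions of this example. Left Ritz pairs are computed from the eigenpairs of 𝑉^𝐻𝐴^𝐻𝑉, as in the proof, and the inner product is taken as ⟨𝑦, 𝑧⟩ = 𝑧^𝐻𝑦.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
V, _ = np.linalg.qr(rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m)))

B = V.conj().T @ A @ V                   # projected matrix V^H A V
lam, U = np.linalg.eig(B)                # right: V^H A V u = lam u
mu_bar, W = np.linalg.eig(B.conj().T)    # left:  V^H A^H V w = conj(mu) w
mu = mu_bar.conj()
Y, Z = V @ U, V @ W                      # right and left Ritz vectors

# Lambda = M: every left Ritz value coincides with some right Ritz value.
spec_gap = max(np.min(np.abs(lam - value)) for value in mu)

# <y_i, z_j> = z_j^H y_i should vanish whenever lam_i != mu_j.
inner = Z.conj().T @ Y                   # inner[j, i] = <y_i, z_j>
distinct = np.abs(mu[:, None] - lam[None, :]) > 1e-6
orth_defect = np.max(np.abs(inner[distinct]))
print(spec_gap, orth_defect)
```

Both printed quantities should be of the order of the rounding error.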

Ritz values tend to approximate the extremal eigenvalues of 𝐴 well, but can give poor approximations to the interior eigenvalues, see [Par98]. In our computations, however, we are interested in interior eigenvalues. Switching from 𝐴 to 𝐴^{−1}, extremal and interior eigenvalues change their roles, so that the inverses of Ritz values of 𝐴^{−1} should give good approximations to the interior eigenvalues of 𝐴, see [Mor91]. We will now define harmonic Ritz values as the inverses of the Ritz values of 𝐴^{−1} w.r.t. the subspace 𝐴𝒦, see [PPV95].

Definition 2.24. Let 𝐴 ∈ ℂ^{𝑛×𝑛} and 𝒦 ⊆ ℂ^𝑛 be any subspace. Then 𝑦̃ ∈ 𝐴𝒦 is a right harmonic Ritz vector of 𝐴 with respect to the subspace 𝐴𝒦 with a harmonic Ritz value 1/𝜃 if

𝐴^{−1}𝑦̃ − 𝜃𝑦̃ ⟂ 𝐴𝒦. (2.20)

Approximating the eigenvalues of 𝐴^{−1} using the subspace 𝐴𝒦 has the advantage that we do not need 𝐴^{−1} explicitly, as we will see later. As before, we propose an appropriate definition of the left harmonic Ritz pair, which seems to be new.
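To illustrate why 𝐴^{−1} is not needed, the following NumPy sketch computes right harmonic Ritz pairs from the small matrices 𝑉^𝐻𝐴^𝐻𝐴𝑉 and 𝑉^𝐻𝐴^𝐻𝑉 only, writing 𝑦̃ = 𝐴𝑉𝑢 with 𝑉 an orthonormal basis of 𝒦. The helper name `harmonic_ritz` and the random test data are assumptions of this example; the inverse of 𝐴 appears solely in the final check of (2.20), and the sketch assumes that no 𝜃 is zero.

```python
import numpy as np

def harmonic_ritz(A, V):
    """Right harmonic Ritz pairs of A w.r.t. A*range(V), computed without A^{-1}."""
    AV = A @ V
    G = AV.conj().T @ AV                 # V^H A^H A V
    K = AV.conj().T @ V                  # V^H A^H V
    theta, U = np.linalg.eig(np.linalg.solve(G, K))   # K u = theta G u
    return theta, AV @ U                 # theta and harmonic Ritz vectors A V u

rng = np.random.default_rng(5)
n, m = 8, 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
V, _ = np.linalg.qr(rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m)))

theta, Y_h = harmonic_ritz(A, V)
harmonic_values = 1.0 / theta            # harmonic Ritz values 1/theta

# Check of (2.20), for verification only: (A V)^H (A^{-1} y - theta y) = 0.
AV = A @ V
defect = np.linalg.norm(AV.conj().T @ (np.linalg.solve(A, Y_h) - Y_h * theta))
print(defect)
```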

Definition 2.25. Let 𝐴 ∈ ℂ^{𝑛×𝑛} and 𝒦 ⊆ ℂ^𝑛 be any subspace. Then 𝑧̃ ∈ 𝐴𝒦 is a left harmonic Ritz vector of 𝐴 with respect to the subspace 𝐴𝒦 with harmonic Ritz value 1/𝜂 if

𝐴^{−𝐻}𝑧̃ − 𝜂̄𝑧̃ ⟂ 𝐴𝒦. (2.21)

Remark 2.26. In the Hermitian case, regardless of the choice of the subspace, right and left harmonic Ritz vectors are equal, since (2.20) and (2.21) become equivalent conditions.

Definition 2.25 will be very useful throughout this thesis. It will yield a cheap way of obtaining left harmonic Ritz vectors from the Krylov subspace generated by 𝐴. This way, we will avoid additional multiplications by 𝐴^𝐻, as will be seen in Chapter 4.

In a similar fashion as Lemma 2.23, Lemma 2.27 states that left harmonic Ritz pairs have canonical properties. Let us denote with Θ = {𝜃₁, … , 𝜃_𝑗} the set of all right harmonic Ritz values and with 𝑁 = {𝜂₁, … , 𝜂_𝑗} the set of all left harmonic Ritz values of the matrix 𝐴 w.r.t. the same subspace 𝐴𝒦, where, as before, 𝑗 is the dimension of the subspace 𝒦.

Lemma 2.27. Let (1/𝜃, 𝑦̃) be a right and (1/𝜂, 𝑧̃) a left harmonic Ritz pair of 𝐴 w.r.t. the same subspace 𝐴𝒦. Then, if 𝜃 ≠ 𝜂, we have ⟨𝑦̃, 𝑧̃⟩ = 0. In addition, Θ = 𝑁.

Proof. The proof is similar to the proof of Lemma 2.23. First, we prove that Θ = 𝑁. Let us assume that the columns of 𝑉 ∈ ℂ^{𝑛×𝑚} form an orthonormal basis of 𝒦, and let 𝑦̃ = 𝐴𝑉𝑢. Then (2.20) is equivalent to

𝑉^𝐻𝐴^𝐻𝐴^{−1}𝐴𝑉𝑢 = 𝜃𝑉^𝐻𝐴^𝐻𝐴𝑉𝑢.

It follows that (𝑉^𝐻𝐴^𝐻𝐴𝑉)^{−1}𝑉^𝐻𝐴^𝐻𝑉𝑢 = 𝜃𝑢, which shows that the right harmonic Ritz vectors 𝑦̃ are of the form 𝑦̃ = 𝐴𝑉𝑢 with 𝑢 an eigenvector of (𝑉^𝐻𝐴^𝐻𝐴𝑉)^{−1}𝑉^𝐻𝐴^𝐻𝑉, and 𝜃 the corresponding eigenvalue. Similarly, with 𝑧̃ = 𝐴𝑉𝑤, (2.21) is equivalent to

𝑉^𝐻𝐴^𝐻𝐴^{−𝐻}𝐴𝑉𝑤 = 𝜂̄𝑉^𝐻𝐴^𝐻𝐴𝑉𝑤.

It follows that (𝑉^𝐻𝐴^𝐻𝐴𝑉)^{−1}𝑉^𝐻𝐴𝑉𝑤 = 𝜂̄𝑤, which shows that the left harmonic Ritz vectors 𝑧̃ are of the form 𝑧̃ = 𝐴𝑉𝑤 with 𝑤 an eigenvector of (𝑉^𝐻𝐴^𝐻𝐴𝑉)^{−1}𝑉^𝐻𝐴𝑉, and 𝜂̄ the corresponding eigenvalue. From the fact that ((𝑉^𝐻𝐴^𝐻𝐴𝑉)^{−1}𝑉^𝐻𝐴𝑉)^𝐻 = 𝑉^𝐻𝐴^𝐻𝑉(𝑉^𝐻𝐴^𝐻𝐴𝑉)^{−1}, and since the eigenvalues of a matrix and of its conjugate transpose are complex conjugates of each other, we conclude

𝑁 = {𝜂 ∶ 𝜂̄ ∈ 𝑠𝑝𝑒𝑐((𝑉^𝐻𝐴^𝐻𝐴𝑉)^{−1}𝑉^𝐻𝐴𝑉)}
= 𝑠𝑝𝑒𝑐(((𝑉^𝐻𝐴^𝐻𝐴𝑉)^{−1}𝑉^𝐻𝐴𝑉)^𝐻)
= 𝑠𝑝𝑒𝑐(𝑉^𝐻𝐴^𝐻𝑉(𝑉^𝐻𝐴^𝐻𝐴𝑉)^{−1}).

Using the well-known identity 𝑠𝑝𝑒𝑐(𝐵𝐶) = 𝑠𝑝𝑒𝑐(𝐶𝐵), with 𝐵 = 𝑉^𝐻𝐴^𝐻𝑉 and 𝐶 = (𝑉^𝐻𝐴^𝐻𝐴𝑉)^{−1}, it follows

𝑠𝑝𝑒𝑐(𝑉^𝐻𝐴^𝐻𝑉(𝑉^𝐻𝐴^𝐻𝐴𝑉)^{−1}) = 𝑠𝑝𝑒𝑐((𝑉^𝐻𝐴^𝐻𝐴𝑉)^{−1}𝑉^𝐻𝐴^𝐻𝑉) = Θ,

which proves 𝑁 = Θ.

Now let 𝜃 ≠ 𝜂 with (1/𝜃, 𝑦̃) a right and (1/𝜂, 𝑧̃) a left harmonic Ritz pair. Then, by (2.20) we have

⟨𝐴^{−1}𝑦̃, 𝑧̃⟩ = ⟨𝜃𝑦̃ + 𝑠̃, 𝑧̃⟩ with 𝑠̃ ∈ (𝐴𝒦)^⊥,

and thus

⟨𝐴^{−1}𝑦̃, 𝑧̃⟩ = ⟨𝜃𝑦̃, 𝑧̃⟩ = 𝜃⟨𝑦̃, 𝑧̃⟩. (2.22)

Similarly, using (2.21) we get

⟨𝑦̃, 𝐴^{−𝐻}𝑧̃⟩ = ⟨𝑦̃, 𝜂̄𝑧̃ + 𝑡̃⟩ with 𝑡̃ ∈ (𝐴𝒦)^⊥,

and thus

⟨𝑦̃, 𝐴^{−𝐻}𝑧̃⟩ = ⟨𝑦̃, 𝜂̄𝑧̃⟩ = 𝜂⟨𝑦̃, 𝑧̃⟩. (2.23)

From (2.22) and (2.23) we get

𝜃⟨𝑦̃, 𝑧̃⟩ = ⟨𝐴^{−1}𝑦̃, 𝑧̃⟩ = ⟨𝑦̃, 𝐴^{−𝐻}𝑧̃⟩ = 𝜂⟨𝑦̃, 𝑧̃⟩,

and thus ⟨𝑦̃, 𝑧̃⟩ = 0 if 𝜃 ≠ 𝜂.
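The two claims of Lemma 2.27 can also be checked on a small random example. The sketch below is illustrative only; the test data and the variable names are assumptions of this example. It uses the reduced eigenproblems from the proof, with 𝑉 an orthonormal basis of 𝒦, and the inner product ⟨𝑦̃, 𝑧̃⟩ = 𝑧̃^𝐻𝑦̃.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 8, 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
V, _ = np.linalg.qr(rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m)))

AV = A @ V
G = AV.conj().T @ AV                     # V^H A^H A V
K = AV.conj().T @ V                      # V^H A^H V, so K^H = V^H A V

theta, U = np.linalg.eig(np.linalg.solve(G, K))              # right: G^{-1} K u = theta u
eta_bar, W = np.linalg.eig(np.linalg.solve(G, K.conj().T))   # left: G^{-1} K^H w = conj(eta) w
eta = eta_bar.conj()
Y_h, Z_h = AV @ U, AV @ W                # right and left harmonic Ritz vectors

# Theta = N: every eta coincides with some theta.
spec_gap = max(np.min(np.abs(theta - value)) for value in eta)

# <y_i, z_j> = z_j^H y_i should vanish whenever theta_i != eta_j.
inner = Z_h.conj().T @ Y_h               # inner[j, i] = <y_i, z_j>
distinct = np.abs(eta[:, None] - theta[None, :]) > 1e-6
orth_defect = np.max(np.abs(inner[distinct]))
print(spec_gap, orth_defect)
```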

In this section we developed theory for an arbitrary subspace 𝒦. Starting with the next chapter, 𝒦 will be the 𝑚-th Krylov subspace 𝒦_𝑚(𝐴, 𝑟), if not stated otherwise.

3 GMRES, GCR and Deﬂation

In this chapter we recall some of the well-known methods for solving a single, non-symmetric linear system (2.6). A common choice is the GMRES method [SS86], a projection based method which minimizes the 2-norm of the residual vector in each step. Recent demands in simulations led to a significant increase in sizes of systems to solve, which made GMRES impractical due to the excessive storage or computational time requirements. Possible remedies are to restart, an approach that we discuss in Section 3.1.1, to truncate, etc. (see, e.g., [Saa03]).

The GCR method [EES83] is mathematically equivalent to GMRES. Although we do not consider GCR as a means to solve the system, the framework of the method plays an important role throughout this thesis. We also introduce the GMRESR method [VV94], which uses GMRES(𝑚) (𝑚 steps of GMRES) as an inner method and GCR as an outer method. The GMRESR method provides only suboptimal corrections to the solution and, in addition, it exhibits some of the problems of the restarted GMRES method. Preserving the GCR orthogonality relations within the inner method, i.e. within GMRES, leads to the more efficient method GCRO (GCR with inner orthogonalization) [Stu96a]. Although GCRO provides the optimal corrections to the solution, the storage requirements might be excessive. Hence, in the next chapter we will introduce the concept of optimal truncation which together with the GCRO framework results in the GCROT method [Stu99].

Much work has been invested in enhancing the robustness of the restarted GMRES method. Most of the methods derived since are based on two concepts: deflation, i.e. removing the “problematic” (usually the smallest) eigenvalues from the spectrum, and augmentation, i.e. enlarging the search space with carefully chosen vectors from previous cycles. The choice of the vectors and the way in which they are used gives rise to many different methods. We present GMRES-E (GMRES with eigenvectors) [Mor95] and GMRES-DR (Deflated and restarted GMRES) [Mor02], two methods developed by R. Morgan. Despite the fact that these two methods are algebraically equivalent, they are based on different approaches. As it will be shown, GMRES-E uses an augmentation approach and can be modified for solving sequences of linear systems, whereas GMRES-DR uses an orthogonalization approach and can be efficient when solving one system, but cannot be adapted for solving sequences of systems. A somewhat similar idea was exploited to derive the “loose” GMRES method [BJM05], which we also introduce.

Deflation and augmentation techniques can often show their full potential only if they are used together with preconditioning. Preconditioning plays only a minor role in this thesis. Accordingly, in Section 3.6 we only give a brief introduction to right preconditioning. In Chapter 5, we will expand this topic by introducing a certain preconditioner for the lattice QCD application.

Proper combination of the methods from this chapter will result in elegant and eﬃcient methods for solving sequences of linear systems, which will be the topic of the next chapter.

3.1 GMRES

The Generalized Minimal Residual method (GMRES) [SS86] is a projection based Krylov subspace method which minimizes the 2-norm of the residual vector in each step. One way to derive the method is to consider the conditions (2.8) and (2.9) with 𝒦 = 𝒦_𝑘 and ℒ = 𝐴𝒦_𝑘, where 𝒦_𝑘 is the 𝑘-th Krylov subspace. We know from Proposition 2.11 that this means that the 2-norm of the residual is minimized. We now focus on more algorithmic aspects, in particular on how to obtain the iterates from the Arnoldi process (Alg. 2.1).

Any approximate solution, extracted from an affine space 𝑥₀ + 𝒦_𝑘, can be written as 𝑥_𝑘 = 𝑥₀ + 𝑉_𝑘𝑦, where the columns of 𝑉_𝑘 form an orthonormal basis of 𝒦_𝑘 and 𝑦 ∈ ℂ^𝑘. Thus, the 2-norm of the residual can be rewritten as

‖𝑏 − 𝐴𝑥_𝑘‖₂ = ‖𝑏 − 𝐴(𝑥₀ + 𝑉_𝑘𝑦)‖₂ = ‖𝑟₀ − 𝐴𝑉_𝑘𝑦‖₂.

Using the Arnoldi relation (2.15) and the fact that the matrix 𝑉_{𝑘+1} has orthonormal columns, it follows

‖𝑏 − 𝐴𝑥_𝑘‖₂ = ‖𝑟₀ − 𝐴𝑉_𝑘𝑦‖₂ = ‖𝛽𝑣₁ − 𝑉_{𝑘+1}𝐻̄_𝑘𝑦‖₂ = ‖𝑉_{𝑘+1}(𝛽𝑒₁ − 𝐻̄_𝑘𝑦)‖₂ = ‖𝛽𝑒₁ − 𝐻̄_𝑘𝑦‖₂,

where 𝛽 = ‖𝑟₀‖₂, as defined in the Arnoldi process (Alg. 2.1), and 𝑒₁ is the first unit vector in ℂ^{𝑘+1}. Minimizing the residual norm over 𝑥₀ + 𝒦_𝑘 thus reduces to the least-squares problem

min_𝑦 ‖𝛽𝑒₁ − 𝐻̄_𝑘𝑦‖₂ (3.1)

of smaller size (𝑘 + 1) × 𝑘. Having this in mind, the GMRES approximate solution is defined as the unique vector 𝑥_𝑘 = 𝑥₀ + 𝑉_𝑘𝑦_𝑘, where 𝑦_𝑘 = argmin_𝑦 ‖𝛽𝑒₁ − 𝐻̄_𝑘𝑦‖₂.
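The reduction of the residual norm to a small least-squares problem can be verified numerically. The following sketch is an illustration under assumptions: the function `arnoldi` is a plain modified Gram-Schmidt variant written for this example (Algorithm 2.1 itself is not reproduced here and may differ in detail), breakdowns are not handled, and the test matrix is random.

```python
import numpy as np

def arnoldi(A, r0, k):
    """k Arnoldi steps (modified Gram-Schmidt); returns V_{k+1}, the (k+1) x k Hessenberg matrix, beta."""
    n = r0.shape[0]
    dtype = np.result_type(A, r0)
    V = np.zeros((n, k + 1), dtype=dtype)
    H = np.zeros((k + 1, k), dtype=dtype)
    beta = np.linalg.norm(r0)
    V[:, 0] = r0 / beta
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = np.vdot(V[:, i], w)     # v_i^H w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)       # assumed nonzero (no breakdown)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H, beta

rng = np.random.default_rng(2)
n, k = 10, 4
A = rng.standard_normal((n, n)) + 3 * np.eye(n)
b = rng.standard_normal(n)
x0 = np.zeros(n)

V, H, beta = arnoldi(A, b - A @ x0, k)
e1 = np.zeros(k + 1)
e1[0] = 1.0
y, *_ = np.linalg.lstsq(H, beta * e1, rcond=None)   # small (k+1) x k least-squares problem
x_k = x0 + V[:, :k] @ y

small_norm = np.linalg.norm(beta * e1 - H @ y)      # norm in the small problem
full_norm = np.linalg.norm(b - A @ x_k)             # true residual norm
arnoldi_defect = np.linalg.norm(A @ V[:, :k] - V @ H)
print(small_norm, full_norm)
```

The two printed norms should agree up to rounding errors.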

The typical way of solving the least-squares problem (3.1) is by computing the 𝑄𝑅 decomposition of the Hessenberg matrix 𝐻̄_𝑘, 𝐻̄_𝑘 = 𝑄_𝑘𝑅_𝑘, which allows to compute the solution of the least-squares system as a solution of a triangular linear system, see (3.2). Considering the special structure of the matrix 𝐻̄_𝑘, i.e. the Hessenberg form, this can be done efficiently by multiplying the Hessenberg matrix 𝐻̄_𝑘 and the right-hand side 𝛽𝑒₁ by a sequence of Givens rotations

         ⎡ 𝐼                   ⎤
𝐺_𝑖 ∶=   ⎢     𝑐_𝑖     𝑠_𝑖      ⎥   ← 𝑖-th row
         ⎢    −𝑠̄_𝑖     𝑐_𝑖      ⎥   ← (𝑖+1)-st row
         ⎣                   𝐼 ⎦

where |𝑐_𝑖|² + |𝑠_𝑖|² = 1. The coefficients 𝑐_𝑖 and 𝑠_𝑖 are chosen such that in each step the element ℎ̃_{𝑖+1,𝑖} of the matrix 𝐻̃_{𝑖−1} = 𝐺_{𝑖−1} ⋯ 𝐺₁𝐻̄_𝑘 is eliminated. The following choice achieves that:

𝑐_𝑖 = 0 if ℎ̃_{𝑖,𝑖} = 0, and 𝑐_𝑖 = |ℎ̃_{𝑖,𝑖}| / √(|ℎ̃_{𝑖,𝑖}|² + |ℎ̃_{𝑖+1,𝑖}|²) else,

𝑠_𝑖 = 1 if ℎ̃_{𝑖,𝑖} = 0, and 𝑠_𝑖 = ℎ̃_{𝑖,𝑖} ℎ̃̄_{𝑖+1,𝑖} / (|ℎ̃_{𝑖,𝑖}| √(|ℎ̃_{𝑖,𝑖}|² + |ℎ̃_{𝑖+1,𝑖}|²)) else.

Defining 𝑄_𝑘 = 𝐺₁^𝐻𝐺₂^𝐻 ⋯ 𝐺_𝑘^𝐻 we obtain

𝐻̄_𝑘 = 𝑄_𝑘𝑅_𝑘, 𝛾_𝑘 = 𝑄_𝑘^𝐻(𝛽𝑒₁),

where 𝛾_𝑘 is the new right-hand side. Noticing that 𝑄_𝑘 is a unitary matrix, the least-squares problem (3.1) can be rewritten as

min_𝑦 ‖𝛽𝑒₁ − 𝐻̄_𝑘𝑦‖₂ ⇔ min_𝑦 ‖𝑄_𝑘𝛾_𝑘 − 𝑄_𝑘𝑅_𝑘𝑦‖₂ ⇔ min_𝑦 ‖𝛾_𝑘 − 𝑅_𝑘𝑦‖₂. (3.2)

The new least-squares problem (3.2) is solved by deleting the last row of the matrix 𝑅_𝑘 and the last element of the right-hand side 𝛾_𝑘, and then solving the resulting (upper triangular) linear system to obtain 𝑦_𝑘 and, further, the next iterate 𝑥_𝑘 = 𝑥₀ + 𝑉_𝑘𝑦_𝑘.
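As a small worked example of a single rotation, the sketch below computes the coefficients 𝑐 and 𝑠 for a pair of entries (𝑎, 𝑏), standing for the diagonal and the subdiagonal entry of the current column, and checks that the subdiagonal entry is eliminated. The function name `givens` and the sample values are assumptions of this example.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-conj(s), c]] maps (a, b) to (r, 0)."""
    if a == 0:
        return 0.0, 1.0
    rho = np.sqrt(abs(a) ** 2 + abs(b) ** 2)
    c = abs(a) / rho
    s = a * np.conj(b) / (abs(a) * rho)
    return c, s

a, b = 3.0 + 4.0j, 1.0 - 2.0j            # sample diagonal and subdiagonal entries
c, s = givens(a, b)
G = np.array([[c, s], [-np.conj(s), c]])
rotated = G @ np.array([a, b])           # second component should vanish
print(rotated)
```

The first component of `rotated` has modulus √(|𝑎|² + |𝑏|²), since the rotation is unitary.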

Using Givens rotations is not only an elegant way for solving the least-squares problem (3.1); it also provides the 2-norm of the residual ‖𝑟_𝑘‖₂ in each step

‖𝑟_𝑘‖₂ = ‖𝑏 − 𝐴𝑥_𝑘‖₂ = ‖𝛽𝑒₁ − 𝐻̄_𝑘𝑦_𝑘‖₂ = ‖𝛾_𝑘 − 𝑅_𝑘𝑦_𝑘‖₂ = |(𝛾_𝑘)_{𝑘+1}|.

In other words, a stopping criterion is available in every step without having to compute the residual and its norm explicitly. We end up obtaining the GMRES method, as described in Algorithm 3.1.

Algorithm 3.1: GMRES

Input: 𝑏 ∈ ℂ^𝑛, 𝑥₀ ∈ ℂ^𝑛 right-hand side and initial guess
Output: approximate solution 𝑥_𝑘 to 𝐴𝑥 = 𝑏

1 𝑟₀ = 𝑏 − 𝐴𝑥₀
2 𝛽₀ = ‖𝑟₀‖₂
3 𝑣₁ = 𝑟₀/𝛽₀
4 𝛾₀ = 𝛽₀𝑒₁
5 for 𝑘 = 1, 2, … until convergence do
6     compute 𝑣_{𝑘+1} and 𝐻̄_𝑘 // Arnoldi process (Alg. 2.1)
7     apply 𝐺_{𝑘−1} ⋯ 𝐺₁ to the last column of 𝐻̄_𝑘 yielding 𝑅̃_𝑘
8     compute the Givens rotation 𝐺_𝑘 using 𝑅̃_𝑘
9     apply 𝐺_𝑘 to the result 𝑅̃_𝑘 of line 7 yielding 𝑅_𝑘
10    apply 𝐺_𝑘 to the 𝑘-th and (𝑘 + 1)-st entry of the vector 𝛾_{𝑘−1} extended by a zero entry, yielding 𝛾_𝑘
11    𝛽_𝑘 = |𝛾_𝑘(𝑘 + 1)|
12 solve 𝛾_𝑘(1 ∶ 𝑘) = 𝑅_𝑘(1 ∶ 𝑘, 1 ∶ 𝑘)𝑦_𝑘 for 𝑦_𝑘
13 set 𝑥_𝑘 = 𝑥₀ + 𝑉_𝑘𝑦_𝑘
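The steps of Algorithm 3.1 can be sketched in NumPy as follows. This is an illustrative implementation under assumptions, not the code used in this thesis: the function name `gmres`, the relative stopping tolerance `tol`, the parameter `maxiter` and the modified Gram-Schmidt Arnoldi step are choices made for this example, and a dense solver stands in for back substitution.

```python
import numpy as np

def gmres(A, b, x0, tol=1e-10, maxiter=None):
    """Full GMRES with Givens rotations; returns the iterate and the residual norm history."""
    n = b.shape[0]
    maxiter = n if maxiter is None else maxiter
    dtype = np.result_type(A, b, x0, float)
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    if beta == 0:
        return x0.copy(), [0.0]
    V = np.zeros((n, maxiter + 1), dtype=dtype)
    R = np.zeros((maxiter + 1, maxiter), dtype=dtype)   # Hessenberg matrix, rotated in place
    cs = np.zeros(maxiter)                              # coefficients c_i (real)
    sn = np.zeros(maxiter, dtype=dtype)                 # coefficients s_i
    gamma = np.zeros(maxiter + 1, dtype=dtype)          # rotated right-hand side
    gamma[0] = beta
    V[:, 0] = r0 / beta
    history = [beta]
    for k in range(1, maxiter + 1):
        j = k - 1                                       # 0-based index of the new column
        w = A @ V[:, j]                                 # Arnoldi step
        for i in range(k):
            R[i, j] = np.vdot(V[:, i], w)
            w = w - R[i, j] * V[:, i]
        h_next = np.linalg.norm(w)
        R[k, j] = h_next
        if h_next > 0:
            V[:, k] = w / h_next
        for i in range(j):                              # apply G_1, ..., G_{k-1} to the new column
            t = cs[i] * R[i, j] + sn[i] * R[i + 1, j]
            R[i + 1, j] = -np.conj(sn[i]) * R[i, j] + cs[i] * R[i + 1, j]
            R[i, j] = t
        a, d = R[j, j], R[k, j]                         # compute and apply G_k
        if a == 0:
            cs[j], sn[j] = 0.0, 1.0
        else:
            rho = np.sqrt(abs(a) ** 2 + abs(d) ** 2)
            cs[j], sn[j] = abs(a) / rho, a * np.conj(d) / (abs(a) * rho)
        R[j, j] = cs[j] * a + sn[j] * d
        R[k, j] = 0.0
        gamma[k] = -np.conj(sn[j]) * gamma[j]
        gamma[j] = cs[j] * gamma[j]
        history.append(abs(gamma[k]))                   # residual norm, available for free
        if abs(gamma[k]) <= tol * beta or h_next == 0:
            break
    y = np.linalg.solve(R[:k, :k], gamma[:k])           # upper triangular system
    return x0 + V[:, :k] @ y, history

rng = np.random.default_rng(4)
n = 20
A = rng.standard_normal((n, n)) + 20 * np.eye(n)
b = rng.standard_normal(n)
x, history = gmres(A, b, np.zeros(n))
print(len(history) - 1, np.linalg.norm(b - A @ x))
```

The last entry of `history` is the quantity |(𝛾_𝑘)_{𝑘+1}|, which should agree with the explicitly computed residual norm up to rounding errors.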

A major concern, when discussing iterative methods, is whether they may break down. In GMRES, the breakdown occurs when dividing by 0 in line 11 of the Arnoldi process (Alg. 2.1). The following proposition states that the GMRES method cannot break down, unless it has already converged to the exact solution, see [SS86].

Proposition 3.1. The approximate solution 𝑥_𝑘 produced by GMRES at step 𝑘 is exact if and only if the following three equivalent conditions hold:

b) ℎ_𝑘+1,𝑘 = 0.

c) The grade of 𝑣₁ is equal to 𝑘.

This type of breakdown is referred to as a “lucky” breakdown in the literature. Since the grade of 𝑣₁ cannot exceed 𝑛, an immediate consequence is that GMRES terminates in at most 𝑛 steps.

The convergence of GMRES can be described by a bound on the 2-norm of the residuals. We give here two well-known results. For proofs and further discussions see [Gre97].

Theorem 3.2. Assume that 𝐴 is diagonalizable and let 𝐴 = 𝑉Λ𝑉^{−1} be an eigendecomposition, where Λ = diag(𝜆₁, … , 𝜆_𝑛). Then it follows

‖𝑟_𝑘‖ = min_{𝑝_𝑘∈Π_𝑘} ‖𝑉𝑝_𝑘(Λ)𝑉^{−1}𝑟₀‖ ≤ ‖𝑉‖ ⋅ ‖𝑉^{−1}‖ ⋅ ‖𝑟₀‖ min_{𝑝_𝑘∈Π_𝑘} ‖𝑝_𝑘(Λ)‖

and the residuals of GMRES satisfy the inequality

‖𝑟_𝑘‖ / ‖𝑟₀‖ ≤ 𝜅(𝑉) min_{𝑝_𝑘∈Π_𝑘} max_{𝑖=1,…,𝑛} |𝑝_𝑘(𝜆_𝑖)|, (3.3)

where 𝜅(𝑉) = ‖𝑉‖ ⋅ ‖𝑉^{−1}‖ is the condition number of the eigenvector matrix 𝑉. Moreover, if 𝐴 is normal, then

‖𝑟_𝑘‖ / ‖𝑟₀‖ ≤ min_{𝑝_𝑘∈Π_𝑘} max_{𝑖=1,…,𝑛} |𝑝_𝑘(𝜆_𝑖)|.

The bound (3.3) is not sharp in general, in particular when the matrix 𝑉 is ill-conditioned.

Theorem 3.3. Let ℱ(𝐴) = {𝑥^𝐻𝐴𝑥 / 𝑥^𝐻𝑥 | 𝑥 ∈ ℂ^𝑛, 𝑥 ≠ 0} be the field of values of 𝐴 ∈ ℂ^{𝑛×𝑛}. Assume that ℱ(𝐴) is contained in a disk 𝐷 = {𝑧 ∈ ℂ ∶ |𝑧 − 𝑎| ≤ 𝑏} which does not contain the origin. Then the GMRES residuals satisfy

‖𝑟_𝑘‖ / ‖𝑟₀‖ ≤ 2 (𝑏/|𝑎|)^𝑘.
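The bound of Theorem 3.3 can be observed on a simple example. The sketch below is an illustration under assumptions: 𝐴 is chosen diagonal (hence normal, so that its field of values is the convex hull of its eigenvalues and lies in the disk), the disk parameters 𝑎 = 4, 𝑏 = 1 and all names are choices of this example, and the GMRES residual is computed directly as the part of 𝑟₀ orthogonal to 𝐴𝒦_𝑘 rather than by Algorithm 3.1.

```python
import numpy as np

rng = np.random.default_rng(3)
n, a, b = 50, 4.0, 1.0
# Eigenvalues drawn inside the disk |z - a| <= b; A is diagonal, hence normal.
radius = b * np.sqrt(rng.uniform(size=n))
angle = rng.uniform(0.0, 2.0 * np.pi, size=n)
A = np.diag(a + radius * np.exp(1j * angle))
r0 = rng.standard_normal(n) + 0j

ratios, bounds = [], []
K = np.zeros((n, 0), dtype=complex)
v = r0 / np.linalg.norm(r0)
for k in range(1, 7):
    K = np.column_stack([K, v])             # scaled power basis of K_k(A, r0)
    Q, _ = np.linalg.qr(A @ K)              # orthonormal basis of A K_k
    r_k = r0 - Q @ (Q.conj().T @ r0)        # GMRES residual: r0 minus its projection onto A K_k
    ratios.append(np.linalg.norm(r_k) / np.linalg.norm(r0))
    bounds.append(2.0 * (b / abs(a)) ** k)  # right-hand side of the bound
    v = A @ v
    v = v / np.linalg.norm(v)

for ratio, bound in zip(ratios, bounds):
    print(f"{ratio:.3e} <= {bound:.3e}")
```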

The GMRES method becomes impractical as the number of iteration steps grows due to the excessive storage and computational time requirements: the 𝑘-th step of the underlying Arnoldi process requires the storage of 𝑘 vectors and 𝑘 inner products of vectors of length 𝑛. One remedy is to restart the method after a certain number of steps 𝑚, an approach that we will shortly discuss. There are other variants based on the truncation of the orthogonalization in the Arnoldi process. Both restarted GMRES (GMRES(𝑚)) and Quasi-GMRES (QGMRES) are covered in detail in [Saa03].

3.1.1 Restarted GMRES

The idea of restarting is rather easy to understand and can be implemented straightforwardly. Basically, after 𝑚 steps of GMRES, where 𝑚 is usually much smaller than the dimension of the system, the method is restarted choosing the approximate solution 𝑥_𝑚 as the initial guess for the next cycle. This leads to the restarted GMRES method [SS86], termed GMRES(𝑚), as described in Algorithm 3.2.

Algorithm 3.2: GMRES(𝑚)

Input: 𝑏 ∈ ℂ^𝑛, 𝑥₀ ∈ ℂ^𝑛 right-hand side and initial guess
       𝑚 ∈ ℕ restart length
Output: approximate solution 𝑥_𝑚 to 𝐴𝑥 = 𝑏

1 𝑟₀ = 𝑏 − 𝐴𝑥₀, 𝛽₀ = ‖𝑟₀‖₂, 𝑣₁ = 𝑟₀/𝛽₀
2 Perform 𝑚 steps of the Arnoldi process (Alg. 2.1) yielding 𝑉_𝑚 and 𝐻̄_𝑚
3 Compute 𝑦_𝑚 which minimizes ‖𝛽₀𝑒₁ − 𝐻̄_𝑚𝑦‖₂ and 𝑥_𝑚 = 𝑥₀ + 𝑉_𝑚𝑦_𝑚
4 if satisfied then
5     Stop
6 else
7     set 𝑥₀ = 𝑥_𝑚 and go to 1

It is well-known that restarted GMRES might converge signiﬁcantly more slowly than full GMRES. In some cases, even stagnation can occur which means that there is no convergence towards the solution. A simple example that illustrates such a behaviour is to consider the matrix 𝐴 and the right-hand side 𝑏

𝐴 = ⎡0 1⎤ ,   𝑏 = ⎡1⎤ ,
    ⎣1 0⎦         ⎣0⎦

with 𝑥₀ = 0. Then GMRES(1) will always produce the same approximate solution, i.e. 𝑥₁ = 𝑥₀, and thus 𝑥_𝑘 = 𝑥₀ for all 𝑘, meaning that the method will stagnate. Much work has been done with the same goal of improving the convergence behaviour of the restarted GMRES method. Most of the methods try to overcome the fact that in restarted GMRES at the time of restart all information built up in the previous cycle is discarded. In later sections we present some of the methods that improve the convergence of subsequent cycles and also systems by keeping some relevant information from previous cycles (systems). The choice of which information should be kept and the way in which it is used give rise to many diﬀerent methods.
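The stagnation in this example is easy to reproduce. The following sketch is only an illustration: the helper `gmres_cycle` is a hypothetical name, and it performs one GMRES(𝑚) cycle by a dense least-squares solve over a power basis of the Krylov subspace, which is adequate for such a tiny problem but is not how GMRES is implemented in practice.

```python
import numpy as np

def gmres_cycle(A, b, x0, m):
    """One GMRES(m) cycle: minimize ||b - A x||_2 over x0 + K_m(A, r0)."""
    r0 = b - A @ x0
    # Power basis [r0, A r0, ..., A^{m-1} r0]; acceptable only for tiny m.
    K = np.column_stack([np.linalg.matrix_power(A, j) @ r0 for j in range(m)])
    y, *_ = np.linalg.lstsq(A @ K, r0, rcond=None)
    return x0 + K @ y

A = np.array([[0.0, 1.0], [1.0, 0.0]])
b = np.array([1.0, 0.0])

x = np.zeros(2)
for _ in range(5):
    x = gmres_cycle(A, b, x, 1)                 # five cycles of GMRES(1)

x_full = gmres_cycle(A, b, np.zeros(2), 2)      # two steps without restart
print(x, x_full)
```

GMRES(1) never leaves the initial guess, whereas two unrestarted steps solve the system exactly.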

3.2 GCR

The generalized conjugate residual method (GCR) [Stu96a] is a method algebraically equivalent to the GMRES method. It is a modification of the conjugate residual method for solving nonsymmetric systems, where the symmetric part of the matrix, i.e. (𝐴 + 𝐴^𝐻)/2, is positive definite. If this is not the case, GCR may break down.

The basic idea of the method is to keep two bases 𝑈_𝑘 and 𝐶_𝑘 for 𝒦_𝑘(𝐴, 𝑟₀) and 𝐴𝒦_𝑘(𝐴, 𝑟₀), respectively, such that

range(𝑈_𝑘) = 𝒦_𝑘(𝐴, 𝑟₀) (3.4)

𝐴𝑈_𝑘 = 𝐶_𝑘 (3.5)

𝐶_𝑘^𝐻𝐶_𝑘 = 𝐼. (3.6)

The method solves the same minimization problem as GMRES

𝑥_𝑘 = argmin_{𝑥 ∈ 𝑥₀+range(𝑈_𝑘)} ‖𝑏 − 𝐴𝑥‖₂ ⇔ 𝑦_𝑘 = argmin_{𝑦∈ℂ^𝑘} ‖𝑏 − 𝐴(𝑥₀ + 𝑈_𝑘𝑦)‖₂ ⇔ 𝑦_𝑘 = argmin_{𝑦∈ℂ^𝑘} ‖𝑟₀ − 𝐶_𝑘𝑦‖₂. (3.7)

Because of (3.6) the solution of the problem (3.7) is given by 𝑦_𝑘 = 𝐶_𝑘^𝐻𝑟₀ and therefore

𝑥_𝑘 = 𝑥₀ + 𝑈_𝑘𝐶_𝑘^𝐻𝑟₀. (3.8)

Updating the residual is straightforward

𝑟_𝑘 = 𝑏 − 𝐴𝑥₀ − 𝐴𝑈_𝑘𝐶_𝑘^𝐻𝑟₀ = 𝑟₀ − 𝐶_𝑘𝐶_𝑘^𝐻𝑟₀, (3.9)

and because of the minimization property and Proposition 2.11 we have 𝑟_𝑘 ⟂ range(𝐶_𝑘).

It is worth mentioning that within the GCR method we have constructed the inverse of 𝐴 over the space range(𝐶_𝑘), in the sense that

𝐴^{−1}𝐶_𝑘 = 𝑈_𝑘,

which one can regard as the underlying principle of the method, see Algorithm 3.3.

Algorithm 3.3: GCR

Input: 𝑏 ∈ ℂ^𝑛, 𝑥₀ ∈ ℂ^𝑛 right-hand side and initial guess
Output: approximate solution 𝑥_𝑘 to 𝐴𝑥 = 𝑏

1 𝑟₀ = 𝑏 − 𝐴𝑥₀, 𝑘 = 0
2 while ‖𝑟_𝑘‖₂ > tol do
3     𝑘 = 𝑘 + 1
4     𝑢_𝑘 = 𝑟_{𝑘−1}, 𝑐_𝑘 = 𝐴𝑢_𝑘
5     for 𝑖 = 1, … , 𝑘 − 1 do
6         𝛼_𝑖 = 𝑐_𝑖^𝐻𝑐_𝑘
7         𝑐_𝑘 = 𝑐_𝑘 − 𝛼_𝑖𝑐_𝑖
8         𝑢_𝑘 = 𝑢_𝑘 − 𝛼_𝑖𝑢_𝑖
9     𝑐_𝑘 = 𝑐_𝑘/‖𝑐_𝑘‖₂, 𝑢_𝑘 = 𝑢_𝑘/‖𝑐_𝑘‖₂
10    𝑥_𝑘 = 𝑥_{𝑘−1} + 𝑢_𝑘𝑐_𝑘^𝐻𝑟_{𝑘−1}
11    𝑟_𝑘 = 𝑟_{𝑘−1} − 𝑐_𝑘𝑐_𝑘^𝐻𝑟_{𝑘−1}

Throughout this thesis, we will not consider GCR as a means to solve a system because of possible breakdowns and its reduced numerical stability as compared to GMRES. We are rather interested in the framework of the method, just described above, which will be the starting point for more advanced methods in later sections. Having that in mind, and the mathematical equivalence to GMRES, we conclude this section by commenting that, as long as GCR is feasible (which is guaranteed when (𝐴 + 𝐴^𝐻)/2 is positive definite), the results of Theorem 3.2 and of Theorem 3.3 hold for the GCR method.
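The GCR framework, i.e. the relations (3.4) to (3.6) together with the updates of the iterate and the residual, can be sketched in NumPy as follows. This is an illustration under assumptions: the function name `gcr`, the relative tolerance `tol`, the iteration limit `maxiter` and the test matrix (shifted so that its symmetric part is positive definite) are choices of this example.

```python
import numpy as np

def gcr(A, b, x0, tol=1e-10, maxiter=None):
    """GCR sketch; returns the iterate and the lists of vectors u_i and c_i."""
    maxiter = b.shape[0] if maxiter is None else maxiter
    x = x0.astype(np.result_type(A, b, x0, float))
    r = b - A @ x
    U, C = [], []
    stop = tol * np.linalg.norm(b)
    while np.linalg.norm(r) > stop and len(U) < maxiter:
        u = r.copy()                      # new search direction: the current residual
        c = A @ u
        for ui, ci in zip(U, C):          # orthogonalize c against previous c_i,
            alpha = np.vdot(ci, c)        # applying the same update to u keeps A u = c
            c = c - alpha * ci
            u = u - alpha * ui
        nrm = np.linalg.norm(c)
        if nrm == 0:                      # breakdown: no new direction available
            break
        c, u = c / nrm, u / nrm
        coef = np.vdot(c, r)              # c^H r
        x = x + coef * u
        r = r - coef * c
        U.append(u)
        C.append(c)
    return x, U, C

rng = np.random.default_rng(7)
n = 20
A = rng.standard_normal((n, n)) + 40 * np.eye(n)
b = rng.standard_normal(n)
x, U, C = gcr(A, b, np.zeros(n))
Uk, Ck = np.column_stack(U), np.column_stack(C)
print(len(U), np.linalg.norm(b - A @ x))
```

After the run, the columns of `Uk` and `Ck` should satisfy 𝐴𝑈_𝑘 = 𝐶_𝑘 and 𝐶_𝑘^𝐻𝐶_𝑘 = 𝐼 up to rounding errors.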

3.3 GMRESR

Examining the GCR method, we conclude that in step 4 (Alg. 3.3) the choice of 𝑢_𝑘 can be modified without affecting the rest of the algorithm, and replacing 𝑟_{𝑘−1} with any other vector leads to a method that solves the minimization problem (3.7) with a modified range(𝑈_𝑘), where range(𝑈_𝑘) ≠ 𝒦_𝑘(𝐴, 𝑟₀). The better the choice of 𝑢_𝑘 is, the faster the method will converge. The optimal choice would be 𝑢_𝑘 = 𝑒_{𝑘−1} (in this case we would retrieve the exact solution), where 𝑒_{𝑘−1} is the error vector 𝑒_{𝑘−1} = 𝑥∗ − 𝑥_{𝑘−1} and 𝑥∗ denotes the exact solution. Therefore, a reasonable approach is to find the best possible approximation to the error 𝑒_{𝑘−1}, when working on the residual equation

𝐴𝑒_{𝑘−1} = 𝑟_{𝑘−1}. (3.10)

One such approach is the recursive GMRES method (GMRESR) [VV94], see Algorithm 3.4. It consists of an inner and an outer method. The outer method is GCR which computes the optimal approximation over a given set of direction vectors such that the residual is minimized. The inner method is GMRES, which computes an approximation to the solution of the residual equation (3.10), with initial guess 0. This gives the new direction vector needed in the outer loop.

Algorithm 3.4: GMRESR

Input: 𝑏 ∈ ℂ^𝑛, 𝑥₀ ∈ ℂ^𝑛 right-hand side and initial guess
       𝑚 ∈ ℕ restart length
Output: approximate solution 𝑥_𝑘 to 𝐴𝑥 = 𝑏

1 𝑟₀ = 𝑏 − 𝐴𝑥₀, 𝑘 = 1
2 while ‖𝑟_{𝑘−1}‖₂ > tol do
3     𝑢_𝑘 = 𝑝_𝑚(𝐴)𝑟_{𝑘−1}, 𝑐_𝑘 = 𝐴𝑢_𝑘 // 𝑝_𝑚 - GMRES polynomial
4     for 𝑖 = 1, … , 𝑘 − 1 do
5         𝛼_𝑖 = 𝑐_𝑖^𝐻𝑐_𝑘
6         𝑐_𝑘 = 𝑐_𝑘 − 𝛼_𝑖𝑐_𝑖
7         𝑢_𝑘 = 𝑢_𝑘 − 𝛼_𝑖𝑢_𝑖
8     𝑐_𝑘 = 𝑐_𝑘/‖𝑐_𝑘‖₂, 𝑢_𝑘 = 𝑢_𝑘/‖𝑐_𝑘‖₂
9     𝑥_𝑘 = 𝑥_{𝑘−1} + 𝑢_𝑘𝑐_𝑘^𝐻𝑟_{𝑘−1}
10    𝑟_𝑘 = 𝑟_{𝑘−1} − 𝑐_𝑘𝑐_𝑘^𝐻𝑟_{𝑘−1}
11    𝑘 = 𝑘 + 1
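A compact NumPy sketch of the GMRESR idea is given below: the inner method is 𝑚 steps of GMRES applied to the residual equation with zero initial guess, and the outer method is the GCR update. It is an illustration under assumptions; the names `inner_gmres` and `gmresr`, the ad hoc breakdown threshold, the limit `maxouter` and the random test problem are choices of this example, and the inner solve uses a dense least-squares routine for brevity.

```python
import numpy as np

def inner_gmres(A, r, m):
    """m GMRES steps for A e = r with zero initial guess (Arnoldi plus dense least squares)."""
    n = r.shape[0]
    dtype = np.result_type(A, r)
    V = np.zeros((n, m + 1), dtype=dtype)
    H = np.zeros((m + 1, m), dtype=dtype)
    beta = np.linalg.norm(r)
    V[:, 0] = r / beta
    k = m
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = np.vdot(V[:, i], w)
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if abs(H[j + 1, j]) < 1e-14 * beta:   # ad hoc breakdown test
            k = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    rhs = np.zeros(k + 1, dtype=dtype)
    rhs[0] = beta
    y, *_ = np.linalg.lstsq(H[:k + 1, :k], rhs, rcond=None)
    return V[:, :k] @ y

def gmresr(A, b, x0, m, tol=1e-10, maxouter=50):
    """GMRESR sketch: GCR outer iteration with inner_gmres providing the directions."""
    x = x0.astype(np.result_type(A, b, x0, float))
    r = b - A @ x
    U, C = [], []
    stop = tol * np.linalg.norm(b)
    while np.linalg.norm(r) > stop and len(U) < maxouter:
        u = inner_gmres(A, r, m)          # u_k = p_m(A) r_{k-1}
        c = A @ u
        for ui, ci in zip(U, C):          # GCR orthogonalization against previous c_i
            alpha = np.vdot(ci, c)
            c = c - alpha * ci
            u = u - alpha * ui
        nrm = np.linalg.norm(c)
        if nrm == 0:                      # inner method made no progress
            break
        c, u = c / nrm, u / nrm
        coef = np.vdot(c, r)
        x = x + coef * u
        r = r - coef * c
        U.append(u)
        C.append(c)
    return x, len(U)

rng = np.random.default_rng(8)
n = 30
A = rng.standard_normal((n, n)) + 12 * np.eye(n)
b = rng.standard_normal(n)
x, outer_steps = gmresr(A, b, np.zeros(n), m=5)
print(outer_steps, np.linalg.norm(b - A @ x))
```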

The convergence criteria and costs of the method will not be discussed here (see, e.g., [VV94]), as in this thesis we consider the GMRESR method only as a step between GCR and the GCRO method, which is the topic of the next section. Instead, we review some disadvantages of GMRESR (see, e.g., [Stu96a]). The main flaw of the method is that it solves the “wrong” minimization problem, which leads to suboptimal corrections to the solution. In the 𝑘-th step of GMRESR, in the inner loop GMRES solves (line 3)

min_{𝑦∈ℂ^𝑚} ‖𝑟_{𝑘−1} − 𝐴𝑉_𝑚𝑦‖₂, (3.11)

with the columns of 𝑉_𝑚 forming a basis of 𝒦_𝑚(𝐴, 𝑟_{𝑘−1}) and in the outer loop (lines 8 − 10) we set

𝑢_𝑘 = (𝑉_𝑚𝑦 − 𝑈_{𝑘−1}𝐶_{𝑘−1}^𝐻𝐴𝑉_𝑚𝑦) / ‖(𝐼 − 𝐶_{𝑘−1}𝐶_{𝑘−1}^𝐻)𝐴𝑉_𝑚𝑦‖₂
𝑐_𝑘 = (𝐼 − 𝐶_{𝑘−1}𝐶_{𝑘−1}^𝐻)𝐴𝑉_𝑚𝑦 / ‖(𝐼 − 𝐶_{𝑘−1}𝐶_{𝑘−1}^𝐻)𝐴𝑉_𝑚𝑦‖₂
𝑥_𝑘 = 𝑥_{𝑘−1} + 𝑢_𝑘𝑐_𝑘^𝐻𝑟_{𝑘−1}
𝑟_𝑘 = 𝑟_{𝑘−1} − 𝑐_𝑘𝑐_𝑘^𝐻𝑟_{𝑘−1}.
