Preconditioners for the conjugate gradient algorithm using Gram Schmidt and least squares methods

(1)

Preconditioners for the conjugate gradient algorithm using

Gram–Schmidt and least squares methods

JULIEN STRAUBHAAR*

Institut de Mathématiques, Université de Neuchâtel, Rue Emile-Argand 11, CH-2007 Neuchâtel, Switzerland

This paper is devoted to the study of some preconditioners for the conjugate gradient algorithm used to solve large sparse linear and symmetric positive definite systems. The construction of a preconditioner based on the Gram–Schmidt orthogonalization process and the least squares method is presented.

Some results on the condition number of the preconditioned system are provided. Finally, numerical comparisons are given for different preconditioners.

Keywords: Conjugate gradient method; Preconditioner; Gram–Schmidt orthogonalization; Least squares; Condition number

AMS Subject Classifications: 65F10; 65F35; 65F50

1. Introduction

Let us first briefly present the conjugate gradient (CG) algorithm. It is an iterative method used to solve the linear system

Ax= b, (1)

where A= (aij)is a real symmetric positive definite (SPD) n× n matrix and b a vector in Rⁿ. Approximate solutions xk∈ Rⁿand their residual vector rk= b − Axk(k= 0, 1, . . .) are computed. The approximate solution xNis chosen when the norm of its associated residual is lower than a fixed tolerance tol.

The CG algorithm, due to Hestenes and Stiefel [1–3], consists of finding the minimum ˆx = A⁻¹bof the application

f (x)= x − ˆx²A= (x|Ax) − 2(b|x) + (b| ˆx),

where (.|.) denotes the usual inner product in Rⁿ, and.Athe norm derived from the inner product (x|y)A:= (x|Ay). The approximate solution xk+1 is obtained as the minimum of f ,

*Email: [email protected]

Published in International Journal of Computer Mathematics 84, issue 1, 89-108, 2007 which should be used for any reference to this work

(2)

starting from xkand in the dk-vector direction. The starting point x₀is chosen arbitrarily, and d₀ is set to r₀. Then, for k 0, the real number βk is determined in order that the descent direction dk+1:= rk+1+ βkd_kis A-orthogonal to dk(i.e. (dk+1|dk)_A= 0).

Let us give some properties of the CG algorithm (see, for example, [1–3]). Each vector dkis A-orthogonal to each other, i.e. (di|dj)_A= 0 if i = j (we say that the vectors dkare conjugate).

Each residual vector rkis orthogonal to each other (for the usual inner product); this implies that the CG algorithm converges after at most n iterations. However, in practice, it is possible that the CG algorithm does not converge after n iterations because of the fixed precision for the representation of real numbers in computers. Moreover, even if the CG algorithm converges, when n is large, the number of iterations is, in general, too large.

The following theorem gives an idea of the convergence rate of the CG algorithm.

THEOREM1.1 [3, p. 194] If ˆx = A⁻¹b is the solution of(1), then the approximate solutions xmsatisfy

xm− ˆxA 2

√√κ− 1 κ+ 1

m

x0− ˆxA,

where κ= λmax/λ_minis the condition number of the matrix A, i.e. the ratio of its largest and smallest eigenvalues.

Therefore, if the condition number κ is close to one, the convergence is fast.

In order to improve the convergence rate of the CG algorithm, system (1) is replaced by the equivalent system

TAT^t˜x = Tb, x = T^t˜x, (2)

where T is an n× n regular matrix. Then the CG algorithm is applied to this new system whose matrix T AT^t is still SPD. The matrix T is called a preconditioner of A. The aim is to construct preconditioners that reduce as much as possible the condition number of the system.

In practice (modelling many physical phenomena, for instance), the matrix A is large and sparse, that is, having a large proportion of its coefficients equal to zero. For memory reasons and time computation, the preconditioner should also be sparse.

For the preconditioned system (2), the following CG algorithm is used [2, 3]:

Preconditioned CG algorithm Let x0∈ Rⁿ

Compute r0= b − Ax0, s0= T^t· T r0and set d0= s0

Ifr0/b < tol: quit For k 0, do:

˜αk= (rk|sk)/(dk|Adk) xk+1 = xk+ ˜αkdk

rk+1= rk− ˜αkAdk

Ifrk+1/b < tol : quit the loop sk+1= T^t· Trk+1

˜βk= (rk+1|sk+1)/(rk|sk) dk+1 = sk+1+ ˜βkdk

End for k

(3)

Remark 1 Another way to precondition (1) consists of considering an n× n SPD (sparse) matrix M and replacing (1) by the equivalent system

M⁻¹Ax= M⁻¹b. (3)

This system can be rewritten like (2) with T = M^−1/2= QD^−1/2Q^t, where QDQ^tis an orthog- onal diagonalization of M. With this preconditioning methodology, a linear system of matrix M must be solved in each step of the CG algorithm (see the computation of vectors sk

in the preconditioned CG algorithm); for this reason, the CG algorithm is, in general, not parallelizable.

A classical way of preconditioning (1) consists of using the incomplete Cholesky factorization (or ilu factorization) of A.

T^HEOREM1.2 [3, 4] Let A be an n× n symmetric M-matrix, i.e. its non-diagonal coefficients are non-positive and A is monotone (regular and the coefficients of A⁻¹are non-negative), and let S be a symmetric set of indices with no diagonal index. Then there is a unique lower triangular matrix L= (lij) and a unique symmetric matrix R= (rij) with non-negative coef- ficients such that lij= 0 if (i, j) ∈ S, rij= 0 if (i, j) ∈ S and such that A = LL^t− R is a regular splitting (i.e. LL^t is monotone and the coefficients of R are non-negative).

Choosing S= {(i, j) | aij= 0}, the computation of the matrix L [3, 4] in this theorem provides a preconditioner M= LL^tfor system (3).

With the incomplete Cholesky factorization, the CG algorithm has a good convergence rate when it does not break down. However, the computation of the matrix L and the corresponding CG algorithm are not parallelizable. Hence, this methodology becomes impracticable with very large matrices.

Let us now present another preconditioner developed in [5]. The idea is to construct a fac- torized sparse approximate inverse of A. Let A= LAL^t_Abe the Cholesky factorization of A.

A sparse approximate matrix G of L⁻¹_A is constructed in the following way. Let P be a set of indices included in the lower part of the diagonal,{(i, i) | 1 i n} ⊂ P ⊂ {(i, j) | 1 i j n}.We minimize the Frobenius norm I − GLAF = (Tr((I − GLA)(I− GLA)^t))^1/2 with the constraints Gij= 0 if (i, j) ∈ P . This leads to the equations (GLAL^t_A)_ij= (L^t_A)_ij, (i, j )∈ P , i.e. (GA)ij= 0 for (i, j) ∈ P , i = j, and (GA)ii= (LA)ii for all i. Since the diagonal coefficients of LAare unknown, the matrix Gsuch that

(GA)ij= 0, if (i, j) ∈ P, i = j, (4)

(GA)_ii= 1, for i = 1, . . . , n, (5)

Gij= 0, if (i, j) ∈ P, (6)

is considered. Then, we set G= DG, where D is the diagonal matrix such that (GAG^t)ii= 1 for all i. Choosing P = {(i, j) | i j, aij= 0}, the matrix T = G is a preconditioner for (2).

Equations (4)–(6) form n independent systems: one for each row of G; thus the construction of this preconditioner is readily parallelizable. However, it is very difficult to answer the following question: how to choose the set P to improve the preconditioner? Moreover, some preconditioning techniques can be found in [6].

In the following sections, a preconditioner is presented using the Gram–Schmidt orthogonalization process and least squares method. For its construction, some parameters can be modified and the filling is controlled; this guarantees some flexibility and avoids strong filling in the preconditioner. Moreover, the algorithm for its construction is readily parallelizable.

(4)

Some results for the condition number of the preconditioned system are presented. Finally, numerical comparisons are given for the different preconditioners described.

2. Conjugate Gram–Schmidt algorithm

This section is devoted to the construction of a preconditioner based on the conjugate Gram–

Schmidt method [7].

The A-orthogonalization Gram–Schmidt process (i.e. the Gram–Schmidt process with the inner product (.|.)A) is applied to the canonical basis{e1, . . . , e_n} in Rⁿ. We then obtain the A-orthogonal basis{z1, . . . , z_n}, where

z₁= e1, z_k= ek−

k−1

i=1

(ek|zi)A

(zi|zi)A

z_i, k= 2, . . . , n. (7)

The matrix Z= (z1, . . . , z_n), whose columns are the vectors zk, is upper triangular, unitary diagonal, and satisfies the relation Z^t· A · Z = D, where D = diag(p1, . . . , p_n), with pk= (zk|zk)A= z^tk· A · zk= zk²A, k= 1, . . . , n.

Let

pik= (ei|zk)A= (ai|zk),

where ai = Aeidenotes the ith column of A. Since the vectors zkare mutually A-orthogonal, pk= pkk. The conjugate Gram–Schmidt algorithm (CGS) can be written as follows.

CGS algorithm z⁽⁰⁾_i = ei, i= 1, . . . , n For k= 1, . . . , n, do:

For i= k, . . . , n, do:

p_i^(k)= (ai| z^(kk⁻¹⁾) End for i

For i= k + 1, . . . , n, do:

z^(k)_i = z_i^(k⁻¹⁾− (p^(k)_i /p_k^(k))z^(k_k⁻¹⁾ End for i

End for k

We have p_i^(k)= pikand, finally, zk= zk^(k⁻¹⁾and pk = p^(k)k . Note that the fourth row in the CGS algorithm can be replaced by p^(k)_i = (ak| z^(k_i ⁻¹⁾). In fact, using (7) and the symmetry of A, we have

(ai | z^(kk⁻¹⁾)= (ai|ek)−

k−1

j=1

(a_k|zj) pj

(ai|zj)

=

⎛

⎝ak

ei−

k−1

j=1

(ai|zj) pj

zj

⎞

⎠

= (ak| zi^(k⁻¹⁾).

(5)

The CGS algorithm provides the inverse A⁻¹= Z · D⁻¹· Z^t of A and, with T = D^−1/2· Z^t, we have T AT^t= I.

2.1 Obtaining a preconditioner

The idea for obtaining a preconditioner of (1) is to consider an approximation of the matrix Z (still denoted Z). It provides, with the corresponding diagonal matrix D= diag((z₁|Az1), . . . , (z_n|Azn)), an approximation Z· D⁻¹· Z^t of A⁻¹ and a preconditioner T = D^−1/2· Z^tfor system (2), satisfying TAT^t ≈ I.

Let P be a set of indices included in the upper part of the diagonal,

{(i, i) | 1 i n} ⊂ P ⊂ {(i, j) | 1 i j n}, (8)

and we fix

Z_ij = 0, if (i, j) ∈ P. (9)

Several ways can be considered to determine the coefficients Zij, (i, j )∈ P , and this is the purpose of the following sections.

3. Incomplete conjugate Gram–Schmidt algorithm

One of the ideas in [7] consists of using the CGS algorithm, ignoring the coefficients in Z whose indices are not in P , where P , satisfying (8) and (9), is given. We obtain the following algorithm.

INC CGS algorithm Z= I (initialization) For k= 1, . . . , n, do:

For i= k, . . . , n, do:

pi = aik

For j = 1, . . . , k − 1, do:

If (j, i)∈ P : pi= pi+ ajkz_ji End for j

⎫⎪

⎪⎬

⎪⎪

⎭

p_i^(k)= (ak| z^(ki ⁻¹⁾)

End for i

For i= k + 1, . . . , n, do:

t = pi/p_k

For j = 1, . . . , k, do:

If (j, i)∈ P and (j, k) ∈ P : zji= zji− tzj k

End for j

⎫⎬

⎭z^(k)_i = z^(k_i ⁻¹⁾− tz^(k_k⁻¹⁾ End for i

End for k

If matrix A is sparse, a choice for P can be P = {(i, j) | 1 i j n, aij= 0}. This guarantees a reasonable computation time for the construction of this preconditioner.

(6)

4. Approximation with the least squares method

Now consider P satisfying (8); our aim is to construct the upper triangular and unitary diagonal matrix Z with the constraints (9). Assuming that the columns z1, . . . , zk−1are known, let us show how to construct the kth column zk. Let zk(k)= Zkk= 1 and let

J = Jk= {j | (j, k) ∈ P, j = k}

be the indices set of the components of zkto be determined. Assuming thatJ is non-empty, setJ = {j1<· · · < jp} ⊂ {1, . . . , k − 1} and

y = (y1, . . . , yp)^t = zk(J ) = (Zj₁k, . . . , Zj_pk)^t. (10) We look for zksuch that zkis A-orthogonal to the vectors z₁, . . . , z_k₋₁:

(zk|Azi)= 0, i = 1, . . . , k − 1. (11) With previous notation

(z_k|Azi)= (Azk|zi)= (ak|zi)+

p l=1

y_l(a_j_l|zi)

(where ajl denotes the jlth column of A). So (11) can be rewritten as

p l=1

(aj_l|zi)yl = −(ak|zi), i= 1, . . . , k − 1.

This is a linear system with k− 1 equations and p unknown variables (y1, . . . , yp); in matrix form, we have

By= c, (12)

where B= (bil)is a (k− 1) × p matrix, with bil= (aj_l|zi)and c= (c1, . . . , ck−1)^t is the vector inR^k⁻¹, with ci = −(ak|zi).

Let Zk−1 = (Zij)_1i,jk−1be the principal submatrix of Z of order k− 1. Also let Abe the (k− 1) × p matrix obtained by keeping in A the k − 1 first rows and columns j1, . . . , j_p ( A_il= aij_l) and ˜ak the vector inR^k⁻¹ made up of the k− 1 first components of ak (the kth column of A). Since the components k, k+ 1, . . . , n of the vectors z1, . . . , z_k₋₁ are equal to zero, we have

B= Zk^t−1· A, and

c= −Z^t_k₋₁· ˜ak.

The matrix Zk−1is regular (upper triangular and unitary diagonal), hence (12) is equivalent to

Ay = −˜ak. (13)

In general, this system has no solution since p k − 1. We seek a solution in the least squares sense, i.e. the vector y∈ R^pthat minimizes

Ay+ ˜ak. (14)

(7)

This solution y is given by the system (see, for example, [8, pp. 106–107])

A^t· Ay = − A^t· ˜ak. (15) Since matrix A is SPD, it is still the case for Ak−1, consequently its columns are linearly independent and A^t· Ais SPD and (15) has a unique solution. To compute it, we can use, for example, the Cholesky decomposition of the matrix A^t· A, which is of order p.

Equation (15) is solved by using the QR decomposition of A. Since the columns ˜a1, . . . ,˜ap

of Aare linearly independent, we have A= QR, where Q = (q1, . . . , q_p)is a (k− 1) × p matrix satisfying Q^t· Q = Ip and R= (rij)is a regular square and upper triangular matrix of order p. The matrices Q and R are obtained from the orthogonalization Gram–Schmidt process applied to the family{˜a1, . . . ,˜ap} (see, for instance, [3, p. 11] for the algorithm of QR decomposition).

To obtain a solution of (15) it suffices to solve

Ry= −Q^t˜ak. (16)

Indeed, since Q^t· Q = I, we have

A^t· Ay= − A^t ãk ⇐⇒ R^tQ^t· QRy = −R^tQ^tãk ⇐⇒ Ry = −Q^tãk.

Let us recall that the vector y∈ R^pminimizing (14) is y= zk(J ) (see (10)), i.e. the vector ˜zk= (Z_1k, . . . , Zk−1,k)^t, with Zj_lk= yl, l= 1, . . . , p, and Zik= 0 if i ∈ J , realizes the minimum

m= min

u∈Rk−1 ui =0, i∈J

Ak−1u+ ˜ak. (17)

It remains to choose, for each k∈ {2, . . . , n}, the set of indices Jkfor the components of the kth column of Z, which can be non-zero (save the diagonal coefficient Zkk) (or the pattern P = ∪ⁿ_k₌₂Jk∪ {(k, k) | k = 1, . . . , n}).

Remark 2 In the computation of the kth column zk of the matrix Z we obtain (13), where the previous columns z1, . . . , z_k₋₁no longer appear; thus, the columns of Z can be computed independently, which leads to an immediate parallel algorithm for this method.

4.1 Choosing the non-zero coefficients

Let us propose different ways to choose the setsJk(or the pattern P ). We remark first that, in order to obtain a reasonable computing time for QR decomposition of the system (13), it is necessary that the number of indices in eachJkis sufficiently small.

A-fillingThe simplest choice for P is to take

P = {(i, j) | 1 i j n, aij= 0}.

When the matrix A is sparse, the cardinality of eachJkis small.

Diagonal filling Another possibility is to fix the maximal cardinality pmax of eachJk and consider a filling of Z in the neighbourhood of the diagonal,

P = {(i, j) | 0 j − i pmax}.

Optimal filling Let us choose the setJkmore judiciously. For that, we use the ideas in [9]. Let us fix k, 2 k n. The set J = Jkand the vector zkare constructed so that the minimum m

(8)

of (17) is smaller or equal to a given number ε. For this, the maximal number p_maxof indices inJ and the ‘additional filling’ s (1 s pmax)are fixed; we then proceed in the following way:

(i) start withJ = φ;

(ii) compute the vector˜zkrealizing the minimum m of (17);

(iii) if m ε or |J | pmax, quit the loop; and (iv) add s indices toJ and go to (ii).

Once out of the loop, we set zk= (˜z^t_k,1, 0, . . . , 0)^t ∈ Rⁿ, then|J | pmax+ s − 1. Note that the condition m ε in (ii) avoids a strong filling in Z.

It remains to describe in point (iv) the choice of the additional indices for a given setJ . The aim is to select the indices that reduce at most the minimum of (17). Let

r= Ak−1˜zk+ ˜ak,

where ˜zk realizes the minimum of (17), i.e. r = min{Ak−1u+ ˜ak | u ∈ R^k⁻¹, ui = 0, i∈ J }. Let us consider the subspace W = Ak−1ej | j ∈ J included in R^k⁻¹; since Ak−1˜zk

is the best approximation of−˜akin W , r is orthogonal to W (see, for instance, [10, p. 217]), i.e.

(r|Ak−1e_j)= 0, ∀ j ∈ J . (18)

Let L = {1 l k − 1 | rl= 0} and, for each l ∈ L, set Ml = {1 j k − 1 | alj =

0, j ∈ J }. Then,

J =

l∈L

Ml (19)

is the set of ‘useful’ indices to add toJ in order to reduce r. Note that if r = 0, then J = φ. Indeed, assume J = φ; then, for all l ∈ L, Ml= φ, i.e. alj= 0 if j ∈ J . Hence, (r|Ak−1ej)=

l∈Lrlalj= 0, for all j ∈ J . Thus, using (18), r is orthogonal to all columns of Ak−1and, since Ak−1is regular, it follows that r= 0.

For each j ∈ J , the number μj ∈ R realizing the minimum ρ_j = min

μ∈Rr + μAk−1e_j is the real number such that the derivate of the quadratic function

fj(μ)= r + μAk−1ej²= r²+ 2μ(r|Ak−1ej)+ μ²Ak−1ej² vanishes, i.e.

μ_j = −(r|Ak−1ej)

Ak−1ej². So

ρ_j² = fj(μj)= r²− 2(r|Ak−1ej)²

Ak−1ej² +(r|Ak−1ej)²

Ak−1ej²

= r²−(r|Ak−1e_j)²

Ak−1ej² . Associate with each j ∈ J the weight

ω_j =(r|Ak−1ej)²

Ak−1ej² .

(9)

The greater the weight ωj the more ‘efficient’ the index j . Therefore, we choose s indices in J among those of largest weight and we add them to J ; if | J | s, we select J entirely.

Using the two disjoint sets J = {j1, . . . , j_p} and J = { ˜j1, . . . , ˜j_s}, let us show how to proceed to compute the vector˜zkrealizing the minimum

u∈Rk−1min

ui =0, i∈J J

Ak−1u+ ˜ak.

Let y= (Zj1k, . . . , Zjpk, Z_˜j₁_k, . . . , Z_˜j

sk)^t. In order to use (16), we compute the QR decomposition A= Q Rof the matrix

A= (Ak−1ej₁, . . . , Ak−1ej_p, Ak−1e_˜j

1, . . . , Ak−1e_˜j

s)

from the decomposition A= QR of A= (Ak−1ej₁, . . . , Ak−1ej_p), known from the previous step. Since we have (using block notation)

A= (A, A₁₂)= (Q, Q₁₂)

R R₁₂ 0 R₂₂

,

it suffices to extend the decomposition A= QR, i.e. to compute only the last s columns of Q and R. Note that j₁, . . . , jp, ˜j₁, . . . , ˜jsis not necessary in increasing order.

5. Theoretical results

The preconditioner T = D^−1/2Z^t constructed in one of the previous methods provides the preconditioned system whose matrix is

S= TAT^t = D^−1/2· Z^t· A · Z · D^−1/2. (20) Consider the matrix Z described in section 4 with optimal filling (in section 4.1); some theoretical properties concerning the matrix S are given; in particular, some upper bounds for its condition number.

T^HEOREM5.1 For k= 2, . . . , n, let mkbe the minimum of equation (17) realized by the vector formed by the first k− 1 components of the kth column of Z. Let ε > 0. Assume that mk ε for k= 2, . . . , n and let λmin= λmin(A) be the smallest eigenvalue of the matrix A; we have

|sik| ε

λ_min, if i= k, skk= 1, 1 k n, and

S − IF, S − I2

√n(n− 1) · ε λ_min ,

S − I1 (n− 1)ε λ_min ,

where XF = (Tr(XX^t))^1/2= (Tr(X^t· X))^1/2 is the Frobenius norm of X and Xi= sup_y_i₌₁X · yi is the operator norm of X derived from the norm.i inRⁿ, i= 1, 2.

(10)

Proof When the norm is not explicitly precise, the usual Euclidean norm inRⁿ is used.

According to (20) we have

sik= (zi|Azk)

ziAzkA

.

It is clear that skk = 1 for all k. Since S is symmetric, it suffices to show the inequality

|sik| ε/λmin for i < k. Fix i, k, with i < k, and for x∈ Rⁿ, let ˜x be the vector in R^k⁻¹ formed with the first k− 1 components of x. Since Z is an upper triangular and unitary diagonal matrix, we have

(zi|Azk)= (˜zi|Ak−1˜zk+ ˜ak).

Since mk= Ak−1˜zk+ ˜ak ε, using the Cauchy–Schwartz inequality we obtain

|(zi|Azk)| ˜zi · ε = zi · ε. (21) The Rayleigh quotient for the matrix A defined by μ(x)= (x|Ax)/(x|x) for x = 0 satisfies λ_min= minx=0μ(x)and λmax= maxx=0μ(x)(see, for example, [3, pp. 24–25]). Hence,

zi

ziA

= 1

√μ(zi) 1

√λ_min, (22)

and

zkA=

μ(z_k)· zk

λ_min· zk. (23)

With the estimates (21), (22) and (23), we obtain

|sik| = |(zi|Azk)|

ziAzkA

zi · ε

ziA· zkA

ε

λ_min· zk. Sincezk 1, the first part of the theorem is proved.

Let ekbe the kth vector of the usual basis inRⁿ. Using the previous results,

(S − I)ek²₂=

n i=1

(sik− δik)² =

i=k

(sik)² (n − 1) ε² λ²_min,

and

(S − I)ek1=

i=k

|sik| (n − 1) ε λ_min.

Therefore, for the Frobenuis norm, we obtain

S − I²_F = Tr((S − I)^t· (S − I)) =

n k=1

(S − I)ek²₂ n(n − 1) ε² λ²_min.

For i= 1, 2,

S − Ii = sup

xi=1(S − I)xi = sup

xi=1

n k=1

xk(S− I)ek

i

sup

xi=1

n k=1

|xk| · (S − I)eki,

(11)

so

S − I1 sup

x1=1

n k=1

|xk| · (S − I)ek1 (n − 1) ε λ_min.

Sincex1√

nx2for all x∈ Rⁿand sup_x₂₌₁x1=√

n, we obtain

S − I2 sup

x2=1

n k=1

|xk| · (S − I)ek2

(n− 1) ε λ_min sup

x2=1x1

=

n(n− 1) ε λ_min.

C^OROLLARY5.2 With the assumptions of the previous theorem, if δ= (n − 1)ε/λmin(A), we have

|λ − 1| δ

for all eigenvalues λ of S. In particular, if δ < 1, the condition number κ(S) of S verifies κ(S)1+ δ

1− δ.

Proof Let λ be an eigenvalue of S and v an eigenvector corresponding to λ withv1= 1.

Then, from the previous theorem,

|λ − 1| = (λ − 1)v1= (S − I)v1 S − I1· v1 δ.

(Note that the same proof with the norm.2provides the inequality with δ= (n(n − 1))^1/2· ε/λ_min(A), which is less precise.)

6. Variants

In this section we provide some variants for the preconditioning methodology. We first trans- form the system (1) with the diagonal preconditioner T₁= diag(a₁₁^−1/2, . . . , ann^−1/2)and then apply any other preconditioner T₂to the new system matrix A= T1AT₁^t(which has the same pattern of non-zero coefficients as A). We obtain the coupled preconditioner

T = T2T₁ of matrix A and the preconditioned system matrix is

T₂AT ₂^t= T2T₁AT₁^tT₂^t = TAT^t.

6.1 Block treatment

In this case a block decomposition of A is considered. Let A₁, . . . , A_M be the diagonal blocks, where Aiis a matrix of order mi, 1 i M, with m1+ · · · + mM = n (the order of A). For each block Ai, a preconditioner Ti is constructed. Then T = diag(T1, . . . , T_M)is a preconditioner of A.

Remark 3 Consider the preconditioner with optimal filling in section 4.1. For the computation of its kth column, the ‘best’ indices are selected among the k− 1 positions above the diagonal.

Thus, the larger k, the longer the computation of the kth column. With block treatment, the computation time is reduced and the provided preconditioner is still of good quality.

(12)

7. Numerical tests

The CG algorithm for the system Ax= b is tested with the following preconditioners T : (0) NONE;

(i) DIAG: diagonal preconditioner, T = diag(a₁₁^−1/2, . . . , ann^−1/2);

(ii) INC CHO: incomplete Cholesky factorization (see Introduction and [3, 4]);

(iii) FSPAI: factorized sparse approximate inverse (see Introduction and [5]);

(iv) INC CGS: preconditioner provided by the INC CGS algorithm (see section 3) with P = {(i, j) | 1 i j n, aij= 0};

(v) LS CGS (A): preconditioner with A-filling in section 4.1;

(vi) LS CGS (DIAG): preconditioner with diagonal filling in section 4.1;

(vii) LS CGS (OPT): preconditioner with optimal filling in section 4.1;

(viii) DIAG+LS CGS (OPT): coupled preconditioner, diagonal, then LS CGS (OPT);

(ix) LS CGS (OPT) BLOCKS: preconditioner LS CGS (OPT) by blocks;

(x) DIAG+LS CGS (OPT) BLOCKS: preconditioner DIAG+LS CGS (OPT) by blocks.

For preconditioners LS CGS (OPT) and DIAG+LS CGS (OPT), the tolerance will be set to ε≈ λmin/(n− 1), so that δ in Corollary 5.2 is close to one. For preconditioner LS CGS (OPT), λ_min= λmin(A) and for preconditioner DIAG+LS CGS (OPT), λmin= λmin(T₁AT₁^t), with T₁= diag(a^−1/2₁₁ , . . . , ann^−1/2). For preconditioner LS CGS (OPT) BLOCKS (resp. DIAG+LS CGS (OPT) BLOCKS), the same tolerance ε as for preconditioner LS CGS (OPT) (resp.

DIAG+LS CGS (OPT)) is considered. When M blocks are considered, their order is set to m₁= · · · = mr= q + 1 and mr+1= · · · = mM = q, where n = q · M + r is the Euclidean division of n by M (q, r are integers with 0 r < M).

The convergence tolerance tol is set to 10⁻⁸in the CG algorithm. We are interested in the number of iterations required to solve the system. When an error occurs during construction of the preconditioner, the notation ‘EP’ is used. If an error occurs during resolution, the notation

‘ECG’ is used. If the norm of the residual vector is larger than the tolerance tol after n iterations (n is the order of A), we write ‘>n’.

Estimates of the largest eigenvalue (λmax) and the smallest eigenvalue (λmin) are given;

this allows us to obtain the condition number (κ) of the preconditioned system matrix TAT^t (see (2)).

Finally, the computation time for the construction of the preconditioner and for the resolution are presented. The tests are performed on an IntelPentium4 machine at 2.66 GHz.

The SPD test matrices shown in table 1 are considered; the order is n and the number of non-zero coefficients in the upper (or lower) part is NNZ. These matrices can be obtained from http://math.nist.gov/MatrixMarket/ and http://www.cise.ufl.edu/research/sparse/

matrices/.

For each matrix, the results are presented in a table. The number of iterations is denoted It., the computation time (in seconds) for construction of the preconditioner is t_prec, the com- putation time (in seconds) for the resolution is t_resand the number of non-zero coefficients in the preconditioner (for the preconditioners LS CGS (OPT), DIAG+LS CGS (OPT), LS CGS

Table 1. SPD test matrices.

Matrix (A) n NNZ

nos1 237 627

bcsstk27 1224 28 675

s1rmt3m1 5489 112 505

(13)

(OPT) BLOCKS and DIAG+LS CGS (OPT) BLOCKS) is nnz. For preconditioners FSPAI, INC CGS and LS CGS (A), we have nnz= NNZ. (Matrix L of INC CHO also has nnz = NNZ non-zero coefficients.)

7.1 First test: matrix nos1

We report the results for test matrix nos1 in tables 2–9. Note that the computation times (tprec

and tres) are not very significant, since the number of non-zero coefficients in the matrix is small.

These first tests show that, in practice, the CG algorithm does not necessarily converge.

Moreover, they show the efficiency of preconditioners LS CGS (DIAG), LS CGS (OPT) and DIAG+LS CGS (OPT). They are the only ones that give convergence for this matrix. Note that, in the case of LS CGS (DIAG), for a small value of pmax, the CG algorithm does not converge; the same is true when pmax is too large. The reason for this is probably due to rounding errors.

Now, fix the parameter p_maxand take various values for the parameter s in preconditioners LS CGS (OPT) and DIAG+LS CGS (OPT). The results are given in tables 6 and 7.

Table 2. Results for test matrix nos1.

Preconditioner It. tprec tres λmin λmax κ

NONE >237 0.00E+ 00 1.00E− 02 1.23E+ 02 2.46E+ 09 1.99E+ 07

DIAG >237 0.00E+ 00 1.00E− 02 5.09E− 07 2.00E+ 00 3.93E+ 06

INC CHO EGC

FSPAI >237 0.00E+ 00 2.00E− 02 1.92E− 06 1.69E+ 00 8.79E+ 05

INC CGS >237 0.00E+ 00 0.00E+ 00 ? ? ?

LS CGS (A) >237 0.00E+ 00 3.00E− 02 1.65E− 06 2.63E+ 00 1.60E+ 06

Table 3. Preconditioner LS CGS (DIAG).

Preconditioner: LS CGS (DIAG)

pmax It. nnz tprec tres λmin λmax κ

1 >237 473 0.00E+ 00 4.00E− 02 5.09E− 07 2.08E+ 00 4.08E+ 06

5 >237 1407 1.00E− 02 3.00E− 02 2.16E− 06 3.05E+ 00 1.41E+ 06

10 115 2252 1.00E− 02 2.00E− 02 2.83E− 05 2.05E+ 00 7.25E+ 04 20 58 4767 3.00E− 02 1.00E− 02 2.04E− 04 1.89E+ 00 9.25E+ 03 50 25 10 812 1.10E− 01 0.00E+ 00 4.74E− 03 1.64E+ 00 3.46E+ 02 100 145 18 887 5.70E− 01 9.00E− 02 3.17E− 07 1.28E+ 01 4.06E+ 07

200 >237 27 537 1.84E+ 00 1.60E− 01 ? 1.84E+ 01 ?

Table 4. Preconditioner LS CGS (OPT).

Preconditioner: LS CGS (OPT) with ε= 5.23E-01, s = 1

1 >237 471 1.00E− 02 3.00E− 02 1.31E− 06 2.65E+ 00 2.02E+ 06

5 >237 1391 3.00E− 02 3.00E− 02 2.15E− 06 4.25E+ 00 1.97E+ 06

10 >237 2496 5.00E− 02 3.00E− 02 3.05E− 06 2.69E+ 00 8.80E+ 05

20 179 4556 1.30E− 01 3.00E− 02 4.43E− 06 2.47E+ 00 5.57E+ 05 50 44 9536 7.40E− 01 1.00E− 02 1.20E− 04 2.21E+ 00 1.84E+ 04 100 14 14 067 2.20E+ 00 1.00E− 02 1.75E− 02 1.73E+ 00 9.85E+ 01 200 2 15 720 3.20E+ 00 0.00E+ 00 1.00E+ 00 1.00E+ 00 1.00E+ 00

(14)

Table 5. Preconditioner DIAG+LS CGS (OPT).

Preconditioner: DIAG+LS CGS (OPT) with ε = 2.16E-09, s = 1

1 >237 471 1.00E− 02 1.00E− 02 1.20E− 06 2.02E+ 00 1.67E+ 06

5 151 1391 2.00E− 02 1.00E− 02 9.25E− 06 1.79E+ 00 1.93E+ 05 10 112 2496 4.00E− 02 1.00E− 02 1.30E− 05 1.80E+ 00 1.39E+ 05 20 98 4556 1.00E− 01 1.00E− 02 1.33E− 05 2.01E+ 00 1.51E+ 05 50 33 9536 5.00E− 01 1.00E− 02 1.61E− 04 1.88E+ 00 1.17E+ 04 100 15 14 067 1.40E+ 00 1.00E− 02 5.30E− 03 1.72E+ 00 3.25E+ 02 200 2 15 720 2.17E+ 00 0.00E+ 00 1.00E+ 00 1.00E+ 00 1.00E+ 00

The reason why the construction of the preconditioner fails (EP) is that, during the computation of a column, the set J (see (19)) is empty. Increasing the parameter ε allows us to compute the preconditioner entirely.

We report the results for preconditioners LS CGS (OPT) and DIAG+LS CGS (OPT) with block computation in tables 8 and 9.

7.2 Second test: matrix bcsstk27

The results for test matrix bcsstk27 are presented in tables 10–17. Preconditioner INC CGS has nnz= 28 675 non-zero coefficients and 99 iterations (see table 10) are needed. With preconditioner DIAG+LS CGS (OPT), pmax= 20 (table 13), the filling is nnz = 25 494 and 70 iterations are necessary to obtain convergence. Therefore, the latter method gives the best results for this kind of preconditioner.

With the same parameters pmax, ε and s in the preconditioner LS CGS (OPT) (resp.

DIAG+LS CGS (OPT)), when the numbers of blocks is increased, the number of iterations also increases (see tables 12 and 16 (resp. 13 and 17)).

Table 6. Preconditioner LS CGS (OPT).

Preconditioner: LS CGS (OPT) with ε= 5.23E-01, pmax= 50

s It. nnz tprec tres λmin λmax κ

2 37 9536 4.60E− 01 0.00E+ 00 2.40E− 04 2.09E+ 00 8.68E+ 03

3 >237 9643 3.10E− 01 5.00E− 02 ? 4.08E+ 00 ?

4 EP

5 >237 9802 1.80E− 01 6.00E− 02 ? 6.26E+ 00 ?

50 >237 9802 1.70E− 01 6.00E− 02 ? 6.26E+ 00 ?

Table 7. Preconditioner DIAG+LS CGS (OPT).

Preconditioner: DIAG+LS CGS (OPT) with ε = 2.16E-09, pmax= 50

s It. nnz tprec tres λmin λmax κ

2 25 9536 2.60E− 01 1.00E− 02 8.75E− 05 1.91E+ 00 2.19E+ 04 3 147 9643 2.30E− 01 4.00E− 02 2.44E− 06 2.54E+ 00 1.04E+ 06

4 EP

5 EP

50 EP