A matrix-free preconditioner for sparse symmetric positive definite systems and least-squares problems


Stefania Bellavia

Dipartimento di Ingegneria Industriale, Università degli Studi di Firenze

Joint work with

Jacek Gondzio and Benedetta Morini

Work carried out within the INdAM-GNCS 2012 project "Numerical methods and software for preconditioning linear systems in the solution of PDEs and optimization problems"

Introduction

The Problem

Consider systems of the form

$Hx = b$,

with $H \in \mathbb{R}^{m \times m}$ symmetric positive definite (s.p.d.).

Special interest in the case

$H = A \Theta A^T$

with $A \in \mathbb{R}^{m \times n}$ sparse and $\Theta \in \mathbb{R}^{n \times n}$ diagonal s.p.d.

Such systems arise in at least two prominent applications in optimization: Newton-like methods for weighted least-squares problems and interior point methods.


We assume that H is too large and/or too difficult to be formed and solved directly. We will solve the system using an iterative Conjugate Gradient (CG)-like approach.

We are interested in preconditioning H with a reliable algorithm that does not require forming the whole matrix H at a time (matrix-free).

We are also interested in solving sequences of linear systems arising in optimization methods.


Preconditioning H

Incomplete Cholesky (IC) factorizations are matrix-free in the sense that the columns of H can be computed one at a time and then discarded. They are breakdown-free when H is an H-matrix.

IC factorizations relying on drop tolerances to reduce fill-in have unpredictable memory requirements.

Alternative approaches with predictable memory requirements depend on the entries of H [Jones, Plassmann, ACM Trans. Math. Software 1995], [Lin, Moré, SISC 1999].

E.g., let $n_k = \mathrm{nnz}(\mathrm{tril}(H(:,k), -1))$ and retain the $n_k + p$ largest elements in the strict lower triangular part of the $k$th column of the factor, for some fixed $p > 0$.

Preconditioning H (continued)

Approximate Inverse preconditioners form factorized sparse approximations for $H^{-1}$.

The Stabilized Approximate Inverse preconditioner (SAINV) by [Benzi, Cullum, Tuma, SISC 2000] is based on a modified Gram-Schmidt process.

It is matrix-free, i.e. it employs H multiplicatively, and may work entirely with $A^T$.

It preserves sparsity in the factors by dropping small elements.

In exact arithmetic, it is applicable to any SPD matrix without breakdowns.

The underlying assumption is that most entries of $H^{-1}$ are small in magnitude.


Properties of our preconditioner

Limited memory: memory bounded by $O(m)$ rather than $O(\mathrm{nz}(H))$.

Matrix free: only the action of H on a vector is needed.

Only a small number $k \ll m$ of general matrix-vector products is required.

The diagonal of H or its approximation is needed: we expect that in many practical applications we will be able to compute or estimate the diagonal of H at low cost.


LMP Preconditioner

The preconditioner

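"Partial" Cholesky factorization limited to a small number k of columns of H, plus a diagonal approximation of the Schur complement [Gondzio, COAP 2011].

1. Choose $k \ll m$.

Consider the formal partition of H

$$H = \begin{bmatrix} H_{11} & H_{21}^T \\ H_{21} & H_{22} \end{bmatrix}, \qquad H_{11} \in \mathbb{R}^{k \times k},\ H_{21} \in \mathbb{R}^{(m-k) \times k},\ H_{22} \in \mathbb{R}^{(m-k) \times (m-k)}.$$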

The preconditioner (continued)

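3. Compute the Cholesky factorization $\begin{bmatrix} L_{11} \\ L_{21} \end{bmatrix}$ of H limited to its first k columns $\begin{bmatrix} H_{11} \\ H_{21} \end{bmatrix}$.

Compute the $LDL^T$ factorization $H_{11} = L_{11} Q_{11} L_{11}^T$. (Discard $H_{11}$.)

Solve $L_{11} Q_{11} L_{21}^T = H_{21}^T$ for $L_{21}$, i.e. $L_{21} = H_{21} L_{11}^{-T} Q_{11}^{-1}$. (Discard $H_{21}$.)

It follows

$$H = \begin{bmatrix} L_{11} & \\ L_{21} & I_{m-k} \end{bmatrix} \begin{bmatrix} Q_{11} & \\ & S \end{bmatrix} \begin{bmatrix} L_{11}^T & L_{21}^T \\ & I_{m-k} \end{bmatrix},$$

where $S = H_{22} - H_{21} H_{11}^{-1} H_{21}^T$ is the Schur complement of $H_{11}$ in H.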


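4. Set

$Q_{22} = \mathrm{diag}(S) = \mathrm{diag}(H_{22}) - \mathrm{diag}(L_{21} Q_{11} L_{21}^T)$

and

$$P = \underbrace{\begin{bmatrix} L_{11} & \\ L_{21} & I_{m-k} \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} Q_{11} & \\ & Q_{22} \end{bmatrix}}_{Q} \underbrace{\begin{bmatrix} L_{11}^T & L_{21}^T \\ & I_{m-k} \end{bmatrix}}_{L^T}.$$

The algorithm for constructing P has some good properties:

it cannot break down in exact arithmetic;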
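The construction in steps 1 to 4 can be sketched in a few lines. This is a minimal dense, pure-Python illustration and not the authors' Matlab code: the function name `build_lmp`, the callback `column(j)` returning $He_j$, and the list-of-columns storage are assumptions made here for the example, and the small matrix is illustrative data.

```python
def build_lmp(column, diag, k):
    """Partial LDL^T factorization of the first k columns of H, plus the
    diagonal approximation Q22 = diag(S) of the Schur complement.

    column(j): hypothetical callback returning column j of H (length m).
    diag:      list with the diagonal entries of H.
    Returns (L, q): L[j] is column j of [L11; L21] with unit diagonal,
    q holds the diagonal of Q = diag(Q11, Q22).
    """
    m = len(diag)
    L = []
    q = [0.0] * m
    for j in range(k):
        c = list(column(j))               # H e_j, can be discarded afterwards
        for i in range(j):                # remove contributions of columns i < j
            f = q[i] * L[i][j]
            for r in range(j, m):
                c[r] -= f * L[i][r]
        q[j] = c[j]                       # pivot, entry j of Q11
        col = [0.0] * m
        col[j] = 1.0
        for r in range(j + 1, m):
            col[r] = c[r] / q[j]
        L.append(col)
    # Q22 = diag(H22) - diag(L21 Q11 L21^T), computed entry by entry
    for r in range(k, m):
        q[r] = diag[r] - sum(q[i] * L[i][r] ** 2 for i in range(k))
    return L, q


# Tiny SPD example (illustrative data, not from the talk).
H = [[4.0, 2.0, 2.0, 0.0],
     [2.0, 5.0, 3.0, 1.0],
     [2.0, 3.0, 6.0, 2.0],
     [0.0, 1.0, 2.0, 7.0]]
L, q = build_lmp(lambda j: [row[j] for row in H],
                 [H[i][i] for i in range(4)], k=2)
```

Only k columns of H and its diagonal are touched, which is what keeps the storage at k vectors of length m plus one diagonal.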

Storage and computational cost

The complete diagonal of H is required.

If it is not available and $H = A \Theta A^T$:

$(H)_{ii} = \|\Theta^{1/2} A^T e_i\|_2^2, \quad i = 1, \ldots, m$

Storage: one (sparse) vector $A^T e_i$ at a time and a vector for the diagonal of H.
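As an illustration of the row-by-row computation of the diagonal, here is a small sketch assuming $H = A \Theta A^T$, so that $(H)_{ii} = e_i^T A \Theta A^T e_i$. The sparse-row format (one dict per row of A) and the name `diag_of_normal_matrix` are choices made for this example only.

```python
def diag_of_normal_matrix(A_rows, theta):
    """Diagonal of H = A * Theta * A^T without forming H.

    A_rows[i]: dict {column index: value} with the nonzeros of row i of A,
               i.e. the sparse vector A^T e_i (assumed storage format).
    theta:     list with the diagonal entries of Theta.
    """
    return [sum(theta[j] * v * v for j, v in row.items()) for row in A_rows]


# Illustrative data: A is 2 x 3, Theta = diag(1, 2, 0.5).
A_rows = [{0: 1.0, 2: 2.0}, {1: 3.0}]
theta = [1.0, 2.0, 0.5]
d = diag_of_normal_matrix(A_rows, theta)
```

Each row is processed and dropped, so only one sparse vector and the output diagonal are held at a time.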

The first k columns of H are computed and stored:

$He_i, \quad i = 1, \ldots, k$

The additional cost of this step is k products of H times a vector.

The products $He_i$ are cheap if H (or A) is sparse.

The k products $He_i$ are expected to be cheaper than the products $Hv$

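Factorized form of $P^{-1}$

By

$$P = \begin{bmatrix} L_{11} & \\ L_{21} & I_{m-k} \end{bmatrix} \begin{bmatrix} Q_{11} & \\ & Q_{22} \end{bmatrix} \begin{bmatrix} L_{11}^T & L_{21}^T \\ & I_{m-k} \end{bmatrix},$$

it follows

$$P^{-1} = \begin{bmatrix} L_{11}^{-T} & -L_{11}^{-T} L_{21}^T \\ 0 & I_{m-k} \end{bmatrix} \begin{bmatrix} Q_{11}^{-1} & 0 \\ 0 & Q_{22}^{-1} \end{bmatrix} \begin{bmatrix} L_{11}^{-1} & 0 \\ -L_{21} L_{11}^{-1} & I_{m-k} \end{bmatrix},$$

i.e. a factorized sparse approximation for $H^{-1}$.

Letting

$$R^T = \begin{bmatrix} L_{11} & \\ L_{21} & I_{m-k} \end{bmatrix} \begin{bmatrix} Q_{11}^{1/2} & \\ & Q_{22}^{1/2} \end{bmatrix}$$

we have $P = R^T R$.

$P^{-1}H$ is similar to the block diagonal matrix

$$\begin{bmatrix} I_k & 0 \\ 0 & Q_{22}^{-1} S \end{bmatrix}.$$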
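Applying $P^{-1}$ to a vector never requires the inverse factors explicitly: the three factors translate into one forward sweep, one diagonal scaling and one backward sweep over the k stored columns. The sketch below assumes the factor is stored as a list of k columns of length m with unit diagonal and the diagonal of Q as a plain list; the name `apply_p_inverse` and the data are illustrative.

```python
def apply_p_inverse(L, q, v):
    """Return P^{-1} v for P = L Q L^T, with L unit lower triangular and
    nontrivial only in its first k columns (L[j] is column j, length m)."""
    k, m = len(L), len(v)
    z = list(v)
    for j in range(k):                        # z <- L^{-1} z (forward sweep)
        for r in range(j + 1, m):
            z[r] -= L[j][r] * z[j]
    z = [zi / qi for zi, qi in zip(z, q)]     # z <- Q^{-1} z
    for j in reversed(range(k)):              # z <- L^{-T} z (backward sweep)
        z[j] -= sum(L[j][r] * z[r] for r in range(j + 1, m))
    return z


# Illustrative factors with k = 2, m = 4; v was built as P times (1, 2, 3, 4).
L = [[1.0, 0.5, 0.5, 0.0], [0.0, 1.0, 0.5, 0.25]]
q = [4.0, 4.0, 4.0, 6.75]
v = [14.0, 25.0, 28.0, 31.5]
w = apply_p_inverse(L, q, v)   # recovers [1.0, 2.0, 3.0, 4.0]
```

The work is proportional to the number of nonzeros in the k stored columns plus m divisions.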


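Spectral analysis of $P^{-1}H$

k eigenvalues of $P^{-1}H$ are equal to 1.

The other eigenvalues are eigenvalues of $Q_{22}^{-1}S$ and

$$\lambda(Q_{22}^{-1}S) \ge \frac{\lambda_{\min}(S)}{\lambda_{\max}(Q_{22})} \ge \frac{\lambda_{\min}(H)}{\lambda_{\max}(\mathrm{diag}(S))},$$

$$\lambda(Q_{22}^{-1}S) \le \frac{\lambda_{\max}(S)}{\lambda_{\min}(Q_{22})} \le \frac{\lambda_{\max}(H_{22})}{\lambda_{\min}(\mathrm{diag}(S))}.$$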
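The k unit eigenvalues and the first inequality in each bound follow directly from the block factorizations of H and P. A short derivation, with L and Q denoting the two factors of P; the Rayleigh-quotient step is a standard argument supplied here and not taken from the slides:

```latex
\begin{aligned}
H &= L \begin{bmatrix} Q_{11} & \\ & S \end{bmatrix} L^T, \qquad
P = L \begin{bmatrix} Q_{11} & \\ & Q_{22} \end{bmatrix} L^T, \\[4pt]
P^{-1}H &= L^{-T} \begin{bmatrix} Q_{11}^{-1} & \\ & Q_{22}^{-1} \end{bmatrix} L^{-1}
           \, L \begin{bmatrix} Q_{11} & \\ & S \end{bmatrix} L^T
         = L^{-T} \begin{bmatrix} I_k & \\ & Q_{22}^{-1} S \end{bmatrix} L^T, \\[4pt]
Q_{22}^{-1} S\, y = \lambda y,\ y \neq 0
  &\;\Longrightarrow\; \lambda = \frac{y^T S y}{y^T Q_{22}\, y}
  \;\Longrightarrow\;
  \frac{\lambda_{\min}(S)}{\lambda_{\max}(Q_{22})} \le \lambda \le
  \frac{\lambda_{\max}(S)}{\lambda_{\min}(Q_{22})}.
\end{aligned}
```

The second line is a similarity transformation by $L^T$, so $P^{-1}H$ has the eigenvalue 1 with multiplicity at least k, and its remaining eigenvalues are those of $Q_{22}^{-1}S$. The last step uses that $Q_{22} = \mathrm{diag}(S)$ is positive definite because S is.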

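Reordering of H

A "greedy" heuristic technique acts on the largest eigenvalues of H.

Since H is SPD, $\lambda_{\max}(H) \le \mathrm{tr}(H) = \mathrm{tr}(H_{11}) + \mathrm{tr}(H_{22})$.

If $Q_{22} = I$, then $P^{-1}H$ is similar to $\begin{bmatrix} I_k & 0 \\ 0 & S \end{bmatrix}$, and

$$\lambda_{\max}(P^{-1}H) \le \mathrm{tr}\left(\begin{bmatrix} I_k & 0 \\ 0 & S \end{bmatrix}\right) = k + \mathrm{tr}(S).$$

Permuting rows and columns of H so that $H_{11}$ contains the k largest elements of $\mathrm{diag}(H)$ would imply

$k + \mathrm{tr}(S) \ll \mathrm{tr}(H)$

and a large reduction in the value of $\lambda_{\max}(P^{-1}H)$ with respect to $\lambda_{\max}(H)$.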
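The greedy reordering needs only the diagonal of H. A minimal sketch of selecting the permutation follows; the function name and the tie-breaking rule (ties keep their original order) are choices made for this example.

```python
def greedy_permutation(diag, k):
    """Permutation that moves the indices of the k largest diagonal
    entries of H to the front; the remaining indices keep their order."""
    order = sorted(range(len(diag)), key=lambda i: -diag[i])
    top = order[:k]
    chosen = set(top)
    rest = [i for i in range(len(diag)) if i not in chosen]
    return top + rest


# Illustrative diagonal: the two largest entries sit at positions 1 and 3.
perm = greedy_permutation([1.0, 9.0, 3.0, 7.0, 2.0], k=2)
```

With this permutation, $H_{11}$ collects a trace of 16 out of a total trace of 22 in the example, which is the situation the trace bound is meant to exploit.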

Deflated CG

Handling small eigenvalues

Applying the greedy technique requires no extra storage.

In most cases, the "greedy" reordering takes care of the largest eigenvalues of H, and $\kappa_2(R^{-T} H R^{-1})$ is reduced considerably with respect to $\kappa_2(H)$.

On the other hand, the smallest eigenvalues of H are slightly modified or moved towards the origin.

When the convergence of the CG (or CG-like) method is hampered by a small number of eigenvalues of $P^{-1}H$ close to zero, the Preconditioned Deflated-CG or CG-like algorithm can be useful.


Preconditioned Deflated-CG

Let the eigenvalues of $P^{-1}H$ be labeled in increasing order:

$\lambda_1(P^{-1}H) \le \cdots \le \lambda_m(P^{-1}H).$

Ideal case: inject $l$ exact eigenvectors of $P^{-1}H$, associated with $\lambda_1(P^{-1}H), \ldots, \lambda_l(P^{-1}H)$, into the Krylov subspace. Then

$$\|x^* - x_j\|_H \le 2 \left( \frac{\sqrt{\mu} - 1}{\sqrt{\mu} + 1} \right)^j \|x^* - x_0\|_H, \qquad \mu = \frac{\lambda_m(P^{-1}H)}{\lambda_{l+1}(P^{-1}H)}.$$

Therefore, the convergence of the CG method is improved if a few eigenvalues are close to the origin and well separated from the others.

If the $l$ eigenvectors of $P^{-1}H$ are numerically approximated, one can expect

Preconditioned Deflated-CG (continued)

Apply Deflated-CG to the split-preconditioned system

$R^{-T} H R^{-1} y = R^{-T} b, \quad x = R^{-1} y$

using a few eigenvectors associated with the smallest eigenvalues of $R^{-T} H R^{-1}$.

Symmetric Lanczos processes for sparse symmetric eigenvalue problems require products of $R^{-T} H R^{-1}$ times a vector. Each product has the cost of one PCG iteration.

To amortize the cost of approximating eigenvectors, Preconditioned Deflated-CG is suitable for solving systems with multiple right-hand sides

Numerical results

Numerical experiments

We implemented the preconditioner in Matlab, $\epsilon_m = 2 \cdot 10^{-16}$.

Initial guess for PCG: $x_0 = (0, \ldots, 0)^T$.

Stopping criterion: $\|Hx_j - b\|_2 \le 10^{-6} \|b\|_2$.

A failure is declared after 1000 iterations.

$H = AA^T$, with 35 matrices A from the University of Florida Sparse Matrix Collection, groups LPnetlib and Meszaros, for Linear Programming problems.

$1090 \le m \le 105127$

$2.20 \cdot 10^{-5} \le \mathrm{dens}(A) \le 6.50 \cdot 10^{-3}$, $\quad 5.51 \cdot 10^{-5} \le \mathrm{dens}(H) \le 2.51 \cdot 10^{-1}$
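The PCG setup described above (zero initial guess, relative residual tolerance $10^{-6}$, failure after 1000 iterations) can be sketched as follows. This is a generic textbook PCG in pure Python, not the authors' Matlab implementation; the callbacks `matvec` and `apply_prec` are hypothetical names, the residual is the recursively updated one rather than a recomputed $b - Hx_j$, and a Jacobi (diagonal) preconditioner stands in for $P^{-1}$ in the usage example.

```python
import math


def pcg(matvec, apply_prec, b, tol=1e-6, max_it=1000):
    """Preconditioned CG with x0 = 0. Stops when ||r_j||_2 <= tol * ||b||_2.
    Returns (x, iterations, converged)."""
    x = [0.0] * len(b)
    bnorm = math.sqrt(sum(bi * bi for bi in b))
    if bnorm == 0.0:
        return x, 0, True
    r = list(b)                              # r0 = b - H x0 = b
    z = apply_prec(r)
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for it in range(1, max_it + 1):
        Hp = matvec(p)
        alpha = rz / sum(pi * hi for pi, hi in zip(p, Hp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        if math.sqrt(sum(ri * ri for ri in r)) <= tol * bnorm:
            return x, it, True
        z = apply_prec(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_it, False                  # declared a failure


# Illustrative SPD system with known solution (1, 2, 3, 4).
H = [[4.0, 2.0, 2.0, 0.0],
     [2.0, 5.0, 3.0, 1.0],
     [2.0, 3.0, 6.0, 2.0],
     [0.0, 1.0, 2.0, 7.0]]
b = [14.0, 25.0, 34.0, 36.0]
x, its, ok = pcg(lambda v: [sum(h * vi for h, vi in zip(row, v)) for row in H],
                 lambda r: [ri / H[i][i] for i, ri in enumerate(r)], b)
```

Replacing the Jacobi callback by an application of $P^{-1}$ gives the LMP-preconditioned iteration; the solver itself only sees the two callbacks.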

Numerical experiments (continued)

Experiments with SAINV preconditioner

$H^{-1} \approx Z D^{-1} Z^T$

where Z is unit upper triangular and D is diagonal.

Code from the Sparselab package developed by M. Tuma.

First drop tolerance tested: $10^{-1}$.

Cost Comparison

Table: Cost of the construction and application of LMP and SAINV.

Type  | Construction                                        | Application
LMP   | m sparse-to-sparse products $\Theta^{1/2}(A^T e_i)$ | 2 backsolves with $L_{11}$
      | k sparse-to-sparse products $A\Theta(A^T e_i)$      | 1 mat-vec product with $D^{-1}$
      | m-k backsolves with $L_{11}$                        | m-k scalar products in $\mathbb{R}^k$
      | m-k scalar products in $\mathbb{R}^k$               | k scalar products in $\mathbb{R}^{m-k}$
SAINV | m sparse-to-sparse products $A\Theta(A^T v)$        | 2 mat-vec products with Z

Comparison between LMP(50) and LMP(100)

LMP(100) outperforms LMP(50) in terms of PCG iterations.

[Figure: performance profile $\pi_s(\tau)$ of execution time for LMP(50) and LMP(100).]

Comparison between LMP(50) and SAINV

SAINV solved 21 systems. Performance profiles are computed on the tests successfully solved by all preconditioners.

[Figure: performance profiles $\pi_s(\tau)$ for LMP(50) and SAINV. Left panel: CG iterations. Right panel: execution time.]

Preconditioner density

[Figure: two panels on a logarithmic density scale. First panel: density of H and of the factors L and Z. Second panel: density of the factors L and $L^{-1}$.]

Experiments with Preconditioned Deflated-CG

A few eigenvectors of $R^{-T} H R^{-1}$ are computed by the Matlab package PROPACK [R.M. Larsen, 1998].

The symmetric Lanczos algorithm with partial reorthogonalization is applied.

A loose accuracy for the convergence criterion, $10^{-1}$, is fixed along with a specified maximum dimension, DIM_L, of the Lanczos basis allowed.

The number of matrix-vector products is at most DIM_L.

In the Preconditioned Deflated-CG we injected the estimated eigenvectors.

If convergence was not achieved, the vectors associated with eigenvalues smaller than a prescribed tolerance are selected.

Solution of a single system

Test name   | H: λmax | H: λmin | P^{-1}H: λmax | P^{-1}H: λmin | Prec. Defl-CG IT_L | Prec. CG IT_L
lp_d2q06c   | 1.27e6  | 6.37e-4 | 6.48e0        | 3.39e-5       | 278                | 338
lp_pilot    | 1.10e5  | 1.55e-2 | 1.22e1        | 2.58e-4       | 160                | 264
lp_pilot87  | 1.01e6  | 1.52e-2 | 2.22e1        | 2.01e-4       | 250                | 294
lp_stocfor2 | 1.60e6  | 1.98e-3 | 7.71e0        | 1.17e-6       | 97                 | 144
lpi_bgindy  | 8.97e3  | 4.07e-2 | 5.55e0        | 8.29e-3       | 38                 | 53
ge          | 1.89e8  | 4.90e-5 | 1.21e1        | 8.78e-7       | 41                 | 58
nl          | 8.26e4  | 7.00e-3 | 7.30e0        | 1.61e-4       | 388                | 441
scrs8-2c    | 1.85e3  | 3.49e-5 | 5.39e1        | 8.32e-5       | 102                | 140

Preconditioner formed with k = 50

Number of small eigenvalues estimated: 5

Sequences of normal equations from least-squares problems

Sequences of normal equations arise in the solution of constrained and unconstrained least-squares problems. If the coefficient matrices vary slowly, a preconditioner freeze strategy for LMP coupled with Deflated-CGLS can be used.

We solved the Nonnegative Linear Least-Squares problems

$$\min_{x \ge 0} \frac{1}{2} \|Bx - d\|_2^2,$$

B full rank, by the interior Newton-like method [Bellavia, Macconi, Morini, NLAA 2006].

The trial step at the jth nonlinear iteration solves

$$\min_{p \in \mathbb{R}^n} \left\| \begin{pmatrix} B S_j \\ W_j \end{pmatrix} p + \begin{pmatrix} B x_j - d \\ 0 \end{pmatrix} \right\|_2^2,$$

LMP in NNLS

The matrix of the normal equation is

$$H_j = A_j A_j^T, \qquad A_j = \begin{pmatrix} S_j B^T & W_j \end{pmatrix}, \qquad j = 0, 1, \ldots$$

where $S_j$ and $W_j$ are matrices with entries in $(0,1]$ and $[0,1]$ respectively.

We solve the sequence of linear systems with a frozen preconditioner.

For a seed matrix, say $H_0$, we form the LMP preconditioner and compute $l$ approximate eigenvectors associated with the smallest eigenvalues.

We reuse the preconditioner and the eigenvectors throughout the nonlinear iterations until the preconditioner deteriorates, i.e. the limit of CGLS iterations is reached.

Then, the LMP preconditioner and the $l$ eigenvectors are refreshed for the current matrix.
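The control flow of the freeze strategy can be sketched as a short driver loop. Everything here is hypothetical scaffolding: `build` stands for forming the LMP preconditioner together with the approximate eigenvectors, `solve` stands for a Deflated-CGLS call returning the solution and its iteration count, and re-solving the current system right after a refresh is an assumption of this sketch, since the slides do not say how the failed solve is handled.

```python
def solve_sequence(matrices, build, solve, max_cgls_it):
    """Solve a sequence of systems with a frozen preconditioner.

    build(H)       -> preconditioner data for H (hypothetical callback)
    solve(H, prec) -> (x, iterations)           (hypothetical callback)
    The preconditioner is refreshed for the current matrix whenever the
    iteration limit max_cgls_it is reached.
    Returns (list of solutions, number of refreshes).
    """
    prec = None
    refreshes = 0
    solutions = []
    for H in matrices:
        if prec is None:
            prec = build(H)                  # seed matrix H_0
        x, its = solve(H, prec)
        if its >= max_cgls_it:               # preconditioner has deteriorated
            prec = build(H)
            refreshes += 1
            x, its = solve(H, prec)          # assumption: retry after refresh
        solutions.append(x)
    return solutions, refreshes


# Toy stand-ins: "matrices" are numbers, and the frozen preconditioner is
# considered stale once the current matrix drifts by more than 2 from it.
LIMIT = 100
sols, n_refresh = solve_sequence(
    [1, 2, 3, 10, 11],
    build=lambda H: H,
    solve=lambda H, prec: (H, LIMIT if abs(H - prec) > 2 else 5),
    max_cgls_it=LIMIT,
)
```

In the toy run the preconditioner built for the first matrix is reused for the next two and rebuilt once, when the fourth matrix has drifted too far.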

LMP(100), 5 small eigs estimated, Lanczos basis dim.: 50

Test       | Prec. Defl-CGLS IT_NL(R) | Prec. Defl-CGLS IT_L | Prec. CGLS IT_NL(R) | Prec. CGLS IT_L | Savings in mat-vec prod.
lp_pilot87 | 27(1)                    | 3639                 | 30(1)               | 6023            | 36%
lp_ken_11  | 14                       | 512                  | 19                  | 720             | 12%
lp_ken_13  | 14                       | 485                  | 19                  | 881             | 31%
lp_ken_18  | 24                       | 1937                 | 18                  | 2449            | 14%
lp_pds_10  | 11                       | 607                  | 11                  | 834             | 15%
lp_pds_20  | 13                       | 1629                 | 13                  | 1877            | 9%
lp_truss   | 13                       | 512                  | 14                  | 951             | 34%
deter3     | 23                       | 1441                 | 28                  | 1910            | 16%
deter5     | 13                       | 844                  | 26                  | 1939            | 51%
deter7     | 18                       | 1242                 | 21                  | 2050            | 33%
fxm2-16    | 33(3)                    | 8686                 | 47(2)               | 10771           | 17%
ge         | 35(3)                    | 8425                 | 34(3)               | 10021           | 13%
nl         | 28(5)                    | 7376                 | 32(6)               | 10891           | 30%
scrs8-2c   | 17                       | 163                  | *                   |                 |

Final comments

Work in progress:

We are using the LMP preconditioner in the solution of linear systems arising in electrostatic and electromagnetic problems, in cooperation with A. Tamburrino and S. Ventre, University of Cassino.

The matrix H is s.p.d. and can be decomposed as

$H = H_{far} + H_{near}$,

- $H_{near}$ is available and includes the diagonal of H;
- $H_{far}$ is not available, but the action of $H_{far}$ on a vector can be computed (approximately).

S. Bellavia, J. Gondzio, B. Morini, A matrix-free preconditioner for sparse symmetric positive definite systems and least-squares problems, SISC, in press.

J. Gondzio, Interior point methods 25 years later, EJOR (2012).
