Academic year: 2021


A matrix-free preconditioner for sparse symmetric positive definite systems and least-squares problems

Stefania Bellavia

Dipartimento di Ingegneria Industriale, Università degli Studi di Firenze

Joint work with

Jacek Gondzio and Benedetta Morini

Work carried out within the INdAM-GNCS 2012 project "Numerical methods and software for preconditioning linear systems in the solution of PDEs and optimization problems".

Introduction

The Problem

Consider systems of the form

$Hx = b$,

with $H \in \mathbb{R}^{m \times m}$ symmetric positive definite (s.p.d.).

Special interest in the case

$H = A \Theta A^T$

with $A \in \mathbb{R}^{m \times n}$ sparse and $\Theta \in \mathbb{R}^{n \times n}$ diagonal s.p.d.

Such systems arise in at least two prominent applications in optimization: Newton-like methods for weighted least-squares problems and interior point methods.


We assume that H is too large and/or too difficult to form, so that the system cannot be solved directly. We solve it with an iterative Conjugate Gradient (CG)-like method.

We are interested in preconditioning H with a reliable algorithm that does not require forming the whole matrix H at once (matrix-free).

We are also interested in solving sequences of linear systems arising in optimization methods.
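As an illustration of what "matrix-free" means for $H = A \Theta A^T$, the sketch below applies H to a vector as $A(\Theta(A^T v))$ without ever forming H. The row-wise sparse storage and the function names (`at_times`, `a_times`, `apply_H`) are assumptions made for this example, not part of the talk.

```python
def at_times(rows, n, v):
    """Return A^T v for A stored as sparse rows [(col, value), ...]."""
    out = [0.0] * n
    for i, row in enumerate(rows):
        for j, a_ij in row:
            out[j] += a_ij * v[i]
    return out


def a_times(rows, w):
    """Return A w for the same row-wise storage."""
    return [sum(a_ij * w[j] for j, a_ij in row) for row in rows]


def apply_H(rows, theta, v):
    """Apply H = A Theta A^T to v without forming H."""
    w = at_times(rows, len(theta), v)           # w = A^T v
    w = [t * w_j for t, w_j in zip(theta, w)]   # w = Theta w (Theta is diagonal)
    return a_times(rows, w)                     # H v = A w


# Tiny example: A = [[1, 0, 2], [0, 1, 1]], Theta = diag(1, 2, 3).
A_rows = [[(0, 1.0), (2, 2.0)], [(1, 1.0), (2, 1.0)]]
theta = [1.0, 2.0, 3.0]
print(apply_H(A_rows, theta, [1.0, 0.0]))  # first column of H: [13.0, 6.0]
```

Applying `apply_H` to a unit vector $e_i$ returns column i of H, which is how individual columns can be generated on demand.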


Preconditioning H

Incomplete Cholesky (IC) factorizations are matrix-free in the sense that the columns of H can be computed one at a time and then discarded. They are breakdown-free when H is an H-matrix.

IC factorizations relying on drop tolerances to reduce fill-in have unpredictable memory requirements.

Alternative approaches with predictable memory requirements depend on the entries of H [Jones, Plassmann, ACM Trans. Math. Software 1995], [Lin, Moré, SISC 1999].

E.g., let $n_k$ = nnz(tril(H(:,k),−1)) and retain the $n_k + p$ largest elements in the strict lower triangular part of the kth column of the factor, for some fixed $p > 0$.


Preconditioning H

Approximate inverse preconditioners form factorized sparse approximations of $H^{-1}$.

The Stabilized Approximate Inverse preconditioner (SAINV) of [Benzi, Cullum, Tuma, SISC 2000] is based on a modified Gram-Schmidt process.

It is matrix-free, i.e. it employs H only multiplicatively and may work entirely with $A^T$.

It preserves sparsity in the factors by dropping small elements.

In exact arithmetic, it is applicable to any s.p.d. matrix without breakdowns.

The underlying assumption is that most entries of $H^{-1}$ are small in magnitude.


Properties of our preconditioner

Limited memory: memory bounded by $O(m)$ rather than $O(\mathrm{nnz}(H))$.

Matrix-free: only the action of H on a vector is needed.

Only a small number $k \ll m$ of general matrix-vector products is required.

The diagonal of H, or an approximation of it, is needed: we expect that in many practical applications the diagonal of H can be computed or estimated at low cost.


LMP Preconditioner

The preconditioner

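“Partial” Cholesky factorization limited to a small number k of columns of H, plus a diagonal approximation of the Schur complement [Gondzio, COAP 2011].

1. Choose $k \ll m$.

2. Consider the formal partition of H

$$H = \begin{bmatrix} H_{11} & H_{21}^T \\ H_{21} & H_{22} \end{bmatrix}, \quad H_{11} \in \mathbb{R}^{k \times k},\ H_{21} \in \mathbb{R}^{(m-k) \times k},\ H_{22} \in \mathbb{R}^{(m-k) \times (m-k)}.$$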


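The preconditioner (continued)

3. Compute the Cholesky factorization $\begin{bmatrix} L_{11} \\ L_{21} \end{bmatrix}$ of H limited to $\begin{bmatrix} H_{11} \\ H_{21} \end{bmatrix}$.

Compute the $LDL^T$ factorization $H_{11} = L_{11} Q_{11} L_{11}^T$. (Discard $H_{11}$.)

Solve $L_{11} Q_{11} L_{21}^T = H_{21}^T$ for $L_{21}$, i.e. $L_{21} = H_{21} L_{11}^{-T} Q_{11}^{-1}$. (Discard $H_{21}$.)

It follows

$$H = \begin{bmatrix} L_{11} & \\ L_{21} & I_{m-k} \end{bmatrix} \begin{bmatrix} Q_{11} & \\ & S \end{bmatrix} \begin{bmatrix} L_{11}^T & L_{21}^T \\ & I_{m-k} \end{bmatrix},$$

where $S = H_{22} - H_{21} H_{11}^{-1} H_{21}^T$ is the Schur complement of $H_{11}$ in H.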


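The preconditioner (continued)

4. Set

$$Q_{22} = \mathrm{diag}(S) = \mathrm{diag}(H_{22}) - \mathrm{diag}(L_{21} Q_{11} L_{21}^T)$$

and

$$P = \underbrace{\begin{bmatrix} L_{11} & \\ L_{21} & I_{m-k} \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} Q_{11} & \\ & Q_{22} \end{bmatrix}}_{Q} \underbrace{\begin{bmatrix} L_{11}^T & L_{21}^T \\ & I_{m-k} \end{bmatrix}}_{L^T}.$$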

The algorithm for constructing P has some good properties:

it cannot break down in exact arithmetic;
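The construction steps can be sketched as follows. This is a minimal dense illustration in plain Python, not the authors' Matlab implementation: the function name `build_lmp` and the callback `column(j)`, which returns column j of H, are assumptions made for the example. It performs a partial $LDL^T$ factorization of the first k columns and then sets $Q_{22} = \mathrm{diag}(H_{22}) - \mathrm{diag}(L_{21} Q_{11} L_{21}^T)$.

```python
def build_lmp(column, diag_H, k):
    """Partial LDL^T of the first k columns of H plus diag(S).

    column(j) is a hypothetical callback returning column j of H (length m).
    Returns (L, q): L has m rows and k columns and stores [L11; L21] with
    unit diagonal; q is the diagonal of Q = diag(Q11, Q22).
    """
    m = len(diag_H)
    L = [[0.0] * k for _ in range(m)]
    q = list(diag_H)
    for j in range(k):
        col = list(column(j))              # column j of H, used once, then discarded
        for i in range(j, m):              # subtract the previous columns' contribution
            for p in range(j):
                col[i] -= L[i][p] * q[p] * L[j][p]
        q[j] = col[j]                      # pivot (Q11)_jj
        L[j][j] = 1.0
        for i in range(j + 1, m):
            L[i][j] = col[i] / q[j]
    for i in range(k, m):                  # Q22 = diag(H22) - diag(L21 Q11 L21^T)
        q[i] = diag_H[i] - sum(L[i][p] ** 2 * q[p] for p in range(k))
    return L, q


# Small dense example; in practice column(j) would be computed as H e_j.
H = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]


def column(j):
    return [row[j] for row in H]


L, q = build_lmp(column, [4.0, 5.0, 6.0], 1)
print(L, q)  # L = [[1.0], [0.5], [0.5]], q = [4.0, 4.0, 5.0]
```

Only k columns of the factor and one vector of length m are stored, which matches the limited-memory claim for fixed k.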


Storage and computational cost

The complete diagonal of H is required.

If it is not available and $H = A \Theta A^T$:

$(H)_{ii} = \|\Theta^{1/2} A^T e_i\|_2^2, \quad i = 1, \dots, m.$

Storage: one (sparse) vector $A^T e_i$ at a time and a vector for the diagonal of H.
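A minimal sketch of this diagonal computation, assuming A is stored row by row as lists of `(column, value)` pairs and $\Theta$ as a list of its diagonal entries (both storage choices and the name `diagonal_of_H` are assumptions for the example). Row i of A is exactly $A^T e_i$, so $(H)_{ii} = \sum_j \theta_j a_{ij}^2$.

```python
def diagonal_of_H(rows, theta):
    """Return diag(A Theta A^T), touching one sparse row of A at a time."""
    return [sum(theta[j] * a_ij * a_ij for j, a_ij in row) for row in rows]


# A = [[1, 0, 2], [0, 1, 1]], Theta = diag(1, 2, 3).
A_rows = [[(0, 1.0), (2, 2.0)], [(1, 1.0), (2, 1.0)]]
theta = [1.0, 2.0, 3.0]
print(diagonal_of_H(A_rows, theta))  # [13.0, 5.0]
```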

The first k columns of H are computed and stored:

$H e_i, \quad i = 1, \dots, k.$

The additional cost of this step is k products of H times a vector.

The products $H e_i$ are cheap if H (or A) is sparse.

The k products $H e_i$ are expected to be cheaper than the products $Hv$


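Factorized form of $P^{-1}$

By

$$P = \begin{bmatrix} L_{11} & \\ L_{21} & I_{m-k} \end{bmatrix} \begin{bmatrix} Q_{11} & \\ & Q_{22} \end{bmatrix} \begin{bmatrix} L_{11}^T & L_{21}^T \\ & I_{m-k} \end{bmatrix},$$

it follows

$$P^{-1} = \begin{bmatrix} L_{11}^{-T} & -L_{11}^{-T} L_{21}^T \\ 0 & I_{m-k} \end{bmatrix} \begin{bmatrix} Q_{11}^{-1} & 0 \\ 0 & Q_{22}^{-1} \end{bmatrix} \begin{bmatrix} L_{11}^{-1} & 0 \\ -L_{21} L_{11}^{-1} & I_{m-k} \end{bmatrix},$$

i.e. a factorized sparse approximation for $H^{-1}$.

Letting

$$R = \begin{bmatrix} Q_{11}^{1/2} & \\ & Q_{22}^{1/2} \end{bmatrix} \begin{bmatrix} L_{11}^T & L_{21}^T \\ & I_{m-k} \end{bmatrix}$$

we have $P = R^T R$.

$P^{-1}H$ is similar to the block diagonal matrix

$$\begin{bmatrix} I_k & 0 \\ 0 & Q_{22}^{-1} S \end{bmatrix}.$$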
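Applying $P^{-1} = L^{-T} Q^{-1} L^{-1}$ to a vector needs only two triangular sweeps over the k stored columns of L and one diagonal scaling. The sketch below assumes the factor is stored as an m-by-k list of lists `L` (holding $L_{11}$ on top of $L_{21}$, with unit diagonal) and the diagonal of Q as a list `q`; the names and storage are illustrative assumptions.

```python
def apply_P_inverse(L, q, r):
    """Return P^{-1} r for P = L Q L^T, where L = [[L11, 0], [L21, I]]."""
    m, k = len(L), len(L[0])
    z = list(r)
    # Forward sweep: z <- L^{-1} r; only the first k columns of L are nontrivial.
    for j in range(k):
        for i in range(j + 1, m):
            z[i] -= L[i][j] * z[j]
    # Diagonal scaling: z <- Q^{-1} z.
    z = [z_i / q_i for z_i, q_i in zip(z, q)]
    # Backward sweep: z <- L^{-T} z.
    for j in reversed(range(k)):
        for i in range(j + 1, m):
            z[j] -= L[i][j] * z[i]
    return z


# k = 1 example: P = L Q L^T = [[4, 2, 2], [2, 5, 1], [2, 1, 6]].
L = [[1.0], [0.5], [0.5]]
q = [4.0, 4.0, 5.0]
print(apply_P_inverse(L, q, [8.0, 8.0, 9.0]))  # P [1, 1, 1]^T = [8, 8, 9], so -> [1.0, 1.0, 1.0]
```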


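Spectral analysis of $P^{-1}H$

k eigenvalues of $P^{-1}H$ are equal to 1.

The other eigenvalues are the eigenvalues of $Q_{22}^{-1} S$ and satisfy

$$\lambda(Q_{22}^{-1} S) \ge \frac{\lambda_{\min}(S)}{\lambda_{\max}(Q_{22})} \ge \frac{\lambda_{\min}(H)}{\lambda_{\max}(\mathrm{diag}(S))},$$

$$\lambda(Q_{22}^{-1} S) \le \frac{\lambda_{\max}(S)}{\lambda_{\min}(Q_{22})} \le \frac{\lambda_{\max}(H_{22})}{\lambda_{\min}(\mathrm{diag}(S))}.$$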


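Reordering of H

A “greedy” heuristic technique acts on the largest eigenvalues of H.

Since H is s.p.d., $\lambda_{\max}(H) \le \mathrm{tr}(H) = \mathrm{tr}(H_{11}) + \mathrm{tr}(H_{22})$.

If $Q_{22} = I$, then $P^{-1}H$ is similar to $\begin{bmatrix} I_k & 0 \\ 0 & S \end{bmatrix}$, and

$$\lambda_{\max}(P^{-1}H) \le \mathrm{tr}\left(\begin{bmatrix} I_k & 0 \\ 0 & S \end{bmatrix}\right) = k + \mathrm{tr}(S).$$

Permuting rows and columns of H so that $H_{11}$ contains the k largest elements of diag(H) would imply

$$k + \mathrm{tr}(S) \ll \mathrm{tr}(H)$$

and a large reduction in the value of $\lambda_{\max}(P^{-1}H)$ with respect to $\lambda_{\max}(H)$.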
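The greedy selection needs only the diagonal of H. A minimal sketch, assuming the diagonal is available as a Python list (the function name `greedy_permutation` and the choice to keep the remaining indices in their original order are assumptions for the example):

```python
def greedy_permutation(diag_H, k):
    """Order indices so that the k largest diagonal entries of H come first."""
    order = sorted(range(len(diag_H)), key=lambda i: diag_H[i], reverse=True)
    pivots = order[:k]            # these indices define the block H11
    rest = sorted(order[k:])      # remaining indices keep their original order
    return pivots + rest


diag_H = [3.0, 50.0, 1.0, 20.0, 2.0]
perm = greedy_permutation(diag_H, 2)
print(perm)  # [1, 3, 0, 2, 4]
```

The permutation is then applied symmetrically to the rows and columns of H, for example by requesting columns `perm[0], ..., perm[k-1]` when building the partial factorization.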

Deflated CG

Handling small eigenvalues

Applying the greedy technique requires no extra storage.

In most cases, the “greedy” reordering takes care of the largest eigenvalues of H, and $\kappa_2(R^{-T} H R^{-1})$ is reduced considerably with respect to $\kappa_2(H)$.

On the other hand, the smallest eigenvalues of H are slightly modified or moved towards the origin.

When the convergence of a CG (or CG-like) method is hampered by a small number of eigenvalues of $P^{-1}H$ close to zero, the Preconditioned Deflated-CG or CG-like algorithm can be useful.


Preconditioned Deflated-CG

Let the eigenvalues of $P^{-1}H$ be labeled in increasing order:

$\lambda_1(P^{-1}H) \le \dots \le \lambda_m(P^{-1}H)$.

Ideal case: inject $l$ exact eigenvectors of $P^{-1}H$, associated with $\lambda_1(P^{-1}H), \dots, \lambda_l(P^{-1}H)$, into the Krylov subspace. Then

$$\|x^* - x_j\|_H \le 2 \left( \frac{\sqrt{\mu} - 1}{\sqrt{\mu} + 1} \right)^j \|x^* - x_0\|_H, \quad \mu = \frac{\lambda_m(P^{-1}H)}{\lambda_{l+1}(P^{-1}H)}.$$

Therefore, the convergence of the CG method is improved if a few eigenvalues are close to the origin and well separated from the others.

If the $l$ eigenvectors of $P^{-1}H$ are numerically approximated, one can expect


Preconditioned Deflated-CG (continued)

Apply Deflated-CG to the split-preconditioned system

$R^{-T} H R^{-1} y = R^{-T} b, \quad x = R^{-1} y,$

using a few eigenvectors associated with the smallest eigenvalues of $R^{-T} H R^{-1}$.

Symmetric Lanczos processes for sparse symmetric eigenvalue problems require products of $R^{-T} H R^{-1}$ times a vector. Each product has the cost of one PCG iteration.

To amortize the cost of approximating eigenvectors, Preconditioned Deflated-CG is suitable for solving systems with multiple right-hand sides.

Numerical results

Numerical experiments

We implemented the preconditioner in Matlab, $\epsilon_m = 2 \cdot 10^{-16}$.

Initial guess for PCG: $x_0 = (0, \dots, 0)^T$.

Stopping criterion: $\|Hx_j - b\|_2 \le 10^{-6} \|b\|_2$.

A failure is declared after 1000 iterations.
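The experimental setup above (zero initial guess, relative residual tolerance $10^{-6}$, failure after 1000 iterations) can be illustrated with a textbook preconditioned CG loop. This is a sketch in plain Python, not the authors' Matlab code: `apply_H` and `apply_Pinv` are hypothetical callbacks for the matrix-free product and the preconditioner solve, and the residual is the recursively updated one rather than a freshly computed $Hx_j - b$.

```python
import math


def pcg(apply_H, apply_Pinv, b, tol=1e-6, max_it=1000):
    """Preconditioned CG from x0 = 0; returns (x, iterations, converged)."""
    x = [0.0] * len(b)
    b_norm = math.sqrt(sum(bi * bi for bi in b))
    if b_norm == 0.0:
        return x, 0, True                     # x = 0 already solves H x = 0
    r = list(b)                               # r0 = b - H x0 = b
    z = apply_Pinv(r)
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for it in range(1, max_it + 1):
        Hp = apply_H(p)
        alpha = rz / sum(pi * hi for pi, hi in zip(p, Hp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        if math.sqrt(sum(ri * ri for ri in r)) <= tol * b_norm:
            return x, it, True                # stopping criterion met
        z = apply_Pinv(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_it, False                   # failure declared after max_it iterations


# Toy 2x2 system with a diagonal (Jacobi) stand-in for the preconditioner.
H = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]


def matvec(v):
    return [sum(h * vi for h, vi in zip(row, v)) for row in H]


def jacobi(r):
    return [ri / H[i][i] for i, ri in enumerate(r)]


x, its, ok = pcg(matvec, jacobi, b)
print(x, its, ok)
```

In the setting of the talk, `apply_Pinv` would be the factorized application of $P^{-1}$ and `apply_H` the product $A(\Theta(A^T v))$.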

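$H = AA^T$, with 35 matrices A from the University of Florida Sparse Matrix Collection, groups LPnetlib and Meszaros, for Linear Programming problems.

$1090 \le m \le 105127$

$2.20 \cdot 10^{-5} \le \mathrm{dens}(A) \le 6.50 \cdot 10^{-3}$, $\quad 5.51 \cdot 10^{-5} \le \mathrm{dens}(H) \le 2.51 \cdot 10^{-1}$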


Numerical experiments (continued)

Experiments with the SAINV preconditioner

$H^{-1} \approx Z D^{-1} Z^T$

where Z is unit upper triangular and D is diagonal.

Code from the Sparselab package developed by M. Tuma.

First drop tolerance tested: $10^{-1}$.


Cost Comparison

Table: Cost of the construction and application of LMP and SAINV.

Type    Construction                                               Application
LMP     m sparse-to-sparse products $\Theta^{1/2}(A^T e_i)$        2 backsolves with $L_{11}$
        k sparse-to-sparse products $A\Theta(A^T e_i)$             1 mat-vec product with $D^{-1}$
        m−k backsolves with $L_{11}$                               m−k scalar products in $\mathbb{R}^k$
        m−k scalar products in $\mathbb{R}^k$                      k scalar products in $\mathbb{R}^{m-k}$
SAINV   m sparse-to-sparse products $A\Theta(A^T v)$               2 mat-vec products with Z


Comparison between LMP(50) and LMP(100)

LMP(100) outperforms LMP(50) in terms of PCG iterations.

[Figure: performance profile $\pi_s(\tau)$ of execution time for LMP(50) and LMP(100).]


Comparison between LMP(50) and SAINV

SAINV solved 21 systems. Performance profiles are computed on the tests successfully solved by all preconditioners.

[Figure: performance profiles $\pi_s(\tau)$ of CG iterations and of execution time for LMP(50) and SAINV.]


Preconditioner density

[Figure: two semilogarithmic plots, vertical axis from $10^{-4}$ to $10^{0}$: density of H and of the factors L and Z; density of the factors L and $L^{-1}$.]


Experiments with Preconditioned Deflated-CG

A few eigenvectors of $R^{-T} H R^{-1}$ are computed by the Matlab package PROPACK [R.M. Larsen, 1998].

The symmetric Lanczos algorithm with partial reorthogonalization is applied.

A loose accuracy for the convergence criterion, $10^{-1}$, is fixed, along with a specified maximum dimension, DIM_L, of the Lanczos basis allowed.

The number of matrix-vector products is at most DIM_L.

In the Preconditioned Deflated-CG we injected the estimated eigenvectors.

If convergence was not achieved, the vectors associated with eigenvalues smaller than a prescribed tolerance are selected.


Solution of a single system

               H                     P^{-1}H               Prec. Defl-CG   Prec. CG
Test name      λmax      λmin        λmax      λmin        IT_L            IT_L
lp_d2q06c      1.27e6    6.37e-4     6.48e0    3.39e-5     278             338
lp_pilot       1.10e5    1.55e-2     1.22e1    2.58e-4     160             264
lp_pilot87     1.01e6    1.52e-2     2.22e1    2.01e-4     250             294
lp_stocfor2    1.60e6    1.98e-3     7.71e0    1.17e-6     97              144
lpi_bgindy     8.97e3    4.07e-2     5.55e0    8.29e-3     38              53
ge             1.89e8    4.90e-5     1.21e1    8.78e-7     41              58
nl             8.26e4    7.00e-3     7.30e0    1.61e-4     388             441
scrs8-2c       1.85e3    3.49e-5     5.39e1    8.32e-5     102             140

Preconditioner formed with k = 50

Number of small eigenvalues estimated: 5


Sequences of normal equations from least-squares problems

Sequences of normal equations arise in the solution of constrained and unconstrained least-squares problems. If the coefficient matrices vary slowly, a preconditioner freeze strategy for LMP coupled with Deflated-CGLS can be used.

We solved the Nonnegative Linear Least-Squares problems

$$\min_{x \ge 0} \frac{1}{2} \|Bx - d\|_2^2,$$

B full rank, by the interior Newton-like method [Bellavia, Macconi, Morini, NLAA 2006].

The trial step at the jth nonlinear iteration solves

$$\min_{p \in \mathbb{R}^n} \left\| \begin{pmatrix} B S_j \\ W_j \end{pmatrix} p + \begin{pmatrix} B x_j - d \\ 0 \end{pmatrix} \right\|_2^2.$$


LMP in NNLS

The matrix of the normal equations is

$$H_j = A_j A_j^T, \quad A_j = \begin{pmatrix} S_j B^T & W_j \end{pmatrix}, \quad j = 0, 1, \dots$$

where $S_j$ and $W_j$ are matrices with entries in (0,1] and [0,1], respectively.

We solve the sequence of linear systems with a frozen preconditioner.

For a seed matrix, say $H_0$, we form the LMP preconditioner and compute $l$ approximate eigenvectors associated with the smallest eigenvalues.

We reuse the preconditioner and the eigenvectors throughout the nonlinear iterations until the preconditioner deteriorates, i.e. the limit of CGLS iterations is reached.

Then, the LMP preconditioner and the $l$ eigenvectors are refreshed for the current matrix.
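The freeze-and-refresh logic can be sketched as a small driver loop. Everything here is a hypothetical illustration: `build` stands for forming the LMP preconditioner (and eigenvectors) for a given matrix, `solve` stands for a Deflated-CGLS call returning the solution and its iteration count, and re-solving the current system after a refresh is an assumption of this sketch rather than something stated in the talk.

```python
def solve_sequence(systems, build, solve, max_it):
    """Reuse one preconditioner along a sequence until the solver hits max_it.

    build(system) -> preconditioner data; solve(system, prec, max_it) ->
    (solution, iterations). Both are hypothetical callbacks.
    """
    prec, refreshes, solutions = None, 0, []
    for system in systems:
        if prec is None:
            prec = build(system)              # seed preconditioner from the first matrix
        x, its = solve(system, prec, max_it)
        if its >= max_it:                     # iteration limit reached: deteriorated
            prec = build(system)              # refresh for the current matrix
            refreshes += 1
            x, its = solve(system, prec, max_it)
        solutions.append(x)
    return solutions, refreshes


# Toy stand-ins: a "system" is a number, and the iteration count grows with
# the distance between the current system and the one the preconditioner was built for.
def toy_build(s):
    return s


def toy_solve(s, prec, max_it):
    return s, min(max_it, 1 + abs(s - prec))


print(solve_sequence([0, 1, 2, 5, 6], toy_build, toy_solve, max_it=4))
# ([0, 1, 2, 5, 6], 1): a single refresh, triggered at the fourth system
```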


LMP(100), 5 small eigenvalues estimated, Lanczos basis dimension: 50

               Prec. Defl-CGLS       Prec. CGLS
Test           IT_NL(R)   IT_L       IT_NL(R)   IT_L       Savings in mat-vec prod.
lp_pilot87     27(1)      3639       30(1)      6023       36%
lp_ken_11      14         512        19         720        12%
lp_ken_13      14         485        19         881        31%
lp_ken_18      24         1937       18         2449       14%
lp_pds_10      11         607        11         834        15%
lp_pds_20      13         1629       13         1877       9%
lp_truss       13         512        14         951        34%
deter3         23         1441       28         1910       16%
deter5         13         844        26         1939       51%
deter7         18         1242       21         2050       33%
fxm2-16        33(3)      8686       47(2)      10771      17%
ge             35(3)      8425       34(3)      10021      13%
nl             28(5)      7376       32(6)      10891      30%
scrs8-2c       17         163        *


Final comments

Work in progress:

We are using the LMP preconditioner in the solution of linear systems arising in electrostatic and electromagnetic problems, in cooperation with A. Tamburrino and S. Ventre, University of Cassino.

The matrix H is s.p.d. and can be decomposed as

$H = H_{far} + H_{near}$, where

- $H_{near}$ is available and includes the diagonal of H;

- $H_{far}$ is not available, but the action of $H_{far}$ on a vector can be computed (approximately).

S. Bellavia, J. Gondzio, B. Morini, A matrix-free preconditioner for sparse symmetric positive definite systems and least-squares problems, SISC, in press.

J. Gondzio, Interior point methods 25 years later, EJOR (2012).
