A matrix-free preconditioner for sparse symmetric
positive definite systems and least-squares problems
Stefania Bellavia
Dipartimento di Ingegneria Industriale, Università degli Studi di Firenze
Joint work with
Jacek Gondzio and Benedetta Morini
Work carried out within the INdAM-GNCS 2012 project "Numerical methods and software for preconditioning linear systems in the solution of PDEs and optimization problems"
Introduction
The Problem
Consider systems of the form
$Hx = b$,
with $H \in \mathbb{R}^{m \times m}$ symmetric positive definite (s.p.d.).
Special interest in the case
$H = A \Theta A^T$,
with $A \in \mathbb{R}^{m \times n}$ sparse and $\Theta \in \mathbb{R}^{n \times n}$ diagonal s.p.d.
Such systems arise in at least two prominent applications in optimization: Newton-like methods for weighted least-squares problems and interior point methods.
We assume that $H$ is too large and/or too difficult to be formed and
factorized directly, so the system is solved with an iterative Conjugate Gradient (CG)-like method.
We are interested in preconditioning $H$ with a reliable algorithm that
does not require forming the whole matrix $H$ at once (matrix-free).
We are also interested in solving sequences of linear systems arising in optimization methods.
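As a minimal sketch of what "matrix-free" means for $H = A \Theta A^T$, the product $Hv$ can be evaluated as $A(\Theta(A^T v))$ without ever assembling $H$. The function name `make_normal_operator`, the sizes, and the random data are illustrative assumptions; a dense NumPy array stands in for a sparse $A$.

```python
import numpy as np

def make_normal_operator(A, theta):
    """Return v -> A @ diag(theta) @ A.T @ v without ever forming H."""
    def apply_H(v):
        # Two products with A and one diagonal scaling; H is never stored.
        return A @ (theta * (A.T @ v))
    return apply_H

# Small dense stand-in for a sparse A (illustrative sizes and data).
rng = np.random.default_rng(0)
m, n = 4, 7
A = rng.standard_normal((m, n))
theta = rng.uniform(0.5, 2.0, size=n)   # diagonal of Theta, positive

apply_H = make_normal_operator(A, theta)
v = rng.standard_normal(m)
w = apply_H(v)

# H formed explicitly here only to check the sketch.
H_dense = A @ np.diag(theta) @ A.T
```

A CG-like solver only needs `apply_H`, so the memory footprint is that of $A$ and the diagonal of $\Theta$.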
Preconditioning $H$
Incomplete Cholesky (IC) factorizations are matrix-free in the sense
that the columns of $H$ can be computed one at a time and then
discarded. They are breakdown-free when $H$ is an H-matrix.
IC factorizations relying on drop tolerances to reduce fill-in have
unpredictable memory requirements.
Alternative approaches with predictable memory requirements depend
on the entries of $H$ [Jones, Plassmann, ACM Trans. Math. Software 1995], [Lin, Moré, SISC 1999].
E.g., let $n_k = \mathrm{nnz}(\mathrm{tril}(H(:,k),-1))$ and retain the $n_k + p$ largest
elements in the strict lower triangular part of the $k$th column of the
factor, for some fixed $p > 0$.
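The retention rule in the example above can be illustrated on a single column. This is only a sketch: the helper `retain_largest` and the sample vectors are hypothetical, and "largest" is taken here to mean largest in magnitude, which is an assumption about the rule.

```python
import numpy as np

def retain_largest(col_below_diag, n_keep):
    """Zero all but the n_keep largest-magnitude entries of a column segment."""
    col = np.asarray(col_below_diag, dtype=float)
    if n_keep >= np.count_nonzero(col):
        return col.copy()
    kept = np.zeros_like(col)
    idx = np.argsort(-np.abs(col), kind="stable")[:n_keep]
    kept[idx] = col[idx]
    return kept

# Strict lower part of column k of H (hypothetical values).
h_col = np.array([0.0, 3.0, 0.0, -1.0, 0.0])
# Same part of the factor column after fill-in (hypothetical values).
l_col = np.array([0.2, 2.5, -0.05, -0.9, 0.4])

p = 1
n_k = np.count_nonzero(h_col)                # nonzeros in the original column
sparse_col = retain_largest(l_col, n_k + p)  # memory per column is known in advance
```

Because the number of retained entries depends only on $n_k$ and $p$, the total storage of the factor is known before the factorization starts.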
Preconditioning $H$
Approximate inverse preconditioners form factorized sparse
approximations of $H^{-1}$.
The Stabilized Approximate Inverse preconditioner (SAINV) of
[Benzi, Cullum, Tuma, SISC 2000] is based on a modified Gram-Schmidt process.
It is matrix-free, i.e. it uses $H$ only through products with vectors and may work
entirely with $A^T$.
It preserves sparsity in the factors by dropping small elements.
In exact arithmetic, it is applicable to any s.p.d. matrix without breakdowns.
The underlying assumption is that most entries of $H^{-1}$ are small in magnitude.
Properties of our preconditioner
Limited memory: memory bounded by $O(m)$ rather than $O(\mathrm{nnz}(H))$.
Matrix-free: only the action of $H$ on a vector is needed.
Only a small number $k \ll m$ of general matrix-vector products is
required.
The diagonal of $H$, or an approximation of it, is needed: we expect that in many practical applications it can be computed or estimated
at low cost.
LMP Preconditioner
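The preconditioner
"Partial" Cholesky factorization limited to a small number $k$ of columns of $H$, plus a diagonal approximation of the Schur complement [Gondzio, COAP 2011].
1. Choose $k \ll m$.
Consider the formal partition of $H$
$$H = \begin{bmatrix} H_{11} & H_{21}^T \\ H_{21} & H_{22} \end{bmatrix}, \quad H_{11} \in \mathbb{R}^{k \times k},\ H_{21} \in \mathbb{R}^{(m-k) \times k},\ H_{22} \in \mathbb{R}^{(m-k) \times (m-k)}.$$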
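The preconditioner (continued)
3. Compute the Cholesky factorization
$\begin{bmatrix} L_{11} \\ L_{21} \end{bmatrix}$ of $H$ limited to $\begin{bmatrix} H_{11} \\ H_{21} \end{bmatrix}$.
Compute the $LDL^T$ factorization $H_{11} = L_{11} Q_{11} L_{11}^T$. (Discard $H_{11}$.)
Solve $L_{11} Q_{11} L_{21}^T = H_{21}^T$ for $L_{21}$, i.e. $L_{21} = H_{21} L_{11}^{-T} Q_{11}^{-1}$. (Discard $H_{21}$.)
It follows that
$$H = \begin{bmatrix} L_{11} & \\ L_{21} & I_{m-k} \end{bmatrix} \begin{bmatrix} Q_{11} & \\ & S \end{bmatrix} \begin{bmatrix} L_{11}^T & L_{21}^T \\ & I_{m-k} \end{bmatrix},$$
where $S = H_{22} - H_{21} H_{11}^{-1} H_{21}^T$ is the Schur complement of $H_{11}$ in $H$.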
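The preconditioner (continued)
4. Set
$$Q_{22} = \mathrm{diag}(S) = \mathrm{diag}(H_{22}) - \mathrm{diag}(L_{21} Q_{11} L_{21}^T)$$
and
$$P = \underbrace{\begin{bmatrix} L_{11} & \\ L_{21} & I_{m-k} \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} Q_{11} & \\ & Q_{22} \end{bmatrix}}_{Q} \underbrace{\begin{bmatrix} L_{11}^T & L_{21}^T \\ & I_{m-k} \end{bmatrix}}_{L^T}.$$
The algorithm for constructing $P$ has some good properties:
it cannot break down in exact arithmetic;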
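The construction steps (first $k$ columns, $LDL^T$ of $H_{11}$, $L_{21}$, $Q_{22}$) might be sketched as follows. This is a dense illustration under stated assumptions, not the authors' implementation: the names `build_lmp` and `unit_vector` are hypothetical, the $LDL^T$ factors are obtained from a dense Cholesky factor for brevity, and no reordering is applied.

```python
import numpy as np

def unit_vector(m, i):
    e = np.zeros(m)
    e[i] = 1.0
    return e

def build_lmp(apply_H, diag_H, k):
    """Return (L11, L21, q11, q22) from k products H e_i and the diagonal of H."""
    m = diag_H.size
    # First k columns of H, obtained as products H e_i (H itself is never formed).
    cols = np.column_stack([apply_H(unit_vector(m, i)) for i in range(k)])
    H11, H21 = cols[:k, :], cols[k:, :]
    # LDL^T of the small block via Cholesky: H11 = C C^T, L11 = C D^{-1}, Q11 = D^2.
    C = np.linalg.cholesky(H11)
    d = np.diag(C)
    L11 = C / d                  # unit lower triangular
    q11 = d ** 2                 # diagonal of Q11
    # L21 = H21 L11^{-T} Q11^{-1}
    L21 = np.linalg.solve(L11, H21.T).T / q11
    # Q22 = diag(H22) - diag(L21 Q11 L21^T), computed row by row
    q22 = diag_H[k:] - np.einsum("ij,j,ij->i", L21, q11, L21)
    return L11, L21, q11, q22

# Illustrative dense s.p.d. test matrix.
rng = np.random.default_rng(1)
m, k = 6, 2
B = rng.standard_normal((m, m + 3))
H = B @ B.T
L11, L21, q11, q22 = build_lmp(lambda v: H @ v, np.diag(H).copy(), k)

# Assemble P densely, only to inspect the result.
L = np.block([[L11, np.zeros((k, m - k))], [L21, np.eye(m - k)]])
P = L @ np.diag(np.concatenate([q11, q22])) @ L.T
```

In this sketch $P$ reproduces the first $k$ columns of $H$ and its whole diagonal; only $H_{22}$ away from the diagonal is approximated.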
Storage and computational cost
The complete diagonal of $H$ is required.
If it is not available and $H = A \Theta A^T$:
$(H)_{ii} = \|\Theta^{1/2} A^T e_i\|_2^2$, $i = 1, \dots, m$.
Storage: one (sparse) vector $A^T e_i$ at a time and a vector for the
diagonal of $H$.
The first $k$ columns of $H$ are computed and stored:
$H e_i$, $i = 1, \dots, k$.
The additional cost of this step is $k$ products of $H$ times a vector.
The products $H e_i$ are cheap if $H$ (or $A$) is sparse.
The $k$ products $H e_i$ are expected to be cheaper than the products $Hv$.
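A short sketch of these two ingredients when $H = A \Theta A^T$: the diagonal is accumulated one row of $A$ at a time (with the weights $\Theta$ included, as the form of $H$ requires), and the first $k$ columns are obtained as $A \Theta (A^T e_i)$. The function names and the dense random data are illustrative assumptions.

```python
import numpy as np

def diag_of_H(A, theta):
    """(H)_ii = || Theta^{1/2} A^T e_i ||_2^2, using one row of A at a time."""
    m = A.shape[0]
    d = np.empty(m)
    for i in range(m):
        row = A[i, :]                    # this is A^T e_i
        d[i] = np.dot(theta * row, row)
    return d

def first_k_columns(A, theta, k):
    """Columns H e_i = A Theta (A^T e_i), i = 1..k, stacked as an m-by-k array."""
    return np.column_stack([A @ (theta * A[i, :]) for i in range(k)])

rng = np.random.default_rng(7)
m, n, k = 5, 9, 2
A = rng.standard_normal((m, n))
theta = rng.uniform(0.5, 2.0, size=n)

d = diag_of_H(A, theta)
cols = first_k_columns(A, theta, k)

# H formed explicitly only to check the sketch.
H_dense = A @ np.diag(theta) @ A.T
```

With a sparse $A$, each row $A^T e_i$ has few nonzeros, which is why these products are inexpensive compared with a product involving a dense vector.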
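Factorized form of $P^{-1}$
From
$$P = \begin{bmatrix} L_{11} & \\ L_{21} & I_{m-k} \end{bmatrix} \begin{bmatrix} Q_{11} & \\ & Q_{22} \end{bmatrix} \begin{bmatrix} L_{11}^T & L_{21}^T \\ & I_{m-k} \end{bmatrix}$$
it follows that
$$P^{-1} = \begin{bmatrix} L_{11}^{-T} & -L_{11}^{-T} L_{21}^T \\ 0 & I_{m-k} \end{bmatrix} \begin{bmatrix} Q_{11}^{-1} & 0 \\ 0 & Q_{22}^{-1} \end{bmatrix} \begin{bmatrix} L_{11}^{-1} & 0 \\ -L_{21} L_{11}^{-1} & I_{m-k} \end{bmatrix},$$
i.e. a factorized sparse approximation of $H^{-1}$.
Letting
$$R = \begin{bmatrix} Q_{11}^{1/2} & \\ & Q_{22}^{1/2} \end{bmatrix} \begin{bmatrix} L_{11}^T & L_{21}^T \\ & I_{m-k} \end{bmatrix}$$
we have $P = R^T R$.
$P^{-1}H$ is similar to the block diagonal matrix
$$\begin{bmatrix} I_k & 0 \\ 0 & Q_{22}^{-1} S \end{bmatrix}.$$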
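The factorized form of $P^{-1}$ suggests applying it to a vector in three stages (forward block solve, diagonal scaling, backward block solve) without forming any inverse. The sketch below assumes dense blocks and a hypothetical helper `apply_P_inv`; general solves are used where triangular solves would be used in practice.

```python
import numpy as np

def apply_P_inv(L11, L21, q11, q22, r):
    """Compute z = P^{-1} r with P = L Q L^T in block form."""
    k = L11.shape[0]
    r1, r2 = r[:k], r[k:]
    # Stage 1: y = L^{-1} r
    y1 = np.linalg.solve(L11, r1)
    y2 = r2 - L21 @ y1
    # Stage 2: y = Q^{-1} y (Q is diagonal)
    y1 = y1 / q11
    y2 = y2 / q22
    # Stage 3: z = L^{-T} y
    z1 = np.linalg.solve(L11.T, y1 - L21.T @ y2)
    return np.concatenate([z1, y2])

# Hypothetical factors, chosen only to exercise the routine.
rng = np.random.default_rng(2)
m, k = 6, 2
L11 = np.eye(k) + np.tril(rng.standard_normal((k, k)), -1)
L21 = rng.standard_normal((m - k, k))
q11 = rng.uniform(1.0, 2.0, size=k)
q22 = rng.uniform(1.0, 2.0, size=m - k)

L = np.block([[L11, np.zeros((k, m - k))], [L21, np.eye(m - k)]])
P = L @ np.diag(np.concatenate([q11, q22])) @ L.T   # dense P, for checking only

r = rng.standard_normal(m)
z = apply_P_inv(L11, L21, q11, q22, r)
```

Each application involves two solves with the small $k \times k$ block $L_{11}$ and products with $L_{21}$ and $L_{21}^T$.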
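Spectral analysis of $P^{-1}H$
$k$ eigenvalues of $P^{-1}H$ are equal to 1.
The other eigenvalues are the eigenvalues of $Q_{22}^{-1}S$, and
$$\lambda(Q_{22}^{-1}S) \ge \frac{\lambda_{\min}(S)}{\lambda_{\max}(Q_{22})} \ge \frac{\lambda_{\min}(H)}{\lambda_{\max}(\mathrm{diag}(S))},$$
$$\lambda(Q_{22}^{-1}S) \le \frac{\lambda_{\max}(S)}{\lambda_{\min}(Q_{22})} \le \frac{\lambda_{\max}(H_{22})}{\lambda_{\min}(\mathrm{diag}(S))}.$$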
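These statements can be checked numerically on a small dense example. The sketch below is only an illustration on random data (an assumption); it builds $P$ by replacing the Schur complement $S$ with its diagonal and computes the spectrum of $P^{-1}H$ through an equivalent symmetric matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
m, k = 8, 3
B = rng.standard_normal((m, m + 4))
H = B @ B.T                                   # dense s.p.d. test matrix

H11, H21, H22 = H[:k, :k], H[k:, :k], H[k:, k:]
S = H22 - H21 @ np.linalg.solve(H11, H21.T)   # Schur complement
q22 = np.diag(S).copy()                       # Q22 = diag(S)

# P equals H with S replaced by its diagonal Q22 in the (2,2) block.
P = H.copy()
P[k:, k:] = H22 - S + np.diag(q22)

# Eigenvalues of P^{-1}H via the symmetric matrix C^{-1} H C^{-T}, with P = C C^T.
C = np.linalg.cholesky(P)
M = np.linalg.solve(C, np.linalg.solve(C, H).T)
eigs = np.linalg.eigvalsh((M + M.T) / 2)

n_unit = int(np.sum(np.isclose(eigs, 1.0)))   # eigenvalues equal to 1
lower = np.linalg.eigvalsh(H)[0] / q22.max()  # lambda_min(H) / lambda_max(diag(S))
upper = np.linalg.eigvalsh(H22)[-1] / q22.min()
```

In this example at least $k$ eigenvalues equal 1 up to rounding, and the whole spectrum lies in the interval `[lower, upper]`.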
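Reordering of $H$
A "greedy" heuristic technique acts on the largest eigenvalues of $H$.
Since $H$ is s.p.d., $\lambda_{\max}(H) \le \mathrm{tr}(H) = \mathrm{tr}(H_{11}) + \mathrm{tr}(H_{22})$.
If $Q_{22} = I$, then $P^{-1}H$ is similar to
$$\begin{bmatrix} I_k & 0 \\ 0 & S \end{bmatrix}$$
and
$$\lambda_{\max}(P^{-1}H) \le \mathrm{tr}\left(\begin{bmatrix} I_k & 0 \\ 0 & S \end{bmatrix}\right) = k + \mathrm{tr}(S).$$
Permuting rows and columns of $H$ so that $H_{11}$ contains the $k$ largest
elements of $\mathrm{diag}(H)$ would imply
$k + \mathrm{tr}(S) \ll \mathrm{tr}(H)$
and a large reduction in the value of $\lambda_{\max}(P^{-1}H)$ with respect to $\lambda_{\max}(H)$.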
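The greedy reordering only needs the diagonal of $H$. The sketch below sorts the diagonal, applies the symmetric permutation to a small dense matrix, and compares the bound $k + \mathrm{tr}(S)$ with and without reordering. The helper names and the test matrix (two rows scaled to have much larger norm) are illustrative assumptions.

```python
import numpy as np

def greedy_order(diag_H):
    """Permutation that lists the indices of diag(H) in decreasing order."""
    return np.argsort(-diag_H, kind="stable")

def trace_bound(H, k):
    """k + tr(S) for the leading k-by-k block of H (the Q22 = I case)."""
    H11, H21, H22 = H[:k, :k], H[k:, :k], H[k:, k:]
    S = H22 - H21 @ np.linalg.solve(H11, H21.T)
    return k + np.trace(S)

rng = np.random.default_rng(4)
m, k = 10, 2
B = rng.standard_normal((m, m + 5))
scale = np.ones(m)
scale[[3, 7]] = 30.0                       # two rows with much larger norm
Bs = B * scale[:, None]
H = Bs @ Bs.T

perm = greedy_order(np.diag(H))
H_perm = H[np.ix_(perm, perm)]             # symmetric permutation of H

bound_greedy = trace_bound(H_perm, k)      # H11 holds the k largest diagonal entries
bound_natural = trace_bound(H, k)          # no reordering
```

Here the two dominant diagonal entries are moved into $H_{11}$, so they no longer contribute to $\mathrm{tr}(S)$.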
Deflated CG
Handling small eigenvalues
Applying the greedy technique requires no extra storage.
In most cases, the "greedy" reordering takes care of the largest
eigenvalues of $H$, and $\kappa_2(R^{-T} H R^{-1})$ is reduced considerably with
respect to $\kappa_2(H)$.
On the other hand, the smallest eigenvalues of $H$ are slightly modified
or moved towards the origin.
When the convergence of a CG (or CG-like) method is hampered by a
small number of eigenvalues of $P^{-1}H$ close to zero, the
Preconditioned Deflated-CG (or CG-like) algorithm can be useful.
Preconditioned Deflated-CG
Let the eigenvalues of $P^{-1}H$ be labeled in increasing order:
$\lambda_1(P^{-1}H) \le \dots \le \lambda_m(P^{-1}H)$.
Ideal case: inject $l$ exact eigenvectors of $P^{-1}H$, associated with
$\lambda_1(P^{-1}H), \dots, \lambda_l(P^{-1}H)$, into the Krylov subspace. Then
$$\|x^* - x_j\|_H \le 2 \left( \frac{\sqrt{\mu} - 1}{\sqrt{\mu} + 1} \right)^j \|x^* - x_0\|_H, \qquad \mu = \frac{\lambda_m(P^{-1}H)}{\lambda_{l+1}(P^{-1}H)}.$$
Therefore, convergence of the CG method is improved if a few
eigenvalues are close to the origin and well separated from the others.
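To see the effect of the bound in numbers, the sketch below evaluates the contraction factor $(\sqrt{\mu}-1)/(\sqrt{\mu}+1)$ and the iteration count it predicts, with and without deflation. The eigenvalues used are hypothetical and chosen only for illustration; the function names are not from the source.

```python
import math

def contraction_factor(lam_max, lam_low):
    """(sqrt(mu) - 1) / (sqrt(mu) + 1) with mu = lam_max / lam_low."""
    mu = lam_max / lam_low
    return (math.sqrt(mu) - 1.0) / (math.sqrt(mu) + 1.0)

def iterations_from_bound(lam_max, lam_low, reduction):
    """Smallest j with 2 * factor**j <= reduction, according to the bound."""
    rho = contraction_factor(lam_max, lam_low)
    return math.ceil(math.log(reduction / 2.0) / math.log(rho))

# Hypothetical spectrum of P^{-1}H: a few tiny eigenvalues, the rest above 1e-2.
lam_1, lam_l_plus_1, lam_m = 1e-4, 1e-2, 10.0

# Without deflation the bound is governed by lam_m / lam_1.
its_plain = iterations_from_bound(lam_m, lam_1, 1e-6)
# With l exact eigenvectors deflated it is governed by lam_m / lam_{l+1}.
its_deflated = iterations_from_bound(lam_m, lam_l_plus_1, 1e-6)
```

Since the count grows roughly like $\sqrt{\mu}$, removing eigenvalues that are two orders of magnitude smaller than the rest reduces the predicted iterations by about one order of magnitude.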
If the $l$ eigenvectors of $P^{-1}H$ are numerically approximated, one can
expect
Preconditioned Deflated-CG (continued)
Apply Deflated-CG to the split-preconditioned system
$R^{-T} H R^{-1} y = R^{-T} b$, $x = R^{-1} y$,
using a few eigenvectors associated with the smallest eigenvalues of
$R^{-T} H R^{-1}$.
Symmetric Lanczos processes for sparse symmetric eigenvalue
problems require products of $R^{-T} H R^{-1}$ times a vector. Each product has the cost of one PCG iteration.
To amortize the cost of approximating eigenvectors, Preconditioned
Deflated-CG is suitable for solving systems with multiple right-hand sides.
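The operator that a Lanczos process needs, $y \mapsto R^{-T} H R^{-1} y$ with $P = R^T R$, can be sketched as below. A diagonal (Jacobi) matrix is used as a stand-in for $P$ purely to keep the example short; this is an assumption and not the LMP preconditioner, and the function name is hypothetical.

```python
import numpy as np

def split_preconditioned_operator(apply_H, R):
    """Return y -> R^{-T} H R^{-1} y, where P = R^T R."""
    def apply(y):
        w = np.linalg.solve(R, y)                # R^{-1} y
        return np.linalg.solve(R.T, apply_H(w))  # R^{-T} (H w)
    return apply

rng = np.random.default_rng(5)
m = 6
B = rng.standard_normal((m, m + 3))
H = B @ B.T
P = np.diag(np.diag(H))                  # stand-in preconditioner (Jacobi), not LMP
R = np.linalg.cholesky(P).T              # upper triangular factor with P = R^T R
op = split_preconditioned_operator(lambda v: H @ v, R)

# Dense matrix of the operator, built column by column for checking only.
M = np.column_stack([op(e) for e in np.eye(m)])
eigs_split = np.linalg.eigvalsh((M + M.T) / 2)
eigs_left = np.sort(np.linalg.eigvals(np.linalg.solve(P, H)).real)
```

The split operator is symmetric, which is what a symmetric Lanczos process requires, and it has the same eigenvalues as $P^{-1}H$.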
Numerical results
Numerical experiments
We implemented the preconditioner in Matlab, $\epsilon_m = 2 \cdot 10^{-16}$.
Initial guess for PCG: $x_0 = (0, \dots, 0)^T$.
Stopping criterion: $\|Hx_j - b\|_2 \le 10^{-6} \|b\|_2$.
A failure is declared after 1000 iterations.
$H = AA^T$, with 35 matrices $A$ from the University of Florida
Sparse Matrix Collection, groups LPnetlib and Meszaros, for Linear Programming problems.
$1090 \le m \le 105127$
$2.20 \cdot 10^{-5} \le \mathrm{dens}(A) \le 6.50 \cdot 10^{-3}$, $5.51 \cdot 10^{-5} \le \mathrm{dens}(H) \le 2.51 \cdot 10^{-1}$
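The solver settings listed above (zero initial guess, relative residual tolerance $10^{-6}$, failure after 1000 iterations) correspond to a PCG loop of the following shape. This is a generic textbook PCG written in Python rather than the Matlab code used in the experiments; the test matrix and the Jacobi preconditioner are stand-ins chosen for illustration.

```python
import numpy as np

def pcg(apply_H, apply_Pinv, b, tol=1e-6, maxit=1000):
    """Preconditioned CG with x0 = 0; stops when ||r||_2 <= tol * ||b||_2."""
    x = np.zeros_like(b)
    r = b.copy()                       # residual for x0 = 0
    z = apply_Pinv(r)
    p = z.copy()
    rz = r @ z
    bnorm = np.linalg.norm(b)
    for it in range(maxit):
        if np.linalg.norm(r) <= tol * bnorm:
            return x, it, True
        Hp = apply_H(p)
        alpha = rz / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        z = apply_Pinv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    # Iteration limit reached: report whether the last iterate happens to satisfy the test.
    return x, maxit, bool(np.linalg.norm(r) <= tol * bnorm)

rng = np.random.default_rng(6)
m = 30
B = rng.standard_normal((m, 2 * m))
H = B @ B.T
b = rng.standard_normal(m)
d = np.diag(H).copy()                  # Jacobi preconditioner as a stand-in for LMP

x, iters, converged = pcg(lambda v: H @ v, lambda r: r / d, b)
```

The function returns the iterate, the number of iterations performed, and a flag; a `False` flag corresponds to the failure case described above.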
Numerical experiments (continued)
Experiments with the SAINV preconditioner
$H^{-1} \approx Z D^{-1} Z^T$,
where $Z$ is unit upper triangular and $D$ is diagonal.
Code from the Sparselab package developed by M. Tuma.
First drop tolerance tested: $10^{-1}$.
Cost Comparison
Table: Cost of the construction and application of LMP and SAINV.
| Type | Construction | Application |
|---|---|---|
| LMP | $m$ sparse-to-sparse products $\Theta^{1/2}(A^T e_i)$ | 2 backsolves with $L_{11}$ |
| | $k$ sparse-to-sparse products $A\Theta(A^T e_i)$ | 1 mat-vec product with $D^{-1}$ |
| | $m-k$ backsolves with $L_{11}$ | $m-k$ scalar products in $\mathbb{R}^k$ |
| | $m-k$ scalar products in $\mathbb{R}^k$ | $k$ scalar products in $\mathbb{R}^{m-k}$ |
| SAINV | $m$ sparse-to-sparse products $A\Theta(A^T v)$ | 2 mat-vec products with $Z$ |
Comparison between LMP(50) and LMP(100)
LMP(100) outperforms LMP(50) in terms of PCG iterations.
[Figure: performance profile $\pi_s(\tau)$, execution time. Curves: LMP(50), LMP(100).]
Comparison between LMP(50) and SAINV
SAINV solved 21 systems. Performance profiles on the tests successfully solved by all preconditioners.
[Figure: two performance profiles $\pi_s(\tau)$, one for CG iterations and one for execution time. Curves: LMP(50), SAINV.]
Preconditioner density
[Figure: two panels with logarithmic vertical axis from $10^{-4}$ to $10^{0}$. First panel: density of $H$ and of the factors $L$ and $Z$ (curves: $L$, $Z$, $H$). Second panel: density of the factors $L$ and $L^{-1}$.]
Experiments with Preconditioned Deflated-CG
A few eigenvectors of $R^{-T} H R^{-1}$ are computed by the Matlab
package PROPACK [R.M. Larsen, 1998].
The symmetric Lanczos algorithm with partial reorthogonalization is applied.
A loose accuracy for the convergence criterion, $10^{-1}$, is fixed, along
with a specified maximum dimension, DIM_L, of the Lanczos basis.
The number of matrix-vector products is at most DIM_L.
In Preconditioned Deflated-CG we injected the estimated eigenvectors.
If convergence was not achieved, the vectors associated with eigenvalues smaller than a prescribed tolerance are selected.
Solution of a single system
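| Test name | $\lambda_{\max}(H)$ | $\lambda_{\min}(H)$ | $\lambda_{\max}(P^{-1}H)$ | $\lambda_{\min}(P^{-1}H)$ | IT_L, Prec. Defl-CG | IT_L, Prec. CG |
|---|---|---|---|---|---|---|
| lp_d2q06c | 1.27e6 | 6.37e-4 | 6.48e0 | 3.39e-5 | 278 | 338 |
| lp_pilot | 1.10e5 | 1.55e-2 | 1.22e1 | 2.58e-4 | 160 | 264 |
| lp_pilot87 | 1.01e6 | 1.52e-2 | 2.22e1 | 2.01e-4 | 250 | 294 |
| lp_stocfor2 | 1.60e6 | 1.98e-3 | 7.71e0 | 1.17e-6 | 97 | 144 |
| lpi_bgindy | 8.97e3 | 4.07e-2 | 5.55e0 | 8.29e-3 | 38 | 53 |
| ge | 1.89e8 | 4.90e-5 | 1.21e1 | 8.78e-7 | 41 | 58 |
| nl | 8.26e4 | 7.00e-3 | 7.30e0 | 1.61e-4 | 388 | 441 |
| scrs8-2c | 1.85e3 | 3.49e-5 | 5.39e1 | 8.32e-5 | 102 | 140 |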
Preconditioner formed with k = 50
Number of small eigenvalues estimated: 5
Sequences of normal equations from least-squares problems
Sequences of normal equations arise in the solution of constrained and unconstrained least-squares problems. If the coefficient matrices
vary slowly, a preconditioner freeze strategy for LMP coupled with
Deflated-CGLS can be used.
We solved Nonnegative Linear Least-Squares (NNLS) problems
$$\min_{x \ge 0} \tfrac{1}{2} \|Bx - d\|_2^2,$$
with $B$ of full rank, by the interior Newton-like method of [Bellavia, Macconi,
Morini, NLAA 2006].
The trial step at the $j$th nonlinear iteration solves
$$\min_{p \in \mathbb{R}^n} \left\| \begin{pmatrix} B S_j \\ W_j \end{pmatrix} p + \begin{pmatrix} B x_j - d \\ 0 \end{pmatrix} \right\|_2^2.$$
LMP in NNLS
The matrix of the normal equations is
$$H_j = A_j A_j^T, \qquad A_j = \begin{pmatrix} S_j B^T & W_j \end{pmatrix}, \qquad j = 0, 1, \dots$$
where $S_j$ and $W_j$ are matrices with entries in $(0,1]$ and $[0,1]$,
respectively.
We solve the sequence of linear systems with a frozen preconditioner.
For a seed matrix, say $H_0$, we form the LMP preconditioner and
compute $l$ approximate eigenvectors associated with the smallest
eigenvalues.
We reuse the preconditioner and the eigenvectors throughout the nonlinear iterations until the preconditioner deteriorates, i.e. the limit on CGLS iterations is reached.
Then the LMP preconditioner and the $l$ eigenvectors are refreshed for the current matrix.
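The freeze-and-refresh logic can be sketched independently of the particular preconditioner. In the sketch below, all names are hypothetical, deflation is omitted, plain PCG replaces CGLS, and an exact dense Cholesky factorization of the seed matrix is used as a stand-in for LMP only to keep the example short; the matrices $H_j = A \Theta_j A^T$ with slowly varying $\Theta_j$ are illustrative data.

```python
import numpy as np

def pcg(apply_H, apply_Pinv, b, tol=1e-6, maxit=1000):
    """Preconditioned CG with x0 = 0 and a relative residual test."""
    x, r = np.zeros_like(b), b.copy()
    z = apply_Pinv(r)
    p, rz, bnorm = z.copy(), r @ z, np.linalg.norm(b)
    for it in range(maxit):
        if np.linalg.norm(r) <= tol * bnorm:
            return x, it, True
        Hp = apply_H(p)
        alpha = rz / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        z = apply_Pinv(r)
        rz_new = r @ z
        p, rz = z + (rz_new / rz) * p, rz_new
    return x, maxit, bool(np.linalg.norm(r) <= tol * bnorm)

def build_prec(H):
    """Stand-in preconditioner: exact Cholesky solve (LMP would take its place)."""
    C = np.linalg.cholesky(H)
    return lambda r: np.linalg.solve(C.T, np.linalg.solve(C, r))

def solve_sequence(matrices, rhs, tol=1e-6, maxit=6):
    apply_Pinv = build_prec(matrices[0])        # preconditioner from the seed matrix H_0
    refreshes, solutions = 0, []
    for H, b in zip(matrices, rhs):
        x, _, ok = pcg(lambda v: H @ v, apply_Pinv, b, tol, maxit)
        if not ok:                              # iteration limit reached: refresh
            apply_Pinv = build_prec(H)
            refreshes += 1
            x, _, ok = pcg(lambda v: H @ v, apply_Pinv, b, tol, maxit)
        solutions.append(x)
    return solutions, refreshes

rng = np.random.default_rng(8)
m, n = 20, 40
A = rng.standard_normal((m, n))
u = rng.uniform(0.0, 1.0, size=n)
matrices = [A @ np.diag(1.0 + 0.3 * j * u) @ A.T for j in range(6)]
rhs = [rng.standard_normal(m) for _ in range(6)]

solutions, refreshes = solve_sequence(matrices, rhs)
```

The frozen preconditioner is kept as long as the solver converges within the iteration limit; the refresh is triggered only when that limit is hit, as in the strategy described above.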
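LMP(100), 5 small eigenvalues estimated, Lanczos basis dimension: 50
| Test | Prec. Defl-CGLS: IT_NL(R) | Prec. Defl-CGLS: IT_L | Prec. CGLS: IT_NL(R) | Prec. CGLS: IT_L | Savings in mat-vec prod. |
|---|---|---|---|---|---|
| lp_pilot87 | 27(1) | 3639 | 30(1) | 6023 | 36% |
| lp_ken_11 | 14 | 512 | 19 | 720 | 12% |
| lp_ken_13 | 14 | 485 | 19 | 881 | 31% |
| lp_ken_18 | 24 | 1937 | 18 | 2449 | 14% |
| lp_pds_10 | 11 | 607 | 11 | 834 | 15% |
| lp_pds_20 | 13 | 1629 | 13 | 1877 | 9% |
| lp_truss | 13 | 512 | 14 | 951 | 34% |
| deter3 | 23 | 1441 | 28 | 1910 | 16% |
| deter5 | 13 | 844 | 26 | 1939 | 51% |
| deter7 | 18 | 1242 | 21 | 2050 | 33% |
| fxm2-16 | 33(3) | 8686 | 47(2) | 10771 | 17% |
| ge | 35(3) | 8425 | 34(3) | 10021 | 13% |
| nl | 28(5) | 7376 | 32(6) | 10891 | 30% |
| scrs8-2c | 17 | 163 | * | | |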
Final comments
Work in progress: we are using the LMP preconditioner in the solution of linear systems
arising in electrostatic and electromagnetic problems, in cooperation
with A. Tamburrino and S. Ventre, University of Cassino.
The matrix $H$ is s.p.d. and can be decomposed as
$H = H_{far} + H_{near}$, where
- $H_{near}$ is available and includes the diagonal of $H$;
- $H_{far}$ is not available, but the action of $H_{far}$ on a vector can be computed (approximately).
S. Bellavia, J. Gondzio, B. Morini, A matrix-free preconditioner for sparse symmetric positive definite systems and least-squares problems, SISC, in press.
J. Gondzio, Interior point methods 25 years later, EJOR (2012).