Parallel preconditioning for time-dependent PDE problems

(1)

Parallel preconditioning for

time-dependent PDE problems

Andy Wathen

(2)

Krylov subspace methods

For self-adjoint problems/symmetric matrices, iterative methods of choice exist: conjugate gradients for SPD, MINRES otherwise

but many possible methods for non-self-adjoint

problems/nonsymmetric matrices: GMRES , BICGSTAB , QMR , IDR , . . .

(3)

Krylov subspace methods

For self-adjoint problems/symmetric matrices, iterative methods of choice exist: conjugate gradients for SPD, MINRES otherwise

but many possible methods for non-self-adjoint

problems/nonsymmetric matrices: GMRES , BICGSTAB , QMR , IDR , . . .

For almost all need preconditioning Preconditioner P _{such that}

(4)

Krylov subspace methods

Arises because of convergence guarantees:

• for symmetric matrices: descriptive convergence

bounds based on eigenvalues ⇒ a priori estimates of iterations for acceptable convergence; good

preconditioning ensures fast convergence.

• for nonsymmetric matrices: by contrast, to date there are no generally applicable and descriptive

convergence bounds even for GMRES ; for any of the other nonsymmetric methods without a minimisation

property, convergence theory is extremely limited ⇒ no good a priori way to identify what are the desired

qualities of a preconditioner

(5)

Nonsymmetric problems

Lu = f

where

hLu, vi 6= hu, Lvi h·, ·i is any inner product

Classic examples:

• convection-diffusion (or most odd/even derivative problems)

(6)

Time-dependent problems: ODEs

y′ = ay + f, y(t0) = y0 discretise: e.g. yk+1 − yk τ = θay k+1 _{+ (1 − θ)ay}k _{+ f}k_, _y0 _{= y} 0, k = 0, 1, . . . , ℓ with ℓτ = T gives B       y1 y2 y3 .. . yℓ       | {z } y =       τ f1 + (1 + a(1 − θ)τ )y0 τ f2 τ f3 .. . τ fℓ       | {z } f ,

(7)

where the ℓ × ℓ coefficient matrix B is       b c b c b . . . ... c b       , b = 1 − aθτ, c = −1 − a(1 − θ)τ. i.e. B is a bidiagonal Toeplitz matrix.

(8)

Now use Pestana & W, 2015:

If B _{is a real Toeplitz matrix then}

     a₀ a₋₁ · · a_1−n a₁ a₀ a₋₁ · · · a₁ a₀ · · · · · · a₋₁ a_n−1 · · a₁ a₀      | {z } B      0 0 · 0 1 0 · 0 1 0 · 0 1 0 · · · · · · 1 0 · 0 0      | {z } Y

is the real symmetric (Hankel) matrix      a_1−n · · a₋₁ a₀ · · a₋₁ a₀ a₁ · · a₀ a₁ · a₋₁ · · · · a₀ a₁ · · a_n−1     

(9)

Thus MINRES can be robustly applied to BY _{— it is}

symmetric but generally indefinite — and its convergence will depend only on eigenvalues.

BUT preconditioning? – needs to be symmetric and positive definite for MINRES

Based on well-know Circulant approximations, C_{, which are}

diagonalised by an FFT in O(n log n) work: C _{= F}⋆ΛF_{, use}

|C| = F⋆|Λ|F

(10)

Theorem (Pestana & W, 2015)

|C|−1BY _{= J + R + E}

where J _{is real symmetric and orthogonal with eigenvalues} ±1, R _{is of small rank and} E _{is of small norm}

⇒ guaranteed fast convergence because MINRES convergence only depends on eigenvalues which are clustered around ±1 except for few outliers!

(11)

Theorem (Pestana & W, 2015)

|C|−1BY _{= J + R + E}

where J _{is real symmetric and orthogonal with eigenvalues} ±1, R _{is of small rank and} E _{is of small norm}

⇒ guaranteed fast convergence because MINRES convergence only depends on eigenvalues which are clustered around ±1 except for few outliers!

For the ODE problem (τ = 0.2, a = −0.3,θ = 0.8):

ℓ κ(B) Iterations

(12)

Multistep method: BDF2

yk+1 − 4₃yk + 1₃yk−1 τ = 2 3ay k+1 ₊ 2 3f k+1 ,

with y0 = y0 and y−1 = y₋₁ leads to the monolithic or

all-at-once system B       y1 y2 y3 .. . yℓ       | {z } y =        2 3τ f 1 ₊ 4 3y 0 ₋ 1 3y−1 2 3τ f 2 ₋ 1 3y 0 2 3τ f 3 .. . 2 3τ f ℓ        | {z } f

(13)

where the coefficient matrix B is        1 − 2₃aτ −4₃ 1 − 2₃aτ 1 3 − 4 3 1 − 2 3aτ . . . . 1 3 − 4 3 1 − 2 3aτ        . Same approach: ℓ κ(B) Iterations

(14)

Thus we have proved and observed fast convergence for

|C|−1BY_{y = |C|}−1_{f .}

To observe if it is more generally

Krylov friendly

(Tim Kelley),

we try GMRES simply for

C−1B_{y = C}−1_f

(15)

PDEs: diffusion problem

u_t = ∆u + f in Ω × (0, T ], Ω ⊂ _R2 _{or R}3,

u = g on ∂Ω,

u(x, 0) = u0(x) at t = 0

Discretize - finite elements, mesh size h, and n spatial dofs:

M uk − uk−1

τ + Kuk = fk, k = 1, . . . , ℓ,

(M: mass matrix, K: stiffness matrix) or

A_BEx :=     A₀ A₁ A₀ . . . ...         u1 u2 .. .     =     Mu0 + τ f1 τf2 .. .     ,

(16)

We use the block circulant preconditioner PBE :=     A₀ A₁ A₁ A₀ . . . ... A₁ A₀     .

Theorem (McDonald, Pestana & W, 2018)

P_BE−1 ABE is diagonalisable, has (ℓ − 1)n eigenvalues of 1

(17)

0 200 400 600 800 Real( λ ) 0.8 1 1.2 1.4 1.6 0 200 400 600 800 Imag( λ ) ×10-13 -4 -2 0 2 4

(18)

Kronecker Product form

If Σ =     0 1 0 . . . ... 1 0     , C =     0 1 1 0 . . . ... 1 0     = U ΛU ∗_, then ABE = Iℓ ⊗ A0 + Σ ⊗ A1, PBE = Iℓ ⊗ A0 + C ⊗ A1,

and using (W ⊗ X)(Y ⊗ Z) = (W Y ⊗ XZ)

PBE = Iℓ⊗A0+C⊗A1 = (U ⊗In)[Iℓ⊗A0+Λ⊗A1](U∗⊗In)

so

(19)

P_BE−1 = (U ⊗ In)[Iℓ ⊗ A0 + Λ ⊗ A1]−1(U∗ ⊗ In)

so P_BE−1 r requires

• multiplication of r by U ⊗ In and U∗ ⊗ In; parallel over

n processors?

• inversion of the block diagonal matrix I_ℓ ⊗ A0 + Λ ⊗ A1:

easier if A_i are symmetric as for diffusion, but could use AMG when A_i are non symmetric.

(20)

Heat Equation

Times (seconds) for solving P−1_{AU = P}−1_b _{with GMRES}

(tol = 10−5_). _p _{is the number of processors.}

ℓ n p = 1 p = 2 p = 4 p = 8 p = 16 p = 32 320 _77.72 _29.26 _15.32 _8.95 _5.11 _3.34 768 512 _152.64 _57.54 _32.71 _17.52 _11.54 _6.69 768 _245.47 _97.77 _50.81 _30.71 _16.66 _9.65 320 _146.67 _54.68 _28.40 _17.059 _10.35 _6.07 1024 512 _265.22 _107.07 _60.86 _34.13 _20.40 _11.75 768 _459.12 _198.94 _101.23 _55.85 _28.55 _16.12 320 _325.14 _124.67 _63.64 _39.78 _22.74 _13.06 1440 512 _646.81 _239.65 _123.44 _72.44 _40.95 _22.50 768 _979.85 _432.46 _215.77 _114.99 _59.80 _32.41 1440 1568 _2119.91 _815.93 _431.13 _218.24 _118.62 _63.30

(21)

Wave Equation

Times (seconds) for solving R−1_BD2CBD2U = R−1_BD2bBD2

with GMRES (tol = 10−5_). _p _{is the number of processors.}

ℓ n p = 1 p = 2 p = 4 p = 8 p = 16 p = 32 320 _79.07 _31.29 _16.20 _9.53 _6.11 _4.36 768 512 _163.68 _61.33 _34.33 _19.76 _11.54 _7.09 768 _251.37 _100.99 _53.14 _27.31 _14.64 _11.09 320 _153.37 _53.39 _30.47 _20.99 _12.38 _7.39 1024 512 _287.23 _119.72 _65.84 _39.81 _23.23 _12.08 768 _497.12 _222.93 _115.24 _60.84 _32.46 _18.01 320 _328.17 _125.71 _65.64 _41.70 _23.32 _14.21 1440 512 _680.15 _243.53 _124.65 _73.92 _41.42 _24.51

(22)

In a similar manner for multistep methods: A :=          A₀ A₁ A₀ .. . . . . ... A_p . . . ... . . . A₁ A₀ A_p · · · A1 A0          , P = (U ⊗ In)G(U∗ ⊗ In),

(23)

Can use Y :=    I_n · I_n I_n    = Y ⊗In, Y =   1 · 1   as before

to symmetrize any block Toeplitz matrix with symmetric blocks and use a SPD absolute value preconditioner as before.

When A_i are not symmetric (e.g. convection-diffusion problems) GMRES/FGMRES are necessary

(24)

Theory: MINRES for |P|−1_YA _{guaranteed to converge in a}

number of iterations independent of ℓ

Practice:

• very few MINRES iterations required

• GMRES with P does better, but no guarantee!

(25)

Numerics: Heat Eqn, Backwards Euler

n ℓ _DoF _GMRES _P−1A MINRES |P|−1YA _FGMRES P−1

MGA 289 ₂4 4624 3 11 8 26 18496 3 13 8 28 73984 3 15 8 210 295936 3 19 8 212 1183744 3 18 7 214 4734976 3 16 7 1089 ₂4 17424 3 10 8 26 69696 3 13 8 28 278784 3 14 8 210 1115136 3 18 8 212 4460544 3 20 7 214 17842176 3 19 6 4225 ₂4 67600 3 10 15 26 270400 3 11 16

(26)

Numerics: Heat Eqn, BDF2

n ℓ _DoF _GMRES _P−1A MINRES |P|−1YA _FGMRES P−1

MGA 289 ₂4 4624 3 13 7 26 18496 3 16 8 28 73984 3 19 8 210 295936 3 21 7 212 1183744 3 24 7 214 4734976 3 22 6 1089 ₂4 17424 3 13 8 26 69696 3 15 8 28 278784 3 18 8 210 1115136 3 22 7 212 4460544 3 24 7 214 17842176 3 25 6 4225 ₂4 67600 3 11 15 26 270400 3 13 16 28 1081600 3 18 16 210 4326400 3 21 17 212 17305600 3 24 17 214 69222400 3 25 16

(27)

Numerics: Convection-diffusion, BE

n ℓ _DoF _GMRES _P−1A FGMRES P−1

MGA 289 ₂4 4624 13 12 26 18496 13 12 28 73984 13 12 210 295936 13 12 212 1183744 13 12 214 4734976 13 12 1089 ₂4 17424 12 12 26 69696 13 12 28 278784 13 12 210 1115136 13 12 212 4460544 13 12 214 17842176 13 12 4225 ₂4 67600 12 22 26 270400 12 22

(28)

References and Acknowledgement

McDonald, E, Pestana, J. & Wathen, A.J., 2018,

‘Preconditioning and iterative solution of all-at-once systems for evolutionary partial differential equations’,

accepted for publication in SIAM J. Sci. Comput.

Goddard, A & Wathen, A.J., ‘A note on parallel preconditioning for all-at-once evolutionary PDEs’,

submitted to ETNA, 2018.

Pestana, J. & Wathen, A.J., 2015,

‘A preconditioned MINRES method for nonsymmetric Toeplitz matrices’, SIAM J. Matrix Anal. Appl. 36, pp. 273–288.

Wathen, A.J., 2015, ‘Preconditioning’,

Acta Numerica (2015) 24, pp. 329–376.