

5.3 Exponential Integrators with only Matrix-Vector Multiplication

5.3.1 The Scaling and Squaring method for computing $e^X$

The Padé and Taylor series approximations for the matrix exponential $e^X$ have computational cost and roundoff errors that increase with the norm $\|X\|$ and the spread of its eigenvalues [118, 120]. However, for small norm $\|X\|$ or small spectral radius $\rho(X)$, which is the largest absolute value of the eigenvalues of $X$, these are two of the most effective approximations both in accuracy and computational cost. Based on the identity $e^X = (e^{X/\tau})^{\tau}$, a matrix is first scaled so that its norm or spectral radius is small, and the approximation of the scaled exponential is then raised to the power $\tau$. The scaling and squaring method of [119, 120] uses a scale $\tau = 2^s$, with some non-negative integer $s$, so that the Padé approximation of $e^{X/2^s}$ can be raised to the power $2^s$ via $s$ successive squarings.
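The identity $e^X = (e^{X/2^s})^{2^s}$ can be illustrated with a minimal sketch. It is not the method of [119, 120]: a truncated Taylor series stands in for the Padé approximant, the scaling rule (1-norm of the scaled matrix at most 1) is a simplifying assumption, and the helper names `matmul`, `one_norm`, `taylor_exp` and `expm_scaling_squaring` are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def one_norm(a):
    """Matrix 1-norm: the largest absolute column sum."""
    n = len(a)
    return max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))

def taylor_exp(a, terms=18):
    """Truncated Taylor series I + A + A^2/2! + ... (stand-in for the Pade step)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term holds A^k / k!, built from the previous term.
        term = [[v / k for v in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def expm_scaling_squaring(x, terms=18):
    """Approximate e^X using e^X = (e^{X / 2^s})^{2^s}."""
    norm = one_norm(x)
    # Assumed scaling rule: pick s so that the scaled 1-norm is at most 1.
    s = max(math.ceil(math.log2(norm)), 0) if norm > 0 else 0
    scaled = [[v / 2 ** s for v in row] for row in x]
    e = taylor_exp(scaled, terms)
    for _ in range(s):  # s successive squarings undo the scaling
        e = matmul(e, e)
    return e

# For this skew-symmetric X, e^X is a rotation by 4 radians.
x = [[0.0, 4.0], [-4.0, 0.0]]
e = expm_scaling_squaring(x)
```

Here `e[0][0]` and `e[0][1]` can be compared against `math.cos(4.0)` and `math.sin(4.0)` to check the result.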

The parameter $s$ is chosen so that $2^{-s}X$ has a norm of order 1. Then, a diagonal Padé approximation of appropriate order $l$, i.e. $R_l(2^{-s}X) = D_l(2^{-s}X)^{-1} N_l(2^{-s}X)$, is chosen to guarantee specified backward error bounds on the exponential; see Appendix C for explicit expressions of $D_l(\cdot)$ and $N_l(\cdot)$. For an appropriately scaled matrix $X$, it is shown [120, Thm. 2.1] that, with $R_l(2^{-s}X)^{2^s} = e^{X+\Delta X}$, the backward error $\Delta X$ in the scaling and squaring computation can be bounded by

$$\frac{\|\Delta X\|}{\|X\|} \le \frac{-\log(1 - f(\theta))}{\theta}, \tag{5.17}$$

where $\theta = \|2^{-s}X\|$, $f(\theta) := \sum_{i=2l+1}^{\infty} |c_i| \theta^i$, the $c_i$ are known exactly from the Padé expansion, and the scaling is such that the power series converges [119, (2.2) and (2.6)].

The backward error for algorithms is defined as follows [121, Sec. 1.5]. Assume an algorithm computes the approximation $\hat{y}$ for $y = f(x)$, where $f(\cdot)$ is some function. Then the backward error is defined as

$$\min\{\|\Delta x\| : \hat{y} = f(x + \Delta x)\}.$$

In other words, it is the smallest perturbation in the data $x$ whose exact solution is the same as the computed solution $\hat{y}$. The forward error is the error in the computation, $y - \hat{y}$.
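The two error notions can be compared on a scalar toy problem. The sketch below assumes $f = \exp$ and a hypothetical "algorithm" that evaluates a degree-4 Taylor polynomial; because $\exp$ is one-to-one, the minimizing perturbation is simply $\Delta x = \log(\hat{y}) - x$. The function names are illustrative only.

```python
import math

def forward_error(y_exact, y_hat):
    """Forward error: distance between the exact and computed results."""
    return abs(y_exact - y_hat)

def backward_error_exp(x, y_hat):
    """Backward error for f = exp: the |dx| with y_hat = exp(x + dx).

    exp is one-to-one, so dx = log(y_hat) - x is the only candidate.
    """
    return abs(math.log(y_hat) - x)

x = 1.0
# Hypothetical algorithm: degree-4 Taylor polynomial of exp at x.
y_hat = sum(x ** k / math.factorial(k) for k in range(5))

fwd = forward_error(math.exp(x), y_hat)
bwd = backward_error_exp(x, y_hat)
```

For this input the forward error is roughly $e^{x}$ times the backward error, which is the usual first-order relation between the two through the derivative of $f$.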

Therefore, if we require a given accuracy $\|\Delta X\|/\|X\| \le \mathrm{tol}$ (for example, double precision level accuracy $\mathrm{tol} = 2^{-53}$), upper bounds on $\theta$ for a range of orders $l$ can be computed offline and stored as $\theta_l$ in algorithm design.

Generating $\theta_l$ for arbitrary tolerances would require symbolic computation to get reliable values. The authors of [113] generated them symbolically, applying a zero-finder to the polynomial equation $\sum_{k=l+1}^{150} |c_k| \theta^{k-1} = \mathrm{tol}$ in 250 decimal place arithmetic, with tol equal to the unit roundoff level (or eps) of IEEE half, single, double and quadruple precision floating point arithmetic. Since symbolic computations are very expensive, we have found it impractical to generate these parameters for arbitrary tolerances. We have therefore opted to use the values generated by the authors of [113].

In an optimal algorithm implementation, an important element in choosing s and l for a given error bound tol is to use a combination that minimizes the cost of computation.


Let $C(l, s)$ represent the cost in number of matrix-matrix multiplications. Then, it can be shown that $C(l, s) = \pi_l + s$, where $\pi_l$ ($\approx l/2$) represents the number of matrix-matrix multiplications required to evaluate $D_l(X)$ and $N_l(X)$ such that $D_l(2^{-s}X) R_l(2^{-s}X) = N_l(2^{-s}X)$. The number of squarings needed is shown to be $s = \max\{\lceil \log_2(\|X\|/\theta_l) \rceil, 0\}$, where $\theta_l$ are the maximal values that $\|2^{-s}X\|$ can take such that the backward error bound $\|\Delta X\|/\|X\| < \mathrm{tol} = 2^{-53}$ is still guaranteed.
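The rule for the number of squarings and the cost model $C(l, s) = \pi_l + s$ can be sketched as follows. The constants `THETA_13` and `PI_13` are illustrative placeholders for the degree-13, double precision case and should be replaced by the tabulated values in [120]; the function names are hypothetical.

```python
import math

def one_norm(x):
    """Matrix 1-norm: the largest absolute column sum."""
    n = len(x)
    return max(sum(abs(x[i][j]) for i in range(n)) for j in range(n))

def num_squarings(x, theta_l):
    """s = max(ceil(log2(||X||_1 / theta_l)), 0)."""
    norm = one_norm(x)
    if norm == 0.0:
        return 0  # the zero matrix needs no scaling
    return max(math.ceil(math.log2(norm / theta_l)), 0)

def cost(pi_l, s):
    """C(l, s) = pi_l + s, counted in matrix-matrix multiplications."""
    return pi_l + s

THETA_13 = 5.37  # assumed (rounded) bound on ||2^-s X||_1 for l = 13
PI_13 = 6        # assumed multiplication count for D_13 and N_13

x = [[-40.0, 10.0], [30.0, -20.0]]
s = num_squarings(x, THETA_13)
total = cost(PI_13, s)
scaled_norm = one_norm(x) / 2 ** s  # should not exceed THETA_13
```

With these placeholder constants, $s$ is the smallest non-negative integer for which the scaled 1-norm falls below the stored bound, which can be checked by comparing `scaled_norm` with `THETA_13`.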

It is also shown in [120] that $l = 13$ is the optimal degree that achieves the least number of multiplications for a backward error bounded by the IEEE double precision unit roundoff; $l = 7$ and $l = 17$ are also shown to be optimal for the IEEE single ($\mathrm{tol} = 2^{-24} \approx 6.0 \times 10^{-8}$) and quadruple precision arithmetic eps ($\mathrm{tol} = 2^{-105} \approx 2.5 \times 10^{-32}$), respectively. The version of the scaling and squaring method with this latest error analysis is implemented as the function expm in Matlab versions 2010a and newer. Although any matrix $p$-norm, $p \in \{1, 2, \infty\}$, will work in the above analysis, the 1-norm is used for its relative computational ease ($\|X\|_1 := \max_{1 \le j \le n} \sum_{i=1}^{n} |x_{ij}|$).

Of course, there is also a fixed cost associated with solving the linear equation $D_l(\cdot) R_l(\cdot) = N_l(\cdot)$, where $D_l(\cdot)$ and $N_l(\cdot)$ are evaluated as given in [120, Eq. 3.4-3.6]; $D_l$ is also shown to be non-singular under some mild conditions (e.g. in the necessary case of $\|2^{-s}X\| \le \theta_l$).

However, exponential integrators require not the matrix exponential, but its action on a vector. Although the improved scaling and squaring method of [120] is presented as possibly the best choice to compute the matrix exponential, it does not address the need to compute the action on a vector efficiently. Figure 5.1a shows the sparsity of the dynamics matrix from a model of the metal slab cooling bilinear control system in Section 5.5. Considering a Padé approximation of order $l = 13$, we can see from Figure 5.1b that the matrices in the linear system that the approximation needs to solve, i.e. $D_l(2^{-s}X) R_l(2^{-s}X) = N_l(2^{-s}X)$, are full. Moreover, the solution $R_l(2^{-s}X)$ ends up 100% full. In other words, even for the sparsest problems coming from finite difference models of PDEs, the matrix exponential ends up being full and its computational cost is $O(n^3)$. This is undesirable for large systems. Where the new modified scaling and squaring method has to be used (for example, to find the exponential of the Hessenberg matrix within Krylov subspace methods), we refer the reader to [123] for a discussion on efficient computation of the Padé approximations and the corresponding computational cost analysis. Below we consider alternative methods that exploit sparsity in the problem.
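The fill-in can be reproduced on a small stand-in problem. The sketch below does not use the metal slab model of Section 5.5; it assumes a 1-D finite-difference stencil $(-1, 2, -1)$ and only tracks the density of the powers $X, X^2, \dots, X^6$, which are the building blocks of the polynomials $D_l$ and $N_l$. All names are hypothetical.

```python
def tridiagonal(n):
    """Dense storage of the tridiagonal stencil (-1, 2, -1)."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def density(a):
    """Fraction of entries that are non-zero."""
    n = len(a)
    nnz = sum(1 for row in a for v in row if v != 0.0)
    return nnz / (n * n)

n = 40
x = tridiagonal(n)
power = x
densities = [density(power)]        # density of X
for _ in range(5):                  # densities of X^2, ..., X^6
    power = matmul(power, x)
    densities.append(density(power))
```

Each multiplication widens the band by one diagonal on either side, so the density grows with the degree of the polynomial even before any linear system is solved; the inverse applied in the Padé step removes whatever band structure remains.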


(a) $2^{-s}X$ has 4.6% non-zero elements

Figure 5.1: The sparsity pattern for the sparse dynamic matrix from a bilinear PDE and the sparsity of the matrix $D_l(2^{-s}X)$, $l = 13$, solved by the Padé approximation.

5.3.2 An efficient scheme by Al-Mohy and Higham [113] for computing $e^X y$