Given $n \in \mathbb{N}$, $A \in \mathbb{R}^{n\times n}$, and $B \in \mathbb{R}^{n\times m}$, we consider the following $n \times n$ Lyapunov equation
$$AX + XA^T = -BB^T. \tag{2.17}$$
In vectorized form, (2.17) is equivalent to the linear system
$$\mathcal{A}\,\mathrm{vec}(X) = (I \otimes A + A \otimes I)\,\mathrm{vec}(X) = -(B \otimes B)\,\mathrm{vec}(I_m), \tag{2.18}$$
where $\mathcal{A} = I \otimes A + A \otimes I$. Using the spectral properties of the Kronecker sum, we obtain that (2.17) has a unique solution if and only if $\lambda_i + \lambda_j \neq 0$ for all $\lambda_i, \lambda_j \in \lambda(A)$, which we assume in the following. Transposing (2.17) shows that $X^T$ is a solution whenever $X$ is, so by uniqueness the solution is necessarily symmetric. Furthermore, if $A \in \mathbb{R}^{n\times n}$ is stable (i.e., its spectrum lies in the open left half of the complex plane, $\lambda(A) \subset \mathbb{C}^-$), the solution $X$ can be represented as
$$X = \int_0^\infty e^{A\tau} B B^T e^{A^T\tau}\, d\tau,$$
which shows that $X$ is positive semidefinite.
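The vectorized form (2.18) can be checked numerically on a small example. The following is a minimal sketch, assuming NumPy and SciPy are available; the random test matrices, the shift used to make $A$ stable, and all variable names are illustrative choices and not part of the text. It is only practical for small $n$, since the Kronecker-sum matrix has size $n^2 \times n^2$.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)
n, m = 6, 2

# Illustrative test data: shift a random matrix so that its spectrum
# lies in the open left half-plane (A is then stable).
M = rng.standard_normal((n, n))
A = M - (np.max(np.linalg.eigvals(M).real) + 1.0) * np.eye(n)
B = rng.standard_normal((n, m))

# Kronecker-sum form (2.18); vec() stacks columns, hence order="F".
I = np.eye(n)
K = np.kron(I, A) + np.kron(A, I)
rhs = -(B @ B.T).reshape(-1, order="F")
X = np.linalg.solve(K, rhs).reshape((n, n), order="F")

# Residual of the matrix equation (2.17) and a reference solution
# computed by SciPy's dense Lyapunov solver.
residual = A @ X + X @ A.T + B @ B.T
X_ref = solve_continuous_lyapunov(A, -B @ B.T)
```

Under these assumptions, `X` should agree with `X_ref` up to rounding, and it should be symmetric positive semidefinite, as the integral representation predicts.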
2.2. Lyapunov equations
In the following, we discuss the conditions that ensure low-rank structure in X and present an approach that exploits this to efficiently solve large-scale Lyapunov equations. Finally, we describe the role of Lyapunov equations in model reduction of linear dynamical systems with control, one of the most important applications.
2.2.1 Low-rank solutions of Lyapunov equations
We have shown that if $A$ is stable, (2.17) has a unique positive semidefinite solution $X$. Additionally, it has been shown in [Sab06, KT10] that if $A$ is symmetric, then the singular values of $X$ decay rapidly when $m \ll n$. More precisely, there exists a matrix $X_k \in \mathbb{R}^{n\times n}$ of rank $km$ such that
$$\|X - X_k\|_F \le \frac{8\|B\|_F}{|\lambda_{\max}(A)|} \exp\!\left(-\frac{k\pi^2}{\log(8\kappa(A))}\right),$$
where $\kappa(A)$ is the condition number of $A$. This implies that the eigenvalues of $X$ decay exponentially:
$$\lambda_k(X) \lesssim \gamma^k, \quad \text{with } \gamma = \exp\!\left(-\frac{\pi^2}{m \log(8\kappa(A))}\right),$$
where $\lambda_k(X)$ denotes the $k$-th largest eigenvalue of $X$. We see that the decay rate deteriorates as $\kappa(A)$ grows and vanishes as $\kappa(A) \to \infty$, since then $\gamma \to 1$. This issue has been resolved in [GK14], where the authors show for certain situations that as $\kappa(A) \to \infty$ the eigenvalue decay becomes exponential with respect to $\sqrt{k}$ instead of $k$:
$$\lambda_k(X) \lesssim \gamma^{\sqrt{k}}, \quad \text{with } \gamma = \exp\!\left(-\pi/\sqrt{2m}\right).$$
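The eigenvalue decay can be observed numerically. The sketch below assumes a specific symmetric stable test matrix (the negative 1D discrete Laplacian) and a random right-hand side with $m = 1$; both are illustrative assumptions, as are the variable names and the relative threshold used to define a numerical rank. The quantity `gamma` evaluates the rate $\exp(-\pi^2/(m\log(8\kappa(A))))$ for comparison only, since the bound holds up to a constant.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

n, m = 100, 1

# Assumed test problem: negative 1D discrete Laplacian (symmetric, stable).
A = -(2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
rng = np.random.default_rng(0)
B = rng.standard_normal((n, m))
B /= np.linalg.norm(B)

# Dense reference solution of A X + X A^T = -B B^T.
X = solve_continuous_lyapunov(A, -B @ B.T)
X = 0.5 * (X + X.T)  # remove rounding-level asymmetry

# Eigenvalues sorted so that eigs[0] >= eigs[1] >= ...
eigs = np.sort(np.linalg.eigvalsh(X))[::-1]

# Decay rate suggested by the bound for symmetric A.
kappa = np.linalg.cond(A)
gamma = np.exp(-np.pi**2 / (m * np.log(8.0 * kappa)))

# Numerical rank: number of eigenvalues above a relative threshold.
num_rank = int(np.sum(eigs > 1e-8 * eigs[0]))
```

Plotting `eigs / eigs[0]` against `gamma ** np.arange(n)` on a logarithmic scale is one way to compare the observed decay with the predicted rate.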
2.2.2 Solving large-scale Lyapunov equations
For $n \lesssim 5000$, a classical approach to solving (2.17) is to use a direct method, such as the Bartels-Stewart algorithm [BS72], which requires $O(n^3)$ operations. For larger values of $n$, such methods are not computationally feasible, as they require the Schur decomposition of $A$. Instead, various iterative approaches have been proposed that gain a computational advantage by exploiting sparsity in $A$ and the low-rank structure of the solution. In the following, we follow [Pen00] and describe one of the most popular approaches, the alternating direction implicit (ADI) iteration.
In the ADI method, the solution $X$ is generated as the limit of the iterates $X_i$, defined in the following way:
$$(A + p_i I)\, X_{i-1/2} = -BB^T - X_{i-1}(A^T - p_i I),$$
$$(A + p_i I)\, X_i^T = -BB^T - X_{i-1/2}^T (A^T - p_i I),$$
Chapter 2. Preliminaries
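which, after eliminating the intermediate iterate $X_{i-1/2}$, can be combined into the single iteration step
$$X_i = (A - p_i I)(A + p_i I)^{-1} X_{i-1} (A^T - p_i I)(A^T + p_i I)^{-1} - 2p_i (A + p_i I)^{-1} B B^T (A^T + p_i I)^{-1}. \tag{2.19}$$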
It can be shown that the errors $E_i = X - X_i$ satisfy
$$E_i = \left(r_i(A)\, r_i(-A)^{-1}\right) E_0 \left(r_i(A)\, r_i(-A)^{-1}\right)^T,$$
where $r_i$ is the polynomial $r_i(x) = (x - p_1)(x - p_2)\cdots(x - p_i)$. Thus, to ensure convergence, the shifts $p_1, p_2, \ldots$ need to be chosen such that $r_i(A)\, r_i(-A)^{-1} \approx 0$.
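The one-step form of the ADI iteration and the error expression can be illustrated on a small dense problem. This is a minimal sketch under several assumptions that are not from the text: real negative shifts, a symmetric stable test matrix, the starting iterate $X_0 = 0$, and a simple log-spaced shift heuristic that makes no claim of optimality. Explicit inverses are used only because the example is tiny.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

n, m = 20, 2

# Assumed test problem: symmetric stable A, random B.
A = -(2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
rng = np.random.default_rng(1)
B = rng.standard_normal((n, m))
I = np.eye(n)
X_exact = solve_continuous_lyapunov(A, -B @ B.T)

# Heuristic (assumption): real negative shifts, log-spaced over the spectrum.
lam = np.linalg.eigvalsh(A)  # ascending, all negative
shifts = -np.geomspace(-lam[-1], -lam[0], 8)

X = np.zeros((n, n))  # X_0 = 0
R = np.eye(n)         # accumulates prod_j (A - p_j I)(A + p_j I)^{-1}
for p in shifts:
    S = np.linalg.inv(A + p * I)  # acceptable for a tiny dense example
    C = (A - p * I) @ S
    # One-step ADI update with shift p.
    X = C @ X @ C.T - 2.0 * p * S @ (B @ B.T) @ S.T
    R = C @ R

# Error after all shifts, and the prediction from the error expression.
# R equals r_i(A) r_i(-A)^{-1} up to a sign that cancels in R E_0 R^T.
E = X_exact - X
E_pred = R @ X_exact @ R.T  # E_0 = X_exact because X_0 = 0
rel_err = np.linalg.norm(E) / np.linalg.norm(X_exact)
```

In this setting `E` and `E_pred` should coincide up to rounding, which makes explicit that convergence is governed entirely by how small the rational function of $A$ defined by the shifts is.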
Assuming that $A$ is diagonalizable, minimizing the spectral radius of $r_i(A)\, r_i(-A)^{-1}$ leads to the following ADI minimax problem
$$\{p_1, \ldots, p_i\} = \operatorname*{arg\,min}_{p_1,\ldots,p_i \in \mathbb{C}^-} \; \max_{x \in \lambda(A)} \frac{|r_i(x)|}{|r_i(-x)|}, \tag{2.20}$$
which indicates criteria for choosing the shifts. As the spectrum $\lambda(A)$ is usually not available, in practice (2.20) is often relaxed by replacing $\lambda(A)$ with a compact set $E \subset \mathbb{C}$ such that $\lambda(A) \subset E$:
$$\{p_1, \ldots, p_i\} = \operatorname*{arg\,min}_{p_1,\ldots,p_i \in \mathbb{C}^-} \; \max_{x \in E} \frac{|r_i(x)|}{|r_i(-x)|}. \tag{2.21}$$
The relaxed ADI minimax problem has been solved exactly (see [Wac63]) only for the case of symmetric A. For the general case, several heuristic strategies for choosing close to optimal shifts have been proposed, see, e.g. [Pen00, Wac88, FG13].
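For a single real shift and a symmetric stable $A$, the minimax objective can be evaluated directly. The sketch below is an illustration under assumptions: the test matrix, the helper name `adi_objective`, and the candidate grid are hypothetical choices. It compares a brute-force grid search with the closed-form value $p = -\sqrt{\lambda_{\min}(A)\lambda_{\max}(A)}$, which balances the objective at the two ends of the spectrum when shifts are taken in $\mathbb{C}^-$.

```python
import numpy as np

n = 50
# Assumed test matrix: symmetric and stable.
A = -(2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
lam = np.linalg.eigvalsh(A)  # ascending; all eigenvalues are negative


def adi_objective(shifts, spectrum):
    """Evaluate max_x |r(x)| / |r(-x)| with r(x) = prod_j (x - p_j)."""
    x = np.asarray(spectrum)[:, None]
    p = np.asarray(shifts)[None, :]
    ratios = np.abs(x - p) / np.abs(-x - p)
    return np.max(np.prod(ratios, axis=1))


# Closed-form single shift for a real negative spectrum.
p_star = -np.sqrt(lam[0] * lam[-1])
obj_star = adi_objective([p_star], lam)

# Brute-force search over real negative candidate shifts.
candidates = -np.geomspace(-lam[-1], -lam[0], 400)
values = np.array([adi_objective([p], lam) for p in candidates])
p_grid = candidates[np.argmin(values)]

# Value of the objective at p_star in terms of the condition number.
kappa = lam[0] / lam[-1]
obj_theory = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
```

The same helper accepts several shifts at once, so it can also be used to compare heuristic shift sets before running the iteration.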
The ADI method can be implemented in a way that exploits the positive semidefiniteness of $X$ as well as the low-rank structure in $X$ described in Section 2.2.1. In the low-rank version of the ADI method (LR-ADI), the iterates are replaced by their Cholesky factorizations $X_i = Z_i Z_i^T$, and the iteration step (2.19) can be written in the following way:
$$Z_i = \begin{bmatrix} (A - p_i I)(A + p_i I)^{-1} Z_{i-1} & \sqrt{-2p_i}\,(A + p_i I)^{-1} B \end{bmatrix},$$
with $Z_1 = \sqrt{-2p_1}\,(A + p_1 I)^{-1} B$. A drawback of LR-ADI is that the memory requirements and the computational cost per iteration increase with each iteration, since the low-rank factor $Z_i$ is enlarged by $m$ columns in each iteration ($\mathrm{rank}(Z_i) \le mi$, where $m = \mathrm{rank}(B)$). However, in practice, LR-ADI is an efficient method since the required number of iterations is usually low. Furthermore, the effect of this drawback can be reduced by performing low-rank truncation of the iterates.
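The low-rank recurrence can be sketched in a few lines. The following is an illustrative dense implementation, not the formulation of [Pen00]: it assumes real negative shifts (so that $\sqrt{-2p_i}$ is real), uses dense solves where a large-scale code would use sparse factorizations, and truncates via an SVD of the factor. The function names `lr_adi` and `truncate_factor`, the test problem, and the shift heuristic are all hypothetical.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def lr_adi(A, B, shifts):
    """Low-rank ADI sketch for A X + X A^T = -B B^T with real negative shifts."""
    n = A.shape[0]
    I = np.eye(n)
    Z = np.zeros((n, 0))
    for p in shifts:
        # New block sqrt(-2 p) (A + p I)^{-1} B.
        new_block = np.sqrt(-2.0 * p) * np.linalg.solve(A + p * I, B)
        if Z.shape[1] > 0:
            # Propagate the old factor: (A - p I)(A + p I)^{-1} Z.
            Z = (A - p * I) @ np.linalg.solve(A + p * I, Z)
        Z = np.hstack([Z, new_block])
    return Z


def truncate_factor(Z, tol=1e-8):
    """Return a factor with fewer columns such that Z_t Z_t^T ~= Z Z^T."""
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    return U[:, :r] * s[:r]


n, m = 20, 2
A = -(2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))  # assumed test matrix
rng = np.random.default_rng(1)
B = rng.standard_normal((n, m))
lam = np.linalg.eigvalsh(A)
shifts = -np.geomspace(-lam[-1], -lam[0], 8)  # heuristic, not optimal

Z = lr_adi(A, B, shifts)
Z_t = truncate_factor(Z)

X_exact = solve_continuous_lyapunov(A, -B @ B.T)
rel_err = np.linalg.norm(Z @ Z.T - X_exact) / np.linalg.norm(X_exact)
trunc_err = np.linalg.norm(Z_t @ Z_t.T - Z @ Z.T) / np.linalg.norm(Z @ Z.T)
```

The factor `Z` gains $m$ columns per shift, which is the growth described above; `truncate_factor` shows one simple way to compress it without forming $Z Z^T$ in a large-scale setting.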
Other popular methods for solving large-scale Lyapunov equations include the rational Krylov projection method [HR92] and the extended Arnoldi method [Sim07]. In these methods, the approximate solution of the original Lyapunov equation is computed by projecting (2.17) onto $k$-dimensional (rational) Krylov subspaces. Solving the projected problem is equivalent to solving a small-scale $k \times k$ Lyapunov equation, which can be solved efficiently using the
Bartels-Stewart algorithm, since, in practice, we usually have $k \ll n$. Projection techniques can also be used to accelerate the convergence of the ADI method. For example, in [BLT09], the Galerkin projection onto the subspace $\mathcal{V}_k \otimes \mathcal{V}_k$, where $V_k$ is an orthonormal basis for the column space $\mathcal{V}_k = \mathrm{range}(Z_k)$ of the current ADI iterate, is used for computing an approximate solution of the form $X = V_k R_k V_k^*$.
Remark 2.13. As shown in [HS95, KPT14], Krylov subspace methods for solving Lyapunov equations can be effectively preconditioned with a few steps of the ADI method. For example, one step of the ADI method with a single shift $p$ defines the following preconditioner for (2.18):
$$P_{\mathrm{ADI}}^{-1} = (A - pI)^{-1} \otimes (A - pI)^{-1}. \tag{2.22}$$
Finding the optimal shift $p$ in (2.22) is equivalent to solving (2.21) with $i = 1$. As shown in [Sta91], for the case of a symmetric $A$, the optimal shift is $p = \sqrt{\lambda_{\max}(A)\lambda_{\min}(A)}$.
In a similar fashion, it is possible to derive a preconditioner for (2.18) based on the first $\ell$ steps of the sign function iteration for Lyapunov equations [KPT14]. In particular, for $\ell = 1$, this gives rise to the following preconditioner
$$P_{\mathrm{sign}}^{-1} = \frac{1}{2c}\left(I \otimes I + c^2 A^{-1} \otimes A^{-1}\right), \tag{2.23}$$
with the scaling factor $c = \sqrt{\|A\|_2 / \|A^{-1}\|_2}$, which can be approximated using $\|M\|_2 \approx \sqrt{\|M\|_1 \|M\|_\infty}$, see, e.g., [SB08]. Other known choices of preconditioners for (2.18) include the classical Jacobi and SSOR preconditioning [HS95].
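The effect of the two preconditioners in this remark can be gauged on a small symmetric example by forming the Kronecker matrices explicitly. The sketch below rests on assumptions: the single-shift ADI preconditioner is taken as $(A - pI)^{-1} \otimes (A - pI)^{-1}$ with $p = \sqrt{\lambda_{\max}(A)\lambda_{\min}(A)}$, the sign-function preconditioner as $\frac{1}{2c}(I \otimes I + c^2 A^{-1} \otimes A^{-1})$ with $c = \sqrt{\|A\|_2/\|A^{-1}\|_2}$, and the test matrix is a small negative 1D Laplacian. Forming $n^2 \times n^2$ matrices is for illustration only; a practical solver would apply the preconditioners in factored form.

```python
import numpy as np

n = 10
# Assumed test matrix: symmetric and stable.
A = -(2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
I = np.eye(n)

# Kronecker-sum matrix of the vectorized Lyapunov equation.
K = np.kron(I, A) + np.kron(A, I)

# Single-shift ADI preconditioner (both Kronecker factors inverted).
lam = np.linalg.eigvalsh(A)
p = np.sqrt(lam[-1] * lam[0])  # positive, since both eigenvalues are negative
S = np.linalg.inv(A - p * I)
P_adi_inv = np.kron(S, S)

# One-step sign-function preconditioner with norm-based scaling.
A_inv = np.linalg.inv(A)
c = np.sqrt(np.linalg.norm(A, 2) / np.linalg.norm(A_inv, 2))
P_sign_inv = (np.kron(I, I) + c**2 * np.kron(A_inv, A_inv)) / (2.0 * c)

# Condition numbers before and after preconditioning.
cond_plain = np.linalg.cond(K)
cond_adi = np.linalg.cond(P_adi_inv @ K)
cond_sign = np.linalg.cond(P_sign_inv @ K)
```

For this symmetric test case, both preconditioned matrices should have a condition number of roughly $\sqrt{\kappa(A)}/2$ instead of $\kappa(A)$.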
Remark 2.14. The ADI method can be extended to address generalized Lyapunov equations of the form
$$A X E^T + E X A^T = -BB^T,$$
where $A, E \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, with $E$ symmetric positive definite and $\lambda E - A$ a stable pencil. As in LR-ADI, this extension can be formulated in terms of the low-rank Cholesky factors $Z_i$; it is also known as the generalized low-rank ADI [Sty08].
2.2.3 Lyapunov equation for Gramians of linear control systems
Suppose we are given the following continuous linear time-invariant dynamical system with control
$$\dot{x}(t) = Ax(t) + Bu(t),$$
$$y(t) = Cx(t),$$
with system matrices $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, $C \in \mathbb{R}^{\ell\times n}$, state vector $x(t) \in \mathbb{R}^n$, input control vector $u(t) \in \mathbb{R}^m$, and output function $y(t) \in \mathbb{R}^{\ell}$. Furthermore, we assume that $A$ is stable
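given dynamical system, which can be very difficult to compute for very large values of $n$. To address this problem, we aim to find a reduced-order model
$$\dot{\tilde{x}}(t) = \tilde{A}\tilde{x}(t) + \tilde{B}u(t),$$
$$\tilde{y}(t) = \tilde{C}\tilde{x}(t),$$
with $\tilde{A} \in \mathbb{R}^{k\times k}$, $\tilde{B} \in \mathbb{R}^{k\times m}$, $\tilde{C} \in \mathbb{R}^{\ell\times k}$, $\tilde{x}(t) \in \mathbb{R}^k$, $\tilde{y}(t) \in \mathbb{R}^{\ell}$, and $k \ll n$.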
Ideally, when reducing the state space, we would like to remove states that are either
• hard to reach: input energy to guide the system into the state is very high;
• hard to observe: output energy generated from system being in the state is very low.
This idea is implemented in the balanced truncation algorithm [Moo81, PS82], which preserves stability of the dynamical system and provides computable error bounds. To construct the reduced model, the balanced truncation algorithm relies on the computation of the controllability Gramian $P$ and the observability Gramian $Q$, which are defined as the unique symmetric positive semidefinite solutions $P, Q \in \mathbb{R}^{n\times n}$ of the following Lyapunov equations:
$$AP + PA^T = -BB^T,$$
$$A^TQ + QA = -C^TC.$$
Given the Cholesky decompositions of the computed Gramians, $P = P_C^T P_C$ and $Q = Q_C^T Q_C$, the optimal projection bases $W, V \in \mathbb{R}^{n\times k}$ are extracted as the dominant left and right singular vectors of $P_C Q_C^T$, respectively, while the resulting reduced-order model is constructed as follows:
$$\tilde{A} = W^T A V, \quad \tilde{B} = W^T B, \quad \tilde{C} = CV, \quad x(t) \approx V\tilde{x}(t), \quad \text{and} \quad \tilde{y}(t) = \tilde{C}\tilde{x}(t).$$
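A small dense version of balanced truncation can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure of [Moo81, PS82]: it uses the common square-root variant, in which the singular vectors are additionally scaled by the inverse square roots of the Hankel singular values so that $W^T V = I$; it factors the Gramians through an eigendecomposition instead of a Cholesky decomposition, because computed Gramians may be numerically semidefinite; and the test system and the names `psd_factor` and `balanced_truncation` are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, svd


def psd_factor(M):
    """Return L with M ~= L @ L.T, clipping tiny negative eigenvalues."""
    w, U = np.linalg.eigh(0.5 * (M + M.T))
    return U * np.sqrt(np.clip(w, 0.0, None))


def balanced_truncation(A, B, C, k):
    """Square-root balanced truncation sketch; returns reduced (A, B, C) and HSVs."""
    P = solve_continuous_lyapunov(A, -B @ B.T)    # A P + P A^T = -B B^T
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)  # A^T Q + Q A = -C^T C
    Lp, Lq = psd_factor(P), psd_factor(Q)
    U, hsv, Vt = svd(Lq.T @ Lp)                   # hsv: Hankel singular values
    scale = 1.0 / np.sqrt(hsv[:k])
    V = (Lp @ Vt[:k].T) * scale                   # right projection basis
    W = (Lq @ U[:, :k]) * scale                   # left projection basis, W^T V = I
    return W.T @ A @ V, W.T @ B, C @ V, hsv


# Assumed test system: stable diagonal A, single input, single output.
n, k = 8, 3
A = -np.diag(np.arange(1.0, n + 1))
B = np.ones((n, 1))
C = np.linspace(1.0, 2.0, n).reshape(1, n)

Ar, Br, Cr, hsv = balanced_truncation(A, B, C, k)

# Compare the transfer functions at s = 0 with the classical error bound
# 2 * (sum of the truncated Hankel singular values).
G0 = -C @ np.linalg.solve(A, B)
Gr0 = -Cr @ np.linalg.solve(Ar, Br)
dc_error = abs((G0 - Gr0).item())
bound = 2.0 * np.sum(hsv[k:])
```

In this sketch the reduced matrix `Ar` should remain stable and `dc_error` should not exceed `bound`, which mirrors the stability preservation and computable error bound mentioned above. For large $n$, the dense Gramian solves would be replaced by low-rank factors, for example from LR-ADI.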