Performance Bounds for
Constrained Linear Stochastic Control
Yang Wang and Stephen Boyd Electrical Engineering Department
Stanford University
Outline
• constrained linear stochastic control problem
• some heuristic control schemes – projected linear control
– control-Lyapunov
– certainty-equivalent model predictive control (MPC)
• the linear quadratic case
• performance bound
• performance bound parameter choices for control schemes
• numerical examples
Linear stochastic system
• linear dynamical system with process noise:
x
t+1= Ax
t+ Bu
t+ w
t, t = 0, 1, . . . , – x
t∈ R
nis the state
– u
t∈ U is the control input
– U ⊂ R
mis the input constraint set, with 0 ∈ U
– w
t∈ R
nis zero mean IID process noise, E w
tw
Tt= W
• state feedback control policy:
u
t= φ(x
t), t = 0, 1, . . . ,
φ : R
n→ U is the state feedback function
Objective
• objective is average stage cost:
J = lim sup
T →∞
1 T E
T −1
X
t=0
(ℓ
x(x
t) + ℓ
u(u
t))
– ℓ
x: R
n→ R is state stage cost function – ℓ
u: U → R is the input state cost function
• ℓ
x, ℓ
u, U need not be convex
Stochastic control problem
• stochastic control problem: choose feedback function φ to minimize J
• infinite dimensional nonconvex optimization problem
• problem data:
– dynamics and input matrices A, B – distribution of process noise w
t– state and input cost functions ℓ
x, ℓ
u– input constraint set U
• φ
⋆denotes an optimal feedback function
• J
⋆denotes optimal objective value
‘Solution’ via dynamic programming
• find V
⋆: R
n→ R and α with V
⋆(z) + α = min
v∈U
(ℓ
u(v) + E V
⋆(Az + Bv + w
t)) (expectation is over w
t)
• optimal feedback function is then φ
⋆(z) = argmin
v∈U
(ℓ
u(v) + E V
⋆(Az + Bv + w
t))
• optimal value of stochastic control problem is J
⋆= α
Stochastic control problem
• generally very hard to solve
(even more: how would we represent a general function φ?)
• can be effectively solved
– when the problem dimensions are very small, e.g., n = m = 1 – when U = R
mand ℓ
x, ℓ
uare convex quadratic;
in this case optimal policy is linear: φ
⋆(z) = Kz
• many suboptimal methods have been proposed
– can evaluate J for a given φ via Monte Carlo simulation – but how suboptimal is it?
• this talk: an effective method for finding a (good) lower bound on J
⋆Projected linear state feedback
• a simple suboptimal policy:
φ
pl(z) = P(K
plz) – K
pl∈ R
m×nis a gain matrix (to be chosen) – P is projection onto U
• when U is a box, i.e., U = {u | kuk
∞≤ U
max}, reduces to saturated linear state feedback
φ
pl(z) = U
maxsat ((1/U
max)K
plz)
sat is (entrywise) unit saturation
Control-Lyapunov policy
• control-Lyapunov policy is φ
clf(z) = argmin
v∈U
(ℓ
u(v) + E V
clf(Az + Bv + w
t))
– V
clf: R
n→ R (which is to be chosen) is the control-Lyapunov function
– when V
clf= V
⋆, this is optimal policy
• when V
clfis quadratic, the control-Lyapunov policy simplifies to φ
clf(z) = argmin
v∈U
(ℓ
u(v) + V
clf(Az + Bv))
since E w
t= 0, and term involving E w
tw
tT= W is constant
Certainty-equivalent model predictive control (MPC)
• φ
mpc(z) is found by solving (possibly approximately) minimize P
T −1τ=0
(ℓ
x(˜ x
τ) + ℓ
u(v
τ)) + V
mpc(˜ x
T) subject to x ˜
τ+1= A˜ x
τ+ Bv
τ, τ = 0, . . . , T − 1
v
τ∈ U, τ = 0, . . . , T − 1
˜
x
0= z
– variables are v
0, . . . , v
T −1, ˜ x
0, . . . , ˜ x
T– V
mpc: R
n→ R is the terminal cost (to be chosen) – T is the planning horizon (also to be chosen)
• let solution be v
0⋆, . . . , v
T −⋆ 1, ˜ x
⋆0, . . . , ˜ x
⋆T• MPC policy is φ
mpc(z) = v
0⋆Parameters in heuristic control policies
• performance of suboptimal policies depends on choice of parameters (K
pl, V
clf, V
mpcand T )
• one choice for V
clf, V
mpc: (quadratic) value function for some unconstrained linear quadratic problem
• one choice for K
pl: optimal gain matrix for some unconstrained linear quadratic problem
• we will suggest some parameters later . . .
The performance bound
our method:
• computes a lower bound J
lb≤ J
⋆using convex optimization (hence is tractable)
• bound is computed for each specific problem instance
• (at this time) cannot guarantee tightness of bound
Judging a heuristic policy
• suppose we have a heuristic policy φ with objective J (evaluated by Monte Carlo, say)
• since J
lb≤ J
⋆≤ J, if J − J
lbis small, then – policy φ is nearly optimal
– bound J
lbis nearly tight
• if J − J
lbis big, then for this problem instance, either – policy is poor, or,
– bound is poor (or both)
• examples suggest that J − J
lbis often small
Unconstrained linear quadratic control
• can effectively solve stochastic control problem when – U = R
m(no constraints)
– ℓ
x(z) = z
TQz, ℓ
u(v) = v
TRv, Q 0, R 0
• optimal cost is J
lq⋆= Tr(P
lq⋆W )
• optimal state feedback function is φ
⋆(z) = K
lq⋆z, where K
lq⋆= −(R + B
TP
lq⋆B)
−1B
TP
lq⋆A
• P
lq⋆is positive semidefinite solution of ARE
P
⋆= Q + A
TP
⋆A − A
TP
⋆B(R + B
TP
⋆B)
−1B
TP
⋆A
Linear quadratic control via LMI/SDP
• can characterize J
lq⋆and P
lq⋆via the semidefinite program (SDP) maximize Tr (P W )
subject to P 0
R + B
TP B B
TP A
A
TP B Q + A
TP A − P
0 – variable is P
– optimal point is P = P
lq⋆; optimal value is J
lq⋆• solution does not depend on W , as long as W ≻ 0
• constraints are convex in (P, Q, R), so J
lq⋆(Q, R) is a concave function
of (Q, R)
Basic bound
• suppose Q 0, R 0, s satisfy
z
TQz + v
TRv + s ≤ ℓ
x(z) + ℓ
u(v) for all z ∈ R
n, v ∈ U i.e., quadratic stage costs are everywhere smaller than ℓ
x+ ℓ
v• then J
lq⋆(Q, R) + s is a lower bound on J
⋆• follows from monotonicity of stochastic control cost w.r.t. stage costs
• lefthand side is optimal value of unconstrained quadratic problem
Optimizing the bound
• can optimize the lower bound over Q, R, s by solving maximize J
lq⋆(Q, R) + s
subject to Q 0, R 0,
z
TQz + v
TRv + s ≤ ℓ
x(z) + ℓ
u(v) for all z ∈ R
n, v ∈ U
• a convex optimization problem – objective is concave
– constraints are convex
– last constraint is convex in Q, R, s for each z and v
• last constraint is semi-infinite, parameterized by the (infinite) set
z ∈ R
n, u ∈ U
Optimizing the bound
• semi-infinite constraint makes problem difficult in general
• can solve exactly in a few cases
• in other cases, can replace semi-infinite constraint with conservative
approximation, which still gives a lower bound
Quadratic stage cost and finite input set
• can solve optimization problem exactly when
– ℓ
x(z) = z
TQ
0z, ℓ
u(v) = v
TR
0v, Q 0, R 0 – U = {u
1, . . . , u
K} (finite input constraint set)
• constraint
z
TQz + v
TRv + s ≤ ℓ
x(z) + ℓ
u(v) for all z ∈ R
n, v ∈ U becomes
Q Q , u
TRu + s ≤ u
TR u , i = 1, . . . , K
• to optimize the bound we solve SDP (with variables P , Q, R, s) maximize Tr (P W ) + s
subject to P 0, Q 0, R 0, Q Q
0R + B
TP B B
TP A
A
TP B Q + A
TP A − P
0 u
TiRu
i+ s ≤ u
TiR
0u
i, i = 1, . . . , K
• monotone in Q, so we can set Q = Q
0w.l.o.g.
S-procedure relaxation
• suppose stage costs are quadratic
• suppose we can find R
1, . . . , R
Mand s
1, . . . , s
Mfor which U ⊆ ˜ U = {v | v
TR
iv + s
i≤ 0, i = 1, . . . , M }
• a sufficient condition for
z
TQz + v
TRv + s ≤ ℓ
x(z) + ℓ
u(v) for all z ∈ R
n, v ∈ U is
z
TQz + v
TRv + s ≤ z
TQ z + v
TR v for all z ∈ R
n, v ∈ ˜ U
• equivalent to Q Q
0and
v
TR
iv + s
i≤ 0, i = 1, . . . , M =⇒ v
TRv + s ≤ v
TR
0v
• which is implied by Q Q
0and the existence of λ
1, . . . , λ
M≥ 0 with
R − R
0M
X
i=1
λ
iR
i, s ≤
M
X
i=1
λ
is
i(by the S-procedure)
• so J
⋆(Q, R) + s is a still a lower bound on J
⋆• to optimize the bound we solve the SDP maximize Tr (P W ) + s
0subject to P 0, Q 0, R 0, Q Q
0R + B
TP B B
TP A
A
TP B Q + A
TP A − P
0 R − R
0P
Mi=1
λ
iR
i, s
0≤ P
Mi=1
λ
is
iλ
i≥ 0, i = 1, . . . , M
with variables P , Q, R, λ
1, . . . , λ
M, s
0, . . . , s
M• can set Q = Q
0w.l.o.g.
Suboptimal control policies
• optimizing the lower bound gives P
lb• can interpret Tr(P
lbW ) as optimal cost of an unconstrained quadratic problem that approximates (and underestimates) our problem
• suggests that
V
lb(z) = z
TP
lbz, and
K
lb= −(R
lb+ B
TP
lbB)
−1B
TP
lbA
are good choices of parameters for suboptimal control policies
• examples show this is the case
Numerical examples
• illustrate bounds for 3 examples
– small problem with trilevel inputs – large problem with box constraints – discretized mechanical control system
• compare lower bound with various heuristic policies – projected linear state feedback
– model predictive control
– control-Lyapunov policy
Small problem with trilevel inputs
• n = 8, m = 2
• A, B matrices randomly generated; A scaled so max
i|λ
i(A)| = 1
• quadratic stage costs with R
0= I, Q
0= I
• w
t∼ N (0, 0.25I)
• finite input set: U = {−0.2, 0, 0.2}
2Large problem with box constraints
• n = 30, m = 10
• A, B matrices randomly generated; A scaled so max
i|λ
i(A)| = 1
• quadratic stage costs with R
0= I, Q
0= I
• w
t∼ N (0, 0.25I)
• box input constraints: U = {v ∈ R
m| kvk
∞≤ 0.1}
Discretized mechanical control system
u1 u2
u3