• No results found

Constrained Linear Stochastic Control

N/A
N/A
Protected

Academic year: 2021

Share "Constrained Linear Stochastic Control"

Copied!
32
0
0

Loading.... (view fulltext now)

Full text

(1)

Performance Bounds for

Constrained Linear Stochastic Control

Yang Wang and Stephen Boyd Electrical Engineering Department

Stanford University

(2)

Outline

• constrained linear stochastic control problem

• some heuristic control schemes – projected linear control

– control-Lyapunov

– certainty-equivalent model predictive control (MPC)

• the linear quadratic case

• performance bound

• performance bound parameter choices for control schemes

• numerical examples

(3)

Linear stochastic system

• linear dynamical system with process noise:

x

t+1

= Ax

t

+ Bu

t

+ w

t

, t = 0, 1, . . . , – x

t

∈ R

n

is the state

– u

t

∈ U is the control input

– U ⊂ R

m

is the input constraint set, with 0 ∈ U

– w

t

∈ R

n

is zero mean IID process noise, E w

t

w

Tt

= W

• state feedback control policy:

u

t

= φ(x

t

), t = 0, 1, . . . ,

φ : R

n

→ U is the state feedback function

(4)

Objective

• objective is average stage cost:

J = lim sup

T →∞

1 T E

T −1

X

t=0

(ℓ

x

(x

t

) + ℓ

u

(u

t

))

– ℓ

x

: R

n

→ R is state stage cost function – ℓ

u

: U → R is the input state cost function

• ℓ

x

, ℓ

u

, U need not be convex

(5)

Stochastic control problem

• stochastic control problem: choose feedback function φ to minimize J

• infinite dimensional nonconvex optimization problem

• problem data:

– dynamics and input matrices A, B – distribution of process noise w

t

– state and input cost functions ℓ

x

, ℓ

u

– input constraint set U

• φ

denotes an optimal feedback function

• J

denotes optimal objective value

(6)

‘Solution’ via dynamic programming

• find V

: R

n

→ R and α with V

(z) + α = min

v∈U

(ℓ

u

(v) + E V

(Az + Bv + w

t

)) (expectation is over w

t

)

• optimal feedback function is then φ

(z) = argmin

v∈U

(ℓ

u

(v) + E V

(Az + Bv + w

t

))

• optimal value of stochastic control problem is J

= α

(7)

Stochastic control problem

• generally very hard to solve

(even more: how would we represent a general function φ?)

• can be effectively solved

– when the problem dimensions are very small, e.g., n = m = 1 – when U = R

m

and ℓ

x

, ℓ

u

are convex quadratic;

in this case optimal policy is linear: φ

(z) = Kz

• many suboptimal methods have been proposed

– can evaluate J for a given φ via Monte Carlo simulation – but how suboptimal is it?

• this talk: an effective method for finding a (good) lower bound on J

(8)

Projected linear state feedback

• a simple suboptimal policy:

φ

pl

(z) = P(K

pl

z) – K

pl

∈ R

m×n

is a gain matrix (to be chosen) – P is projection onto U

• when U is a box, i.e., U = {u | kuk

≤ U

max

}, reduces to saturated linear state feedback

φ

pl

(z) = U

max

sat ((1/U

max

)K

pl

z)

sat is (entrywise) unit saturation

(9)

Control-Lyapunov policy

• control-Lyapunov policy is φ

clf

(z) = argmin

v∈U

(ℓ

u

(v) + E V

clf

(Az + Bv + w

t

))

– V

clf

: R

n

→ R (which is to be chosen) is the control-Lyapunov function

– when V

clf

= V

, this is optimal policy

• when V

clf

is quadratic, the control-Lyapunov policy simplifies to φ

clf

(z) = argmin

v∈U

(ℓ

u

(v) + V

clf

(Az + Bv))

since E w

t

= 0, and term involving E w

t

w

tT

= W is constant

(10)

Certainty-equivalent model predictive control (MPC)

• φ

mpc

(z) is found by solving (possibly approximately) minimize P

T −1

τ=0

(ℓ

x

(˜ x

τ

) + ℓ

u

(v

τ

)) + V

mpc

(˜ x

T

) subject to x ˜

τ+1

= A˜ x

τ

+ Bv

τ

, τ = 0, . . . , T − 1

v

τ

∈ U, τ = 0, . . . , T − 1

˜

x

0

= z

– variables are v

0

, . . . , v

T −1

, ˜ x

0

, . . . , ˜ x

T

– V

mpc

: R

n

→ R is the terminal cost (to be chosen) – T is the planning horizon (also to be chosen)

• let solution be v

0

, . . . , v

T − 1

, ˜ x

0

, . . . , ˜ x

T

• MPC policy is φ

mpc

(z) = v

0

(11)

Parameters in heuristic control policies

• performance of suboptimal policies depends on choice of parameters (K

pl

, V

clf

, V

mpc

and T )

• one choice for V

clf

, V

mpc

: (quadratic) value function for some unconstrained linear quadratic problem

• one choice for K

pl

: optimal gain matrix for some unconstrained linear quadratic problem

• we will suggest some parameters later . . .

(12)

The performance bound

our method:

• computes a lower bound J

lb

≤ J

using convex optimization (hence is tractable)

• bound is computed for each specific problem instance

• (at this time) cannot guarantee tightness of bound

(13)

Judging a heuristic policy

• suppose we have a heuristic policy φ with objective J (evaluated by Monte Carlo, say)

• since J

lb

≤ J

≤ J, if J − J

lb

is small, then – policy φ is nearly optimal

– bound J

lb

is nearly tight

• if J − J

lb

is big, then for this problem instance, either – policy is poor, or,

– bound is poor (or both)

• examples suggest that J − J

lb

is often small

(14)

Unconstrained linear quadratic control

• can effectively solve stochastic control problem when – U = R

m

(no constraints)

– ℓ

x

(z) = z

T

Qz, ℓ

u

(v) = v

T

Rv, Q  0, R  0

• optimal cost is J

lq

= Tr(P

lq

W )

• optimal state feedback function is φ

(z) = K

lq

z, where K

lq

= −(R + B

T

P

lq

B)

1

B

T

P

lq

A

• P

lq

is positive semidefinite solution of ARE

P

= Q + A

T

P

A − A

T

P

B(R + B

T

P

B)

1

B

T

P

A

(15)

Linear quadratic control via LMI/SDP

• can characterize J

lq

and P

lq

via the semidefinite program (SDP) maximize Tr (P W )

subject to P  0

 R + B

T

P B B

T

P A

A

T

P B Q + A

T

P A − P



 0 – variable is P

– optimal point is P = P

lq

; optimal value is J

lq

• solution does not depend on W , as long as W ≻ 0

• constraints are convex in (P, Q, R), so J

lq

(Q, R) is a concave function

of (Q, R)

(16)

Basic bound

• suppose Q  0, R  0, s satisfy

z

T

Qz + v

T

Rv + s ≤ ℓ

x

(z) + ℓ

u

(v) for all z ∈ R

n

, v ∈ U i.e., quadratic stage costs are everywhere smaller than ℓ

x

+ ℓ

v

• then J

lq

(Q, R) + s is a lower bound on J

• follows from monotonicity of stochastic control cost w.r.t. stage costs

• lefthand side is optimal value of unconstrained quadratic problem

(17)

Optimizing the bound

• can optimize the lower bound over Q, R, s by solving maximize J

lq

(Q, R) + s

subject to Q  0, R  0,

z

T

Qz + v

T

Rv + s ≤ ℓ

x

(z) + ℓ

u

(v) for all z ∈ R

n

, v ∈ U

• a convex optimization problem – objective is concave

– constraints are convex

– last constraint is convex in Q, R, s for each z and v

• last constraint is semi-infinite, parameterized by the (infinite) set

z ∈ R

n

, u ∈ U

(18)

Optimizing the bound

• semi-infinite constraint makes problem difficult in general

• can solve exactly in a few cases

• in other cases, can replace semi-infinite constraint with conservative

approximation, which still gives a lower bound

(19)

Quadratic stage cost and finite input set

• can solve optimization problem exactly when

– ℓ

x

(z) = z

T

Q

0

z, ℓ

u

(v) = v

T

R

0

v, Q  0, R  0 – U = {u

1

, . . . , u

K

} (finite input constraint set)

• constraint

z

T

Qz + v

T

Rv + s ≤ ℓ

x

(z) + ℓ

u

(v) for all z ∈ R

n

, v ∈ U becomes

Q  Q , u

T

Ru + s ≤ u

T

R u , i = 1, . . . , K

(20)

• to optimize the bound we solve SDP (with variables P , Q, R, s) maximize Tr (P W ) + s

subject to P  0, Q  0, R  0, Q  Q

0

 R + B

T

P B B

T

P A

A

T

P B Q + A

T

P A − P



 0 u

Ti

Ru

i

+ s ≤ u

Ti

R

0

u

i

, i = 1, . . . , K

• monotone in Q, so we can set Q = Q

0

w.l.o.g.

(21)

S-procedure relaxation

• suppose stage costs are quadratic

• suppose we can find R

1

, . . . , R

M

and s

1

, . . . , s

M

for which U ⊆ ˜ U = {v | v

T

R

i

v + s

i

≤ 0, i = 1, . . . , M }

• a sufficient condition for

z

T

Qz + v

T

Rv + s ≤ ℓ

x

(z) + ℓ

u

(v) for all z ∈ R

n

, v ∈ U is

z

T

Qz + v

T

Rv + s ≤ z

T

Q z + v

T

R v for all z ∈ R

n

, v ∈ ˜ U

(22)

• equivalent to Q  Q

0

and

v

T

R

i

v + s

i

≤ 0, i = 1, . . . , M =⇒ v

T

Rv + s ≤ v

T

R

0

v

• which is implied by Q  Q

0

and the existence of λ

1

, . . . , λ

M

≥ 0 with

R − R

0



M

X

i=1

λ

i

R

i

, s ≤

M

X

i=1

λ

i

s

i

(by the S-procedure)

• so J

(Q, R) + s is a still a lower bound on J

(23)

• to optimize the bound we solve the SDP maximize Tr (P W ) + s

0

subject to P  0, Q  0, R  0, Q  Q

0

 R + B

T

P B B

T

P A

A

T

P B Q + A

T

P A − P



 0 R − R

0

 P

M

i=1

λ

i

R

i

, s

0

≤ P

M

i=1

λ

i

s

i

λ

i

≥ 0, i = 1, . . . , M

with variables P , Q, R, λ

1

, . . . , λ

M

, s

0

, . . . , s

M

• can set Q = Q

0

w.l.o.g.

(24)

Suboptimal control policies

• optimizing the lower bound gives P

lb

• can interpret Tr(P

lb

W ) as optimal cost of an unconstrained quadratic problem that approximates (and underestimates) our problem

• suggests that

V

lb

(z) = z

T

P

lb

z, and

K

lb

= −(R

lb

+ B

T

P

lb

B)

1

B

T

P

lb

A

are good choices of parameters for suboptimal control policies

• examples show this is the case

(25)

Numerical examples

• illustrate bounds for 3 examples

– small problem with trilevel inputs – large problem with box constraints – discretized mechanical control system

• compare lower bound with various heuristic policies – projected linear state feedback

– model predictive control

– control-Lyapunov policy

(26)

Small problem with trilevel inputs

• n = 8, m = 2

• A, B matrices randomly generated; A scaled so max

i

i

(A)| = 1

• quadratic stage costs with R

0

= I, Q

0

= I

• w

t

∼ N (0, 0.25I)

• finite input set: U = {−0.2, 0, 0.2}

2

(27)

Large problem with box constraints

• n = 30, m = 10

• A, B matrices randomly generated; A scaled so max

i

i

(A)| = 1

• quadratic stage costs with R

0

= I, Q

0

= I

• w

t

∼ N (0, 0.25I)

• box input constraints: U = {v ∈ R

m

| kvk

≤ 0.1}

(28)

Discretized mechanical control system

u1 u2

u3

• 6 masses connected by springs; 3 input tensions between masses

• quadratic stage costs with R

0

= I, Q

0

= I

• w

t

uniform on [−0.5, 0.5]

• box input constraints: U = {v ∈ R

m

| kvk

≤ 0.1}

(29)

Heuristic policies

• projected linear state feedback with K

pl

= K

lq

• control-Lyapunov policy with V

clf

(z) = z

T

P

lb

z

• model predictive control (MPC) with T = 30, V

mpc

(z) = z

T

P

lb

z

(for trilevel example we solve convex relaxation with u(t) ∈ [−0.2, 0.2],

then round value to {−0.2, 0, 0.2})

(30)

Results

small trilevel large random masses

PLSF 12.9 31.3 269.8

CLF 10.8 25.6 61.1

MPC 10.9 25.7 58.9

J

lb

9.1 23.8 43.2

• control-Lyapunov with P

lb

and MPC achieve similar performance

• control-Lyapunov policy can be computed very fast (in tens of microseconds); MPC policy can be computed in milliseconds

• bound J

lb

is reasonably close to J for these examples

(31)

Conclusions

• we’ve shown how to find lower bounds on optimal performance for constrained linear stochastic control problems

• requires solution of convex optimization problem, hence is tractable

• provides only provable lower bound on optimal performance that we are aware of

• as a by-product, provides excellent choice for quadratic control-Lyapunov function

• in many cases, gives everything you want:

– a provable lower bound on performance

– a relatively simple heuristic policy that comes close

(32)

References

• Wang & Boyd, Performance Bounds for Linear Stochastic Control, Systems Control Letters 58(3):178–182, 2008

• Wang & Boyd, Performance Bounds and Suboptimal Policies for Linear Stochastic Control via LMIs, (manuscript)

• Cogill & Lall, Suboptimality Bounds in Stochastic Control: A Queueing Example, Proceedings ACC p1642–1647, 2006

(similar results in MDP setting)

• Lincoln & Rantzer, Relaxing Dynamic Programming, IEEE T-AC

51(8):1249-2006, 2006

References

Related documents

The Effect of Consumption of Unhealthy Snacks on Diet and the Risk of Metabolic Syndrome in Adults: Tehran Lipid and Glucose Study, Iran. Zahra

Base on the results of the probit regression model, I conclude that online bookmakers’ odds are efficient forecasts of soccer match outcomes, but even greater

Following an application made to the Licensing Team for the grant of a Private Hire Drivers Licence, the Committee is requested to consider the report and determine if they consider

Inferior appendage 2-segmented; when viewed laterally, basal segment stout, subrectangular, elongate, broadest and convex at midlength, apical segment stout, rectangularly

“5µW-to-10mW input power range inductive boost converter for indoor pho- tovoltaic energy harvesting with integrated maximum power point tracking algorithm,” in Solid-State

 Effects on Cultural Resources, including but not limited to, prehistoric and homestead sites in undeveloped areas of JBER – This EA does not evaluate the potential for effects

Based on surveys and the researchof data from lighting producers to these programs we found out, that the DIALux program is used more frequently than the Relux program (see

Advanced Placement courses are taught at an advanced level and provide a student the opportunity to earn high school credit as well as the opportunity to earn