Controller Design for Hybrid Systems
4. Because time has been abstracted away and φ(q) = V, the non-blocking assumption eliminates all possibilities of cheating by the controller (Zeno executions are meaningless in this setting).
5. Players u and d play simultaneously. This subsumes turn-based play [70] and priority play [71] (see [70] and [57]).
Figure 6.4: Example of discrete controller synthesis

6.6.2 Computation of Controlled Invariant Sets
Consider a set of states F ⊆ Q. Try to establish the largest set of initial states for which there exists a controller that manages to keep all executions inside F. This is easiest to demonstrate by means of an example.
Consider a finite automaton with Q = {q1, . . . , q10}, U = D = {1, 2}, F = {q1, . . . , q8}, and the transition structure of Figure 6.4 (the inputs are listed in the order (u, d) and * denotes a “wild-card”). Try to establish the largest set of initial conditions, W∗, such that there exists a feedback controller that keeps the execution inside F. Clearly:
W∗ ⊆ F = {q1, . . . , q8}
Next, look at states that can leave F in one transition (q4, q5, and q8). Notice that from q4, if u = 1, we remain in F whatever d chooses to do, while from q5 and q8 we leave F whatever u chooses to do. Therefore:
W∗ ⊆ {q1, q2, q3, q4, q6, q7}
Next, look at states that can leave the above set in one transition (q4 and q6). Notice that from q4, if d = 1, we leave the set whatever u chooses. From q6, if u = 1 then d can choose d = 2 to force us to leave the set, and if u = 2 then d can choose d = 1 to force us to leave the set. Therefore:
W∗ ⊆ {q1, q2, q3, q7}
From the remaining states, if u chooses according to the following feedback map:
g(q) =
we are guaranteed to remain in {q1, q2, q3, q7} forever. Therefore:
W∗ = {q1, q2, q3, q7}
and g is the least restrictive controller that renders W∗ invariant (the proof of both facts is by enumeration).
More generally this scheme can be implemented by the following algorithm [60]:
Algorithm 6.0 (Controlled Invariant Set)
Initialization:
  W0 = F, W1 = ∅, i = 0
while Wi ≠ Wi+1 do
begin
  Wi−1 = Wi ∩ {q ∈ Q : ∃u ∈ U ∀d ∈ D, δ(q, (u, d)) ⊆ Wi}
  i = i − 1
end
The index decreases as a reminder that the algorithm involves a predecessor operation. This is a real algorithm (can be implemented by enumerating the set of states and inputs). The algorithm terminates after a finite number of steps since:
Wi−1 ⊆ Wi and |W0| = |F | ≤ |Q| < ∞
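The enumeration can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of [60]: the four-state automaton below (states a, b, c, bad) is hypothetical and is not the automaton of Figure 6.4, and the names `controlled_invariant_set` and `delta` are assumptions made for the example. The transition map is assumed to be defined for every triple (q, u, d).

```python
from itertools import product


def controlled_invariant_set(U, D, delta, F):
    """Largest W inside F such that, from every q in W, some u keeps
    every successor inside W for all d (fixed point of the iteration)."""
    W = set(F)
    while True:
        # Keep q only if there is a u that works against every d.
        W_next = {
            q for q in W
            if any(all(delta[(q, u, d)] <= W for d in D) for u in U)
        }
        if W_next == W:
            return W
        W = W_next


# Hypothetical automaton (an assumption for illustration only).
U = D = (1, 2)
delta = {}
for u, d in product(U, D):
    delta[("a", u, d)] = {"a"} if u == 1 else {"b"}  # u alone decides
    delta[("c", u, d)] = {"bad"}                      # always leaves F
    delta[("bad", u, d)] = {"bad"}
# From b, whatever u picks, d has a reply that leads to c.
delta[("b", 1, 1)] = {"a"}
delta[("b", 1, 2)] = {"c"}
delta[("b", 2, 1)] = {"c"}
delta[("b", 2, 2)] = {"a"}

W_star = controlled_invariant_set(U, D, delta, {"a", "b", "c"})
print(sorted(W_star))  # ['a']
```

In this toy case the iterates are {a, b, c}, {a, b}, {a}: state c is dropped first because every transition leaves F, and b is dropped next because d can always steer it to c, mirroring the role of q6 in the example above.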
6.6.3 Relation to Gaming
What does any of this have to do with gaming? Introduce a value function:
J : Q× Z−→ {0, 1} (6.5)
Consider the difference equation:
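$$
\begin{aligned}
J(q, 0) &= \begin{cases} 1 & q \in F \\ 0 & q \in F^c \end{cases} \\
J(q, i-1) - J(q, i) &= \min\Big\{0,\ \max_{u \in U} \min_{d \in D} \Big[\min_{q' \in \delta(q,(u,d))} J(q', i) - J(q, i)\Big]\Big\}
\end{aligned}
\tag{6.6}
$$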
We will refer to equation (6.6) as a discrete Hamilton-Jacobi equation.
Proposition 6.7 (Winning States for u) A fixed point J∗ : Q → {0, 1} of (6.6) is reached in a finite number of iterations. The set of states produced by the algorithm is W∗ = {q ∈ Q | J∗(q) = 1}.
Proof: We show more generally that Wi = {q ∈ Q | J(q, i) = 1}. The proof is by induction (see [58]).
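The claim can be checked numerically on a small case. The sketch below iterates the difference equation backwards from J(q, 0) and records the level sets {q : J(q, i) = 1}. The four-state automaton is hypothetical (it is not the one in Figure 6.4), and the function name `value_iteration` is an assumption. Every δ(q, (u, d)) is assumed non-empty, in line with the non-blocking assumption.

```python
from itertools import product


def value_iteration(states, U, D, delta, F):
    """Iterate the discrete Hamilton-Jacobi equation until a fixed point.
    Returns the fixed point J* and the level sets W_0, W_{-1}, ..."""
    J = {q: 1 if q in F else 0 for q in states}  # J(q, 0): indicator of F
    levels = [{q for q in states if J[q] == 1}]
    while True:
        J_prev = {}
        for q in states:
            # max over u, min over d, min over successor states
            best = max(
                min(min(J[qn] for qn in delta[(q, u, d)]) for d in D)
                for u in U
            )
            J_prev[q] = J[q] + min(0, best - J[q])  # outer min keeps J from rising
        if J_prev == J:
            return J, levels
        J = J_prev
        levels.append({q for q in states if J[q] == 1})


# Hypothetical automaton (an assumption for illustration only).
states = ("a", "b", "c", "bad")
U = D = (1, 2)
delta = {}
for u, d in product(U, D):
    delta[("a", u, d)] = {"a"} if u == 1 else {"b"}
    delta[("c", u, d)] = {"bad"}
    delta[("bad", u, d)] = {"bad"}
delta[("b", 1, 1)] = {"a"}
delta[("b", 1, 2)] = {"c"}
delta[("b", 2, 1)] = {"c"}
delta[("b", 2, 2)] = {"a"}

J_star, levels = value_iteration(states, U, D, delta, {"a", "b", "c"})
print([sorted(W) for W in levels])  # [['a', 'b', 'c'], ['a', 'b'], ['a']]
```

The recorded level sets shrink monotonically and stop at {a}, which is the set a direct set-based iteration on the same automaton would produce.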
This is beginning to look more game-like. Consider what happens when a fixed point of (6.6) is reached. Ignoring the outermost min for the time being leads to:
$$
J^*(q) = \max_{u \in U} \min_{d \in D} \min_{q' \in \delta(q,(u,d))} J^*(q') \tag{6.7}
$$
Notice the similarity between this and (excuse the overloading of the notation):
The equations look very similar. The only difference is that instead of having to worry about all feedback controllers, all disturbance trajectories and all possible executions, we reduce the problem to a pointwise argument. Clearly equation (6.7) is much simpler to work with than equation (6.8). This is a standard trick in dynamic programming. Equation (6.7) is a special case of what is known in dynamic programming as Bellman’s equation, while the difference equation (6.6) is a form of value iteration for computing a fixed point of Bellman’s equation.
What about the outer min? This is an inevitable consequence of turning the sequence argument of equation (6.8) into a pointwise argument. It is there to prevent states that have been labeled as unsafe at some point from being relabeled as safe later on. If it were not there, then in the example of Figure 6.4 state q10 would be labeled as safe after one step of the algorithm (i.e. J(q10, −1) = 1). The extra min operation implements the intersection with Wi in the algorithm of the previous section, ensures the monotonicity of Wi with respect to i and guarantees termination.
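The link between the outer min and the intersection with Wi can be made explicit with a short calculation. The shorthand M_i(q) is introduced here only for this sketch, and the last line assumes Wi = {q : J(q, i) = 1}.

```latex
\begin{align*}
M_i(q) &:= \max_{u \in U} \min_{d \in D} \min_{q' \in \delta(q,(u,d))} J(q', i), \\
J(q, i-1) &= J(q, i) + \min\{0,\ M_i(q) - J(q, i)\} = \min\{J(q, i),\ M_i(q)\}, \\
J(q, i-1) = 1 &\iff J(q, i) = 1 \ \text{and}\ M_i(q) = 1 \\
&\iff q \in W_i \ \text{and}\ \exists u \in U\ \forall d \in D:\ \delta(q,(u,d)) \subseteq W_i .
\end{align*}
```

Without the outer min the update would read J(q, i−1) = M_i(q), which can equal 1 even when J(q, i) = 0. That is exactly the relabeling of an unsafe state as safe described above.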
The fixed point of equation (6.6) also provides a simple characterization of a least restrictive controller that renders W∗ invariant:
Proposition 6.8 (Characterization of W∗ and g) W∗ is the maximal controlled invariant subset of F and g is the unique, least restrictive feedback controller that renders W∗ invariant.
6.7 Dynamic Programming
Recall that we would like to find functions:
(x, u) : R → R^n × R^m
where L : R^n × R^m → R, φ : R^n → R, f : R^n × R^m → R^n is Lipschitz continuous in x and continuous in u, and ψ : R^n → R.
In Lecture 24 we discussed how this problem can be related to the solution of a partial differential equation, known as the Hamilton-Jacobi-Bellman equation, by introducing the value function (or “cost to go” function), J∗ : R^n × R → R, given by:
We argued that if a continuously differentiable solution to the Hamilton-Jacobi-Bellman partial differential equation [22]:
Equation (6.12) can be written more compactly if we introduce the Hamiltonian, H : R^n × R^n × R^m → R:
Notice that we have effectively turned the problem of optimizing a cost over the space of curves (an infinite dimensional vector space) into that of optimizing a cost pointwise over a finite dimensional vector space. Of course, to achieve this we are still required to solve a partial differential equation.
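As a rough illustration of what "pointwise" means here, the sketch below minimizes a Hamiltonian over u for one fixed pair (x, p). It assumes the standard form H(x, p, u) = L(x, u) + p f(x, u) and a hypothetical scalar problem with L(x, u) = x² + u² and f(x, u) = u. Neither the example nor the function names come from the notes. For this choice the minimizer is u∗ = −p/2, which the grid search recovers.

```python
def hamiltonian(x, p, u):
    """H(x, p, u) = L(x, u) + p * f(x, u) for a hypothetical scalar problem."""
    running_cost = x ** 2 + u ** 2  # assumed L(x, u)
    dynamics = u                    # assumed f(x, u)
    return running_cost + p * dynamics


def pointwise_minimizer(x, p, u_grid):
    """Minimize H over u for one fixed (x, p): a finite dimensional search."""
    return min(u_grid, key=lambda u: hamiltonian(x, p, u))


# Grid over a compact input set U = [-5, 5] (an assumption).
u_grid = [k / 100.0 for k in range(-500, 501)]
x, p = 1.0, 3.0
u_star = pointwise_minimizer(x, p, u_grid)
print(u_star, hamiltonian(x, p, u_star))  # -1.5 -1.25
```

The search runs over a single real number for each (x, p), in contrast with the original problem, which ranges over whole curves u(·). Obtaining p itself still requires solving the partial differential equation.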
Remarks:
1. The solution is close to the solution obtained through the calculus of variations. Simply replace the co-state p by ∂J∗/∂x (x, t).
2. On the positive side, dynamic programming (unlike the calculus of variations) inherently leads to feedback solutions.
3. On the negative side, dynamic programming requires one to assume differentiability of the value functions.
4. Differentiability is difficult to guarantee. Even if all the data is “smooth”, the solution to the PDE may still develop “corners” (known as shocks). The situation is exacerbated by the fact that the min operation is continuous but not smooth.
5. Optimal controls are often bang-bang. Consider the system:
Notice that u∗ switches between its extreme values whenever ∂J∗/∂x (x, t) g(x) changes sign. Therefore, even if J∗ is continuously differentiable, H∗ is continuous, but not continuously differentiable.
6. If U is not compact, then the optimal controls may be undefined pointwise (consider for example the previous situation with U = (−∞, ∞) or U = (U1, U2)).
7. The situation may be even worse if U is not convex. The optimal controls may only