
Controller Design for Hybrid Systems

4. Because time has been abstracted away and φ(q) = V, the non-blocking assumption eliminates all possibilities of cheating by the controller (Zeno executions are meaningless in this setting).

5. Players u and d play simultaneously. This subsumes turn-based play [70] and priority play [71] (see [70] and [57]).

Figure 6.4: Example of discrete controller synthesis

6.6.2 Computation of Controlled Invariant Sets

Consider a set of states F ⊆ Q. We try to establish the largest set of initial states for which there exists a controller that manages to keep all executions inside F. This is easiest to demonstrate by means of an example.

Consider a finite automaton with Q = {q₁, . . . , q₁₀}, U = D = {1, 2}, F = {q₁, . . . , q₈}, and the transition structure of Figure 6.4 (the inputs are listed in the order (u, d) and * denotes a “wild-card”). We try to establish the largest set of initial conditions, W^∗, such that there exists a feedback controller that keeps the execution inside F. Clearly:

W^∗ ⊆ F = {q₁, . . . , q₈}

Next, look at states that can leave F in one transition (q₄, q₅, and q₈). Notice that from q₄, if u = 1, whatever d chooses to do we remain in F, while from q₅ and q₈, whatever u chooses to do, we leave F. Therefore:

W^∗⊆ {q₁, q₂, q₃, q₄, q₆, q₇}

Next, look at states that can leave the above set in one transition (q₄ and q₆). Notice that from q₄, if d = 1, whatever u chooses we leave the set, while from q₆, if u = 1, d can choose d = 2 to force us to leave the set, and if u = 2, d can choose d = 1 to force us to leave the set. Therefore:

W^∗⊆ {q₁, q₂, q₃, q₇}

From the remaining states, if u chooses according to the following feedback map:

g(q) =

we are guaranteed to remain in {q₁, q₂, q₃, q₇} forever. Therefore,

W^∗ = {q₁, q₂, q₃, q₇}

and g is the least restrictive controller that renders W^∗ invariant (the proof of both facts is by enumeration).

More generally, this scheme can be implemented by the following algorithm [60]:

Algorithm 6.0 (Controlled Invariant Set)

Initialization: W⁰ = F, W¹ = ∅, i = 0
while Wⁱ ≠ Wⁱ⁺¹ do
begin
    Wⁱ⁻¹ = Wⁱ ∩ {q ∈ Q : ∃u ∈ U ∀d ∈ D, δ(q, (u, d)) ⊆ Wⁱ}
    i = i − 1
end

The index decreases as a reminder that the algorithm involves a predecessor operation. This is a real algorithm: it can be implemented by enumerating the sets of states and inputs. The algorithm terminates after a finite number of steps since:

Wⁱ⁻¹ ⊆ Wⁱ and |W⁰| = |F| ≤ |Q| < ∞,

so the sets can shrink strictly at most |F| times before Wⁱ⁻¹ = Wⁱ.
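Because Algorithm 6.0 only manipulates finite sets, it translates almost line for line into code. The Python sketch below is an illustration under stated assumptions, not the implementation of [60]: the transition relation is assumed to be a dictionary `delta` mapping `(q, (u, d))` to a non-empty set of successor states (non-blocking), the function name `controlled_invariant_set` is hypothetical, and the small automaton is made up, since the transitions of Figure 6.4 are not reproduced here. Instead of tracking the decreasing index i, the loop stops as soon as the predecessor step no longer changes the set.

```python
from itertools import product


def controlled_invariant_set(U, D, delta, F):
    """Fixed-point iteration in the spirit of Algorithm 6.0 (illustrative sketch).

    delta[(q, (u, d))] is the non-empty set of successors of q under (u, d).
    Returns the largest subset of F that u can keep invariant against any d.
    """
    W = set(F)  # W^0 = F
    while True:
        # Predecessor step: W^{i-1} = W^i ∩ {q : ∃u ∀d, δ(q, (u, d)) ⊆ W^i}
        W_prev = {
            q for q in W
            if any(all(delta[(q, (u, d))] <= W for d in D) for u in U)
        }
        if W_prev == W:  # W^{i-1} = W^i: fixed point reached
            return W
        W = W_prev


# Hypothetical 4-state automaton (not Figure 6.4); "x" lies outside F.
U = D = (1, 2)
F = {"a", "b", "c"}
delta = {}
for u, d in product(U, D):
    delta[("a", (u, d))] = {"a"} if u == 1 else {"a", "c"}
    delta[("b", (u, d))] = {"a"} if u == d else {"c"}
    delta[("c", (u, d))] = {"x"} if d == 1 else {"c"}
    delta[("x", (u, d))] = {"x"}

print(sorted(controlled_invariant_set(U, D, delta, F)))  # ['a']
```

On this automaton, c is removed in the first pass (d = 1 sends it to x), b in the second (whatever u does, d can steer it to c, which is no longer in the set), and the iteration stabilizes at {a}.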

6.6.3 Relation to Gaming

What does any of this have to do with games? To make the connection, introduce a value function:

$$
J : Q \times \mathbb{Z}_- \to \{0, 1\} \qquad (6.5)
$$

Consider the difference equation:

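$$
J(q, 0) = \begin{cases} 1 & q \in F \\ 0 & q \in F^c \end{cases}
$$

$$
J(q, i-1) - J(q, i) = \min\Big\{0,\ \max_{u \in U} \min_{d \in D} \Big[\min_{q' \in \delta(q,(u,d))} J(q', i) - J(q, i)\Big]\Big\} \qquad (6.6)
$$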

We will refer to equation (6.6) as a discrete Hamilton-Jacobi equation.

Proposition 6.7 (Winning States for u) A fixed point J^∗ : Q → {0, 1} of (6.6) is reached in a finite number of iterations. The set of states produced by the algorithm is W^∗ = {q ∈ Q | J^∗(q) = 1}.

Proof: We show more generally that Wⁱ = {q ∈ Q | J(q, i) = 1}. The proof is by induction (see [58]).
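Proposition 6.7 can be checked numerically by iterating the difference equation (6.6) directly. The sketch below is a hypothetical Python illustration: it assumes δ is encoded as a dictionary `delta[(q, (u, d))]` holding a non-empty set of successors, uses a small made-up automaton rather than the one in Figure 6.4, and the function name `discrete_hj_fixed_point` is ours. The states where the fixed point equals 1 should coincide with the set produced by Algorithm 6.0.

```python
from itertools import product


def discrete_hj_fixed_point(Q, U, D, delta, F):
    """Iterate the difference equation (6.6) until J(., i-1) = J(., i)."""
    J = {q: 1 if q in F else 0 for q in Q}  # J(q, 0): indicator function of F
    while True:
        J_prev = {}
        for q in Q:
            # max_u min_d [ min_{q' in delta(q, (u, d))} J(q', i) - J(q, i) ]
            best = max(
                min(min(J[s] for s in delta[(q, (u, d))]) - J[q] for d in D)
                for u in U
            )
            J_prev[q] = J[q] + min(0, best)  # outer min: J never increases
        if J_prev == J:  # fixed point J* reached
            return J
        J = J_prev


# Hypothetical automaton (not Figure 6.4); "x" lies outside F.
Q = ("a", "b", "c", "x")
U = D = (1, 2)
F = {"a", "b", "c"}
delta = {}
for u, d in product(U, D):
    delta[("a", (u, d))] = {"a"} if u == 1 else {"a", "c"}
    delta[("b", (u, d))] = {"a"} if u == d else {"c"}
    delta[("c", (u, d))] = {"x"} if d == 1 else {"c"}
    delta[("x", (u, d))] = {"x"}

J_star = discrete_hj_fixed_point(Q, U, D, delta, F)
print(sorted(q for q in Q if J_star[q] == 1))  # ['a']
```

Here J drops to 0 at c in the first iteration and at b in the second, after which nothing changes, so {q : J^∗(q) = 1} = {a}, which is also what Algorithm 6.0 returns on this automaton.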

This is beginning to look more game-like. Consider what happens when a fixed point of (6.6) is reached. Ignoring the outermost min for the time being leads to:

$$
J^*(q) = \max_{u \in U} \min_{d \in D} \min_{q' \in \delta(q,(u,d))} J^*(q') \qquad (6.7)
$$

Notice the similarity between this and the sequence-based characterization of equation (6.8) (excuse the overloading of the notation): the two equations look very similar. The only difference is that, instead of having to worry about all feedback controllers, all disturbance trajectories and all possible executions, we reduce the problem to a pointwise argument. Clearly, equation (6.7) is much simpler to work with than equation (6.8). This is a standard trick in dynamic programming. Equation (6.7) is a special case of what is known in dynamic programming as Bellman’s equation, while the difference equation (6.6) is a form of value iteration for computing a fixed point of Bellman’s equation.

What about the outer min? It is an inevitable consequence of turning the sequence argument of equation (6.8) into a pointwise argument. It is there to prevent states that have been labeled as unsafe at some point from being relabeled as safe later on. If it were not there, then in the example of Figure 6.4 state q₁₀ would be labeled as safe after one step of the algorithm (i.e., J(q₁₀, −1) = 1). The extra min operation implements the intersection with Wⁱ in the algorithm of the previous section, ensures the monotonicity of Wⁱ with respect to i, and guarantees termination.
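The role of the outer min is easy to see on a two-state example. In the hypothetical Python sketch below (the function `hj_step` and the automaton are illustrative assumptions, not taken from the text), state y lies outside F = {a} but every input sends it into F. Dropping the outer min turns the update into J(q, i − 1) = max_u min_d min_{q'} J(q', i), which relabels y as safe after one step, the same behaviour described for q₁₀.

```python
def hj_step(J, U, D, delta, outer_min=True):
    """One step of (6.6): compute J(., i-1) from J(., i).

    With outer_min=False the update becomes
    J(q, i-1) = max_u min_d min_{q'} J(q', i), i.e. the outer min is dropped.
    """
    J_next = {}
    for q in J:
        best = max(
            min(min(J[s] for s in delta[(q, (u, d))]) - J[q] for d in D)
            for u in U
        )
        J_next[q] = J[q] + (min(0, best) if outer_min else best)
    return J_next


# Hypothetical automaton: F = {"a"}; "y" lies outside F but every input sends it to "a".
U = D = (1, 2)
delta = {(q, (u, d)): {"a"} for q in ("a", "y") for u in U for d in D}
J0 = {"a": 1, "y": 0}  # J(q, 0) = 1 on F, 0 on its complement

print(hj_step(J0, U, D, delta))                   # {'a': 1, 'y': 0}
print(hj_step(J0, U, D, delta, outer_min=False))  # {'a': 1, 'y': 1}
```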

The fixed point of equation (6.6) also provides a simple characterization of a least restrictive controller that renders W^∗ invariant:

Proposition 6.8 (Characterization of W^∗ and g) W^∗ is the maximal controlled invariant subset of F and g is the unique, least restrictive feedback controller that renders W^∗ invariant.
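Once W^∗ is known, one natural way to write down a least restrictive feedback map is to allow, at each q ∈ W^∗, every input u for which no choice of d can leave W^∗; any controller that keeps W^∗ invariant must pick its inputs from this set. The Python sketch below assumes this construction, a dictionary encoding `delta[(q, (u, d))]` of δ, and a small made-up automaton; the function name `least_restrictive_controller` is hypothetical. Because g is set-valued, a state may admit several safe inputs.

```python
def least_restrictive_controller(W, U, D, delta):
    """g(q) = {u in U : delta(q, (u, d)) is a subset of W for every d in D}, for q in W."""
    return {
        q: {u for u in U if all(delta[(q, (u, d))] <= W for d in D)}
        for q in W
    }


# Hypothetical automaton (not Figure 6.4): F = {"a", "b"}, "x" lies outside F.
U = D = (1, 2)
delta = {}
for u in U:
    for d in D:
        delta[("a", (u, d))] = {"a"} if u == 1 else {"b"}
        delta[("b", (u, d))] = {"x"} if (u, d) == (2, 2) else {"a"}
        delta[("x", (u, d))] = {"x"}

# For this automaton W* = F: with u = 1, a stays at a and b moves to a whatever d does.
W_star = {"a", "b"}
g = least_restrictive_controller(W_star, U, D, delta)
print(sorted(g["a"]), sorted(g["b"]))  # [1, 2] [1]
```

Both inputs are safe at a, while at b the input u = 2 is excluded because d = 2 would drive the state to x.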

6.7 Dynamic Programming

Recall that we would like to find functions:

$$
(x, u) : \mathbb{R} \to \mathbb{R}^n \times \mathbb{R}^m
$$

where L: Rⁿ × R^m → R, φ: Rⁿ → R, f: Rⁿ × R^m → Rⁿ is Lipschitz continuous in x and continuous in u, and ψ: Rⁿ → R.

In Lecture 24 we discussed how this problem can be related to the solution of a partial differential equation, known as the Hamilton-Jacobi-Bellman equation, by introducing the value function (or “cost to go” function), J^∗: Rⁿ × R → R, given by:

We argued that if a continuously differentiable solution to the Hamilton-Jacobi-Bellman partial differential equation [22]:

Equation (6.12) can be written more compactly if we introduce the Hamiltonian, H: Rⁿ × Rⁿ × R^m → R:

Notice that we have effectively turned the problem of optimizing a cost over the space of curves (an infinite-dimensional vector space) into optimizing a cost pointwise over a finite-dimensional vector space. Of course, to achieve this we are still required to solve a partial differential equation.
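To make the pointwise step concrete, the hypothetical Python sketch below minimizes a Hamiltonian over a compact input set for a fixed state x and a fixed co-state p (standing in for ∂J^∗/∂x(x, t)). Since the data are not fixed here, it assumes control-affine dynamics ẋ = f(x) + g(x)u with scalar u ∈ [U₁, U₂], a running cost L(x) that does not depend on u, and H(x, p, u) = L(x) + pᵀ(f(x) + g(x)u); all function names are illustrative. Under these assumptions H is affine in u, so a brute-force grid search agrees with a closed-form rule that picks an endpoint of the interval according to the sign of pᵀg(x).

```python
def hamiltonian(x, p, u, f, g, L):
    """H(x, p, u) = L(x) + p^T (f(x) + g(x) u) for control-affine dynamics, scalar u."""
    return L(x) + sum(pi * (fi + gi * u) for pi, fi, gi in zip(p, f(x), g(x)))


def minimize_pointwise(x, p, f, g, L, U1, U2, n_grid=201):
    """Brute-force minimization of H(x, p, .) over a uniform grid on U = [U1, U2]."""
    grid = [U1 + (U2 - U1) * k / (n_grid - 1) for k in range(n_grid)]
    return min(grid, key=lambda u: hamiltonian(x, p, u, f, g, L))


def bang_bang(x, p, g, U1, U2):
    """Closed-form minimizer: the sign of p^T g(x) selects an endpoint (None if it is 0)."""
    s = sum(pi * gi for pi, gi in zip(p, g(x)))
    if s > 0:
        return U1
    if s < 0:
        return U2
    return None  # singular case: H does not depend on u, every u in [U1, U2] minimizes it


# Hypothetical double integrator with unit running cost (minimum time), U = [-1, 1].
def f(x):
    return (x[1], 0.0)


def g(x):
    return (0.0, 1.0)


def L(x):
    return 1.0


x = (0.0, 0.0)
for p in [(1.0, 0.5), (1.0, -0.5)]:
    print(minimize_pointwise(x, p, f, g, L, -1.0, 1.0), bang_bang(x, p, g, -1.0, 1.0))
# -1.0 -1.0
# 1.0 1.0
```

For this made-up system pᵀg(x) reduces to the second component of p, so the minimizer jumps from −1 to 1 as that component changes sign.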

Remarks:

1. The solution is close to the solution obtained through the calculus of variations: simply replace the co-state p by $\frac{\partial J^*}{\partial x}(x, t)$.

2. On the positive side, dynamic programming (unlike the calculus of variations) inherently leads to feedback solutions.

3. On the negative side, dynamic programming requires one to assume differentiability of the value functions.

4. Differentiability is difficult to guarantee. Even if all the data is “smooth”, the solution to the PDE may still develop “corners” (known as shocks). The situation is exacerbated by the fact that the min operation is continuous but not smooth.

5. Optimal controls are often bang-bang. Consider the system:

Notice that u^∗ switches between its extreme values whenever $\frac{\partial J^*}{\partial x}(x, t)\, g(x)$ changes sign. Therefore, even if J^∗ is continuously differentiable, H^∗ is continuous, but not continuously differentiable.

6. If U is not compact, then the optimal controls may be undefined pointwise (consider, for example, the previous situation with U = (−∞, ∞) or U = (U₁, U₂)).

7. The situation may be even worse if U is not convex. The optimal controls may only

In document Hybrid Systems: Modeling, Analysis and Control (Page 121-126)