Synthesis of relaxed optimal inputs - Numerical Methods for Nonlinear Optimal Control Problems

Now that we have established a theoretical foundation for the equivalence between computing relaxed optimal control inputs and the Pontryagin Minimum Principle, we focus our attention on the development of numerical algorithms to synthesize approximated optimal control inputs. As we show in this section, the use of relaxed inputs is beneficial not only from a theoretical point of view, but also in our numerical algorithms, since the optimal control problem becomes equivalent to solving a sequence of convex optimization problems, even when the dynamical system is nonlinear.

6.3.1 Vector field representation

As we show in Section 6.2.2, it is possible to formulate an iterative gradient descent method using relaxed inputs that converges to Pontryagin-optimal points. This theoretical iterative method revolves around computing the value of $\theta_h$, as defined in equation (6.13), which aims to find a new stochastic process that locally reduces the cost function of the problem in equation (6.7).

Proposition 6.6. The optimization problem in equation (6.13) is convex.

We omit a detailed proof, but it can be shown that the feasible set and cost function of problem (6.13) are both convex.

We now focus our attention on methods to numerically represent the feasible set of the problem in equation (6.13).

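Proposition 6.7. Let $x \in \mathbb{R}^n$. Then

$$
\left\{ \int_{\mathbb{R}^m} f(x, u)\, d\mu(u) \;\middle|\; \mu \in M(\mathbb{R}^m),\ \operatorname{supp}(\mu) \subset U \right\} = \operatorname{co}\{ f(x, u) \mid u \in U \}, \tag{6.17}
$$

where $\operatorname{co}(S)$ is the convex hull of a set $S \subset \mathbb{R}^n$.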

We omit a detailed proof, but the result follows since the convex hull is clearly contained in the left-hand side set of all possible expected values, and every expected value can be written as the limit of a sequence of points in the convex hull, since all Lebesgue-Stieltjes integrals can be approximated using finite Riemann sums [44].

The result in Proposition 6.7 suggests that to synthesize optimal inputs, we need only be able to compute the convex hull of the vector field at any given state $x \in \mathbb{R}^n$. Now, some points in the convex hull of the vector field have a particular input vector $u \in U$ associated with them, but others cannot be directly realized.

On the other hand, every point in the convex hull is, by definition, a convex combination of a finite set of vectors in the vector field. Hence, we can approximate the convex hull of the vector field at each $x \in \mathbb{R}^n$ with the convex hull of a finite collection of samples of the vector field for a set $\{\hat{u}_i\}_{i=1}^{N_s}$. Moreover, as shown in [125] and [126], we can approximate arbitrarily well the trajectories resulting from any convex combination of vectors in a finite vector field by using a projection operation that results in switching between those vectors. In the next section we give an overview of the main results of this method.
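To make the sampled convex hull concrete, the following minimal sketch tests whether a desired velocity lies in the convex hull of finitely many samples of the vector field, and recovers the convex weights when it does. The vector field `f`, the sample set `u_hat`, and the helper `hull_weights` are hypothetical choices made for illustration only; they do not come from the text. The membership test is posed as a linear feasibility problem and solved with `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def f(x, u):
    """Hypothetical planar vector field (an assumption, not from the text)."""
    return np.array([np.cos(u) + 0.1 * x[0], np.sin(u)])


def hull_weights(samples, target):
    """Find w >= 0 with sum(w) = 1 and samples.T @ w = target.

    samples has shape (N_s, n). Returns the weight vector, or None when
    target lies outside the convex hull of the samples.
    """
    n_s = samples.shape[0]
    # Equality constraints: the convex combination hits the target, weights sum to 1.
    A_eq = np.vstack([samples.T, np.ones((1, n_s))])
    b_eq = np.concatenate([target, [1.0]])
    # Zero cost: we only ask whether a feasible weight vector exists.
    res = linprog(np.zeros(n_s), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n_s)
    return res.x if res.success else None


x = np.array([0.0, 0.0])
u_hat = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)  # sampled inputs
samples = np.array([f(x, u) for u in u_hat])

# At x = 0 every sample has unit norm, so no single input realizes zero velocity,
# yet zero velocity is a convex combination of the samples.
inside = hull_weights(samples, np.array([0.0, 0.0]))
outside = hull_weights(samples, np.array([2.0, 0.0]))
```

Here `inside` is a valid weight vector for a velocity that no single input produces, while `outside` is `None` because the requested velocity is not reachable by any relaxed input built from these samples.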

6.3.2 Sample-Based Synthesis

First we are going to show that by using samples we can obtain approximations of the convex hull and the stochastic process. Given a control's stochastic process, $\mu$, and its trajectory $x^{(\mu)}$ for the system in problem (6.7), at time $t \in [0, T]$ the sampling points for the vector field are $\{ f(t, x^{(\mu)}(t), u_i) \}_{i=1}^{N_s}$, where $\{u_i\}_{i=1}^{N_s} \subset U$. Each sampling point has a corresponding weight, and the sequence of weights $\{w_i(t)\}_{i=1}^{N_s}$ satisfies

$$
\sum_{i=1}^{N_s} w_i(t) = 1, \tag{6.18}
$$

$$
w_i(t) \ge 0, \quad \forall i \in \{1, \dots, N_s\}. \tag{6.19}
$$

We use these sampling points and their weights to represent the stochastic process. Asymptotically, the sampling and approximation converge to $\mu$.

Proposition 6.8.

$$
\sum_{i=1}^{N_s} w_i(t)\, f\bigl(t, x^{(\mu)}(t), u_i\bigr) \;\to\; \int_{\mathbb{R}^m} f\bigl(t, x^{(\mu)}(t), u\bigr)\, d\mu_t(u),
$$

asymptotically as $N_s \to \infty$, for a.e. $t \in [0, T]$.

We omit a detailed proof, but the result is based on the Monte Carlo principle and particle filtering [5, 9].
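The convergence in Proposition 6.8 can be illustrated numerically in its simplest form: plain Monte Carlo with inputs drawn independently from $\mu_t$ and equal weights $w_i = 1/N_s$. The sketch below is only an illustration under assumptions that are not in the text: a made-up vector field `f`, a scalar state, and $\mu_t$ taken as the uniform distribution on $U = [-1, 1]$, for which the exact expectation is known in closed form. It does not implement particle filtering.

```python
import numpy as np


def f(t, x, u):
    """Hypothetical vector field (an assumption); u may be an array of samples."""
    u = np.asarray(u, dtype=float)
    return np.stack([x + u, u**2], axis=-1)


def weighted_field(t, x, u_samples, weights):
    """Weighted sum of the sampled vector field, as on the left of Proposition 6.8."""
    return weights @ f(t, x, u_samples)


rng = np.random.default_rng(0)
t, x = 0.0, 0.5
# For u uniform on [-1, 1]: E[x + u] = x and E[u^2] = 1/3.
exact = np.array([x, 1.0 / 3.0])

errors = {}
for n_s in (10, 1_000, 100_000):
    u_samples = rng.uniform(-1.0, 1.0, size=n_s)  # samples drawn from mu_t
    weights = np.full(n_s, 1.0 / n_s)             # satisfies (6.18) and (6.19)
    errors[n_s] = np.linalg.norm(weighted_field(t, x, u_samples, weights) - exact)
```

With independent samples the error in `errors` is expected to shrink roughly like $1/\sqrt{N_s}$, although any single run fluctuates around that rate.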

A similar result holds for approximating the stochastic process's variation, $\delta\mu$, with sampling points in the vector field.

Since numerically the number of sampling points is finite and, at each time step, the only difference between the samples is the control value in the vector field, we get a relaxed switched system. At time $t$, the probability of applying the control value $u_i$ is its weight $w_i(t) \in [0, 1]$. In order to reconstruct a deterministic control, we need to project the weights $\{w_i(t)\}_{i=1}^{N_s}$ from $[0, 1]^{N_s}$ into $\{0, 1\}^{N_s}$ and, at the same time, preserve the trajectory $x^{(\mu)}$ and keep the sum of the $w_i(t)$ equal to 1. We use a wavelet transformation and pulse-width modulation (PWM) to perform the projection. After the wavelet transformation, the weight $w_i(t)$ becomes piecewise constant in energy spectrum space, and by using the PWM operator we project the weight back into time space with only 0/1 values. Details about reconstructing a deterministic control from a relaxed switched system can be found in Section 4.4 of [125].
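The PWM step alone can be sketched as a duty-cycle allocation. The code below assumes the weights are already piecewise constant on a set of windows (the role played by the wavelet step, which is omitted here) and splits each window into `n_sub` substeps, activating mode $i$ for a fraction $w_i$ of the window. The function name `pwm_project` and its parameters are hypothetical, and this is a simplified illustration rather than the projection operator of [125].

```python
import numpy as np


def pwm_project(weights, n_sub):
    """Project relaxed weights onto 0/1 switching signals by duty cycling.

    weights: array (n_windows, n_modes), rows non-negative and summing to 1.
    Returns an array (n_windows * n_sub, n_modes) with exactly one 1 per row.
    """
    n_windows, n_modes = weights.shape
    # Switching instants inside each window, as fractions of the window length.
    edges = np.cumsum(weights, axis=1)
    # Midpoints of the substeps inside one window.
    tau = (np.arange(n_sub) + 0.5) / n_sub
    # Active mode = number of switching instants already passed at each substep.
    active = (tau[None, :, None] >= edges[:, None, :]).sum(axis=2)
    active = np.minimum(active, n_modes - 1)  # guard against round-off in the last edge
    switched = np.zeros((n_windows, n_sub, n_modes))
    np.put_along_axis(switched, active[:, :, None], 1.0, axis=2)
    return switched.reshape(n_windows * n_sub, n_modes)


weights = np.array([[0.2, 0.5, 0.3],
                    [0.6, 0.4, 0.0]])
n_sub = 100
switched = pwm_project(weights, n_sub)
# Average of each 0/1 signal over a window recovers the relaxed weight.
window_avg = switched.reshape(len(weights), n_sub, -1).mean(axis=1)
```

Each row of `switched` selects exactly one control value, so the sum of the projected weights stays equal to 1, and `window_avg` matches `weights` up to a resolution of `1 / n_sub`. Shorter windows bring the switched trajectory closer to the relaxed one.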

Now we present the numerical version of Algorithm 2. In the algorithm, the sampling values for the control are fixed as $\{u_i\}_{i=1}^{N_s} \subset U$. At each iteration, the control's stochastic process, $\mu_k$, and its vector field at time $t$ are approximated by the samples' weights, $\{w_{k,i}(t)\}_{i=1}^{N_s}$. Also, we define the approximated weights of $\delta\mu_k$ from equation (6.13) as $\{\hat{w}_{k,j}(t)\}_{j=1}^{N_s}$. The weights $\{\hat{w}_{k,j}(t)\}_{j=1}^{N_s}$ are derived from the discrete version of equation (6.13) with a regularity term formulated by taking the $\ell_1$ norm of the weights,

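$$
\begin{aligned}
\theta_h\bigl(x^{(\mu_k)}, \{w_{k,i}\}_{i=1}^{N_s}\bigr) = \min_{\hat{w}_{k,i}(t) \in \mathbb{R}} \ & \int_0^T p_0(t)' \sum_{i=1}^{N_s} f\bigl(t, x^{(\mu_k)}(t), u_i\bigr)\, \hat{w}_{k,i}(t)\, dt + \eta_w \int_0^T \bigl\| \hat{w}_{k,i}(t) \bigr\|_1\, dt, \\
\text{subject to: } & \text{for a.e. } t \in [0, T], \\
& w_{k,i}(t) + \hat{w}_{k,i}(t) \ge 0, \quad \forall i \in \{1, \dots, N_s\}, \\
& \sum_{i=1}^{N_s} \hat{w}_{k,i}(t) = 0.
\end{aligned}
\tag{6.20}
$$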
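Because the constraints of problem (6.20) hold pointwise in time, a time discretization decouples it into one small problem per time node, each of which is a linear program once the $\ell_1$ term is rewritten with the standard split $\hat{w} = a - b$, $a, b \ge 0$. The sketch below solves the problem at a single time node. The function name `descent_weights` and the numbers in the example are hypothetical; the vector `c` stands for the precomputed products $p_0(t)' f(t, x^{(\mu_k)}(t), u_i)$, which are assumed to be available from a costate computation not shown here.

```python
import numpy as np
from scipy.optimize import linprog


def descent_weights(c, w, eta_w):
    """Solve the single-time-node version of (6.20) as a linear program.

    c[i] : value of p_0(t)' f(t, x(t), u_i) for sample i (assumed given)
    w[i] : current weights, non-negative and summing to 1
    eta_w: positive weight of the l1 regularity term
    Returns (w_hat, theta), the weight update and the optimal value.
    """
    c = np.asarray(c, dtype=float)
    w = np.asarray(w, dtype=float)
    n_s = len(w)
    # Split w_hat = a - b with a, b >= 0, so ||w_hat||_1 = sum(a + b) at the optimum.
    cost = np.concatenate([c + eta_w, -c + eta_w])
    # sum_i (a_i - b_i) = 0 keeps the total weight equal to 1.
    A_eq = np.concatenate([np.ones(n_s), -np.ones(n_s)])[None, :]
    # w_i + a_i - b_i >= 0 is rewritten as -a_i + b_i <= w_i.
    A_ub = np.hstack([-np.eye(n_s), np.eye(n_s)])
    res = linprog(cost, A_ub=A_ub, b_ub=w, A_eq=A_eq, b_eq=[0.0],
                  bounds=[(0, None)] * (2 * n_s))
    w_hat = res.x[:n_s] - res.x[n_s:]
    return w_hat, res.fun


c = np.array([1.0, -2.0, 0.5])   # illustrative values only
w = np.array([0.5, 0.2, 0.3])
w_hat, theta = descent_weights(c, w, eta_w=0.1)
w_new = w + w_hat                # still a valid weight vector
```

In this example the update moves weight toward the sample with the smallest entry of `c`, and `theta` is non-positive because `w_hat = 0` is always feasible. A large `eta_w` makes the zero update optimal, which is one way to see how the regularity term damps the step.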

We add a regularity term $\|\hat{w}_{k,j}(t)\|_1$ in equation (6.20), and $\eta_w > 0$ is its weight. The addition of the regularity term is due to numerical computation and the following proposition.

Proposition 6.9. For a signed measure $\nu \in M(\mathbb{R}^m)$, if we define its norm as $\|\nu\|$, then the $\theta_h$ in equation (6.13) with a regularity term is still an optimality function of problem (6.7), and it does not lose its convergence property. We write this new optimality function as

$$
\begin{aligned}
\theta_n\bigl(x^{(\mu_0)}, \mu_0\bigr) = \min_{\delta\mu : [0, T] \to M(\mathbb{R}^m)} \ & \int_0^T \left( p_0(t)' \int_{\mathbb{R}^m} f\bigl(t, x^{(\mu_0)}(t), u\bigr)\, d\delta\mu_t(u) + \|\delta\mu_t\| \right) dt, \\
\text{subject to: } & \operatorname{supp}(\delta\mu_t + \mu_{0,t}) \subset U, \\
& \int_{\mathbb{R}^m} d\delta\mu_t(u) = 0 \quad \text{for a.e. } t \in [0, T].
\end{aligned}
\tag{6.21}
$$

We give a sketch of the proof of Proposition 6.9 here. First, by construction we can show that $\theta_n$ in equation (6.21) is always non-positive. Second, when $\theta_h$ reaches zero, $\theta_n$ is also zero. Hence, $\theta_n$ is an optimality function as well, and adding a regularity term does not change the critical points indicated by the optimality function, and therefore the convergence holds.
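The two steps of the sketch can be written out as follows. This is only an outline, and it rests on an assumption about equation (6.13), which is not reproduced in this section: that $\theta_h$ is the same minimization as (6.21) over the same feasible set, but without the norm term. It also assumes that $\mu_0$ is itself feasible, that is, $\operatorname{supp}(\mu_{0,t}) \subset U$.

```latex
% Step 1: \delta\mu_t \equiv 0 is feasible in (6.21), since
% supp(\mu_{0,t}) \subset U and \int d0 = 0. Its objective value is zero, so
\theta_n\bigl(x^{(\mu_0)}, \mu_0\bigr)
  \le \int_0^T \Bigl( p_0(t)' \cdot 0 + \|0\| \Bigr)\, dt = 0 .

% Step 2: for every feasible \delta\mu the norm term is non-negative, hence
\int_0^T \Bigl( p_0(t)' \int_{\mathbb{R}^m} f\, d\delta\mu_t + \|\delta\mu_t\| \Bigr) dt
  \;\ge\; \int_0^T p_0(t)' \int_{\mathbb{R}^m} f\, d\delta\mu_t \, dt
  \;\ge\; \theta_h\bigl(x^{(\mu_0)}, \mu_0\bigr),
% and minimizing the left-hand side over \delta\mu gives \theta_n \ge \theta_h.

% Combining both steps:
\theta_h\bigl(x^{(\mu_0)}, \mu_0\bigr) \le \theta_n\bigl(x^{(\mu_0)}, \mu_0\bigr) \le 0,
\qquad\text{so}\qquad
\theta_h = 0 \;\Longrightarrow\; \theta_n = 0 .
```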

Now we can show the numerical implementation of algorithm 2 as follows:

In document Numerical Methods for Nonlinear Optimal Control Problems and Their Applications in Indoor Climate Control (Page 101-105)