Alternative Approaches - Solution Methodologies for Optimal Control Problems Driven by Stochast

7 Optimal Control Problems Driven by Stochastic Differential Equations

7.2 Solution Methodologies for Optimal Control Problems Driven by Stochastic Differential Equations

7.2.4 Alternative Approaches

Despite the previously mentioned methods are the most important and influential tools for solving SOCPs, there are more, partially relatively restrictive, ideas.

A first idea presented in[151] addresses the optimal control of nonlinear stochastic systems by considering the corresponding FOKKER-PLANCK-KOLMOGOROVequation. It provides a computational method based on policy iterations in the original infinite dimensional function space and on finite dimensional approximations of the controlled diffusion operator. Hence, low order approximations of this operator allow for significant reductions of the dimensionality.

Related to the MCA method the approach of[68, 69] approximates the diffusion process by a finite-dimensional MARKOVchain through the application of generalized cell to cell mappings.

Unfortunately, the idea suffers from the curse of dimensionality as well.

As both the HJB and MCA methods do not work well for problems that are linear in the control and include stopping the process optimally,[64] provides an approximation of the solution by

polynomial interpolationand solves for the optimal strategy switching between a maximum

S T O C H A S T I C O P T I M A L C O N T R O L CHAPTER 7 t Rh t0 t1 t2 t3 t4 t5 t6 . . . tNh₋₁ τ x0 (a) t Rh t0 t1 x0 (b)

Figure 7.1: Visualization of the MARKOVchain approximation method. The left Figure (a) depicts two sample trajectories of a (controlled) stochastic process together with a chain approximat- ing them. The right Figure (b) shows the difference between the original MARKOVchain approximation of HAROLDKUSHNER[156, 157] and the weak approximation idea of JACEK

KRAWCZYK[148]: While in the original method in one time step transitions to each state y∈ Rh

of the lattice are possible with a positive probability ph(x, y | u) ¾ 0—visualized by the solid and dashed arrows—, with the weak approximation approach—depicted by only the dashed arrows—only a specific small number of grid points are reachable from x , de- pending on the chosen weak scheme. Hence, the transition probabilities are only positive for those grid points that are neighbored to the state points, where the weakly approximated process moves to under decision u.

Note that in this illustration these grid points of the weak approximation are chosen as to match with the sample trajectories of the process. In general this will not be the case: The appropriate grid points are determined by the weak scheme and the decision u depending on the actual time t and state point x .

Especially for high-dimensional biological systems[225] presents an idea of approximating the solution to the continuous problem by using continuous function approximators includ- ing a state augmentation. However, this method is very sensitive to the choice of parameters adjusting the function approximation.

[131] considers a special class of nonlinear, non-quadratic control problems where the nonlinear HJB equation can be transformed into a linear equation. Then, the usual backward integration of the dynamic programming recursion can be replaced by computing expectations and a forward diffusion process. This requires stochastic integration over trajectories that can be described by a path integral.

Another alternative is based on the optimal quantization of stochastic processes, cf.[173, 174, 197], which analyzes the optimal approximation of random vectors by quantized vectors taking only a finite number of values. In financial applications of stochastic control where the under- lying state process can be split into a (multi-dimensional) uncontrolled process_{Yt_}t_∈T and a—usually one-dimensional—controlled process{Zt}t∈T,[198] describes a method in which

the original process is firstly approximated by the EULER-MARUYAMAscheme. Afterwards, the discretized parts { ˜Yt}t∈T and { ˜Zt}t∈T are quantized by an optimal grid making use of the

CHAPTER 7 S T O C H A S T I C O P T I M A L C O N T R O L

quantization of the BROWNian motion (for { ˜Yt}t_∈T), and using an orthogonal lattice and a

simple closest neighbor projection onto it (for{ ˜Z_t}t∈T), respectively. This procedure allows to

preserve the MARKOVstructure of the considered original process to be able to apply general

control theory, in particular dynamic programming, but with largely reduced dimensions.

7.3 Finite Horizon Stochastic Optimal Control and the W

IENER

Chaos Approach

In this section we apply the WIENERchaos approach that we have introduced in Chapter 5 and applied to numerically solve SDEs in the previous one to finite horizon OCPs (7.1) that are driven by controlled stochastic processes. To keep notations clear, we restrict ourselves again to the one-dimensional case n_X = nB= 1. Generalizations to multi-dimensionality can be done straightforwardly.

We consider the SDE (7.1b), where we proceed in a similar way as before to obtain the propagator of the system. Besides the expansion of the state process_{X_t}t_∈T we have to include a second chaos expansion determining the control process_{ut}t_∈T, i.e., (with the basis polyno- mialsΨα(η) defined as before), i.e.,

u_t=X

α∈I

u_α(t)Ψα(η). (7.7)

Remark 7.4

By incorporating expansion (7.7) directly in the propagator obtained for a controlled SDE, we cannot guarantee the assumed feedback character of the MARKOV control u_t = u(t, X_t) anymore.

From a computational point of view, there is another disadvantage of directly implementing the expansion (7.7) of the control process: The final deterministic optimal control problem we want to deduce would contain the same number of state and control functions x_α(t) and u_α(t) in that case. Hence, the resulting problem would be very hard to solve numerically.

The remedy to both problems lies in the following theorem, cf.[116].

Theorem 7.2

Assume that theMARKOVcontrol u_t= u(t, X_t) can be TAYLOR-expanded in terms of X_t. Then by considering the q-th order polynomial

uq(t, X_t) = q X i=0 ˆ u_i(t) X_ti, (7.8)

the original control coefficients u_α(t) of (7.7) are characterized completely by the q+1 new control functionsuˆ_i(t), i = 0, . . . , q, and the state coefficients x_α(t). Furthermore, the resulting control

uq_t is automatically non-anticipative and tends to utfor q→ ∞. 4

Proof In contrast to expanding u(t, Xt) in t, for calculating the expansion in terms of X_t

S T O C H A S T I C O P T I M A L C O N T R O L CHAPTER 7

order numerical integration schemes for SDEs. It suffices to apply a standard (infinite) TAYLOR

expansion in X_t= ξ, yielding u_t= ∞ X n=0 1 n! ∂n_u ∂ xn(t, Xt) Xt=ξ(Xt− ξ) n

which can always be rewritten in powers of X_t. Hence, with defining new control functions ˆ

u_i(t), i ∈ N, as the coefficient terms of these powers, one arrives at the infinite version of (7.8). Similarly, a finite version up to order q can be defined with the q-th term corresponding to the remaining error. The convergence to u_t follows directly and so does the non-anticipativity as we express u_t through the state process which fulfills the property by definition.

Now if we compare (7.7) and (7.8) X α∈I u_α(t)Ψα(η) = q X i=0 ˆ ui(t) X_ti (7.9)

by inserting the chaos expansion (5.30) of X_tand projecting the resulting expression onto the chaos bases, we obtain a system describing the original control coefficients u_α(t) by the new control functions ˆui(t) and the state coefficients xα(t), while having the feedback character of

the MARKOVcontrol included implicitly.

Example 7.1

Assume q = 2. Then the quadratic and non-anticipative expansion of the control process {ut}t∈T is given by (7.7), where the coefficients uα(t) are defined by the system

u_α(t) = û0(t) ·1{α=0}+ û1(t) · xα(t) + û2(t) · X β∈I X 0¶γ¶α C(α, γ, β) x_α−γ+β(t) x_γ+β(t) (7.10)

for allα ∈ I and C(α, γ, β) given by (compare Theorem 5.6, in particular Equation (5.34), [172]) C(α, γ, β) = v tα γ γ + β β α − γ + β β .

All multi-index operations are defined component-wise again, including the binomial coefficient that is calculated as the product of the component’s binomial coefficients. Note as well that by 0 ¶ γ ¶ α it holds α + γ − β ∈ I and γ + β ∈ I.

Combining Theorems 6.1 and 7.2 we obtain a deterministic reformulation of the controlled SDE (7.1b). Hence, the only missing part of our transformation method is the objective function (7.1a) of the original SOCP. But as this is already formulated as an expectation value, it can be rewritten directly in terms of the deterministic coefficients x_α(t) of the state process {Xt}t∈T and the (new) control functions ˆui(t). Hence, confining again to the one-dimensional case for notational reasons, we obtain a deterministic OCP that fully captures the original stochastic problem if we do not apply any form of truncation, i.e., if we do not restrict p, k, or q.

CHAPTER 7 S T O C H A S T I C O P T I M A L C O N T R O L

Corollary 7.1

The deterministic OCP min

{ˆui(·)}i∈N

J(t0, x0,{ˆui(·)}i∈N) (7.11a)

s.t. ˙x_α(t) = b_α t, X_t, ∞ X i=1 ˆ ui(t) Xti + ∞ X j=1 pα jmj(t)σα−_(j) t, X_t, ∞ X i=1 ˆ ui(t) Xti ∀t ∈ T , ∀α ∈ I, (7.11b) x_α(t0) =1{α=0}x0. (7.11c)

is equivalent to the original SOCP (7.1). Having deduced the set of optimal control coefficient

functions{ˆui(t)}t∈T ,i∈N, the optimal MARKOV control policy u(t, Xt) is obtainable immediately

through Theorem 7.2. ₄

In applications, the controlled SDE (7.1b) generally includes nonlinear drift or diffusion terms— e.g., square roots or fractions of stochastic processes—that cannot be treated by the multipli- cation formula (5.33) of chaos expansions and a basic comparison of coefficients. In the field of general Polynomial Chaos (PC), [70] describes an idea to reformulate those nonlineari- ties as complex equation systems. This requires the introduction of auxiliary algebraic states and results in a propagator Differential-Algebraic Equation (DAE) system instead of the Ordi- nary Differential Equation (ODE) system (7.11b). However, this DAE system can be efficiently solved in the context of direct optimal control. Additionally, this allows the consideration of algebraic equations in the original SOCP (7.1) as well. We will illustrate this procedure in the following Chapter 8.3.

The following Algorithm 7.1 summarizes the steps to solve a SOCP by the introduced WIENER

chaos method.

In document Numerical Methods for Random Parameter Optimal Control and the Optimal Control of Stochastic Differential Equations (Page 152-156)