Separated Solution to the HJB - Efficient Methods for Stochastic Optimal Control

The modeling assumption is made that the problem data of Equation (1.18) can be accurately represented, or approximated, with a low rank separated representation.

fi(t, x) = r_fi X l=1 d O k=1 (fi)l_d (3.4)

where rfi is assumed to be small.

There is then the need to approximate the relevant operators present in (1.18), specifically the gradient and Hessian, in a low rank representation. A number of options exist, with varying levels of complexity in the analysis and accuracy, rang- ing from simple finite difference schemes to spectral differentiation techniques [130]. Specifically, the gradient along dimension k can be simply represented as

∇k=I1⊗ · · · ⊗ ∇ ⊗ · · · ⊗Id

while the Hessian has entries∇k,j =∇k· ∇j, and the estimates of the derivative along

an individual coordinate are simply a suitably high order finite difference scheme in one dimension. For instance, the first and second order central finite difference matrices are given by the tri-diagonal matrices

∇x = 1 2h      . .. −1 0 1 . ..      , ∇xx = 1 h2      . .. 1 −2 1 . ..     

Thus, the directional gradient and second order terms may simply be constructed out of rank one representations. For example, using of sums of these rank one terms yields a representation for the Laplacian that has separation rank d. However, such a representation may be not have minimal separation rank for a given accuracy. Other constructions specifically targeting the separated representation exist [120], for example a Laplacian approximation may be made with separation rank two, rather than requiring a full rank-d sum of second order terms.

3.4.1 Separation Rank of the HJB

Determining the separation rank of the Hamilton Jacobi Bellman operator is straight- forward. Denote the separation rank of a vector or operatorX asrX, i.e.,rX are the

number of additive terms in X. Recalling (1.27) and neglecting the time dependent component for the first-exit case, the operator consists of three additive terms.

1 λqΨ =f T₍_∇ xΨ) + 1 2T r((∇xxΨ) Σt).

The state-cost term qΨ is a diagonal operator along each dimension, and thus con- tributes rq. The second, advection term is an inner product between the dynamics

f and the gradient of the desirability, resulting in the multiplication of each element fi by a rank one operator, and then their summation. The contribution from this

component results in separation rank Pd

k=1rfi where rfi is the separation rank of

fi. Finally, the second-order term requires the construction of Σt in (1.25). Here

the growth in the separation rank may be significant, due to the multiplicative contribution of G. However, given diagonal cost matrix R or noise covariance Σ the

number of terms may collapse significantly. The separation rank of the Hamilton Jacobi Bellman operator is simply the sum of these three terms’ rank.

The result is that the separation rank for individual problems may vary over a wide range, depending on the problem data. However, in many problems of interest it remains low. For even apparently complex systems, complexity typically manifests as nonlinear multiplicative terms in the dynamics. This form of complexity, specifically the presence of nonlinearities, effectively adds no cost in terms of separation rank, and it is instead the number of additive terms that are of concern, which is typically small. Furthermore, in many applications the control or noise matrix Σt typically

contain constant terms, corresponding to tensors of separation rank one. Finally, for systems where a high separation rank accumulates, it remains possible to search for low rank structure by performing ALS on the operator before attempting to solve the linear system.

Algorithm 2 Assignment of boundary conditions in an operator _A with separated representation for hypercube boundaryR.

Inputs: Operator _A, hypercube R = x∈_Rd_|_x

i ∈[ai, bi] , boundary value tensor

T, grid points{xi,j}representing the domain of A.

Output: Modified operator _A¯. 1. Define _A˜ :=_A

2. Let˜_I be the identity operator tensor 3. For k = 1. . . d

(a) Set _A˜l

k(i, j) = 0 for all xi,j ∈/ [ai, bi]

(b) Set˜_Il

k(i, j) = 0 for all xi,j ∈/ [ai, bi]

4. Set _A¯ =_A−_A˜ + ˜_I

3.4.2 Representation of Interior Boundary Conditions

Optimal control applications impose irregular boundary conditions on many problems of interest. For example, stabilization to the origin corresponds to a zero-cost point- boundary at the origin. Obstacles or unsafe regions are boundary conditions as well, and typically have value according to some penalty. In temporal problems, dynamic programming can be used to show that the cost of achieving sub-goals in the future relative to the current task manifest as boundary conditions along exit points of the current task. The value along these boundaries equals to the cost-to-go [131], computed from value functions in subsequent tasks, i.e., a set of Dirichlet boundary conditions within the domain.

Essential boundary conditions are imposed by setting the value of grid points to some desired value via linear equalities within the domain. Although in other settings it is desirable to remove the degrees of freedom from within the boundaries to save computational effort, in the context of this work maintaining the symmetry of the discretization grid is a far greater concern. Specifically, Dirichlet boundary conditions are imposed only on regions composed of hypercubes in the domain, allowing us to modify the domain with only a modest increase in the separation rank of the operator. The operator’s initial effect on the hypercube is first extracted, then subtracted,

(a) (b) (c)

Figure 3.1: Illustrations of an inverted pendulum, a VTOL aircraft, and a quadcopter.

leaving a “hole” which can then be filled with the identity and allowing for Dirichlet conditions to be imposed. Specifically, Given an operator _A, first extract the operator’s current effect upon grid points in the hypercube R = x∈_Rd_|_x

i ∈[ai, bi] ,

giving the operator within the domain _A˜, with separation rankr_A˜ ≤r_A. As Dirichlet

boundary conditions are assigned, the values on the discretization grid withinR are set to the identity, yielding the operator ˜_I, which has r˜_I ≤ d. The desired operator

is then formed by subtracting the effects of the original operator within the region and then adding the identity, yielding an operator with bounded separation rank r_A¯ ≤d+ 2r_A

A=A−A˜+ ˜I. (3.5)

In document Efficient Methods for Stochastic Optimal Control (Page 99-102)