Optimal Control Problem (OCP) formulation

Optimization has become an essential tool in many disciplines, from economics to engineering. A system is modeled with variables, objectives and constraints and the goal is to find the set of variables that optimize the specified objective within the constraints. Control design of a system is a trial and error process to find parameters for the system suitable for certain processes. The combination of the two is the optimal control theory: optimal control problems (OCP) consist in solving optimization problems for time dependent processes where states, controls and parameters are simultaneously computed with respect to certain objective functions and set of constraints.

In this thesis, optimal control is used to analyze human movements as well as to generate motions for the humanoid robot HeiCub, where we treat both as dynamic systems that can be described with the rigid multi-body dynamics illustrated in the previous section. Due to the complexity of the dynamics, the optimal control problems are nonlinear and require numerical methods to be solved.

In this section we will first illustrate numerical methods to solve nonlinear optimal control problems, then in particular the direct multiple shooting method is illustrated with more de- tails, with reference to how it was implemented in the software package MUSCOD-II [68],

which is used in this thesis to carry out the computations.

2.2.1 Formulation and numerical solutions of optimal control problems

A generic nonlinear optimal control problem can be defined as follows: min x,u,p,T RT 0 ΦL(t, x(t), u(t), p)d t + ΦM(T, x(T), p) (2.13) subject to: ˙ x(t) = f(t, x(t), u(t), p), t ∈ [0, T] g(t, x(t), u(t), p) ¾ 0 req(x(t₀), .., x(T), p) = 0 rineq(x(t0), .., x(T), p) ¾ 0 . (2.14)

In this formulation, the objective function can be of Lagrangian typeΦ_L(·) or a Mayer type ΦM(·), or a combination of the two, in this case it is commonly called Bolza type. The time

horizon is represented by T , over which the lagrangian type objective function is integrated. The objective function is minimized with respect to the states x(t) ∈ Rnx d_{, the controls u}(t) ∈ Rnu _{and the model parameters p}_{∈ R}np _{(if any).}

The right hand side of ˙x(·) = f(·) is a set of ordinary differential equations (ODE) describing the

dynamics of the considered system, such as described in the previous section as in Eq. (2.9). The continuous path constraints are formulated with g(·) and boundary and point constraints are formulated with r(·), where req are the equality constraints and rineq are the inequality constraints.

In this thesis we treat mainly the problem of walking, which is a multiphase problem with discontinuous dynamics which is described using rigid multi-body dynamics as in the previous section. The same applies to motion generation problems involving different contacts. There- fore multiphase optimal control problems need to be formulated for these classes of problems. Modifications are introduced to the single phase optimal control problem as illustrated above

for the multiphase formulation: min x,u,p,s0,s1,...,s_nph nph−1 P j=0 Rsj+1 sj ΦL(t, x(t), u(t), p)d t + ΦM(sj+1, x(sj+1), p) (2.15) subject to: ˙ x(t) = f_j(t, x(t), u(t), p), t∈ [sj, sj+1] j= 0, . . . , nph− 1, s0= 0, snph= T x(s+_j) = Je_j(x(s−_j), p), j = 0, .., n_ph− 1 g_j(t, x(t), u(t), p) ¾ 0, t ∈ [s_j, s_j₊₁] req(x(0), .., x(T), p) = 0 rineq(x(0), .., x(T), p) ¾ 0 (2.16)

where the objective function, differential equations f_j(·) and constraints can be defined for each phase j = 0, ..., n_ph_{− 1, with n}_ph the total number of phases. The transition function between two subsequent phases is defined byJe_j(x(s−_j), p), where s_j is the time boundary of the phase j, s+_j and s−_j are the time before and after the transition.

In the case of dynamic systems that will be studied in this thesis, the transition functions correspond to the impact dynamics and the continuous phases of different phases correspond to set of dynamic equations with different contact constraints.

Among the existing methods for solving nonlinear optimal control problems numerically, we can identify three main methods: the dynamic programming, indirect methods and direct meth- ods.

Dynamic programming consists in solving the problem by finding the value function that satisfies the Hamilton-Jacobi-Bellman equation (HJB) which is a partial differential equation (PDE), by discretizing the state and control spaces[14]. The optimization process is based on

the principle that the remainder of an optimal trajectory is also optimal, therefore the solution is found iteratively by minimizing the sum of the costs for the next time step and the cost to go from the sampling point reached in the next time step to the goal, where the state sequence is also optimized. The disadvantage of dynamic programming is that the complexity increases exponentially with the number of states and controls, which means that for complex dynamic systems as the ones we treat in this thesis this method is not applicable in a feasible way. In the so called indirect methods the problem is first transformed into a boundary value problem by means of the Pontryagin’s Maximum principle[86], then this boundary value problem

is numerically solved, e.g. with Newton method. Indirect shooting methods were also proposed to solve problems under this formulation. This method provides highly accurate solutions but initialisation is non intuitive and the convergence domain is small, therefore it results to be easy to use with simple cases but difficult for complex systems.

In direct methods first the states and controls are discretized with respect to time (“first dis- cretize, then optimize”), i.e. the infinite dimensional optimization problem is transformed into a finite dimensional one by means of appropriate function approximation. In this way the optimal control problem is ported to a Nonlinear Programming problem (NLP) which can then

be solved using well established optimization techniques such as the Sequential Quadratic Programming (SQP) method. The disadvantage of direct methods is that solutions are local optima, however it has several advantages over the previous two methods, such as the larger convergence domain, a larger variety of problems can be treated and the computational complexity increases with low polynomial order with the complexity of the problem. There are different methods to perform discretization, the main ones are collocation, direct shooting and direct multiple shooting.

A direct method using multiple shooting is chosen for this thesis, which is explained in de- tail in the following.

2.2.2 Direct multiple shooting method

The software package MUSCOD (MUltiple Shooting CODe for optimization) of IWR1, Heidel- berg University, is based on the theory of direct multiple shooting described in [16] and the

first version was released in 1981, implemented in Fortran. It was later improved and ported to C/C++ in the release of MUSCOD-II in [68].

In the direct multiple shooting method states and controls are discretized, the phase times d_j are divided into m_jintervals, over which simple functions (e.g. constant or linear) are chosen for the controls. In this way the continuous optimal control problems is reformulated as an NLP problem solved using the SQP algorithm. The differential equations are solved independetly on each of the multiple shooting intervals in parallel to the NLP. We illustrate in the following how the problem is discretized and solved in MUSCOD-II.

Discretization of controls

The controls of each phase j is divided into subintervals by parametrizing the control function with a piecewise function, e.g. piecewise constant. The time interval of each phase j is divided into a grid: sj= t j 0< t j 1< ... < t j mj−1< t j mj = sj+1, for j= 0, ..., nph− 1. (2.17) On this grid the controls u_j(·) are discritized by means of the piecewise function ϕ(·) and parameter a_ij:

u_j(t, a_ij) = ϕ_j(t, a_ij), for i = 0, ..., m_j− 1 (2.18) where ϕ_j(·) can be chosen to be the same function for all phases or different for each. In the current version of MUSCOD-II, the user can choose the implementation ofϕ_j(·) between piecewise constant, linear or cubic. Higher order can also be used but would increase the complexity of the problem.

In all the applications of this thesis, both for the human model and the HeiCub model, the chosen function is the piecewise constant, i.e.ϕ_j(t, a_ij) = a_ij.

Discretization of states

The discretization grid of the states can be the same as for the controls or different. For sim- plicity, we assume them to be the same, but we would like to remind that in principle they can be different.

The states are parametrized with the multiple shooting nodes, while the time interval between two nodes is called multiple shooting interval.

Given our assumption, with reference to the notations used for controls discretization, the number of multiple shooting intervals of a phase j corresponds to m_j, while the number of multiple shooting nodes is m_j+ 1. This means that the total number of discretized points is nmp= P

nph

j=1mj+ 1.

For each interval the states are parametrized as: ˙

x= f_j(t, x(t), ϕ_j(t, a_ij)), for i = 0, ..., m_j_{− 1} (2.19) where initial values are defined at each interval i:

x(t_ij) = X_ij, for i= 0, ..., m_j_{− 1} (2.20)

with X_ij being the initial values of interval i and phase j, which have to be satified at the

In document The role of compliance in humans and humanoid robots locomotion (Page 33-36)