Risk-Sensitive Nonlinear Control Given: - Optimal Control Problem min

Optimal Control Problem min

Algorithm 2 Risk-Sensitive Nonlinear Control Given:

- Risk sensitivity parameter σ

- System dynamics (21), measurement model (22), and cost function (23)

Initialization:

- Start with a stable control law π(t, x)

repeat

- Forward integrate or propagate the system dynamics to compute a nominal trajectory: xn

1:T, un1:T

- Linear approximation of the dynamics. Eqs. (27)-(28) or (36)-(37)

- Quadratic approximation of the cost. Eqs. (29)-(30) or (38)-(39)

- Forward pass for estimation gains. Eqs. (32)- (33) or (40), (41)

- Backward pass with regularization parameter λ. Eqs. (35) or (43)

- Update control law with line search parameter α,

π(t, x) =un(t) +αl(t) +L(t)(x(t)−xn(t))

untilconvergence or a termination condition is satisfied

Algorithm2summarizes the procedure to compute locally-optimal feedback

control policies sensitive to process and measurement noise. It is important to mention two implementation details used for the discrete-time case, as presented in [39_{]. First of all, when computing the optimal control in Eq. (}42_{), H}_k_{needs to be}

inverted. When it is positive-definite, the unique minimizer can be readily found. If it is not, there exist control directions which would allow to make the cost arbitrarily small. This is obviously not true for the nonlinear system, it appears because of the approximations (compare for example Hkwith Ht). To control this, a regularization

term λI is used to make the sum Hk+λIpositive-definite. This brings an additional

advantage: when the sensitivity parameter takes a large value so as to make the matrix Hkbe negative-definite, by including the regularization term, the feedback

gains being too loose does not happen anymore. An outer-loop regulates λ, similar to [39]. The second detail, is that when propagating the dynamics with the updated

control sequence, the new state trajectory might diverge. By adding a line search with parameter α∈ [0, 1], a control sequence that generates a reduction in the cost can still be found and progress in the optimization can be made.

5.5 f e e d b a c k c o n t r o l a r c h i t e c t u r e f o rk i n o-dynamic movement p l a n s

In the previous section, a method for synthesis of noise-sensitive feedback controllers was derived (as summarized in algorithm2), that requires the definition of a state

dynamics and measurement models (f(x, u)and h(x, u)respectively as defined in (21_)-(22_{)), as well as of the desired cost to be minimized. This section will}

describe how the previously derived algorithm can be applied to stabilize whole- body movement plans on a simulated humanoid robot and will thus define the appropriate models and objective functions required by the algorithm. As explained in the introductory background section1.6, the algorithm described in chapters 3-4for whole-body motion optimization is based on exploiting the kino-dynamic

separability of the centroidal momentum dynamics model, as shown in Figure1.

In the same spirit, the feedback control architecture for tracking of kino-dynamic movement plans, proposed in this chapter, makes use of this decomposition to synthesize independently impedance behaviors for tracking of the desired whole- body motion (kinematic plan) and momentum trajectory (dynamic plan). These optimally designed closed loop behaviors will finally serve as input to an inverse dynamics controller, that while respecting all physical constraints, will compute the torques required to generate in the robot a behavior as close as possible to the desired ones.

In the rest of this section, the design of kino-dynamic feedback controllers and its use within an inverse dynamics controller will be presented.

5_.5.1 _{Optimal Impedance Behavior for Centroidal Momentum Plans}

The dynamics side of the kino-dynamic trajectory optimization approach is in charge of finding dynamically feasible endeffector contact wrenches (forces and torques) as well as contact locations that support the generation of a desired dynamic motion while satisfying all physical dynamics constraints. Thus, the goal of the impedance controller for a dynamic plan is to stabilize the robot state trajectories h around the desired ones hdes_{using the available controls. In other words, the robot state}

trajectories to be stabilized using the available controls ut, namely contact forces fe

and contact torques γe(with respect to the frame at the endeffector pose), include

the robot’s center of mass r and centroidal momenta l, k. Figure31_{shows the}

structure used to synthesize a dynamics feedback controller. The most important components include:

• The performance criteriaJdyncomposed by two terms: φdyn_trackused to reward good tracking of the desired center of mass and momentum trajectories, and

5.5 feedback control architecture for kino-dynamic movement plans 93

 Kh K

Impedance Control for Contact Force Behaviors



h des

Figure 31: This figure shows the inputs (desired momentum and contact force trajectories), outputs (momentum and force impedances) and the definition of process and measurement models as well as of the objective used to optimize an impedance for tracking of momentum and contact force behaviors.

• The process model in this system defined by fdynthat maps previous state

ht−1and control inputs u=

. . . feTγeT . . .

to output state ht.

• The measurement model hdynthat for simplicity, but without loss of generality,

is assumed to consist of a noisy measurement of each state.

• The zero-mean noise parameters ωht and γht that encode the stochasticity

within the process and measurement models respectively. • The inputs to this system are the desired centroidal dynamics hdes

t and contact

force λdes

t trajectories generated by the kino-dynamic trajectory optimizer, and

the outputs are a set of impedance policies Kh, Kλfor closed loop tracking of

the desired dynamic behaviors.

The force gain Kλis the control policy produced by the noise-sensitive algorithm

mapping tracking errors in the state to corrections in the control actions (contact forces λt). The momentum gain Khis obtained by pre-multiplying the force gain

Kλby the control transition matrix of the linearized dynamics, and maps tracking

errors in the state to corrections in the state (center of mass and momentum). Thus, together Khand Kλallow to update nominal momentum trajectories and reference

contact wrenches according to the current errors in state tracking, behaviors which serve then as input to an inverse dynamics controller that attempts to generate the torques to follow them as close as physically possible.

Impedance Control of Whole-Body Motions

Figure 32: This figure shows the inputs (desired momentum and whole-body motion trajectories), output (whole-body gain) and the definition of process and measurement models as well as of the objective used to optimize an impedance for tracking of whole-body motion behaviors.

5.5.2 Optimal Impedance Behavior for Whole-Body Motion Plans

The kinematics side of the kino-dynamic trajectory optimization is in charge of finding a kinematically feasible sequence of robot postures q, velocities ˙q and accelerations ¨q within the robot limits that resemble the dynamically optimized center of mass motion and centroidal momentum. Thus, the goal of the impedance controller for a kinematic plan is to stabilize the robot state trajectories q, ˙q around the desired ones qdes_{, ˙q}des_{using the available controls, namely contact forces λ}

and joint torques τj, such that the motion-induced momentum h=A(q)˙qfollows

the desired dynamics one hdes_{. Consequently, to synthesize a kinematics feedback}

controller, the structure defined in Figure32is used.

Its most important components include: • The performance criteria Jπ

kin composed as in the previous case by two

components, namely: φkin

trackused to reward good tracking performance of de-

sired momentum and kinematic state trajectories; and φkin

t used to regularize

controls and include user-defined rewards.

• The process model corresponds to the equations of motion of a floating-base rigid-body system. The state dynamics are defined by fkinthat maps previous

state x_t−1and control inputs u=h_λT

In document Motion Planning and Feedback Control of Simulated Robots in Multi-Contact Scenarios (Page 105-108)