Problem Formulation - Stochastic Control Foundations Of Autonomous Behavior

Let $f_0 : \mathbb{R}^n \to \mathbb{R}$ and $f : \mathbb{R}^n \to \mathbb{R}^m$, and define the following optimization problem:

$$p^* := \min_{x \in \mathbb{R}^n} f_0(x) \quad \text{s.t.} \quad f(x) \preceq 0. \tag{5.1}$$

The objective of this work is to determine whether problem (5.1) is feasible, i.e., whether there exists $x^\dagger \in \mathbb{R}^n$ such that $f(x^\dagger) \preceq 0$. When no such point exists, we would like to solve a relaxed version of the problem that allows some constraint violation. Most importantly, we want to identify which constraints are the hardest to satisfy, so that the agent can decide which of them to remove and fall back on a laxer notion of feasibility. One way to understand the relative difficulty of satisfying different constraints is through duality theory. Each dual variable can be interpreted as a "cost" or "price" associated with satisfying a given constraint; hence, the larger the dual variable associated with a constraint, the harder that constraint is to satisfy. To formalize these ideas, introduce the slack variable $s \in \mathbb{R}^m_+$ and consider the following relaxation of problem (5.1)

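$$p^*(s) := \min_{x \in \mathbb{R}^n} f_0(x) \quad \text{s.t.} \quad f(x) - s \preceq 0, \tag{5.2}$$

and its associated Lagrangian

$$\mathcal{L}(x,\lambda,s) := f_0(x) + \lambda^\top (f(x) - s), \tag{5.3}$$

where $\lambda \in \mathbb{R}^m_+$. Likewise, let us define the dual function $g(\lambda,s)$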

$$g(\lambda,s) := \min_{x \in \mathbb{R}^n} \mathcal{L}(x,\lambda,s). \tag{5.4}$$

The dual function is a lower bound on the primal optimal value [16, Section 5.1.3]; that is, for all $\lambda \in \mathbb{R}^m_+$ and $s$, we have that

$$g(\lambda,s) \le p^*(s). \tag{5.5}$$

The dual problem is then defined as the best such lower bound,

$$d^*(s) := \max_{\lambda \in \mathbb{R}^m_+} g(\lambda,s). \tag{5.6}$$

Notice that for $s = 0$ we recover the original primal problem (5.1). Duality theory allows us to establish whether a problem is feasible by looking at its dual. Indeed, if the dual problem is unbounded above, i.e., $d^*(s) = \infty$, then by (5.5) $p^*(s) = \infty$, and hence the primal problem is infeasible. Because the dual function is concave (it is the pointwise minimum of functions that are affine in $\lambda$), when the dual problem is unbounded the dual solution

$$\lambda^*(s) := \operatorname*{argmax}_{\lambda \in \mathbb{R}^m_+} g(\lambda,s) \tag{5.7}$$

is also unbounded. The converse holds under strong duality, i.e., when $d^*(s) = p^*(s)$. Sufficient conditions for strong duality are that $f_0(x)$ and $f(x)$ are convex functions and that there exists a strictly feasible point (see, e.g., [16, Section 5.3.2]). We formalize these assumptions next for future reference.

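AS9. We assume $f : \mathbb{R}^n \to \mathbb{R}^m$ is convex and $f_0 : \mathbb{R}^n \to \mathbb{R}$ is $\mu$-strongly convex.

AS10. There exist $x^\dagger \in \mathbb{R}^n$ and $s^\dagger \in \mathbb{R}^m_+$ such that $f(x^\dagger) - s^\dagger \prec 0$.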
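As a small worked illustration of the unboundedness argument and of Assumption 10, consider a hypothetical toy problem (not taken from the source) with $n = 1$, $m = 2$, $f_0(x) = x^2/2$ and $f(x) = (x + 2,\; 1 - x)$. Without slack, the constraints ask for $x \le -2$ and $x \ge 1$ simultaneously, so the problem is infeasible. Minimizing the Lagrangian (5.3) over $x$ gives $x = \lambda_2 - \lambda_1$, and therefore

$$g(\lambda,s) = -\tfrac{1}{2}(\lambda_1 - \lambda_2)^2 + (2 - s_1)\lambda_1 + (1 - s_2)\lambda_2 .$$

For $s = 0$ and $\lambda_1 = \lambda_2 = t$ this evaluates to $g(\lambda, 0) = 3t$, which grows without bound as $t \to \infty$, so $d^*(0) = \infty$, in agreement with the infeasibility of the unrelaxed problem. In contrast, $x^\dagger = 0$ and $s^\dagger = (3, 2)$ give $f(x^\dagger) - s^\dagger = (-1, -1) \prec 0$, so Assumption 10 holds; for $s = s^\dagger$ the dual function is maximized at $\lambda = 0$ and $d^*(s^\dagger) = p^*(s^\dagger) = 0$.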

Under Assumptions 9 and 10, for any $s \succeq s^\dagger$, the primal-dual solution $(x^*(s), \lambda^*(s))$ is also a saddle point of the Lagrangian (5.3) [16, Section 5.4.2]. That is, for all $x \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}^m_+$ it holds that

$$\mathcal{L}(x^*(s),\lambda,s) \le \mathcal{L}(x^*(s),\lambda^*(s),s) \le \mathcal{L}(x,\lambda^*(s),s). \tag{5.8}$$

This saddle point can be found via the Arrow-Hurwicz algorithm [4]. For a fixed $s$, the algorithm descends in $x$ along the negative gradient of the Lagrangian with respect to $x$,

$$\dot{x} = -\nabla_x \mathcal{L}(x,\lambda,s) = -\left( \nabla f_0(x) + \sum_{i=1}^m \lambda_i \nabla f_i(x) \right), \tag{5.9}$$

and ascends in $\lambda$ along the gradient of the Lagrangian with respect to $\lambda$,

$$\dot{\lambda} = \Pi_{\mathbb{R}^m_+}\left(\lambda, \nabla_\lambda \mathcal{L}(x,\lambda,s)\right) = \Pi_{\mathbb{R}^m_+}\left(\lambda, f(x) - s\right), \tag{5.10}$$

where $\Pi_{\mathbb{R}^m_+}(\cdot,\cdot)$ refers to a projected dynamical system over the positive orthant of $\mathbb{R}^m$. This projection ensures that the Lagrange multipliers remain non-negative. The intuition behind the algorithm is that as long as constraint $i$ is satisfied, its corresponding Lagrange multiplier is driven to zero and kept there by the projection, i.e., $\lambda_i = 0$. If the constraint is violated, then $f_i(x) - s_i > 0$ and the corresponding multiplier increases. The primal variable, in turn, descends along a weighted combination of the gradients of the objective and the constraints, so as to reduce the value of all these functions. The weights are the multipliers $\lambda_i$; hence, the relative strength of each constraint gradient is related to how much that constraint is being violated.
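The dynamics (5.9)–(5.10) can be sketched numerically as follows. This is only an illustration under stated assumptions: the continuous-time flow is replaced by a forward-Euler discretization, the projection is implemented as a componentwise clip at zero, and the toy problem ($f_0(x) = x^2/2$, $f(x) = (x + 2,\; 1 - x)$), the step size, the iteration count and all function names are hypothetical choices that do not come from the source.

```python
import numpy as np

# Hypothetical toy problem (not from the source): n = 1, m = 2.
#   f0(x) = x^2 / 2
#   f1(x) = x + 2   (relaxed constraint: x + 2 <= s1)
#   f2(x) = 1 - x   (relaxed constraint: 1 - x <= s2)
def grad_f0(x):
    return x

def f(x):
    return np.array([x + 2.0, 1.0 - x])

def grad_f(x):
    # Entry i is d f_i / dx; constant here because f is affine.
    return np.array([1.0, -1.0])

def arrow_hurwicz(s, step=0.01, iters=5000):
    """Forward-Euler discretization of (5.9)-(5.10) for a fixed slack s."""
    x, lam = 0.0, np.zeros(2)
    for _ in range(iters):
        # (5.9): descend along grad f0 + sum_i lam_i * grad f_i.
        x_next = x - step * (grad_f0(x) + lam @ grad_f(x))
        # (5.10): ascend along f(x) - s, then clip to keep lam >= 0.
        lam = np.maximum(lam + step * (f(x) - s), 0.0)
        x = x_next
    return x, lam

x_star, lam_star = arrow_hurwicz(s=np.array([1.5, 2.0]))
print(x_star, lam_star)
```

For $s = (1.5, 2)$ the relaxed feasible set is $-1 \le x \le -0.5$, and the KKT conditions give $x^*(s) = -0.5$ and $\lambda^*(s) = (0.5, 0)$: the multiplier of the inactive constraint is zero while the active one carries a positive price, which is the behavior the iteration should approach.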

The main drawback of the Arrow-Hurwicz algorithm in this context is that a value $s^\dagger$ that makes problem (5.2) feasible is not known beforehand. To overcome this limitation, we propose to update $x$ and $\lambda$ as in the classic Arrow-Hurwicz algorithm (5.9)–(5.10), together with the following update of the slack variables:

$$\dot{s} = K(K\lambda - s), \tag{5.11}$$

where $K \succ 0$ is a matrix gain. The intuition behind this update is that as long as the constraints of the relaxed problem (5.2) are satisfied, i.e., $\lambda_i = 0$, the slack can be reduced. However, if a constraint is no longer satisfied, we will have $\lambda_i > 0$, which increases the slack of the corresponding constraint. In the next section we show that the solutions of (5.9)–(5.11) are such that $\lim_{t\to\infty} s(t) = s_\infty$, $\lim_{t\to\infty} x(t) = x^*(s_\infty)$, and $\lim_{t\to\infty} \lambda(t) = \lambda^*(s_\infty)$. For the slack variable to converge it must hold that $\dot{s} = 0$, which can only happen if (cf. (5.11))

$$\lambda = K^{-1} s. \tag{5.12}$$

To understand the importance of this condition, recall the idea that dual variables are costs associated with satisfying a constraint. Formally, we have that (cf. [16, Section 5.6.2])

$$\nabla_s p^*(s) \big|_{s = s_\infty} = -\lambda^*(s_\infty). \tag{5.13}$$

Relationship (5.13), combined with the equilibrium condition (5.12), implies that the slack variable in the limit satisfies

$$\nabla_s p^*(s) \big|_{s = s_\infty} = -K^{-1} s_\infty. \tag{5.14}$$

Condition (5.14) allows us to analyze the relative hardness of satisfying the given constraints. Notice that the gain matrix $K$ can be used to assign relative importance to the different constraints; if they are all equally important, we can take $K$ to be the identity matrix. In this case, a larger slack means that the corresponding derivative of $p^*$ is larger in absolute value. Hence, reducing that slack produces a larger increase in the optimal cost.
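The combined dynamics (5.9)–(5.11) and the equilibrium condition (5.12) can be explored with a short simulation. The sketch below is illustrative only and rests on several assumptions that are not in the source: a forward-Euler discretization of the continuous-time flow, a componentwise clip in place of the projected dynamical system, zero initial conditions, $K = I$, and a hypothetical toy problem with $f_0(x) = x^2/2$ and $f(x) = (x + 2,\; 1 - x)$, which is infeasible without slack.

```python
import numpy as np

# Hypothetical toy problem (not from the source):
#   f0(x) = x^2 / 2,  f(x) = (x + 2, 1 - x).
# With s = 0 it is infeasible: x <= -2 and x >= 1 cannot both hold.
def f(x):
    return np.array([x + 2.0, 1.0 - x])

GRAD_F = np.array([1.0, -1.0])  # d f_i / dx, constant because f is affine

def relaxed_arrow_hurwicz(K, step=0.01, iters=20000):
    """Forward-Euler discretization of (5.9)-(5.11), started from zero."""
    x, lam, s = 0.0, np.zeros(2), np.zeros(2)
    for _ in range(iters):
        x_next = x - step * (x + lam @ GRAD_F)                # (5.9), grad f0(x) = x
        lam_next = np.maximum(lam + step * (f(x) - s), 0.0)   # (5.10), clipped at 0
        s_next = s + step * (K @ (K @ lam - s))               # (5.11)
        x, lam, s = x_next, lam_next, s_next
    return x, lam, s

K = np.eye(2)  # equal importance for both constraints
x_inf, lam_inf, s_inf = relaxed_arrow_hurwicz(K)

# Residual of the equilibrium condition (5.12): lam = K^{-1} s.
residual = np.linalg.norm(lam_inf - np.linalg.solve(K, s_inf))
print(x_inf, lam_inf, s_inf, residual)
```

Solving the equilibrium conditions of this toy problem by hand with $K = I$ gives $x = -1/3$ and $s_\infty = \lambda_\infty = (5/3, 4/3)$. The first constraint, which asks for $x \le -2$ and is therefore further from the unconstrained minimizer $x = 0$ than the second, ends up with the larger slack.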

In the next section we formalize the convergence results outlined here, and in Section 5.4 we generalize them to settings in which we cannot evaluate the functions and their gradients but have access to a stochastic model of them.
