
3.3 Globalization Strategies

3.3.1 Merit Function

The most straightforward idea for measuring progress is to combine the two goals, reduction of the objective function and reduction of the constraint violation, into the so-called merit function $\Psi : \mathbb{R}^{n_x} \times \mathbb{R} \to \mathbb{R}$, e.g., defined by

Ψ(x; τ) := f (x) + τθ (x) (3.21)

with a penalty parameter τ ∈ (0, ∞) that balances these two goals.⁵ The measure of constraint violation θ(x) does not need to be defined as in (3.20), but a merit function has to fulfill two necessary conditions:

i. An optimal solution of $\min_{x \in \mathbb{R}^{n_x}} \Psi(x; \tau)$ for τ → ∞ must be an optimal solution of (NLP) and vice versa.

ii. A step $\Delta x^k$ must produce a reduction in the merit function and therefore be a descent direction for it, i.e., $\nabla_x \Psi(x^k; \tau)^\top \Delta x^k < 0$.

These two necessary conditions ensure that with every iteration k the merit function decreases monotonically until the iterates end up at an optimal solution $x^*$. So, optimizing (NLP) becomes equivalent to the unconstrained minimization of (3.21), but with the difference that the step calculation does not rely on the merit function directly.

Popular examples of merit functions are:

i. The ℓp merit functions (cf., Han [113]):

Ψ(x; τ) = f (x) + τ ∥(g(x), max {h(x), 0})∥p, p ∈ {1, 2, ∞} (3.22)

ii. The differentiable ℓ2 merit function for equality constrained problems (cf., Fiacco and McCormick [64, Chapter 4]):

$\Psi(x; \tau) = f(x) + \tfrac{1}{2} \tau \|g(x)\|_2^2$ (3.23)

⁴ Other globalization strategies like Gould and Toint [101] or Liu and Yuan [136] depend on different step calculations and are therefore not considered here.

⁵ It is also possible to position the penalty parameter in front of the objective function, but (3.21) is the common


iii. The augmented Lagrangian merit function for equality constrained problems (cf., Hestenes [116] and Powell [164]):

$\Psi(x; \tau) = f(x) + \lambda^\top g(x) + \tfrac{1}{2} \tau \|g(x)\|_2^2$ (3.24)

iv. The augmented Lagrangian merit function for inequality constrained problems (cf., Arrow et al. [8] and Rockafellar [169]):

$\Psi(x; \tau) = f(x) + \frac{1}{2\tau} \sum_{i=1}^{n_h} \left[ \left( \max\{\nu_i + \tau h_i(x), 0\} \right)^2 - \nu_i^2 \right]$ (3.25)
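The first three merit functions above can be evaluated with a few lines of code. The following is only a minimal sketch: it assumes that equality constraints are written as g(x) = 0 and inequality constraints as h(x) ≤ 0 (as suggested by the term max{h(x), 0}), and the function names as well as the toy problem are hypothetical and chosen for illustration.

```python
import math


def lp_merit(f, g, h, x, tau, p=1):
    """l_p merit function (3.22): f(x) + tau * ||(g(x), max{h(x), 0})||_p."""
    # Equality residuals count in full; inequalities (assumed h(x) <= 0)
    # contribute only through their positive part.
    viol = [abs(gi) for gi in g(x)] + [max(hi, 0.0) for hi in h(x)]
    if p == 1:
        norm = sum(viol)
    elif p == 2:
        norm = math.sqrt(sum(v * v for v in viol))
    elif p == math.inf:
        norm = max(viol, default=0.0)
    else:
        raise ValueError("p must be 1, 2 or math.inf")
    return f(x) + tau * norm


def quadratic_penalty_merit(f, g, x, tau):
    """Differentiable l_2 merit function (3.23) for equality constraints."""
    return f(x) + 0.5 * tau * sum(gi * gi for gi in g(x))


def augmented_lagrangian_merit(f, g, x, lam, tau):
    """Augmented Lagrangian merit function (3.24) for equality constraints."""
    gx = g(x)
    linear = sum(li * gi for li, gi in zip(lam, gx))
    return f(x) + linear + 0.5 * tau * sum(gi * gi for gi in gx)


# Hypothetical toy problem: minimize x0 + x1 s.t. x0 = 1 and 0 <= x1 <= 0.5.
f = lambda x: x[0] + x[1]
g = lambda x: [x[0] - 1.0]
h = lambda x: [-x[1], x[1] - 0.5]

x = [3.0, 2.0]  # infeasible trial point
for p in (1, 2, math.inf):
    print(p, lp_merit(f, g, h, x, tau=10.0, p=p))
print(quadratic_penalty_merit(f, g, x, tau=10.0))
print(augmented_lagrangian_merit(f, g, x, lam=[0.5], tau=10.0))
```

At a feasible point all three functions reduce to f(x) (plus the multiplier term, which vanishes for g(x) = 0), so they differ only in how they price infeasibility.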

Exact Merit Functions

In practice, it is inconvenient that the penalty parameter τ has to go to infinity in order to satisfy the first necessary condition on merit functions stated above. Instead, one would like a finite penalty parameter $\bar{\tau} > 0$ to exist such that this condition holds. For an implementation, it would then be sufficient to choose a parameter larger than $\bar{\tau}$ and never increase it. Merit functions having this additional property are called exact merit functions.

Definition 3.8 (Exact Merit Functions). A merit function Ψ(x; τ) defined by (3.21) is called exact at an optimal solution $x^*$, if there exists a fixed parameter $\bar{\tau} > 0$ such that for all $\tau > \bar{\tau}$ the point $x^*$ is also an optimal solution of $\min_{x \in \mathbb{R}^{n_x}} \Psi(x; \tau)$.

It turns out that the ℓp and augmented Lagrangian merit functions are exact, as stated in the following theorems, but unfortunately the differentiable ℓ2 merit function is not.

Theorem 3.9. Let $x^*$ be an optimal solution of (NLP) satisfying the MFCQ and SOSC. Then, the merit function Ψ(x; τ) = f(x) + τ ∥(g(x), max{h(x), 0})∥p with p ∈ [1, ∞] is exact.

Proof. See Han and Mangasarian [114, Corollary 4.7].

Theorem 3.10. Let $x^*$ be an optimal solution of (NLP) satisfying the MFCQ and SOSC. Then, the merit function $\Psi(x; \tau) = f(x) + \lambda^\top g(x) + \tfrac{1}{2} \tau \|g(x)\|_2^2$ is exact.

Proof. See Hestenes [116, Theorem 2.1].

The drawback of exact merit functions, however, is that the threshold $\bar{\tau}$ is unknown a priori. This requires a strategy to update the penalty parameter during the optimization. Unfortunately, choosing a very large value from the beginning and hoping that it exceeds $\bar{\tau}$ is not a good option, as it can lead to very slow convergence. A very small penalty parameter, on the other hand, can attract the iterates to unbounded infeasible points if the objective function decreases much faster than the constraint violation increases. A survey on exact merit functions is given by Di Pillo [52], which also proposes to use penalty parameters that depend on the constraint violation to overcome the latter drawback.
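The role of the threshold can be made concrete on a one-dimensional toy problem that is not taken from the text: minimize f(x) = x subject to g(x) = x − 1 = 0, with solution x* = 1. For this assumed example the ℓ1 merit function x + τ|x − 1| is minimized exactly at x* once τ > 1, the differentiable ℓ2 merit function x + ½τ(x − 1)² has its minimizer at 1 − 1/τ for every finite τ, and for τ < 1 the ℓ1 merit function is unbounded below. The sketch below checks this numerically with a plain grid search; the grid bounds and resolution are arbitrary choices.

```python
def argmin_on_grid(psi, lo=-1.0, hi=3.0, n=40001):
    """Return the grid point in [lo, hi] with the smallest value of psi."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    return min(grid, key=psi)


def l1_merit(x, tau):
    # l_1 merit function for f(x) = x, g(x) = x - 1.
    return x + tau * abs(x - 1.0)


def l2_merit(x, tau):
    # Differentiable l_2 merit function (3.23) for the same toy problem.
    return x + 0.5 * tau * (x - 1.0) ** 2


for tau in (2.0, 10.0, 100.0):
    x_l1 = argmin_on_grid(lambda x: l1_merit(x, tau))
    x_l2 = argmin_on_grid(lambda x: l2_merit(x, tau))
    # x_l1 stays at the solution 1, x_l2 only approaches it like 1 - 1/tau.
    print(tau, x_l1, x_l2)

# With a penalty parameter below the threshold, the l_1 merit function
# decreases without bound, so the grid minimizer is the left grid boundary.
print(argmin_on_grid(lambda x: l1_merit(x, 0.5)))
```

The last call illustrates the danger of a too small penalty parameter: the minimizer moves to wherever the search region ends, far away from the feasible point.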

Sufficient Decrease Condition

So far it has been neglected that the descent direction property, i.e., $\nabla_x \Psi(x^k; \tau)^\top \Delta x^k < 0$, does not guarantee a sufficient reduction of the merit function Ψ(x; τ) for nonlinear programming, since, similarly to the beginning of Section 3.3, this property is based on local information only. This is the point where the line-search method comes into play and the step $\Delta x^k$ may have to be shortened. In the following, it is assumed that the merit function is differentiable⁶, and the actual reduction

$\Psi(x^k + \alpha_k \Delta x^k; \tau) - \Psi(x^k; \tau)$ (3.26)

is compared with the predicted reduction based on a linear or quadratic Taylor approximation (the second-order term is dropped for the linear model)

$\Psi(x^k; \tau) + \alpha_k \nabla_x \Psi(x^k; \tau)^\top \Delta x^k + \tfrac{\alpha_k^2}{2} (\Delta x^k)^\top \nabla^2_{xx} \Psi(x^k; \tau) \, \Delta x^k - \Psi(x^k; \tau) = \alpha_k \nabla_x \Psi(x^k; \tau)^\top \Delta x^k + \tfrac{\alpha_k^2}{2} (\Delta x^k)^\top \nabla^2_{xx} \Psi(x^k; \tau) \, \Delta x^k$. (3.27)

If the actual reduction is at least a fraction of the predicted reduction, the step is said to be acceptable. In case of a linear model of reduction this yields the Armijo [7] condition

$\Psi(x^k + \alpha_k \Delta x^k; \tau) - \Psi(x^k; \tau) \le \sigma \alpha_k \nabla_x \Psi(x^k; \tau)^\top \Delta x^k \le 0$ (3.28)

with a parameter σ ∈ (0, 1), which is illustrated in Figure 3.1 (left). Wolfe [194, 195] proposes to extend the Armijo condition by

$\nabla_x \Psi(x^k + \alpha_k \Delta x^k; \tau)^\top \Delta x^k \ge \eta \nabla_x \Psi(x^k; \tau)^\top \Delta x^k$, (3.29)

η ∈ (σ, 1), to avoid arbitrarily small step sizes. In practice, however, this further condition is often neglected and instead a value $\alpha_k \in (0, 1]$ that satisfies the Armijo condition and is as large as possible is selected. Note that finding the optimal step size, e.g., by solving $\min_{\alpha_k > 0} \Psi(x^k + \alpha_k \Delta x^k; \tau)$, is not a practical option since it involves the solution of a (nonsmooth) nonlinear program.
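A common way to pick a large step size satisfying the Armijo condition (3.28) is backtracking: start with the full step and shrink it by a constant factor until the condition holds. The sketch below is one possible realization and not the procedure of Algorithm D; the shrink factor `beta`, the iteration limit `max_halvings`, and the function name are assumptions made for illustration. The result is the largest acceptable step within the sequence 1, β, β², ..., not the largest acceptable step overall.

```python
def armijo_backtracking(psi, x, dx, slope, sigma=1e-4, beta=0.5, max_halvings=30):
    """Return the first alpha in {1, beta, beta^2, ...} satisfying (3.28).

    psi   : callable x -> Psi(x; tau) for a fixed penalty parameter tau
    x, dx : current iterate and step as lists of floats
    slope : directional derivative grad_x Psi(x; tau)^T dx (must be negative)
    """
    if slope >= 0.0:
        raise ValueError("dx is not a descent direction for the merit function")
    psi0 = psi(x)
    alpha = 1.0
    for _ in range(max_halvings + 1):
        x_trial = [xi + alpha * di for xi, di in zip(x, dx)]
        # Armijo test: actual reduction vs. a fraction of the linear prediction.
        if psi(x_trial) - psi0 <= sigma * alpha * slope:
            return alpha
        alpha *= beta
    raise RuntimeError("no acceptable step size found")


# Hypothetical smooth merit function Psi(x) = x^2 in one variable.
psi = lambda x: x[0] ** 2

# Overshooting step: x = 1, dx = -2, slope = 2 * 1 * (-2) = -4.
alpha_long = armijo_backtracking(psi, [1.0], [-2.0], slope=-4.0)

# Step that lands on the minimizer: the full step is accepted.
alpha_full = armijo_backtracking(psi, [1.0], [-1.0], slope=-2.0)
print(alpha_long, alpha_full)
```

In the first call the full step jumps across the minimizer without reducing the merit function and is therefore halved once; in the second call the full step already gives sufficient decrease.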

As an example for the SQP method, Algorithm D presents a globally convergent version of Algorithm B under rather strong assumptions.

Theorem 3.11 (Global Convergence of SQP Method with a Merit Function). Let $(x^k, \lambda^k, \nu^k)_k$ be a sequence generated by Algorithm D such that the tuple $(x^k, \lambda^k, \nu^k)$ lies in some compact set for all k, $x^k$ satisfies the LICQ and, for all $d \in \mathbb{R}^{n_x}$,

$c_1 \|d\|^2 \le d^\top \nabla^2_{xx} L(x^k, \lambda^k, \nu^k) \, d \le c_2 \|d\|^2$ (3.30)

with $c_1 > 0$ and $c_2 > 0$. Then, $(x^k, \lambda^k, \nu^k)_k$ converges to a first-order optimal point of (NLP).

Proof. See Boggs and Tolle [19, Theorem 4.3].

⁶ If the merit function is not differentiable, then the linear or quadratic model for the predicted reduction has

[Figure: two panels, "Monotone Merit Function" and "Non-Monotone Merit Function", each plotting f against θ with a forbidden and an acceptable region relative to the point (θ(x^k), f(x^k)). The left panel shows the Armijo condition with slope −ρ; the right panel additionally marks (θ(x^{k−1}), f(x^{k−1})) and (θ(x^{k−2}), f(x^{k−2})).]

Figure 3.1: Monotone merit function (left) and non-monotone merit function (right). The non-monotonicity level on the right is l = 2.

Non-Monotone Merit Functions

Although Theorem 3.11 proves global convergence for an optimization algorithm, the introduction of the merit function, the main extension made in Algorithm D, requires a new study of local convergence, since Theorem 3.6 is based on the full step ($\alpha_k = 1$). One could think that the same properties would hold, but this is actually not true. There exist examples (cf., Powell [165, Section 3]) of search directions $\Delta x^k$ that yield local q-quadratic convergence but increase both the objective function and the constraint violation and would thus be rejected by the merit function. This is known as the Maratos effect [139]. Even in the unconstrained case, the step size can be reduced unnecessarily, for example when the step direction tries to follow a curved valley. Possibilities to avoid this are the modification of the step $\Delta x^k$, in particular second-order-correction steps (cf., Conn et al. [45, Section 15.3.2.3] or Section 3.6.2), or the relaxation of the merit function acceptance criterion (3.28) to allow a non-monotone decrease. Examples include Chamberlain et al. [39], Panier and Tits [157] and Toint [182], which basically exchange (3.28) for

$\Psi(x^k + \alpha_k \Delta x^k; \tau) - \max_{i = 0, \dots, l_m} \Psi(x^{(k-i)_+}; \tau) \le \sigma \alpha_k \nabla_x \Psi(x^k; \tau)^\top \Delta x^k$ (3.31)

and force a decrease with respect to the largest of the most recent merit function values, going back $l_m \in \mathbb{N}$ iterations, see Figure 3.1 (right). Here, $(k-i)_+ := \max\{k-i, 0\}$, so that only iterates that actually exist are referenced. While non-monotone merit function techniques usually complicate the global convergence theory, overall efficiency gains have been reported (cf., Grippo et al. [112]).
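The acceptance test (3.31) differs from the Armijo condition (3.28) only in the reference value. A minimal sketch, assuming the merit function values of all previous iterates are kept in a list (oldest first), could look as follows; the function name and the storage layout are hypothetical and not taken from the cited works.

```python
def nonmonotone_armijo_accepts(psi_trial, psi_history, alpha, slope,
                               sigma=1e-4, l_m=2):
    """Check the non-monotone acceptance criterion (3.31).

    psi_trial   : Psi(x^k + alpha * dx^k; tau)
    psi_history : [Psi(x^0; tau), ..., Psi(x^k; tau)], oldest value first
    slope       : directional derivative grad_x Psi(x^k; tau)^T dx^k
    l_m         : non-monotonicity level; l_m = 0 recovers (3.28)
    """
    if not psi_history:
        raise ValueError("psi_history must contain at least Psi(x^k; tau)")
    # The slice keeps at most the last l_m + 1 values; for k < l_m it simply
    # returns all available values, which matches the index (k - i)_+.
    reference = max(psi_history[-(l_m + 1):])
    return psi_trial - reference <= sigma * alpha * slope


history = [3.0, 2.8, 2.0]  # merit values at x^{k-2}, x^{k-1}, x^k
trial = 2.5                # the trial point increases the merit function

monotone = nonmonotone_armijo_accepts(trial, history, 1.0, -1.0, sigma=0.1, l_m=0)
relaxed = nonmonotone_armijo_accepts(trial, history, 1.0, -1.0, sigma=0.1, l_m=2)
print(monotone, relaxed)
```

In this example the trial step is rejected by the monotone test because it increases the merit function relative to the current iterate, but it is accepted with l_m = 2 because it still lies sufficiently below the largest of the last three values.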
