Merit Function - Globalization Strategies

3.3 Globalization Strategies

3.3.1 Merit Function

The most straightforward way to measure progress is to combine the two goals, reduction of the objective function and reduction of the constraint violation, into a so-called merit function $\Psi : \mathbb{R}^{n_x} \times \mathbb{R} \to \mathbb{R}$, e.g., defined by

$$\Psi(x; \tau) := f(x) + \tau\,\theta(x) \tag{3.21}$$

with a penalty parameter $\tau \in (0, \infty)$ that balances these two goals.⁵ The measure of constraint violation $\theta(x)$ does not need to be defined as in (3.20), but a merit function has to fulfill two necessary conditions:

i. An optimal solution of $\min_{x \in \mathbb{R}^{n_x}} \Psi(x; \tau)$ for $\tau \to \infty$ must be an optimal solution of (NLP) and vice versa.

ii. A step $\Delta x^k$ must produce a reduction in the merit function and therefore be a descent direction for it, i.e., $\nabla_x \Psi(x^k; \tau)^\top \Delta x^k < 0$.

These two necessary conditions ensure that the merit function decreases monotonically with every iteration $k$ until it ends up at an optimal solution $x^*$. Optimizing (NLP) thus becomes equivalent to the unconstrained minimization of (3.21), with the difference that the step calculation does not rely on the merit function directly.
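Condition ii can be checked numerically once a gradient of the merit function is available. The following is a minimal sketch, assuming the differentiable choice $\Psi(x; \tau) = f(x) + \tfrac{1}{2}\tau\|g(x)\|_2^2$ with gradient $\nabla f(x) + \tau J_g(x)^\top g(x)$. The toy problem and the function names are hypothetical and serve only as illustration.

```python
import numpy as np

def quadratic_penalty_gradient(grad_f, g, jac_g, x, tau):
    """Gradient of f(x) + 0.5 * tau * ||g(x)||_2^2 with respect to x."""
    gx = np.atleast_1d(g(x))
    return grad_f(x) + tau * jac_g(x).T @ gx

def is_descent_direction(grad_psi, dx):
    """Condition ii: the directional derivative along dx must be negative."""
    return float(grad_psi @ dx) < 0.0

# Hypothetical toy problem: f(x) = x0^2 + x1^2, g(x) = x0 + x1 - 1 = 0.
grad_f = lambda x: 2.0 * x
g = lambda x: np.array([x[0] + x[1] - 1.0])
jac_g = lambda x: np.array([[1.0, 1.0]])

x_k = np.array([2.0, 0.0])
grad_psi = quadratic_penalty_gradient(grad_f, g, jac_g, x_k, tau=10.0)

print(is_descent_direction(grad_psi, np.array([-1.0, -1.0])))  # step towards feasibility
print(is_descent_direction(grad_psi, np.array([1.0, 0.0])))    # step away from feasibility
```

Only the sign of the directional derivative is tested here. Whether a full step along an accepted direction reduces the merit function is a separate question.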

Popular examples of merit functions are:

i. The $\ell_p$ merit functions (cf., Han [113]):

$$\Psi(x; \tau) = f(x) + \tau \left\| \left( g(x), \max\{h(x), 0\} \right) \right\|_p, \quad p \in \{1, 2, \infty\} \tag{3.22}$$

ii. The differentiable $\ell_2$ merit function for equality constrained problems (cf., Fiacco and McCormick [64, Chapter 4]):

$$\Psi(x; \tau) = f(x) + \tfrac{1}{2} \tau \|g(x)\|_2^2 \tag{3.23}$$

⁴ Other globalization strategies like Gould and Toint [101] or Liu and Yuan [136] depend on different step calculations and are therefore not considered here.

⁵ It is also possible to position the penalty parameter in front of the objective function, but (3.21) is the common


iii. The augmented Lagrangian merit function for equality constrained problems (cf., Hestenes [116] and Powell [164]):

$$\Psi(x; \tau) = f(x) + \lambda^\top g(x) + \tfrac{1}{2} \tau \|g(x)\|_2^2 \tag{3.24}$$

iv. The augmented Lagrangian merit function for inequality constrained problems (cf., Arrow et al. [8] and Rockafellar [169]):

$$\Psi(x; \tau) = f(x) + \frac{1}{4\tau} \sum_{i=1}^{n_h} \left[ \left( \max\{\nu_i + \tau h_i(x), 0\} \right)^2 - \nu_i^2 \right] \tag{3.25}$$
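The merit functions (3.22) to (3.24) translate almost directly into code. The sketch below assumes the conventions $g(x) = 0$ and $h(x) \le 0$, which is what the term $\max\{h(x), 0\}$ suggests. The function names and the toy problem are hypothetical.

```python
import numpy as np

def lp_merit(f, g, h, x, tau, p=1):
    """l_p merit function (3.22); p may be 1, 2 or np.inf."""
    violation = np.concatenate([
        np.atleast_1d(g(x)),                    # equality residuals
        np.maximum(np.atleast_1d(h(x)), 0.0),   # violated inequalities only
    ])
    return f(x) + tau * np.linalg.norm(violation, ord=p)

def quadratic_penalty_merit(f, g, x, tau):
    """Differentiable l_2 merit function (3.23), equality constraints only."""
    gx = np.atleast_1d(g(x))
    return f(x) + 0.5 * tau * (gx @ gx)

def augmented_lagrangian_merit(f, g, x, lam, tau):
    """Augmented Lagrangian merit function (3.24), equality constraints only."""
    gx = np.atleast_1d(g(x))
    return f(x) + np.atleast_1d(lam) @ gx + 0.5 * tau * (gx @ gx)

# Hypothetical toy problem: min x0^2 + x1^2  s.t.  x0 + x1 - 1 = 0,  -x0 <= 0.
f = lambda x: x[0] ** 2 + x[1] ** 2
g = lambda x: np.array([x[0] + x[1] - 1.0])
h = lambda x: np.array([-x[0]])

x = np.array([-1.0, 1.0])  # violates both constraints
print(lp_merit(f, g, h, x, tau=10.0, p=1))
print(lp_merit(f, g, h, x, tau=10.0, p=np.inf))
print(quadratic_penalty_merit(f, g, x, tau=10.0))
print(augmented_lagrangian_merit(f, g, x, lam=np.array([-1.0]), tau=10.0))
```

The inequality variant (3.25) is left out of the sketch on purpose.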

Exact Merit Functions

Requiring the penalty parameter $\tau$ to go to infinity in order to satisfy necessary condition i above can be impractical. Instead, one would like a finite penalty parameter $\bar{\tau} > 0$ to exist such that this condition holds. An implementation could then choose this parameter $\bar{\tau}$ and never increase it. Merit functions with this additional property are called exact merit functions.

Definition 3.8 (Exact Merit Functions). A merit function $\Psi(x; \tau)$ defined by (3.21) is called exact at an optimal solution $x^*$ if there exists a fixed parameter $\bar{\tau} > 0$ such that for all $\tau > \bar{\tau}$ the point $x^*$ is also an optimal solution of $\min_{x \in \mathbb{R}^{n_x}} \Psi(x; \tau)$.

It turns out that the $\ell_p$ and augmented Lagrangian merit functions are exact, as stated in the following theorems, but unfortunately the differentiable $\ell_2$ merit function is not.

Theorem 3.9. Let $x^*$ be an optimal solution of (NLP) satisfying the MFCQ and SOSC. Then, the merit function $\Psi(x; \tau) = f(x) + \tau \left\| \left( g(x), \max\{h(x), 0\} \right) \right\|_p$ with $p \in [1, \infty]$ is exact.

Proof. See Han and Mangasarian [114, Corollary 4.7].

Theorem 3.10. Let $x^*$ be an optimal solution of (NLP) satisfying the MFCQ and SOSC. Then, the merit function $\Psi(x; \tau) = f(x) + \lambda^\top g(x) + \tfrac{1}{2} \tau \|g(x)\|_2^2$ is exact.

Proof. See Hestenes [116, Theorem 2.1].
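The difference between exact and inexact merit functions can be seen on a one-dimensional toy problem that is not taken from the source: $\min x^2$ subject to $x - 1 = 0$, with solution $x^* = 1$ and multiplier $\lambda^* = -2$. The sketch below minimizes each merit function on a grid. Fixing $\lambda = \lambda^*$ in the augmented Lagrangian is an assumption of this illustration.

```python
import numpy as np

# Hypothetical toy problem: min x^2  s.t.  x - 1 = 0  (x* = 1, lambda* = -2).
f = lambda x: x ** 2
g = lambda x: x - 1.0
lam_star = -2.0

xs = np.linspace(-1.0, 3.0, 4001)  # grid with spacing 1e-3

def grid_minimizer(psi):
    """Return the grid point with the smallest merit value."""
    return xs[np.argmin(psi(xs))]

def l1_merit(tau):
    return lambda x: f(x) + tau * np.abs(g(x))

def quadratic_penalty_merit(tau):
    return lambda x: f(x) + 0.5 * tau * g(x) ** 2

def augmented_lagrangian_merit(tau):
    return lambda x: f(x) + lam_star * g(x) + 0.5 * tau * g(x) ** 2

for tau in (1.0, 3.0, 100.0):
    print(
        tau,
        grid_minimizer(l1_merit(tau)),
        grid_minimizer(quadratic_penalty_merit(tau)),
        grid_minimizer(augmented_lagrangian_merit(tau)),
    )
```

For this problem the $\ell_1$ minimizer reaches $x^* = 1$ as soon as $\tau \ge |\lambda^*| = 2$, whereas the quadratic penalty minimizer $\tau / (2 + \tau)$ only approaches $1$ as $\tau \to \infty$.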

The drawback of exact merit functions, however, is that the penalty parameter $\bar{\tau}$ is unknown a priori. This requires a strategy to update the penalty parameter during the optimization. Unfortunately, choosing a very large value from the beginning and hoping that it exceeds $\bar{\tau}$ is not a good option, as it can lead to very slow convergence. A very small penalty, on the other hand, can attract the iterates to unbounded infeasible points if the objective function decreases much faster than the constraint violation increases. A survey on exact merit functions is given by Di Pillo [52], which also proposes penalty parameters that depend on the constraint violation to overcome the latter drawback.

Sufficient Decrease Condition

So far it has been neglected that the descent direction property, i.e., $\nabla_x \Psi(x^k; \tau)^\top \Delta x^k < 0$, does not guarantee a sufficient reduction of the merit function $\Psi(x; \tau)$ for nonlinear programming, since, similarly to the beginning of Section 3.3, this property is based on local information only. This is the point where the line-search method comes into play and the step $\Delta x^k$ may have to be shortened. In the following, the merit function is assumed to be differentiable⁶, and the actual reduction

$$\Psi(x^k + \alpha_k \Delta x^k; \tau) - \Psi(x^k; \tau) \tag{3.26}$$

is compared with the predicted reduction based on a linear or quadratic Taylor approximation,

$$\Psi(x^k; \tau) + \alpha_k \nabla_x \Psi(x^k; \tau)^\top \Delta x^k + \tfrac{1}{2} \alpha_k^2 (\Delta x^k)^\top \nabla_{xx}^2 \Psi(x^k; \tau) \Delta x^k - \Psi(x^k; \tau) = \alpha_k \nabla_x \Psi(x^k; \tau)^\top \Delta x^k + \tfrac{1}{2} \alpha_k^2 (\Delta x^k)^\top \nabla_{xx}^2 \Psi(x^k; \tau) \Delta x^k, \tag{3.27}$$

where the linear model omits the second-order term.

If the actual reduction is at least a fraction of the predicted reduction, the step is said to be acceptable. In case of a linear model of reduction this yields the Armijo [7] condition

$$\Psi(x^k + \alpha_k \Delta x^k; \tau) - \Psi(x^k; \tau) \le \sigma \alpha_k \nabla_x \Psi(x^k; \tau)^\top \Delta x^k \le 0 \tag{3.28}$$

with a parameter $\sigma \in (0, 1)$, which is illustrated in Figure 3.1 (left). Wolfe [194, 195] proposes to extend the Armijo condition by

$$\nabla_x \Psi(x^k + \alpha_k \Delta x^k; \tau)^\top \Delta x^k \ge \eta \nabla_x \Psi(x^k; \tau)^\top \Delta x^k, \tag{3.29}$$

with $\eta \in (\sigma, 1)$, to avoid arbitrarily small step sizes. In practice, however, this further condition is often neglected and instead a value $\alpha_k \in (0, 1]$ is selected that satisfies the Armijo condition and is as large as possible. Note that finding the optimal step size, e.g., by solving $\min_{\alpha_k > 0} \Psi(x^k + \alpha_k \Delta x^k; \tau)$, is not a practical option since it involves the solution of a (nonsmooth) nonlinear program.
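A common way to obtain a large step size that satisfies the Armijo condition (3.28) is backtracking: start with the full step and shrink it by a constant factor until the condition holds. The sketch below is one such scheme, not necessarily the one used in Algorithm D. The shrink factor `beta`, the default `sigma`, the function name and the toy merit function are assumptions of this illustration.

```python
import numpy as np

def armijo_backtracking(psi, grad_psi, x, dx, sigma=1e-4, beta=0.5, max_iter=50):
    """Return the largest alpha in {1, beta, beta^2, ...} satisfying (3.28)."""
    slope = float(grad_psi(x) @ dx)  # directional derivative of psi along dx
    if slope >= 0.0:
        raise ValueError("dx is not a descent direction for the merit function")
    psi_k = psi(x)
    alpha = 1.0
    for _ in range(max_iter):
        actual = psi(x + alpha * dx) - psi_k   # actual reduction (3.26)
        if actual <= sigma * alpha * slope:    # fraction of the linear prediction
            return alpha
        alpha *= beta                          # shorten the step
    raise RuntimeError("no acceptable step size found")

# Hypothetical smooth merit function with a narrow valley.
psi = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad_psi = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])

x_k = np.array([1.0, 1.0])
dx_k = -grad_psi(x_k)  # steepest descent step, far too long here
alpha_k = armijo_backtracking(psi, grad_psi, x_k, dx_k)
print(alpha_k, psi(x_k), psi(x_k + alpha_k * dx_k))
```

Backtracking only tests the step sizes $1, \beta, \beta^2, \dots$, so it returns the largest acceptable value from this set rather than from the whole interval $(0, 1]$.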

As an example for the SQP method, Algorithm D presents a globally convergent version of Algorithm B under rather strong assumptions.

Theorem 3.11 (Global Convergence of SQP Method with a Merit Function). Let $\{(x^k, \lambda^k, \nu^k)\}_k$ be a sequence generated by Algorithm D such that the tuple $(x^k, \lambda^k, \nu^k)$ lies in some compact set for all $k$, $x^k$ satisfies the LICQ and, for all $d \in \mathbb{R}^{n_x}$,

$$c_1 \|d\|^2 \le d^\top \nabla_{xx}^2 L(x^k, \lambda^k, \nu^k)\, d \le c_2 \|d\|^2 \tag{3.30}$$

with $c_1 > 0$ and $c_2 > 0$. Then, $\{(x^k, \lambda^k, \nu^k)\}_k$ converges to a first-order optimal point of (NLP).

Proof. See Boggs and Tolle [19, Theorem 4.3].

⁶ If the merit function is not differentiable, then the linear or quadratic model for the predicted reduction has

[Figure 3.1: Monotone merit function (left) and non-monotone merit function (right). Both panels show the $(\theta, f)$ plane with a forbidden and an acceptable region relative to the point $(\theta(x^k), f(x^k))$. The left panel marks the Armijo condition and a line of slope $-\rho$. The right panel additionally shows the points $(\theta(x^{k-1}), f(x^{k-1}))$ and $(\theta(x^{k-2}), f(x^{k-2}))$. The non-monotonicity level on the right is $l = 2$.]

Non-Monotone Merit Functions

Although Theorem 3.11 proves global convergence for an optimization algorithm, the introduction of the merit function, the main extension made in Algorithm D, requires a new study of local convergence, since Theorem 3.6 is based on the full step ($\alpha_k = 1$). One could think that the same properties would hold, but this is not true. There exist examples (cf., Powell [165, Section 3]) of search directions $\Delta x^k$ that yield local q-quadratic convergence but increase both the objective function and the constraint violation and thus would be rejected by the merit function. This is known as the Maratos effect [139]. Even in the unconstrained case, the step size can be reduced unnecessarily, for example when the step direction tries to follow a curved valley. Ways to avoid this are the modification of the step $\Delta x^k$, in particular second-order-correction steps (cf., Conn et al. [45, Section 15.3.2.3] or Section 3.6.2), or the relaxation of the merit function acceptance criterion (3.28) to allow a non-monotone decrease. Examples include Chamberlain et al. [39], Panier and Tits [157] and Toint [182], which essentially exchange (3.28) for

$$\Psi(x^k + \alpha_k \Delta x^k; \tau) - \max_{i = 0, \dots, l_m} \Psi(x^{(k-i)^+}; \tau) \le \sigma \alpha_k \nabla_x \Psi(x^k; \tau)^\top \Delta x^k \tag{3.31}$$

and force a decrease with respect to the largest value of the former $l_m \in \mathbb{N}$ merit function values, see Figure 3.1 (right). While non-monotone merit function techniques usually complicate the global convergence theory, overall efficiency gains have been reported (cf., Grippo et al. [112]).
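The non-monotone criterion (3.31) differs from the Armijo condition only in its reference value. The following sketch assumes that `history` holds the merit values of the iterates so far, ending with the current one, so that the maximum runs over its last $l_m + 1$ entries. The function name, the backtracking factor `beta`, the toy merit function and the earlier merit values are hypothetical.

```python
import numpy as np

def nonmonotone_armijo(psi, grad_psi, x, dx, history, l_m=2,
                       sigma=1e-4, beta=0.5, max_iter=50):
    """Backtracking on (3.31); history[-1] must equal psi(x)."""
    slope = float(grad_psi(x) @ dx)
    if slope >= 0.0:
        raise ValueError("dx is not a descent direction for the merit function")
    # Largest of the last l_m + 1 merit values; a short history is used in
    # full, which mirrors the index (k - i)^+ = max(k - i, 0) in (3.31).
    reference = max(history[-(l_m + 1):])
    alpha = 1.0
    for _ in range(max_iter):
        if psi(x + alpha * dx) - reference <= sigma * alpha * slope:
            return alpha
        alpha *= beta
    raise RuntimeError("no acceptable step size found")

# Hypothetical smooth merit function with a narrow valley.
psi = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad_psi = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])

x_k = np.array([1.0, 1.0])
dx_k = -grad_psi(x_k)
history = [30.0, 25.0, psi(x_k)]  # assumed earlier merit values, then psi(x_k)

alpha_monotone = nonmonotone_armijo(psi, grad_psi, x_k, dx_k, history, l_m=0)
alpha_relaxed = nonmonotone_armijo(psi, grad_psi, x_k, dx_k, history, l_m=2)
print(alpha_monotone, alpha_relaxed)
```

With `l_m=0` the reference value is $\Psi(x^k; \tau)$ and the test reduces to (3.28). With `l_m=2` a longer step is accepted in this example even though it temporarily increases the merit function.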

In document A Primal-Dual Augmented Lagrangian Penalty-Interior-Point Algorithm for Nonlinear Programming (Page 56-59)