Unconstrained Nonlinear Optimization - Methods for Nonlinear Programming

2.5 Methods for Nonlinear Programming

2.5.1 Unconstrained Nonlinear Optimization

All algorithms for unconstrained nonlinear optimization problems follow the idea of generating a sequence of iterates $\{x^k\}_{k=0}^{\infty}$ such that for some iterate we can either verify optimality (up to a certain accuracy) or no progress can be made. The design of these algorithms differs in the way of deciding how to move from one iterate $x^k$ to the next $x^{k+1}$. There are mainly two categories of approaches for choosing the next iterate: line search methods and trust region methods.

Line search methods move from the current iterate $x^k$ to a new one $x^{k+1}$ with a better objective function value along some descent direction $s^k$. To decide how far to move along the descent direction, they solve a one-dimensional optimization problem: $\min_{\alpha^k > 0} f(x^k + \alpha^k s^k)$. This is often done approximately, since solving this subproblem can already be expensive. If we solve the problem to optimality we call it an exact line search, otherwise an inexact line search.
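For a strictly convex quadratic the one-dimensional subproblem can be solved in closed form, which gives a small illustration of an exact line search. The following is a minimal sketch under assumptions that are not part of the text: the objective is $f(x) = \tfrac{1}{2} x^\top A x - b^\top x$ with a symmetric positive definite $A$, and the helper names `dot`, `matvec` and `exact_step_quadratic` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in A]


def exact_step_quadratic(A: Matrix, b: Vector, x: Vector, s: Vector) -> float:
    """Exact minimizer of phi(alpha) = f(x + alpha * s) for
    f(x) = 0.5 * x^T A x - b^T x (assumption: A symmetric positive definite)."""
    grad = [ax_i - b_i for ax_i, b_i in zip(matvec(A, x), b)]  # grad f(x) = A x - b
    slope = dot(grad, s)              # phi'(0)
    curvature = dot(s, matvec(A, s))  # phi''(alpha), constant for a quadratic
    return -slope / curvature         # root of phi'(alpha) = slope + alpha * curvature


# Illustrative data (assumed): the minimizer of f is (1, 1).
A = [[2.0, 0.0], [0.0, 4.0]]
b = [2.0, 4.0]
x = [0.0, 0.0]
s = [2.0, 4.0]  # anti-gradient at x, since grad f(0) = -b
alpha = exact_step_quadratic(A, b, x, s)
```

For general nonlinear $f$ no such closed form exists, which is why inexact line searches are the usual choice.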

Trust region methods are different: in each iteration $k$ they first approximate $f(x^k + s)$ by some (simpler) auxiliary function $\bar f_k(s)$ and then minimize it locally in a trust region $\Delta_k$ in which both are similar:

$$\min \{ \bar f_k(s) \mid \|s\| \le \Delta_k \}. \tag{2.11}$$

Typically trust regions are chosen as either balls $\Delta_k = \{ s \in \mathbb{R}^n \mid \|s\| \le \Delta \}$, where $\Delta > 0$, or boxes/ellipsoids. The auxiliary function is usually chosen as a quadratic function of the form $\bar f_k(s) = f(x^k) + \nabla f(x^k)^\top s + \tfrac{1}{2} s^\top B_k s$, where $B_k$ is either the Hessian or an approximation of it. Depending on the ratio of actual to expected reduction of the objective function,

$$\rho_k = \frac{f(x^k) - f(x^k + s^k)}{\bar f_k(0) - \bar f_k(s^k)},$$

either the solution $s^k$ is accepted and the new iterate is set to $x^{k+1} = x^k + s^k$, or the trust region is modified and the problem is solved again. It is common to reject the step and decrease the trust region if $\rho_k < 0$, since this means that the objective function value became worse in the current iteration. On the other hand, if $\rho_k \approx 1$ we can increase the trust region, and if $\rho_k \ll 1$ the current trust region is kept. The fundamental question is how to solve the trust region problems (2.11). Well-known algorithms that solve the problem approximately are, for example, the dogleg method for positive definite $B_k$, the two-dimensional subspace minimization for indefinite $B_k$, and a method by Steihaug, based on the CG method, if $B_k$ is chosen as the exact Hessian of $f$.

In this section we are not going into further details about trust region methods since they are not needed in our approaches. A survey on trust region methods is given by Dennis and Schnabel [67].
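The acceptance test based on the ratio of actual to expected reduction can be sketched in a few lines. This is only an illustration under stated assumptions: the threshold `good = 0.75` standing in for "ratio close to 1", the factors `shrink` and `grow`, and all function names are hypothetical choices, not values taken from the text.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in A]


def reduction_ratio(f: Callable[[Vector], float], x: Vector, s: Vector,
                    grad: Vector, B: Matrix) -> float:
    """Actual reduction f(x) - f(x + s) divided by the reduction predicted
    by the quadratic model f(x) + grad^T s + 0.5 * s^T B s."""
    actual = f(x) - f([xi + si for xi, si in zip(x, s)])
    predicted = -(dot(grad, s) + 0.5 * dot(s, matvec(B, s)))  # model(0) - model(s)
    return actual / predicted


def update_radius(rho: float, radius: float, shrink: float = 0.5,
                  grow: float = 2.0, good: float = 0.75) -> Tuple[bool, float]:
    """Return (step accepted?, new radius). All thresholds are assumptions."""
    if rho < 0.0:
        return False, shrink * radius  # objective got worse: reject and shrink
    if rho >= good:
        return True, grow * radius     # model agrees well: accept and enlarge
    return True, radius                # otherwise accept and keep the radius


# Illustrative quadratic (assumed): with the exact Hessian the model is exact.
f = lambda x: x[0] ** 2 + 2.0 * x[1] ** 2
x = [1.0, 1.0]
grad = [2.0, 4.0]
B = [[2.0, 0.0], [0.0, 4.0]]
s = [-0.5, -0.5]
rho = reduction_ratio(f, x, s, grad, B)
accepted, new_radius = update_radius(rho, 1.0)
```

Because the example objective is itself quadratic and `B` is its exact Hessian, the model reproduces $f$ exactly and the ratio equals 1.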

Line Search Methods

We start with the definition of a descent direction, i.e., a direction in which we can improve our objective function. In fact, moving a sufficiently small step along any descent direction of $f$, we can guarantee an improvement of the objective function value.

Definition 2.11 (Descent Direction). Let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable in $x \in \mathbb{R}^n$. A vector $s \in \mathbb{R}^n$ is called a descent direction of $f$ in $x$ if we have

$$\nabla f(x)^\top s < 0.$$

Lemma 2.22 ([9]). Let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable in $x \in \mathbb{R}^n$ and $s \in \mathbb{R}^n$ be a descent direction in $x$. Then there exists some $\alpha > 0$ such that $f(x + \alpha s) < f(x)$.

This leads to the following iterative procedure, sketched in Algorithm 1. The algorithm starts with an initial guess $x^0$ and determines a feasible descent direction $s^0$ as well as a feasible step size $\alpha^0$ to compute the new iterate $x^1 := x^0 + \alpha^0 s^0$ with a reduced objective function value. This is repeated until a stopping criterion is satisfied. In practice the stopping criterion is $\|\nabla f(x^k)\| < \varepsilon$, where $0 < \varepsilon \ll 1$.

Algorithm 1: Line Search

input : differentiable function $f : \mathbb{R}^n \to \mathbb{R}$
output: stationary point $x^\star$ of $f$

Choose starting point $x^0 \in \mathbb{R}^n$. Set $k = 0$.
while $\nabla f(x^k) \ne 0$ do
    Compute descent direction $s^k$.
    Compute step size $\alpha^k > 0$ such that $f(x^k + \alpha^k s^k) < f(x^k)$.
    Set $x^{k+1} = x^k + \alpha^k s^k$, $k = k + 1$.
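Algorithm 1 can be sketched in Python as follows. This is one possible instance, not the only one: it assumes the negative gradient as descent direction, finds a step size by simple halving until the objective decreases, and replaces the exact test $\nabla f(x^k) \ne 0$ by the practical criterion $\|\nabla f(x^k)\| < \varepsilon$. The function name `line_search_descent`, the safeguard `1e-16` and the test function are illustrative assumptions.

```python
import math
from typing import Callable, List

Vector = List[float]


def line_search_descent(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x0: Vector,
    eps: float = 1e-8,
    max_iter: int = 10_000,
) -> Vector:
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < eps:  # practical stopping test
            break
        s = [-gi for gi in g]  # assumed direction: negative gradient
        fx = f(x)
        alpha = 1.0
        trial = [xi + alpha * si for xi, si in zip(x, s)]
        while f(trial) >= fx:  # halve until f(x + alpha * s) < f(x)
            alpha *= 0.5
            if alpha < 1e-16:  # no progress can be made numerically
                return x
            trial = [xi + alpha * si for xi, si in zip(x, s)]
        x = trial
    return x


# Illustrative objective (assumed) with minimizer (1, -2).
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2
grad = lambda x: [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]
x_min = line_search_descent(f, grad, [0.0, 0.0])
```

Plain decrease is enough for this small convex example; in general a sufficient decrease condition on the step size is needed to obtain convergence guarantees.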

The obvious questions are how to determine a suitable descent direction $s$ and an appropriate step size $\alpha$. Concerning the descent direction it is common to choose

$$s = -B^{-1} \nabla f(x)$$

for some symmetric and regular matrix $B$. Two specific choices for $B$ are, for example, $B = I$ or $B = \nabla^2 f(x)$, if the Hessian is positive definite. The resulting descent directions are called Anti-Gradient and Newton-Direction, respectively. Assuming we are interested in an inexact line search, some common conditions on the step sizes then are:

• Wolfe conditions:

$$f(x^k + \alpha^k s^k) \le f(x^k) + c_1 \alpha^k \nabla f(x^k)^\top s^k \tag{2.12}$$

$$\nabla f(x^k + \alpha^k s^k)^\top s^k \ge c_2 \nabla f(x^k)^\top s^k, \tag{2.13}$$

where $0 < c_1 < c_2 < 1$.

Condition (2.12) is also known as the Armijo condition and guarantees a sufficient decrease of the objective function. The second condition (2.13) is the curvature condition; it ensures that the line search makes reasonable progress, since a sufficiently small step size would always satisfy the Armijo condition. Furthermore, a step size that satisfies the Wolfe conditions is not necessarily close to a minimizer of $\varphi(\alpha) = f(x^k + \alpha s^k)$. To exclude points far away from the minimizer, we can additionally require that the derivative $\varphi'(\alpha)$ is not too positive, which leads to the additional requirement (2.14).

• Strong Wolfe conditions:

$$f(x^k + \alpha^k s^k) \le f(x^k) + c_1 \alpha^k \nabla f(x^k)^\top s^k$$

$$|\nabla f(x^k + \alpha^k s^k)^\top s^k| \le c_2 |\nabla f(x^k)^\top s^k|, \tag{2.14}$$

where $0 < c_1 < c_2 < 1$. A third pair of conditions are the

• Goldstein conditions:

$$f(x^k) + (1 - c)\,\alpha^k \nabla f(x^k)^\top s^k \le f(x^k + \alpha^k s^k) \tag{2.15}$$

$$f(x^k + \alpha^k s^k) \le f(x^k) + c\,\alpha^k \nabla f(x^k)^\top s^k, \tag{2.16}$$

where $0 < c < \tfrac{1}{2}$.

Again, the first inequality prevents the step size from being too small and the second inequality ensures the sufficient decrease.
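The three sets of conditions above can be checked numerically on the one-dimensional function $\varphi(\alpha) = f(x^k + \alpha s^k)$, using $\varphi'(\alpha) = \nabla f(x^k + \alpha s^k)^\top s^k$. The sketch below is illustrative only: the function names, the default constants $c_1 = 10^{-4}$, $c_2 = 0.9$, $c = 0.25$ and the example $\varphi(\alpha) = (\alpha - 1)^2$ are assumptions, not values prescribed by the text.

```python
from typing import Callable

Phi = Callable[[float], float]


def armijo(phi: Phi, dphi: Phi, alpha: float, c1: float = 1e-4) -> bool:
    # Sufficient decrease (2.12).
    return phi(alpha) <= phi(0.0) + c1 * alpha * dphi(0.0)


def wolfe(phi: Phi, dphi: Phi, alpha: float,
          c1: float = 1e-4, c2: float = 0.9) -> bool:
    # Sufficient decrease (2.12) plus curvature condition (2.13).
    return armijo(phi, dphi, alpha, c1) and dphi(alpha) >= c2 * dphi(0.0)


def strong_wolfe(phi: Phi, dphi: Phi, alpha: float,
                 c1: float = 1e-4, c2: float = 0.9) -> bool:
    # Sufficient decrease plus the two-sided bound (2.14) on the slope.
    return armijo(phi, dphi, alpha, c1) and abs(dphi(alpha)) <= c2 * abs(dphi(0.0))


def goldstein(phi: Phi, dphi: Phi, alpha: float, c: float = 0.25) -> bool:
    # Lower bound (2.15) and upper bound (2.16) on phi(alpha).
    lower = phi(0.0) + (1.0 - c) * alpha * dphi(0.0)
    upper = phi(0.0) + c * alpha * dphi(0.0)
    return lower <= phi(alpha) <= upper


# Illustrative one-dimensional function (assumed), minimized at alpha = 1.
phi = lambda a: (a - 1.0) ** 2
dphi = lambda a: 2.0 * (a - 1.0)
```

In this example the step size 1 satisfies all three sets of conditions, the very small step 0.01 satisfies the Armijo condition but violates (2.13) and (2.15), and the step 1.95 satisfies the Wolfe conditions but not (2.14), because the slope there is too positive.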

Often the sufficient decrease condition (2.12) is combined with backtracking, so that we do not need to consider (2.13). The procedure starts with an initial step size and reduces it step by step until the sufficient decrease condition holds, see Algorithm 2.

Algorithm 2: Backtracking Line Search

input : iterate $x^k$, search direction $s^k$, step size $\alpha^k$
output: feasible step size $\alpha^k$

Choose $\alpha^k > 0$, $\rho, c \in (0, 1)$.
while $f(x^k + \alpha^k s^k) > f(x^k) + c\,\alpha^k \nabla f(x^k)^\top s^k$ do
    Set $\alpha^k = \rho \alpha^k$.
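A direct Python transcription of Algorithm 2 might look as follows. It is a sketch under assumptions: the defaults `alpha0 = 1.0`, `rho = 0.5` and `c = 1e-4`, the cap `max_reductions` that guards against an endless loop when `s` is not a descent direction, and the example objective are all illustrative choices.

```python
from typing import Callable, List

Vector = List[float]


def backtracking(
    f: Callable[[Vector], float],
    grad_x: Vector,
    x: Vector,
    s: Vector,
    alpha0: float = 1.0,
    rho: float = 0.5,
    c: float = 1e-4,
    max_reductions: int = 60,
) -> float:
    """Shrink alpha by the factor rho until the sufficient decrease
    condition f(x + alpha*s) <= f(x) + c*alpha*grad^T s holds."""
    fx = f(x)
    slope = sum(g * si for g, si in zip(grad_x, s))  # grad f(x)^T s, negative for descent
    alpha = alpha0
    for _ in range(max_reductions):
        trial = [xi + alpha * si for xi, si in zip(x, s)]
        if f(trial) <= fx + c * alpha * slope:
            return alpha
        alpha *= rho
    raise RuntimeError("no acceptable step size found; is s a descent direction?")


# Illustrative objective (assumed): f(x) = x_1^2 + 10 x_2^2 at x = (1, 1).
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
x = [1.0, 1.0]
grad_x = [2.0, 20.0]
s = [-2.0, -20.0]  # anti-gradient
alpha = backtracking(f, grad_x, x, s)
```

For this example four reductions are needed before the sufficient decrease condition holds, so the returned step size is $0.5^4 = 0.0625$.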

If the descent directions and step sizes are chosen appropriately the line search algorithm converges to a stationary point. We therefore define feasible search directions and feasible step sizes.

Definition 2.12 (Feasible Search Direction, Feasible Step Size). Let $\{s^k\}_k$, $\{x^k\}_k$ and $\{\alpha^k\}_k$ be the sequences of search directions, iterates and step sizes generated by Algorithm 1. The sequence $\{s^k\}_k$ is called feasible if for all $\{x^k\}_k$ we have the implication

$$\lim_{k \to \infty} \frac{\nabla f(x^k)^\top s^k}{\|s^k\|} = 0 \;\Rightarrow\; \lim_{k \to \infty} \nabla f(x^k) = 0. \tag{2.17}$$

The sequence $\{\alpha^k\}_k$ is called feasible if for all $\{x^k\}_k$ and $\{s^k\}_k$ we have the implication

$$f(x^k + \alpha^k s^k) < f(x^k) \text{ for all } k \in \mathbb{N} \text{ and } \lim_{k \to \infty} \bigl( f(x^k) - f(x^k + \alpha^k s^k) \bigr) = 0 \;\Rightarrow\; \lim_{k \to \infty} \frac{\nabla f(x^k)^\top s^k}{\|s^k\|} = 0.$$

The condition (2.17) ensures that the descent direction does not tend to become orthogonal to the gradient. Both the Anti-Gradient and the Newton-Direction are feasible search directions. On the other hand, it can be shown that under standard assumptions both the Wolfe conditions and the Goldstein conditions yield feasible step sizes.
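To make the two search directions and the quantity $\nabla f(x)^\top s / \|s\|$ from (2.17) concrete, the following sketch computes $s = -B^{-1} \nabla f(x)$ for $B = I$ and $B = \nabla^2 f(x)$ in two dimensions. The 2x2 solver, the function names and the example $f(x) = x_1^2 + 10 x_2^2$ at $x = (1, 1)$ are assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def solve_2x2(B: Matrix, rhs: Vector) -> Vector:
    """Solve B y = rhs for a regular 2x2 matrix B by Cramer's rule."""
    (a, b), (c, d) = B
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("B must be regular")
    return [(d * rhs[0] - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]


def search_direction(B: Matrix, grad: Vector) -> Vector:
    # s = -B^{-1} grad, computed by solving B s = -grad.
    return solve_2x2(B, [-g for g in grad])


def normalized_slope(grad: Vector, s: Vector) -> float:
    # The quantity grad^T s / ||s|| that appears in (2.17).
    return dot(grad, s) / math.sqrt(dot(s, s))


# Illustrative data (assumed): f(x) = x_1^2 + 10 x_2^2 at x = (1, 1).
grad = [2.0, 20.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
hessian = [[2.0, 0.0], [0.0, 20.0]]

anti_gradient = search_direction(identity, grad)  # equals -grad
newton = search_direction(hessian, grad)          # x + newton is the minimizer (0, 0)
```

Both directions have a strictly negative normalized slope here, so neither is orthogonal to the gradient; for the Anti-Gradient the normalized slope is exactly $-\|\nabla f(x)\|$.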

Under additional requirements, we can assure the convergence of the line search method to a minimizer.

Theorem 2.23 ([9]). Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable, $x^0$ be the starting point, the level set $N_f(x^0)$ be convex and compact and $f$ strictly convex on $N_f(x^0)$. Furthermore, let the sequence of descent directions $\{s^k\}_k$ and step sizes $\{\alpha^k\}_k$ be feasible. Then there exists a unique global minimizer $x^\star$ of $f$ and Algorithm 1 produces a sequence $\{x^k\}_k$ that converges to $x^\star$.

The assumptions made are quite restrictive, since we require the objective function to be strictly convex in a neighbourhood of $x^\star$ and additionally that the starting point $x^0$ lies in that neighbourhood.

One of the most important representatives of line search algorithms is the Newton Method, which in its most general form uses the Newton Direction as a descent direction and step size 1. Assume the starting point $x^0$ is sufficiently close to a minimizer $x^\star$ at which $\nabla f(x^\star) = 0$ and $\nabla^2 f(x^\star)$ is positive definite. If $f$ is twice continuously differentiable and the Hessian of $f$ is Lipschitz continuous in a neighbourhood of $x^\star$, then it can be shown that $\{x^k\}_k$ converges to $x^\star$ and $\{\|\nabla f(x^k)\|\}_k$ converges to 0 quadratically. Several modifications can be made to get a practical, globally convergent version of the Newton Method. For example, in the Modified Newton Method the Hessian is modified, whenever necessary, to be positive definite throughout the whole algorithm in order to guarantee descent directions. To save computational expense, the computation of the Newton Direction can be done approximately, yielding an Inexact Newton Method. In particular, Quasi-Newton type methods aim at reducing the computational effort by approximating the Hessian with some matrix that is easier to invert.
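The local quadratic convergence of the Newton Method with step size 1 can be observed on a small one-dimensional example. The sketch below assumes the illustrative objective $f(x) = x - \ln x$ on $x > 0$, whose minimizer is $x^\star = 1$; the function name `newton_1d` and the tolerance are hypothetical. For this particular $f$ the Newton update simplifies to $x^{k+1} = 2x^k - (x^k)^2$, so the error satisfies $1 - x^{k+1} = (1 - x^k)^2$ in exact arithmetic.

```python
from typing import Callable, List


def newton_1d(
    df: Callable[[float], float],
    d2f: Callable[[float], float],
    x0: float,
    eps: float = 1e-12,
    max_iter: int = 50,
) -> List[float]:
    """Newton iteration x <- x - f'(x) / f''(x) with step size 1.
    Returns all iterates so the convergence rate can be inspected."""
    iterates = [x0]
    x = x0
    for _ in range(max_iter):
        if abs(df(x)) < eps:  # practical stopping test on the gradient
            break
        x = x - df(x) / d2f(x)
        iterates.append(x)
    return iterates


# Assumed example: f(x) = x - ln(x), so f'(x) = 1 - 1/x and f''(x) = 1/x^2.
iterates = newton_1d(lambda x: 1.0 - 1.0 / x, lambda x: 1.0 / (x * x), 0.5)
errors = [abs(x - 1.0) for x in iterates]
```

Starting from $x^0 = 0.5$, the error is squared in every iteration until it reaches machine precision. A starting point far from $x^\star$, for example $x^0 > 2$, would produce a negative iterate, which illustrates why the result is only local.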

In document Continuous optimization methods for convex mixed-integer nonlinear programming (Page 31-35)