2.5 Methods for Nonlinear Programming
2.5.1 Unconstrained Nonlinear Optimization
All algorithms for unconstrained nonlinear optimization problems generate a sequence of iterates $\{x^k\}_{k=0}^{\infty}$ such that for some iterate we can either verify optimality (up to a certain accuracy) or conclude that no further progress can be made. The algorithms differ in how they move from one iterate $x^k$ to the next iterate $x^{k+1}$. There are two main categories of approaches for choosing the next iterate: line search methods and trust region methods.
Line search methods move from the current iterate $x^k$ to a new iterate $x^{k+1}$ with a better objective function value along some descent direction $s^k$. To decide how far to step along the descent direction, they solve a one-dimensional optimization problem: $\min_{\alpha_k > 0} f(x^k + \alpha_k s^k)$. This is often done approximately, since solving this subproblem can already be expensive. If the subproblem is solved to optimality, we speak of an exact line search, otherwise of an inexact line search.
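For one special case the exact line search has a closed form, which may help to make the subproblem concrete. The following worked equation assumes a strictly convex quadratic objective; the symbols $A$ (symmetric positive definite) and $b$ are introduced here only for illustration and do not appear in the text.

```latex
% Assumption: f(x) = \tfrac12 x^\top A x - b^\top x with A symmetric positive definite.
\begin{align*}
\varphi(\alpha) &= f(x^k + \alpha s^k), \\
\varphi'(\alpha) &= \nabla f(x^k + \alpha s^k)^\top s^k
                  = \nabla f(x^k)^\top s^k + \alpha \, (s^k)^\top A s^k, \\
\varphi'(\alpha_k) = 0 \quad &\Longrightarrow \quad
\alpha_k = - \frac{\nabla f(x^k)^\top s^k}{(s^k)^\top A s^k}.
\end{align*}
```

Since $(s^k)^\top A s^k > 0$, the resulting step size is positive exactly when $\nabla f(x^k)^\top s^k < 0$. For general nonlinear $f$ no such formula is available, which is why inexact line searches are used.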
20 CHAPTER 2. NONLINEAR PROGRAMMING
Trust region methods are different: in each iteration $k$ they first approximate $f(x^k + s)$ by some (simpler) auxiliary function $\bar{f}_k(s)$ and then minimize it locally in a trust region $\Delta_k$ in which both are similar:
\[ \min \{ \bar{f}_k(s) \mid \|s\| \le \Delta_k \}. \tag{2.11} \]
Typically trust regions are chosen as either balls $\Delta_k = \{ s \in \mathbb{R}^n \mid \|s\| \le \Delta \}$, where $\Delta > 0$, or boxes/ellipsoids. The auxiliary function is usually chosen as a quadratic function of the form $\bar{f}_k(s) = f(x^k) + \nabla f(x^k)^\top s + \frac{1}{2} s^\top B_k s$, where $B_k$ is either the Hessian or an approximation of it. Depending on the ratio of actual to expected reduction of the objective function,
\[ \rho_k = \frac{f(x^k) - f(x^k + s^k)}{\bar{f}_k(0) - \bar{f}_k(s^k)}, \]
either the solution $s^k$ is accepted and the new iterate is set to $x^{k+1} = x^k + s^k$, or the trust region is modified and the problem is solved again. It is common to reject the step and shrink the trust region if $\rho_k < 0$, since this means that the objective function value has become worse in the current iteration. On the other hand, if $\rho_k \approx 1$, we can enlarge the trust region, and if $\rho_k \ll 1$, the current trust region is kept. The fundamental question is how to solve the trust region problems (2.11). Well-known algorithms that solve the problem approximately are, for example, the dogleg method for positive definite $B_k$, the two-dimensional subspace minimization for indefinite $B_k$, and a method by Steihaug, based on the CG method, if $B_k$ is chosen as the exact Hessian of $f$.
We do not go into further detail about trust region methods in this section, since they are not needed in our approaches. A survey on trust region methods is given by Dennis and Schnabel [67].
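The acceptance test based on the ratio of actual to expected reduction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `reduction_ratio` and `update_radius`, the threshold 0.75 for deciding when the ratio counts as close to 1, and the factors 0.25 and 2 for shrinking and enlarging the radius are hypothetical choices that do not come from the text.

```python
import numpy as np

def quadratic_model(f_x, grad, B, s):
    """Quadratic auxiliary function f(x) + grad^T s + 0.5 * s^T B s."""
    return f_x + grad @ s + 0.5 * s @ B @ s

def reduction_ratio(f, x, s, grad, B):
    """Ratio of actual to expected reduction for a trial step s."""
    f_x = f(x)
    actual = f_x - f(x + s)
    expected = f_x - quadratic_model(f_x, grad, B, s)  # model value at s = 0 is f(x)
    return actual / expected

def update_radius(rho, radius, shrink=0.25, grow=2.0, good=0.75):
    """Return (accept_step, new_radius). All thresholds are assumptions."""
    if rho < 0:                      # objective got worse: reject and shrink
        return False, shrink * radius
    if rho >= good:                  # model predicts f well: accept and enlarge
        return True, grow * radius
    return True, radius              # otherwise accept and keep the radius

# Illustrative quadratic objective; with the exact Hessian the model is exact.
f = lambda x: x[0] ** 2 + 2.0 * x[1] ** 2
x = np.array([1.0, 1.0])
g = np.array([2.0, 4.0])             # gradient of f at x
B = np.diag([2.0, 4.0])              # exact Hessian of f
s = -0.1 * g                         # some trial step
rho = reduction_ratio(f, x, s, g, B)
accept, new_radius = update_radius(rho, radius=1.0)
```

For a quadratic objective with exact Hessian the model coincides with $f$, so the ratio equals 1 up to rounding; for general $f$ it measures how trustworthy the model was on the current region.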
Line Search Methods
We start with the definition of a descent direction, i.e., a direction in which we can improve our objective function. In fact, moving a sufficiently small step along any descent direction of $f$ guarantees an improvement of the objective function value.
Definition 2.11 (Descent Direction). Let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable in $x \in \mathbb{R}^n$. A vector $s \in \mathbb{R}^n$ is called a descent direction of $f$ in $x$ if we have
\[ \nabla f(x)^\top s < 0. \]
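A first-order expansion indicates why the sign of $\nabla f(x)^\top s$ is the relevant quantity. The following is a short sketch of the standard argument, not a full proof; it uses only the differentiability of $f$ in $x$.

```latex
\begin{align*}
f(x + \alpha s) &= f(x) + \alpha \, \nabla f(x)^\top s + o(\alpha)
  \qquad (\alpha \downarrow 0), \\
\frac{f(x + \alpha s) - f(x)}{\alpha} &= \nabla f(x)^\top s + \frac{o(\alpha)}{\alpha}
  \;\longrightarrow\; \nabla f(x)^\top s < 0 .
\end{align*}
```

Hence the difference quotient is negative for all sufficiently small $\alpha > 0$, which means $f(x + \alpha s) < f(x)$ for such step sizes.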
Lemma 2.22 ([9]). Let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable in $x \in \mathbb{R}^n$ and let $s \in \mathbb{R}^n$ be a descent direction in $x$. Then there exists some $\alpha > 0$ such that $f(x + \alpha s) < f(x)$.
This leads to the following iterative procedure, sketched in Algorithm 1. The algorithm starts with an initial guess $x^0$ and determines a feasible descent direction $s^0$ as well as a feasible step size $\alpha_0$ to compute the new iterate $x^1 := x^0 + \alpha_0 s^0$ with a reduced objective function value. This is repeated until a stopping criterion is satisfied. In practice the stopping criterion is $\|\nabla f(x^k)\| < \varepsilon$, where $0 < \varepsilon \ll 1$.
Algorithm 1: Line Search
input : differentiable function $f : \mathbb{R}^n \to \mathbb{R}$
output: stationary point $x^\star$ of $f$
Choose starting point $x^0 \in \mathbb{R}^n$. Set $k = 0$.
while $\nabla f(x^k) \neq 0$ do
    Compute descent direction $s^k$.
    Compute step size $\alpha_k > 0$ such that $f(x^k + \alpha_k s^k) < f(x^k)$.
    Set $x^{k+1} = x^k + \alpha_k s^k$, $k = k + 1$.
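Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not a robust solver: it assumes the anti-gradient as descent direction, finds a step size by simple halving until the objective decreases, and replaces the exact test on the gradient by a tolerance. The function name `line_search_descent`, the tolerance, the iteration limit and the test problem are assumptions made for the example.

```python
import numpy as np

def line_search_descent(f, grad, x0, eps=1e-8, max_iter=10_000):
    """Sketch of Algorithm 1 with anti-gradient directions and step halving."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:      # practical stopping criterion
            break
        s = -g                           # descent direction (anti-gradient)
        alpha = 1.0
        # Halve alpha until f(x + alpha * s) < f(x) holds.
        while f(x + alpha * s) >= f(x):
            alpha *= 0.5
            if alpha < 1e-16:            # no decrease found numerically
                return x
        x = x + alpha * s
    return x

# Illustrative test problem (an assumption): unique minimizer at (1, -2).
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)])
x_min = line_search_descent(f, grad, [0.0, 0.0])
```

Requiring only a strict decrease is enough for this small example, but in general it does not guarantee convergence to a stationary point, which is why stronger conditions on the step size are usually imposed.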
The obvious questions are how to determine a suitable descent direction $s$ and an appropriate step size $\alpha$. Concerning the descent direction, it is common to choose
\[ s = -B^{-1} \nabla f(x) \]
for some symmetric and regular matrix $B$. If $B$ is in addition positive definite, then $\nabla f(x)^\top s = -\nabla f(x)^\top B^{-1} \nabla f(x) < 0$ whenever $\nabla f(x) \neq 0$, so $s$ is indeed a descent direction. Two specific choices for $B$ are, for example, $B = I$ or, if the Hessian is positive definite, $B = \nabla^2 f(x)$. The resulting descent directions are called Anti-Gradient and Newton-Direction, respectively. Assuming we are interested in an inexact line search, some common conditions on the step sizes are:
• Wolfe conditions:
\[ f(x^k + \alpha_k s^k) \le f(x^k) + c_1 \alpha_k \nabla f(x^k)^\top s^k \tag{2.12} \]
\[ \nabla f(x^k + \alpha_k s^k)^\top s^k \ge c_2 \nabla f(x^k)^\top s^k, \tag{2.13} \]
where $0 < c_1 < c_2 < 1$.
Condition (2.12) is also known as the Armijo condition and guarantees a sufficient decrease of the objective function. The second condition (2.13) is the curvature condition; it ensures that the line search makes reasonable progress, since a sufficiently small step size would always satisfy the Armijo condition. However, a step size that satisfies the Wolfe conditions need not be close to a minimizer of $\varphi(\alpha) = f(x^k + \alpha s^k)$. To enforce this, we can prevent the derivative $\varphi'(\alpha)$ from being too positive, so that points far away from a minimizer are excluded by the following additional requirement (2.14).
• Strong Wolfe conditions:
\[ f(x^k + \alpha_k s^k) \le f(x^k) + c_1 \alpha_k \nabla f(x^k)^\top s^k \]
\[ |\nabla f(x^k + \alpha_k s^k)^\top s^k| \le c_2 |\nabla f(x^k)^\top s^k|, \tag{2.14} \]
where $0 < c_1 < c_2 < 1$. A third pair of conditions are the
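• Goldstein conditions:
\[ f(x^k) + (1 - c) \alpha_k \nabla f(x^k)^\top s^k \le f(x^k + \alpha_k s^k) \tag{2.15} \]
\[ f(x^k + \alpha_k s^k) \le f(x^k) + c \alpha_k \nabla f(x^k)^\top s^k, \tag{2.16} \]
where $0 < c < \frac{1}{2}$.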
Again, the first inequality prevents the step size from becoming too small, and the second inequality ensures a sufficient decrease.
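The three sets of conditions can be checked numerically for a given trial step size. The sketch below is illustrative only: the function name `step_conditions`, the default constants and the one-dimensional test function are assumptions, and the second strong Wolfe inequality is taken in its standard form, which bounds the absolute value of the directional derivative.

```python
import numpy as np

def step_conditions(f, grad, x, s, alpha, c1=1e-4, c2=0.9, c=0.25):
    """Check Wolfe, strong Wolfe and Goldstein conditions for a step size alpha."""
    phi0 = f(x)
    dphi0 = grad(x) @ s                      # directional derivative at alpha = 0
    phi = f(x + alpha * s)
    dphi = grad(x + alpha * s) @ s           # directional derivative at alpha
    armijo = phi <= phi0 + c1 * alpha * dphi0
    curvature = dphi >= c2 * dphi0
    strong_curvature = abs(dphi) <= c2 * abs(dphi0)
    goldstein = (phi0 + (1 - c) * alpha * dphi0 <= phi
                 <= phi0 + c * alpha * dphi0)
    return {
        "wolfe": bool(armijo and curvature),
        "strong_wolfe": bool(armijo and strong_curvature),
        "goldstein": bool(goldstein),
    }

# Illustrative problem (an assumption): f(x) = x^2 at x = 1 with the anti-gradient.
f = lambda x: float(x[0] ** 2)
grad = lambda x: np.array([2.0 * x[0]])
x = np.array([1.0])
s = -grad(x)

exact = step_conditions(f, grad, x, s, alpha=0.5)           # lands on the minimizer
tiny = step_conditions(f, grad, x, s, alpha=1e-3)           # far too short
overshoot = step_conditions(f, grad, x, s, alpha=0.9, c2=0.5)
```

In this example the very short step satisfies the Armijo condition but fails both the curvature condition and the lower Goldstein bound, while the overshooting step satisfies the Wolfe conditions but not the strong Wolfe conditions.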
Often the sufficient decrease condition (2.12) is combined with backtracking, so that (2.13) need not be considered. Backtracking starts with an initial step size and reduces it step by step until the sufficient decrease condition holds, see Algorithm 2.
Algorithm 2: Backtracking Line Search
input : iterate $x^k$, search direction $s^k$, step size $\alpha_k$
output: feasible step size $\alpha_k$
Choose $\alpha_k > 0$ and $\rho, c \in (0, 1)$.
while $f(x^k + \alpha_k s^k) > f(x^k) + c \alpha_k \nabla f(x^k)^\top s^k$ do
    Set $\alpha_k = \rho \alpha_k$.
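A direct transcription of Algorithm 2 into Python might look as follows. The function name `backtracking`, the default parameter values and the cap on the number of reductions are assumptions added for the sketch; the cap only guards against an endless loop when the given direction is not a descent direction.

```python
import numpy as np

def backtracking(f, grad, x, s, alpha0=1.0, rho=0.5, c=1e-4, max_reductions=60):
    """Shrink the step size until the sufficient decrease condition holds."""
    alpha = alpha0
    f_x = f(x)
    slope = grad(x) @ s                  # negative for a descent direction
    for _ in range(max_reductions):
        if f(x + alpha * s) <= f_x + c * alpha * slope:
            return alpha                 # sufficient decrease reached
        alpha *= rho                     # reduce the step size
    raise RuntimeError("no step size found; is s a descent direction?")

# Illustrative problem (an assumption): f(x) = x^2 at x = 1 with the anti-gradient.
f = lambda x: float(x[0] ** 2)
grad = lambda x: np.array([2.0 * x[0]])
x = np.array([1.0])
alpha = backtracking(f, grad, x, -grad(x))
```

In this example the initial step size 1 overshoots to the point $-1$ with the same function value, so it is rejected and the halved step size is returned.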
If the descent directions and step sizes are chosen appropriately, the line search algorithm converges to a stationary point. We therefore define feasible search directions and feasible step sizes.
Definition 2.12 (Feasible Search Direction, Feasible Step Size). Let $\{s^k\}_k$, $\{x^k\}_k$ and $\{\alpha_k\}_k$ be the sequences of search directions, iterates and step sizes generated by Algorithm 1. The sequence $\{s^k\}_k$ is called feasible if for all $\{x^k\}_k$ we have the implication:
\[ \lim_{k \to \infty} \frac{\nabla f(x^k)^\top s^k}{\|s^k\|} = 0 \;\Rightarrow\; \lim_{k \to \infty} \nabla f(x^k) = 0. \tag{2.17} \]
The sequence $\{\alpha_k\}_k$ is called feasible if for all $\{x^k\}_k$ and $\{s^k\}_k$ we have the implication:
\[ f(x^k + \alpha_k s^k) < f(x^k) \text{ for all } k \in \mathbb{N} \text{ and} \]
\[ \lim_{k \to \infty} \bigl( f(x^k) - f(x^k + \alpha_k s^k) \bigr) = 0 \;\Rightarrow\; \lim_{k \to \infty} \frac{\nabla f(x^k)^\top s^k}{\|s^k\|} = 0. \]
The condition (2.17) ensures that the descent direction does not tend to become orthogonal to the gradient. Both the Anti-Gradient and the Newton-Direction are feasible search directions. Moreover, it can be shown that under standard assumptions both the Wolfe conditions and the Goldstein conditions yield feasible step sizes.
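For the Anti-Gradient the implication (2.17) can be verified in one line, which also makes the geometric meaning of the condition visible. The angle $\theta_k$ between $s^k$ and $\nabla f(x^k)$ is notation introduced here for illustration.

```latex
\begin{align*}
s^k = -\nabla f(x^k) \quad &\Longrightarrow \quad
\frac{\nabla f(x^k)^\top s^k}{\|s^k\|}
  = -\frac{\|\nabla f(x^k)\|^2}{\|\nabla f(x^k)\|}
  = -\|\nabla f(x^k)\|, \\
\text{in general:} \quad
\frac{\nabla f(x^k)^\top s^k}{\|s^k\|} &= \|\nabla f(x^k)\| \cos\theta_k .
\end{align*}
```

For the Anti-Gradient the left-hand side of (2.17) therefore tends to zero exactly when $\|\nabla f(x^k)\|$ does. For a general direction the quotient could also vanish because $\cos\theta_k \to 0$, that is, because the directions become orthogonal to the gradient, and feasibility rules out this case.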
Under additional requirements, we can ensure convergence of the line search method to a minimizer.
Theorem 2.23 ([9]). Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable, let $x^0$ be the starting point, let the level set $N_f(x^0)$ be convex and compact, and let $f$ be strictly convex on $N_f(x^0)$. Furthermore, let the sequence of descent directions $\{s^k\}_k$ and the sequence of step sizes $\{\alpha_k\}_k$ be feasible. Then there exists a unique global minimizer $x^\star$ of $f$, and Algorithm 1 produces a sequence $\{x^k\}_k$ that converges to $x^\star$.
The assumptions made are quite restrictive, since we require the objective function to be strictly convex in a neighbourhood of $x^\star$ and, additionally, the starting point $x^0$ to lie in that neighbourhood.
One of the most important representatives of line search algorithms is the Newton Method, which in its basic form uses the Newton Direction as descent direction and step size 1. Assuming the starting point $x^0$ is sufficiently close to a minimizer $x^\star$, it can be shown that if $f$ is twice continuously differentiable and the Hessian of $f$ is Lipschitz continuous in a neighbourhood of $x^\star$, at which $\nabla f(x^\star) = 0$ and $\nabla^2 f(x^\star)$ is positive definite, then $\{x^k\}_k$ converges to $x^\star$ and $\{\|\nabla f(x^k)\|\}_k$ converges to 0 quadratically. Several modifications can be made to obtain a practical, globally convergent version of the Newton Method. For example, in the Modified Newton Method the Hessian is modified, whenever necessary, to be positive definite throughout the whole algorithm in order to guarantee descent directions. To save computational effort, the Newton Direction can be computed approximately, yielding an Inexact Newton Method. In particular, Quasi-Newton type methods aim at reducing the computational effort by approximating the Hessian with some matrix that is easier to invert.
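The basic Newton Method with step size 1 can be sketched as follows. The sketch makes no attempt at globalization: it assumes that the Hessian stays positive definite along the iterates and that the starting point is close enough to the minimizer. The function name `newton`, the tolerance and the test function are assumptions made for the illustration.

```python
import numpy as np

def newton(grad, hess, x0, eps=1e-10, max_iter=50):
    """Basic Newton iteration; returns the final iterate and the gradient norms."""
    x = np.asarray(x0, dtype=float)
    norms = []
    for _ in range(max_iter):
        g = grad(x)
        norms.append(np.linalg.norm(g))
        if norms[-1] < eps:
            break
        s = np.linalg.solve(hess(x), -g)   # Newton direction: solve H s = -g
        x = x + s                          # step size 1
    return x, norms

# Illustrative problem (an assumption): f(x) = exp(x1) - x1 + x2^2, minimizer (0, 0).
grad = lambda x: np.array([np.exp(x[0]) - 1.0, 2.0 * x[1]])
hess = lambda x: np.array([[np.exp(x[0]), 0.0], [0.0, 2.0]])
x_star, norms = newton(grad, hess, [1.0, 1.0])
```

Solving the linear system instead of forming the inverse Hessian is the usual design choice. Printing `norms` for this example shows the gradient norm roughly squaring from one iteration to the next once the iterates are close to the minimizer, which is the quadratic convergence described above.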