
1. Unconstrained Optimization.

Given $f : \mathbb{R}^n \to \mathbb{R}$.

Minimize $f(x)$ over $x \in \mathbb{R}^n$.

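$f$ has a local minimum at a point $\bar{x}$ if $f(\bar{x}) \le f(x)$ for all $x$ near $\bar{x}$, i.e.

$$\exists\, \varepsilon > 0 \ \text{ s.t. } \ f(\bar{x}) \le f(x) \quad \forall\, x : \|x - \bar{x}\| < \varepsilon .$$

$f$ has a global minimum at $\bar{x}$ if $f(\bar{x}) \le f(x)$ for all $x \in \mathbb{R}^n$.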

2. Optimality Conditions.

• First order necessary conditions:

Suppose that $f$ has a local minimum at $\bar{x}$ and that $f$ is continuously differentiable in an open neighbourhood of $\bar{x}$. Then $\nabla f(\bar{x}) = 0$. ($\bar{x}$ is called a stationary point.)

• Second order sufficient conditions:

Suppose that $f$ is twice continuously differentiable in an open neighbourhood of $\bar{x}$ and that $\nabla f(\bar{x}) = 0$ and $\nabla^2 f(\bar{x})$ is positive definite. Then $\bar{x}$ is a strict local minimizer of $f$.

Example: Show that $f(x_1, x_2) = (2x_1^2 - x_2)(x_1^2 - 2x_2)$ has a minimum at $(0, 0)$ along any straight line passing through the origin, but $f$ has no minimum at $(0, 0)$.

Exercise: Find the minimizer of

$$f(x_1, x_2) = 2x_1^2 + x_1 x_2 + x_2^2 - x_1 - 3x_2 . \qquad (4)$$

Sufficient Condition.

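Taylor gives for any $d \in \mathbb{R}^n$:

$$f(\bar{x} + d) = f(\bar{x}) + \nabla f(\bar{x})^T d + \tfrac{1}{2}\, d^T \nabla^2 f(\bar{x} + \lambda d)\, d, \qquad \lambda \in (0, 1).$$

If $\bar{x}$ is not a strict local minimizer, then

$$\exists\, \{x^k\} \subset \mathbb{R}^n \setminus \{\bar{x}\} : \ x^k \to \bar{x} \ \text{ s.t. } \ f(x^k) \le f(\bar{x}).$$

Define $d^k := \dfrac{x^k - \bar{x}}{\|x^k - \bar{x}\|}$. Then $\|d^k\| = 1$ and there exists a subsequence $\{d^{k_j}\}$ such that $d^{k_j} \to d^\star$ as $j \to \infty$ and $\|d^\star\| = 1$. W.l.o.g. we assume $d^k \to d^\star$ as $k \to \infty$. Using $\nabla f(\bar{x}) = 0$, for some $\lambda_k \in (0, 1)$:

$$\begin{aligned}
f(\bar{x}) \ge f(x^k) &= f(\bar{x} + \|x^k - \bar{x}\|\, d^k) \\
&= f(\bar{x}) + \|x^k - \bar{x}\|\, \nabla f(\bar{x})^T d^k + \tfrac{1}{2} \|x^k - \bar{x}\|^2 (d^k)^T \nabla^2 f(\bar{x} + \lambda_k \|x^k - \bar{x}\|\, d^k)\, d^k \\
&= f(\bar{x}) + \tfrac{1}{2} \|x^k - \bar{x}\|^2 (d^k)^T \nabla^2 f(\bar{x} + \lambda_k \|x^k - \bar{x}\|\, d^k)\, d^k .
\end{aligned}$$

Hence $(d^k)^T \nabla^2 f(\bar{x} + \lambda_k \|x^k - \bar{x}\|\, d^k)\, d^k \le 0$, and on letting $k \to \infty$, $(d^\star)^T \nabla^2 f(\bar{x})\, d^\star \le 0$. This contradicts the positive definiteness of $\nabla^2 f(\bar{x})$, so $\bar{x}$ must be a strict local minimizer.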

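Example 6.2. Show that $f(x_1, x_2) = (2x_1^2 - x_2)(x_1^2 - 2x_2)$ has a minimum at $(0,0)$ along any straight line passing through the origin, but $f$ has no minimum at $(0, 0)$.

Answer.

Straight line through $(0, 0)$: $x_2 = \alpha x_1$, $\alpha \in \mathbb{R}$ fixed.

$$g(r) := f(r, \alpha r) = (2r^2 - \alpha r)(r^2 - 2\alpha r)$$

$$g'(r) = 8r^3 - 15\alpha r^2 + 4\alpha^2 r, \qquad g''(r) = 24r^2 - 30\alpha r + 4\alpha^2$$

$\Rightarrow g'(0) = 0$ and $g''(0) = 4\alpha^2 > 0$ for $\alpha \ne 0$. (For $\alpha = 0$ we have $g(r) = 2r^4$, and on the vertical line $x_1 = 0$ we have $f(0, x_2) = 2x_2^2$; both are minimized at the origin.)

Hence $r = 0$ is a minimizer for $g$ $\iff$ $(0,0)$ is a minimizer for $f$ along any straight line.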

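Now let $(x_1^k, x_2^k) = \left(\tfrac{1}{k}, \tfrac{1}{k^2}\right) \to (0, 0)$ as $k \to \infty$. Then

$$f(x_1^k, x_2^k) = -\frac{1}{k^2} \cdot \frac{1}{k^2} < 0 = f(0, 0) \quad \forall\, k .$$

Hence $(0,0)$ is not a minimizer for $f$.

[Note: $\nabla f(0,0) = 0$, but $\nabla^2 f(0,0) = \begin{pmatrix} 0 & 0 \\ 0 & 4 \end{pmatrix}$ is only positive semidefinite, so the second order sufficient condition does not apply.]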
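The behaviour in Example 6.2 can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the helper names and the sample values of $\alpha$, $r$ and $k$ are illustrative assumptions.

```python
def f(x1, x2):
    """Objective from Example 6.2: f = (2*x1^2 - x2) * (x1^2 - 2*x2)."""
    return (2 * x1**2 - x2) * (x1**2 - 2 * x2)


def along_line(alpha, r):
    """Restriction g(r) = f(r, alpha*r) of f to the line x2 = alpha*x1."""
    return f(r, alpha * r)


# Along several lines through the origin, f is positive at small nonzero r.
# The radius 1e-3 is an arbitrary choice; the interval on which g(r) > 0
# shrinks as |alpha| becomes small.
line_values = {
    alpha: (along_line(alpha, 1e-3), along_line(alpha, -1e-3))
    for alpha in (-2.0, -0.5, 0.5, 2.0)
}

# Along the parabola x2 = x1^2 the value is -1/k^4 < 0 = f(0, 0).
parabola_values = [f(1.0 / k, 1.0 / k**2) for k in (1, 10, 100)]

print(line_values)
print(parabola_values)
```

The points on the parabola approach the origin while $f$ stays negative, which is why no neighbourhood of $(0,0)$ can make it a local minimizer.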

3. Convex Optimization.

Exercise. When $f$ is convex, any local minimizer $\bar{x}$ is a global minimizer of $f$. If in addition $f$ is differentiable, then any stationary point $\bar{x}$ is a global minimizer of $f$. (Hint. Use a contradiction argument.)

Exercise 6.3.

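When $f$ is convex, any local minimizer $\bar{x}$ is a global minimizer of $f$.

Proof.

Suppose $\bar{x}$ is a local minimizer, but not a global minimizer. Then

$\exists\, \tilde{x}$ s.t. $f(\tilde{x}) < f(\bar{x})$. Since $f$ is convex, we have that

$$f(\lambda \tilde{x} + (1 - \lambda)\, \bar{x}) \le \lambda f(\tilde{x}) + (1 - \lambda) f(\bar{x}) < \lambda f(\bar{x}) + (1 - \lambda) f(\bar{x}) = f(\bar{x}) \quad \forall\, \lambda \in (0, 1].$$

Let $x_\lambda := \lambda \tilde{x} + (1 - \lambda)\, \bar{x}$. Then $x_\lambda \to \bar{x}$ as $\lambda \to 0$ and $f(x_\lambda) < f(\bar{x})$ for all $\lambda \in (0, 1]$. This is a contradiction to $\bar{x}$ being a local minimizer.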

4. Line Search.

The basic procedure for solving an unconstrained problem numerically (minimize $f(x)$ over $x \in \mathbb{R}^n$) is as follows.

(i) Choose an initial point $x^0 \in \mathbb{R}^n$ and an initial search direction $d^0 \in \mathbb{R}^n$ and set $k = 0$.

(ii) Choose a step size $\alpha_k$ and define a new point $x^{k+1} = x^k + \alpha_k d^k$. Check if the stopping criterion is satisfied ($\|\nabla f(x^{k+1})\| < \varepsilon$?). If yes, $x^{k+1}$ is taken as the optimal solution, stop. If no, go to (iii).

(iii) Choose a new search direction $d^{k+1}$ (descent direction) and set $k = k + 1$. Go to (ii).

The essential and most difficult part in any search algorithm is to choose a descent direction $d^k$ and a step size $\alpha_k$ with good convergence and stability properties.
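Steps (i) to (iii) can be sketched as a generic loop. This is a minimal Python illustration under stated assumptions: the function names, the callback signatures and the `max_iter` safeguard are hypothetical and not part of the notes, and the stopping test is evaluated at the start of each pass rather than right after the update.

```python
import math


def line_search_minimize(grad, choose_direction, choose_step, x0,
                         eps=1e-8, max_iter=1000):
    """Generic descent loop: x^{k+1} = x^k + alpha_k * d^k.

    grad(x)                -> gradient of f at x (list of floats)
    choose_direction(x, g) -> search direction d^k (hypothetical callback)
    choose_step(x, d)      -> step size alpha_k    (hypothetical callback)
    """
    x = list(x0)
    for k in range(max_iter):
        g = grad(x)
        # Stopping criterion ||grad f(x^k)|| < eps.
        if math.sqrt(sum(gi * gi for gi in g)) < eps:
            return x, k
        d = choose_direction(x, g)
        alpha = choose_step(x, d)
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x, max_iter  # iteration budget exhausted


# Usage on f(x) = x1^2 + x2^2 with d = -g and a fixed step of 0.25.
x_min, iterations = line_search_minimize(
    grad=lambda x: [2.0 * x[0], 2.0 * x[1]],
    choose_direction=lambda x, g: [-gi for gi in g],
    choose_step=lambda x, d: 0.25,
    x0=[1.0, 1.0],
)
print(x_min, iterations)
```

Passing the direction and step rules as callbacks keeps the loop itself unchanged when a different rule for $d^k$ or $\alpha_k$ is plugged in.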

5. Steepest Descent Method.

$f$ is differentiable.

Choose $d^k = -g^k$, where $g^k = \nabla f(x^k)$, and choose $\alpha_k$ s.t.

$$f(x^k + \alpha_k d^k) = \min_{\alpha \in \mathbb{R}} f(x^k + \alpha d^k).$$

Note that successive descent directions are orthogonal to each other, i.e. $(g^k)^T g^{k+1} = 0$, and that convergence for some functions may be very slow, a behaviour called zigzagging.

Exercise.

Use the steepest descent (SD) method to solve (4) with the initial point $x^0 = (1, 1)$. (Answer. First three iterations give $x^1 = (0, 1)$, $x^2 = (0, \tfrac{3}{2})$, and $x^3 = (-\tfrac{1}{8}, \tfrac{3}{2})$.)

Steepest Descent.

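Taylor gives:

$$f(x^k + \alpha d^k) = f(x^k) + \alpha \nabla f(x^k)^T d^k + O(\alpha^2).$$

As

$$\nabla f(x^k)^T d^k = \|\nabla f(x^k)\|\, \|d^k\| \cos\theta_k,$$

with $\theta_k$ the angle between $d^k$ and $\nabla f(x^k)$, we see that $d^k$ is a descent direction if $\cos\theta_k < 0$. The descent is steepest when $\theta_k = \pi \iff \cos\theta_k = -1$.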

Zigzagging.

$\alpha_k$ is the minimizer of $\varphi(\alpha) := f(x^k + \alpha d^k)$ with $d^k = -g^k$. Hence

$$0 = \varphi'(\alpha_k) = \nabla f(x^k + \alpha_k d^k)^T d^k = \nabla f(x^{k+1})^T (-g^k) = -(g^{k+1})^T g^k .$$

Hence $d^{k+1} \perp d^k$, which leads to zigzagging.

Exercise 6.5.

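Use the SD method to solve (4) with the initial point $x^0 = (1, 1)$. [min: $\tfrac{1}{7}(-1, 11)$.]

Answer. $\nabla f = (4x_1 + x_2 - 1,\ 2x_2 + x_1 - 3)$.

Iteration 0: $d^0 = -\nabla f(x^0) = (-4, 0) \ne (0, 0)$. $\varphi(\alpha) = f(x^0 + \alpha d^0) = f(1 - 4\alpha, 1) = 2(1 - 4\alpha)^2 - 2$ $\Rightarrow$ minimum point at $\alpha_0 = \tfrac{1}{4}$ $\Rightarrow$ $x^1 = x^0 + \alpha_0 d^0 = (0, 1)$, $d^1 = -\nabla f(x^1) = -(0, -1) = (0, 1) \ne (0, 0)$.

Iteration 1: $x^2 = (0, \tfrac{3}{2})$, $d^2 = (-\tfrac{1}{2}, 0)$.

Iteration 2: $x^3 = (-\tfrac{1}{8}, \tfrac{3}{2})$, $d^3 = (0, \tfrac{1}{8})$.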
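The SD iterates for (4) can be reproduced in exact arithmetic. The sketch below is an illustration rather than the notes' own implementation: it assumes $f$ from (4) is written as a quadratic with Hessian $H = \begin{pmatrix} 4 & 1 \\ 1 & 2 \end{pmatrix}$ and gradient $Hx - b$, $b = (1, 3)$, and it uses the closed-form exact step $\alpha = g^T g / (g^T H g)$, which is valid only for quadratics. All function names are hypothetical.

```python
from fractions import Fraction as F

H = [[F(4), F(1)], [F(1), F(2)]]  # Hessian of f in (4)
b = [F(1), F(3)]                  # so that grad f(x) = H x - b


def grad(x):
    """Gradient of (4): (4*x1 + x2 - 1, x1 + 2*x2 - 3)."""
    return [H[0][0] * x[0] + H[0][1] * x[1] - b[0],
            H[1][0] * x[0] + H[1][1] * x[1] - b[1]]


def exact_step(g):
    """Minimizer of phi(alpha) = f(x - alpha*g) for a quadratic f."""
    Hg = [H[0][0] * g[0] + H[0][1] * g[1],
          H[1][0] * g[0] + H[1][1] * g[1]]
    return (g[0] * g[0] + g[1] * g[1]) / (g[0] * Hg[0] + g[1] * Hg[1])


def steepest_descent(x0, n_iter):
    """Return the list of iterates x^0, x^1, ..., x^{n_iter}."""
    xs = [list(x0)]
    for _ in range(n_iter):
        x = xs[-1]
        g = grad(x)
        if g == [0, 0]:  # already stationary
            break
        alpha = exact_step(g)
        xs.append([x[0] - alpha * g[0], x[1] - alpha * g[1]])
    return xs


iterates = steepest_descent([F(1), F(1)], 3)
for x in iterates:
    print([str(c) for c in x])
```

Successive gradients along these iterates are orthogonal, which is the zigzag pattern described above: the iterates alternate between moves in the $x_1$ and $x_2$ directions.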

6. Newton Method.

$f$ is twice differentiable.

Choose $d^k = -[H^k]^{-1} g^k$, where $H^k = \nabla^2 f(x^k)$. Set $x^{k+1} = x^k + d^k$.

If $H^k$ is positive definite then $d^k$ is a descent direction.

The main drawback of the Newton method is that it requires the computation of $\nabla^2 f(x^k)$ and its inverse, which can be difficult and time-consuming.

Exercise.

Use the Newton method to solve (4) with $x^0 = (1, 1)$. (Answer. First iteration gives $x^1 = \tfrac{1}{7}(-1, 11)$.)

Newton Method.

Taylor gives

$$f(x^k + d) \approx f(x^k) + d^T \nabla f(x^k) + \tfrac{1}{2}\, d^T \nabla^2 f(x^k)\, d =: m(d),$$

$$\min_d m(d) \ \Rightarrow\ \nabla m(d) = 0 \ \Rightarrow\ \nabla f(x^k) + \nabla^2 f(x^k)\, d = 0 .$$

Hence choose $d^k = -[\nabla^2 f(x^k)]^{-1} \nabla f(x^k) = -[H^k]^{-1} g^k$. If $H^k$ is positive definite, then so is $(H^k)^{-1}$, and for $g^k \ne 0$ we get

$$(d^k)^T g^k = -(g^k)^T (H^k)^{-1} g^k \le -\sigma_k \|g^k\|^2 < 0 \quad \text{for some } \sigma_k > 0 .$$

Hence $d^k$ is a descent direction.

[Aside: The Newton method for $\min_x f(x)$ is equivalent to the Newton method for finding a root of the system of nonlinear equations $\nabla f(x) = 0$.]

Exercise 6.6.

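Use the Newton method to minimize $f(x_1, x_2) = 2x_1^2 + x_1 x_2 + x_2^2 - x_1 - 3x_2$ with $x^0 = (1, 1)^T$.

Answer.

$$\nabla f = \begin{pmatrix} 4x_1 + x_2 - 1 \\ 2x_2 + x_1 - 3 \end{pmatrix}, \qquad H := \nabla^2 f = \begin{pmatrix} 4 & 1 \\ 1 & 2 \end{pmatrix}, \qquad H^{-1} = \frac{1}{\det H} \begin{pmatrix} 2 & -1 \\ -1 & 4 \end{pmatrix} = \frac{1}{7} \begin{pmatrix} 2 & -1 \\ -1 & 4 \end{pmatrix}.$$

Iteration 0: $x^0 = (1, 1)^T$, $\nabla f(x^0) = (4, 0)^T$.

$$x^1 = x^0 - [H^0]^{-1} \nabla f(x^0) = \begin{pmatrix} 1 \\ 1 \end{pmatrix} - \frac{1}{7} \begin{pmatrix} 2 & -1 \\ -1 & 4 \end{pmatrix} \begin{pmatrix} 4 \\ 0 \end{pmatrix} = \frac{1}{7} \begin{pmatrix} -1 \\ 11 \end{pmatrix}.$$

$\Rightarrow \nabla f(x^1) = (0, 0)^T$ and $H$ is positive definite, so $x^1$ is the minimizer of $f$.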
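The single Newton step above can be verified with exact rational arithmetic. This is a small Python sketch with hypothetical helper names; it hard-codes the gradient and Hessian of (4) and solves the $2 \times 2$ system $H d = -g$ by the explicit inverse formula used in the answer.

```python
from fractions import Fraction as F

HESSIAN = ((F(4), F(1)), (F(1), F(2)))  # constant Hessian of f in (4)


def grad(x1, x2):
    """Gradient of (4): (4*x1 + x2 - 1, x1 + 2*x2 - 3)."""
    return (4 * x1 + x2 - 1, x1 + 2 * x2 - 3)


def newton_step(x):
    """One Newton update x + d, where d solves H d = -g."""
    g1, g2 = grad(*x)
    (a, b), (c, d) = HESSIAN
    det = a * d - b * c
    # H^{-1} = (1/det) * [[d, -b], [-c, a]] for a 2x2 matrix.
    d1 = -(d * g1 - b * g2) / det
    d2 = -(-c * g1 + a * g2) / det
    return (x[0] + d1, x[1] + d2)


x_new = newton_step((F(1), F(1)))
print(x_new, grad(*x_new))
```

Because $f$ in (4) is quadratic, the model $m(d)$ coincides with $f$, which is why one step already lands on the stationary point.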

7. Choice of Stepsize.

In computing the step size $\alpha_k$ we face a tradeoff. We would like to choose $\alpha_k$ to give a substantial reduction of $f$, but at the same time we do not want to spend too much time making the choice. The ideal choice would be the global minimizer of the univariate function $\varphi : \mathbb{R} \to \mathbb{R}$ defined by

$$\varphi(\alpha) = f(x^k + \alpha d^k), \quad \alpha > 0,$$

but in general it is too expensive to identify this value.

A common strategy is to perform an inexact line search to identify a step size that achieves adequate reductions in $f$ at minimal cost.

$\alpha_k$ is normally chosen to satisfy the Wolfe conditions:

$$f(x^k + \alpha_k d^k) \le f(x^k) + c_1 \alpha_k (g^k)^T d^k \qquad (5)$$

$$\nabla f(x^k + \alpha_k d^k)^T d^k \ge c_2\, (g^k)^T d^k, \qquad (6)$$

with $0 < c_1 < c_2 < 1$. (5) is called the sufficient decrease condition, and (6) is the curvature condition.

Choice of Stepsize.

The simple condition

$$f(x^k + \alpha_k d^k) < f(x^k) \qquad (\ast)$$

is not appropriate, as it may not lead to a sufficient reduction.

Example: $f(x) = (x - 1)^2 - 1$. So $\min f(x) = -1$, but we can choose $x^k$ satisfying $(\ast)$ such that $f(x^k) = \tfrac{1}{k} \to 0$.

Note that the sufficient decrease condition (5)

$$\varphi(\alpha) = f(x^k + \alpha d^k) \le \ell(\alpha) := f(x^k) + c_1 \alpha\, (g^k)^T d^k$$

yields acceptable regions for $\alpha$. Here $\varphi(\alpha) < \ell(\alpha)$ for small $\alpha > 0$, as $(g^k)^T d^k < 0$ for descent directions.

The curvature condition (6) is equivalent to

$$\varphi'(\alpha) \ge c_2\, \varphi'(0) \quad [\, > \varphi'(0)\, ]$$
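Conditions (5) and (6) can be tested for a trial step directly in terms of $\varphi$ and $\varphi'$. The following Python sketch is illustrative only: the function name is hypothetical, the values $c_1 = 10^{-4}$ and $c_2 = 0.9$ are assumed defaults that merely satisfy $0 < c_1 < c_2 < 1$, and the test function is $f(x) = (x - 1)^2 - 1$ from the example, started at $x^k = 0$ with $d^k = 1$.

```python
def satisfies_wolfe(phi, dphi, alpha, c1=1e-4, c2=0.9):
    """Return (sufficient_decrease, curvature) for a trial step alpha.

    phi(alpha)  = f(x^k + alpha * d^k)
    dphi(alpha) = grad f(x^k + alpha * d^k)^T d^k
    c1, c2 are assumed illustrative constants with 0 < c1 < c2 < 1.
    """
    sufficient_decrease = phi(alpha) <= phi(0.0) + c1 * alpha * dphi(0.0)  # (5)
    curvature = dphi(alpha) >= c2 * dphi(0.0)                              # (6)
    return sufficient_decrease, curvature


# f(x) = (x - 1)^2 - 1 along d = 1 from x = 0, so phi'(0) = -2 < 0.
phi = lambda a: (a - 1.0) ** 2 - 1.0
dphi = lambda a: 2.0 * (a - 1.0)

too_short = satisfies_wolfe(phi, dphi, 0.01)  # decrease holds, curvature fails
accepted = satisfies_wolfe(phi, dphi, 1.0)    # both conditions hold
too_long = satisfies_wolfe(phi, dphi, 2.0)    # sufficient decrease fails
print(too_short, accepted, too_long)
```

The three trial steps show the roles of the two conditions: (5) rules out steps that are too long, while (6) rules out steps that are too short.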

8. Convergence of Line Search Methods.

An algorithm is said to be globally convergent if

$$\lim_{k \to \infty} \|g^k\| = 0 .$$

It can be shown that if the step sizes satisfy the Wolfe conditions

• then the steepest descent method is globally convergent,

• so is the Newton method provided the Hessian matrices $\nabla^2 f(x^k)$ have a bounded condition number and are positive definite.

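Exercise. Show that the steepest descent method is globally convergent if the following conditions hold:

(a) $\alpha_k$ satisfies the Wolfe conditions,

(b) $f(x) \ge M$ $\forall\, x \in \mathbb{R}^n$,

(c) $f \in C^1$ and $\nabla f$ is Lipschitz.

[Hint: Show that $\sum_{k=0}^{\infty} \|g^k\|^2 < \infty$.]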

Exercise 6.8.

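Assume that $d^k$ is a descent direction, i.e. $(g^k)^T d^k < 0$, where $g^k := \nabla f(x^k)$. Then if

1. $\alpha_k$ satisfies the Wolfe conditions,

2. $f(x) \ge M$ $\forall\, x \in \mathbb{R}^n$,

3. $f \in C^1$ and $\nabla f$ is Lipschitz, i.e. $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ $\forall\, x, y \in \mathbb{R}^n$,

it holds that

$$\sum_{k=0}^{\infty} \cos^2\theta_k\, \|g^k\|^2 < \infty, \qquad \text{where } \cos\theta_k := \frac{(g^k)^T d^k}{\|g^k\|\, \|d^k\|} .$$

[Note: The SD method is the special case with $\cos^2\theta_k = 1$, which gives $\lim_{k \to \infty} \|g^k\| = 0$.]

Proof. Wolfe condition (6) gives

$$(g^{k+1})^T d^k \ge c_2\, (g^k)^T d^k \ \Rightarrow\ (g^{k+1} - g^k)^T d^k \ge (c_2 - 1)\, (g^k)^T d^k . \qquad (\dagger)$$

The Lipschitz condition yields that

$$(g^{k+1} - g^k)^T d^k \le \|g^{k+1} - g^k\|\, \|d^k\| = \|\nabla f(x^{k+1}) - \nabla f(x^k)\|\, \|d^k\| \le L \|x^{k+1} - x^k\|\, \|d^k\| = \alpha_k L \|d^k\|^2 . \qquad (\ddagger)$$

Combining $(\dagger)$ and $(\ddagger)$ gives

$$\alpha_k \ge \frac{c_2 - 1}{L}\, \frac{(g^k)^T d^k}{\|d^k\|^2}, \quad \text{and hence} \quad \alpha_k\, (g^k)^T d^k \le \frac{c_2 - 1}{L}\, \frac{[(g^k)^T d^k]^2}{\|d^k\|^2},$$

where the inequality flips because $(g^k)^T d^k < 0$. Together with Wolfe condition (5) we get

$$f(x^{k+1}) \le f(x^k) + c_1\, \frac{c_2 - 1}{L}\, \frac{[(g^k)^T d^k]^2}{\|d^k\|^2} = f(x^k) - c \cos^2\theta_k\, \|g^k\|^2,$$

where $c := c_1 \dfrac{1 - c_2}{L} > 0$. Applying this bound repeatedly,

$$f(x^{k+1}) \le f(x^k) - c \cos^2\theta_k\, \|g^k\|^2 \le \dots \le f(x^0) - c \sum_{j=0}^{k} \cos^2\theta_j\, \|g^j\|^2$$

$$\Rightarrow\ \sum_{j=0}^{k} \cos^2\theta_j\, \|g^j\|^2 \le \frac{1}{c}\, \big(f(x^0) - M\big) \quad \forall\, k \ \Rightarrow\ \sum_{j=0}^{\infty} \cos^2\theta_j\, \|g^j\|^2 < \infty .$$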

9. Popular Search Methods.

In practice the steepest descent method and the Newton method are rarely used due to the slow convergence rate and the difficulty in computing Hessian matrices, respectively.

The popular search methods are

• the conjugate gradient method (variation of SD method with superlinear convergence) and

• the quasi-Newton method (variation of Newton method without computation of Hessian matrices).

There are some efficient algorithms based on the trust-region approach. See Fletcher (2000) for details.

10. Constrained Optimization.

Minimize $f(x)$ over $x \in \mathbb{R}^n$ subject to the equality constraints

$$h_i(x) = 0, \quad i = 1, \dots, l,$$

and the inequality constraints

$$g_j(x) \le 0, \quad j = 1, \dots, m .$$

Assume that all functions involved are differentiable.

11. Linear Programming.

The problem is to minimize

$$z = c_1 x_1 + \dots + c_n x_n$$

subject to

$$a_{i1} x_1 + \dots + a_{in} x_n \ge b_i, \quad i = 1, \dots, m,$$

and

$$x_1, \dots, x_n \ge 0 .$$

LPs can be easily and efficiently solved with the simplex algorithm or the interior point method.

MS-Excel has a good built-in LP solver capable of solving problems with up to 200 variables. MATLAB with the Optimization Toolbox also provides a good LP solver.

12. Graphic Method.

If an LP problem has only two decision variables $(x_1, x_2)$, then it can be solved by the graphic method as follows:

• First draw the feasible region from the given constraints and a contour line of the objective function,

• then, on establishing the increasing direction perpendicular to the contour line, find the optimal point on the boundary of the feasible region,

• then find two linear equations which define that point,

• and finally solve the two equations to obtain the optimal point.

Exercise. Use the graphic method to solve the LP: minimize $z = -3x_1 - 2x_2$ subject to $x_1 + x_2 \le 80$, $2x_1 + x_2 \le 100$, $x_1 \le 40$, and $x_1, x_2 \ge 0$. (Answer. $x_1 = 20$, $x_2 = 60$.)
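The graphic method amounts to comparing the objective at the corner points of the feasible region, which can be sketched in a few lines of Python. This sketch assumes the exercise reads: minimize $z = -3x_1 - 2x_2$ subject to $x_1 + x_2 \le 80$, $2x_1 + x_2 \le 100$, $x_1 \le 40$, $x_1, x_2 \ge 0$ (the reading consistent with the stated answer), and that the optimum is attained at a vertex of a bounded region. The data layout and function name are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

# Each constraint (a1, a2, b) means a1*x1 + a2*x2 <= b.
constraints = [
    (F(1), F(1), F(80)),
    (F(2), F(1), F(100)),
    (F(1), F(0), F(40)),
    (F(-1), F(0), F(0)),  # x1 >= 0
    (F(0), F(-1), F(0)),  # x2 >= 0
]
cost = (F(-3), F(-2))     # assumed objective z = -3*x1 - 2*x2


def feasible_vertices(cons):
    """Intersect every pair of boundary lines and keep the feasible points."""
    points = set()
    for (a1, a2, b), (c1, c2, d) in combinations(cons, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:      # parallel boundary lines never intersect
            continue
        # Cramer's rule for the 2x2 system of the two boundary equations.
        x = ((b * c2 - a2 * d) / det, (a1 * d - b * c1) / det)
        if all(p * x[0] + q * x[1] <= r for p, q, r in cons):
            points.add(x)
    return points


def objective(x):
    return cost[0] * x[0] + cost[1] * x[1]


best = min(feasible_vertices(constraints), key=objective)
print(best, objective(best))
```

Pairwise intersection is only practical for very small problems; it mirrors the last two steps of the graphic method, where two linear equations define the optimal corner.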

13. Quadratic Programming.

Minimize

$$x^T Q\, x + c^T x$$

subject to

$$A x \le b \quad \text{and} \quad x \ge 0,$$

where $Q$ is an $n \times n$ symmetric positive definite matrix, $A$ is an $m \times n$ matrix, $x, c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$.

To solve a QP problem, one

• first derives a set of equations from the Kuhn–Tucker conditions, and

• then applies the Wolfe algorithm or the Lemke algorithm to find the optimal solution.

The MS-Excel solver is capable of solving reasonably sized QP problems, similarly for MATLAB.

14. Kuhn–Tucker Conditions.

$\min f(x)$ over $x \in \mathbb{R}^n$ s.t. $h_i(x) = 0$, $i = 1, \dots, l$; $g_j(x) \le 0$, $j = 1, \dots, m$. Assume that $\bar{x}$ is an optimal solution.

Under some regularity conditions, called the constraint qualifications, there exist two vectors $\bar{u} = (\bar{u}_1, \dots, \bar{u}_l)$ and $\bar{v} = (\bar{v}_1, \dots, \bar{v}_m)$, called the Lagrange multipliers, such that the following set of conditions is satisfied:

$$\begin{aligned}
L_{x_k}(\bar{x}, \bar{u}, \bar{v}) &= 0, & k &= 1, \dots, n \\
h_i(\bar{x}) &= 0, & i &= 1, \dots, l \\
g_j(\bar{x}) &\le 0, & j &= 1, \dots, m \\
\bar{v}_j\, g_j(\bar{x}) = 0, \quad \bar{v}_j &\ge 0, & j &= 1, \dots, m
\end{aligned}$$

where

$$L(x, u, v) = f(x) + \sum_{i=1}^{l} u_i\, h_i(x) + \sum_{j=1}^{m} v_j\, g_j(x)$$

is called the Lagrange function or Lagrangian.

Furthermore, if $f : \mathbb{R}^n \to \mathbb{R}$ and $h_i, g_j : \mathbb{R}^n \to \mathbb{R}$ are convex, then $\bar{x}$ is an optimal solution if and only if $(\bar{x}, \bar{u}, \bar{v})$ satisfies the Kuhn–Tucker conditions.

This holds in particular when $f$ is convex and $h_i$, $g_j$ are linear.

Example.

Find the minimum solution to the function $x_2 - x_1$ subject to $x_1^2 + x_2^2 \le 1$.

Exercise.

Find the minimum solution to the function $x_1^2 + x_2^2 - 2x_1 - 4x_2$ subject to $x_1 + 2x_2 \le 2$ and $x_2 \ge 0$.

Interpretation of Kuhn–Tucker conditions

Assume that no equality constraints are present.

If $\bar{x}$ is an interior point, i.e. no constraints are active, then we recover the usual optimality condition: $\nabla f(\bar{x}) = 0$.

Now assume that $\bar{x}$ lies on the boundary of the feasible set and let $g_{j_k}$ be the active constraints at $\bar{x}$. Then a necessary condition for optimality is that we cannot find a descent direction for $f$ at $\bar{x}$ that is also a feasible direction. Such a vector cannot exist if

$$-\nabla f(\bar{x}) = \sum_k \bar{v}_{j_k} \nabla g_{j_k}(\bar{x}) \quad \text{with } \bar{v}_{j_k} \ge 0 . \qquad (\ast)$$

This is because, if $d \in \mathbb{R}^n$ is a descent direction, then $\nabla f(\bar{x})^T d < 0$ and hence, by $(\ast)$, $\sum_k \bar{v}_{j_k} \nabla g_{j_k}(\bar{x})^T d > 0$.

So there must exist a $j_k$ such that $\nabla g_{j_k}(\bar{x})^T d > 0$. But that means that $d$ is an ascent direction for $g_{j_k}$, and as $g_{j_k}$ is active at $\bar{x}$, it is not a feasible direction.

Application of Kuhn–Tucker: LP Duality

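Let $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$ and $A \in \mathbb{R}^{m \times n}$.

$$\min c^T x \quad \text{s.t. } A x \ge b, \ x \ge 0. \qquad (P)$$

Equivalent to $\min c^T x$ s.t. $b - A x \le 0$, $-x \le 0$.

Lagrangian: $L = c^T x + v^T (b - A x) + y^T (-x)$.

Hence $\bar{x}$ is the solution if there exist $\bar{v}$ and $\bar{y}$ such that

$$\nabla L = c - A^T \bar{v} - \bar{y} = 0 \ \Rightarrow\ \bar{y} = c - A^T \bar{v},$$

KT conditions: $\bar{v}^T (b - A \bar{x}) = 0$, $\bar{y}^T (-\bar{x}) = 0$, $\bar{v} \ge 0$, $\bar{y} \ge 0$.

Eliminate $\bar{y}$ to find $\bar{v}$, $\bar{x}$:

$A \bar{x} \ge b$, $\bar{x} \ge 0$ (feasible region: primal)

$A^T \bar{v} \le c$, $\bar{v} \ge 0$ (feasible region: dual)

$$\bar{v}^T (b - A \bar{x}) = 0, \quad \bar{x}^T (c - A^T \bar{v}) = 0 \ \Rightarrow\ \bar{x}^T c = \bar{x}^T A^T \bar{v} = \bar{v}^T b .$$

Hence $\bar{v} \in \mathbb{R}^m$ solves the dual:

$$\max b^T v \quad \text{s.t. } A^T v \le c, \ v \ge 0 .$$

Here we have used that

$$\begin{aligned}
c^T \bar{x} = \min_{x \ge 0,\ A x \ge b} c^T x
&\ge \min_{x \ge 0} \max_{v \ge 0} \left[ c^T x + v^T (b - A x) \right] \\
&\ge \max_{v \ge 0} \min_{x \ge 0} \left[ c^T x + v^T (b - A x) \right] \\
&= \max_{v \ge 0} \min_{x \ge 0} \left[ v^T b + x^T (c - A^T v) \right] \\
&\ge \max_{v \ge 0,\ A^T v \le c} v^T b \ \ge\ \bar{v}^T b = c^T \bar{x} .
\end{aligned}$$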

Example 6.14.

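Find the minimum solution to the function $x_2 - x_1$ subject to $x_1^2 + x_2^2 \le 1$.

Answer.

$L = x_2 - x_1 + v\, (x_1^2 + x_2^2 - 1)$, so the KT conditions become

$$\begin{aligned}
L_{x_1} = -1 + 2 v x_1 &= 0 & (1) \\
L_{x_2} = 1 + 2 v x_2 &= 0 & (2) \\
x_1^2 + x_2^2 &\le 1 & (3) \\
v\, (x_1^2 + x_2^2 - 1) = 0, \quad v &\ge 0 & (4)
\end{aligned}$$

(1) $\Rightarrow$ $v > 0$ and hence $x_1 = \dfrac{1}{2v}$, $x_2 = -\dfrac{1}{2v}$.

Plugging this into (4) yields $\dfrac{2}{4v^2} = 1$ and hence $\bar{v} = \dfrac{1}{\sqrt{2}}$ $\Rightarrow$ $\bar{x}_1 = \dfrac{1}{\sqrt{2}}$, $\bar{x}_2 = -\dfrac{1}{\sqrt{2}}$; with the optimal value being $\bar{z} = -\sqrt{2}$.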
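The candidate from Example 6.14 can be checked against conditions (1) to (4) numerically. This Python sketch is an illustration with hypothetical names: it evaluates how far a triple $(x_1, x_2, v)$ is from satisfying each Kuhn–Tucker condition for this specific problem, where a residual of zero (up to rounding) means the condition holds.

```python
import math


def kkt_residuals(x1, x2, v):
    """Residuals of the KT conditions for min x2 - x1 s.t. x1^2 + x2^2 <= 1."""
    grad_L = (-1.0 + 2.0 * v * x1, 1.0 + 2.0 * v * x2)  # conditions (1), (2)
    g = x1**2 + x2**2 - 1.0                              # constraint function
    return {
        "stationarity": max(abs(grad_L[0]), abs(grad_L[1])),
        "feasibility": max(g, 0.0),          # (3): violation of g <= 0
        "complementarity": abs(v * g),       # (4): v * g = 0
        "dual_feasibility": max(-v, 0.0),    # (4): violation of v >= 0
    }


s = 1.0 / math.sqrt(2.0)
residuals = kkt_residuals(s, -s, s)   # candidate (x1, x2, v) from the answer
optimal_value = -s - s                # objective x2 - x1 at the candidate

# An interior point with v = 0 cannot be stationary: L_x1 = -1 there.
interior = kkt_residuals(0.0, 0.0, 0.0)
print(residuals, optimal_value, interior)
```

Since the objective is linear and the constraint is convex, a point with vanishing residuals is a global minimizer for this problem.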

Example.

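$\min x_1$ s.t. $x_2 - x_1^3 \le 0$, $x_1 \le 1$, $x_2 \ge 0$.

Since $x_1^3 \ge x_2 \ge 0$ we have $x_1 \ge 0$ and hence $\bar{x} = (0, 0)$ is the unique minimizer.

Lagrangian: $L = x_1 + v_1 (x_2 - x_1^3) + v_2 (x_1 - 1) + v_3 (-x_2)$.

KT conditions for a feasible point $x$:

$$\begin{aligned}
\nabla L = \begin{pmatrix} 1 - 3 v_1 x_1^2 + v_2 \\ v_1 - v_3 \end{pmatrix} &= 0 & (1) \\
v_1 (x_2 - x_1^3) = 0, \quad v_2 (x_1 - 1) = 0, \quad v_3 (-x_2) &= 0 & (2) \\
v_1, v_2, v_3 &\ge 0 & (3)
\end{aligned}$$

Check KT conditions at $\bar{x} = (0, 0)$: (1) $\Rightarrow$ $v_2 = -1 < 0$, impossible!

The KT conditions are not satisfied, since the constraint qualifications do not hold.

Here $g_1 = x_2 - x_1^3$ and $g_3 = -x_2$ are active at $(0, 0)$, and $\nabla g_1 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, $\nabla g_3 = \begin{pmatrix} 0 \\ -1 \end{pmatrix}$. Hence $\nabla g_1$ and $\nabla g_3$ are linearly dependent at $(0, 0)$.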

Exercise 6.14

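Find the minimum solution to the function $x_1^2 + x_2^2 - 2x_1 - 4x_2$ subject to $x_1 + 2x_2 \le 2$ and $x_2 \ge 0$.

Answer.

Lagrangian: $L = x_1^2 + x_2^2 - 2x_1 - 4x_2 + v_1 (x_1 + 2x_2 - 2) + v_2 (-x_2)$.

KT conditions:

$$\begin{aligned}
\nabla L = \begin{pmatrix} 2x_1 - 2 + v_1 \\ 2x_2 - 4 + 2 v_1 - v_2 \end{pmatrix} &= 0 & (1) \\
v_1 (x_1 + 2x_2 - 2) = 0, \quad v_2\, x_2 = 0, \quad v_1, v_2 &\ge 0 & (2) \\
x_1 + 2x_2 \le 2, \quad x_2 &\ge 0 & (3)
\end{aligned}$$

If $x_1 + 2x_2 - 2 < 0$ then $v_1 = 0$ $\Rightarrow$ $x_1 = 1$, $x_2 = 2 + \tfrac{1}{2} v_2 \ge 2$. Hence from (2), $v_2 = 0$, and so $x_1 = 1$, $x_2 = 2$. But that contradicts (3), and so it must hold that $x_1 + 2x_2 - 2 = 0$.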
