Optimization - Numerical Methods for Finance - Free Computer, Programming, Mathematics, Technic

1. Unconstrained Optimization.

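Given $f : \mathbb{R}^n \to \mathbb{R}$.

Minimize $f(x)$ over $x \in \mathbb{R}^n$.

$f$ has a local minimum at a point $\bar{x}$ if $f(\bar{x}) \le f(x)$ for all $x$ near $\bar{x}$, i.e.

$\exists\, \varepsilon > 0$ s.t. $f(\bar{x}) \le f(x)$ $\forall\, x : \|x - \bar{x}\| < \varepsilon$. $f$ has a global minimum at $\bar{x}$ if $f(\bar{x}) \le f(x)$ for all $x \in \mathbb{R}^n$.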

2. Optimality Conditions.

• First order necessary conditions:

Suppose that $f$ has a local minimum at $\bar{x}$ and that $f$ is continuously differentiable in an open neighbourhood of $\bar{x}$. Then $\nabla f(\bar{x}) = 0$. ($\bar{x}$ is called a stationary point.)

• Second order sufficient conditions:

Suppose that $f$ is twice continuously differentiable in an open neighbourhood of $\bar{x}$ and that $\nabla f(\bar{x}) = 0$ and $\nabla^2 f(\bar{x})$ is positive definite. Then $\bar{x}$ is a strict local minimizer of $f$.

Example: Show that $f = (2x_1^2 - x_2)(x_1^2 - 2x_2)$ has a minimum at $(0, 0)$ along any straight line passing through the origin, but $f$ has no minimum at $(0, 0)$.

Exercise: Find the minimum solution of

$f(x_1, x_2) = 2x_1^2 + x_1 x_2 + x_2^2 - x_1 - 3x_2$. (4)

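Sufficient Condition.

Taylor gives for any $d \in \mathbb{R}^n$:

$f(\bar{x} + d) = f(\bar{x}) + \nabla f(\bar{x})^T d + \tfrac{1}{2}\, d^T \nabla^2 f(\bar{x} + \lambda d)\, d$ for some $\lambda \in (0, 1)$.

If $\bar{x}$ is not a strict local minimizer, then

$\exists\, \{x_k\} \subset \mathbb{R}^n \setminus \{\bar{x}\}$ : $x_k \to \bar{x}$ s.t. $f(x_k) \le f(\bar{x})$.

Define $d_k := \dfrac{x_k - \bar{x}}{\|x_k - \bar{x}\|}$. Then $\|d_k\| = 1$ and there exists a subsequence $\{d_{k_j}\}$ such that $d_{k_j} \to d_\star$ as $j \to \infty$ and $\|d_\star\| = 1$. W.l.o.g. we assume $d_k \to d_\star$ as $k \to \infty$. Using $\nabla f(\bar{x}) = 0$,

$f(\bar{x}) \ge f(x_k) = f(\bar{x} + \|x_k - \bar{x}\|\, d_k)$

$= f(\bar{x}) + \|x_k - \bar{x}\|\, \nabla f(\bar{x})^T d_k + \tfrac{1}{2} \|x_k - \bar{x}\|^2\, d_k^T \nabla^2 f(\bar{x} + \lambda_k \|x_k - \bar{x}\|\, d_k)\, d_k$

$= f(\bar{x}) + \tfrac{1}{2} \|x_k - \bar{x}\|^2\, d_k^T \nabla^2 f(\bar{x} + \lambda_k \|x_k - \bar{x}\|\, d_k)\, d_k$.

Hence $d_k^T \nabla^2 f(\bar{x} + \lambda_k \|x_k - \bar{x}\|\, d_k)\, d_k \le 0$, and on letting $k \to \infty$, $d_\star^T \nabla^2 f(\bar{x})\, d_\star \le 0$. Since $\|d_\star\| = 1$, this contradicts the positive definiteness of $\nabla^2 f(\bar{x})$.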

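Example 6.2. Show that $f = (2x_1^2 - x_2)(x_1^2 - 2x_2)$ has a minimum at $(0,0)$ along any straight line passing through the origin, but $f$ has no minimum at $(0, 0)$.

Answer.

Straight line through $(0, 0)$: $x_2 = \alpha x_1$, $\alpha \in \mathbb{R}$ fixed.

$g(r) := f(r, \alpha r) = (2r^2 - \alpha r)(r^2 - 2\alpha r)$

$g'(r) = 8r^3 - 15\alpha r^2 + 4\alpha^2 r$, $g''(r) = 24r^2 - 30\alpha r + 4\alpha^2$

$\Rightarrow g'(0) = 0$ and $g''(0) = 4\alpha^2 > 0$ for $\alpha \ne 0$. (For $\alpha = 0$ we have $g(r) = 2r^4$, and on the remaining line $x_1 = 0$ we have $f(0, x_2) = 2x_2^2$; both are minimized at the origin.)

Hence $r = 0$ is a minimizer for $g$, i.e. $(0,0)$ is a minimizer for $f$ along any straight line through the origin.

Now let $(x_1^k, x_2^k) = (\tfrac{1}{k}, \tfrac{1}{k^2}) \to (0, 0)$ as $k \to \infty$. Then $f(x_1^k, x_2^k) = -\tfrac{1}{k^4} < 0 = f(0, 0)$ $\forall\, k$.

Hence $(0,0)$ is not a minimizer for $f$.

[Note: $\nabla f(0,0) = 0$, but $\nabla^2 f(0,0) = \begin{pmatrix} 0 & 0 \\ 0 & 4 \end{pmatrix}$ is only positive semidefinite, so the second order sufficient condition does not apply.]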
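The two claims of Example 6.2 can be checked with a few exact evaluations. The sketch below is only a spot check, not a proof: the sampled slopes and the radius 1/100 are arbitrary choices made here, and `fractions.Fraction` is used so that the signs are not affected by rounding.

```python
from fractions import Fraction

def f(x1, x2):
    """Objective of Example 6.2: f = (2*x1^2 - x2) * (x1^2 - 2*x2)."""
    return (2 * x1**2 - x2) * (x1**2 - 2 * x2)

# Along straight lines x2 = alpha * x1, f is positive close to the origin
# (sampled slopes and radius are illustrative choices).
for alpha in (-2, -1, 0, Fraction(1, 2), 1, 3):
    for r in (Fraction(1, 100), Fraction(-1, 100)):
        assert f(r, alpha * r) > 0

# Along the points (1/k, 1/k^2), f is negative.
curve_values = [f(Fraction(1, k), Fraction(1, k**2)) for k in range(1, 6)]
print(curve_values)
```

The values on the curve equal $-1/k^4$, so they approach $f(0,0) = 0$ from below while every sampled straight line sees only positive values near the origin.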

3. Convex Optimization.

Exercise. When $f$ is convex, any local minimizer $\bar{x}$ is a global minimizer of $f$. If in addition $f$ is differentiable, then any stationary point $\bar{x}$ is a global minimizer of $f$. (Hint. Use a contradiction argument.)

Exercise 6.3.

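When $f$ is convex, any local minimizer $\bar{x}$ is a global minimizer of $f$.

Proof.

Suppose $\bar{x}$ is a local minimizer, but not a global minimizer. Then

$\exists\, \tilde{x}$ s.t. $f(\tilde{x}) < f(\bar{x})$. Since $f$ is convex, we have that

$f(\lambda \tilde{x} + (1 - \lambda) \bar{x}) \le \lambda f(\tilde{x}) + (1 - \lambda) f(\bar{x})$

$< \lambda f(\bar{x}) + (1 - \lambda) f(\bar{x}) = f(\bar{x})$ $\forall\, \lambda \in (0, 1]$.

Let $x_\lambda := \lambda \tilde{x} + (1 - \lambda) \bar{x}$. Then $x_\lambda \to \bar{x}$ as $\lambda \to 0$ and $f(x_\lambda) < f(\bar{x})$ for all $\lambda \in (0, 1]$. This is a contradiction to $\bar{x}$ being a local minimizer.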

4. Line Search.

The basic procedure to solve numerically an unconstrained problem (minimize $f(x)$ over $x \in \mathbb{R}^n$) is as follows.

(i) Choose an initial point $x^0 \in \mathbb{R}^n$ and an initial search direction $d^0 \in \mathbb{R}^n$ and set $k = 0$.

(ii) Choose a step size $\alpha_k$ and define a new point $x^{k+1} = x^k + \alpha_k d^k$. Check if the stopping criterion is satisfied ($\|\nabla f(x^{k+1})\| < \varepsilon$?). If yes, $x^{k+1}$ is accepted as the (approximate) optimal solution, stop. If no, go to (iii).

(iii) Choose a new search direction $d^{k+1}$ (descent direction) and set $k = k + 1$. Go to (ii).

The essential and most difficult part in any search algorithm is to choose a descent direction $d^k$ and a step size $\alpha_k$ with good convergence and stability properties.

5. Steepest Descent Method.

$f$ is differentiable.

Choose $d^k = -g^k$, where $g^k = \nabla f(x^k)$, and choose $\alpha_k$ s.t. $f(x^k + \alpha_k d^k) = \min_{\alpha \in \mathbb{R}} f(x^k + \alpha d^k)$.

Note that the successive descent directions are orthogonal to each other, i.e. $(g^k)^T g^{k+1} = 0$, and the convergence for some functions may be very slow, a behaviour called zigzagging.

Exercise.

Use the steepest descent (SD) method to solve (4) with the initial point $x^0 = (1, 1)$. (Answer. First three iterations give $x^1 = (0, 1)$, $x^2 = (0, \tfrac{3}{2})$, and $x^3 = (-\tfrac{1}{8}, \tfrac{3}{2})$.)

Steepest Descent.

Taylor gives:

$f(x^k + \alpha d^k) = f(x^k) + \alpha \nabla f(x^k)^T d^k + O(\alpha^2)$. As

$\nabla f(x^k)^T d^k = \|\nabla f(x^k)\|\, \|d^k\| \cos\theta_k$,

with $\theta_k$ the angle between $d^k$ and $\nabla f(x^k)$, we see that $d^k$ is a descent direction if $\cos\theta_k < 0$. The descent is steepest when $\theta_k = \pi \iff \cos\theta_k = -1$.

Zigzagging.

$\alpha_k$ is a minimizer of $\varphi(\alpha) := f(x^k + \alpha d^k)$ with $d^k = -g^k$. Hence

$0 = \varphi'(\alpha_k) = \nabla f(x^k + \alpha_k d^k)^T d^k = \nabla f(x^{k+1})^T (-g^k) = -(g^{k+1})^T g^k$. Hence $d^{k+1} \perp d^k$, which leads to zigzagging.

Exercise 6.5.

Use the SD method to solve (4) with the initial point $x^0 = (1, 1)$. [min: $\tfrac{1}{7}(-1, 11)$.]

Answer. $\nabla f = (4x_1 + x_2 - 1,\ 2x_2 + x_1 - 3)$.

Iteration 0: $d^0 = -\nabla f(x^0) = -(4, 0) \ne (0, 0)$. $\varphi(\alpha) = f(x^0 + \alpha d^0) = f(1 - 4\alpha, 1) = 2(1 - 4\alpha)^2 - 2$ has its minimum point at $\alpha_0 = \tfrac{1}{4}$ $\Rightarrow$ $x^1 = x^0 + \alpha_0 d^0 = (0, 1)$, $d^1 = -\nabla f(x^1) = -(0, -1) = (0, 1) \ne (0, 0)$.

Iteration 1: $x^2 = (0, \tfrac{3}{2})$, $d^2 = (-\tfrac{1}{2}, 0)$.

Iteration 2: $x^3 = (-\tfrac{1}{8}, \tfrac{3}{2})$, $d^3 = (0, \tfrac{1}{8})$.
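The iterations of this answer can be reproduced with a short script. The sketch below assumes that (4) is rewritten as $f(x) = \tfrac{1}{2} x^T H x - b^T x$ with $H = \begin{pmatrix} 4 & 1 \\ 1 & 2 \end{pmatrix}$ and $b = (1, 3)^T$; for such a quadratic the exact line search has the closed form $\alpha_k = (g^k)^T g^k / (g^k)^T H g^k$. The function names are illustrative, and exact fractions are used so the iterates can be compared directly with the values above.

```python
from fractions import Fraction

# Function (4) written as f(x) = 0.5 * x^T H x - b^T x (assumed rewrite).
H = [[Fraction(4), Fraction(1)], [Fraction(1), Fraction(2)]]
b = [Fraction(1), Fraction(3)]

def gradient(x):
    """Gradient of (4): H x - b."""
    return [H[i][0] * x[0] + H[i][1] * x[1] - b[i] for i in range(2)]

def steepest_descent(x, iterations):
    """Run SD with the exact step alpha = g^T g / (g^T H g)."""
    path = [x]
    for _ in range(iterations):
        g = gradient(x)
        gg = g[0] * g[0] + g[1] * g[1]
        if gg == 0:  # stationary point reached
            break
        Hg = [H[i][0] * g[0] + H[i][1] * g[1] for i in range(2)]
        alpha = gg / (g[0] * Hg[0] + g[1] * Hg[1])
        x = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
        path.append(x)
    return path

path = steepest_descent([Fraction(1), Fraction(1)], 3)
for k, point in enumerate(path):
    print(k, [str(c) for c in point])
```

The printed iterates alternate between moves in the $x_1$ and $x_2$ directions, which is the zigzagging pattern described above.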

6. Newton Method.

$f$ is twice differentiable.

Choose $d^k = -[H^k]^{-1} g^k$, where $H^k = \nabla^2 f(x^k)$. Set $x^{k+1} = x^k + d^k$.

If $H^k$ is positive definite then $d^k$ is a descent direction.

The main drawback of the Newton method is that it requires the computation of $\nabla^2 f(x^k)$ and its inverse, which can be difficult and time-consuming.

Exercise.

Use the Newton method to solve (4) with $x^0 = (1, 1)$. (Answer. First iteration gives $x^1 = \tfrac{1}{7}(-1, 11)$.)

Newton Method.

Taylor gives

$f(x^k + d) \approx f(x^k) + d^T \nabla f(x^k) + \tfrac{1}{2}\, d^T \nabla^2 f(x^k)\, d =: m(d)$

$\min_d m(d) \Rightarrow \nabla m(d) = 0 \Rightarrow \nabla f(x^k) + \nabla^2 f(x^k)\, d = 0$.

Hence choose $d^k = -[\nabla^2 f(x^k)]^{-1} \nabla f(x^k) = -[H^k]^{-1} g^k$. If $H^k$ is positive definite, then so is $(H^k)^{-1}$, and we get

$(d^k)^T g^k = -(g^k)^T (H^k)^{-1} g^k \le -\sigma_k \|g^k\|^2 < 0$ for some $\sigma_k > 0$.

Hence $d^k$ is a descent direction.

[Aside: The Newton method for $\min_x f(x)$ is equivalent to the Newton method for finding a root of the system of nonlinear equations $\nabla f(x) = 0$.]

Exercise 6.6.

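Use the Newton method to minimize

$f(x_1, x_2) = 2x_1^2 + x_1 x_2 + x_2^2 - x_1 - 3x_2$ with $x^0 = (1, 1)^T$.

Answer. $\nabla f = \begin{pmatrix} 4x_1 + x_2 - 1 \\ 2x_2 + x_1 - 3 \end{pmatrix}$, $H := \nabla^2 f = \begin{pmatrix} 4 & 1 \\ 1 & 2 \end{pmatrix}$. $H^{-1} = \dfrac{1}{\det H} \begin{pmatrix} 2 & -1 \\ -1 & 4 \end{pmatrix} = \tfrac{1}{7} \begin{pmatrix} 2 & -1 \\ -1 & 4 \end{pmatrix}$.

Iteration 0: $x^0 = (1, 1)^T$, $\nabla f(x^0) = (4, 0)^T$. $x^1 = x^0 - [H^0]^{-1} \nabla f(x^0) = \begin{pmatrix} 1 \\ 1 \end{pmatrix} - \tfrac{1}{7} \begin{pmatrix} 2 & -1 \\ -1 & 4 \end{pmatrix} \begin{pmatrix} 4 \\ 0 \end{pmatrix} = \tfrac{1}{7} \begin{pmatrix} -1 \\ 11 \end{pmatrix}$.

$\Rightarrow \nabla f(x^1) = (0, 0)^T$ and $H$ positive definite.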
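The single Newton step of Exercise 6.6 can also be verified in code. This is a minimal sketch for the $2 \times 2$ case only: it solves $H d = -g$ by Cramer's rule rather than forming a general inverse, and the helper names `gradient`, `hessian` and `newton_step` are illustrative, not taken from the notes.

```python
from fractions import Fraction

def gradient(x):
    """Gradient of f(x1, x2) = 2*x1^2 + x1*x2 + x2^2 - x1 - 3*x2."""
    return [4 * x[0] + x[1] - 1, x[0] + 2 * x[1] - 3]

def hessian(x):
    """Hessian of f; constant because f is quadratic."""
    return [[Fraction(4), Fraction(1)], [Fraction(1), Fraction(2)]]

def newton_step(x):
    """Return x + d, where d solves H d = -g (2x2 case, Cramer's rule)."""
    g = gradient(x)
    (a, b), (c, e) = hessian(x)
    det = a * e - b * c
    d = [(-g[0] * e + g[1] * b) / det, (g[0] * c - g[1] * a) / det]
    return [x[0] + d[0], x[1] + d[1]]

x0 = [Fraction(1), Fraction(1)]
x1 = newton_step(x0)
print([str(c) for c in x1], [str(c) for c in gradient(x1)])
```

Because $f$ is quadratic, the model $m(d)$ coincides with $f$, which is why one step already lands on the stationary point.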

7. Choice of Stepsize.

In computing the step size $\alpha_k$ we face a tradeoff. We would like to choose $\alpha_k$ to give a substantial reduction of $f$, but at the same time we do not want to spend too much time making the choice. The ideal choice would be the global minimizer of the univariate function $\varphi : \mathbb{R} \to \mathbb{R}$ defined by

$\varphi(\alpha) = f(x^k + \alpha d^k)$, $\alpha > 0$, but in general, it is too expensive to identify this value.

A common strategy is to perform an inexact line search to identify a step size that achieves adequate reductions in $f$ with minimum cost.

$\alpha_k$ is normally chosen to satisfy the Wolfe conditions:

$f(x^k + \alpha_k d^k) \le f(x^k) + c_1 \alpha_k (g^k)^T d^k$ (5)

$\nabla f(x^k + \alpha_k d^k)^T d^k \ge c_2 (g^k)^T d^k$, (6)

with $0 < c_1 < c_2 < 1$. (5) is called the sufficient decrease condition, and (6) is the curvature condition.

Choice of Stepsize.

The simple condition

$f(x^k + \alpha_k d^k) < f(x^k)$ ($\dagger$)

is not appropriate, as it may not lead to a sufficient reduction.

Example: $f(x) = (x - 1)^2 - 1$. So $\min f(x) = -1$, but we can choose $x^k$ satisfying ($\dagger$) such that $f(x^k) = \tfrac{1}{k} \to 0 > -1$.

Note that the sufficient decrease condition (5)

$\varphi(\alpha) = f(x^k + \alpha d^k) \le \ell(\alpha) := f(x^k) + c_1 \alpha (g^k)^T d^k$

yields acceptable regions for $\alpha$. Here $\varphi(\alpha) < \ell(\alpha)$ for small $\alpha > 0$, as $(g^k)^T d^k < 0$ for descent directions.

The curvature condition (6) is equivalent to

$\varphi'(\alpha) \ge c_2 \varphi'(0)$ [ $> \varphi'(0)$ ]
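To see how (5) and (6) cut out an acceptable region, one can test a few step sizes for function (4) at $x^0 = (1, 1)$ with $d^0 = -g^0$. The sketch below is illustrative only: the constants $c_1 = 10^{-4}$ and $c_2 = 0.9$ are commonly used values assumed here (the notes only require $0 < c_1 < c_2 < 1$), and the helper name `wolfe_conditions` is hypothetical.

```python
def f(x):
    """Function (4) from the text."""
    x1, x2 = x
    return 2 * x1**2 + x1 * x2 + x2**2 - x1 - 3 * x2

def grad_f(x):
    """Gradient of function (4)."""
    x1, x2 = x
    return [4 * x1 + x2 - 1, x1 + 2 * x2 - 3]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def wolfe_conditions(x, d, alpha, c1=1e-4, c2=0.9):
    """Return (sufficient_decrease, curvature) for the step size alpha."""
    slope0 = dot(grad_f(x), d)  # (g^k)^T d^k, negative for a descent direction
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * slope0  # condition (5)
    curvature = dot(grad_f(x_new), d) >= c2 * slope0              # condition (6)
    return sufficient_decrease, curvature

x0 = [1.0, 1.0]
d0 = [-g for g in grad_f(x0)]
for alpha in (0.001, 0.25, 0.5):
    print(alpha, wolfe_conditions(x0, d0, alpha))
```

The very short step passes (5) but fails (6), the overly long step fails (5), and the exact minimizer $\alpha = \tfrac{1}{4}$ satisfies both, which matches the roles of the two conditions described above.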

8. Convergence of Line Search Methods.

An algorithm is said to be globally convergent if $\lim_{k \to \infty} \|g^k\| = 0$.

It can be shown that if the step sizes satisfy the Wolfe conditions

• then the steepest descent method is globally convergent,

• so is the Newton method provided the Hessian matrices $\nabla^2 f(x^k)$ have a bounded condition number and are positive definite.

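Exercise. Show that the steepest descent method is globally convergent if the following conditions hold:

(a) $\alpha_k$ satisfies the Wolfe conditions,

(b) $f(x) \ge M$ $\forall\, x \in \mathbb{R}^n$,

(c) $f \in C^1$ and $\nabla f$ is Lipschitz continuous (condition 3 of Exercise 6.8).

[Hint: Show that $\sum_{k=0}^{\infty} \|g^k\|^2 < \infty$.]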

Exercise 6.8.

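Assume that $d^k$ is a descent direction, i.e. $(g^k)^T d^k < 0$, where $g^k := \nabla f(x^k)$. Then if

1. $\alpha_k$ satisfies the Wolfe conditions,

2. $f(x) \ge M$ $\forall\, x \in \mathbb{R}^n$,

3. $f \in C^1$ and $\nabla f$ is Lipschitz, i.e. $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ $\forall\, x, y \in \mathbb{R}^n$,

it holds that

$\sum_{k=0}^{\infty} \cos^2\theta_k\, \|g^k\|^2 < \infty$, where $\cos\theta_k := \dfrac{(g^k)^T d^k}{\|g^k\|\, \|d^k\|}$.

[Note: SD method is the special case with $\cos^2\theta_k = 1$. $\Rightarrow \lim_{k \to \infty} \|g^k\| = 0$.]

Proof. Wolfe condition (6) $\Rightarrow (g^{k+1})^T d^k \ge c_2 (g^k)^T d^k \Rightarrow$

$(g^{k+1} - g^k)^T d^k \ge (c_2 - 1)\, (g^k)^T d^k$. ($\dagger$)

The Lipschitz condition yields that

$(g^{k+1} - g^k)^T d^k \le \|g^{k+1} - g^k\|\, \|d^k\| = \|\nabla f(x^{k+1}) - \nabla f(x^k)\|\, \|d^k\|$

$\le L \|x^{k+1} - x^k\|\, \|d^k\| = \alpha_k L \|d^k\|^2$. ($\ddagger$)

Combining ($\dagger$) and ($\ddagger$) gives

$\alpha_k \ge \dfrac{c_2 - 1}{L}\, \dfrac{(g^k)^T d^k}{\|d^k\|^2}$, and hence (multiplying by $(g^k)^T d^k < 0$) $\alpha_k (g^k)^T d^k \le \dfrac{c_2 - 1}{L}\, \dfrac{[(g^k)^T d^k]^2}{\|d^k\|^2}$.

Together with Wolfe condition (5) we get

$f(x^{k+1}) \le f(x^k) + c_1 \dfrac{c_2 - 1}{L}\, \dfrac{[(g^k)^T d^k]^2}{\|d^k\|^2} = f(x^k) - c \cos^2\theta_k\, \|g^k\|^2$,

where $c := c_1 \dfrac{1 - c_2}{L} > 0$. Applying this bound repeatedly,

$f(x^{k+1}) \le f(x^k) - c \cos^2\theta_k\, \|g^k\|^2 \le f(x^0) - c \sum_{j=0}^{k} \cos^2\theta_j\, \|g^j\|^2$

$\Rightarrow \sum_{j=0}^{k} \cos^2\theta_j\, \|g^j\|^2 \le \dfrac{1}{c}\, (f(x^0) - M)$ $\forall\, k$ $\Rightarrow \sum_{j=0}^{\infty} \cos^2\theta_j\, \|g^j\|^2 < \infty$.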

9. Popular Search Methods.

In practice the steepest descent method and the Newton method are rarely used due to the slow convergence rate and the difficulty in computing Hessian matrices, respectively.

The popular search methods are

• the conjugate gradient method (variation of SD method with superlinear convergence) and

• the quasi-Newton method (variation of Newton method without computation of Hessian matrices).

There are some efficient algorithms based on the trust-region approach. See Fletcher (2000) for details.

10. Constrained Optimization.

Minimize $f(x)$ over $x \in \mathbb{R}^n$ subject to the equality constraints

$h_i(x) = 0$, $i = 1, \dots, l$, and the inequality constraints

$g_j(x) \le 0$, $j = 1, \dots, m$. Assume that all functions involved are differentiable.

11. Linear Programming.

The problem is to minimize

$z = c_1 x_1 + \dots + c_n x_n$ subject to

$a_{i1} x_1 + \dots + a_{in} x_n \ge b_i$, $i = 1, \dots, m$, and

$x_1, \dots, x_n \ge 0$.

LPs can be easily and efficiently solved with the simplex algorithm or the interior point method.

MS-Excel has a good in-built LP solver capable of solving problems up to 200 variables. MATLAB with optimization toolbox also provides a good LP solver.

12. Graphic Method.

If an LP problem has only two decision variables $(x_1, x_2)$, then it can be solved by the graphic method as follows:

• First draw the feasible region from the given constraints and a contour line of the objective function,

• then, on establishing the increasing direction perpendicular to the contour line, find the optimal point on the boundary of the feasible region,

• then find two linear equations which define that point,

• and finally solve the two equations to obtain the optimal point.

Exercise. Use the graphic method to solve the LP: minimize $z = -3x_1 - 2x_2$ subject to $x_1 + x_2 \le 80$, $2x_1 + x_2 \le 100$, $x_1 \le 40$, and $x_1, x_2 \ge 0$. (Answer. $x_1 = 20$, $x_2 = 60$.)
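The graphic method finds the optimum at a corner of the feasible region, so for a small two-variable LP the same answer can be obtained by listing all corners. The sketch below is one way to do this and is not the method of the notes: it intersects every pair of constraint boundary lines, keeps the feasible intersection points, and picks the one with the smallest objective value. It assumes the feasible region is bounded and non-empty, as in this exercise; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Constraints written as a1*x1 + a2*x2 <= rhs, including x1 >= 0 and x2 >= 0.
constraints = [
    (1, 1, 80),    # x1 + x2 <= 80
    (2, 1, 100),   # 2*x1 + x2 <= 100
    (1, 0, 40),    # x1 <= 40
    (-1, 0, 0),    # -x1 <= 0
    (0, -1, 0),    # -x2 <= 0
]
cost = (-3, -2)    # minimize z = -3*x1 - 2*x2

def intersection(c1, c2):
    """Intersect two boundary lines; return None if they are parallel."""
    (a1, b1, r1), (a2, b2, r2) = c1, c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return (Fraction(r1 * b2 - r2 * b1, det), Fraction(a1 * r2 - a2 * r1, det))

def feasible(p):
    return all(a * p[0] + b * p[1] <= r for a, b, r in constraints)

vertices = set()
for pair in combinations(constraints, 2):
    p = intersection(*pair)
    if p is not None and feasible(p):
        vertices.add(p)

best = min(vertices, key=lambda p: cost[0] * p[0] + cost[1] * p[1])
z_best = cost[0] * best[0] + cost[1] * best[1]
print(best, z_best)
```

Enumerating corners grows quickly with the number of constraints, which is why the simplex and interior point methods mentioned above are used for larger problems.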

13. Quadratic Programming.

Minimize

$x^T Q x + c^T x$ subject to

$A x \le b$ and $x \ge 0$,

where $Q$ is an $n \times n$ symmetric positive definite matrix, $A$ is an $m \times n$ matrix, $x, c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$.

To solve a QP problem, one

• first derives a set of equations from the Kuhn–Tucker conditions, and

• then applies the Wolfe algorithm or the Lemke algorithm to find the optimal solution.

The MS-Excel solver is capable of solving reasonably sized QP problems, similarly for MATLAB.

14. Kuhn–Tucker Conditions.

min $f(x)$ over $x \in \mathbb{R}^n$ s.t. $h_i(x) = 0$, $i = 1, \dots, l$; $g_j(x) \le 0$, $j = 1, \dots, m$. Assume that $\bar{x}$ is an optimal solution.

Under some regularity conditions, called the constraint qualifications, there exist two vectors $\bar{u} = (\bar{u}_1, \dots, \bar{u}_l)$ and $\bar{v} = (\bar{v}_1, \dots, \bar{v}_m)$, called the Lagrange multipliers, such that the following set of conditions is satisfied:

$L_{x_k}(\bar{x}, \bar{u}, \bar{v}) = 0$, $k = 1, \dots, n$

$h_i(\bar{x}) = 0$, $i = 1, \dots, l$

$g_j(\bar{x}) \le 0$, $j = 1, \dots, m$

$\bar{v}_j\, g_j(\bar{x}) = 0$, $\bar{v}_j \ge 0$, $j = 1, \dots, m$

where $L(x, u, v) = f(x) + \sum_{i=1}^{l} u_i h_i(x) + \sum_{j=1}^{m} v_j g_j(x)$ is called the Lagrange function or Lagrangian.

Furthermore, if $f : \mathbb{R}^n \to \mathbb{R}$ and $h_i, g_j : \mathbb{R}^n \to \mathbb{R}$ are convex, then $\bar{x}$ is an optimal solution if and only if $(\bar{x}, \bar{u}, \bar{v})$ satisfies the Kuhn–Tucker conditions.

This holds in particular when $f$ is convex and $h_i$, $g_j$ are linear.

Example.

Find the minimum solution to the function $x_2 - x_1$ subject to $x_1^2 + x_2^2 \le 1$.

Exercise.

Find the minimum solution to the function $x_1^2 + x_2^2 - 2x_1 - 4x_2$ subject to $x_1 + 2x_2 \le 2$ and $x_2 \ge 0$.

Interpretation of Kuhn–Tucker conditions

Assume that no equality constraints are present.

If $\bar{x}$ is an interior point, i.e. no constraints are active, then we recover the usual optimality condition: $\nabla f(\bar{x}) = 0$.

Now assume that $\bar{x}$ lies on the boundary of the feasible set and let $g_{j_k}$ be the active constraints at $\bar{x}$. Then a necessary condition for optimality is that we cannot find a descent direction for $f$ at $\bar{x}$ that is also a feasible direction. Such a vector cannot exist, if

$-\nabla f(\bar{x}) = \sum_k \bar{v}_{j_k} \nabla g_{j_k}(\bar{x})$ with $\bar{v}_{j_k} \ge 0$. ($\dagger$)

This is because, if $d \in \mathbb{R}^n$ is a descent direction, then $\nabla f(\bar{x})^T d < 0$ and so, by ($\dagger$), $\sum_k \bar{v}_{j_k} \nabla g_{j_k}(\bar{x})^T d > 0$.

So there must exist a $j_k$ such that $\nabla g_{j_k}(\bar{x})^T d > 0$. But that means that $d$ is an ascent direction for $g_{j_k}$, and as $g_{j_k}$ is active at $\bar{x}$, it is not a feasible direction.

Application of Kuhn–Tucker: LP Duality

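Let $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$ and $A \in \mathbb{R}^{m \times n}$.

min $c^T x$ s.t. $A x \ge b$, $x \ge 0$. (P)

Equivalent to min $c^T x$ s.t. $b - A x \le 0$, $-x \le 0$.

Lagrangian: $L = c^T x + v^T (b - A x) + y^T (-x)$.

Hence, $\bar{x}$ is the solution, if there exist $\bar{v}$ and $\bar{y}$ such that

$\nabla L = c - A^T \bar{v} - \bar{y} = 0 \Rightarrow \bar{y} = c - A^T \bar{v}$,

KT conditions: $\bar{v}^T (b - A\bar{x}) = 0$, $\bar{y}^T (-\bar{x}) = 0$, $\bar{v} \ge 0$, $\bar{y} \ge 0$.

Eliminate $\bar{y}$ to find $\bar{v}$, $\bar{x}$:

$A\bar{x} \ge b$, $\bar{x} \ge 0$ (feasible region: primal)

$A^T \bar{v} \le c$, $\bar{v} \ge 0$ (feasible region: dual)

$\bar{v}^T (b - A\bar{x}) = 0$, $\bar{x}^T (c - A^T \bar{v}) = 0$ $\Rightarrow$ $\bar{x}^T c = \bar{x}^T A^T \bar{v} = \bar{v}^T b$.

Hence $\bar{v} \in \mathbb{R}^m$ solves the dual:

max $b^T v$ s.t. $A^T v \le c$, $v \ge 0$.

Here we have used that

$c^T \bar{x} = \min_{x \ge 0,\ A x \ge b} c^T x \ge \min_{x \ge 0} \max_{v \ge 0} \left[ c^T x + v^T (b - A x) \right] \ge \max_{v \ge 0} \min_{x \ge 0} \left[ c^T x + v^T (b - A x) \right]$

$= \max_{v \ge 0} \min_{x \ge 0} \left[ v^T b + x^T (c - A^T v) \right] \ge \max_{v \ge 0,\ A^T v \le c} v^T b \ge \bar{v}^T b = c^T \bar{x}$,

so equality holds throughout and $\bar{v}$ attains the dual maximum.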
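As a concrete instance, the LP from the graphic-method exercise can be put into the form (P) by multiplying its three $\le$ constraints by $-1$. The sketch below checks primal feasibility, dual feasibility, both complementarity conditions and the equality $c^T \bar{x} = b^T \bar{v}$. The primal point $\bar{x} = (20, 60)$ is the answer stated in that exercise; the dual candidate $\bar{v} = (1, 1, 0)$ was worked out for this sketch and does not appear in the notes.

```python
# Exercise LP rewritten in the form (P): min c^T x s.t. A x >= b, x >= 0.
A = [[-1, -1], [-2, -1], [-1, 0]]
b = [-80, -100, -40]
c = [-3, -2]

x_bar = [20, 60]    # primal solution stated in the exercise
v_bar = [1, 1, 0]   # dual candidate (computed for this sketch, not from the text)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

Ax = [dot(row, x_bar) for row in A]
ATv = [dot([A[i][j] for i in range(3)], v_bar) for j in range(2)]

primal_feasible = all(lhs >= rhs for lhs, rhs in zip(Ax, b)) and all(x >= 0 for x in x_bar)
dual_feasible = all(lhs <= rhs for lhs, rhs in zip(ATv, c)) and all(v >= 0 for v in v_bar)
slack_primal = dot(v_bar, [bi - axi for bi, axi in zip(b, Ax)])  # v^T (b - A x)
slack_dual = dot(x_bar, [cj - aj for cj, aj in zip(c, ATv)])     # x^T (c - A^T v)

print(primal_feasible, dual_feasible, slack_primal, slack_dual)
print(dot(c, x_bar), dot(b, v_bar))
```

The third multiplier is zero because the constraint $x_1 \le 40$ is inactive at $\bar{x}$, in line with the complementarity condition $\bar{v}^T (b - A\bar{x}) = 0$.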

Example 6.14.

Find the minimum solution to the function $x_2 - x_1$ subject to $x_1^2 + x_2^2 \le 1$.

Answer.

$L = x_2 - x_1 + v\,(x_1^2 + x_2^2 - 1)$, so the KT conditions become

$L_{x_1} = -1 + 2v x_1 = 0$ (1)

$L_{x_2} = 1 + 2v x_2 = 0$ (2)

$x_1^2 + x_2^2 \le 1$ (3)

$v\,(x_1^2 + x_2^2 - 1) = 0$, $v \ge 0$ (4)

(1) $\Rightarrow v > 0$ and hence $x_1 = \tfrac{1}{2v}$, $x_2 = -\tfrac{1}{2v}$.

Plugging this into (4) yields $\tfrac{2}{4v^2} = 1$ and hence $\bar{v} = \tfrac{1}{\sqrt{2}}$ $\Rightarrow$ $\bar{x}_1 = \tfrac{1}{\sqrt{2}}$, $\bar{x}_2 = -\tfrac{1}{\sqrt{2}}$; with the optimal value being $\bar{z} = -\sqrt{2}$.

Example.

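min $x_1$ s.t. $x_2 - x_1^3 \le 0$, $x_1 \le 1$, $x_2 \ge 0$.

Since $x_1^3 \ge x_2 \ge 0$ we have $x_1 \ge 0$ and hence $\bar{x} = (0, 0)$ is the unique minimizer. Lagrangian: $L = x_1 + v_1 (x_2 - x_1^3) + v_2 (x_1 - 1) + v_3 (-x_2)$.

KT conditions for a feasible point $x$:

$\nabla L = \begin{pmatrix} 1 - 3 v_1 x_1^2 + v_2 \\ v_1 - v_3 \end{pmatrix} = 0$ (1)

$v_1 (x_2 - x_1^3) = 0$, $v_2 (x_1 - 1) = 0$, $v_3 (-x_2) = 0$ (2)

$v_1, v_2, v_3 \ge 0$ (3)

Check KT conditions at $\bar{x} = (0, 0)$: (1) $\Rightarrow v_2 = -1 < 0$, impossible!

The KT conditions are not satisfied, since the constraint qualifications do not hold.

Here $g_1 = x_2 - x_1^3$ and $g_3 = -x_2$ are active at $(0, 0)$, and $\nabla g_1 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, $\nabla g_3 = -\begin{pmatrix} 0 \\ 1 \end{pmatrix}$. Hence the gradients of the active constraints are linearly dependent at $(0, 0)$.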

Exercise 6.14

Find the minimum solution to the function $x_1^2 + x_2^2 - 2x_1 - 4x_2$ subject to $x_1 + 2x_2 \le 2$ and $x_2 \ge 0$.

Answer. Lagrangian: $L = x_1^2 + x_2^2 - 2x_1 - 4x_2 + v_1 (x_1 + 2x_2 - 2) + v_2 (-x_2)$

KT conditions:

$\nabla L = \begin{pmatrix} 2x_1 - 2 + v_1 \\ 2x_2 - 4 + 2v_1 - v_2 \end{pmatrix} = 0$ (1)

$v_1 (x_1 + 2x_2 - 2) = 0$, $v_2 x_2 = 0$, $v_1, v_2 \ge 0$ (2)

$x_1 + 2x_2 \le 2$, $x_2 \ge 0$ (3)

If $x_1 + 2x_2 - 2 < 0$ then $v_1 = 0$ $\Rightarrow$ $x_1 = 1$, $x_2 = 2 + \tfrac{1}{2} v_2 \ge 2$. Hence from (2), $v_2 = 0$, and so $x_1 = 1$, $x_2 = 2$. But that contradicts (3), and so it must hold that $x_1 + 2x_2 - 2 = 0$.
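With the first constraint active, a candidate point can be computed from (1) and checked against (1) to (3). The sketch below is a continuation worked out here, not part of the notes: it assumes $v_2 = 0$ (i.e. $x_2 > 0$), which together with $x_1 + 2x_2 = 2$ and (1) gives $v_1 = \tfrac{6}{5}$, and then verifies every KT condition with exact arithmetic.

```python
from fractions import Fraction

# Candidate obtained by solving (1) together with x1 + 2*x2 = 2 and v2 = 0
# (derived for this sketch; not stated in the notes).
v1 = Fraction(6, 5)
v2 = Fraction(0)
x1 = 1 - v1 / 2          # from 2*x1 - 2 + v1 = 0
x2 = 2 - v1 + v2 / 2     # from 2*x2 - 4 + 2*v1 - v2 = 0

stationarity = (2 * x1 - 2 + v1 == 0) and (2 * x2 - 4 + 2 * v1 - v2 == 0)
complementarity = (v1 * (x1 + 2 * x2 - 2) == 0) and (v2 * x2 == 0)
multipliers_ok = v1 >= 0 and v2 >= 0
feasibility = (x1 + 2 * x2 <= 2) and (x2 >= 0)

print(x1, x2, stationarity, complementarity, multipliers_ok, feasibility)
```

Since the objective is convex and both constraints are linear, a point that passes all four checks is a minimizer by the sufficiency statement in the Kuhn–Tucker section.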

In document Numerical Methods for Finance - Free Computer, Programming, Mathematics, Technical Books, Lecture Notes and Tutorials (Page 87-123)