Numerical Methods for Finance

Dr Robert Nürnberg

This course introduces the major numerical methods needed for quantitative work in finance. To this end, the course strikes a balance between a general survey of significant numerical methods that anyone working in a quantitative field should know, and a detailed study of some numerical methods specific to financial mathematics. In the first part the course will cover e.g.

linear and nonlinear equations, interpolation and optimization,

while the second part introduces e.g.

binomial and trinomial methods, finite difference methods, Monte-Carlo simulation, random number generators, option pricing and hedging.

1. References

1. Burden and Faires (2004), Numerical Analysis.
2. Clewlow and Strickland (1998), Implementing Derivative Models.
3. Fletcher (2000), Practical Methods of Optimization.
4. Glasserman (2004), Monte Carlo Methods in Financial Engineering.
5. Higham (2004), An Introduction to Financial Option Valuation.
6. Hull (2005), Options, Futures, and Other Derivatives.
7. Kwok (1998), Mathematical Models of Financial Derivatives.
8. Press et al. (1992), Numerical Recipes in C. (online)
9. Press et al. (2002), Numerical Recipes in C++.
10. Seydel (2006), Tools for Computational Finance.

2. Preliminaries

1. Algorithms.

An algorithm is a set of instructions to construct an approximate solution to a mathematical problem.

A basic requirement for an algorithm is that the error can be made as small as we like. Usually, the higher the accuracy we demand, the greater the amount of computation required.

An algorithm is convergent if it produces a sequence of values which converge to the desired solution of the problem.

Example

Find x = √c, c > 1 constant.

Answer

x = √c ⟺ x² = c ⟺ f(x) := x² − c = 0
⇒ f(1) = 1 − c < 0 and f(c) = c² − c > 0
⇒ ∃ x̄ ∈ (1, c) s.t. f(x̄) = 0.
f′(x) = 2x > 0 ⇒ f monotonically increasing ⇒ x̄ is unique.

Denote I_n := [a_n, b_n] with I_0 = [a_0, b_0] = [1, c]. Let x_n := (a_n + b_n)/2.

(i) If f(x_n) = 0 then x̄ = x_n.

Length of I_n:  m(I_n) = ½ m(I_{n−1}) = ⋯ = 2^{−n} m(I_0) = (c − 1)/2^n.

Algorithm

The algorithm stops if m(I_n) < ε, and we let x* := x_n.

Error as small as we like?
x̄, x* ∈ I_n ⇒ error |x* − x̄| = |x_n − x̄| ≤ m(I_n) → 0 as n → ∞. ✓

Convergence?
I_0 ⊃ I_1 ⊃ ⋯ ⊃ I_n ⊃ ⋯ ⇒ ∃! x̄ ∈ ⋂_{n=0}^∞ I_n with f(x̄) = 0, i.e. x̄ = √c. ✓

Implementation:
There is no need to store the intervals I_n = [a_n, b_n]; it is sufficient to store only 3 points throughout. Suppose x̄ ∈ (a, b) and define x := (a + b)/2.
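The C++ exercises later in these notes suggest implementing such schemes; a minimal sketch of the bisection idea above, under the stated assumption c > 1 (function name and tolerance are illustrative, not part of the notes):

```cpp
#include <cmath>
#include <iostream>

// Bisection sketch for f(x) = x*x - c on [1, c], c > 1.
// Stores only three points (a, b, x), as suggested above.
double bisect_sqrt(double c, double eps = 1e-10) {
    double a = 1.0, b = c;
    double x = 0.5 * (a + b);
    while (b - a >= eps) {
        double fx = x * x - c;
        if (fx == 0.0) break;          // exact root found
        if (fx < 0.0) a = x;           // root lies in (x, b), since f(a) < 0 < f(b)
        else          b = x;           // root lies in (a, x)
        x = 0.5 * (a + b);
    }
    return x;
}

int main() {
    std::cout << bisect_sqrt(7.0) << "\n";   // approx 2.6457513
}
```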

2. Errors.

There are various errors in computed solutions, such as

• discretization error (discrete approximation to continuous systems),

• truncation error (termination of an infinite process), and

• rounding error (finite digit limitation in computer arithmetic).

If a is a number and ã is an approximation to a, then the absolute error is |a − ã| and the relative error is |a − ã| / |a|, provided a ≠ 0.

Example

discretization error:
x′ = f(x) [differential equation]  →  (x(t + h) − x(t))/h = f(x(t)) [difference equation],
DE = (x(t + h) − x(t))/h − x′(t).

truncation error:
lim_{n→∞} x_n = x; approximate x with x_N, N a large number.  TE = |x − x_N|.

rounding error:
We cannot express x exactly, due to the finite digit limitation; we get x̂ instead.  RE = |x − x̂|.

3. Well/Ill Conditioned Problems.

A problem is well-conditioned (or ill-conditioned) if every small perturbation of the data results in a small (or large) change in the solution.

Example: Show that the solution to the equations x + y = 2 and x + 1.01 y = 2.01 is ill-conditioned.

Exercise: Show that the following problems are ill-conditioned:

(a) the solution to the differential equation x″ − 10x′ − 11x = 0 with initial conditions x(0) = 1 and x′(0) = −1,

Example

x + y = 2,  x + 1.01 y = 2.01   ⇒   x = 1, y = 1.

Change 2.01 to 2.02:

x + y = 2,  x + 1.01 y = 2.02   ⇒   x = 0, y = 2.

I.e. a 0.5% change in the data produces a 100% change in the solution: ill-conditioned!

[reason: det (1 1; 1 1.01) = 0.01, i.e. the matrix is nearly singular]

4. Taylor Polynomials.

Suppose f, f′, …, f^(n) are continuous on [a, b] and f^(n+1) exists on (a, b). Let x_0 ∈ [a, b]. Then for every x ∈ [a, b] there exists a ξ between x_0 and x with

f(x) = Σ_{k=0}^{n} f^(k)(x_0)/k! (x − x_0)^k + R_n(x),

where R_n(x) = f^(n+1)(ξ)/(n + 1)! (x − x_0)^{n+1} is the remainder.

[Equivalently: R_n(x) = ∫_{x_0}^{x} f^(n+1)(t)/n! (x − t)^n dt.]

Examples:

• exp(x) = Σ_{k=0}^{∞} x^k/k!

• sin(x) = Σ_{k=0}^{∞} (−1)^k/(2k + 1)! x^{2k+1}

5. Gradient and Hessian Matrix.

Assume f : Rⁿ → R.

The gradient of f at a point x, written ∇f(x), is the column vector in Rⁿ with i-th component ∂f/∂x_i(x).

The Hessian matrix of f at x, written ∇²f(x), is the n × n matrix with (i, j)-th component ∂²f/∂x_i∂x_j(x). [As ∂²f/∂x_i∂x_j = ∂²f/∂x_j∂x_i, ∇²f(x) is symmetric.]

Examples:

• f(x) = aᵀx, a ∈ Rⁿ  ⇒  ∇f = a, ∇²f = 0

• f(x) = ½ xᵀA x, A symmetric  ⇒  ∇f(x) = A x, ∇²f = A

• f(x) = exp(½ xᵀA x), A symmetric  ⇒  ∇f(x) = exp(½ xᵀA x) A x,  ∇²f(x) = exp(½ xᵀA x) (A + A x xᵀA)

6. Taylor's Theorem.

Suppose that f : Rⁿ → R is continuously differentiable and that p ∈ Rⁿ. Then we have

f(x + p) = f(x) + ∇f(x + t p)ᵀ p   for some t ∈ (0, 1).

Moreover, if f is twice continuously differentiable, we have

∇f(x + p) = ∇f(x) + ∫_0^1 ∇²f(x + t p) p dt

and

f(x + p) = f(x) + ∇f(x)ᵀ p + ½ pᵀ ∇²f(x + t p) p   for some t ∈ (0, 1).

7. Positive Definite Matrices.

An n × n matrix A = (a_ij) is positive definite if it is symmetric (i.e. Aᵀ = A) and xᵀA x > 0 for all x ∈ Rⁿ \ {0}. [I.e. xᵀA x ≥ 0 with "=" only if x = 0.]

The following statements are equivalent:

(a) A is a positive definite matrix,
(b) all eigenvalues of A are positive,
(c) all leading principal minors of A are positive.

The leading principal minors of A are the determinants Δ_k, k = 1, 2, …, n, defined by

Δ_1 = det[a_11],  Δ_2 = det(a_11 a_12; a_21 a_22),  …,  Δ_n = det A.

A matrix A is symmetric and positive semi-definite if Aᵀ = A and xᵀA x ≥ 0 for all x ∈ Rⁿ.

Exercise.

8. Convex Sets and Functions.

A set S ⊂ Rⁿ is a convex set if the straight line segment connecting any two points in S lies entirely inside S, i.e., for any two points x, y ∈ S we have

α x + (1 − α) y ∈ S   ∀ α ∈ [0, 1].

A function f : D → R is a convex function if its domain D ⊂ Rⁿ is a convex set and if for any two points x, y ∈ D we have

f(α x + (1 − α) y) ≤ α f(x) + (1 − α) f(y)   ∀ α ∈ [0, 1].

Exercise.

Let D ⊂ Rⁿ be a convex, open set.

(a) If f : D → R is continuously differentiable, then f is convex if and only if f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) ∀ x, y ∈ D.

(b) If f : D → R is twice continuously differentiable, then f is convex if and only if ∇²f(x) is positive semi-definite for all x ∈ D.

Exercise 2.8.

(a) "⇒":

As f is convex we have, for any x, y in the convex set D, that

f(α y + (1 − α) x) ≤ α f(y) + (1 − α) f(x)   ∀ α ∈ [0, 1].

Hence f(y) ≥ [f(x + α(y − x)) − f(x)]/α + f(x). Letting α → 0 yields f(y) ≥ f(x) + ∇f(x)ᵀ(y − x).

"⇐":

For any x_1, x_2 ∈ D and λ ∈ [0, 1] let x := λ x_1 + (1 − λ) x_2 ∈ D and y := x_1. On noting that y − x = x_1 − λ x_1 − (1 − λ) x_2 = (1 − λ)(x_1 − x_2) we have that

f(x_1) = f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) = f(x) + (1 − λ) ∇f(x)ᵀ(x_1 − x_2).   (∗)

Similarly, letting x := λ x_1 + (1 − λ) x_2 and y := x_2 gives, on noting that y − x = λ(x_2 − x_1), that

f(x_2) = f(y) ≥ f(x) + λ ∇f(x)ᵀ(x_2 − x_1).   (∗∗)

Combining λ (∗) + (1 − λ) (∗∗) gives

λ f(x_1) + (1 − λ) f(x_2) ≥ f(x) = f(λ x_1 + (1 − λ) x_2)  ⇒  f is convex.

(b) "⇐":

For any x, x_0 ∈ D use Taylor's theorem at x_0:

f(x) = f(x_0) + ∇f(x_0)ᵀ(x − x_0) + ½ (x − x_0)ᵀ ∇²f(x_0 + θ(x − x_0)) (x − x_0),   θ ∈ (0, 1).

As ∇²f is positive semi-definite, this immediately gives

f(x) ≥ f(x_0) + ∇f(x_0)ᵀ(x − x_0)  ⇒  f is convex.

"⇒":

Assume ∇²f is not positive semi-definite in the domain D. Then there exist x_0 ∈ D and x̂ ∈ Rⁿ s.t. x̂ᵀ ∇²f(x_0) x̂ < 0. As D is open we can find x_1 := x_0 + α x̂ ∈ D, for α > 0 sufficiently small.

9. Vector Norms.

A vector norm on Rⁿ is a function ‖·‖ from Rⁿ into R with the following properties:

(i) ‖x‖ ≥ 0 for all x ∈ Rⁿ, and ‖x‖ = 0 if and only if x = 0.
(ii) ‖α x‖ = |α| ‖x‖ for all α ∈ R and x ∈ Rⁿ.
(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ Rⁿ.

Common vector norms are the l_1, l_2 (Euclidean), and l_∞ norms:

‖x‖_1 = Σ_{i=1}^n |x_i|,   ‖x‖_2 = (Σ_{i=1}^n x_i²)^{1/2},   ‖x‖_∞ = max_{1≤i≤n} |x_i|.

Exercise.

(a) Prove that ‖·‖_1, ‖·‖_2 and ‖·‖_∞ are norms.

(b) Given a symmetric positive definite matrix A, prove that ‖x‖_A := (xᵀA x)^{1/2} is a norm.

Example.

Draw the regions defined by ‖x‖_1 ≤ 1, ‖x‖_2 ≤ 1, ‖x‖_∞ ≤ 1 when n = 2.

[Figure: the unit balls of the l_1, l_2 and l_∞ norms.]

Exercise. Prove that for all x, y ∈ Rⁿ we have

(a) Σ_{i=1}^n |x_i y_i| ≤ ‖x‖_2 ‖y‖_2   [Schwarz inequality], and

10. Spectral Radius.

The spectral radius of a matrix A ∈ Rⁿˣⁿ is defined by ρ(A) = max_{1≤i≤n} |λ_i|, where λ_1, …, λ_n are the eigenvalues of A.

11. Matrix Norms.

For an n × n matrix A, the natural matrix norm ‖A‖ for a given vector norm ‖·‖ is defined by

‖A‖ = max_{‖x‖=1} ‖A x‖.

The common matrix norms are

‖A‖_1 = max_{1≤j≤n} Σ_{i=1}^n |a_ij|,   ‖A‖_2 = √(ρ(AᵀA))  (= ρ(A) if A = Aᵀ),   ‖A‖_∞ = max_{1≤i≤n} Σ_{j=1}^n |a_ij|.

Exercise: Compute ‖A‖_1, ‖A‖_∞, and ‖A‖_2 for

A = (1 1 0; 1 2 1; −1 1 2).

12. Convergence.

A sequence of vectors {x^(k)} ⊂ Rⁿ is said to converge to a vector x ∈ Rⁿ if ‖x^(k) − x‖ → 0 as k → ∞ for an arbitrary vector norm ‖·‖. This is equivalent to componentwise convergence, i.e., x_i^(k) → x_i as k → ∞, i = 1, …, n.

A square matrix A ∈ Rⁿˣⁿ is said to be convergent if ‖A^k‖ → 0 as k → ∞, which is equivalent to (A^k)_ij → 0 as k → ∞ for all i, j.

The following statements are equivalent:

(i) A is a convergent matrix,
(ii) ρ(A) < 1,
(iii) lim_{k→∞} A^k x = 0 for every x ∈ Rⁿ.

Exercise. Show that A is convergent, where A = (1/2 0; 1/4 1/2).

3. Algebraic Equations

1. Decomposition Methods for Linear Equations.

A matrix A ∈ Rⁿˣⁿ is said to have an LU decomposition if A = L U, where L ∈ Rⁿˣⁿ is a lower triangular matrix (l_ij = 0 if 1 ≤ i < j ≤ n) and U ∈ Rⁿˣⁿ is an upper triangular matrix (u_ij = 0 if 1 ≤ j < i ≤ n).

The decomposition is unique if one assumes e.g. l_ii = 1 for 1 ≤ i ≤ n.

L = (l_11; l_21 l_22; l_31 l_32 l_33; ⋮ ⋱; l_n1 l_n2 l_n3 … l_nn),
U = (u_11 u_12 u_13 … u_1n; u_22 u_23 … u_2n; ⋱ ⋮; u_{n−1,n−1} u_{n−1,n}; u_nn).

In general, the diagonal elements of either L or U are given, and the remaining elements of the matrices are determined by directly comparing the two sides of the equation A = L U.

The linear system A x = b is then equivalent to L y = b and U x = y.

Exercise.

Show that the solution to L y = b is

y_1 = b_1/l_11,   y_i = (b_i − Σ_{k=1}^{i−1} l_ik y_k)/l_ii,   i = 2, …, n

(forward substitution), and the solution to U x = y is

x_n = y_n/u_nn,   x_i = (y_i − Σ_{k=i+1}^{n} u_ik x_k)/u_ii,   i = n − 1, …, 1

(backward substitution).

2. Crout Algorithm. Exercise.

Let A be tridiagonal, i.e. a_ij = 0 if |i − j| > 1 (a_ij = 0 except perhaps a_{i−1,i}, a_ii and a_{i,i+1}), and strictly diagonally dominant (|a_ii| > Σ_{j≠i} |a_ij| for i = 1, …, n). Show that A can be factorized as A = L U, where l_ii = 1 for i = 1, …, n, u_11 = a_11, and

u_{i,i+1} = a_{i,i+1},   l_{i+1,i} = a_{i+1,i}/u_ii,   u_{i+1,i+1} = a_{i+1,i+1} − l_{i+1,i} u_{i,i+1}

for i = 1, …, n − 1. [Note: L and U are tridiagonal.]

C++ Exercise: Write a program to solve a tridiagonal and strictly diagonally dominant linear system.
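A possible sketch of such a solver, combining the Crout factorization from this exercise with forward and backward substitution (array layout and names are illustrative):

```cpp
#include <cstddef>
#include <vector>

// Solve A x = b for a tridiagonal, strictly diagonally dominant A using the
// Crout factorization above: A = L U with l_ii = 1.
// sub, sup have length n-1:  sub[i] = a_{i+2,i+1},  diag[i] = a_{i+1,i+1},  sup[i] = a_{i+1,i+2}.
std::vector<double> solve_tridiagonal(const std::vector<double>& sub,
                                      const std::vector<double>& diag,
                                      const std::vector<double>& sup,
                                      const std::vector<double>& b) {
    const std::size_t n = diag.size();
    std::vector<double> u(n), l(n), y(n), x(n);

    // Factorization: u_11 = a_11, l_{i+1,i} = a_{i+1,i}/u_ii,
    //                u_{i+1,i+1} = a_{i+1,i+1} - l_{i+1,i} u_{i,i+1}.
    u[0] = diag[0];
    for (std::size_t i = 0; i + 1 < n; ++i) {
        l[i + 1] = sub[i] / u[i];
        u[i + 1] = diag[i + 1] - l[i + 1] * sup[i];
    }

    // Forward substitution L y = b (l_ii = 1).
    y[0] = b[0];
    for (std::size_t i = 1; i < n; ++i) y[i] = b[i] - l[i] * y[i - 1];

    // Backward substitution U x = y (U is upper bidiagonal).
    x[n - 1] = y[n - 1] / u[n - 1];
    for (std::size_t i = n - 1; i-- > 0;) x[i] = (y[i] - sup[i] * x[i + 1]) / u[i];
    return x;
}
```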

Exercise 3.2.

That u_11 = a_11 and u_{i,i+1} = a_{i,i+1}, l_{i+1,i} = a_{i+1,i}/u_ii, u_{i+1,i+1} = a_{i+1,i+1} − l_{i+1,i} u_{i,i+1}, for i = 1, …, n − 1, can easily be shown.

It remains to show that u_ii ≠ 0 for i = 1, …, n. We proceed by induction to show that |u_ii| > |a_{i,i+1}|, where for convenience we define a_{n,n+1} := 0.

• i = 1:  |u_11| = |a_11| > |a_{1,2}|.  ✓

• i ↦ i + 1:

|u_{i+1,i+1}| = |a_{i+1,i+1} − a_{i+1,i} a_{i,i+1}/u_ii| ≥ |a_{i+1,i+1}| − |a_{i+1,i}| |a_{i,i+1}|/|u_ii| ≥ |a_{i+1,i+1}| − |a_{i+1,i}| > |a_{i+1,i+2}|.  ✓

Overall we have that |u_ii| > 0, and so the Crout algorithm is well defined. Moreover,

3. Choleski Algorithm. Exercise.

Let A be a positive definite matrix. Show that A can be factorized as A = L Lᵀ, where L is a lower triangular matrix.

(i) Compute the 1st column:

l_11 = √a_11,   l_i1 = a_i1/l_11,   i = 2, …, n.

(ii) For j = 2, …, n − 1 compute the j-th column:

l_jj = (a_jj − Σ_{k=1}^{j−1} l_jk²)^{1/2},
l_ij = (a_ij − Σ_{k=1}^{j−1} l_ik l_jk)/l_jj,   i = j + 1, …, n.

(iii) Compute the n-th column:

l_nn = (a_nn − Σ_{k=1}^{n−1} l_nk²)^{1/2}.

4. Iterative Methods for Linear Equations.

Split A into A = M + N with M nonsingular, and convert the equation A x = b into the equivalent equation x = C x + d with C = −M⁻¹N and d = M⁻¹b.

Choose an initial vector x^(0) and then generate a sequence of vectors by

x^(k) = C x^(k−1) + d,   k = 1, 2, …

The resulting sequence converges to the solution of A x = b, for an arbitrary initial vector x^(0), if and only if ρ(C) < 1.

The objective is to choose M such that M⁻¹ is easy to compute and ρ(C) < 1. The iteration stops if ‖x^(k) − x^(k−1)‖ < ε.

Claim.

The iteration x^(k) = C x^(k−1) + d is convergent if and only if ρ(C) < 1.

Proof.

Define e^(k) := x^(k) − x, the error of the k-th iterate. Then

e^(k) = C x^(k−1) + d − (C x + d) = C (x^(k−1) − x) = C e^(k−1) = C² e^(k−2) = … = C^k e^(0),

where e^(0) = x^(0) − x is an arbitrary vector.

Assume C is similar to the diagonal matrix Λ = diag(λ_1, …, λ_n), where λ_i are the eigenvalues of C.

⇒ ∃ X nonsingular s.t. C = X Λ X⁻¹
⇒ e^(k) = C^k e^(0) = X Λ^k X⁻¹ e^(0) = X diag(λ_1^k, …, λ_n^k) X⁻¹ e^(0) → 0 as k → ∞
⟺ |λ_i| < 1 ∀ i = 1, …, n,  i.e. ρ(C) < 1.

5. Jacobi Algorithm.

Exercise: Let M = D and N = L + U (L the strict lower triangular part of A, D the diagonal, U the strict upper triangular part). Show that the i-th component at the k-th iteration is

x_i^(k) = (1/a_ii) [b_i − Σ_{j=1}^{i−1} a_ij x_j^(k−1) − Σ_{j=i+1}^{n} a_ij x_j^(k−1)]

for i = 1, …, n.

6. Gauss–Seidel Algorithm.

Exercise: Let M = D + L and N = U. Show that the i-th component at the k-th iteration is

x_i^(k) = (1/a_ii) [b_i − Σ_{j=1}^{i−1} a_ij x_j^(k) − Σ_{j=i+1}^{n} a_ij x_j^(k−1)]

for i = 1, …, n.

7. SOR Algorithm. Exercise.

Let M = ω⁻¹ D + L and N = U + (1 − ω⁻¹) D, where 0 < ω < 2. Show that the i-th component at the k-th iteration is

x_i^(k) = (1 − ω) x_i^(k−1) + (ω/a_ii) [b_i − Σ_{j=1}^{i−1} a_ij x_j^(k) − Σ_{j=i+1}^{n} a_ij x_j^(k−1)]

for i = 1, …, n.

C++ Exercise: Write a program to solve a diagonally dominant linear equation system with the SOR method.
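A possible sketch of the SOR iteration for A x = b with dense storage (names, tolerance and iteration cap are illustrative):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// One possible SOR solver for a diagonally dominant system A x = b, 0 < omega < 2.
std::vector<double> sor(const std::vector<std::vector<double>>& A,
                        const std::vector<double>& b,
                        double omega, double eps = 1e-10, int max_iter = 10000) {
    const std::size_t n = b.size();
    std::vector<double> x(n, 0.0);                 // initial vector x^(0) = 0
    for (int k = 0; k < max_iter; ++k) {
        double diff = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            double s = b[i];
            for (std::size_t j = 0; j < n; ++j)
                if (j != i) s -= A[i][j] * x[j];   // x[j] for j < i already holds the new value
            double x_new = (1.0 - omega) * x[i] + omega * s / A[i][i];
            diff = std::max(diff, std::abs(x_new - x[i]));
            x[i] = x_new;
        }
        if (diff < eps) break;                     // stop if ||x^(k) - x^(k-1)|| < eps
    }
    return x;
}
```

With ω = 1 this update reduces to the Gauss–Seidel iteration; using only the old values x_j^(k−1) throughout would give the Jacobi iteration.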

8. Special Matrices.

If A is strictly diagonally dominant, then Jacobi and Gauss–Seidel converge for any initial vector x^(0). In addition, SOR converges for ω ∈ (0, 1].

If A is positive definite and 0 < ω < 2, then the SOR method converges for any initial vector x^(0).

If A is positive definite and tridiagonal, then ρ(C_GS) = [ρ(C_J)]² < 1 and the optimal choice of ω for the SOR method is

ω = 2 / (1 + √(1 − ρ(C_GS))) ∈ [1, 2).

With this choice of ω, ρ(C_SOR) = ω − 1 ≤ ρ(C_GS).

Exercise.

Find the optimal ω for the SOR method for the matrix

A = (4 3 0; 3 4 −1; 0 −1 4).

9. Condition Numbers.

The condition number of a nonsingular matrix A relative to a norm ‖·‖ is defined by

κ(A) = ‖A‖ · ‖A⁻¹‖.

Note that κ(A) ≥ ‖A A⁻¹‖ = ‖I‖ = max_{‖x‖=1} ‖x‖ = 1.

A matrix A is well-conditioned if κ(A) is close to one, and is ill-conditioned if κ(A) is much larger than one.

Suppose ‖δA‖ < 1/‖A⁻¹‖. Then the solution x̃ to (A + δA) x̃ = b + δb approximates the solution x of A x = b with the error estimate

‖x − x̃‖/‖x‖ ≤ κ(A)/(1 − ‖δA‖ ‖A⁻¹‖) (‖δb‖/‖b‖ + ‖δA‖/‖A‖).

In particular, if δA = 0 (no perturbation to the matrix A), then

‖x − x̃‖/‖x‖ ≤ κ(A) ‖δb‖/‖b‖.

Example.

Consider Example 1.3.

A = (1 1; 1 1.01)  ⇒  A⁻¹ = (1/det A) (1.01 −1; −1 1) = (1/0.01) (1.01 −1; −1 1) = (101 −100; −100 100).

Recall ‖A‖_1 = max_{1≤j≤n} Σ_{i=1}^n |a_ij|. Hence

‖A‖_1 = max(2, 2.01) = 2.01,   ‖A⁻¹‖_1 = max(201, 200) = 201.

⇒ κ_1(A) = ‖A‖_1 · ‖A⁻¹‖_1 = 404.01 ≫ 1 (ill-conditioned!)

Similarly κ_∞ = 404.01 and κ_2 = ρ(A) ρ(A⁻¹) = λ_max(A)/λ_min(A).

10. Hilbert Matrix.

An n × n Hilbert matrix H_n = (h_ij) is defined by h_ij = 1/(i + j − 1) for i, j = 1, 2, …, n.

Hilbert matrices are notoriously ill-conditioned and κ(H_n) → ∞ very rapidly as n → ∞.

H_n = ( 1      1/2      …  1/n
        1/2    1/3      …  1/(n+1)
        ⋮                 ⋮
        1/n    1/(n+1)  …  1/(2n−1) )

Exercise.

11. Fixed Point Method for Nonlinear Equations.

A function g : R → R has a fixed point x̄ if g(x̄) = x̄.

A function g is a contraction mapping on [a, b] if g : [a, b] → [a, b] and

|g′(x)| ≤ L < 1   ∀ x ∈ (a, b),

where L is a constant.

Exercise.

Assume g is a contraction mapping on [a, b]. Prove that g has a unique fixed point x̄ in [a, b], and that for any x_0 ∈ [a, b] the sequence defined by

x_{n+1} = g(x_n),   n ≥ 0,

converges to x̄.

Exercise 3.11.

Existence:

Define h(x) = x − g(x) on [a, b]. Then h(a) = a − g(a) ≤ 0 and h(b) = b − g(b) ≥ 0. As h is continuous, ∃ c ∈ [a, b] s.t. h(c) = 0, i.e. c = g(c). ✓

Uniqueness:

Suppose p, q ∈ [a, b] are two fixed points. Then

|p − q| = |g(p) − g(q)| = |g′(α) (p − q)|   (MVT, α ∈ (a, b))   ≤ L |p − q|

⇒ (1 − L) |p − q| ≤ 0  ⇒  |p − q| ≤ 0  ⇒  p = q. ✓

Convergence:

|x_n − x̄| = |g(x_{n−1}) − g(x̄)| = |g′(α) (x_{n−1} − x̄)| ≤ L |x_{n−1} − x̄| ≤ … ≤ Lⁿ |x_0 − x̄| → 0 as n → ∞.

Hence x_n → x̄.

12. Newton Method for Nonlinear Equations.

Assume that f ∈ C¹([a, b]), f(x̄) = 0 (x̄ is a root or zero) and f′(x̄) ≠ 0.

The Newton method can be used to find the root x̄ by generating a sequence {x_n} satisfying

x_{n+1} = x_n − f(x_n)/f′(x_n),   n = 0, 1, …,

provided f′(x_n) ≠ 0 for all n.

The sequence x_n converges to the root x̄ as long as the initial point x_0 is sufficiently close to x̄.

The algorithm stops if |x_{n+1} − x_n| < ε, a prescribed error tolerance, and x_{n+1} is taken as an approximation to x̄.


[Derivation: the tangent line to y = f(x) at (x_n, f(x_n)) is Y = f(x_n) + f′(x_n) (X − x_n). Setting Y = 0 yields x_{n+1} := X = x_n − f(x_n)/f′(x_n).]


13. Choice of Initial Point.

Suppose f ∈ C²([a, b]) and f(x̄) = 0 with f′(x̄) ≠ 0. Then there exists δ > 0 such that the Newton method generates a sequence x_n converging to x̄ for any initial point x_0 ∈ [x̄ − δ, x̄ + δ] (x_0 can only be chosen locally).

However, if f satisfies the following additional conditions:

1. f(a) f(b) < 0,

2. f″ does not change sign on [a, b],

3. the tangent lines to the curve y = f(x) at both a and b cut the x-axis within [a, b] (i.e. a − f(a)/f′(a), b − f(b)/f′(b) ∈ [a, b]),

then f(x) = 0 has a unique root x̄ in [a, b] and the Newton method converges to x̄ for any initial point x_0 ∈ [a, b] (x_0 can be chosen globally).

Example.

Find x = √c, c > 1.

Answer.

x̄ is the root of f(x) := x² − c. Newton:

x_{n+1} = x_n − f(x_n)/f′(x_n) = x_n − (x_n² − c)/(2 x_n) = ½ (x_n + c/x_n),   n ≥ 0.

How to choose x_0? Check the 3 conditions on [1, c]:

1. f(1) = 1 − c < 0, f(c) = c² − c > 0, so f(1) f(c) < 0. ✓

2. f″ = 2 does not change sign. ✓

3. Tangent line at 1: Y = f(1) + f′(1)(X − 1) = 1 − c + 2(X − 1). Setting Y = 0 gives X = 1 + (c − 1)/2 ∈ (1, c). ✓
   Tangent line at c: Y = f(c) + f′(c)(X − c) = c² − c + 2c(X − c). Setting Y = 0 gives X = c − (c − 1)/2 ∈ (1, c). ✓

Hence the Newton method converges for any x_0 ∈ [1, c].

Numerical Example.

Find √7. (From a calculator: √7 = 2.6457513.) Newton converges for all x_0 ∈ [1, 7]. Choose x_0 = 4.

x_1 = ½ (x_0 + 7/x_0) = 2.875
x_2 = 2.6548913
x_3 = 2.6457670
x_4 = 2.6457513

Comparison to the bisection method with I_0 = [1, 7]:

I_1 = [1, 4], I_2 = [2.5, 4], I_3 = [2.5, 3.25], I_4 = [2.5, 2.875], …
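A minimal sketch of this Newton iteration for √c (function name and tolerance are illustrative):

```cpp
#include <cmath>
#include <iostream>

// Newton iteration x_{n+1} = (x_n + c/x_n)/2 for f(x) = x*x - c.
double newton_sqrt(double c, double x0, double eps = 1e-12) {
    double x = x0;
    for (int n = 0; n < 100; ++n) {
        double x_next = 0.5 * (x + c / x);
        if (std::abs(x_next - x) < eps) return x_next;  // stop if |x_{n+1} - x_n| < eps
        x = x_next;
    }
    return x;
}

int main() {
    std::cout.precision(8);
    std::cout << newton_sqrt(7.0, 4.0) << "\n";   // approx 2.6457513
}
```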

14. Pitfalls.

Here are some difficulties which may be encountered with the Newton method:

1. {x_n} may wander around and not converge (there are only complex roots to the equation),

2. the initial approximation x_0 is too far away from the desired root and {x_n} converges to some other root (this usually happens when f′(x_0) is small),

3. {x_n} may diverge to +∞ (the function f is positive and monotonically decreasing on an unbounded interval), and

15. Rate of Convergence.

Suppose {x_n} is a sequence that converges to x̄.

The convergence is said to be linear if there is a constant r ∈ (0, 1) such that

|x_{n+1} − x̄| / |x_n − x̄| ≤ r   for all n sufficiently large.

The convergence is said to be superlinear if

lim_{n→∞} |x_{n+1} − x̄| / |x_n − x̄| = 0.

In particular, the convergence is said to be quadratic if

|x_{n+1} − x̄| / |x_n − x̄|² ≤ M   for all n sufficiently large,

where M is a positive constant, not necessarily less than 1.

Example: x_n = x̄ + 0.5ⁿ converges linearly, x_n = x̄ + 0.5^(2ⁿ) converges quadratically.

Example.

Define g(x) = x − f(x)/f′(x). Then the Newton method is given by x_{n+1} = g(x_n).

Moreover, f(x̄) = 0 and f′(x̄) ≠ 0 imply that

g(x̄) = x̄,
g′(x̄) = 1 − [(f′)² − f f″]/(f′)² (x̄) = f(x̄) f″(x̄)/(f′(x̄))² = 0,
g″(x̄) = f″(x̄)/f′(x̄).

Assuming that x_n → x̄ we have that

|x_{n+1} − x̄| / |x_n − x̄|² = |g(x_n) − g(x̄)| / |x_n − x̄|²
  = |g′(x̄)(x_n − x̄) + ½ g″(η_n)(x_n − x̄)²| / |x_n − x̄|²   (Taylor)
  = ½ |g″(η_n)| → ½ |f″(x̄)/f′(x̄)| as n → ∞,

i.e. the Newton method converges quadratically.

4. Interpolations

1. Polynomial Approximation.

For any continuous function f defined on an interval [a, b], there exist polynomials P that can be as “close” to the given function as desired.

Taylor polynomials agree closely with a given function at a specific point, but they concentrate their accuracy only near that point.

2. Interpolating Polynomial – Lagrange Form.

Suppose x_i ∈ [a, b], i = 0, 1, …, n, are pairwise distinct mesh points in [a, b]. The Lagrange polynomial p is a polynomial of degree n such that

p(x_i) = f(x_i),   i = 0, 1, …, n.

p can be constructed explicitly as

p(x) = Σ_{i=0}^{n} L_i(x) f(x_i),

where L_i is a polynomial of degree n satisfying L_i(x_j) = 0 for j ≠ i and L_i(x_i) = 1. This results in

L_i(x) = Π_{j≠i} (x − x_j)/(x_i − x_j),   i = 0, 1, …, n.

Exercise.

Find the Lagrange polynomial p for the following points (x, f(x)): (1, 0), (−1, 3), and (2, 4). Assume that a new point (0, 2) is observed, and construct a Lagrange polynomial to incorporate this new information.

Error formula.

Suppose f is n + 1 times differentiable on [a, b]. Then it holds that

f(x) = p(x) + f^(n+1)(ξ)/(n + 1)! (x − x_0) ⋯ (x − x_n),

where ξ = ξ(x) lies in (a, b).

Proof.

Define g(x) = f(x) − p(x) + λ Π_{j=0}^{n} (x − x_j), where λ is chosen such that g(α) = 0 for a fixed α ∉ {x_0, …, x_n}.

Hence

g(x) = f(x) − p(x) − (f(α) − p(α)) Π_{j=0}^{n} (x − x_j)/(α − x_j).

⇒ g has at least n + 2 zeros: x_0, …, x_n, α. The Mean Value Theorem yields that

g′ has at least n + 1 zeros,
⋮
g^(n+1) has at least 1 zero, say ξ.

Hence

0 = g^(n+1)(ξ) = f^(n+1)(ξ) − (f(α) − p(α)) (n + 1)! / Π_{j=0}^{n} (α − x_j)

⇒ Error = f(α) − p(α) = f^(n+1)(ξ)/(n + 1)! Π_{j=0}^{n} (α − x_j).

3. Trapezoid Rule.

We can use linear interpolation (n = 1, x_0 = a, x_1 = b) to approximate f(x) on [a, b] and then compute ∫_a^b f(x) dx to get the trapezoid rule:

∫_a^b f(x) dx ≈ ½ (b − a) [f(a) + f(b)].

If we partition [a, b] into n equal subintervals with mesh points x_i = a + i h, i = 0, …, n, and step size h = (b − a)/n, we can derive the composite trapezoid rule:

∫_a^b f(x) dx ≈ h/2 [f(x_0) + 2 Σ_{i=1}^{n−1} f(x_i) + f(x_n)].
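A minimal sketch of the composite trapezoid rule (the integrand type and names are illustrative):

```cpp
#include <functional>

// Composite trapezoid rule on [a, b] with n equal subintervals of size h = (b-a)/n:
// integral ≈ h/2 [ f(x_0) + 2 (f(x_1)+...+f(x_{n-1})) + f(x_n) ].
double trapezoid(const std::function<double(double)>& f, double a, double b, int n) {
    const double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));
    for (int i = 1; i < n; ++i) sum += f(a + i * h);
    return h * sum;
}
```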

Use linear interpolation (n = 1, x_0 = a, x_1 = b) to approximate f(x) on [a, b] and then compute ∫_a^b f(x) dx.

Answer.

The linear interpolating polynomial is p(x) = f(a) L_0(x) + f(b) L_1(x), where

L_0(x) = (x − b)/(a − b)   and   L_1(x) = (x − a)/(b − a).

∫_a^b f(x) dx ≈ ∫_a^b p(x) dx = f(a) ∫_a^b (x − b)/(a − b) dx + f(b) ∫_a^b (x − a)/(b − a) dx
  = f(a) [1/(a − b)] [½ (x − b)²]_a^b + f(b) [1/(b − a)] [½ (x − a)²]_a^b
  = f(a) [1/(a − b)] (−½ (a − b)²) + f(b) [1/(b − a)] (½ (b − a)²)
  = (b − a)/2 (f(a) + f(b))   ← Trapezoid Rule

Error Analysis.

Let f(x) = p(x) + E(x), where E(x) = f″(ξ)/2 (x − a)(x − b) with ξ ∈ (a, b). Assume that |f″| ≤ M is bounded. Then

|∫_a^b E(x) dx| ≤ ∫_a^b |E(x)| dx ≤ M/2 ∫_a^b (x − a)(b − x) dx
  = M/2 ∫_a^b (x − a)[(b − a) − (x − a)] dx
  = M/2 ∫_a^b [−(x − a)² + (b − a)(x − a)] dx
  = M/2 [−⅓ (b − a)³ + ½ (b − a)³]
  = M/12 (b − a)³.

The composite formula can be obtained by considering the partitioning of [a, b] into a = x_0 < x_1 < … < x_{n−1} < x_n = b, where x_i = a + i h with h := (b − a)/n:

∫_a^b f(x) dx = Σ_{i=0}^{n−1} ∫_{x_i}^{x_{i+1}} f(x) dx ≈ Σ_{i=0}^{n−1} (x_{i+1} − x_i)/2 (f(x_i) + f(x_{i+1}))
  = Σ_{i=0}^{n−1} h/2 (f(x_i) + f(x_{i+1}))
  = h [½ f(a) + f(x_1) + … + f(x_{n−1}) + ½ f(b)].

Error analysis then yields that

Error ≤ (M/12) h³ n = M (b − a)/12 · h².

4. Simpson's Rule. Exercise.

Use quadratic interpolation (n = 2, x_0 = a, x_1 = (a + b)/2, x_2 = b) to approximate f(x) on [a, b] and then compute ∫_a^b f(x) dx to get Simpson's rule:

∫_a^b f(x) dx ≈ (b − a)/6 [f(a) + 4 f((a + b)/2) + f(b)].

Derive the composite Simpson's rule:

∫_a^b f(x) dx ≈ h/3 [f(x_0) + 2 Σ_{i=2}^{n/2} f(x_{2i−2}) + 4 Σ_{i=1}^{n/2} f(x_{2i−1}) + f(x_n)],

where n is an even number and x_i and h are chosen as in the composite trapezoid rule.
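A matching sketch of the composite Simpson rule (n must be even; names are illustrative):

```cpp
#include <functional>
#include <stdexcept>

// Composite Simpson's rule with an even number n of subintervals, h = (b-a)/n:
// integral ≈ h/3 [ f(x_0) + 4 f(x_1) + 2 f(x_2) + 4 f(x_3) + ... + f(x_n) ].
double simpson(const std::function<double(double)>& f, double a, double b, int n) {
    if (n % 2 != 0) throw std::invalid_argument("n must be even");
    const double h = (b - a) / n;
    double sum = f(a) + f(b);
    for (int i = 1; i < n; ++i)
        sum += (i % 2 == 1 ? 4.0 : 2.0) * f(a + i * h);   // odd nodes weight 4, even nodes weight 2
    return h / 3.0 * sum;
}
```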

5. Newton–Cotes Formula.

Suppose x_0, …, x_n are mesh points in [a, b] (usually the mesh points are equally spaced with x_0 = a, x_n = b). Then the integral can be approximated by the Newton–Cotes formula:

∫_a^b f(x) dx ≈ Σ_{i=0}^{n} A_i f(x_i),

where the parameters A_i are determined in such a way that the formula is exact for all polynomials of degree ≤ n.

[Note: n + 1 unknowns A_i and n + 1 coefficients for a polynomial of degree n.]

Exercise. Use the Newton–Cotes formula to derive the trapezoid rule and Simpson's rule. Prove that if f is n + 1 times differentiable and |f^(n+1)| ≤ M on [a, b] then

|∫_a^b f(x) dx − Σ_{i=0}^{n} A_i f(x_i)| ≤ M/(n + 1)! ∫_a^b Π_{i=0}^{n} |x − x_i| dx.

Exercise 4.5.

We have that

∫_a^b q(x) dx = Σ_{i=0}^{n} A_i q(x_i)   for all polynomials q of degree ≤ n.

Let q(x) = L_j(x), where L_j is the j-th Lagrange polynomial for the data points x_0, x_1, …, x_n, i.e. L_j is of degree n and satisfies L_j(x_i) = δ_ij (= 1 if i = j, 0 if i ≠ j). Now

∫_a^b L_j dx = Σ_{i=0}^{n} A_i L_j(x_i) = A_j.

Hence

∫_a^b f(x) dx − Σ_{i=0}^{n} A_i f(x_i) = ∫_a^b f(x) dx − Σ_{i=0}^{n} f(x_i) ∫_a^b L_i(x) dx
  = ∫_a^b f(x) dx − ∫_a^b Σ_{i=0}^{n} f(x_i) L_i(x) dx = ∫_a^b f(x) dx − ∫_a^b p(x) dx,

where p is the Lagrange interpolating polynomial; this construction yields the trapezoid rule (n = 1) and Simpson's rule (n = 2, with x_1 = (a + b)/2). The Lagrange polynomial has the error term

f(x) = p(x) + E(x),   E(x) := f^(n+1)(ξ)/(n + 1)! (x − x_0) ⋯ (x − x_n),

where ξ = ξ(x) lies in (a, b). Hence

|∫_a^b f(x) dx − ∫_a^b p(x) dx| = |∫_a^b E(x) dx| ≤ ∫_a^b |E(x)| dx ≤ M/(n + 1)! ∫_a^b Π_{i=0}^{n} |x − x_i| dx.

6. Ordinary Differential Equations.

An initial value problem for an ODE has the form

x′(t) = f(t, x(t)),   a ≤ t ≤ b,   x(a) = x_0.   (1)

(1) is equivalent to the integral equation

x(t) = x_0 + ∫_a^t f(s, x(s)) ds,   a ≤ t ≤ b.   (2)

To solve (2) numerically we divide [a, b] into subintervals with mesh points t_i = a + i h, i = 0, …, n, and step size h = (b − a)/n. (2) implies

x(t_{i+1}) = x(t_i) + ∫_{t_i}^{t_{i+1}} f(s, x(s)) ds.

(a) If we approximate f(s, x(s)) on [t_i, t_{i+1}] by f(t_i, x(t_i)), we get the Euler (explicit) method for equation (1):

w_{i+1} = w_i + h f(t_i, w_i),   w_0 = x_0.

We have x(t_{i+1}) ≈ w_{i+1} if h is sufficiently small.
[Taylor: x(t_{i+1}) = x(t_i) + x′(t_i) h + O(h²) = x(t_i) + f(t_i, x(t_i)) h + O(h²).]

(b) If we approximate f(s, x(s)) on [t_i, t_{i+1}] by linear interpolation with the points (t_i, f(t_i, x(t_i))) and (t_{i+1}, f(t_{i+1}, x(t_{i+1}))), we get the trapezoidal (implicit) method for equation (1):

w_{i+1} = w_i + h/2 [f(t_i, w_i) + f(t_{i+1}, w_{i+1})],   w_0 = x_0.

(c) If we combine the Euler method with the trapezoidal method, we get the modified Euler (explicit) method (or Runge–Kutta method of 2nd order):

w_{i+1} = w_i + h/2 [f(t_i, w_i) + f(t_{i+1}, w_i + h f(t_i, w_i))],   w_0 = x_0,

where w_i + h f(t_i, w_i) ≈ w_{i+1} is the Euler predictor.
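A minimal sketch of the modified Euler (RK2) step for x′ = f(t, x) (names are illustrative):

```cpp
#include <functional>
#include <vector>

// Integrate x'(t) = f(t, x) on [a, b] with n steps of size h = (b-a)/n,
// returning the approximations w_0, ..., w_n.
std::vector<double> modified_euler(const std::function<double(double, double)>& f,
                                   double a, double b, double x0, int n) {
    const double h = (b - a) / n;
    std::vector<double> w(n + 1);
    w[0] = x0;
    for (int i = 0; i < n; ++i) {
        double t  = a + i * h;
        double k1 = f(t, w[i]);                    // Euler slope
        double k2 = f(t + h, w[i] + h * k1);       // slope at the Euler predictor
        w[i + 1] = w[i] + 0.5 * h * (k1 + k2);     // modified Euler (RK2) update
    }
    return w;
}
```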

7. Divided Differences.

Suppose a function f and (n + 1) distinct points x_0, x_1, …, x_n are given. Divided differences of f can be expressed in a table as follows:

x_k   0DD     1DD         2DD             3DD
x_0   f[x_0]
x_1   f[x_1]  f[x_0,x_1]
x_2   f[x_2]  f[x_1,x_2]  f[x_0,x_1,x_2]
x_3   f[x_3]  f[x_2,x_3]  f[x_1,x_2,x_3]  f[x_0,x_1,x_2,x_3]

where

f[x_i] = f(x_i),
f[x_i, x_{i+1}] = (f[x_{i+1}] − f[x_i]) / (x_{i+1} − x_i),
f[x_i, x_{i+1}, …, x_{i+k}] = (f[x_{i+1}, …, x_{i+k}] − f[x_i, …, x_{i+k−1}]) / (x_{i+k} − x_i),
f[x_1, …, x_n] = (f[x_2, …, x_n] − f[x_1, …, x_{n−1}]) / (x_n − x_1).

8. Interpolating Polynomial – Newton Form.

One drawback of Lagrange polynomials is that there is no recursive relationship between P_{n−1} and P_n, which implies that each polynomial has to be constructed individually. Hence, in practice one uses the Newton polynomials.

The Newton interpolating polynomial P_n of degree n that agrees with f at the points x_0, x_1, …, x_n is given by

P_n(x) = f[x_0] + Σ_{k=1}^{n} f[x_0, x_1, …, x_k] Π_{i=0}^{k−1} (x − x_i).

Note that P_n can be computed recursively using the relation

P_n(x) = P_{n−1}(x) + f[x_0, x_1, …, x_n] (x − x_0)(x − x_1) ⋯ (x − x_{n−1}).

[Note that f[x_0, x_1, …, x_k] can be found on the diagonal of the DD table.]

Exercise.

Exercise 4.8.

Data points: (1, 2), (−2, 56), (0, 2), (3, −4), (−1, 16), (7, −376).

x_k   0DD    1DD   2DD   3DD   4DD   5DD
 1      2
−2     56   −18
 0      2   −27     9
 3     −4    −2     5    −2
−1     16    −5     3    −2     0
 7   −376   −49   −11    −2     0     0

Newton polynomial:

p(x) = 2 − 18 (x − 1) + 9 (x − 1)(x + 2) − 2 (x − 1)(x + 2) x = −2x³ + 7x² − 5x + 2.
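A minimal sketch of building the divided-difference coefficients and evaluating the Newton polynomial, as in this exercise (class and variable names are illustrative):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Divided-difference coefficients f[x_0], f[x_0,x_1], ..., f[x_0,...,x_n]
// (the diagonal of the DD table) and nested evaluation of the Newton polynomial.
struct NewtonPoly {
    std::vector<double> x, coef;

    NewtonPoly(std::vector<double> xs, std::vector<double> fs)
        : x(std::move(xs)), coef(std::move(fs)) {
        const std::size_t n = x.size();
        // Column by column, overwrite coef[i] with the divided difference f[x_{i-k},...,x_i];
        // at the end coef[i] = f[x_0,...,x_i].
        for (std::size_t k = 1; k < n; ++k)
            for (std::size_t i = n - 1; i >= k; --i)
                coef[i] = (coef[i] - coef[i - 1]) / (x[i] - x[i - k]);
    }

    // P(t) = c_0 + (t - x_0)(c_1 + (t - x_1)(c_2 + ...)).
    double operator()(double t) const {
        double p = coef.back();
        for (std::size_t i = coef.size() - 1; i-- > 0;)
            p = coef[i] + (t - x[i]) * p;
        return p;
    }
};
```

For the data of Exercise 4.8 this evaluates the cubic −2x³ + 7x² − 5x + 2 found above.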

9. Piecewise Polynomial Approximations.

Another drawback of interpolating polynomials is that P_n tends to oscillate widely when n is large, which implies that P_n(x) may be a poor approximation to f(x) if x is not close to the interpolating points.

If an interval [a, b] is divided into a set of subintervals [x_i, x_{i+1}], i = 0, 1, …, n − 1, and on each subinterval a different polynomial is constructed to approximate a function f, such an approximation is called a spline.

The simplest spline is the linear spline P, which on each interval [x_i, x_{i+1}], i = 0, 1, …, n − 1, is the linear polynomial agreeing with f at x_i and x_{i+1}.

10. Natural Cubic Splines.

Given a function f defined on [a, b] and a set of points a = x_0 < x_1 < ⋯ < x_n = b, a function S is called a natural cubic spline if there exist n cubic polynomials S_i such that:

(a) S(x) = S_i(x) for x ∈ [x_i, x_{i+1}] and i = 0, 1, …, n − 1;

(b) S_i(x_i) = f(x_i) and S_i(x_{i+1}) = f(x_{i+1}) for i = 0, 1, …, n − 1;

(c) S′_{i+1}(x_{i+1}) = S′_i(x_{i+1}) for i = 0, 1, …, n − 2;

(d) S″_{i+1}(x_{i+1}) = S″_i(x_{i+1}) for i = 0, 1, …, n − 2;

(e) S″(x_0) = S″(x_n) = 0.

Natural Cubic Splines.

[Figure: neighbouring spline pieces S_i on [x_i, x_{i+1}] and S_{i+1} on [x_{i+1}, x_{i+2}].]

(a) 4n parameters,
(b) 2n equations,
(c) n − 1 equations,
(d) n − 1 equations,
(e) 2 equations,

i.e. 4n parameters and in total 4n equations.

Example. Assume S is a natural cubic spline that interpolates f ∈ C²([a, b]) at the nodes a = x_0 < x_1 < ⋯ < x_n = b. We have the following smoothness property of cubic splines:

∫_a^b [S″(x)]² dx ≤ ∫_a^b [f″(x)]² dx.

In fact, it even holds that

∫_a^b [S″(x)]² dx = min_{g ∈ G} ∫_a^b [g″(x)]² dx,

where G := {g ∈ C²([a, b]) : g(x_i) = f(x_i), i = 0, 1, …, n}.

Exercise: Determine the parameters a to h so that S(x) is a natural cubic spline, where

S(x) = a x³ + b x² + c x + d for x ∈ [−1, 0]   and   S(x) = e x³ + f x² + g x + h for x ∈ [0, 1],

with interpolation conditions S(−1) = 1, S(0) = 2, and S(1) = 1.

11. Computation of Natural Cubic Splines.

Denote c_i = S″(x_i), i = 0, 1, …, n. Then c_0 = c_n = 0.

Since S_i is a cubic function on [x_i, x_{i+1}], we know that S″_i is a linear function on [x_i, x_{i+1}]. Hence, with h_i := x_{i+1} − x_i, it can be written as

S″_i(x) = c_i (x_{i+1} − x)/h_i + c_{i+1} (x − x_i)/h_i.

Exercise.

Show that S_i is given by

S_i(x) = c_i/(6 h_i) (x_{i+1} − x)³ + c_{i+1}/(6 h_i) (x − x_i)³ + p_i (x_{i+1} − x) + q_i (x − x_i),

where

p_i = f(x_i)/h_i − c_i h_i/6,   q_i = f(x_{i+1})/h_i − c_{i+1} h_i/6,

and c_1, …, c_{n−1} satisfy the linear equations

h_{i−1} c_{i−1} + 2 (h_{i−1} + h_i) c_i + h_i c_{i+1} = u_i,

where

u_i = 6 (d_i − d_{i−1}),   d_i = (f(x_{i+1}) − f(x_i))/h_i,

for i = 1, 2, …, n − 1.

C++ Exercise: Write a program to construct a natural cubic spline; see the sketch below.
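A possible sketch of the first step of that program: assembling and solving the tridiagonal system for c_1, …, c_{n−1} above (names are illustrative):

```cpp
#include <cstddef>
#include <vector>

// Second derivatives c_0, ..., c_n of the natural cubic spline at x_0 < ... < x_n,
// from h_{i-1} c_{i-1} + 2(h_{i-1}+h_i) c_i + h_i c_{i+1} = 6(d_i - d_{i-1}),
// i = 1, ..., n-1, with c_0 = c_n = 0 (tridiagonal elimination as in Section 3).
std::vector<double> spline_second_derivatives(const std::vector<double>& x,
                                              const std::vector<double>& f) {
    const std::size_t n = x.size() - 1;
    std::vector<double> h(n), d(n), c(n + 1, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        h[i] = x[i + 1] - x[i];
        d[i] = (f[i + 1] - f[i]) / h[i];
    }

    std::vector<double> diag(n + 1, 0.0), rhs(n + 1, 0.0);
    for (std::size_t i = 1; i < n; ++i) {
        diag[i] = 2.0 * (h[i - 1] + h[i]);
        rhs[i]  = 6.0 * (d[i] - d[i - 1]);
    }
    // Forward elimination (sub- and super-diagonal entry of row i is h[i-1] resp. h[i]).
    for (std::size_t i = 2; i < n; ++i) {
        double m = h[i - 1] / diag[i - 1];
        diag[i] -= m * h[i - 1];
        rhs[i]  -= m * rhs[i - 1];
    }
    // Back substitution (c[n] = 0 already).
    for (std::size_t i = n - 1; i >= 1; --i)
        c[i] = (rhs[i] - h[i] * c[i + 1]) / diag[i];
    return c;   // the spline pieces S_i then follow from the formulas above
}
```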

5. Basic Probability Theory

1. CDF and PDF.

Let (Ω, F, P) be a probability space and X a random variable. The cumulative distribution function (cdf) F of X is defined by

F(x) = P(X ≤ x),   x ∈ R.

F is an increasing, right-continuous function satisfying F(−∞) = 0, F(+∞) = 1.

If F is absolutely continuous then X has a probability density function (pdf) f defined by

f(x) = F′(x),   x ∈ R.

F can be recovered from f by the relation

F(x) = ∫_{−∞}^{x} f(u) du.

2. Normal Distribution.

A random variable X has a normal distribution with parameters μ and σ², written X ∼ N(μ, σ²), if X has the pdf

φ(x) = 1/(σ√(2π)) e^{−(x − μ)²/(2σ²)}   for x ∈ R.

If μ = 0 and σ² = 1 then X is called a standard normal random variable and its cdf is usually written as

Φ(x) = ∫_{−∞}^{x} 1/√(2π) e^{−u²/2} du.

If X ∼ N(μ, σ²) then the characteristic function (Fourier transform) of X is given by

c(s) = E(e^{i s X}) = e^{i μ s − σ² s²/2}.

3. Approximation of Normal CDF.

It is suggested that the standard normal cdf Φ(x) can be approximated by a "polynomial" Φ̃(x) as follows:

Φ̃(x) := 1 − Φ′(x) (a_1 k + a_2 k² + a_3 k³ + a_4 k⁴ + a_5 k⁵)   (3)

when x ≥ 0, and Φ̃(x) := 1 − Φ̃(−x) when x < 0.

The parameters are given by k = 1/(1 + γx), γ = 0.2316419, a_1 = 0.319381530, a_2 = −0.356563782, a_3 = 1.781477937, a_4 = −1.821255978, and a_5 = 1.330274429. This approximation has a maximum absolute error less than 7.5 × 10⁻⁸ for all x.

C++ Exercise: Write a program to compute Φ(x) with (3) and compare the result with exact values; a sketch follows below.
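A minimal sketch of approximation (3); the comparison value is obtained here via std::erfc, which is one way to get an "exact" Φ (names are illustrative):

```cpp
#include <cmath>
#include <iostream>

// Polynomial approximation (3) of the standard normal cdf (max. abs. error < 7.5e-8).
double norm_cdf(double x) {
    if (x < 0.0) return 1.0 - norm_cdf(-x);
    const double pi = 3.14159265358979323846;
    const double gamma = 0.2316419;
    const double a1 = 0.319381530, a2 = -0.356563782, a3 = 1.781477937,
                 a4 = -1.821255978, a5 = 1.330274429;
    const double k   = 1.0 / (1.0 + gamma * x);
    const double phi = std::exp(-0.5 * x * x) / std::sqrt(2.0 * pi);   // standard normal pdf
    return 1.0 - phi * (((((a5 * k + a4) * k + a3) * k + a2) * k + a1) * k);
}

int main() {
    for (double x : {0.0, 0.5, 1.0, 2.0}) {
        double exact = 0.5 * std::erfc(-x / std::sqrt(2.0));   // Phi(x) via the error function
        std::cout << x << "  " << norm_cdf(x) << "  " << exact << "\n";
    }
}
```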

4. Lognormal Random Variable.

Let Y = e^X, where X is a N(μ, σ²) random variable. Then Y is a lognormal random variable.

Exercise: Show that

E(Y) = e^{μ + σ²/2},   E(Y²) = e^{2μ + 2σ²}.

5. An Important Formula in Pricing European Options.

If V is lognormally distributed and the standard deviation of ln V is s, then

E(max(V − K, 0)) = E(V) Φ(d_1) − K Φ(d_2),

where

d_1 = (1/s) ln(E(V)/K) + s/2   and   d_2 = d_1 − s.

E(V − K)⁺ = E(V) Φ(d_1) − K Φ(d_2),   d_1 = (1/s) ln(E(V)/K) + s/2,   d_2 = d_1 − s.

Proof.

Let g be the pdf of V. Then

E(V − K)⁺ = ∫_{−∞}^{∞} (v − K)⁺ g(v) dv = ∫_K^{∞} (v − K) g(v) dv.

As V is lognormal, ln V is normal N(m, s²), where m = ln(E(V)) − ½ s². Let Y := (ln V − m)/s, i.e. V = e^{m + s Y}. Then Y ∼ N(0, 1) with pdf φ(y) = 1/√(2π) e^{−y²/2}.

E(V − K)⁺ = E(e^{m + s Y} − K)⁺ = ∫_{(ln K − m)/s}^{∞} (e^{m + s y} − K) φ(y) dy
  = ∫_{(ln K − m)/s}^{∞} e^{m + s y} φ(y) dy − K ∫_{(ln K − m)/s}^{∞} φ(y) dy =: I_1 − K I_2.

I_1 = ∫_{(ln K − m)/s}^{∞} 1/√(2π) e^{−y²/2 + m + s y} dy
  = ∫_{(ln K − m)/s}^{∞} 1/√(2π) e^{−(y − s)²/2 + m + s²/2} dy
  = e^{m + s²/2} ∫_{(ln K − m)/s − s}^{∞} 1/√(2π) e^{−y²/2} dy   [y − s ↦ y]
  = e^{m + s²/2} [1 − Φ((ln K − m)/s − s)]
  = e^{m + s²/2} Φ(−(ln K − m)/s + s)
  = e^{ln E(V)} Φ((−ln K + ln E(V) − s²/2)/s + s)
  = E(V) Φ((1/s) ln(E(V)/K) + s/2) = E(V) Φ(d_1),

Similarly,

I_2 = 1 − Φ((ln K − m)/s) = Φ(−(ln K − m)/s) = Φ(d_1 − s) = Φ(d_2),

which completes the proof.

6. Correlated Random Variables.

Assume X = (X_1, …, X_n) is an n-vector of random variables.

The mean of X is the n-vector μ = (E(X_1), …, E(X_n)). The covariance of X is the n × n matrix Σ with components

Σ_ij = (Cov X)_ij = E((X_i − μ_i)(X_j − μ_j)).

The variance of X_i is given by σ_i² = Σ_ii and the correlation between X_i and X_j is given by ρ_ij = Σ_ij/(σ_i σ_j).

X is called a multi-dimensional normal vector, written X ∼ N(μ, Σ), if X has the pdf

f(x) = 1/((2π)^{n/2} (det Σ)^{1/2}) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ)).

7. Convergence.

Let {X_n} be a sequence of random variables. There are four types of convergence concepts associated with {X_n}:

• Almost sure convergence, written X_n →a.s. X, if there exists a null set N such that for all ω ∈ Ω \ N one has X_n(ω) → X(ω) as n → ∞.

• Convergence in probability, written X_n →P X, if for every ε > 0 one has P(|X_n − X| > ε) → 0 as n → ∞.

• Convergence in L^p norm, written X_n →L^p X, if X_n, X ∈ L^p and E|X_n − X|^p → 0 as n → ∞.

• Convergence in distribution, written X_n →D X, if P(X_n ≤ x) → P(X ≤ x) as n → ∞ at every point x at which the cdf of X is continuous.

8. Strong Law of Large Numbers.

Let {X_n} be independent, identically distributed (iid) random variables with finite expectation E(X_1) = μ. Then

Z_n/n →a.s. μ,   where Z_n = X_1 + ⋯ + X_n.

9. Central Limit Theorem. Let {X_n} be iid random variables with finite expectation μ and finite variance σ² > 0. For each n, let Z_n = X_1 + ⋯ + X_n. Then

(Z_n/n − μ)/(σ/√n) = (Z_n − n μ)/(√n σ) →D Z,

where Z is a N(0, 1) random variable, i.e.,

P((Z_n − n μ)/(√n σ) ≤ z) → 1/√(2π) ∫_{−∞}^{z} e^{−u²/2} du   as n → ∞.

10. Lindeberg–Feller Central Limit Theorem.

Suppose X is a triangular array of random variables, i.e.,

X = {X_1n, X_2n, …, X_{k(n)n} : n ∈ {1, 2, …}},   with k(n) → ∞ as n → ∞,

such that, for each n, X_1n, …, X_{k(n)n} are independently distributed and are bounded in absolute value by a constant y_n with y_n → 0. Let

Z_n = X_1n + ⋯ + X_{k(n)n}.

If E(Z_n) → μ and var(Z_n) → σ² > 0, then Z_n converges in distribution to a normally distributed random variable with mean μ and variance σ².

If X_1, X_2, … are iid with expectation μ and variance σ², then define

X_in := (X_i − μ)/(√n σ),   i = 1, 2, …, k(n) := n.

For each n, X_1n, …, X_{k(n)n} are independent and

E(X_in) = (E(X_i) − μ)/(√n σ) = (μ − μ)/(√n σ) = 0,
Var(X_in) = 1/(n σ²) Var(X_i − μ) = 1/(n σ²) σ² = 1/n.

Let Z_n = X_1n + ⋯ + X_{k(n)n} = (Σ_{i=1}^{n} X_i − n μ) · 1/(√n σ). Then

E(Z_n) = Σ_{i=1}^{k(n)} E(X_in) = 0,   Var(Z_n) = Σ_{i=1}^{k(n)} Var(X_in) = 1.

Hence, by Lindeberg–Feller, Z_n →D Z with Z ∼ N(0, 1).

6. Optimization

1. Unconstrained Optimization.

Given f : Rⁿ → R, minimize f(x) over x ∈ Rⁿ.

f has a local minimum at a point x̄ if f(x̄) ≤ f(x) for all x near x̄, i.e.

∃ ε > 0 s.t. f(x̄) ≤ f(x) ∀ x : ‖x − x̄‖ < ε.

f has a global minimum at x̄ if f(x̄) ≤ f(x) for all x ∈ Rⁿ.

2. Optimality Conditions.

• First order necessary condition:

Suppose that f has a local minimum at x̄ and that f is continuously differentiable in an open neighbourhood of x̄. Then ∇f(x̄) = 0. (x̄ is called a stationary point.)

• Second order sufficient condition:

Suppose that f is twice continuously differentiable in an open neighbourhood of x̄, that ∇f(x̄) = 0 and that ∇²f(x̄) is positive definite. Then x̄ is a strict local minimizer of f.

Example: Show that f = (2x_1² − x_2)(x_1² − 2x_2) has a minimum at (0, 0) along any straight line passing through the origin, but f has no minimum at (0, 0).

Exercise: Find the minimum solution of

f(x_1, x_2) = 2x_1² + x_1 x_2 + x_2² − x_1 − 3x_2.   (4)

Sufficient Condition.

Taylor gives, for any d ∈ Rⁿ:

f(x̄ + d) = f(x̄) + ∇f(x̄)ᵀ d + ½ dᵀ ∇²f(x̄ + λ d) d,   λ ∈ (0, 1).

If x̄ is not a strict local minimizer, then

∃ {x_k} ⊂ Rⁿ \ {x̄} : x_k → x̄ s.t. f(x_k) ≤ f(x̄).

Define d_k := (x_k − x̄)/‖x_k − x̄‖. Then ‖d_k‖ = 1 and there exists a subsequence {d_kj} such that d_kj → d* as j → ∞ with ‖d*‖ = 1. W.l.o.g. we assume d_k → d* as k → ∞.

f(x̄) ≥ f(x_k) = f(x̄ + ‖x_k − x̄‖ d_k)
  = f(x̄) + ‖x_k − x̄‖ ∇f(x̄)ᵀ d_k + ½ ‖x_k − x̄‖² d_kᵀ ∇²f(x̄ + λ_k ‖x_k − x̄‖ d_k) d_k
  = f(x̄) + ½ ‖x_k − x̄‖² d_kᵀ ∇²f(x̄ + λ_k ‖x_k − x̄‖ d_k) d_k.

Hence d_kᵀ ∇²f(x̄ + λ_k ‖x_k − x̄‖ d_k) d_k ≤ 0, and on letting k → ∞, d*ᵀ ∇²f(x̄) d* ≤ 0, contradicting the positive definiteness of ∇²f(x̄).

Example 6.2. Show that f = (2x_1² − x_2)(x_1² − 2x_2) has a minimum at (0, 0) along any straight line passing through the origin, but f has no minimum at (0, 0).

Answer.

A straight line through (0, 0): x_2 = α x_1, α ∈ R fixed.

g(r) := f(r, α r) = (2r² − α r)(r² − 2 α r)
g′(r) = 8r³ − 15 α r² + 4 α² r,   g″(r) = 24r² − 30 α r + 4 α²
⇒ g′(0) = 0 and g″(0) = 4α² > 0 for α ≠ 0 (for α = 0, g(r) = 2r⁴ also has a minimum at r = 0; the same holds on the line x_1 = 0, where f(0, x_2) = 2x_2²).

Hence r = 0 is a minimizer for g, i.e. (0, 0) is a minimizer for f along any straight line.

Now let (x_1^k, x_2^k) = (1/k, 1/k²) → (0, 0) as k → ∞. Then

f(x_1^k, x_2^k) = (1/k²)(−1/k²) = −1/k⁴ < 0 = f(0, 0)   ∀ k.

Hence (0, 0) is not a minimizer for f.

[Note: ∇f(0, 0) = 0, but ∇²f(0, 0) = (0 0; 0 4) is only positive semi-definite.]

3. Convex Optimization.


Exercise 6.3.

When f is convex, any local minimizer x̄ is a global minimizer of f.

Proof.

Suppose x̄ is a local minimizer, but not a global minimizer. Then ∃ x̃ s.t. f(x̃) < f(x̄). Since f is convex, we have that

f(λ x̃ + (1 − λ) x̄) ≤ λ f(x̃) + (1 − λ) f(x̄) < λ f(x̄) + (1 − λ) f(x̄) = f(x̄)   ∀ λ ∈ (0, 1].

Let x_λ := λ x̃ + (1 − λ) x̄. Then x_λ → x̄ and f(x_λ) < f(x̄) as λ → 0. This is a contradiction to x̄ being a local minimizer.

4. Line Search.

The basic procedure to solve numerically an unconstrained problem (minimize f(x) over x ∈ Rⁿ) is as follows.

(i) Choose an initial point x⁰ ∈ Rⁿ and an initial search direction d⁰ ∈ Rⁿ and set k = 0.

(ii) Choose a step size α_k and define a new point x^{k+1} = x^k + α_k d^k. Check if the stopping criterion is satisfied (‖∇f(x^{k+1})‖ < ε?). If yes, x^{k+1} is the optimal solution; stop. If no, go to (iii).

(iii) Choose a new search direction d^{k+1} (a descent direction) and set k = k + 1. Go to (ii).

5. Steepest Descent Method.

Assume f is differentiable. Choose d^k = −g^k, where g^k = ∇f(x^k), and choose α_k s.t.

f(x^k + α_k d^k) = min_{α ∈ R} f(x^k + α d^k).

Note that successive descent directions are orthogonal to each other, i.e. (g^k)ᵀ g^{k+1} = 0, and the convergence for some functions may be very slow, which is called zigzagging.

Exercise.

Steepest Descent.

Taylor gives:

f(x^k + α d^k) = f(x^k) + α ∇f(x^k)ᵀ d^k + O(α²).

As

∇f(x^k)ᵀ d^k = ‖∇f(x^k)‖ ‖d^k‖ cos θ_k,

with θ_k the angle between d^k and ∇f(x^k), we see that d^k is a descent direction if cos θ_k < 0. The descent is steepest when θ_k = π, i.e. cos θ_k = −1, which gives d^k = −∇f(x^k).

Zigzagging.

α_k is a minimizer of φ(α) := f(x^k + α d^k) with d^k = −g^k. Hence 0 = φ′(α_k) = ∇f(x^{k+1})ᵀ d^k = −(g^{k+1})ᵀ g^k, i.e. successive gradients are orthogonal.

Exercise 6.5.

Use the SD method to solve (4) with the initial point x⁰ = (1, 1). [Minimizer: (1/7)(−1, 11).]

Answer.

∇f = (4x_1 + x_2 − 1, x_1 + 2x_2 − 3).

Iteration 0: d⁰ = −∇f(x⁰) = (−4, 0) ≠ (0, 0).
φ(α) = f(x⁰ + α d⁰) = f(1 − 4α, 1) = 2(1 − 4α)² − 2, with minimum at α_0 = 1/4.
⇒ x¹ = x⁰ + α_0 d⁰ = (0, 1),  d¹ = −∇f(x¹) = −(0, −1) = (0, 1) ≠ (0, 0).

Iteration 1: x² = (0, 3/2), d² = (−1/2, 0), …
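A minimal sketch of the steepest descent iteration for the quadratic (4), using the exact line-search step α = gᵀg/(gᵀA g) that is available for quadratics f(x) = ½xᵀA x + bᵀx (names are illustrative):

```cpp
#include <array>
#include <cmath>
#include <iostream>

using Vec2 = std::array<double, 2>;

// f(x1,x2) = 2x1^2 + x1 x2 + x2^2 - x1 - 3x2, i.e. f(x) = 1/2 x^T A x + b^T x
// with A = [[4,1],[1,2]] and b = (-1,-3); gradient g = A x + b.
Vec2 grad(const Vec2& x) {
    return {4.0 * x[0] + x[1] - 1.0, x[0] + 2.0 * x[1] - 3.0};
}

int main() {
    Vec2 x = {1.0, 1.0};                                   // initial point x^0
    for (int k = 0; k < 100; ++k) {
        Vec2 g = grad(x);
        double gg = g[0] * g[0] + g[1] * g[1];
        if (std::sqrt(gg) < 1e-10) break;                  // stop when the gradient vanishes
        Vec2 Ag = {4.0 * g[0] + g[1], g[0] + 2.0 * g[1]};
        double alpha = gg / (g[0] * Ag[0] + g[1] * Ag[1]); // exact minimizer of f(x - alpha g)
        x = {x[0] - alpha * g[0], x[1] - alpha * g[1]};    // x^{k+1} = x^k - alpha g^k
    }
    std::cout << x[0] << " " << x[1] << "\n";              // approaches (-1/7, 11/7)
}
```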

6. Newton Method.

Assume f is twice differentiable. Choose d^k = −[H^k]⁻¹ g^k, where H^k = ∇²f(x^k). Set x^{k+1} = x^k + d^k.

If H^k is positive definite then d^k is a descent direction.

The main drawback of the Newton method is that it requires the computation of ∇²f(x^k) and its inverse, which can be difficult and time-consuming.

Exercise.

Newton Method.

Taylor gives

f(x^k + d) ≈ f(x^k) + dᵀ ∇f(x^k) + ½ dᵀ ∇²f(x^k) d =: m(d).

min_d m(d) ⇒ ∇m(d) = 0 ⇒ ∇f(x^k) + ∇²f(x^k) d = 0.

Hence choose d^k = −[∇²f(x^k)]⁻¹ ∇f(x^k) = −[H^k]⁻¹ g^k.

If H^k is positive definite, then so is (H^k)⁻¹, and we get

(d^k)ᵀ g^k = −(g^k)ᵀ (H^k)⁻¹ g^k ≤ −σ_k ‖g^k‖² < 0   for some σ_k > 0.

Hence d^k is a descent direction.

Exercise 6.6.

Use the Newton method to minimize

f(x_1, x_2) = 2x_1² + x_1 x_2 + x_2² − x_1 − 3x_2   with x⁰ = (1, 1)ᵀ.

Answer.

∇f = (4x_1 + x_2 − 1, x_1 + 2x_2 − 3)ᵀ,   H := ∇²f = (4 1; 1 2).

H⁻¹ = (1/det H) (2 −1; −1 4) = (1/7) (2 −1; −1 4).

Iteration 0: x⁰ = (1, 1)ᵀ, ∇f(x⁰) = (4, 0)ᵀ.

x¹ = x⁰ − [H⁰]⁻¹ ∇f(x⁰) = (1; 1) − (1/7) (2 −1; −1 4) (4; 0) = (1/7) (−1; 11).

⇒ ∇f(x¹) = (0, 0)ᵀ and H is positive definite, so x¹ is the minimizer.

7. Choice of Stepsize.

In computing the step size α_k we face a tradeoff. We would like to choose α_k to give a substantial reduction of f, but at the same time we do not want to spend too much time making the choice. The ideal choice would be the global minimizer of the univariate function φ : R → R defined by

φ(α) = f(x^k + α d^k),   α > 0,

but in general it is too expensive to identify this value.

A common strategy is to perform an inexact line search to identify a step size that achieves adequate reductions in f at minimal cost.

α_k is normally chosen to satisfy the Wolfe conditions:

f(x^k + α_k d^k) ≤ f(x^k) + c_1 α_k (g^k)ᵀ d^k,   (5)
∇f(x^k + α_k d^k)ᵀ d^k ≥ c_2 (g^k)ᵀ d^k,   (6)

with 0 < c_1 < c_2 < 1.

Choice of Stepsize.

The simple condition

f(x^k + α_k d^k) < f(x^k)   (∗)

is not appropriate, as it may not lead to a sufficient reduction.

Example: f(x) = (x − 1)² − 1, so min f(x) = −1, but we can choose x^k satisfying (∗) such that f(x^k) = 1/k → 0.

Note that the sufficient decrease condition (5),

φ(α) = f(x^k + α d^k) ≤ ℓ(α) := f(x^k) + c_1 α (g^k)ᵀ d^k,

yields acceptable regions for α. Here φ(α) < ℓ(α) for small α > 0, as (g^k)ᵀ d^k < 0 for descent directions.

The curvature condition (6) is equivalent to

φ′(α) ≥ c_2 φ′(0)   [> φ′(0)].

8. Convergence of Line Search Methods.

An algorithm is said to be globally convergent if lim_{k→∞} ‖g^k‖ = 0.

It can be shown that if the step sizes satisfy the Wolfe conditions, then

• the steepest descent method is globally convergent,

• so is the Newton method, provided the Hessian matrices ∇²f(x^k) have a bounded condition number and are positive definite.

Exercise. Show that the steepest descent method is globally convergent if the following conditions hold:

(a) α_k satisfies the Wolfe conditions,
(b) f(x) ≥ M ∀ x ∈ Rⁿ (f is bounded below),
