Newton’s Method - Univariate Optimization

5.3 Univariate Optimization

5.3.2 Newton’s Method

The main workhorse of many optimization algorithms is a centuries-old technique for the solution of nonlinear equations developed by Sir Isaac Newton. We will discuss the multivariate version of Newton’s method later; we focus on the univariate case first. For a given nonlinear function $f$ we want to find an $x$ such that

$$f(x) = 0.$$

Assume that $f$ is continuously differentiable and that we currently have an estimate $x^k$ of the solution (we will use superscripts for iteration indices in the following discussion). The first-order (i.e., linear) Taylor series approximation to the function $f$ around $x^k$ can be written as follows:

$$f(x^k + \delta) \approx \hat{f}(\delta) := f(x^k) + \delta f'(x^k).$$

This is equivalent to saying that we can approximate the function $f$ by the line $\hat{f}(\delta)$ that is tangent to it at $x^k$. If the first-order approximation $\hat{f}(\delta)$ were perfectly good, and if $f'(x^k) \neq 0$, the value of $\delta$ that satisfies

$$\hat{f}(\delta) = f(x^k) + \delta f'(x^k) = 0$$

would give us the update on the current iterate $x^k$ necessary to get to the solution. This value of $\delta$ is computed easily:

$$\delta = -\frac{f(x^k)}{f'(x^k)}.$$

The expression above is called the Newton update, and Newton’s method determines its next estimate of the solution as

$$x^{k+1} = x^k + \delta = x^k - \frac{f(x^k)}{f'(x^k)}.$$

Since $\hat{f}(\delta)$ is only an approximation to $f(x^k + \delta)$, we do not have a guarantee that $f(x^{k+1})$ is zero, or even small. However, as we discuss below, when $x^k$ is close enough to a solution of the equation $f(x) = 0$, $x^{k+1}$ is even closer.

We can then repeat this procedure until we find an $x^k$ such that $f(x^k) = 0$, or in most cases, until $|f(x^k)|$ becomes reasonably small, say, less than some pre-specified $\varepsilon > 0$.
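The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a production routine: the function name `newton_root`, the default tolerance, and the iteration cap are assumptions introduced here, and the test function $x^2 - 2$ is not taken from the text.

```python
def newton_root(f, fprime, x0, tol=1e-10, max_iter=50):
    """Approximate a root of f by Newton's method, starting from x0.

    Returns the final iterate and the number of updates performed.
    tol and max_iter are illustrative defaults, not values from the text.
    """
    x = x0
    for k in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:           # stop once |f(x^k)| is small enough
            return x, k
        slope = fprime(x)
        if slope == 0:              # the Newton update needs f'(x^k) != 0
            raise ZeroDivisionError("f'(x) vanished at x = %r" % x)
        x = x - fx / slope          # x^{k+1} = x^k - f(x^k) / f'(x^k)
    raise RuntimeError("no convergence within max_iter iterations")


# Hypothetical test problem: f(x) = x^2 - 2, whose positive root is sqrt(2).
root, iterations = newton_root(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root, iterations)
```

The explicit iteration cap matters because, as discussed later in this section, the iteration is not guaranteed to converge from an arbitrary starting point.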

There is an intuitive geometric explanation of the procedure we just described: We first find the line that is tangent to the function at the current iterate, then we calculate the point where this line intersects the x-axis, and we set the next iterate to this value and repeat the process. See Figure 5.1 for an illustration.

[Figure 5.1: plot of $f(r)$ against $r$ together with its tangent line at $x^0 = 0$, where $f(0) = 500$ and $f'(0) = -5000$; the tangent crosses the horizontal axis at $x^1 = 0.1$.]

Figure 5.1: First step of Newton’s method in Example 5.3

Example 5.3 Let us recall Example 5.1 where we computed the IRR of an investment. Here we solve the problem using Newton’s method. Recall that the yield $r$ must satisfy the equation

$$f(r) = \frac{100}{1+r} + \frac{100}{(1+r)^2} + \frac{100}{(1+r)^3} + \frac{1100}{(1+r)^4} - 900 = 0.$$

The derivative of $f(r)$ is easily computed:

$$f'(r) = -\frac{100}{(1+r)^2} - \frac{200}{(1+r)^3} - \frac{300}{(1+r)^4} - \frac{4400}{(1+r)^5}.$$

We need to start Newton’s method with an initial guess; let us choose $x^0 = 0$. Then

$$x^1 = x^0 - \frac{f(0)}{f'(0)} = 0 - \frac{500}{-5000} = 0.1.$$

We mentioned above that the next iterate of Newton’s method is found by calculating the point where the line tangent to $f$ at the current iterate intersects the horizontal axis. This observation is illustrated in Figure 5.1.

Since $f(x^1) = f(0.1) = 100$ is far from zero, we continue by substituting $x^1$ into the Newton update formula to obtain $x^2 = 0.131547080371$, and so on. The complete iteration sequence is given in Table 5.3.
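The iterates of this example can be reproduced with a short script. The sketch below simply codes $f$ and $f'$ as given in Example 5.3 and applies the Newton update five times from $x^0 = 0$; the variable names are chosen here for illustration.

```python
def f(r):
    # Discounted cash flows of Example 5.3 minus the price of 900
    return (100 / (1 + r) + 100 / (1 + r) ** 2
            + 100 / (1 + r) ** 3 + 1100 / (1 + r) ** 4 - 900)


def fprime(r):
    # Derivative of f with respect to r
    return (-100 / (1 + r) ** 2 - 200 / (1 + r) ** 3
            - 300 / (1 + r) ** 4 - 4400 / (1 + r) ** 5)


x = 0.0                      # initial guess x^0 = 0
iterates = [x]
for k in range(5):
    x = x - f(x) / fprime(x)  # Newton update
    iterates.append(x)

# Print k, x^k and f(x^k) with twelve decimals
for k, xk in enumerate(iterates):
    print(f"{k}  {xk:.12f}  {f(xk):.12f}")
```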

Table 5.3: Newton’s method for Example 5.3

| $k$ | $x^k$ | $f(x^k)$ |
|---|---|---|
| 0 | 0.000000000000 | 500.000000000000 |
| 1 | 0.100000000000 | 100.000000000000 |
| 2 | 0.131547080371 | 6.464948211497 |
| 3 | 0.133880156946 | 0.031529863053 |
| 4 | 0.133891647326 | 0.000000758643 |
| 5 | 0.133891647602 | 0.000000000000 |

A few comments on the speed and reliability of Newton’s method are in order. Under favorable conditions, Newton’s method converges very fast to a solution of a nonlinear equation. Indeed, if $x^k$ is sufficiently close to a solution $x^*$ and if $f'(x^*) \neq 0$, then the following relation holds:

$$x^{k+1} - x^* \approx C\,(x^k - x^*)^2 \quad \text{with} \quad C = \frac{f''(x^*)}{2 f'(x^*)}. \qquad (5.2)$$

Relation (5.2) indicates that the error in our approximation, $x^k - x^*$, is approximately squared in each iteration. This behavior is called the quadratic convergence of Newton’s method. Note that the number of correct digits roughly doubles in each iteration of the example above and that the method required far fewer iterations than the simple bisection approach.
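Relation (5.2) can be checked numerically on Example 5.3. The sketch below is an illustration under stated assumptions: the second derivative `fsecond` is derived here by differentiating $f'$ term by term (it is not given in the text), and the reference solution `x_star` is obtained by running the iteration well past convergence.

```python
def f(r):
    return (100 / (1 + r) + 100 / (1 + r) ** 2
            + 100 / (1 + r) ** 3 + 1100 / (1 + r) ** 4 - 900)


def fprime(r):
    return (-100 / (1 + r) ** 2 - 200 / (1 + r) ** 3
            - 300 / (1 + r) ** 4 - 4400 / (1 + r) ** 5)


def fsecond(r):
    # Assumption: obtained by differentiating fprime term by term
    return (200 / (1 + r) ** 3 + 600 / (1 + r) ** 4
            + 1200 / (1 + r) ** 5 + 22000 / (1 + r) ** 6)


def newton_step(x):
    return x - f(x) / fprime(x)


# Reference solution x*: iterate far beyond the point of convergence
x_star = 0.0
for _ in range(20):
    x_star = newton_step(x_star)

# Constant predicted by (5.2)
C = fsecond(x_star) / (2 * fprime(x_star))

# Observed ratios (x^{k+1} - x*) / (x^k - x*)^2 for the first few steps
x = 0.0
ratios = []
for k in range(4):
    x_next = newton_step(x)
    ratios.append((x_next - x_star) / (x - x_star) ** 2)
    x = x_next

print("C =", C)
print("ratios =", ratios)
```

Only the first four ratios are computed because later errors fall below double-precision resolution, where the quotient is no longer meaningful.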

However, when the ‘favorable conditions’ we mentioned above are not satisfied, Newton’s method may fail to converge to a solution. For example, consider $f(x) = x^3 - 2x + 2$. Starting at 0, one would obtain iterates cycling between 1 and 0. Starting at a point close to 1 or 0, one similarly gets iterates alternating in close neighborhoods of 1 and 0, without ever reaching the root near $-1.77$. Therefore, Newton’s method often has to be modified before being applied to general problems. Common modifications include the line-search and trust-region approaches. We briefly discuss line-search approaches in Section 5.3.3. More information on these methods can be found in standard numerical optimization texts such as [55].
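The cycling behavior is easy to observe directly. The following sketch applies the plain Newton update to $f(x) = x^3 - 2x + 2$ from $x^0 = 0$, and then, for contrast, from $x^0 = -2$; the second starting point is an assumption chosen here for illustration and does not appear in the text.

```python
def f(x):
    return x ** 3 - 2 * x + 2


def fprime(x):
    return 3 * x ** 2 - 2


# Starting at 0, the iterates alternate between 0 and 1 indefinitely.
x = 0.0
iterates = [x]
for k in range(6):
    x = x - f(x) / fprime(x)
    iterates.append(x)
print(iterates)

# Starting at -2 (an illustrative choice), the same update converges
# to the real root of f.
x = -2.0
for k in range(20):
    x = x - f(x) / fprime(x)
root = x
print(root)
```

The update rule is identical in both runs; only the starting point differs, which is why the modifications mentioned above aim to make the method less sensitive to that choice.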

Next, we derive a variant of Newton’s method that can be applied to univariate optimization problems. If the function to be minimized/maximized has a unique minimizer/maximizer and is twice differentiable, we can do the following. Differentiability and the uniqueness of the optimizer indicate that $x^*$ maximizes (or minimizes) $g(x)$ if and only if $g'(x^*) = 0$. Defining $f(x) = g'(x)$ and applying Newton’s method to this function we obtain iterates of the following form:

$$x^{k+1} = x^k - \frac{f(x^k)}{f'(x^k)} = x^k - \frac{g'(x^k)}{g''(x^k)}.$$

Example 5.4 Let us apply the optimization version of Newton’s method to Example 5.2. Recalling that $f(x) = x^5 - 10x^2 + 2x$, we have $f'(x) = 5x^4 - 20x + 2$ and $f''(x) = 20(x^3 - 1)$. Thus, the Newton update formula is given as

$$x^{k+1} = x^k - \frac{5(x^k)^4 - 20x^k + 2}{20\left((x^k)^3 - 1\right)}.$$

Table 5.4: Iterates of Newton’s method in Example 5.4

| $k$ | $x^k$ | $f(x^k)$ | $f'(x^k)$ |
|---|---|---|---|
| 0 | 0.000000000000 | 0.000000000000 | 2.000000000000 |
| 1 | 0.100000000000 | 0.100010000000 | 0.000500000000 |
| 2 | 0.100025025025 | 0.100010006256 | 0.000000000188 |
| 3 | 0.100025025034 | 0.100010006256 | 0.000000000000 |

Starting from 0 and iterating, we obtain the sequence given in Table 5.4. Once again, observe that Newton’s method converged very rapidly to the solution and generated several more digits of accuracy than the golden section search. Note, however, that the method would have failed if we had chosen $x^0 = 1$ as our starting point, because $f''(1) = 0$ and the Newton update is then undefined.
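The optimization variant differs from the root-finding version only in which derivatives enter the update. A minimal sketch for Example 5.4 follows; the names `fsecond`, `rows`, and `failed` are illustrative choices made here, and the final block shows why the starting point $x^0 = 1$ breaks down.

```python
def f(x):
    return x ** 5 - 10 * x ** 2 + 2 * x


def fprime(x):
    return 5 * x ** 4 - 20 * x + 2


def fsecond(x):
    return 20 * (x ** 3 - 1)


# Newton iterates for f'(x) = 0, starting from x^0 = 0
x = 0.0
rows = []
for k in range(4):
    rows.append((k, x, f(x), fprime(x)))
    x = x - fprime(x) / fsecond(x)   # x^{k+1} = x^k - f'(x^k) / f''(x^k)

for row in rows:
    print("%d  %.12f  %.12f  %.12f" % row)

# From x^0 = 1 the very first update divides by f''(1) = 0.
try:
    1.0 - fprime(1.0) / fsecond(1.0)
    failed = False
except ZeroDivisionError:
    failed = True
print("update from x0 = 1 failed:", failed)
```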

Exercise 5.5 Repeat Exercises 5.2, 5.3, and 5.4 using Newton’s method.

Exercise 5.6 We derived Newton’s method by approximating a given function $f$ using the first two terms of its Taylor series at the current point $x^k$. When we use Taylor series approximation to a function, there is no a priori reason that tells us to stop at two terms. We can consider, for example, using the first three terms of the Taylor series expansion of the function and get a quadratic approximation. Derive a variant of Newton’s method that uses this approximation to determine the roots of the function $f$. Can you determine the rate of convergence for this new method, assuming that the method converges?

In document Optimization Methods in Finance (Page 92-95)