Newton’s method - Univariate optimization

Nonlinear programming: theory and algorithms

5.3 Univariate optimization

5.3.2 Newton’s method

The main workhorse of many optimization algorithms is a centuries-old technique for the solution of nonlinear equations developed by Sir Isaac Newton. We will discuss the multivariate version of Newton’s method later. We focus on the univariate case first. For a given nonlinear function $f$ we want to find an $x$ such that

$$f(x) = 0.$$

Assume that $f$ is continuously differentiable and that we currently have an estimate $x^k$ of the solution (we will use superscripts for iteration indices in the following discussion). The first-order (i.e., linear) Taylor series approximation to the function $f$ around $x^k$ can be written as follows:

$$f(x^k + \delta) \approx \hat{f}(\delta) := f(x^k) + \delta f'(x^k).$$

This is equivalent to saying that we can approximate the function $f$ by the line $\hat{f}(\delta)$ that is tangent to it at $x^k$. If the first-order approximation $\hat{f}(\delta)$ were perfectly good, and if $f'(x^k) \neq 0$, the value of $\delta$ that satisfies

$$\hat{f}(\delta) = f(x^k) + \delta f'(x^k) = 0$$

would give us the update on the current iterate $x^k$ necessary to get to the solution. This value of $\delta$ is computed easily:

$$\delta = -\frac{f(x^k)}{f'(x^k)}.$$

The expression above is called the Newton update, and Newton’s method determines its next estimate of the solution as

$$x^{k+1} = x^k + \delta = x^k - \frac{f(x^k)}{f'(x^k)}.$$

Since $\hat{f}(\delta)$ is only an approximation to $f(x^k + \delta)$, we do not have a guarantee that $f(x^{k+1})$ is zero, or even small. However, as we discuss below, when $x^k$ is close enough to a solution of the equation $f(x) = 0$, $x^{k+1}$ is even closer. We can then repeat this procedure until we find an $x^k$ such that $f(x^k) = 0$, or in most cases, until $|f(x^k)|$ becomes reasonably small, say, less than some pre-specified $\varepsilon > 0$.

There is an intuitive geometric explanation of the procedure we just described: we first find the line that is tangent to the function at the current iterate, then we calculate the point where this line intersects the $x$-axis, set the next iterate to this value, and repeat the process. See Figure 5.1 for an illustration.
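The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the text: the function name `newton_root`, the default tolerance `eps`, and the iteration cap `max_iter` are assumptions introduced here, and the stopping rule is the test that $|f(x^k)|$ falls below the pre-specified tolerance.

```python
from typing import Callable, List, Tuple


def newton_root(
    f: Callable[[float], float],
    fprime: Callable[[float], float],
    x0: float,
    eps: float = 1e-10,      # assumed tolerance on |f(x^k)|
    max_iter: int = 50,      # assumed safeguard against non-convergence
) -> Tuple[float, List[float]]:
    """Newton's method for f(x) = 0.

    Returns the last iterate and the list of all iterates. If max_iter is
    reached, the last iterate is returned even though it may not be a root.
    """
    x = x0
    history = [x]
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < eps:            # stop once |f(x^k)| < eps
            break
        slope = fprime(x)
        if slope == 0.0:             # horizontal tangent: update undefined
            raise ZeroDivisionError(f"f'(x) = 0 at x = {x}")
        x = x - fx / slope           # x^{k+1} = x^k - f(x^k) / f'(x^k)
        history.append(x)
    return x, history


# Usage: the positive root of x^2 - 2 = 0, starting from x^0 = 1.
root, iterates = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
print(root, len(iterates) - 1)
```

The explicit check for a zero slope mirrors the requirement that the derivative at the current iterate be nonzero; a production routine would typically add further safeguards.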

Example 5.3 Let us recall Example 5.1 where we computed the IRR of an investment. Here we solve the problem using Newton’s method. Recall that the yield $r$ must satisfy the equation

$$f(r) = \frac{100}{1+r} + \frac{100}{(1+r)^2} + \frac{100}{(1+r)^3} + \frac{1100}{(1+r)^4} - 900 = 0.$$

Figure 5.1 First step of Newton’s method in Example 5.3 (plot of $f(r)$ for $r$ between $-0.05$ and $0.2$ with its tangent at $x^0 = 0$, where $f(0) = 500$ and $f'(0) = -5000$; the tangent crosses the horizontal axis at $x^1 = 0.1$)

The derivative of $f(r)$ is easily computed:

$$f'(r) = -\frac{100}{(1+r)^2} - \frac{200}{(1+r)^3} - \frac{300}{(1+r)^4} - \frac{4400}{(1+r)^5}.$$

We need to start Newton’s method with an initial guess; let us choose $x^0 = 0$. Then

$$x^1 = x^0 - \frac{f(0)}{f'(0)} = 0 - \frac{500}{-5000} = 0.1.$$

We mentioned above that the next iterate of Newton’s method is found by calculating the point where the line tangent to f at the current iterate intersects the axis. This observation is illustrated in Figure 5.1.

Since $f(x^1) = f(0.1) = 100$ is far from zero, we continue by substituting $x^1$ into the Newton update formula to obtain $x^2 = 0.131547080371$, and so on. The complete iteration sequence is given in Table 5.3.
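The iteration for Example 5.3 is easy to replay numerically. The sketch below hard-codes $f(r)$ and $f'(r)$ from the example and performs the Newton updates starting at $x^0 = 0$; the variable names (`rows`, `x`) are illustrative choices, not part of the text. If the formulas are transcribed correctly, the printed rows should agree with Table 5.3 to the digits shown there.

```python
def f(r: float) -> float:
    """Present value of the cash flows minus the price (Example 5.3)."""
    return (100 / (1 + r) + 100 / (1 + r) ** 2 + 100 / (1 + r) ** 3
            + 1100 / (1 + r) ** 4 - 900)


def fprime(r: float) -> float:
    """Derivative of f with respect to r."""
    return (-100 / (1 + r) ** 2 - 200 / (1 + r) ** 3 - 300 / (1 + r) ** 4
            - 4400 / (1 + r) ** 5)


x = 0.0                      # initial guess x^0 = 0
rows = []
for k in range(6):           # iterates k = 0, ..., 5
    rows.append((k, x, f(x)))
    x = x - f(x) / fprime(x)  # Newton update

for k, xk, fk in rows:
    print(f"{k}  {xk:.12f}  {fk:.12f}")
```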

A few comments on the speed and reliability of Newton’s method are in order. Under favorable conditions, Newton’s method converges very fast to a solution of a nonlinear equation. Indeed, if $x^k$ is sufficiently close to a solution $x^*$ and if $f'(x^*) \neq 0$, then the following relation holds:

$$x^{k+1} - x^* \approx C (x^k - x^*)^2 \quad \text{with} \quad C = \frac{f''(x^*)}{2 f'(x^*)}. \tag{5.2}$$

Table 5.3 Newton’s method for Example 5.3

k   x^k              f(x^k)
0   0.000000000000   500.000000000000
1   0.100000000000   100.000000000000
2   0.131547080371     6.464948211497
3   0.133880156946     0.031529863053
4   0.133891647326     0.000000758643
5   0.133891647602     0.000000000000

Equation (5.2) indicates that the error in our approximation, $x^k - x^*$, is approximately squared in each iteration. This behavior is called the quadratic convergence of Newton’s method. Note that the number of correct digits roughly doubles in each iteration of the example above, and the method required far fewer iterations than the binary search approach.
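For readers who want to see where relation (5.2) comes from, here is a short sketch of the standard argument. It assumes that $f$ is twice continuously differentiable near $x^*$, which is slightly more than the continuous differentiability assumed earlier, and $\xi_k$ denotes some point between $x^k$ and $x^*$. Expanding $f$ around $x^k$ and evaluating at $x^*$ gives

$$0 = f(x^*) = f(x^k) + f'(x^k)(x^* - x^k) + \tfrac{1}{2} f''(\xi_k)(x^* - x^k)^2.$$

Dividing by $f'(x^k)$ and using $f(x^k)/f'(x^k) = x^k - x^{k+1}$ from the Newton update yields

$$x^{k+1} - x^* = \frac{f''(\xi_k)}{2 f'(x^k)} (x^k - x^*)^2.$$

As $x^k$ approaches $x^*$, the coefficient tends to $f''(x^*) / (2 f'(x^*))$, which is the constant $C$ in (5.2).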

However, when the “favorable conditions” we mentioned above are not satisfied, Newton’s method may fail to converge to a solution. For example, consider $f(x) = x^3 - 2x + 2$. Starting at 0, one would obtain iterates cycling between 1 and 0. Starting at a point close to 1 or 0, one similarly gets iterates alternating in close neighborhoods of 1 and 0, without ever reaching the root around $-1.76$. Therefore, Newton’s method often has to be modified before being applied to general problems. Common modifications of Newton’s method include the line-search and trust-region approaches. We briefly discuss line search approaches in Section 5.3.3. More information on these methods can be found in standard numerical optimization texts such as [61].
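The cycling behavior for $f(x) = x^3 - 2x + 2$ can be checked directly. The sketch below applies the plain Newton update from the starting point 0. For contrast it also starts from $-2$; that second starting point is an assumption chosen here for illustration and does not appear in the text. The helper name `newton_iterates` is likewise hypothetical.

```python
def f(x: float) -> float:
    return x ** 3 - 2 * x + 2


def fprime(x: float) -> float:
    return 3 * x ** 2 - 2


def newton_iterates(x0: float, n: int) -> list:
    """Return [x^0, x^1, ..., x^n] produced by the plain Newton update."""
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))
    return xs


cycle = newton_iterates(0.0, 6)
print(cycle)             # alternates between 0.0 and 1.0

from_left = newton_iterates(-2.0, 6)
print(from_left[-1])     # approaches the real root near -1.769
```

Starting at 0, the tangent at each point sends the iterate exactly to the other point, so no amount of iteration helps; a starting point on the far side of the root does converge.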

Next, we derive a variant of Newton’s method that can be applied to univariate optimization problems. If the function to be minimized/maximized has a unique minimizer/maximizer and is twice differentiable, we can do the following. Differentiability and the uniqueness of the optimizer indicate that $x^*$ maximizes (or minimizes) $g(x)$ if and only if $g'(x^*) = 0$. Defining $f(x) = g'(x)$ and applying Newton’s method to this function we obtain iterates of the following form:

$$x^{k+1} = x^k - \frac{f(x^k)}{f'(x^k)} = x^k - \frac{g'(x^k)}{g''(x^k)}.$$

Example 5.4 Let us apply the optimization version of Newton’s method to Example 5.2. Recalling that $f(x) = x^5 - 10x^2 + 2x$, we have $f'(x) = 5x^4 - 20x + 2$ and $f''(x) = 20(x^3 - 1)$. Thus, the Newton update formula is given as

$$x^{k+1} = x^k - \frac{5(x^k)^4 - 20x^k + 2}{20[(x^k)^3 - 1]}.$$

Starting from 0 and iterating we obtain the sequence given in Table 5.4.

Table 5.4 Iterates of Newton’s method in Example 5.4

k   x^k              f(x^k)           f'(x^k)
0   0.000000000000   0.000000000000   2.000000000000
1   0.100000000000   0.100010000000   0.000500000000
2   0.100025025025   0.100010006256   0.000000000188
3   0.100025025034   0.100010006256   0.000000000000

Once again, observe that Newton’s method converged very rapidly to the solution and generated several more digits of accuracy than the golden section search. Note, however, that the method would have failed if we had chosen $x^0 = 1$ as our starting point, since $f''(1) = 0$ and the Newton update is then undefined.
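Example 5.4 can be replayed with a few lines of Python. In this sketch the names `df`, `d2f`, and `newton_stationary` are hypothetical helpers introduced for illustration; they encode $f'$, $f''$, and the update formula of the example. The second call shows what goes wrong at the starting point 1, where the second derivative vanishes.

```python
def f(x: float) -> float:
    """Objective from Example 5.4."""
    return x ** 5 - 10 * x ** 2 + 2 * x


def df(x: float) -> float:
    """First derivative f'(x)."""
    return 5 * x ** 4 - 20 * x + 2


def d2f(x: float) -> float:
    """Second derivative f''(x)."""
    return 20 * (x ** 3 - 1)


def newton_stationary(x0: float, n_iter: int = 3) -> list:
    """Apply Newton's method to f'(x) = 0.

    Returns rows (k, x^k, f(x^k), f'(x^k)) for k = 0, ..., n_iter.
    """
    rows = []
    x = x0
    for k in range(n_iter + 1):
        rows.append((k, x, f(x), df(x)))
        curvature = d2f(x)
        if curvature == 0.0:         # update divides by f''(x^k)
            raise ZeroDivisionError(f"f''(x) = 0 at x = {x}")
        x = x - df(x) / curvature    # x^{k+1} = x^k - f'(x^k) / f''(x^k)
    return rows


rows = newton_stationary(0.0)
for k, xk, fk, dfk in rows:
    print(f"{k}  {xk:.12f}  {fk:.12f}  {dfk:.12f}")

try:
    newton_stationary(1.0)
except ZeroDivisionError as err:
    print("start at 1 fails:", err)
```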

Exercise 5.5 Repeat Exercises 5.2, 5.3, and 5.4 using Newton’s method.

Exercise 5.6 We derived Newton’s method by approximating a given function $f$ using the first two terms of its Taylor series at the current point $x^k$. When we use Taylor series approximation to a function, there is no a priori reason that tells us to stop at two terms. We can consider, for example, using the first three terms of the Taylor series expansion of the function to get a quadratic approximation. Derive a variant of Newton’s method that uses this approximation to determine the roots of the function $f$. Can you determine the rate of convergence for this new method, assuming that the method converges?

In document Optimization Methods in Finance (Page 100-104)