FUNCTIONS OF A SINGLE VARIABLE

2.5 METHODS REQUIRING DERIVATIVES

The search methods discussed in the preceding sections required the assumptions of unimodality and, in some cases, continuity of the performance index being optimized. It seems reasonable that if, in addition to continuity, the function is assumed to be differentiable, then further efficiencies in the search could be achieved. Recall from Section 2.2 that the necessary condition for a point $z$ to be a local minimum is that the derivative at $z$ vanish, that is, $f'(z) = \left. df/dx \right|_{x=z} = 0$.

When $f(x)$ contains terms of third or higher degree in $x$, direct analytic solution of $f'(x) = 0$ would be difficult. Hence, a search method that successively approximates the stationary point of $f$ is required. We first describe a classical search scheme for finding the root of a nonlinear equation that was originally developed by Newton and later refined by Raphson [5].

2.5.1 Newton–Raphson Method

The Newton–Raphson scheme requires that the function $f$ be twice differentiable. It begins with a point $x_1$ that is the initial estimate or approximation to the stationary point or root of the equation $f'(x) = 0$. A linear approximation of the function $f'(x)$ at the point $x_1$ is constructed, and the point at which the linear approximation vanishes is taken as the next approximation.

Formally, given the point $x_k$ as the current approximation of the stationary point, the linear approximation of the function $f'(x)$ at $x_k$ is given by

$$\tilde{f}'(x; x_k) = f'(x_k) + f''(x_k)(x - x_k) \tag{2.7}$$

Setting Eq. (2.7) to zero, we get the next approximation point as

$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}$$

Figure 2.14. Newton–Raphson method (convergence).

Figure 2.15. Newton–Raphson method (divergence).

Figure 2.14 illustrates the general steps of Newton's method. Unfortunately, depending on the starting point and the nature of the function, it is quite possible for Newton's method to diverge rather than converge to the true stationary point. Figure 2.15 illustrates this difficulty. If we start at a point to the right of $x_0$, the successive approximations will be moving away from the stationary point $z$.
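The divergence described above can be reproduced numerically. The sketch below is only an illustration under stated assumptions: the test function $f(x) = x \arctan x - \tfrac{1}{2}\ln(1 + x^2)$, whose derivative is $\arctan x$ and whose only stationary point is $x = 0$, is chosen here for convenience and is not the function drawn in Figure 2.15. The helper name `newton_iterates` is likewise hypothetical.

```python
import math


def newton_iterates(df, d2f, x1, n_steps):
    """Return [x1, x2, ...] produced by x_{k+1} = x_k - f'(x_k) / f''(x_k)."""
    xs = [x1]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x - df(x) / d2f(x))
    return xs


# Assumed illustration function: f'(x) = atan(x), f''(x) = 1 / (1 + x^2).
df = math.atan
d2f = lambda x: 1.0 / (1.0 + x * x)

near = newton_iterates(df, d2f, 1.0, 5)  # start close to the stationary point
far = newton_iterates(df, d2f, 2.0, 3)   # start farther away

print("start at 1.0:", near)
print("start at 2.0:", far)
```

Starting from 1.0 the iterates shrink toward the stationary point at zero, while from 2.0 each step overshoots by more than the last, so the distance from the stationary point grows at every iteration.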

Example 2.10 Newton–Raphson Method

Consider the problem

$$\text{Minimize } f(x) = 2x^2 + \frac{16}{x}$$

Suppose we use the Newton–Raphson method to determine a stationary point of $f(x)$ starting at the point $x_1 = 1$:

$$f'(x) = 4x - \frac{16}{x^2} \qquad f''(x) = 4 + \frac{32}{x^3}$$

We continue until $|f'(x_k)| < \varepsilon$, where $\varepsilon$ is a prespecified tolerance.
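The iteration of Example 2.10 can be traced with a few lines of Python. This is a minimal sketch, not a production routine: the function name `newton_raphson`, the default tolerance, and the iteration cap are assumptions introduced here for illustration.

```python
def newton_raphson(df, d2f, x1, eps=1e-6, max_iter=50):
    """Apply x_{k+1} = x_k - f'(x_k) / f''(x_k) until |f'(x_k)| < eps.

    Returns the final point and the list of all iterates.
    """
    x = x1
    history = [x]
    for _ in range(max_iter):
        slope = df(x)
        if abs(slope) < eps:       # termination test from the text
            break
        x = x - slope / d2f(x)     # zero of the linear approximation (2.7)
        history.append(x)
    return x, history


# Example 2.10: f(x) = 2x^2 + 16/x
df = lambda x: 4 * x - 16 / x**2
d2f = lambda x: 4 + 32 / x**3

x_star, history = newton_raphson(df, d2f, x1=1.0)
for k, xk in enumerate(history, start=1):
    print(f"x_{k} = {xk:.6f}   f'(x_{k}) = {df(xk):.6f}")
```

Since $f'(x) = 0$ reduces to $x^3 = 4$, the iterates can be checked against the exact stationary point $x = 4^{1/3}$.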

2.5.2 Bisection Method

If the function $f(x)$ is unimodal over a given search interval, then the optimal point will be the one where $f'(x) = 0$. If both the function value and the derivative of the function are available, then an efficient region elimination search can be conducted using just a single point rather than a pair of points to identify a point at which $f'(x) = 0$. For example, if, at a point $z$, $f'(z) < 0$, then assuming that the function is unimodal, the minimum cannot lie to the left of $z$. In other words, the interval $x \le z$ can be eliminated. On the other hand, if $f'(z) > 0$, then the minimum cannot lie to the right of $z$, and the interval $x \ge z$ can be eliminated. Based on these observations, the bisection method (sometimes called the Bolzano search) can be constructed.

Determine two points $L$ and $R$ such that $f'(L) < 0$ and $f'(R) > 0$. The stationary point is between the points $L$ and $R$. We determine the derivative of the function at the midpoint,

$$z = \frac{L + R}{2}$$

If $f'(z) > 0$, then the interval $(z, R)$ can be eliminated from the search. On the other hand, if $f'(z) < 0$, then the interval $(L, z)$ can be eliminated. We shall now state the formal steps of the algorithm.

Given a bounded interval $a \le x \le b$ and a termination criterion $\varepsilon$:

Step 1. Set $R = b$, $L = a$; assume $f'(a) < 0$ and $f'(b) > 0$.

Step 2. Calculate $z = (R + L)/2$, and evaluate $f'(z)$.

Step 3. If $|f'(z)| \le \varepsilon$, terminate. Otherwise, if $f'(z) < 0$, set $L = z$ and go to step 2. If $f'(z) > 0$, set $R = z$ and go to step 2.
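The three steps above translate almost line for line into code. The following is a sketch under assumptions: the name `bisection`, the default tolerance, and the iteration cap are illustrative choices, and the test problem is borrowed from Example 2.10 on the interval $1 \le x \le 5$.

```python
def bisection(df, a, b, eps=1e-6, max_iter=200):
    """Bolzano search for a point where f'(x) = 0 on [a, b].

    Requires f'(a) < 0 and f'(b) > 0, as assumed in Step 1.
    """
    L, R = a, b                      # Step 1
    if not (df(L) < 0 < df(R)):
        raise ValueError("need f'(a) < 0 and f'(b) > 0")
    for _ in range(max_iter):
        z = (R + L) / 2              # Step 2
        slope = df(z)
        if abs(slope) <= eps:        # Step 3: termination test
            return z
        if slope < 0:
            L = z                    # minimum cannot lie to the left of z
        else:
            R = z                    # minimum cannot lie to the right of z
    return (R + L) / 2               # iteration cap reached


# f(x) = 2x^2 + 16/x on 1 <= x <= 5
df = lambda x: 4 * x - 16 / x**2
z = bisection(df, 1.0, 5.0)
print(z, df(z))
```

Only the sign of `slope` decides which half is discarded; its magnitude is used solely in the termination test.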

Note that the search logic of this region elimination method is based purely on the sign of the derivative and does not use its magnitude. A method that uses both is the secant method, to be discussed next.

Figure 2.16. Secant method.

2.5.3 Secant Method

The secant method combines Newton's method with a region elimination scheme for finding a root of the equation $f'(x) = 0$ in the interval $(a, b)$ if it exists. Suppose we are interested in finding the stationary point of $f(x)$ and we have two points $L$ and $R$ in $(a, b)$ such that their derivatives are opposite in sign. The secant algorithm then approximates the function $f'(x)$ as a "secant line" (a straight line between the two points) and determines the next point where the secant line of $f'(x)$ is zero (see Figure 2.16). Thus, the next approximation to the stationary point $x^*$ is given by

$$z = R - \frac{f'(R)}{[f'(R) - f'(L)]/(R - L)}$$

If $|f'(z)| \le \varepsilon$, we terminate the algorithm. Otherwise, we select $z$ and one of the points $L$ or $R$ such that their derivatives are opposite in sign and repeat the secant step. For example, in Figure 2.16, we would have selected $z$ and $R$ as the next two points. It is easy to see that, unlike the bisection search, the secant method uses both the magnitude and sign of the derivatives and hence can eliminate more than half the interval in some instances (see Figure 2.16).

Example 2.11 Secant Method

Consider again the problem of Example 2.10:

$$\text{Minimize } f(x) = 2x^2 + \frac{16}{x} \quad \text{over the interval } 1 \le x \le 5$$

$$f'(x) = \frac{df(x)}{dx} = 4x - \frac{16}{x^2}$$

Iteration 2 (carrying over $R = 2.53$ with $f'(R) = 7.62$, and $L = 1$ with $f'(L) = -12$)

Step 2. $z = 2.53 - \dfrac{7.62}{(7.62 + 12)/1.53} = 1.94$

Step 3. $f'(z) = 3.51 > 0$; set $R = 1.94$.

Continue until $|f'(z)| \le \varepsilon$.
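The iterations of Example 2.11 can be reproduced with the sketch below. The function name `secant`, the returned `trace` list, and the default tolerance are assumptions made for illustration; the update rule and the bracket-keeping logic follow the description in Section 2.5.3.

```python
def secant(df, a, b, eps=1e-6, max_iter=200):
    """Secant search with bracketing for a root of f'(x) = 0 on [a, b].

    Requires f'(a) < 0 and f'(b) > 0. Returns the final point and a
    trace of (z, f'(z)) pairs, one per iteration.
    """
    L, R = a, b
    slope_L, slope_R = df(L), df(R)
    if not (slope_L < 0 < slope_R):
        raise ValueError("need f'(a) < 0 and f'(b) > 0")
    trace = []
    z = R
    for _ in range(max_iter):
        # Zero of the straight line through (L, f'(L)) and (R, f'(R)).
        z = R - slope_R / ((slope_R - slope_L) / (R - L))
        slope_z = df(z)
        trace.append((z, slope_z))
        if abs(slope_z) <= eps:
            break
        # Keep the pair of points whose derivatives differ in sign.
        if slope_z < 0:
            L, slope_L = z, slope_z
        else:
            R, slope_R = z, slope_z
    return z, trace


# Example 2.11: f(x) = 2x^2 + 16/x over 1 <= x <= 5
df = lambda x: 4 * x - 16 / x**2
z_final, trace = secant(df, 1.0, 5.0)
for i, (z, slope) in enumerate(trace[:3], start=1):
    print(f"iteration {i}: z = {z:.4f}, f'(z) = {slope:.4f}")
```

The first two values of `z` round to 2.53 and 1.94. The derivative printed for the second iteration differs slightly from the 3.51 of the hand calculation because the code carries unrounded intermediate values.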

2.5.4 Cubic Search Method

This is a polynomial approximation method in which a given function ƒ to be minimized is approximated by a third-order polynomial. The basic logic is similar to the quadratic approximation scheme. However, in this instance, because both the function value and the derivative value are available at each point, the approximating polynomial can be constructed using fewer points.

The cubic search starts with an arbitrary point $x_1$ and finds another point $x_2$ by a bounding search such that the derivatives $f'(x_1)$ and $f'(x_2)$ are of opposite sign. In other words, the stationary point $\bar{x}$ where $f'(x) = 0$ is bracketed between $x_1$ and $x_2$. A cubic approximation function of the form

$$\bar{f}(x) = a_0 + a_1(x - x_1) + a_2(x - x_1)(x - x_2) + a_3(x - x_1)^2(x - x_2) \tag{2.8}$$

is fitted such that Eq. (2.8) agrees with $f(x)$ at the two points $x_1$ and $x_2$. The first derivative of $\bar{f}(x)$ is given by

$$\frac{d\bar{f}(x)}{dx} = a_1 + a_2(x - x_1) + a_2(x - x_2) + a_3(x - x_1)^2 + 2a_3(x - x_1)(x - x_2) \tag{2.9}$$

The coefficients $a_0$, $a_1$, $a_2$, and $a_3$ of Eq. (2.8) can now be determined using the values of $f(x_1)$, $f(x_2)$, $f'(x_1)$, and $f'(x_2)$ by solving the following linear equations:

$$\begin{aligned}
f_1 &= f(x_1) = a_0 \\
f_2 &= f(x_2) = a_0 + a_1(x_2 - x_1) \\
f_1' &= f'(x_1) = a_1 + a_2(x_1 - x_2) \\
f_2' &= f'(x_2) = a_1 + a_2(x_2 - x_1) + a_3(x_2 - x_1)^2
\end{aligned}$$

Note that the above system can be solved very easily in a recursive manner.

Then, as in the quadratic case discussed earlier, given these coefficients, an estimate of the stationary point of $f$ can be obtained from the approximating cubic of Eq. (2.8). In this case, when we set the derivative of $\bar{f}(x)$ given by Eq. (2.9) to zero, we get a quadratic equation. By applying the formula for the root of the quadratic equation, a closed-form solution to the stationary point $\bar{x}$ of the approximating cubic is obtained as follows:

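$$\bar{x} = \begin{cases} x_2 & \text{if } \mu < 0 \\ x_2 - \mu(x_2 - x_1) & \text{if } 0 \le \mu \le 1 \\ x_1 & \text{if } \mu > 1 \end{cases} \tag{2.10}$$

where

$$z = \frac{3(f_1 - f_2)}{x_2 - x_1} + f_1' + f_2'$$

$$w = \begin{cases} (z^2 - f_1' f_2')^{1/2} & \text{if } x_1 < x_2 \\ -(z^2 - f_1' f_2')^{1/2} & \text{if } x_1 > x_2 \end{cases}$$

$$\mu = \frac{f_2' + w - z}{f_2' - f_1' + 2w}$$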

The definition of $w$ in Eq. (2.10) ensures that the proper root of the quadratic equation is selected, while forcing $\mu$ to lie between 0 and 1 ensures that the predicted point $\bar{x}$ lies between the bounds $x_1$ and $x_2$. Once again we select the next two points for the cubic approximation as $\bar{x}$ and one of the points $x_1$ or $x_2$ such that the derivatives of the two points are of opposite sign and repeat the cubic approximation.
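One cubic approximation step can be sketched directly from Eqs. (2.8) and (2.9). The code below solves the four linear equations recursively for the coefficients, expands Eq. (2.9) into an ordinary quadratic $Ax^2 + Bx + C$, and takes the root at which the cubic has positive curvature. This is an illustrative route rather than the closed form of Eq. (2.10); all function names are hypothetical, and the sketch assumes $x_1 < x_2$ with $f'(x_1) < 0 < f'(x_2)$.

```python
import math


def fit_cubic(x1, x2, f1, f2, g1, g2):
    """Recursively solve for a0..a3 of Eq. (2.8); g1, g2 are f'(x1), f'(x2)."""
    a0 = f1
    a1 = (f2 - a0) / (x2 - x1)
    a2 = (g1 - a1) / (x1 - x2)
    a3 = (g2 - a1 - a2 * (x2 - x1)) / (x2 - x1) ** 2
    return a0, a1, a2, a3


def cubic_value(x, x1, x2, a):
    """Evaluate the approximating cubic of Eq. (2.8)."""
    a0, a1, a2, a3 = a
    return (a0 + a1 * (x - x1) + a2 * (x - x1) * (x - x2)
            + a3 * (x - x1) ** 2 * (x - x2))


def cubic_slope(x, x1, x2, a):
    """Evaluate the derivative of the cubic, Eq. (2.9)."""
    _, a1, a2, a3 = a
    return (a1 + a2 * (x - x1) + a2 * (x - x2)
            + a3 * (x - x1) ** 2 + 2 * a3 * (x - x1) * (x - x2))


def cubic_minimizer(x1, x2, a):
    """Root of Eq. (2.9) where the cubic's second derivative is positive."""
    _, a1, a2, a3 = a
    A = 3 * a3
    B = 2 * a2 - a3 * (4 * x1 + 2 * x2)
    C = a1 - a2 * (x1 + x2) + a3 * (x1 ** 2 + 2 * x1 * x2)
    if abs(A) < 1e-14:               # cubic term vanishes: slope is linear
        return -C / B
    root = math.sqrt(B * B - 4 * A * C)
    return (-B + root) / (2 * A)     # here 2*A*x + B = +root > 0


# Assumed test problem: f(x) = 2x^2 + 16/x bracketed by x1 = 1, x2 = 5
f = lambda x: 2 * x**2 + 16 / x
df = lambda x: 4 * x - 16 / x**2
x1, x2 = 1.0, 5.0
a = fit_cubic(x1, x2, f(x1), f(x2), df(x1), df(x2))
x_bar = cubic_minimizer(x1, x2, a)
print(a, x_bar)
```

For this bracket a single cubic step gives an estimate near 1.87, against the true stationary point $4^{1/3} \approx 1.587$. Since $f'(1.87) > 0$, the next step would keep $x_1 = 1$ and replace $x_2$ by the estimate.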

Given an initial point $x_0$, positive step size $\Delta$, and termination parameters $\varepsilon_1$ and $\varepsilon_2$, the formal steps of the cubic search algorithm are as follows:

Step 1. Compute $f'(x_0)$.

If $f'(x_0) < 0$, compute $x_{K+1} = x_K + 2^K \Delta$ for $K = 0, 1, 2, \ldots$.

If $f'(x_0) > 0$, compute $x_{K+1} = x_K - 2^K \Delta$ for $K = 0, 1, 2, \ldots$.

Step 2. Evaluate $f'(x)$ for points $x_{K+1}$ for $K = 0, 1, 2, \ldots$ until a point $x_M$ is reached at which $f'(x_{M-1}) f'(x_M) \le 0$.
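Steps 1 and 2 amount to a bounding search driven by the sign of the derivative. The sketch below covers only these two steps; the name `bracket_by_derivative`, the step cap, and the test function (taken from Example 2.10) are assumptions for illustration.

```python
def bracket_by_derivative(df, x0, delta, max_steps=60):
    """Return (x_{M-1}, x_M) with f'(x_{M-1}) * f'(x_M) <= 0.

    Moves right when f'(x0) < 0 and left when f'(x0) > 0, doubling the
    step each time: x_{K+1} = x_K +/- 2^K * delta.
    """
    slope_prev = df(x0)                          # Step 1
    if slope_prev == 0:
        return x0, x0                            # x0 is already stationary
    direction = 1.0 if slope_prev < 0 else -1.0
    x_prev = x0
    for K in range(max_steps):                   # Step 2
        x_next = x_prev + direction * (2 ** K) * delta
        slope_next = df(x_next)
        if slope_prev * slope_next <= 0:         # sign change: bracket found
            return x_prev, x_next
        x_prev, slope_prev = x_next, slope_next
    raise RuntimeError("no sign change found within max_steps")


# f(x) = 2x^2 + 16/x, stationary point at 4 ** (1/3)
df = lambda x: 4 * x - 16 / x**2
right = bracket_by_derivative(df, 1.0, 0.1)   # f'(1) < 0, so search moves right
left = bracket_by_derivative(df, 3.0, 0.5)    # f'(3) > 0, so search moves left
print(right, left)
```

The two returned points play the roles of $x_1$ and $x_2$ in the cubic approximation. For a function such as this one, which is undefined at $x = 0$, the step size has to be chosen so that a leftward search does not land on or beyond the singularity.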

In document Engineering Optimization (Page 76-82)