FUNCTIONS OF A SINGLE VARIABLE

3. When the function is not unimodal, multiple local optima are possible and the global minimum can be found only by locating all local optima and selecting the best one.

In Figure 2.7, x₁ is the global maximum, x₂ is a local minimum, x₃ is a local maximum, x₄ is the global minimum, and x₅ may be considered as both local minimum and local maximum points.

Figure 2.7. Local and global optima.

Identification of Single-Variable Optima. Suppose ƒ(x) is a function of a single variable x defined on an open interval (a, b) and ƒ is differentiable to the nth order over the interval. If x* is a point within that interval, then Taylor's theorem allows us to write the change in the value of ƒ from x* to x* + ε as follows:

$$
f(x^* + \varepsilon) - f(x^*) = \varepsilon \left.\frac{df}{dx}\right|_{x=x^*} + \frac{\varepsilon^2}{2!} \left.\frac{d^2 f}{dx^2}\right|_{x=x^*} + \cdots + \frac{\varepsilon^n}{n!} \left.\frac{d^n f}{dx^n}\right|_{x=x^*} + O_{n+1}(\varepsilon) \tag{2.1}
$$

where Oₙ₊₁(ε) indicates terms of (n + 1)st order or higher in ε. If x* is a local minimum of ƒ on (a, b), then from the definition there must be an ε-neighborhood of x* such that for all x within a distance ε of x*

$$
f(x) \ge f(x^*) \tag{2.2}
$$

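Inequality (2.2) implies that

$$
\varepsilon \left.\frac{df}{dx}\right|_{x=x^*} + \frac{\varepsilon^2}{2!} \left.\frac{d^2 f}{dx^2}\right|_{x=x^*} + \cdots + \frac{\varepsilon^n}{n!} \left.\frac{d^n f}{dx^n}\right|_{x=x^*} + O_{n+1}(\varepsilon) \ge 0 \tag{2.3}
$$

For ε sufficiently small, the first term will dominate the others, and since ε can be chosen both positive and negative, it follows that inequality (2.3) will hold only when

$$
\left.\frac{df}{dx}\right|_{x=x^*} = 0 \tag{2.4}
$$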

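Continuing this argument further, once the first-order term vanishes the second-order term dominates for small ε, and since ε² is positive for either sign of ε, inequality (2.3) will be true only when

$$
\left.\frac{d^2 f}{dx^2}\right|_{x=x^*} \ge 0 \tag{2.5}
$$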

The same construction applies in the case of a local maximum but with inequality (2.2) reversed, and we obtain the following general result:

Theorem 2.1

Necessary conditions for x* to be a local minimum (maximum) of ƒ on the open interval (a, b), provided that ƒ is twice differentiable, are that

1. $\left.\dfrac{df}{dx}\right|_{x=x^*} = 0$

2. $\left.\dfrac{d^2 f}{dx^2}\right|_{x=x^*} \ge 0 \quad (\le 0)$

These are necessary conditions, which means that if they are not satisﬁed, then x* is not a local minimum (maximum). On the other hand, if they are satisﬁed, we still have no guarantee that x* is a local minimum (maximum).

For example, consider the function ƒ(x) = x³, shown in Figure 2.8. It satisfies the necessary conditions for both local minimum and local maximum at the origin, but the function does not achieve a minimum or a maximum at x* = 0.

Deﬁnitions

A stationary point is a point x* at which

$$
\left.\frac{df}{dx}\right|_{x=x^*} = 0
$$

An inflection point or saddlepoint is a stationary point that does not correspond to a local optimum (minimum or maximum).

To distinguish whether a stationary point corresponds to a local minimum, a local maximum, or an inﬂection point, we need the sufﬁcient conditions of optimality.

Theorem 2.2

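Suppose at a point x* the first derivative is zero and the order of the first nonzero higher order derivative is denoted by n.

(i) If n is odd, then x* is a point of inflection.

(ii) If n is even, then x* is a local optimum. Moreover:
    (a) If that derivative is positive, then x* is a local minimum.
    (b) If that derivative is negative, then x* is a local maximum.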

Figure 2.8. Illustration of inﬂection point.

This result is easily verified by recourse to the Taylor series expansion given in Eq. (2.1). Since the first nonvanishing higher order derivative is of order n, Eq. (2.1) reduces to

$$
f(x^* + \varepsilon) - f(x^*) = \frac{\varepsilon^n}{n!} \left.\frac{d^n f}{dx^n}\right|_{x=x^*} + O_{n+1}(\varepsilon) \tag{2.6}
$$

If n is odd, then the right-hand side of Eq. (2.6) can be made positive or negative by choosing ε positive or negative. This implies that depending on the sign of ε, ƒ(x* + ε) − ƒ(x*) could be positive or negative. Hence the function does not attain a minimum or a maximum at x*, and x* is an inflection point.

Now, consider the case when n is even. Then the term εⁿ is always positive, and for all ε sufficiently small the sign of the right-hand side of Eq. (2.6) will be dominated by the first term. Hence, if dⁿƒ/dxⁿ evaluated at x = x* is positive, ƒ(x* + ε) − ƒ(x*) > 0, and x* corresponds to a local minimum. A similar argument can be applied in the case of a local maximum.

Applying Theorem 2.2 to the function ƒ(x) = x³ shown in Figure 2.8, we note that

$$
\left.\frac{df}{dx}\right|_{x=0} = 0 \qquad \left.\frac{d^2 f}{dx^2}\right|_{x=0} = 0 \qquad \left.\frac{d^3 f}{dx^3}\right|_{x=0} = 6
$$

Thus the first nonvanishing derivative is of order 3 (odd), and x = 0 is an inflection point.

Remark

In the above discussion, we have assumed that the function is always differentiable, in other words, that continuous first derivatives always exist. However, if the function is not differentiable at all points, then even the necessary condition for an unconstrained optimum, that the point is a stationary point, may not hold. For example, consider the piecewise linear function given by

$$
f(x) = \begin{cases} x & \text{for } x \le 2 \\ 4 - x & \text{for } x \ge 2 \end{cases}
$$

The function is continuous at all points. But it is not differentiable at the point x = 2, and the function attains its maximum at x = 2, which is not a stationary point by our definition.
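The lack of differentiability at x = 2 can be checked from the one-sided difference quotients. The following worked limits are a sketch that uses only the two linear pieces of ƒ given above, with ε > 0:

```latex
\lim_{\varepsilon \to 0^+} \frac{f(2) - f(2 - \varepsilon)}{\varepsilon}
  = \lim_{\varepsilon \to 0^+} \frac{2 - (2 - \varepsilon)}{\varepsilon} = 1,
\qquad
\lim_{\varepsilon \to 0^+} \frac{f(2 + \varepsilon) - f(2)}{\varepsilon}
  = \lim_{\varepsilon \to 0^+} \frac{\bigl(4 - (2 + \varepsilon)\bigr) - 2}{\varepsilon} = -1
```

The slopes from the left and from the right disagree, so dƒ/dx does not exist at x = 2, even though ƒ(2) = 2 is at least as large as ƒ(x) at every other point.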

Example 2.1

Consider the function

$$
f(x) = 5x^6 - 36x^5 + \frac{165}{2}x^4 - 60x^3 + 36
$$

defined on the real line. The first derivative of this function is

$$
\frac{df}{dx} = 30x^5 - 180x^4 + 330x^3 - 180x^2 = 30x^2 (x - 1)(x - 2)(x - 3)
$$

Clearly, the first derivative vanishes at x = 0, 1, 2, 3, and hence these points can be classified as stationary points. The second derivative of ƒ is

$$
\frac{d^2 f}{dx^2} = 150x^4 - 720x^3 + 990x^2 - 360x
$$

Evaluating this derivative at the four candidate points x = 0, 1, 2, 3, we obtain

x      ƒ(x)      d²ƒ/dx²
0      36        0
1      27.5      60
2      44        −120
3      −4.5      540

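We conclude that x = 1 and x = 3 are local minima and x = 2 is a local maximum. To characterize x = 0, where the second derivative vanishes, the next higher derivative must be evaluated:

$$
\left.\frac{d^3 f}{dx^3}\right|_{x=0} = \left.\left(600x^3 - 2160x^2 + 1980x - 360\right)\right|_{x=0} = -360
$$

Since this is an odd-order derivative and is nonzero, the point x = 0 is not an optimum point but an inflection point.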
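The classification in Example 2.1 can be reproduced mechanically. The following is a minimal sketch of Theorem 2.2 for polynomials only, assuming the polynomial is supplied as a list of coefficients in ascending powers; the helper names `poly_derivative`, `poly_eval`, and `classify_stationary_point` and the tolerance `tol` are illustrative choices, not part of the text.

```python
def poly_derivative(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...] (ascending powers)."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def poly_eval(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def classify_stationary_point(coeffs, x_star, tol=1e-9):
    """Classify x_star by the order n of the first nonzero higher derivative."""
    deriv = poly_derivative(coeffs)
    if abs(poly_eval(deriv, x_star)) > tol:
        return "not stationary"
    n = 1
    while deriv:
        deriv = poly_derivative(deriv)
        n += 1
        value = poly_eval(deriv, x_star)
        if abs(value) > tol:
            if n % 2 == 1:
                return "inflection point"      # n odd
            return "local minimum" if value > 0 else "local maximum"  # n even
    return "undetermined"  # every derivative vanishes (constant polynomial)


if __name__ == "__main__":
    # f(x) = 5x^6 - 36x^5 + (165/2)x^4 - 60x^3 + 36 from Example 2.1
    f_coeffs = [36, 0, 0, -60, 82.5, -36, 5]
    for x in (0, 1, 2, 3):
        print(x, classify_stationary_point(f_coeffs, x))
```

Run as a script, this labels x = 0 an inflection point, x = 1 and x = 3 local minima, and x = 2 a local maximum. The tolerance test is a practical stand-in for "nonzero" and would need care for functions whose derivatives are small but not exactly zero.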

The next question is how to determine the global maximum or minimum of a function of one variable. Since the global optimum has to be a local optimum, a simple approach is to compute all local optima and choose the best. An algorithm based on this is given below:

Maximize ƒ(x)
Subject to a ≤ x ≤ b

where a and b are practical limits on the values of the variable x.

Once a function is bounded in an interval, you should notice that in addition to the stationary points, the boundary points can also qualify for the local optimum.

Step 1. Set dƒ/dx = 0 and compute all stationary points.

Step 2. Select all stationary points that belong to the interval [a, b]. Call them x₁, x₂, . . . , x_N. These points, along with a and b, are the only points that can qualify for a local optimum.

Step 3. Find the largest value of ƒ(x) out of ƒ(a), ƒ(b), ƒ(x₁), . . . , ƒ(x_N). This value is the global maximum, and the point at which it is attained is the global maximum point.

Note: We did not try to classify the stationary points as local minimum, local maximum, or inflection points, which would involve the calculation of higher order derivatives. Instead, it is easier simply to compute their functional values and discard all but the largest.

Example 2.2

Maximize ƒ(x) = −x³ + 3x² + 9x + 10 in the interval −2 ≤ x ≤ 4. Set

$$
\frac{df}{dx} = -3x^2 + 6x + 9 = 0
$$

Solving this equation, we get x = 3 and x = −1 as the two stationary points, and both are in the search interval.

To find the global maximum, evaluate ƒ(x) at x = 3, −1, −2, and 4:

ƒ(3) = 37      ƒ(−1) = 5      ƒ(−2) = 12      ƒ(4) = 30

Hence x = 3 maximizes ƒ over the interval −2 ≤ x ≤ 4.
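Steps 1 to 3 can be sketched in a few lines for Example 2.2. The function names below (`stationary_points`, `global_maximum`) are hypothetical, and the stationary points are found with the quadratic formula, which works here only because dƒ/dx happens to be a quadratic; other functions would need a different root finder.

```python
import math


def f(x):
    """Objective of Example 2.2."""
    return -x**3 + 3 * x**2 + 9 * x + 10


def stationary_points():
    """Step 1: roots of df/dx = -3x^2 + 6x + 9 via the quadratic formula."""
    a, b, c = -3.0, 6.0, 9.0
    disc = math.sqrt(b * b - 4 * a * c)
    return sorted([(-b + disc) / (2 * a), (-b - disc) / (2 * a)])


def global_maximum(func, candidates, lo, hi):
    """Steps 2 and 3: keep candidates inside [lo, hi], add the end points,
    and return the point with the largest function value."""
    points = [lo, hi] + [x for x in candidates if lo <= x <= hi]
    best = max(points, key=func)
    return best, func(best)


if __name__ == "__main__":
    x_best, f_best = global_maximum(f, stationary_points(), -2.0, 4.0)
    print(x_best, f_best)  # 3.0 37.0
```

No second derivatives are needed: the end points and the stationary points are simply compared by function value, exactly as the note after step 3 suggests.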

Instead of evaluating all the stationary points and their functional values, we could use certain properties of the function to determine the global optimum faster. At the end of Section 2.1, we introduced unimodal functions, for which a local optimum is the global optimum. Unfortunately, the definition of unimodal functions does not suggest a simple test for determining whether or not a function is unimodal. However, there exists an important class of unimodal functions in optimization theory, known as convex and concave functions, which can be identified with some simple tests. A review of convex and concave functions and their properties is given in Appendix B.

Example 2.3

Let us examine the properties of the function

$$
f(x) = (2x + 1)^2 (x - 4)
$$

$$
f'(x) = (2x + 1)(6x - 15)
$$

$$
f''(x) = 24(x - 1)
$$

For all x ≤ 1, ƒ″(x) ≤ 0, and the function is concave in this region. For all x ≥ 1, ƒ″(x) ≥ 0, and the function is convex in the region.

Note that the function has two stationary points, x = −1/2 and x = 5/2. ƒ″(−1/2) < 0, and the function has a local maximum at x = −1/2. At x = 5/2, ƒ″(5/2) > 0, and the function attains a local minimum at this point. If we restricted the region of interest to x ≤ 1, then ƒ(x) attains the global maximum at x = −1/2 since ƒ(x) is concave in this region and x = −1/2 is a local maximum.

Similarly, if we restricted the region of interest to x ≥ 1, then ƒ(x) attains the global minimum at x = 5/2. However, over the entire range of x from −∞ to +∞, ƒ(x) has no finite global maximum or minimum.
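The derivatives quoted in Example 2.3 follow from the product rule. The short derivation below is a sketch that assumes the function is ƒ(x) = (2x + 1)²(x − 4), the reading consistent with the stated ƒ′ and ƒ″:

```latex
f'(x)  = 2(2x+1)(2)(x-4) + (2x+1)^2
       = (2x+1)\bigl[4(x-4) + (2x+1)\bigr]
       = (2x+1)(6x-15)

f''(x) = 2(6x-15) + 6(2x+1)
       = 24x - 24
       = 24(x-1)
```

Setting ƒ′(x) = 0 gives the two stationary points x = −1/2 and x = 5/2, and the sign of ƒ″ changes only at x = 1.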

Example 2.4 Inventory Control

Many firms maintain an inventory of goods to meet future demand. Among the reasons for holding inventory is to avoid the time and cost of constant replenishment. On the other hand, to replenish only infrequently would imply large inventories that would tie up unnecessary capital and incur huge storage

Figure 2.9. Inventory problem.

inventory is penalized by assuming that each unit will cost $h to store for one year. To keep things simple, we will assume that all demand must be met immediately (i.e., no back orders are allowed) and that replenishment occurs instantaneously as soon as an order is sent.

Figure 2.9 graphically illustrates the change in the inventory level with respect to time. Starting at any time A with an inventory of B, the inventory level will decrease at the rate of λ units per unit time until it becomes zero at time C, when a fresh order is placed.

The triangle ABC represents one inventory cycle that will be repeated throughout the year. The problem is to determine the optimal order quantity B, denoted by variable Q, and the common length of time C _⫺ A, denoted by T, between reorders.

Since T is just the length of time required to deplete Q units at rate λ, we get

$$
T = \frac{Q}{\lambda}
$$

The only remaining problem is to determine Q. Note that when Q is small, T will be small, implying more frequent reorders during the year. This will result in higher reorder costs but lower inventory holding cost. On the other hand, a large inventory (large Q) will result in a higher inventory cost but lower reorder costs. The basic inventory control problem is to determine the optimal value of Q that will minimize the sum of the inventory cost and reorder cost in a year.

We shall now develop the necessary mathematical expression to optimize the yearly cost (cost/cycle × number of cycles/year).

$$
\text{Number of cycles (reorders)/year} = \frac{1}{T} = \frac{\lambda}{Q}
$$

$$
\text{Cost per cycle} = \text{reorder cost} + \text{inventory cost} = (K + cQ) + \left(\frac{Q}{2}\right) hT = K + cQ + \frac{hQ^2}{2\lambda}
$$

Note: The inventory cost per cycle is simply the cost of holding an average inventory of Q / 2 for a length of time T.

Thus, the yearly cost to be minimized is

$$
f(Q) = \frac{\lambda K}{Q} + \lambda c + \frac{hQ}{2}
$$

$$
f'(Q) = -\frac{\lambda K}{Q^2} + \frac{h}{2}
$$

$$
f''(Q) = \frac{2\lambda K}{Q^3} > 0 \qquad \text{for all } Q > 0
$$

Hence ƒ(Q) is a convex function, and if there exists a positive Q* such that ƒ′(Q*) = 0, then Q* minimizes ƒ(Q).

Solving ƒ′(Q) = 0, we get

$$
Q^* = \sqrt{\frac{2\lambda K}{h}} > 0
$$

Thus, the optimal order quantity is

$$
Q^* = \sqrt{\frac{2\lambda K}{h}}
$$

and

$$
T^* = \text{time between reorders} = \frac{Q^*}{\lambda} = \sqrt{\frac{2K}{h\lambda}}
$$

The parameter Q* is the famous economic order quantity, or EOQ, used frequently in inventory control.
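As a numerical illustration, the EOQ formulas can be evaluated directly. The sketch below assumes K is the fixed cost per reorder, c the cost per unit, λ (`lam`) the yearly demand rate, and h the yearly holding cost per unit, as the cost expressions above suggest; the numbers in the usage section are invented purely for illustration.

```python
import math


def yearly_cost(Q, lam, K, c, h):
    """f(Q) = lam*K/Q + lam*c + h*Q/2, the yearly cost for order quantity Q."""
    return lam * K / Q + lam * c + h * Q / 2


def economic_order_quantity(lam, K, h):
    """Q* = sqrt(2*lam*K/h), the stationary point of the convex yearly cost."""
    return math.sqrt(2 * lam * K / h)


if __name__ == "__main__":
    # Hypothetical data: 1200 units/year demand, $75 per reorder,
    # $4 per unit, $2 per unit per year holding cost.
    lam, K, c, h = 1200.0, 75.0, 4.0, 2.0
    q_star = economic_order_quantity(lam, K, h)
    t_star = q_star / lam
    print(q_star, t_star, yearly_cost(q_star, lam, K, c, h))
```

With these made-up values Q* comes out to 300 units, ordered every quarter of a year. Note that the unit cost c shifts the yearly cost by the constant λc but does not affect Q*.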

Figure 2.10. Case (i) and case (ii) of Theorem 2.3.

velop a number of single-variable search methods for locating the optimal point in a given interval. Search methods that locate a single-variable optimum by successively eliminating subintervals so as to reduce the remaining interval of search are called region elimination methods.

In Section 2.1, we introduced the definition of unimodal functions. Unimodality is an extremely important functional property, and virtually all the single-variable search methods currently in use require at least the assumption that within the domain of interest the function is unimodal. The usefulness of this property lies in the fact that if ƒ(x) is unimodal, then it is only necessary to compare ƒ(x) at two different points to predict in which of the subintervals defined by those two points the optimum does not lie.

Theorem 2.3

Suppose ƒ is strictly unimodal† on the interval a ≤ x ≤ b with a minimum at x*. Let x₁ and x₂ be two points in the interval such that a < x₁ < x₂ < b. Comparing the functional values at x₁ and x₂, we can conclude:

(i) If ƒ(x₁) > ƒ(x₂), then the minimum of ƒ(x) does not lie in the interval (a, x₁). In other words, x* ∈ (x₁, b) (see Figure 2.10).

(ii) If ƒ(x₁) < ƒ(x₂), then the minimum does not lie in the interval (x₂, b), or x* ∈ (a, x₂) (see Figure 2.10).

†A function is strictly unimodal if it is unimodal and has no intervals of finite length in which the function is of constant value.

Proof

Consider case (i), where ƒ(x₁) > ƒ(x₂). Suppose we assume the contrary, that a ≤ x* ≤ x₁. Since x* is the minimum point, by definition we have ƒ(x*) ≤ ƒ(x) for all x ∈ (a, b). This implies that

$$
f(x^*) \le f(x_1) > f(x_2) \qquad \text{with } x^* < x_1 < x_2
$$

But this is impossible, because the function has to be monotonic on either side of x* by the unimodality of ƒ(x). Thus, the theorem is proved by contradiction. A similar argument holds for case (ii).

Note: When ƒ(x₁) = ƒ(x₂), we could eliminate both ends, (a, x₁) and (x₂, b), and the minimum must occur in the interval (x₁, x₂), provided ƒ(x) is strictly unimodal.
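A single application of Theorem 2.3, including the equal-value case from the note, might be coded as follows. This is a minimal sketch; the function name `eliminate` and its argument order are illustrative assumptions, and the function does not itself verify that ƒ is unimodal.

```python
def eliminate(f, a, b, x1, x2):
    """Shrink the interval (a, b) using two interior points a < x1 < x2 < b.

    Assumes f is strictly unimodal on [a, b] with a minimum inside it.
    Returns the end points of the subinterval that still contains the minimum.
    """
    if not (a < x1 < x2 < b):
        raise ValueError("require a < x1 < x2 < b")
    f1, f2 = f(x1), f(x2)
    if f1 > f2:
        return x1, b      # case (i): minimum is not in (a, x1)
    if f1 < f2:
        return a, x2      # case (ii): minimum is not in (x2, b)
    return x1, x2         # equal values: both ends can be dropped


if __name__ == "__main__":
    g = lambda x: (x - 3.0) ** 2   # hypothetical unimodal test function
    print(eliminate(g, 0.0, 10.0, 2.0, 6.0))  # (0.0, 6.0)
```

Only two function values are compared, so the step applies equally to functions available only through simulation or measurement.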

By means of Theorem 2.3, sometimes called the elimination property, one can organize a search in which the optimum is found by recursively eliminating sections of the initial bounded interval. When the remaining subinterval is reduced to a sufficiently small length, the search is terminated. Note that without the elimination property, nothing less than an exhaustive search would suffice. The greatest advantage of these search methods is that they require only functional evaluations. The optimization functions need not be differentiable. As a matter of fact, the functions need not be in a mathematical or analytical form. All that is required is that given a point x the value of the function ƒ(x) can be determined by direct evaluation or by a simulation experiment. Generally, these search methods can be broken down into two phases:

Bounding Phase. An initial coarse search that will bound or bracket the optimum.

Interval Refinement Phase. A finite sequence of interval reductions or refinements to reduce the initial search interval to desired accuracy.

2.3.1 Bounding Phase

In the initial phase, starting at some selected trial point, the optimum is roughly bracketed within a ﬁnite interval by using the elimination property.

Typically, this bounding search is conducted using some heuristic expanding pattern, although extrapolation methods have also been devised. An example of an expanding pattern is Swann’s method [1], in which the (k + 1)st test point is generated using the recursion

$$
x_{k+1} = x_k + 2^k \Delta \qquad \text{for } k = 0, 1, 2, \ldots
$$

where x₀ is the selected starting point and Δ is a step-size parameter. The sign of Δ is determined by comparing ƒ(x₀), ƒ(x₀ + |Δ|), and ƒ(x₀ − |Δ|). If

$$
f(x_0 - |\Delta|) \ge f(x_0) \ge f(x_0 + |\Delta|)
$$

then, because of the unimodality assumption, the minimum must lie to the right of x₀, and Δ is chosen to be positive. If the inequalities are reversed, Δ is chosen to be negative; if

$$
f(x_0 - |\Delta|) \ge f(x_0) \le f(x_0 + |\Delta|)
$$

the minimum has been bracketed between x₀ − |Δ| and x₀ + |Δ| and the bounding search can be terminated. The remaining case,

$$
f(x_0 - |\Delta|) \le f(x_0) \ge f(x_0 + |\Delta|)
$$

is ruled out by the unimodality assumption. However, occurrence of the above condition indicates that the given function is not unimodal.

Example 2.5

Consider the problem of minimizing ƒ(x) = (100 − x)² given the starting point x₀ = 30 and a step size |Δ| = 5.

The sign of Δ is determined by comparing

ƒ(x₀) = ƒ(30) = 4900
ƒ(x₀ + |Δ|) = ƒ(35) = 4225
ƒ(x₀ − |Δ|) = ƒ(25) = 5625

Since

$$
f(x_0 - |\Delta|) \ge f(x_0) \ge f(x_0 + |\Delta|)
$$

Δ must be positive, and the minimum point x* must be greater than 30. Thus, x₁ = x₀ + Δ = 35.

Next,

x₂ = x₁ + 2Δ = 45       ƒ(45) = 3025 < ƒ(x₁)       therefore, x* > 35;
x₃ = x₂ + 2²Δ = 65      ƒ(65) = 1225 < ƒ(x₂)       therefore, x* > 45;
x₄ = x₃ + 2³Δ = 105     ƒ(105) = 25 < ƒ(x₃)        therefore, x* > 65;
x₅ = x₄ + 2⁴Δ = 185     ƒ(185) = 7225 > ƒ(x₄)      therefore, x* < 185.

Consequently, in six evaluations x* has been bracketed within the interval

$$
65 \le x^* \le 185
$$
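The bounding search of Example 2.5 can be sketched as a short routine. The code below follows the recursion x_{k+1} = x_k + 2^k Δ and the sign test on ƒ(x₀ − |Δ|), ƒ(x₀), ƒ(x₀ + |Δ|); the function name `swann_bracket`, the iteration cap `max_iter`, and the treatment of ties are assumptions made for this illustration rather than details from the text.

```python
def swann_bracket(f, x0, step, max_iter=50):
    """Bracket the minimum of a unimodal f starting from x0 with step size |step|."""
    delta = abs(step)
    f_minus, f0, f_plus = f(x0 - delta), f(x0), f(x0 + delta)

    if f_minus >= f0 <= f_plus:
        return x0 - delta, x0 + delta            # already bracketed
    if f_minus < f0 > f_plus:
        raise ValueError("function is not unimodal near x0")
    if f_minus < f0:
        delta = -delta                           # minimum lies to the left

    # x_1 = x_0 + 2^0 * delta has already been evaluated above.
    x_prev, x_curr = x0, x0 + delta
    f_curr = f_plus if delta > 0 else f_minus
    for k in range(1, max_iter):
        x_next = x_curr + (2 ** k) * delta
        f_next = f(x_next)
        if f_next >= f_curr:                     # function started to rise
            return tuple(sorted((x_prev, x_next)))
        x_prev, x_curr, f_curr = x_curr, x_next, f_next
    raise RuntimeError("no bracket found within max_iter expansions")


if __name__ == "__main__":
    print(swann_bracket(lambda x: (100 - x) ** 2, 30, 5))  # (65, 185)
```

For the data of Example 2.5 the routine visits 35, 45, 65, 105, and 185 and returns the same bracket, 65 ≤ x* ≤ 185. The bracket spans the last three test points, which is why its lower end is 65 rather than 105.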

Note that the effectiveness of the bounding search depends directly on the step size Δ. If Δ is large, a poor bracket, that is, a large initial interval, is obtained. On the other hand, if Δ is small, many evaluations may be necessary before a bound can be established.

2.3.2 Interval Reﬁnement Phase

Once a bracket has been established around the optimum, then more sophisticated interval reduction schemes can be applied to obtain a refined estimate of the optimum point. The amount of subinterval eliminated at each step depends on the location of the trial points x₁ and x₂ within the search interval.

Since we have no prior knowledge of the location of the optimum, it is reasonable to expect that the location of the trial points ought to be such that regardless of the outcome the interval should be reduced the same amount.

Moreover, in the interest of efﬁciency, that same amount should be as large as possible. This is sometimes called the minimax criterion of search strategy.

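Interval Halving. This method deletes exactly one-half the interval at each stage. This is also called a three-point equal-interval search since it works with three equally spaced trial points in the search interval. The basic steps of the search procedure for finding the minimum of a function ƒ(x) over the interval (a, b) are as follows:

Step 1. Let x_m = ½(a + b) and L = b − a. Compute ƒ(x_m).

Step 2. Set x₁ = a + ¼L and x₂ = b − ¼L. Note that the points x₁, x_m, and x₂ are all equally spaced at one-fourth the interval. Compute ƒ(x₁) and ƒ(x₂).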

Step 3. Compare ƒ(x₁) and ƒ(x_m).

(i) If ƒ(x₁) < ƒ(x_m), then drop the interval (x_m, b) by setting b = x_m. The midpoint of the new search interval will now be x₁. Hence, set x_m = x₁. Go to step 5.

(ii) If ƒ(x₁) ≥ ƒ(x_m), go to step 4.

Step 4. Compare ƒ(x₂) and ƒ(x_m).

(i) If ƒ(x₂) < ƒ(x_m), drop the interval (a, x_m) by setting a = x_m. Since the midpoint of the new interval will now be x₂, set x_m = x₂. Go to step 5.

(ii) If ƒ(x₂) ≥ ƒ(x_m), drop the intervals (a, x₁) and (x₂, b). Set a = x₁ and b = x₂. Note that x_m continues to be the midpoint of the new interval. Go to step 5.

Step 5. Compute L = b − a. If |L| is small, terminate. Otherwise return to step 2.
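The steps above can be sketched as a short loop. The version below assumes that the procedure begins with x_m at the midpoint of (a, b) and places x₁ and x₂ at the quarter points, as the note on equal spacing indicates; the name `interval_halving` and the tolerance `tol` used for "|L| is small" are illustrative choices.

```python
def interval_halving(f, a, b, tol=1e-6):
    """Reduce [a, b] around the minimum of a unimodal f until b - a <= tol."""
    xm = 0.5 * (a + b)          # midpoint of the current interval
    fm = f(xm)
    while abs(b - a) > tol:
        L = b - a
        x1, x2 = a + 0.25 * L, b - 0.25 * L   # quarter points
        f1, f2 = f(x1), f(x2)
        if f1 < fm:
            # Step 3(i): drop (xm, b); x1 becomes the new midpoint.
            b, xm, fm = xm, x1, f1
        elif f2 < fm:
            # Step 4(i): drop (a, xm); x2 becomes the new midpoint.
            a, xm, fm = xm, x2, f2
        else:
            # Step 4(ii): drop both ends; xm stays the midpoint.
            a, b = x1, x2
    return a, b


if __name__ == "__main__":
    # Refine the bracket 65 <= x* <= 185 obtained for f(x) = (100 - x)^2.
    print(interval_halving(lambda x: (100 - x) ** 2, 65.0, 185.0, tol=1e-3))
```

Because the function value at the midpoint is carried over from one pass to the next, each pass through the loop costs two new function evaluations while halving the interval.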

Remarks

1. At each stage of the algorithm, exactly half the length of the search

In document Engineering Optimization (Page 51-64)