
In document Engineering Optimization (Page 51-64)

FUNCTIONS OF A SINGLE VARIABLE

3. When the function is not unimodal, multiple local optima are possible and the global minimum can be found only by locating all local optima and selecting the best one.

In Figure 2.7, $x_1$ is the global maximum, $x_2$ is a local minimum, $x_3$ is a local maximum, $x_4$ is the global minimum, and $x_5$ may be considered as both a local minimum and a local maximum point.

Figure 2.7. Local and global optima.

Identification of Single-Variable Optima. Suppose $f(x)$ is a function of a single variable $x$ defined on an open interval $(a, b)$ and $f$ is differentiable to the $n$th order over the interval. If $x^*$ is a point within that interval, then Taylor's theorem allows us to write the change in the value of $f$ from $x^*$ to $x^* + \varepsilon$ as follows:

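$$f(x^* + \varepsilon) - f(x^*) = \varepsilon \left.\frac{df}{dx}\right|_{x=x^*} + \frac{\varepsilon^2}{2!}\left.\frac{d^2f}{dx^2}\right|_{x=x^*} + \cdots + \frac{\varepsilon^n}{n!}\left.\frac{d^nf}{dx^n}\right|_{x=x^*} + O_{n+1}(\varepsilon) \qquad (2.1)$$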

where $O_{n+1}(\varepsilon)$ indicates terms of $(n+1)$st order or higher in $\varepsilon$. If $x^*$ is a local minimum of $f$ on $(a, b)$, then from the definition there must be an $\varepsilon$ neighborhood of $x^*$ such that for all $x$ within a distance $\varepsilon$,

$$f(x) \ge f(x^*) \qquad (2.2)$$

Inequality (2.2) implies that

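$$\varepsilon \left.\frac{df}{dx}\right|_{x=x^*} + \frac{\varepsilon^2}{2!}\left.\frac{d^2f}{dx^2}\right|_{x=x^*} + \cdots + \frac{\varepsilon^n}{n!}\left.\frac{d^nf}{dx^n}\right|_{x=x^*} + O_{n+1}(\varepsilon) \ge 0 \qquad (2.3)$$

For $\varepsilon$ sufficiently small, the first term will dominate the others, and since $\varepsilon$ can be chosen both positive and negative, it follows that inequality (2.3) will hold only when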

$$\left.\frac{df}{dx}\right|_{x=x^*} = 0 \qquad (2.4)$$

Continuing this argument further, inequality (2.3) will be true only when

$$\left.\frac{d^2f}{dx^2}\right|_{x=x^*} \ge 0 \qquad (2.5)$$

The same construction applies in the case of a local maximum but with inequality (2.2) reversed, and we obtain the following general result:

Theorem 2.1

Necessary conditions for $x^*$ to be a local minimum (maximum) of $f$ on the open interval $(a, b)$, provided that $f$ is twice differentiable, are that

1. $\left.\frac{df}{dx}\right|_{x=x^*} = 0$

2. $\left.\frac{d^2f}{dx^2}\right|_{x=x^*} \ge 0 \quad (\le 0)$

These are necessary conditions, which means that if they are not satisfied, then x* is not a local minimum (maximum). On the other hand, if they are satisfied, we still have no guarantee that x* is a local minimum (maximum).

For example, consider the function $f(x) = x^3$, shown in Figure 2.8. It satisfies the necessary conditions for both a local minimum and a local maximum at the origin, but the function does not achieve a minimum or a maximum at $x^* = 0$.
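A small numerical check can make this example concrete. The sketch below is illustrative only: it assumes central finite differences with a hypothetical step `h` as stand-ins for the exact derivatives. Both estimates at the origin come out (approximately) zero, so the necessary conditions of Theorem 2.1 are met, yet $f$ lies below $f(0)$ on one side and above it on the other.

```python
def f(x: float) -> float:
    return x ** 3

h = 1e-4    # finite-difference step (illustrative assumption)
eps = 1e-3  # small displacement from x* = 0

# Central-difference estimates of f'(0) and f''(0)
d1 = (f(h) - f(-h)) / (2 * h)
d2 = (f(h) - 2 * f(0.0) + f(-h)) / h ** 2
print(f"f'(0) ~ {d1:.1e}, f''(0) ~ {d2:.1e}")

# Both necessary conditions hold, but f(0) is neither a minimum nor a maximum
print(f(-eps) < f(0.0) < f(eps))
```

For this cubic the central-difference estimate of $f'(0)$ equals $h^2$, so it shrinks as `h` is reduced.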

Definitions

A stationary point is a point $x^*$ at which $\left.\frac{df}{dx}\right|_{x=x^*} = 0$.

An inflection point or saddlepoint is a stationary point that does not correspond to a local optimum (minimum or maximum).

To distinguish whether a stationary point corresponds to a local minimum, a local maximum, or an inflection point, we need the sufficient conditions of optimality.

Theorem 2.2

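Suppose at a point $x^*$ the first derivative is zero and the first nonzero higher order derivative is denoted by $n$.

(i) If $n$ is odd, then $x^*$ is a point of inflection.

(ii) If $n$ is even, then $x^*$ is a local optimum. Moreover:

(a) If that derivative is positive, then $x^*$ is a local minimum.

(b) If that derivative is negative, then $x^*$ is a local maximum.

Figure 2.8. Illustration of inflection point.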

This result is easily verified by recourse to the Taylor series expansion given in Eq. (2.1). Since the first nonvanishing higher order derivative is n, Eq. (2.1) reduces to

$$f(x^* + \varepsilon) - f(x^*) = \frac{\varepsilon^n}{n!}\left.\frac{d^nf}{dx^n}\right|_{x=x^*} + O_{n+1}(\varepsilon) \qquad (2.6)$$

If $n$ is odd, then the right-hand side of Eq. (2.6) can be made positive or negative by choosing $\varepsilon$ positive or negative. This implies that, depending on the sign of $\varepsilon$, $f(x^* + \varepsilon) - f(x^*)$ could be positive or negative. Hence the function does not attain a minimum or a maximum at $x^*$, and $x^*$ is an inflection point.

Now, consider the case when $n$ is even. Then the term $\varepsilon^n$ is always positive, and for all $\varepsilon$ sufficiently small the sign of Eq. (2.6) will be dominated by the first term. Hence, if $(d^nf/dx^n)_{x=x^*}$ is positive, $f(x^* + \varepsilon) - f(x^*) > 0$, and $x^*$ corresponds to a local minimum. A similar argument can be applied in the case of a local maximum.

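Applying Theorem 2.2 to the function $f(x) = x^3$ shown in Figure 2.8, we note that

$$\left.\frac{df}{dx}\right|_{x=0} = 0, \qquad \left.\frac{d^2f}{dx^2}\right|_{x=0} = 0, \qquad \left.\frac{d^3f}{dx^3}\right|_{x=0} = 6$$

Thus the first nonvanishing derivative is of order 3 (odd), and $x = 0$ is an inflection point.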
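Theorem 2.2 translates into a short decision procedure once the derivatives at $x^*$ are known. The following is a minimal sketch, assuming the caller supplies the derivative values $f'(x^*), f''(x^*), \ldots$ in order; the function name `classify_stationary_point` and the tolerance parameter `tol` are hypothetical, not from the text.

```python
from typing import Sequence

def classify_stationary_point(derivs: Sequence[float], tol: float = 0.0) -> str:
    """Apply Theorem 2.2 at a point x* with f'(x*) = 0.

    derivs[k] holds the (k+1)-th derivative of f at x*, i.e.
    derivs = [f'(x*), f''(x*), f'''(x*), ...].
    """
    if abs(derivs[0]) > tol:
        raise ValueError("x* is not a stationary point: f'(x*) != 0")
    for n, d in enumerate(derivs[1:], start=2):
        if abs(d) > tol:               # first nonvanishing derivative has order n
            if n % 2 == 1:
                return "inflection point"   # case (i): n odd
            # case (ii): n even; the sign decides minimum vs. maximum
            return "local minimum" if d > 0 else "local maximum"
    raise ValueError("all supplied derivatives vanish; supply higher orders")

print(classify_stationary_point([0, 0, 6]))       # f(x) = x**3 at x = 0
print(classify_stationary_point([0, 0, 0, 24]))   # f(x) = x**4 at x = 0
```

For $f(x) = x^3$ at the origin the derivative list is `[0, 0, 6]`, which yields an inflection point, matching the conclusion above. With floating-point derivatives, a small positive `tol` is usually advisable so that round-off is not mistaken for a nonzero derivative.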

Remark

In the above discussion, we have assumed that the function is always differentiable, in other words, that continuous first derivatives always exist. However, if the function is not differentiable at all points, then even the necessary condition for an unconstrained optimum, that the point is a stationary point, may not hold. For example, consider the piecewise linear function given by

$$f(x) = \begin{cases} x & \text{for } x \le 2 \\ 4 - x & \text{for } x > 2 \end{cases}$$

The function is continuous at all points. But it is not differentiable at the point $x = 2$, and the function attains its maximum at $x = 2$, which is not a stationary point by our definition.
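A one-sided difference check illustrates why the stationary-point test fails here. This sketch assumes a hypothetical step size `h` and a sampling grid on $[0, 4]$ chosen only for illustration: the slope from the left is about $+1$ and the slope from the right is about $-1$, so no derivative exists at $x = 2$, yet a direct comparison of function values still locates the maximum there.

```python
def f(x: float) -> float:
    """Piecewise linear function from the Remark: continuous, with a kink at x = 2."""
    return x if x <= 2 else 4 - x

h = 1e-6
left_slope = (f(2) - f(2 - h)) / h    # one-sided difference from the left
right_slope = (f(2 + h) - f(2)) / h   # one-sided difference from the right
print(left_slope, right_slope)        # roughly +1 and -1: no derivative at x = 2

# Comparing function values on a grid over [0, 4] still finds the maximum at the kink
grid = [i / 100 for i in range(401)]
x_best = max(grid, key=f)
print(x_best, f(x_best))
```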

Example 2.1

Consider the function

$$f(x) = 5x^6 - 36x^5 + \frac{165}{2}x^4 - 60x^3 + 36$$

defined on the real line. The first derivative of this function is

$$\frac{df}{dx} = 30x^5 - 180x^4 + 330x^3 - 180x^2 = 30x^2(x - 1)(x - 2)(x - 3)$$

Clearly, the first derivative vanishes at $x = 0, 1, 2, 3$, and hence these points can be classified as stationary points. The second derivative of $f$ is

$$\frac{d^2f}{dx^2} = 150x^4 - 720x^3 + 990x^2 - 360x$$

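Evaluating this derivative at the four candidate points $x = 0, 1, 2, 3$, we obtain

x     f(x)     d²f/dx²
0     36       0
1     27.5     60
2     44       -120
3     -4.5     540

From the table, $x = 1$ and $x = 3$ are local minima ($d^2f/dx^2 > 0$) and $x = 2$ is a local maximum ($d^2f/dx^2 < 0$). At $x = 0$ the second derivative vanishes, so we evaluate the next higher derivative:

$$\left.\frac{d^3f}{dx^3}\right|_{x=0} = \left(600x^3 - 2160x^2 + 1980x - 360\right)_{x=0} = -360$$

Since this is an odd-order derivative and is nonzero, the point $x = 0$ is not an optimum point but an inflection point.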
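The stationary points, the function and second-derivative values, and the third derivative at $x = 0$ can be checked with exact rational arithmetic, which avoids round-off when testing whether a derivative is exactly zero. A minimal sketch, assuming polynomials are stored as coefficient lists with the constant term first (the helpers `derivative` and `evaluate` are illustrative, not from the text); it also confirms that $f(3) = -9/2$.

```python
from fractions import Fraction

# f(x) = 5x^6 - 36x^5 + (165/2)x^4 - 60x^3 + 36, coefficients lowest degree first
f_coeffs = [Fraction(36), Fraction(0), Fraction(0), Fraction(-60),
            Fraction(165, 2), Fraction(-36), Fraction(5)]

def derivative(coeffs):
    """Coefficients of the derivative polynomial (constant term dropped)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Horner evaluation of a lowest-degree-first coefficient list."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result

d1 = derivative(f_coeffs)   # 30x^5 - 180x^4 + 330x^3 - 180x^2
d2 = derivative(d1)         # 150x^4 - 720x^3 + 990x^2 - 360x
d3 = derivative(d2)         # 600x^3 - 2160x^2 + 1980x - 360

for x in range(4):
    print(x, evaluate(f_coeffs, x), evaluate(d1, x), evaluate(d2, x))
print("third derivative at 0:", evaluate(d3, 0))
```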

The next question is how to determine the global maximum or minimum of a function of one variable. Since the global optimum has to be a local optimum, a simple approach is to compute all local optima and choose the best. An algorithm based on this is given below:

Maximize $f(x)$ subject to $a \le x \le b$

where $a$ and $b$ are practical limits on the values of the variable $x$.

Once the variable is restricted to an interval, note that in addition to the stationary points, the boundary points can also qualify as local optima.

Step 1. Set $df/dx = 0$ and compute all stationary points.

Step 2. Select all stationary points that belong to the interval $[a, b]$. Call them $x_1, x_2, \ldots, x_N$. These points, along with $a$ and $b$, are the only points that can qualify for a local optimum.

Step 3. Find the largest value of $f(x)$ out of $f(a), f(b), f(x_1), \ldots, f(x_N)$. This value is the global maximum, and the point at which it occurs is the global maximum point.

Note: We did not try to classify the stationary points as local minimum, local maximum, or inflection points, which would involve the calculation of higher order derivatives. Instead, it is easier to just compute their functional values and discard all but the best.
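The three steps can be sketched in Python as follows. The text computes the stationary points in Step 1 analytically; as an assumption, this sketch replaces that with a numerical stand-in that scans a uniform grid for sign changes of $df/dx$ and refines each bracket by bisection. The names `stationary_points` and `global_max` and the grid size `n_grid` are hypothetical. A stationary point where the derivative touches zero without changing sign can be missed, which is usually harmless here because such a point is not a strict local maximum.

```python
from typing import Callable, List, Tuple

Func = Callable[[float], float]

def stationary_points(df: Func, a: float, b: float, n_grid: int = 1000) -> List[float]:
    """Numerical stand-in for Step 1: approximate roots of df inside [a, b].

    Scans a uniform grid for sign changes of df and refines each sign-change
    cell by bisection. The endpoint b is never reported; Step 2 adds it anyway.
    """
    xs = [a + (b - a) * i / n_grid for i in range(n_grid + 1)]
    roots = []
    for lo, hi in zip(xs, xs[1:]):
        f_lo, f_hi = df(lo), df(hi)
        if f_lo == 0.0:
            roots.append(lo)                 # grid point is exactly stationary
        elif f_lo * f_hi < 0:
            for _ in range(60):              # bisection: halve the cell 60 times
                mid = 0.5 * (lo + hi)
                f_mid = df(mid)
                if f_lo * f_mid <= 0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
    return roots

def global_max(f: Func, df: Func, a: float, b: float) -> Tuple[float, float]:
    """Steps 2 and 3: compare f at a, b and at the stationary points inside [a, b]."""
    candidates = [a, b] + stationary_points(df, a, b)
    x_best = max(candidates, key=f)
    return x_best, f(x_best)
```

For Example 2.2, `global_max(lambda x: -x**3 + 3*x**2 + 9*x + 10, lambda x: -3*x**2 + 6*x + 9, -2.0, 4.0)` should return a point very close to $x = 3$ with a value very close to 37.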

Example 2.2

Maximize $f(x) = -x^3 + 3x^2 + 9x + 10$ in the interval $-2 \le x \le 4$. Set

$$\frac{df}{dx} = -3x^2 + 6x + 9 = 0$$

Solving this equation, we get $x = 3$ and $x = -1$ as the two stationary points, and both are in the search interval.

To find the global maximum, evaluate $f(x)$ at $x = 3, -1, -2$, and $4$:

$$f(3) = 37 \qquad f(-1) = 5 \qquad f(-2) = 12 \qquad f(4) = 30$$

Hence $x = 3$ maximizes $f$ over the interval $[-2, 4]$.

Instead of evaluating all the stationary points and their functional values, we could use certain properties of the function to determine the global optimum faster. At the end of Section 2.1, we introduced unimodal functions, for which a local optimum is the global optimum. Unfortunately, the definition of unimodal functions does not suggest a simple test for determining whether or not a function is unimodal. However, there exists an important class of unimodal functions in optimization theory, known as convex and concave functions, which can be identified with some simple tests. A review of convex and concave functions and their properties is given in Appendix B.

Example 2.3

Let us examine the properties of the function

$$f(x) = (2x + 1)^2(x - 4), \qquad f'(x) = (2x + 1)(6x - 15), \qquad f''(x) = 24(x - 1)$$

For all $x \le 1$, $f''(x) \le 0$, and the function is concave in this region. For all $x \ge 1$, $f''(x) \ge 0$, and the function is convex in the region.

Note that the function has two stationary points, $x = -\tfrac{1}{2}$ and $x = \tfrac{5}{2}$. Since $f''(-\tfrac{1}{2}) < 0$, the function has a local maximum at $x = -\tfrac{1}{2}$. At $x = \tfrac{5}{2}$, $f''(\tfrac{5}{2}) > 0$, and the function attains a local minimum at this point. If we restricted the region of interest to $x \le 1$, then $f(x)$ attains the global maximum at $x = -\tfrac{1}{2}$, since $f(x)$ is concave in this region and $x = -\tfrac{1}{2}$ is a local maximum.

Similarly, if we restricted the region of interest to $x \ge 1$, then $f(x)$ attains the global minimum at $x = \tfrac{5}{2}$. However, over the entire range of $x$ from $-\infty$ to $+\infty$, $f(x)$ has no finite global maximum or minimum.
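A sampling check, not a proof, can illustrate these conclusions. The sketch below assumes evenly spaced samples on $[-10, 1]$ for the concave region and on $[1, 10]$ for the convex region (both ranges are arbitrary choices): $f''$ has the expected sign on each region, the local maximum at $x = -\tfrac{1}{2}$ dominates every sample with $x \le 1$, and the local minimum at $x = \tfrac{5}{2}$ lies below every sample with $x \ge 1$. The last line picks points outside each region that beat these values, consistent with the absence of a finite global optimum.

```python
def f(x: float) -> float:
    return (2 * x + 1) ** 2 * (x - 4)

def f2(x: float) -> float:
    """Second derivative, f''(x) = 24(x - 1)."""
    return 24 * (x - 1)

left = [i / 100 for i in range(-1000, 101)]   # samples of x in [-10, 1]
right = [i / 100 for i in range(100, 1001)]   # samples of x in [1, 10]

# Concave on x <= 1 and convex on x >= 1
print(all(f2(x) <= 0 for x in left), all(f2(x) >= 0 for x in right))
# Local max at -1/2 beats every sample in the concave region;
# local min at 5/2 is below every sample in the convex region
print(all(f(x) <= f(-0.5) for x in left), all(f(x) >= f(2.5) for x in right))
# Outside those regions both points are beaten: no finite global optimum
print(f(100) > f(-0.5), f(-100) < f(2.5))
```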

Example 2.4 Inventory Control

Many firms maintain an inventory of goods to meet future demand. Among the reasons for holding inventory is to avoid the time and cost of constant replenishment. On the other hand, to replenish only infrequently would imply large inventories that would tie up unnecessary capital and incur huge storage

Figure 2.9. Inventory problem.

inventory is penalized by assuming that each unit will cost $h to store for one year. To keep things simple, we will assume that all demand must be met immediately (i.e., no back orders are allowed) and that replenishment occurs instantaneously as soon as an order is sent.

Figure 2.9 graphically illustrates the change in the inventory level with respect to time. Starting at any time A with an inventory of B, the inventory level will decrease at the rate of $\lambda$ units per unit time until it becomes zero at time C, when a fresh order is placed.

The triangle ABC represents one inventory cycle that will be repeated throughout the year. The problem is to determine the optimal order quantity B, denoted by variable $Q$, and the common length of time $C - A$, denoted by $T$, between reorders.

Since $T$ is just the length of time required to deplete $Q$ units at rate $\lambda$, we get

$$T = \frac{Q}{\lambda}$$

The only remaining problem is to determine Q. Note that when Q is small, T will be small, implying more frequent reorders during the year. This will result in higher reorder costs but lower inventory holding cost. On the other hand, a large inventory (large Q) will result in a higher inventory cost but lower reorder costs. The basic inventory control problem is to determine the optimal value of Q that will minimize the sum of the inventory cost and reorder cost in a year.

We shall now develop the necessary mathematical expression to optimize the yearly cost (cost/cycle × number of cycles/year).

$$\text{Number of cycles (reorders)/year} = \frac{1}{T} = \frac{\lambda}{Q}$$

$$\text{Cost per cycle} = \text{reorder cost} + \text{inventory cost} = (K + cQ) + \left(\frac{Q}{2}\right)hT = K + cQ + \frac{hQ^2}{2\lambda}$$

Note: The inventory cost per cycle is simply the cost of holding an average inventory of Q / 2 for a length of time T.

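Thus, the yearly cost to be minimized is

$$f(Q) = \frac{\lambda K}{Q} + \lambda c + \frac{hQ}{2}$$

$$f'(Q) = -\frac{\lambda K}{Q^2} + \frac{h}{2}$$

$$f''(Q) = \frac{2\lambda K}{Q^3} > 0 \quad \text{for all } Q > 0$$

Hence $f(Q)$ is a convex function, and if there exists a positive $Q^*$ such that $f'(Q^*) = 0$, then $Q^*$ minimizes $f(Q)$.

Solving $f'(Q) = 0$, we get

$$-\frac{\lambda K}{(Q^*)^2} + \frac{h}{2} = 0$$

Thus, the optimal order quantity is

$$Q^* = \sqrt{\frac{2\lambda K}{h}}$$

and

$$T^* = \text{time between reorders} = \frac{Q^*}{\lambda} = \sqrt{\frac{2K}{\lambda h}}$$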

The parameter Q* is the famous economic order quantity, or EOQ, used frequently in inventory control.
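The EOQ formulas are straightforward to evaluate. The sketch below uses made-up data (a demand rate of 1000 units per year, K = 50 dollars per order, c = 5 dollars per unit, h = 2 dollars per unit per year); these numbers, and the helper names `yearly_cost` and `eoq`, are hypothetical. The demand rate is spelled `lam` because `lambda` is a Python keyword.

```python
import math
from typing import Tuple

def yearly_cost(Q: float, lam: float, K: float, c: float, h: float) -> float:
    """Yearly cost f(Q) = lam*K/Q + lam*c + h*Q/2 from Example 2.4."""
    return lam * K / Q + lam * c + h * Q / 2

def eoq(lam: float, K: float, h: float) -> Tuple[float, float]:
    """Return (Q*, T*) with Q* = sqrt(2*lam*K/h) and T* = Q*/lam."""
    q_star = math.sqrt(2 * lam * K / h)
    return q_star, q_star / lam

# Hypothetical data: demand 1000 units/year, K = 50, c = 5, h = 2 (dollars)
lam, K, c, h = 1000.0, 50.0, 5.0, 2.0
q_star, t_star = eoq(lam, K, h)
print(f"Q* = {q_star:.1f} units, T* = {t_star:.3f} years")

# Convexity in action: moving 10% away from Q* in either direction costs more
best = yearly_cost(q_star, lam, K, c, h)
print(best < yearly_cost(0.9 * q_star, lam, K, c, h),
      best < yearly_cost(1.1 * q_star, lam, K, c, h))
```

At $Q^*$ the yearly reorder cost $\lambda K/Q^*$ equals the yearly holding cost $hQ^*/2$, which follows directly from $f'(Q^*) = 0$; the per-unit cost $c$ only shifts $f(Q)$ by the constant $\lambda c$ and therefore does not affect $Q^*$.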

Figure 2.10. Case (i) and case (ii) of Theorem 2.3.

velop a number of single-variable search methods for locating the optimal point in a given interval. Search methods that locate a single-variable optimum by successively eliminating subintervals so as to reduce the remaining interval of search are called region elimination methods.

In Section 2.1, we introduced the definition of unimodal functions. Unimodality is an extremely important functional property, and virtually all the single-variable search methods currently in use require at least the assumption that within the domain of interest the function is unimodal. The usefulness of this property lies in the fact that if $f(x)$ is unimodal, then it is only necessary to compare $f(x)$ at two different points to predict in which of the subintervals defined by those two points the optimum does not lie.

Theorem 2.3

Suppose $f$ is strictly unimodal on the interval $a \le x \le b$ with a minimum at $x^*$. Let $x_1$ and $x_2$ be two points in the interval such that $a < x_1 < x_2 < b$.

Comparing the functional values at $x_1$ and $x_2$, we can conclude:

(i) If $f(x_1) > f(x_2)$, then the minimum of $f(x)$ does not lie in the interval $(a, x_1)$. In other words, $x^* \in (x_1, b)$ (see Figure 2.10).

(ii) If $f(x_1) < f(x_2)$, then the minimum does not lie in the interval $(x_2, b)$, or $x^* \in (a, x_2)$ (see Figure 2.10).

A function is strictly unimodal if it is unimodal and has no intervals of finite length in which the function is of constant value.

Proof

Consider case (i), where $f(x_1) > f(x_2)$. Suppose we assume the contrary, that $a \le x^* \le x_1$. Since $x^*$ is the minimum point, by definition we have $f(x^*) \le f(x)$ for all $x \in (a, b)$. This implies that

$$f(x^*) \le f(x_1) > f(x_2) \qquad \text{with } x^* \le x_1 < x_2$$

But this is impossible, because the function has to be monotonic on either side of $x^*$ by the unimodality of $f(x)$: to the right of $x^*$ it cannot decrease, so $x^* \le x_1 < x_2$ would force $f(x_1) \le f(x_2)$. Thus, the theorem is proved by contradiction. A similar argument holds for case (ii).

Note: When $f(x_1) = f(x_2)$, we could eliminate both ends, $(a, x_1)$ and $(x_2, b)$, and the minimum must occur in the interval $(x_1, x_2)$, provided $f(x)$ is strictly unimodal.
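A single elimination step of Theorem 2.3, including the equal-value case from the note, can be written as a small function. This is a sketch for minimization that assumes $f$ is strictly unimodal on $[a, b]$; the name `eliminate` is hypothetical.

```python
from typing import Callable, Tuple

def eliminate(f: Callable[[float], float], a: float, b: float,
              x1: float, x2: float) -> Tuple[float, float]:
    """One application of Theorem 2.3 (minimization, f strictly unimodal on [a, b]).

    Returns the subinterval that must still contain the minimum x*.
    """
    if not a < x1 < x2 < b:
        raise ValueError("need a < x1 < x2 < b")
    f1, f2 = f(x1), f(x2)
    if f1 > f2:
        return x1, b      # case (i): x* cannot lie in (a, x1)
    if f1 < f2:
        return a, x2      # case (ii): x* cannot lie in (x2, b)
    return x1, x2         # equal values: drop both (a, x1) and (x2, b)

print(eliminate(lambda x: (x - 2) ** 2, 0, 5, 1, 3))
```

Applying it repeatedly with new trial points inside the returned interval shrinks the bracket; the interval refinement methods described below are systematic ways of choosing those points.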

By means of Theorem 2.3, sometimes called the elimination property, one can organize a search in which the optimum is found by recursively eliminating sections of the initial bounded interval. When the remaining subinterval is reduced to a sufficiently small length, the search is terminated. Note that without the elimination property, nothing less than an exhaustive search would suffice. The greatest advantage of these search methods is that they require only functional evaluations. The optimization functions need not be differentiable. As a matter of fact, the functions need not be in a mathematical or analytical form. All that is required is that, given a point $x$, the value of the function $f(x)$ can be determined by direct evaluation or by a simulation experiment. Generally, these search methods can be broken down into two phases:

Bounding Phase. An initial coarse search that will bound or bracket the optimum.

Interval Refinement Phase. A finite sequence of interval reductions or refinements to reduce the initial search interval to the desired accuracy.

2.3.1 Bounding Phase

In the initial phase, starting at some selected trial point, the optimum is roughly bracketed within a finite interval by using the elimination property.

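Typically, this bounding search is conducted using some heuristic expanding pattern, although extrapolation methods have also been devised. An example of an expanding pattern is Swann's method [1], in which the $(k+1)$st test point is generated using the recursion

$$x_{k+1} = x_k + 2^k \Delta \qquad \text{for } k = 0, 1, 2, \ldots$$

where $x_0$ is an arbitrarily selected starting point and $\Delta$ is a step-size parameter of suitably chosen magnitude. The sign of $\Delta$ is determined by comparing $f(x_0)$, $f(x_0 + |\Delta|)$, and $f(x_0 - |\Delta|)$. If

$$f(x_0 - |\Delta|) \ge f(x_0) \ge f(x_0 + |\Delta|)$$

then, because of the unimodality assumption, the minimum must lie to the right of $x_0$, and $\Delta$ is chosen to be positive. If the inequalities are reversed, $\Delta$ is chosen to be negative; if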

$$f(x_0 - |\Delta|) \ge f(x_0) \le f(x_0 + |\Delta|)$$

the minimum has been bracketed between $x_0 - |\Delta|$ and $x_0 + |\Delta|$ and the bounding search can be terminated. The remaining case,

$$f(x_0 - |\Delta|) \le f(x_0) \ge f(x_0 + |\Delta|)$$

is ruled out by the unimodality assumption. However, occurrence of the above condition indicates that the given function is not unimodal.
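The bounding phase can be sketched as follows, assuming $f$ is unimodal; the function name `swann_bracket` and the safety cap `max_iter` are hypothetical additions. It first uses the three comparisons above to fix the sign of $\Delta$ (or to stop immediately), then takes steps $x_{k+1} = x_k + 2^k\Delta$ while $f$ keeps decreasing, and returns the interval from the point just before the best one found to the first point at which $f$ stops decreasing.

```python
from typing import Callable, Tuple

def swann_bracket(f: Callable[[float], float], x0: float, step: float,
                  max_iter: int = 100) -> Tuple[float, float]:
    """Bounding phase using Swann's recursion x_{k+1} = x_k + 2**k * delta."""
    step = abs(step)
    f_minus, f0, f_plus = f(x0 - step), f(x0), f(x0 + step)
    if f_minus >= f0 <= f_plus:
        return x0 - step, x0 + step          # minimum already bracketed
    if f_minus <= f0 >= f_plus:
        raise ValueError("f is not unimodal near x0")
    delta = step if f0 > f_plus else -step   # step in the downhill direction
    x_prev, x_curr, f_curr = x0, x0 + delta, f(x0 + delta)
    for k in range(1, max_iter):
        x_next = x_curr + 2 ** k * delta
        f_next = f(x_next)
        if f_next >= f_curr:                 # f stopped decreasing: bracket found
            return min(x_prev, x_next), max(x_prev, x_next)
        x_prev, x_curr, f_curr = x_curr, x_next, f_next
    raise RuntimeError("no bracket found within max_iter steps")
```

For Example 2.5, `swann_bracket(lambda x: (100 - x) ** 2, 30, 5)` reproduces the bracket (65, 185).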

Example 2.5

Consider the problem of minimizing $f(x) = (100 - x)^2$ given the starting point $x_0 = 30$ and a step size $|\Delta| = 5$.

The sign of $\Delta$ is determined by comparing

$$f(x_0) = f(30) = 4900 \qquad f(x_0 + |\Delta|) = f(35) = 4225 \qquad f(x_0 - |\Delta|) = f(25) = 5625$$

Since

$$f(x_0 - |\Delta|) \ge f(x_0) \ge f(x_0 + |\Delta|)$$

$\Delta$ must be positive, and the minimum point $x^*$ must be greater than 30. Thus, $x_1 = x_0 + \Delta = 35$.

Next,

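$$x_2 = x_1 + 2\Delta = 45, \qquad f(45) = 3025 < f(x_1)$$

therefore, $x^* > 35$;

$$x_3 = x_2 + 2^2\Delta = 65, \qquad f(65) = 1225 < f(x_2)$$

therefore, $x^* > 45$;

$$x_4 = x_3 + 2^3\Delta = 105, \qquad f(105) = 25 < f(x_3)$$

therefore, $x^* > 65$;

$$x_5 = x_4 + 2^4\Delta = 185, \qquad f(185) = 7225 > f(x_4)$$

therefore, $x^* < 185$. Consequently, in six evaluations $x^*$ has been bracketed within the interval

$$65 \le x^* \le 185$$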

Note that the effectiveness of the bounding search depends directly on the step size $\Delta$. If $\Delta$ is large, a poor bracket, that is, a large initial interval, is obtained. On the other hand, if $\Delta$ is small, many evaluations may be necessary before a bound can be established.

2.3.2 Interval Refinement Phase

Once a bracket has been established around the optimum, more sophisticated interval reduction schemes can be applied to obtain a refined estimate of the optimum point. The amount of subinterval eliminated at each step depends on the location of the trial points $x_1$ and $x_2$ within the search interval.

Since we have no prior knowledge of the location of the optimum, it is reasonable to place the trial points so that, regardless of the outcome, the interval is reduced by the same amount. Moreover, in the interest of efficiency, that same amount should be as large as possible. This is sometimes called the minimax criterion of search strategy.

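Interval Halving. This method deletes exactly one-half the interval at each stage. This is also called a three-point equal-interval search since it works with three equally spaced trial points in the search interval. The basic steps of the search procedure for finding the minimum of a function $f(x)$ over the interval $(a, b)$ are as follows:

Step 1. Let $x_m = (a + b)/2$ and $L = b - a$. Compute $f(x_m)$.

Step 2. Set $x_1 = a + L/4$ and $x_2 = b - L/4$. Note that the points $x_1$, $x_m$, and $x_2$ are all equally spaced at one-fourth the interval. Compute $f(x_1)$ and $f(x_2)$.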

Step 3. Compare $f(x_1)$ and $f(x_m)$.

(i) If $f(x_1) < f(x_m)$, then drop the interval $(x_m, b)$ by setting $b = x_m$. The midpoint of the new search interval will now be $x_1$. Hence, set $x_m = x_1$. Go to step 5.

(ii) If $f(x_1) \ge f(x_m)$, go to step 4.

Step 4. Compare $f(x_2)$ and $f(x_m)$.

(i) If $f(x_2) < f(x_m)$, drop the interval $(a, x_m)$ by setting $a = x_m$. Since the midpoint of the new interval will now be $x_2$, set $x_m = x_2$. Go to step 5.

(ii) If $f(x_2) \ge f(x_m)$, drop the intervals $(a, x_1)$ and $(x_2, b)$. Set $a = x_1$ and $b = x_2$. Note that $x_m$ continues to be the midpoint of the new interval. Go to step 5.

Step 5. Compute $L = b - a$. If $L$ is small, terminate. Otherwise return to step 2.
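Steps 1 to 5 can be sketched as a loop. This is a minimal Python version, assuming $f$ is unimodal on $(a, b)$ and using a hypothetical tolerance `tol` as the meaning of "L is small"; the name `interval_halving` is also illustrative. Each pass evaluates $f$ at the two new quarter points, while $f(x_m)$ is carried over from the previous pass.

```python
from typing import Callable, Tuple

def interval_halving(f: Callable[[float], float], a: float, b: float,
                     tol: float = 1e-6) -> Tuple[float, float, float]:
    """Interval halving for the minimum of a unimodal f on (a, b).

    Returns the final interval (a, b) and its midpoint x_m.
    """
    # Step 1: midpoint, interval length, and f at the midpoint
    xm, L = 0.5 * (a + b), b - a
    fm = f(xm)
    while L > tol:
        # Step 2: quarter points, equally spaced with x_m
        x1, x2 = a + L / 4, b - L / 4
        f1, f2 = f(x1), f(x2)
        if f1 < fm:
            # Step 3(i): drop (x_m, b); x1 becomes the new midpoint
            b, xm, fm = xm, x1, f1
        elif f2 < fm:
            # Step 4(i): drop (a, x_m); x2 becomes the new midpoint
            a, xm, fm = xm, x2, f2
        else:
            # Step 4(ii): drop (a, x1) and (x2, b); x_m stays the midpoint
            a, b = x1, x2
        # Step 5: new interval length, tested by the loop condition
        L = b - a
    return a, b, xm
```

Starting from the bracket (65, 185) obtained in Example 2.5 for $f(x) = (100 - x)^2$, the returned midpoint should lie within `tol` of the minimizer $x^* = 100$.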

Remarks

1. At each stage of the algorithm, exactly half the length of the search
