Two new search techniques - Black Box Simulation Optimization: Generalized Response Surface Met

We consider a lower, one-sided 1 − α confidence interval for the predictor based on (2.2), given d. This interval ranges from infinity down to

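$$\hat{y}_{\min}(d) = \hat{y}(d) - t^{\alpha}_{N-q}\,\hat{\sigma}(\hat{y}, d) = (1, d^T)\hat{\beta} - t^{\alpha}_{N-q}\,\hat{\sigma}_{\varepsilon}\left[(1, d^T)(X^T X)^{-1}\begin{pmatrix}1\\ d\end{pmatrix}\right]^{1/2} \tag{2.9}$$

where $t^{\alpha}_{N-q}$ denotes the $1-\alpha$ quantile of the $t$ distribution with $N-q$ degrees of freedom, and (2.4) through (2.7) lead to

$$\hat{\sigma}(\hat{y}, d) = \left(\widehat{\mathrm{var}}[\hat{y}(d) \mid d]\right)^{1/2} = \hat{\sigma}_{\varepsilon}\left[(1, d^T)(X^T X)^{-1}\begin{pmatrix}1\\ d\end{pmatrix}\right]^{1/2}.$$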

The first term in (2.9) concerns the signal, whereas the second term concerns the noise. When we consider a set of d values, then the set of intervals following from (2.9) has a joint (simultaneous) probability lower than 1 − α. This complication is ignored in our two techniques.
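As a minimal sketch, the lower bound in (2.9) can be evaluated as follows. The function name, the argument names, and all numbers are hypothetical; the $t$ quantile is assumed to be supplied by the caller (for example, from a statistical table), because the Python standard library has no $t$ distribution.

```python
import math


def predictor_lower_bound(d, beta_hat, xtx_inv, sigma_eps, t_quantile):
    """Lower 1 - alpha confidence bound of (2.9) for the predictor at input d.

    d          : list of k input values
    beta_hat   : list of q = k + 1 estimated effects (intercept first)
    xtx_inv    : (X^T X)^{-1} as a q x q nested list
    sigma_eps  : estimated noise standard deviation
    t_quantile : 1 - alpha quantile of t with N - q degrees of freedom,
                 supplied by the caller (assumption of this sketch)
    """
    z = [1.0] + list(d)  # the vector (1, d^T)^T
    signal = sum(zi * bi for zi, bi in zip(z, beta_hat))
    # Quadratic form (1, d^T) (X^T X)^{-1} (1, d^T)^T
    quad = sum(z[i] * xtx_inv[i][j] * z[j]
               for i in range(len(z)) for j in range(len(z)))
    noise = t_quantile * sigma_eps * math.sqrt(quad)
    return signal - noise


# Hypothetical numbers: N = 4 runs of an orthogonal design with k = 2 inputs.
xtx_inv = [[0.25, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.25]]
bound_at_center = predictor_lower_bound(
    d=[0.0, 0.0], beta_hat=[10.0, 2.0, 1.0],
    xtx_inv=xtx_inv, sigma_eps=2.0, t_quantile=1.5)
print(bound_at_center)  # 10 - 1.5 * 2 * sqrt(0.25) = 8.5
```

The first term (`signal`) grows linearly in `d`, whereas the second term (`noise`) grows with the distance from the point of minimal predictor variance, which is what bounds the step in the two techniques.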

• Technique 1 (ASA) finds (say) d+ which is the d that maximizes the minimum output predicted through (2.9). This d+ gives both a search direction and a step size. First we prove in Appendix 2 that the objective function in (2.9) is concave in d. Next in Appendix 3 we derive the following explicit solution for the optimal input values of the next observation:

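$$d^{+} = -C^{-1}b + \lambda C^{-1}\hat{\beta}_{-0} \tag{2.10}$$

where $C^{-1}\hat{\beta}_{-0}$ is the ASA direction, and $\lambda$ the step size specified by

$$\lambda = \left[\frac{a - b^T C^{-1} b}{(t^{\alpha}_{N-q}\hat{\sigma}_{\varepsilon})^2 - \hat{\beta}_{-0}^T C^{-1}\hat{\beta}_{-0}}\right]^{1/2}. \tag{2.11}$$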

We point out that when deriving this step size, we assume that the local regression model provides some guidance outside the local region currently explored. (Angün et al. (2003) use the local regression model to make a relatively big step along the search path, and then check whether that step should be reduced.)

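• Technique 2 still maximizes $\hat{y}_{\min}(d)$, but the new point is restricted to the SA path; that is, the search direction is specified by the estimated local gradient, $\hat{\beta}_{-0}$. In Appendix 4, we derive the optimal step size (say) $\zeta^{+}$ along this path:

$$\zeta^{+} = \left[\frac{a - b^T C^{-1} b}{\left(\dfrac{\hat{\beta}_{-0}^T C \hat{\beta}_{-0}}{\hat{\beta}_{-0}^T \hat{\beta}_{-0}}\, t^{\alpha}_{N-q}\hat{\sigma}_{\varepsilon}\right)^2 - \hat{\beta}_{-0}^T C \hat{\beta}_{-0}}\right]^{1/2}. \tag{2.12}$$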
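The two techniques can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' implementation: $a$, $b$, and $C$ are taken to be the blocks of $(X^T X)^{-1}$ (scalar, vector, and matrix, which is how the text uses them), `t_sigma` stands for the product $t^{\alpha}_{N-q}\hat{\sigma}_{\varepsilon}$, and all function names and numbers are hypothetical.

```python
import math


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def asa_point(a, b, C, beta, t_sigma):
    """Technique 1: d+ of (2.10) with the step size lambda of (2.11).

    Returns None when the denominator of (2.11) is not positive,
    i.e. when (2.11) gives no finite solution.
    """
    Cinv_b = solve(C, b)
    Cinv_beta = solve(C, beta)
    denom = t_sigma ** 2 - dot(beta, Cinv_beta)
    if denom <= 0:
        return None
    lam = math.sqrt((a - dot(b, Cinv_b)) / denom)
    return [-cb + lam * cbeta for cb, cbeta in zip(Cinv_b, Cinv_beta)]


def sa_step_size(a, b, C, beta, t_sigma):
    """Technique 2: step size zeta+ of (2.12) along the SA direction beta."""
    bCb = dot(beta, matvec(C, beta))
    denom = (bCb / dot(beta, beta) * t_sigma) ** 2 - bCb
    if denom <= 0:
        return None
    return math.sqrt((a - dot(b, solve(C, b))) / denom)


# Hypothetical non-orthogonal case with k = 2 inputs.
a, b, C = 0.3, [0.05, -0.02], [[0.2, 0.05], [0.05, 0.4]]
beta, t_sigma = [1.0, 0.5], 3.0
d_plus = asa_point(a, b, C, beta, t_sigma)
zeta = sa_step_size(a, b, C, beta, t_sigma)
print("ASA point d+:", d_plus)
print("SA step size zeta+:", zeta)
```

Both functions return `None` rather than raising an error when the expression under the square root is not positive; that design choice simply flags the case in which no finite step exists.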

We derive the following mathematical properties and interpretations of these two techniques.

The first term in (2.10) means that the ASA path starts from the point with minimal predictor variance, namely $-C^{-1}b$ (also see end of §2.2). The second term means that the ASA path adjusts the classic SA direction $\hat{\beta}_{-0}$ (second term's last factor) through the covariance matrix of $\hat{\beta}_{-0}$, which is $\sigma^2_{\varepsilon}C$ (see §2.2, last paragraph). Finally, the step size $\lambda$ is quantified in (2.11).

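For the orthogonal case (i.e., $X^T X = N I_{q \times q}$), it is easy to verify that $a = 1/N$, $b = 0$, and $C = I/N$, so (2.10) reduces to

$$d^{+} = \left[\frac{1}{(t^{\alpha}_{N-q}\hat{\sigma}_{\varepsilon})^2/N - \hat{\beta}_{-0}^T\hat{\beta}_{-0}}\right]^{1/2}\hat{\beta}_{-0}. \tag{2.13}$$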

This solution implies identical search directions for ASA and SA in case of orthogonality. Moreover, for the orthogonal case we prove in Appendix 4 that the two techniques coincide (both the search direction and the step size are the same), provided SA starts from the design center.
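The reduction to (2.13) can be checked by direct substitution. The worked steps below are only a sketch of that check (not the proof of Appendix 4); they use nothing beyond (2.10), (2.11), and (2.12) with $a = 1/N$, $b = 0$, and $C = I/N$.

```latex
% Orthogonal case: a = 1/N, b = 0, C = I/N, hence C^{-1} = N I.
\begin{align*}
\lambda &= \left[\frac{1/N}{(t^{\alpha}_{N-q}\hat{\sigma}_{\varepsilon})^2
           - N\,\hat{\beta}_{-0}^T\hat{\beta}_{-0}}\right]^{1/2}
  && \text{from (2.11)} \\
d^{+} &= \lambda N \hat{\beta}_{-0}
  = \left[\frac{N}{(t^{\alpha}_{N-q}\hat{\sigma}_{\varepsilon})^2
           - N\,\hat{\beta}_{-0}^T\hat{\beta}_{-0}}\right]^{1/2}\hat{\beta}_{-0}
  = \left[\frac{1}{(t^{\alpha}_{N-q}\hat{\sigma}_{\varepsilon})^2/N
           - \hat{\beta}_{-0}^T\hat{\beta}_{-0}}\right]^{1/2}\hat{\beta}_{-0}
  && \text{from (2.10)} \\
\zeta^{+} &= \left[\frac{1/N}{(t^{\alpha}_{N-q}\hat{\sigma}_{\varepsilon}/N)^2
           - \hat{\beta}_{-0}^T\hat{\beta}_{-0}/N}\right]^{1/2}
  = \left[\frac{1}{(t^{\alpha}_{N-q}\hat{\sigma}_{\varepsilon})^2/N
           - \hat{\beta}_{-0}^T\hat{\beta}_{-0}}\right]^{1/2}
  && \text{from (2.12)}
\end{align*}
```

So the scalar that multiplies $\hat{\beta}_{-0}$ is the same for both techniques in this case.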

In practice, however, designs are not orthogonal. The classic textbooks on Design Of Experiments (DOE) and RSM do present many orthogonal designs (for example, $2^{k-p}$ designs), but these designs use standardized inputs (say) $t_j$; that is, inputs ranging between −1 and +1, with an average value of zero. In practice, we apply the following linear transformation to obtain original inputs $d_j$ that range between $L_j$ and $H_j$:

$$d_j = f_j + g_j t_j \quad\text{with}\quad f_j = \frac{L_j + H_j}{2};\quad g_j = \frac{L_j - H_j}{2}. \tag{2.14}$$

Consequently, the first-order polynomial regression model (2.2) implies that $\beta_j$ and $\phi_j$ (the main effects of the original and standardized inputs, respectively) are related as follows: $\beta_j = \phi_j/g_j$. Hence, the steepest ascent path directions for the original and the standardized inputs differ (unless $\forall j: g_j = 1$). (The interpretation of standardization is controversial in mathematical statistics; see the many references in Kleijnen (1987, pp. 221, 345).)
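A small numerical illustration of why the two SA directions differ. The function names and the values of $\phi_j$ and $g_j$ are hypothetical; the scale factors $g_j$ are passed in directly, and $f_j$ is omitted because a shift does not change a direction.

```python
def original_effects(phi, g):
    """Main effects beta_j of the original inputs: beta_j = phi_j / g_j."""
    return [phi_j / g_j for phi_j, g_j in zip(phi, g)]


def standardized_direction_in_original_units(phi, g):
    """Map a step along the standardized gradient phi to original units.

    A step t_j = c * phi_j changes d_j = f_j + g_j * t_j by c * g_j * phi_j.
    """
    return [g_j * phi_j for phi_j, g_j in zip(phi, g)]


# Hypothetical effects and scale factors for k = 2 inputs.
phi = [1.0, 1.0]
g = [1.0, 10.0]
print(original_effects(phi, g))                          # [1.0, 0.1]
print(standardized_direction_in_original_units(phi, g))  # [1.0, 10.0]
```

SA on the original inputs moves along (1, 0.1), whereas SA on the standardized inputs, expressed in original units, moves along (1, 10).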

We prove in Appendix 5 that ASA is scale independent. So ASA is not affected by switching from (say) inches to centimeters when measuring inputs. Driessen et al. (2001) prove that ASA is also independent of linear transformations with $f_j \neq 0$ in (2.14).
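An informal sketch of why rescaling leaves the ASA point unaffected; this is not the proof of Appendix 5. It assumes that the inputs are rescaled by a nonsingular diagonal matrix $G$ (a symbol introduced here for illustration only), and that $a$, $b$, and $C$ are the blocks of $(X^T X)^{-1}$. Primes denote quantities after rescaling.

```latex
% Rescaling d' = G d, so that X' = X T with T = diag(1, G^{-1})^{-1} = diag(1, G).
\begin{align*}
(X'^T X')^{-1} &= T^{-1}(X^T X)^{-1}T^{-1}
  \;\Rightarrow\; a' = a,\quad b' = G^{-1}b,\quad C' = G^{-1} C G^{-1}, \\
\hat{\beta}'_{-0} &= G^{-1}\hat{\beta}_{-0}, \qquad
  \hat{\sigma}'_{\varepsilon} = \hat{\sigma}_{\varepsilon}
  \quad\text{(the fitted values do not change)}, \\
C'^{-1}b' &= G C^{-1} b, \qquad
  C'^{-1}\hat{\beta}'_{-0} = G C^{-1}\hat{\beta}_{-0}, \\
b'^T C'^{-1} b' &= b^T C^{-1} b, \qquad
  \hat{\beta}'^T_{-0} C'^{-1}\hat{\beta}'_{-0} = \hat{\beta}_{-0}^T C^{-1}\hat{\beta}_{-0}
  \;\Rightarrow\; \lambda' = \lambda, \\
d'^{+} &= -C'^{-1}b' + \lambda' C'^{-1}\hat{\beta}'_{-0}
  = G\left(-C^{-1}b + \lambda C^{-1}\hat{\beta}_{-0}\right) = G\,d^{+}.
\end{align*}
```

The new point in the rescaled units is therefore the rescaled version of the original new point, so both describe the same physical input combination.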

In case of large signal/noise ratios (defined in (2.1)), the denominator under the square root in (2.11) is negative. So this equation does not give a finite solution for $d^{+}$; that is, (2.9) can be driven to infinity (unbounded solution). Indeed, if the noise is negligible, we have a deterministic problem, which our technique is not meant to address (many other researchers, including Conn, Gould, and Toint (2000), study optimization of deterministic simulation models).

In case of a small signal/noise ratio, no step is taken. Actually, we distinguish two cases: (i) the signal is small, (ii) the noise is big. These two cases are discussed next.

In case (i), the signal may be small because the first-order polynomial approximation is bad. Then we should switch to an alternative metamodel using transformations of $d_j$ such as $\log(d_j)$ or $1/d_j$ (inexpensive alternative), a second-order polynomial, which adds $d_j^2$ and $d_j d_{j'}$ with $j' > j$ (expensive because many more observations are required to estimate the corresponding effects), etc.; see the RSM literature (for example, Irizarry, Wilson, and Trevino (2001)).
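To make the cost of the second-order polynomial concrete, the following sketch builds the regressors of both models and counts the effects to be estimated. The function names and the input values are hypothetical.

```python
def first_order_terms(d):
    """Regressors of the first-order polynomial: 1, d_1, ..., d_k."""
    return [1.0] + list(d)


def second_order_terms(d):
    """Regressors of the second-order polynomial.

    Adds the squares d_j^2 and the cross-products d_j * d_j' (j' > j)
    to the first-order regressors.
    """
    k = len(d)
    squares = [d[j] ** 2 for j in range(k)]
    cross = [d[j] * d[jp] for j in range(k) for jp in range(j + 1, k)]
    return first_order_terms(d) + squares + cross


d = [2.0, 3.0, 5.0]  # hypothetical input combination with k = 3
print(len(first_order_terms(d)))   # 4 effects
print(len(second_order_terms(d)))  # 10 effects
```

For $k$ inputs the first-order model has $k + 1$ effects, whereas the second-order model has $1 + 2k + k(k-1)/2$, so the number of observations needed grows quadratically in $k$.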

In case (ii), however, the first-order polynomial may fit, but the intrinsic noise may be high (also see the comment below (2.8)). To decrease this noise, we should increase the number of observations, $N$; see the denominator in (2.8). Hence, we should increase either $n$ or $m_i$ (see the definitions below (2.3)). When our technique gives a value $d^{+}$ that is “close” to one of the old points, then in practice we may increase $m_i$. Otherwise we observe a new combination: we increase $n$. So our technique suggests an approach to the old problem of how to choose between either using the next observation to increase the accuracy of the current local approximation, or trusting that approximation and moving into a new area. A different approach is discussed in Kleijnen (1975, p. 360). In the literature on maximizing the output of deterministic simulation, this is called the geometry improvement problem; see Conn, Gould, and Toint (2000). More research on this problem is needed.
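One way to turn this rule of thumb into code is sketched below. The text does not define “close”; the Euclidean distance and the `tolerance` threshold are assumptions made purely for illustration, as are the function name and the example points. The list of old points is assumed to be non-empty.

```python
import math


def choose_next_observation(d_plus, old_points, tolerance):
    """Decide between replicating an old point and observing a new one.

    Returns ("replicate", i) if d_plus lies within `tolerance` (Euclidean
    distance, an assumption of this sketch) of old point i, which
    corresponds to increasing m_i; otherwise returns ("new", None),
    which corresponds to increasing n.
    """
    distances = [math.dist(d_plus, p) for p in old_points]
    nearest = min(range(len(old_points)), key=distances.__getitem__)
    if distances[nearest] <= tolerance:
        return ("replicate", nearest)
    return ("new", None)


# Hypothetical 2^2 design and two candidate points.
old_points = [[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
print(choose_next_observation([0.95, 1.02], old_points, tolerance=0.1))
print(choose_next_observation([2.0, 1.5], old_points, tolerance=0.1))
```

The first candidate lies near the design point (1, 1) and is treated as a replicate; the second lies outside the current local region and is treated as a new combination.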

If we specify a different $\alpha$ value in $t^{\alpha}_{N-q}$, then (2.11) gives a different step size (in the same direction). Obviously, $t^{\alpha}_{N-q}$ increases to infinity as $\alpha$ decreases to zero. So, a sufficiently small $\alpha$ always gives a finite solution. However, if we increase $\alpha$, then we make a bigger step, and we prefer to take a bigger step in order to get quicker to the top of the response surface. We feel that a reasonable maximum $\alpha$ value is 0.20 (so we are “80% sure”), since 0.20 corresponds to the maximum of the common values of $\alpha$ (i.e., 0.01, 0.05, 0.10, and 0.20) among practitioners; however, more empirical research is needed.
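The effect of $\alpha$ on the step can be illustrated for the orthogonal case (2.13). The sketch below makes one loud simplification: it replaces the $t$ quantile by the standard normal quantile, which is only reasonable when $N - q$ is large (the Python standard library offers no $t$ distribution). The function name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def orthogonal_step_factor(alpha, sigma_eps, n_obs, beta):
    """Scalar multiplying beta in (2.13); None if there is no finite solution.

    Assumption: the t quantile is approximated by the standard normal
    quantile, which is only reasonable when N - q is large.
    """
    quantile = NormalDist().inv_cdf(1.0 - alpha)
    denom = (quantile * sigma_eps) ** 2 / n_obs - sum(b * b for b in beta)
    if denom <= 0:
        return None
    return 1.0 / math.sqrt(denom)


# Hypothetical case: N = 100 observations, sigma_eps = 12, gradient (1, 0.5).
for alpha in (0.01, 0.05, 0.10, 0.20):
    print(alpha, orthogonal_step_factor(alpha, 12.0, 100, [1.0, 0.5]))
```

With these numbers the factor grows as $\alpha$ increases from 0.01 to 0.10, and the function returns `None` at $\alpha = 0.20$ because the expression under the square root is no longer positive.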

We assume that the noise $\varepsilon$ has zero mean when deriving the $1-\alpha$ confidence interval in (2.9), which leads to the techniques in (2.10), (2.11), and (2.12). Actually, the locally fitted first-order polynomials may show lack of fit so the expected value of $\hat{\sigma}^2_{\varepsilon}$ exceeds $\sigma^2_{\varepsilon}$; see the lack-of-fit tests in many RSM textbooks. Fortunately, this bias has the “right” sign; that is, this bias increases $\hat{\sigma}_{\varepsilon}$ in (2.11) and (2.12) so that it decreases the step size.

In document Black Box Simulation Optimization: Generalized Response Surface Methodology. (Page 32-36)