\[ d_{\mathrm{opt}} = d^* + \frac{\Delta}{\|\beta\|_2}\,\beta. \tag{5.1} \]
The location of the points on which the local approximating models are fitted does not influence the shape of this spherical trust region in any way. Furthermore, in this trust region framework it is implicitly assumed that all design parameters are of comparable magnitude.
We believe that in the objective improving step, in which the aim is to find a locally optimal point with respect to the objective function given the data available at that moment, the trust region should be driven by the data points. The trust region is the region in which you trust your local models to approximate well. If in one dimension there is more information than in the other, and hence you trust the models to predict better in one dimension than in the other, then this should be reflected in the maximum stepsize you wish to take in those dimensions, hence in the shape of the trust region.
The dispersion of the design points contains information about the reliability of the models fitted on these points. In order to be able to incorporate this information into the method we formulate a new problem by changing the shape of the trust region. In statistics, a natural way to take locations of design points into account is by using the prediction variance of the approximation (see, e.g., Kleijnen et al. (2004)). Formally, this prediction variance is only valid if the true underlying function is linear. As the optimization approach consists of local approximations, the underlying function will be approximately linear if we look at a small enough scale.
As long as the variance remains within acceptable ranges, the model is trusted. The idea is to apply this approach to our problem. The variance is minimized in the center of gravity of the design points and the contour curves of this variance are ellipsoids. Before formulating this result formally in Theorem 5.1, we first introduce some notation. The results of Theorem 5.1 allow us to choose a suitable trust region during the optimization process.
In the statistical linear regression model, the variance of the predictor $\hat{y} = \hat{\beta}_{+0}' x$ equals $\sigma^2 x'(X'X)^{-1}x$, where $A^{-1}$ denotes the inverse of matrix $A$. With $\beta_{+0}$ we denote the model coefficients including the coefficient for the constant term. The matrix $X$ is known as the extended design matrix and consists of the row vectors $(x^i)' = [1 \;\; (d^i)']$, $i = 1, \ldots, n$, where $d^i$ denotes the design vector for the $i$th experiment and $n$ is the number of design points that are used for fitting the local approximations. The matrix $X$ is assumed to have linearly independent columns. The symbol $\sigma$ denotes the standard deviation of the error term. The covariance matrix $(X'X)^{-1}$ plays an essential role in D-optimality. We will show how this matrix induces a new trust region. As we focus on the design space and do not take the constant term into account, we work with the matrix $C$ instead of the matrix $(X'X)^{-1}$; the two are related in the following way:
\[ (X'X)^{-1} = \begin{pmatrix} a & b' \\ b & C \end{pmatrix}, \tag{5.2} \]
where $a \in \mathbb{R}$, $b \in \mathbb{R}^q$, and $C \in \mathbb{R}^{q \times q}$. As $(X'X)^{-1}$ is positive definite and symmetric, $C$ is positive definite and symmetric as well. Hence, $C$ is also non-singular. Due to the special structure of the matrix $X$, the matrix $X'X$ has the following structure
\[ X'X = \begin{pmatrix} n & n\bar{d}' \\ n\bar{d} & D'D \end{pmatrix}, \tag{5.3} \]
where $\bar{d}$, the center of gravity of the design points $d^i$, $i = 1, \ldots, n$, is defined as
\[ \bar{d} = \frac{1}{n} \sum_{i=1}^{n} d^i \]
and the matrix $D$ equals $X$ without the first column of ones, i.e., $X = [1 \;\; D]$.
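The partition (5.2) and the structure (5.3) can be checked numerically. The following is a minimal sketch on hypothetical design points (the data and variable names are assumptions made for illustration only). It also checks the relation $b = -C\bar{d}$, which follows from multiplying (5.2) by (5.3). The explicit inverse is used here only because the example is tiny and well conditioned.

```python
import numpy as np

# Hypothetical design points d^i: n = 5 experiments, q = 2 design parameters.
D = np.array([[0.0, 0.0],
              [1.0, 0.2],
              [2.0, 0.1],
              [3.0, 0.4],
              [4.0, 0.3]])
n, q = D.shape

X = np.hstack([np.ones((n, 1)), D])   # extended design matrix X = [1 D]
XtX = X.T @ X
d_bar = D.mean(axis=0)                # center of gravity of the design points

# Block structure (5.3): [[n, n*d_bar'], [n*d_bar, D'D]].
expected = np.block([[np.array([[float(n)]]), n * d_bar[None, :]],
                     [n * d_bar[:, None], D.T @ D]])
structure_ok = np.allclose(XtX, expected)

# Partition (5.2) of the inverse into a (scalar), b (q-vector), C (q x q).
M = np.linalg.inv(XtX)
a, b, C = M[0, 0], M[1:, 0], M[1:, 1:]

b_ok = np.allclose(b, -C @ d_bar)     # off-diagonal block of (5.2) times (5.3)
C_is_spd = np.allclose(C, C.T) and np.all(np.linalg.eigvalsh(C) > 0)
```

All three flags should evaluate to true whenever $X$ has linearly independent columns.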
The following theorem, well known in statistics, shows that the ellipsoids arising from the matrix C are in fact contour curves of the variance.
Theorem 5.1 The variance of the predictor $\hat{y} = \hat{\beta}_{+0}' x$ is minimal at $\bar{d}$, the center of gravity of the design points $d^i$, $i = 1, \ldots, n$. The contour curves of this variance are given by the ellipsoids
\[ (d - \bar{d})' C (d - \bar{d}) = \rho, \tag{5.4} \]
where $\rho = \rho_0 - a + \bar{d}' C \bar{d}$ and $\rho_0$ equals the variance of the predictor.
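Since the theorem is quoted as well known, a short derivation sketch may help to see where the constants come from. It assumes that the variance is measured in units of $\sigma^2$ (equivalently, $\sigma = 1$), so that the constants match (5.4) as stated. The two intermediate identities are obtained by multiplying out the partitions (5.2) and (5.3).

```latex
% From (X'X)^{-1} (X'X) = I with the partitions (5.2) and (5.3):
\begin{align*}
  nb + nC\bar{d} = 0 \;&\Longrightarrow\; b = -C\bar{d}, \\
  na + n\,b'\bar{d} = 1 \;&\Longrightarrow\; a = \tfrac{1}{n} + \bar{d}' C \bar{d}.
\end{align*}
% With x = (1, d')' the scaled prediction variance becomes
\begin{align*}
  x'(X'X)^{-1}x &= a + 2b'd + d'Cd \\
                &= a - 2\bar{d}'Cd + d'Cd \\
                &= a - \bar{d}' C \bar{d} + (d - \bar{d})' C (d - \bar{d}).
\end{align*}
% Setting this equal to \rho_0 gives (5.4) with
% \rho = \rho_0 - a + \bar{d}' C \bar{d}.
% Since C is positive definite, the minimum is attained at d = \bar{d}.
```

Under the same assumption, the second identity gives $a - \bar{d}' C \bar{d} = 1/n$, so the minimal scaled variance equals $1/n$.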
We propose to use the ellipsoids as defined in (5.4) in the definition of the trust region. The new problem formulation including the ellipsoidal trust region becomes
\[ \max_{d} \; \beta' d \quad \text{s.t.} \quad \| d - \bar{d} \|_C \le \rho, \tag{P2} \]
where $\rho$ is the trust region radius and $\| x \|_C$ is the $C$-norm defined by
\[ \| x \|_C = \sqrt{x' C x}. \]
As the matrix $C$ is positive definite, it defines a proper norm. Moreover, the solution of problem (P2) is known explicitly. By setting the derivatives of the Lagrangian of problem (P2) with respect to $d$ and to the Lagrange multiplier $\lambda$ equal to zero and solving the resulting equations, we find that the optimal $d$ becomes
\[ d_{\mathrm{opt}} = \bar{d} + \frac{\rho}{\| \beta \|_{C^{-1}}} \, C^{-1} \beta. \tag{5.5} \]
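The Lagrangian argument behind (5.5) can be sketched as follows. The squared form of the constraint and the sign convention for the multiplier are choices made here for convenience, and $\beta \neq 0$ is assumed.

```latex
% (P2) with the constraint written in squared form:
%   max_d  \beta' d   s.t.  (d - \bar{d})' C (d - \bar{d}) \le \rho^2
\begin{align*}
  L(d, \lambda) &= \beta' d
      - \lambda \left[ (d - \bar{d})' C (d - \bar{d}) - \rho^2 \right], \\
  \frac{\partial L}{\partial d} = 0
      \;&\Longrightarrow\; \beta = 2\lambda\, C (d - \bar{d})
      \;\Longrightarrow\; d - \bar{d} = \frac{1}{2\lambda}\, C^{-1} \beta, \\
  \frac{\partial L}{\partial \lambda} = 0
      \;&\Longrightarrow\; (d - \bar{d})' C (d - \bar{d}) = \rho^2
      \;\Longrightarrow\; \frac{1}{4\lambda^2}\, \beta' C^{-1} \beta = \rho^2 .
\end{align*}
% Taking the positive root (the maximizer) gives
%   1 / (2\lambda) = \rho / \sqrt{\beta' C^{-1} \beta} = \rho / \|\beta\|_{C^{-1}},
% and substituting into the expression for d - \bar{d} yields (5.5).
```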
The matrix $C$ can be ill-conditioned. When evaluating the explicit solution (5.5), one should therefore take care not to obtain $C^{-1}$ by numerically inverting $C$, but to use, for example, the expression for $C^{-1}$ that is derived in Theorem 5.2, to avoid numerical instability and loss of accuracy.
The first of the two main differences between problem (P2) and problem (P1) is that we now use the $C$-norm instead of the 2-norm. The second difference is that in problem (P2) the center of the trust region is determined by all the design points on which the local linear models are fitted, while in problem (P1) the trust region is centered around the best point so far.
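The two step rules, (5.1) and (5.5), can be compared in a small sketch. Theorem 5.2 is not reproduced in this section, so the code below instead relies on the standard Schur-complement identity $C^{-1} = D'D - n\bar{d}\bar{d}'$, which follows from (5.2) and (5.3) and may or may not be the expression meant there. With it, (5.5) needs only $C^{-1}$, so $C$ is never inverted. The data, `beta`, `rho`, and the function names are hypothetical.

```python
import numpy as np

def ellipsoidal_step(D, beta, rho):
    """Sketch of (5.5); returns the new point and the matrix C^{-1}."""
    d_bar = D.mean(axis=0)            # center of gravity of the design points
    Dc = D - d_bar                    # centered design points
    C_inv = Dc.T @ Dc                 # Schur complement: D'D - n * d_bar d_bar'
    w = C_inv @ beta                  # C^{-1} beta, obtained without inverting C
    norm = np.sqrt(beta @ w)          # ||beta||_{C^{-1}}
    return d_bar + (rho / norm) * w, C_inv

def spherical_step(d_star, beta, delta):
    """Sketch of (5.1): step of length delta along beta from d_star."""
    return d_star + (delta / np.linalg.norm(beta)) * beta

# Hypothetical data: points spread widely in dimension 1, narrowly in dimension 2.
D = np.array([[0.0, 0.0], [1.0, 0.2], [2.0, 0.1], [3.0, 0.4], [4.0, 0.3]])
beta = np.array([1.0, 1.0])           # hypothetical local model gradient
rho = 0.5

d_ell, C_inv = ellipsoidal_step(D, beta, rho)
d_sph = spherical_step(D[-1], beta, delta=0.5)   # D[-1] stands in for d*

# Check that d_ell lies on the boundary ||d - d_bar||_C = rho.
step = d_ell - D.mean(axis=0)
c_norm = np.sqrt(step @ np.linalg.solve(C_inv, step))
```

In this example the ellipsoidal step is much longer in the first dimension, where the design points are widely spread, than in the second, whereas the spherical step treats both dimensions alike.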
We illustrate the implications of using this C-norm instead of the 2-norm in Figure 5.1. Two important observations are illustrated in this figure. The first one is that the ellipsoidal trust region adapts its form to the locations of the design points, whereas the spherical trust region does not. This adaptation ensures that the models are more trusted in areas where actual evaluations have been performed. The second observation is that the center of the ellipsoidal trust region is determined by the design points such that the ellipsoid covers the design points in the best possible way. The spherical region is centered on the best point found so far. Hence, if such a point lies a bit apart from the other design points, some parts of the spherical trust region might not contain design points at all.
Figure 5.1: The ellipsoidal trust region adapts better to the locations of the design points than the spherical trust region. The small black dots indicate design points, the ∗ indicates the center of the ellipse, and the open dot is the center of the sphere.
When the ellipsoidal trust region becomes too narrow in one or more directions, this is an indication that the consecutively simulated design points have more or less the same value for these dimensions. Eventually, the approximating models will start to show lack of fit in these dimensions. In the next section we describe how to prevent the occurrence of this situation by means of a geometry improving step, which is also based on the same matrix C.
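One way to see how narrow the ellipsoidal trust region has become is to inspect its semi-axes. The following sketch is not the geometry improving step of the next section; it only diagnoses the situation described above. It assumes the identity $C^{-1} = D'D - n\bar{d}\bar{d}'$ (the centered scatter matrix of the design points), so that the semi-axis lengths of $\|d - \bar{d}\|_C \le \rho$ are $\rho\sqrt{\mu_i}$, with $\mu_i$ the eigenvalues of $C^{-1}$. The data and the function name are hypothetical.

```python
import numpy as np

def trust_region_axes(D, rho):
    """Semi-axis lengths and directions of the ellipsoid ||d - d_bar||_C <= rho."""
    Dc = D - D.mean(axis=0)           # centered design points
    C_inv = Dc.T @ Dc                 # C^{-1} as centered scatter matrix
    mu, V = np.linalg.eigh(C_inv)     # eigenvalues in ascending order
    mu = np.clip(mu, 0.0, None)       # guard against tiny negative round-off
    return rho * np.sqrt(mu), V       # columns of V are the axis directions

# Hypothetical design points that barely vary in the second dimension.
D = np.array([[0.0, 1.00],
              [1.0, 1.01],
              [2.0, 0.99],
              [3.0, 1.00],
              [4.0, 1.00]])

lengths, directions = trust_region_axes(D, rho=1.0)
narrow_ratio = lengths[0] / lengths[-1]   # shortest over longest semi-axis
narrow_direction = directions[:, 0]       # direction of the shortest semi-axis
```

A small `narrow_ratio` signals that the design points carry little information along `narrow_direction`. Any threshold on this ratio would be a tuning choice that the text does not specify.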
The natural question arises how the use of higher-order approximation models would affect the above analysis. Let us consider quadratic models. For a quadratic model the extended design matrix X is extended with the second-order terms. The resulting matrix C then gives rise to a non-convex trust region and the equivalent of (P2) becomes a non-convex NLP, which is harder to solve than (P2). Another option would be to use the ellipsoidal trust region induced by linear models also for higher-order models.