Decomposition of the Overall Prediction Error for Stochastic Case

The additional error caused by parameter estimation uncertainty has been reported in several research papers. den Hertog et al. (2005) discussed parameter estimation uncertainty for the kriging metamodel, in particular its influence on the overall prediction error. Bootstrapping numerical experiments show that the actual kriging prediction error, which correctly accounts for parameter estimation uncertainty, is larger than the traditional kriging variance computed as if the parameter were known. Assuming the predictor $\hat{Z}(x_0)_Y$ is a linear function of the parameter $\phi_Z$ and the estimator of $\phi_Z$ is unbiased, it follows from the results of Kackar & Harville (1984) that the prediction error with unknown parameter can be approximated by Equation (4.1).

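$$\mathrm{MSE}[\hat{Z}(x_0)_Y] = \mathrm{tr}[A(\hat{\phi}_Z)B(\hat{\phi}_Z)] + \mathrm{MSE}[\hat{Z}(x_0)_Y \mid \hat{\phi}_Z] \qquad (4.1)$$

where $\mathrm{tr}[A(\hat{\phi}_Z)B(\hat{\phi}_Z)]$ is an approximation of the additional error introduced when the estimator $\hat{\phi}_Z$ is used. Following Kackar & Harville (1984), $A(\hat{\phi}_Z) = \mathrm{var}[\partial \hat{Z}(x_0)_Y / \partial \phi_Z]$ and $B(\hat{\phi}_Z)$ is the mean squared error matrix of $\hat{\phi}_Z$. The term $\mathrm{MSE}[\hat{Z}(x_0)_Y \mid \hat{\phi}_Z]$ is the traditional mean squared error (MSE) when $\hat{\phi}_Z$ is used, and this can be further decomposed as: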

The first term on the right-hand side of Equation (4.2) is the prediction error caused by model misspecification. The second term is the direct effect of the stochastic noise $\varepsilon(x)$ on the prediction error, and is a function of the variance of $\varepsilon(x)$. Combining Equation (4.1) and Equation (4.2), we find that the overall prediction error can be decomposed into the following three error components:

The prediction error caused by model misspecification is inherent in the metamodel selection and is not the focus of this research. In the next section, we analyze how the random noise $\varepsilon(x)$ affects the estimation of $\phi_Z$ and the additional error caused by parameter estimation uncertainty (the last term in Equation (4.3)).

4.3 Maximum Likelihood Estimation with Stochastic Response

The maximum likelihood estimation (MLE) method is commonly used to estimate the parameters of a kriging model. Assuming a Gaussian random process, the log-likelihood function is given by:

$$\ell(\mu, \phi_Z, \sigma_Z^2) = -\frac{1}{2}\ln\det(R^{\$}) - \frac{1}{2}(\bar{Y} - F\beta)^T (R^{\$})^{-1} (\bar{Y} - F\beta) \qquad (4.4)$$

where $F$ is the design matrix for the ordinary least squares model, $\beta$ is the vector of regression parameters and $F\beta$ represents the mean function $\mu$; $\sigma_Z^2$ is the variance of the Gaussian random process $Z$, which indicates the variability at an unknown point in $Z$. Here we write the correlation matrix as $R(\phi_Z)$ to denote that it is a function of the parameter $\phi_Z$. We can find the estimators for $\mu$, $\phi_Z$ and $\sigma_Z^2$ by taking the partial derivative of the log-likelihood with respect to each parameter and setting it to zero, which gives three equations.

Solving the above three equations, the MLE estimators of $\mu$ and $\sigma_Z^2$ are obtained as functions of $\phi_Z$. For simplicity, $\mu$ and $\sigma_Z^2$ are typically assumed fixed or known when estimating the sensitivity parameter $\phi_Z$, which reduces the likelihood to a function of $\phi_Z$ only. However, under this simplification the MLE estimator of $\phi_Z$ is biased, because the estimation of $\phi_Z$ also depends on $\beta$, which is usually unknown; see Cressie (1993). In order to simplify the approximation in Equation (4.1), we instead use the restricted maximum likelihood (REML) method proposed by Patterson & Thompson (1971) and Patterson & Thompson (1974), which provides an unbiased estimator of $\phi_Z$ for this Gaussian random process. With this unbiased estimator, $B(\phi_Z)$ equals the variance of the estimator of $\phi_Z$.
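As a rough illustration of how a log-likelihood of the form in Equation (4.4) can be evaluated numerically, the sketch below uses NumPy. The helper names `exp_corr` and `log_likelihood`, the exponential correlation form, and the sample values are assumptions made for this example only; the matrix appearing in Equation (4.4) is simply passed in as `R`.

```python
import numpy as np

def exp_corr(x, phi):
    """Assumed exponential correlation: R_ij = exp(-phi * |x_i - x_j|)."""
    x = np.asarray(x, dtype=float)
    return np.exp(-phi * np.abs(x[:, None] - x[None, :]))

def log_likelihood(y_bar, F, beta, R):
    """Evaluate -1/2 ln det(R) - 1/2 (y - F beta)^T R^{-1} (y - F beta)."""
    resid = np.asarray(y_bar, dtype=float) - F @ beta
    sign, logdet = np.linalg.slogdet(R)
    if sign <= 0:
        raise ValueError("R must be positive definite")
    # Solve R v = resid instead of forming the inverse explicitly.
    return -0.5 * logdet - 0.5 * resid @ np.linalg.solve(R, resid)

# Hypothetical two-point data set with a constant mean term.
x = np.array([0.0, 0.5])
F = np.ones((2, 1))
beta = np.array([0.0])
y = np.array([0.3, -0.1])
ll = log_likelihood(y, F, beta, exp_corr(x, phi=2.0))
```

Maximizing this function over `phi` (for example on a grid) would give a plain MLE, which is the estimator the text above describes as biased when `beta` is unknown.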

The log-likelihood function for the REML is given by:

$$\ell(\phi_Z, \sigma_Z^2) = 1 \ldots \qquad (4.5)$$

which is independent of $\beta$. The difference between Equation (4.4) and Equation (4.5) is especially significant in the small-sample case, such as the two-point problem we propose here; see Cressie (1993).

4.3.1 A simple two-point problem

We design a simple two-point problem to provide some theoretical insight into the parameter estimation problem for the kriging model. Points $P_0$, $P_1$ and $P_2$ are evenly spaced. Take points $P_1$ and $P_2$ to be the observation points, and point $P_0$ to be the prediction point. Suppose that $Z$ is an unknown Gaussian random process with mean function $\mu(x)$ and bias function $\delta(x)$.

[Figure 4.1: Design for two-point problem. Horizontal axis: 0 to 1.]

The additional random noise $\varepsilon(x)$ follows a normal distribution with zero mean and unknown heterogeneous variance $\sigma_\varepsilon^2(P_i)$. Then at the observation points:

$$P_1(x_1): Y_1 = Z_1 + \varepsilon_1, \qquad P_2(x_2): Y_2 = Z_2 + \varepsilon_2 \qquad (4.6)$$

In the next subsection, we describe the parameter estimation techniques for the unknown parameter before discussing its effects on the overall prediction error.

For this two-point problem, the terms in Equation (4.5) are given as:

σ²_Z = 1, F =

As mentioned in Section 4.2.1, the kriging predictor is not a linear function of the parameter $\phi$ in this two-point case. So that the predictor becomes a linear function of the estimated parameter and the approximation in Equation (4.1) can be applied, we reparameterize as follows:

$$\rho = \exp(-d\phi_Z), \qquad R(\rho) = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$
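A short worked check of why this reparameterization helps is sketched below. It rests on assumptions that are not spelled out in the surviving text: ordinary kriging with a constant mean (so $F = (1, 1)^T$), $\sigma_Z^2 = 1$, $d$ equal to the spacing between adjacent points, and $P_0$ lying at distance $d$ from $P_1$ and $2d$ from $P_2$, which gives the correlation vector $r(\rho) = (\rho, \rho^2)^T$.

```latex
\begin{align*}
\hat{\mu} &= \frac{\mathbf{1}^T R(\rho)^{-1} Y}{\mathbf{1}^T R(\rho)^{-1} \mathbf{1}}
           = \frac{y_1 + y_2}{2}, \\
Y - \hat{\mu}\mathbf{1} &= \frac{y_1 - y_2}{2}\begin{pmatrix} 1 \\ -1 \end{pmatrix},
\qquad
R(\rho)^{-1}\begin{pmatrix} 1 \\ -1 \end{pmatrix}
  = \frac{1}{1-\rho}\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \\
\hat{Z}(x_0) &= \hat{\mu} + r(\rho)^T R(\rho)^{-1}\,(Y - \hat{\mu}\mathbf{1})
  = \frac{y_1 + y_2}{2} + \frac{\rho - \rho^2}{1-\rho}\cdot\frac{y_1 - y_2}{2}
  = \frac{y_1 + y_2}{2} + \rho\,\frac{y_1 - y_2}{2}.
\end{align*}
```

Under these assumptions the predictor is linear in $\rho$, although it is not linear in $\phi_Z$.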

With this reparameterization, Equation (4.5) becomes a function of ρ instead of φ:

For the three different model forms described in Section 4.1, we have the following results: Deterministic kriging model:

It follows that the expectation and variance of the parameter estimator are given as:

$$E(\hat{\rho}) = 1 - \frac{1}{2}E\big((y_1 - y_2)^2\big) \qquad (4.8)$$

$$\mathrm{var}(\hat{\rho}) = \frac{1}{4}\mathrm{var}\big((y_1 - y_2)^2\big) \qquad (4.9)$$

From Equation (4.6), it is straightforward to see that $y_1$ follows a normal distribution with mean $z_1$ and variance $\sigma_\varepsilon^2(x_1)$, and $y_2$ follows a normal distribution with mean $z_2$ and variance $\sigma_\varepsilon^2(x_2)$. Separating the mean and pure noise components, we get:

$$E\big((y_1 - y_2)^2\big) = \sigma_\varepsilon^2(x_1) + \sigma_\varepsilon^2(x_2) + (z_1 - z_2)^2 \qquad (4.10)$$

$$\mathrm{var}\big((y_1 - y_2)^2\big) = 2\big(\sigma_\varepsilon^2(x_1) + \sigma_\varepsilon^2(x_2)\big)^2 + 4\big(\sigma_\varepsilon^2(x_1) + \sigma_\varepsilon^2(x_2)\big)(z_1 - z_2)^2 \qquad (4.11)$$

Combining Equation (4.8) and Equation (4.10),

$$E(\hat{\rho}) = 1 - \frac{1}{2}\big(\sigma_\varepsilon^2(x_1) + \sigma_\varepsilon^2(x_2) + (z_1 - z_2)^2\big) \qquad (4.12)$$

Similarly, for the nugget effect model and modified nugget effect model, we obtain the following:

For the deterministic model, we see from Equation (4.12) that the expectation and variance of the estimated parameter are functions of the input variances $\sigma_\varepsilon^2(x_1)$ and $\sigma_\varepsilon^2(x_2)$. If the input variances increase, the variance of the estimated parameter also increases and its mean decreases, indicating a weaker correlation between the points. From Equation (4.12), the expectation of the estimated parameter can be negative if the variances $\sigma_\varepsilon^2(x_1)$ and $\sigma_\varepsilon^2(x_2)$ are high enough. However, as we assume the exponential correlation function in this two-point problem, we consider only non-negative correlations. When the estimated $\rho$ is negative, the restricted likelihood function is monotonically decreasing, indicating that extra sample data are needed. For the modified nugget effect model, from Equation (4.16), we see that the influence of the input variance cancels out if $\iota_1^*$ and $\iota_2^*$ are exact estimators of $\sigma_\varepsilon^2(x_1)$ and $\sigma_\varepsilon^2(x_2)$. Similarly, from Equation (4.14), we see that the nugget effect model gives the same result when the nugget value $c_0$ equals the average of $\sigma_\varepsilon^2(x_1)$ and $\sigma_\varepsilon^2(x_2)$. This partially explains the observation in Yin et al. (2008) that the estimated $\phi$ values for the modified nugget effect model and the nugget effect model are closer to the optimal value than that for the deterministic model.
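The moment formulas in Equations (4.8) to (4.11) can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original analysis; the function names, the sample values of $z_1$, $z_2$ and the noise variances, and the sample size are all hypothetical choices for illustration.

```python
import numpy as np

def d2_moments(z1, z2, var1, var2):
    """Closed-form mean and variance of (y1 - y2)^2, Equations (4.10) and (4.11)."""
    s2 = var1 + var2          # total noise variance of y1 - y2
    m2 = (z1 - z2) ** 2       # squared difference of the means
    return s2 + m2, 2 * s2**2 + 4 * s2 * m2

def rho_hat_moments(z1, z2, var1, var2):
    """Mean and variance of rho-hat via Equations (4.8) and (4.9)."""
    mean_d2, var_d2 = d2_moments(z1, z2, var1, var2)
    return 1 - 0.5 * mean_d2, 0.25 * var_d2

# Hypothetical values for the process realizations and noise variances.
z1, z2, var1, var2 = 0.4, 0.1, 0.05, 0.2
rng = np.random.default_rng(0)
n = 400_000
y1 = z1 + rng.normal(0.0, np.sqrt(var1), n)
y2 = z2 + rng.normal(0.0, np.sqrt(var2), n)
d2 = (y1 - y2) ** 2
mc_mean, mc_var = d2.mean(), d2.var()

cf_mean, cf_var = d2_moments(z1, z2, var1, var2)
print(f"mean: closed form {cf_mean:.4f}, Monte Carlo {mc_mean:.4f}")
print(f"var:  closed form {cf_var:.4f}, Monte Carlo {mc_var:.4f}")

# With much larger noise variances the expected rho-hat drops below zero.
print(rho_hat_moments(z1, z2, 1.0, 1.0)[0])
```

The last line illustrates the point made above: once the combined noise variance and squared mean difference exceed 2, the expectation in Equation (4.12) becomes negative.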

4.3.3 Influence of Parameter Estimation on Overall Prediction Error

From the approximation in Equation (4.1), the additional prediction error caused by parameter estimation uncertainty can be approximated as $\mathrm{tr}[A(\hat{\phi}_Z)B(\hat{\phi}_Z)]$, where $\mathrm{tr}[Q]$ stands for the trace of matrix $Q$. Based on the REML, $B(\hat{\rho})$ can be computed and is the variance of $\hat{\rho}$; see Harville (1985). As a result, we can formulate the approximation as a function of $\rho$:

$$\mathrm{tr}[A(\hat{\phi}_Z)B(\hat{\phi}_Z)] = \frac{1 - \hat{\rho}}{2}\,\mathrm{var}(\hat{\rho})$$
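One way to see where the factor $(1 - \hat{\rho})/2$ can come from is sketched below. The derivation assumes a noise-free process with $\sigma_Z^2 = 1$, a prediction point at distance $d$ from $P_1$ and $2d$ from $P_2$ (so that the ordinary kriging predictor is linear in $\rho$), and takes $A$ to be the variance of the derivative of the predictor with respect to the parameter, as in Kackar & Harville (1984). These are assumptions for illustration rather than statements from the text.

```latex
\begin{align*}
\hat{Z}(x_0) &= \frac{y_1 + y_2}{2} + \rho\,\frac{y_1 - y_2}{2}
\quad\Rightarrow\quad
\frac{\partial \hat{Z}(x_0)}{\partial \rho} = \frac{y_1 - y_2}{2}, \\
A(\rho) &= \mathrm{var}\!\left[\frac{y_1 - y_2}{2}\right]
         = \frac{1}{4}\,(1 + 1 - 2\rho) = \frac{1 - \rho}{2}, \\
\mathrm{tr}[A(\hat{\rho})B(\hat{\rho})] &= \frac{1 - \hat{\rho}}{2}\,\mathrm{var}(\hat{\rho}).
\end{align*}
```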

Since the variance of the estimator is the same for all three models, an estimator $\hat{\rho}$ closer to 1 is favorable. Comparing Equation (4.12) and Equation (4.16), the modified nugget effect estimator $\hat{\rho}_M$ is closer to one in expectation than the estimator obtained with the deterministic kriging model. With careful selection of the nugget value $\iota_0$ in Equation (4.14), the additional error incurred by the nugget effect model can be as small as that of the modified nugget effect model. In this simple example, we see that in stochastic situations where random noise is present, selecting an appropriate stochastic model can reduce the additional error introduced by estimating $\phi_Z$. Furthermore, although not addressed in this thesis, good knowledge or accurate estimation of $\sigma_\varepsilon^2(x)$ can also reduce the estimation error.
