4. Sensitivity Analysis
The purpose of sensitivity analysis is to identify the factors or variables that contribute most to the uncertainty of the model predictions. In this section, we summarize two approaches to performing sensitivity analysis: correlation analysis and variance-based decomposition. The reader is encouraged to refer to one of the books on sensitivity analysis published by a group at the European Commission's Joint Research Centre, such as Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models [34].
Figure 2.1: Two-dimensional grid comparison with a tensor product grid using Clenshaw-Curtis points (left) and sparse grids A(5, 2) utilizing Clenshaw-Curtis (middle) and Gauss-Legendre (right) points with nonlinear growth.
In [22], it is demonstrated that the synchronization of total-order PCE with the monomial resolution of a sparse grid is imperfect, and that sparse grid SC consistently outperforms sparse grid PCE when employing the sparse grid to directly evaluate the integrals in Eq. 2.16. In our DAKOTA implementation, we depart from the use of sparse integration of total-order expansions, and instead employ a linear combination of tensor expansions [14].
That is, we compute separate tensor polynomial chaos expansions for each of the underlying tensor quadrature grids (for which there is no synchronization issue) and then sum them using the Smolyak combinatorial coefficient (from Eq. 2.22 in the isotropic case). This improves accuracy, preserves the PCE/SC consistency property described in Section 2.7.2, and also simplifies PCE for the case of anisotropic sparse grids described next.
For anisotropic Smolyak sparse grids, a dimension preference vector is used to emphasize important stochastic dimensions. Given a mechanism for defining anisotropy, we can extend the definition of the sparse grid from that of Eq. 2.22 to weight the contributions of different index set components. First, the sparse grid index set constraint becomes
$$w\underline{\gamma} < \mathbf{i} \cdot \boldsymbol{\gamma} \le w\underline{\gamma} + |\boldsymbol{\gamma}| \qquad (2.26)$$
where $\underline{\gamma}$ is the minimum of the dimension weights $\gamma_k$, k = 1 to n. The dimension weighting vector $\boldsymbol{\gamma}$ amplifies the contribution of a particular dimension index within the constraint, and is therefore inversely related to the dimension preference (higher weighting produces lower index set levels). For the isotropic case of all $\gamma_k = 1$, this reproduces the isotropic index constraint $w + 1 \le |\mathbf{i}| \le w + n$ (note the change from < to ≤). Second, the combinatorial coefficient for adding the contribution from each of these index sets is modified as described in [10].
2.7.4 Cubature
Cubature rules [63,76] are specifically optimized for multidimensional integration and are distinct from tensor-products and sparse grids in that they are not based on combinations of one-dimensional Gauss quadrature rules.
They have the advantage of improved scalability to large numbers of random variables, but are restricted in integrand order and require homogeneous random variable sets (achieved via transformation). For example, optimal rules for integrands of order 2, 3, and 5 and either Gaussian or uniform densities allow low-order polynomial chaos expansions (p = 1 or 2) that are useful for global sensitivity analysis including main effects and, for p = 2, all
4.1. Correlation Analysis
Correlation refers to a statistical relationship between two random variables or two sets of data. In the analysis of computer experiments, where an ensemble of simulation runs has been performed according to some type of sampling design, we have a set of results. The convention is to write each sample (run) of the simulation on a separate row of the ensemble matrix. For example, if N simulation runs were performed, with D inputs and P outputs, the resulting ensemble matrix would be of dimension N × (D + P). In this scenario, we could perform a correlation analysis on the entire matrix. However, the correlations between the various inputs are often not interesting, especially if the sample design has been constructed so that the inputs are independent, in which case the correlations between inputs would be near zero. Likewise, correlations between various outputs may not be interesting unless some of the outputs are very strongly correlated, in which case the analysis could perhaps be reduced to a subset of the outputs. The main focus of correlation analysis of computer experiments is the correlation between inputs and outputs.
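There are several types of correlations that can be calculated: simple, rank, and partial. Simple correlation, which measures the strength and direction of a linear relationship between variables, is computed on the actual input and output data using the Pearson correlation coefficient. For example, the Pearson correlation between input X and output Y is given by ρ(X,Y) [22]:
$$\rho(X,Y) = \frac{\sum_{k=1}^{N}(x_k-\bar{x})(y_k-\bar{y})}{\sqrt{\sum_{k=1}^{N}(x_k-\bar{x})^2}\,\sqrt{\sum_{k=1}^{N}(y_k-\bar{y})^2}} \qquad (1)$$
where $x_k$ and $y_k$ are the values of X and Y in the k-th of the N samples, and $\bar{x}$ and $\bar{y}$ are the corresponding sample means.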
The Pearson correlation is +1 in the case of a perfect positive (increasing) linear relationship, -1 in the case of a perfect negative (decreasing) linear relationship, and some value between -1 and 1 in all other cases. A simple correlation near zero means there is no linear relationship between the variables. Figure 5 shows some example correlation patterns and corresponding correlation coefficients. Note that if two variables are independent, they will have zero correlation, but the converse is not true: two variables may have zero or near-zero correlation and still show a strongly organized nonlinear relationship (for example, see the last row of Figure 5). The best way to identify such zero-correlation but strongly patterned relationships is to plot them in a scatterplot as shown in Figure 5.
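As an illustration of the input-output correlations described above, the following minimal sketch computes Pearson coefficients between the input and output columns of an ensemble matrix. The function name `input_output_correlations`, the toy ensemble, and the use of NumPy are assumptions made for this example; they are not taken from any particular analysis code.

```python
import numpy as np

def input_output_correlations(ensemble, n_inputs):
    """Pearson correlation between every input column and every output column.

    ensemble: array of shape (N, D + P), one simulation run per row,
              inputs in the first D columns, outputs in the last P columns.
    Returns an array of shape (D, P).
    """
    ensemble = np.asarray(ensemble, dtype=float)
    centered = ensemble - ensemble.mean(axis=0)        # subtract column means
    x, y = centered[:, :n_inputs], centered[:, n_inputs:]
    cross = x.T @ y                                    # sums of cross products
    norm_x = np.sqrt((x ** 2).sum(axis=0))[:, None]
    norm_y = np.sqrt((y ** 2).sum(axis=0))[None, :]
    return cross / (norm_x * norm_y)

# Hypothetical ensemble: N = 201 runs, D = 1 input, P = 2 outputs.
x = np.linspace(-1.0, 1.0, 201)
ensemble = np.column_stack([x, 3.0 * x + 2.0, x ** 2])
rho = input_output_correlations(ensemble, n_inputs=1)
print(rho)  # first output is linear in x, second is a symmetric parabola
```

In this toy case the linear output has a correlation of 1 with the input, while the quadratic output has a correlation of essentially zero even though it is completely determined by the input. This is the kind of patterned, zero-correlation relationship that a scatterplot reveals.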
Rank correlation refers to correlations performed on the ranks of the data. Ranks are obtained by replacing the actual data by the ranked values, which are obtained by ordering the data in ascending order. For example, the smallest value in a set of input samples would be given a rank 1, the next smallest value a rank 2, etc. Rank correlations are useful when some of the inputs and outputs differ greatly in magnitude; it is easier to check whether the smallest ranked input sample is correlated with the smallest ranked output, for example. Rank correlations can also be used when monotonic nonlinear relationships exist. A rank correlation coefficient is also called a Spearman correlation coefficient.
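The rank transformation can be sketched as follows. The helper names `ranks`, `pearson`, and `spearman` are hypothetical, the sample data are invented for illustration, and the ranking assumes there are no tied values (ties would normally be given averaged ranks).

```python
import numpy as np

def ranks(values):
    """Rank 1 for the smallest value up to rank N for the largest (assumes no ties)."""
    values = np.asarray(values)
    order = np.argsort(values)
    result = np.empty(len(values), dtype=float)
    result[order] = np.arange(1, len(values) + 1)
    return result

def pearson(a, b):
    """Simple (Pearson) correlation between two samples."""
    return np.corrcoef(a, b)[0, 1]

def spearman(a, b):
    """Rank (Spearman) correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Hypothetical monotonic but strongly nonlinear input-output relationship.
x = np.linspace(0.0, 5.0, 50)
y = np.exp(x)
simple = pearson(x, y)
rank = spearman(x, y)
print(simple, rank)
```

Because the output increases monotonically with the input, the rank correlation is 1, while the simple correlation is noticeably lower because the relationship is not linear.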
Figure 5. Example correlation relationships (Source: http://en.wikipedia.org/wiki/Correlation).
Partial correlation coefficients are similar to simple correlations, but a partial correlation coefficient between two variables measures their correlation while adjusting for the effects of the other variables. Partial correlation measures the strength and direction of a linear relationship between an input Xj and an output Y after the linear effects of the remaining input parameters have been removed from both Xj and Y. For example, if one has a problem with two highly correlated inputs and one output, the correlation of the second input and the output may be very low after accounting for the effect of the first input. Partial correlations are calculated in the following way: for a particular input, Xj, a regression model of the response Y is constructed over all of the inputs except Xj. The residuals (differences between the actual Y values and the regression model predictions) are constructed, indicating how well the other variables are able to predict the response Y. Then, a regression model of Xj is constructed as a function of the other inputs, and the residuals based on this regression are constructed. The partial correlation between Y and Xj is given by the correlation between these two sets of residuals. If Y is highly dependent on the other variables, and Xj is also highly dependent on the other variables, the residuals for each will be small and their correlation will be low, indicating that Xj is not highly correlated with Y when the effects of the other variables have been incorporated.
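The residual-based procedure described above can be sketched as follows, assuming ordinary least-squares linear regression with an intercept. The function names and the synthetic data are hypothetical and serve only to illustrate the two-highly-correlated-inputs case mentioned in the text.

```python
import numpy as np

def residuals(target, predictors):
    """Residuals of a least-squares linear fit (with intercept) of target on predictors."""
    design = np.column_stack([np.ones(len(target)), predictors])
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coeffs

def partial_correlation(inputs, y, j):
    """Partial correlation between input column j and output y."""
    others = np.delete(inputs, j, axis=1)        # all inputs except X_j
    res_y = residuals(y, others)                 # part of Y not explained by the others
    res_x = residuals(inputs[:, j], others)      # part of X_j not explained by the others
    return np.corrcoef(res_x, res_y)[0, 1]

# Hypothetical data: x2 is nearly a copy of x1, and y depends on x1 only.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
y = 2.0 * x1 + 0.1 * rng.normal(size=n)
inputs = np.column_stack([x1, x2])

simple_x2 = np.corrcoef(x2, y)[0, 1]
partial_x2 = partial_correlation(inputs, y, j=1)
print(simple_x2, partial_x2)
```

Here the simple correlation between the second input and the output is high only because the second input tracks the first. Once the linear effect of the first input is removed, the partial correlation drops to nearly zero.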
4.2. Variance-based Decomposition
The correlation coefficients described in Section 4.1 only detect a linear or monotonic relationship. In contrast, the variance-based indices (also referred to as Sobol’ indices) are not limited in this way. The variance-based indices identify the fraction of the variance in the output that can be attributed to an individual variable alone or with interaction effects [35, 36]. There are two classes of variance-based sensitivity indices: main effects and total effects. The main effects indices, Si, identify the fraction of uncertainty in the output Y attributed to input Xi alone. The total effects indices, Ti, correspond to the fraction of the uncertainty in output Y attributed to Xi and its interactions with other variables. These sensitivity indices are represented as:
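$$S_i = \frac{\mathrm{Var}_{X_i}\left[E(Y|X_i)\right]}{\mathrm{Var}(Y)}, \qquad T_i = \frac{E\left[\mathrm{Var}(Y|X_{-i})\right]}{\mathrm{Var}(Y)} = \frac{\mathrm{Var}(Y) - \mathrm{Var}\left[E(Y|X_{-i})\right]}{\mathrm{Var}(Y)} \qquad (2)$$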
where Var(·) is the variance, E(·) is the expected value, and E(Y|Xi) is the expected value of Y conditioned on Xi. Var(Y|X-i) is the variance of Y conditioned on all the inputs except Xi. These indices involve multi-dimensional integrals that, in practice, are evaluated approximately. Note that each Si varies between 0 and 1. Values close to 1 mean that the uncertainty in variable Xi contributes very significantly to the uncertainty in output Y. For independent inputs, the sum of Si over all variables is at most 1, and it equals 1 only when the model is additive, that is, when there are no interaction effects. The total effects indices are subject to different restrictions. Each Ti is greater than or equal to the corresponding Si and, when computed exactly, also lies between 0 and 1, but the sum of Ti over all variables is at least 1 and exceeds 1 whenever interactions are present. Numerical estimates of either index may fall slightly outside these bounds because of sampling error.
The team led by Andrea Saltelli at the European Commission's Joint Research Centre is generally credited with popularizing the use of variance-based indices for sensitivity analysis. In the past 10–15 years, several approaches have been developed for calculating the Sobol’ sensitivity indices. A recent paper by Saltelli et al. [37] provides a detailed comparison of sampling approaches, with some comments about the relationship between the estimators and the sampling methods used.
Ideally, a full factorial sample would be performed with m samples taken in each of d input dimensions. Then, the integrals in the Sobol’ formulas can easily be calculated given the $m^d$ samples. For example, when calculating the numerator of the main effects index in Eq. 2, we calculate the inner expectation term m times, each time averaging over the remaining $m^{d-1}$ points in the other dimensions. That is, we calculate E(Y|Xi = xi) for each of the m points in dimension i, then take the variance of the m expected values to obtain the numerator for the main effects indices. The total effects indices are calculated in a similar manner.
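The full factorial calculation can be sketched as follows, assuming equally weighted grid points in every dimension. The function `full_factorial_indices` and the test function Y = X1 + X2·X3 are hypothetical and chosen only to make the result easy to check by hand.

```python
import numpy as np

def full_factorial_indices(func, levels, d):
    """Main and total effects from a full factorial grid.

    levels: the m equally weighted sample points used in each of the d dimensions.
    func:   vectorized model taking d arrays and returning Y on the grid.
    """
    grids = np.meshgrid(*([levels] * d), indexing="ij")
    y = func(*grids)                        # shape (m,) * d, i.e. m**d evaluations
    var_y = y.var()
    main, total = np.empty(d), np.empty(d)
    for i in range(d):
        other_axes = tuple(a for a in range(d) if a != i)
        cond_mean = y.mean(axis=other_axes)  # E(Y | X_i = x_i): m values
        main[i] = cond_mean.var() / var_y    # Var[E(Y | X_i)] / Var(Y)
        cond_var = y.var(axis=i)             # Var(Y | X_-i): m**(d-1) values
        total[i] = cond_var.mean() / var_y   # E[Var(Y | X_-i)] / Var(Y)
    return main, total

# Hypothetical test function with one additive term and one interaction term.
levels = np.linspace(-1.0, 1.0, 11)
main, total = full_factorial_indices(lambda x1, x2, x3: x1 + x2 * x3, levels, d=3)
print(main, total)
```

For this test function the main effects of X2 and X3 are zero because each acts only through its interaction with the other, while their total effects are each about 0.286. X1 has no interactions, so its main and total effects coincide at about 0.714.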
The full factorial approach requires $m^d$ samples, which may not be practical when each sample is an evaluation of a computationally costly function. Saltelli et al. [34] developed an approach that uses fewer samples: (2+d)m samples. In Section 7, we discuss results obtained using a recent formulation [37] for the (2+d)m samples, which has been improved to remove bias and better capture interaction effects. The actual formulas are described in a report by Weirs et al. [38].
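To show where the (2+d)m count comes from, the sketch below uses two independent m × d sample matrices A and B plus d hybrid matrices, each equal to A with one column taken from B. The estimators shown are one widely used pair from the sampling literature (a main-effects estimator of the form mean[f(B)(f(AB_i) − f(A))] and the Jansen form for total effects). They are an assumption made for illustration and are not necessarily the formulas given in [38]. The function names, the uniform input distribution on [-1, 1], and the test model are likewise hypothetical.

```python
import numpy as np

def sobol_indices_sampling(func, d, m, rng):
    """Estimate main and total effects with (2 + d) * m model evaluations."""
    a = rng.uniform(-1.0, 1.0, size=(m, d))      # sample matrix A: m evaluations
    b = rng.uniform(-1.0, 1.0, size=(m, d))      # sample matrix B: m evaluations
    f_a, f_b = func(a), func(b)
    var_y = np.concatenate([f_a, f_b]).var()
    main, total = np.empty(d), np.empty(d)
    for i in range(d):                           # d hybrid matrices: d * m evaluations
        ab = a.copy()
        ab[:, i] = b[:, i]                       # A with column i replaced from B
        f_ab = func(ab)
        main[i] = np.mean(f_b * (f_ab - f_a)) / var_y
        total[i] = 0.5 * np.mean((f_a - f_ab) ** 2) / var_y
    return main, total

def model(x):
    """Hypothetical test model Y = X1 + X2 * X3, evaluated row by row."""
    return x[:, 0] + x[:, 1] * x[:, 2]

main, total = sobol_indices_sampling(model, d=3, m=20000, rng=np.random.default_rng(1))
print(main, total)
```

For independent inputs uniform on [-1, 1], the exact values for this model are S1 = T1 = 0.75, S2 = S3 = 0, and T2 = T3 = 0.25. The estimates approach these values as m grows, at a cost of 5m evaluations here rather than $m^3$.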
Finally, these sensitivity indices may be calculated when stochastic expansion methods such as polynomial chaos or stochastic collocation are used to propagate the uncertainty from inputs to outputs. When using stochastic expansion methods, the sensitivity indices Si and Ti can be calculated as analytic functions of the coefficients of the expansion. This is computationally attractive, since no additional samples are needed beyond those used to construct the expansion. Formulations of the sensitivity indices based on polynomial chaos are derived by Sudret [39]; the sensitivity indices based on stochastic collocation are derived by Tang et al. [40]. We present the results of variance-based decomposition using polynomial chaos in Section 7.
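For a polynomial chaos expansion in an orthonormal basis with independent inputs, the variance is the sum of the squared coefficients of all non-constant terms, so the indices reduce to ratios of partial sums of squared coefficients. The following minimal sketch assumes such an orthonormal basis. The function `sobol_from_pce`, the multi-index layout, and the example coefficients are hypothetical, and a basis that is not normalized would additionally require each squared coefficient to be multiplied by the squared norm of its basis polynomial.

```python
import numpy as np

def sobol_from_pce(multi_indices, coeffs):
    """Main and total effects from the coefficients of an orthonormal PCE.

    multi_indices: one row per expansion term, one polynomial order per input.
    coeffs:        expansion coefficient of each term.
    """
    alpha = np.asarray(multi_indices)
    c2 = np.asarray(coeffs, dtype=float) ** 2
    var_y = c2[alpha.sum(axis=1) > 0].sum()           # constant term carries only the mean
    d = alpha.shape[1]
    main, total = np.empty(d), np.empty(d)
    for i in range(d):
        involves_i = alpha[:, i] > 0                  # terms that depend on X_i
        others_zero = np.delete(alpha, i, axis=1).sum(axis=1) == 0
        main[i] = c2[involves_i & others_zero].sum() / var_y   # terms in X_i alone
        total[i] = c2[involves_i].sum() / var_y                # all terms containing X_i
    return main, total

# Hypothetical expansion of Y = 5 + X1 + X2 * X3 with inputs uniform on [-1, 1],
# written in orthonormal Legendre polynomials (first-order polynomial is sqrt(3) * x).
multi_indices = [(0, 0, 0), (1, 0, 0), (0, 1, 1)]
coeffs = [5.0, 1.0 / np.sqrt(3.0), 1.0 / 3.0]
main, total = sobol_from_pce(multi_indices, coeffs)
print(main, total)
```

No model evaluations are needed at this stage: the indices follow directly from the coefficients, giving S1 = T1 = 0.75, S2 = S3 = 0, and T2 = T3 = 0.25 for this example.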