
2.3 Modeling and validation

2.3.3 Operational validation

Operational validation is the task of establishing that a conceptual model’s output data closely resembles the output data that would be expected from the actual system, where ‘closely’ means with the accuracy required for the model’s intended purpose. It is argued to be the most definitive test of a simulation model’s validity (Sargent, 2005). Operational validation therefore involves comparing output data from the existing system with the proposed model’s output: if the two sets of data compare favorably, the conceptual model is considered valid (Law and Kelton, 1991). Thus, operational validation is a form of black-box validation that ignores the detailed internal workings of the model and is only interested in whether the model’s output accurately reflects that of the real system. This type of validation is mainly performed by statistical comparisons of the model’s and the real system’s outputs and examines the predictive power of the model, i.e. whether it adequately predicts how a given system actually behaves and would behave under certain conditions (Pidd, 1992; Kleijnen, 1995).

An appropriate approach to operational validation is the correlated inspection approach (Law and Kelton, 1991; Pidd, 1992), also called trace-driven validation (Kleijnen and van Groenendaal, 1992): ‘In particular, it is recommended that the system and model be compared by driving the model with historical system input data (e.g., actual observed interarrival times and service times), rather than samples from the input probability distributions, and then comparing the model and system outputs. Thus, the system and the model experience exactly the same observations from the input random variables, which should result in a statistically more precise comparison.’ (Law and Kelton, 1991, p. 316). When using the correlated inspection approach, input data from the real-world system is used to run the simulation program, and the simulation program’s outputs are then compared with the corresponding historical outputs of the real-world system, as illustrated in Figure 2.3. It is better to use the original input data than random input values drawn from a theoretical distribution fitted to the empirical data, as otherwise the effects of the random input variables and those of the conceptual model could be confounded. This confounding could result in wrongly rejecting an actually valid model because of output differences induced not by an inappropriate model but by the random input numbers used (Law and Kelton, 1991).

Figure 2.3: Correlated inspection approach (Law and Kelton, 1991, p. 317)
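The correlated inspection idea can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that the conceptual model is a single-server first-in-first-out queue and that the output of interest is the waiting time in queue; the function name `simulate_fifo_waits` and all numbers are hypothetical and are not taken from the cited sources. The model is driven by the recorded interarrival and service times instead of random samples, so each simulated output is paired with the system output observed for the same input.

```python
from statistics import mean


def simulate_fifo_waits(interarrival_times, service_times):
    """Waiting times in queue for a single-server FIFO queue.

    interarrival_times[i] is the gap between the arrival of customer i-1
    and customer i (for i = 0, the gap since the start of observation).
    Uses the Lindley recursion: W[i] = max(0, W[i-1] + S[i-1] - A[i]).
    """
    waits = []
    wait = 0.0            # the first customer is assumed to find an empty system
    prev_service = None
    for gap, service in zip(interarrival_times, service_times):
        if prev_service is not None:
            wait = max(0.0, wait + prev_service - gap)
        waits.append(wait)
        prev_service = service
    return waits


# Hypothetical historical trace (illustrative numbers, not real data).
hist_interarrival = [1.0, 0.5, 0.5, 2.0, 0.3]
hist_service = [0.8, 0.9, 0.4, 0.6, 0.5]
system_waits = [0.0, 0.3, 0.8, 0.0, 0.3]   # waits observed in the real system

# Trace-driven run: the model sees exactly the inputs the system saw.
model_waits = simulate_fifo_waits(hist_interarrival, hist_service)

# Paired differences between system and model output for the same inputs.
paired_differences = [r - s for r, s in zip(system_waits, model_waits)]
print(model_waits)
print(mean(paired_differences))
```

Because both output series stem from the same inputs, any remaining difference can be attributed to the model itself and not to sampling noise in the inputs.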

The resulting output data of the system and the model can then be compared qualitatively or quantitatively. Qualitative comparison uses Turing tests, in which individuals knowledgeable about the operations of the system being modeled are asked whether they can discriminate between system and model output and the model is considered valid if they cannot, or plotting and visual inspection (Kleijnen, 1995; Sargent, 2005). Quantitative comparison uses statistical tests of the equality of distributions for paired samples, such as the parametric t-test or the non-parametric Wilcoxon signed-rank test (Law and Kelton, 1991), or ordinary least squares regression (Kleijnen, 1995). Concerning statistical testing of the correspondence between system output and model output, it has to be kept in mind that, due to approximations made in the conceptual model, the model and the real system are unlikely to have identical output. Statistical tests could therefore conclude that the outputs differ even though the model is nevertheless useful (Bratley et al., 1987; Law and Kelton, 1991). Even if there are differences, which clearly must not be large enough to affect the conclusions derived from the model, the greater the commonality between the outcomes of the system and the conceptual model, the greater the confidence in the model (Law and Kelton, 1991). Further problems of statistical testing arise because the data gathered on real system output are often non-stationary and autocorrelated, which makes it difficult to find appropriate statistical tests (Law and Kelton, 1991).
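As an illustration of the non-parametric option mentioned above, the following is a minimal sketch of the Wilcoxon signed-rank statistic for paired system and model outputs. It makes several simplifying assumptions that are not taken from the cited sources: zero differences are dropped, tied absolute differences receive average ranks, and the z value uses the large-sample normal approximation without tie or continuity correction, which is only rough for small n. The function name and the data are hypothetical.

```python
import math


def wilcoxon_signed_rank(system, model):
    """Return (W_plus, W_minus, z) for paired samples.

    Assumptions of this sketch: zero differences are discarded, ties get
    average ranks, and z is the plain normal approximation.
    """
    diffs = [r - s for r, s in zip(system, model) if r != s]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        avg_rank = (start + end) / 2 + 1   # mean of 1-based ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1

    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(rank for rank, d in zip(ranks, diffs) if d < 0)

    # Under H0, W_plus has mean n(n+1)/4 and variance n(n+1)(2n+1)/24.
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    return w_plus, w_minus, z


# Hypothetical paired outputs (illustrative numbers only).
system_output = [10, 12, 9, 15, 11, 14]
model_output = [9, 14, 9, 12, 10, 18]
w_plus, w_minus, z = wilcoxon_signed_rank(system_output, model_output)
print(w_plus, w_minus, z)
```

A value of |z| well below the usual normal critical values gives no evidence of a systematic shift between system and model output; for small samples, exact tables of the signed-rank distribution would be the safer reference.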

Assume that observations $r_i$ of the output of the system and $s_i$ of the output of the simulation of the conceptual model are available for the same inputs $i$ ($i = 1, \dots, n$). Then the t-test for paired samples can be used to analyze whether the distributions of the two samples are equal (Law and Kelton, 1991; Kleijnen, 1995), by calculating the paired differences $d_i = r_i - s_i$ and the corresponding t statistic (2.1).

$$t_{n-1} = \frac{\bar{d} - \delta}{s_d / \sqrt{n}} \qquad (2.1)$$

In (2.1), $\bar{d}$ is the average and $s_d$ the standard deviation of the $d_i$, i.e. the average of the $n$ differences between the system and the model output and their standard deviation, and $\delta$ is the hypothesized mean difference. If the t statistic is significant for the hypothesis $H_0: \delta = 0$, we reject the model; otherwise the means are considered practically the same and the simulation model is considered valid.
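The paired t-test described above can be sketched with the Python standard library alone. The function name `paired_t_statistic` and the data are hypothetical; the critical value 2.571 is the two-sided 5% point of the t distribution with 5 degrees of freedom from a standard table and has to be replaced for other sample sizes or significance levels.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(system, model, delta=0.0):
    """Return (t, degrees_of_freedom) for the paired differences d_i = r_i - s_i."""
    diffs = [r - s for r, s in zip(system, model)]
    n = len(diffs)
    d_bar = mean(diffs)     # average of the paired differences
    s_d = stdev(diffs)      # sample standard deviation (divisor n - 1)
    t = (d_bar - delta) / (s_d / math.sqrt(n))
    return t, n - 1


# Hypothetical paired outputs for the same six historical inputs.
system_output = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7]   # r_i
model_output = [5.0, 5.0, 5.8, 5.6, 4.7, 5.5]    # s_i

t, df = paired_t_statistic(system_output, model_output)

critical_value = 2.571   # two-sided, alpha = 0.05, 5 degrees of freedom (table value)
reject_model = abs(t) > critical_value
print(t, df, reject_model)
```

With these illustrative numbers the statistic stays below the critical value, so the hypothesis of a zero mean difference is not rejected.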

Alternatively, Kleijnen (1995) proposes using ordinary least squares regression with the regression function (2.2) to test hypotheses about the correspondence of the system and the model, by estimating the intercept $\beta_0$ and slope $\beta_1$ from the system output $r$ and the model output $s$ obtained for the same historical input data. This approach takes into account the positive correlation of the outputs, not just the equality of their means.

$$r = \beta_0 + \beta_1 s \qquad (2.2)$$
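A minimal sketch of fitting (2.2) by ordinary least squares, using only the Python standard library. It also computes an F statistic for the joint restriction $\beta_0 = 0$, $\beta_1 = 1$; that this is the composite hypothesis of interest is an assumption of the sketch, chosen because it corresponds to system and model outputs coinciding. The function names and the data are hypothetical, and the critical value 6.94 is the 5% point of the F distribution with 2 and 4 degrees of freedom from a standard table.

```python
def ols_fit(s, r):
    """Least squares estimates (b0, b1) for r = b0 + b1 * s."""
    n = len(s)
    s_bar = sum(s) / n
    r_bar = sum(r) / n
    sxx = sum((x - s_bar) ** 2 for x in s)
    sxy = sum((x - s_bar) * (y - r_bar) for x, y in zip(s, r))
    b1 = sxy / sxx
    b0 = r_bar - b1 * s_bar
    return b0, b1


def f_statistic_identity(s, r):
    """F statistic for the assumed joint hypothesis b0 = 0 and b1 = 1."""
    n = len(s)
    b0, b1 = ols_fit(s, r)
    # Full model: both coefficients estimated freely.
    sse_full = sum((y - b0 - b1 * x) ** 2 for x, y in zip(s, r))
    # Reduced model: hypothesis imposed, so the fitted value is s itself.
    sse_reduced = sum((y - x) ** 2 for x, y in zip(s, r))
    q = 2                   # number of restrictions
    df_full = n - 2         # residual degrees of freedom of the full model
    f = ((sse_reduced - sse_full) / q) / (sse_full / df_full)
    return f, q, df_full


# Hypothetical outputs for the same six historical inputs.
model_output = [5.0, 5.0, 5.8, 5.6, 4.7, 5.5]    # s
system_output = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7]   # r

b0, b1 = ols_fit(model_output, system_output)
f, q, df_full = f_statistic_identity(model_output, system_output)

critical_value = 6.94   # F(2, 4), alpha = 0.05 (table value)
reject_model = f > critical_value
print(b0, b1, f, reject_model)
```

With these illustrative numbers the estimated slope is close to one, the intercept is close to zero, and the F statistic is far below the critical value, so the assumed hypothesis is not rejected.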

The most stringent test of operational validity using ordinary least squares regression would be to test hypothesis (2.3), which implies not only identical means of the system and model responses but also their positive correlation. Testing this composite hypothesis involves computing the sum of squared errors for the reduced regression model (with the hypothesis imposed) and for the full regression model (without the hypothesis) and comparing these two values. A significantly high F statistic indicates that the hypothesis should be rejected and that the simulation model is not valid (Kleijnen, 1995). A less stringent validation requirement would be that simulated and real responses do not necessarily have the same mean but are positively correlated, which makes sense when the model is used to predict relative rather than absolute responses; the corresponding hypothesis for

In document Simulation of automated negotiation (Page 43-45)