Chapter 4: Research Design and Empirical Validation Methods

4.5. Methods of Analysing Validation Data

4.5.1. Review of analytical methods and techniques

The methods of comparison are very important in determining if a model is working or not. According to Jensen (1995), comparisons between measured and predicted values are often performed in a very subjective way by visually comparing graphs of measured and predicted data. While this is a simple and quick method of comparison, it is only the initial step in a validation process. Jensen also reported that the method of graphical temperature comparison can give imprecise information about what may be the cause of deviations. He recommends that statistical techniques should also be used to assess resulting uncertainties in the program output parameters and that uncertainties should be identified.

There are a number of statistical techniques for comparing measured and predicted values and testing the goodness-of-fit of different aspects of a program. In the PASSYS Model Validation and Development Subgroup, two different statistical tools were applied, namely: the parametric sensitivity analysis and the residual analysis (Lomas & Eppel 1992).

Jensen (1995) describes the analysis of the results and assessment of the sensitivity as follows:

• The parametric sensitivity analysis includes the differential sensitivity analysis and the Monte Carlo method. With the differential sensitivity analysis perturbed simulations are performed by changing each input parameter by its standard deviation. Based on the results, the overall uncertainty band of the simulation is calculated. The agreement is said to be good if the measured value fits within this uncertainty band. The advantage of this method is that it is very clear when good agreement is obtained. Parametric sensitivity analysis can only compare measurements and predictions in the low frequency range and cannot test the goodness-of-fit at other frequencies, such as the dynamic part of the experiment. Other statistical methods have been therefore developed for such procedures including the residual analysis. The residuals (the time series of the difference between measurements and predictions) are analysed in the power spectrum, and the cross-correlation function between residuals and certain input parameters of the simulation model are analysed in the time and frequency domain. The power spectrum discloses at which frequency the residuals appear and the analysis of the cross-correlation function discloses which input parameters are correlated with the residuals and therefore may cause divergence. Finally, the squared spectra are analyzed to determine how large a part of the residuals may be explained by the input parameters. While the residual analysis does not disclose what is wrong with the program, it does indicate where to look for
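The differential sensitivity procedure described in the passage above can be sketched in a few lines. This is a minimal illustration only, not the PASSYS implementation: the function names, the toy model and its parameter names are hypothetical, and combining the individual effects by root-sum-square assumes independent input uncertainties, which the quoted passage does not state.

```python
import math


def differential_sensitivity(model, inputs, std_devs):
    """Perturb each input by its standard deviation, one at a time.

    Returns the base prediction and an overall uncertainty. The
    root-sum-square combination is an assumption (independent inputs).
    """
    base = model(inputs)
    sum_of_squares = 0.0
    for name, sigma in std_devs.items():
        perturbed = dict(inputs)          # copy so the base case is untouched
        perturbed[name] = inputs[name] + sigma
        effect = model(perturbed) - base  # change caused by this input alone
        sum_of_squares += effect ** 2
    return base, math.sqrt(sum_of_squares)


def within_band(measured, base, uncertainty):
    """Good agreement: the measured value lies inside the uncertainty band."""
    return abs(measured - base) <= uncertainty


# Hypothetical toy model standing in for a full simulation run.
def toy_model(p):
    return 10.0 + 3.0 * p["a"] + 4.0 * p["b"]


base, band = differential_sensitivity(
    toy_model, {"a": 1.0, "b": 1.0}, {"a": 1.0, "b": 1.0}
)
print(base, band, within_band(20.0, base, band))
```

In a real validation exercise `model` would wrap a complete simulation run, so the cost is one extra run per uncertain input.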

Williamson (1995, p. 268) pointed out several inadequacies in the goodness-of-fit between measured and predicted data in a variety of empirical validation projects as follows:

• ‘No attempt is made to take into account the severity of the validation test;

• None gives a single measure of success (or otherwise) of the test;

• Isolation of sources of error is difficult;

• Tests cannot be used easily for internal validation and/or algorithm “tuning”.’

Williamson further describes an objective technique for establishing the accuracy of simulation predictions, called the ‘Confirmation Technique’. In this technique, a confirmation factor ‘Cs’ and a degree of confirmation factor ‘D’ are established to reflect the degree of correspondence to reality and the severity of the test, and to decide whether the test result is sufficient to provide confidence that the model can be used for decision-making. He concluded that a minimum acceptable program level can be established based on the degree of confirmation factor, and suggests that D > 0.80 would seem to ensure a program of sufficient accuracy for most design decision-making.

Table 4.3 below represents a summary of degree of confirmation analysis and the goodness-of-fit statistic for a 7-day comparison of measured and predicted environmental temperatures in the living area of the CSIRO’s experimental low energy consumption houses (LECH) in Highett, Melbourne (Williamson 1995). The simulations were performed with the program EnCom 2.

Table 4.3: Degree of confirmation D and goodness-of-fit statistics (Source: Williamson 1995)

While there are a number of methods of analysing validation accuracy, the statistical method of identifying the residuals between measured and predicted values should be the basis of any empirical validation process. However, predetermining the expected accuracy of a program and setting the parameter for passing or failing the test is more intricate. Establishing a confirmation factor ‘D’ and providing a single measure of success of the test would also provide concrete answers regarding the accuracy of the program for its use as an acceptable design tool.

4.5.2. Methods of Analysing Validation Data used in this Study

Two sets of data, the simulated and the measured data, were compared and analysed. Both data sets were recorded at hourly time steps. The primary objectives of the validation analysis were as follows:

• To demonstrate a relatively straightforward method of comparing the data sets;

• To present an analysis that could provide a basis for further developing and improving the software.

The first objective was achieved by using line graphs of temperature, produced with the charting function of the spreadsheet software Excel. General temperature profiles and differences in minimum and maximum temperatures between simulated and measured values in the zones of the houses were presented for each of the houses. The second objective required statistical analyses, specifically linear correlation and residual analysis. These two approaches are discussed below.

a) Graphical Analysis

Linear graphical analysis was undertaken to visually compare simulated and measured hourly temperature values. This type of analysis allowed for a convenient visual comparison of temperatures between 5 September 2007 and 26 September 2007, and included an analysis of the differences between simulated and measured maximum and minimum temperatures. The method was used initially to identify key temperature trends in the zones of the houses. If the simulated and measured temperature profiles were very similar, the simulation was considered accurate. If the profiles were similar, with corresponding peaks and troughs but different values, this may indicate faulty sensor calibration or suggest that aspects of the software needed improvement. If the profiles were dissimilar, the software may have been inappropriately considering climate conditions or aspects of the building fabric. Figure 4.9 shows the graphical comparison of simulated and measured data for one of the test cells in Launceston for one week in July 2007 (Dewsbury et al. 2009).

[Figure 4.9 plot: Test Cell 1, 15-22 July 2007. X-axis: Dates, July 2007. Y-axis: Average Temperature, Degrees Centigrade (-10 to 20). Series: Outdoor, Simulated, Measured.]

Figure 4.9: Outdoor, simulated and measured temperatures in test cell 1 during a cold week (Source: Dewsbury 2009)

The graphical analysis shows the comparison of the simulated and measured temperature profiles, and of the simulated and measured daily maximum and minimum temperatures, for the houses. In Figure 4.9 the maximum simulated and measured temperatures are very similar; however, the minimum temperatures differ, with simulated temperatures up to 3°C lower than the measured temperatures.
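The comparison of daily maximum and minimum temperatures can also be tabulated directly from the hourly series. The sketch below is illustrative only: the function name is hypothetical, the sample values are invented, and the example uses four readings per "day" purely to keep it short. Differences are taken as measured minus simulated, so a positive value means the simulation was lower.

```python
def daily_extreme_differences(measured, simulated, hours_per_day=24):
    """Return (max_diff, min_diff) for each complete day.

    Differences are measured minus simulated.
    """
    n = min(len(measured), len(simulated))
    differences = []
    for start in range(0, n - hours_per_day + 1, hours_per_day):
        m_day = measured[start:start + hours_per_day]
        s_day = simulated[start:start + hours_per_day]
        differences.append((max(m_day) - max(s_day), min(m_day) - min(s_day)))
    return differences


# Invented sample data: two "days" of four readings each.
measured = [12.0, 15.0, 20.0, 13.0, 11.0, 14.0, 19.0, 12.0]
simulated = [10.0, 15.0, 20.0, 12.0, 8.0, 14.0, 19.0, 11.0]
diffs = daily_extreme_differences(measured, simulated, hours_per_day=4)
print(diffs)
```

In this invented data the daily maxima agree while the simulated minima are 2 and 3 degrees lower, the same pattern as described for Figure 4.9.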

b) Statistical Analysis

The statistical analysis was undertaken to provide an indication of the accuracy of the simulation and to offer an explanation for the simulation errors, especially in relation to the modeling by the software. The statistical analysis included the following examinations:

• Correlation between measured and simulated temperatures;

• Distribution of residuals;

• Correlation of residuals between adjacent zones;

• Correlation of zone residuals and climate parameters.

4.5.3. Correlation between measured and simulated temperatures

To examine how different the simulated and measured temperature values are, correlation analysis was used. This technique determines the extent to which changes in the value of the simulated temperature are associated with changes in the measured temperature. As a rating tool, the AccuRate software should predict temperatures as closely as possible to measured temperatures at any time, and an increase in measured temperature should correspond to a proportionate increase in simulated temperature. The proximity of the simulated temperature to the measured temperature is examined by drawing a scatterplot, with the measured temperature on the X-axis and the simulated temperature on the Y-axis. If the line of best fit to the scatterplot slopes upwards (positive slope) and its correlation factor is close to 1, this indicates that the AccuRate simulation is directly correlated to the measured temperature in a linear manner. The values themselves are close only when the line of best fit also lies near the perfect fit line, along which the measured and simulated temperatures are equal and the correlation factor is 1. If the line of best fit slopes downward (negative slope), this indicates that the simulation program has a potentially serious problem that needs to be further examined. The more tightly the data are clustered around the trend line, the greater the correlation.

All scatterplots display a correlation factor ‘r’ at the lower left-hand side of the diagrams. The correlation coefficient is an indication of the strength of linear association between the variables. For the purpose of identifying the strength of correlation, the value of r can be classified as follows (F Soriano 2010, pers. comm., 21 December):

 > 0.8 indicates a high degree of correlation;

 0.5 to 0.8 indicates a moderate degree of correlation;

 < 0.5 indicates a low degree of correlation.

Two lines are shown in the diagrams, namely the best fit line (a black continuous line) and the perfect fit line (a red dotted line). By comparing the two lines, it is possible to examine how closely the program’s predictions match reality throughout the temperature range.
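The quantities read from the scatterplots can be reproduced numerically. The sketch below is a minimal illustration with invented data and hypothetical function names; it assumes the best fit line is an ordinary least-squares fit of simulated on measured temperature, which the text does not state explicitly. The sample data are chosen so that r equals 1 while every simulated value is 1 degree too high, which is why the best fit line is compared with the perfect fit line (slope 1, intercept 0) rather than relying on r alone.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_fit_line(x, y):
    """Least-squares slope and intercept of y on x (assumed fitting method)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def classify_r(r):
    """Classification of r using the thresholds listed above."""
    if r > 0.8:
        return "high"
    if r >= 0.5:
        return "moderate"
    return "low"


# Invented data: simulated is always 1 degree above measured.
measured = [10.0, 12.0, 14.0, 16.0]
simulated = [11.0, 13.0, 15.0, 17.0]
r = pearson_r(measured, simulated)
slope, intercept = best_fit_line(measured, simulated)
print(r, classify_r(r), slope, intercept)
```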

4.5.4. Residual Analysis

The residual temperatures, referred to as ‘residuals’, are the actual temperature errors of the simulation. The residuals’ values are obtained by subtracting the simulated temperature from the measured temperature, as shown in Equation 4.2.

Residual Temperature = Tm − Ts        (Equation 4.2)

where:
Tm = Measured Temperature
Ts = Simulated Temperature

a) Residual Histogram

This part of the residual analysis employs a histogram to examine the range, frequency, and distribution of residuals. Each observation represents one mean hourly temperature value. This method also examines the normality of the distribution of grouped residuals (or errors), and clearly shows the frequency of positive and negative residuals. A positive residual value indicates that the simulation under-predicted the temperature, whereas a negative residual value indicates that the software over-predicted the temperature.
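As a minimal sketch of this step, the residuals of Equation 4.2 can be computed and grouped as follows. The function names, the 1-degree bin width and the sample temperatures are assumptions made for illustration; they are not taken from the study.

```python
import math
from collections import Counter


def residuals(measured, simulated):
    """Equation 4.2: residual = measured minus simulated."""
    return [m - s for m, s in zip(measured, simulated)]


def histogram(values, bin_width=1.0):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def sign_counts(values):
    """Positive residual: under-prediction. Negative: over-prediction."""
    return {
        "under_predicted": sum(1 for v in values if v > 0),
        "over_predicted": sum(1 for v in values if v < 0),
        "exact": sum(1 for v in values if v == 0),
    }


# Invented hourly temperatures in degrees Celsius.
measured = [15.0, 16.5, 14.0, 18.0]
simulated = [14.0, 17.0, 14.0, 15.5]
res = residuals(measured, simulated)
print(res)
print(histogram(res))
print(sign_counts(res))
```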

To investigate the cause of the difference between the simulated temperature and the measured temperature, correlation analysis is used as follows:

b) Correlation of residuals between adjacent zones

This analysis investigates the correlation of residuals between adjacent zones of the house, as a means of examining how the residual values (simulation error) in one zone may affect the residual values of an adjacent zone. The software calculates temperatures from an energy balance equation for the house, considering many factors such as fabric conductivity, material emittance values, thermal capacitance and external climate inputs. If the software has not correctly calculated the thermal performance of one zone, this would also affect the thermal performance of the adjoining zones. For example, when the residual for one zone is positive (that is, the software under-predicts the temperature), this can be due to the software modelling too much heat transfer to adjoining zones, leaving less heat in the original zone. Scatterplots are drawn with the residuals of one zone on the X-axis and the residuals of an adjacent zone on the Y-axis.

c) Correlation of zone residuals and climate parameters

This part of the residual analysis examines the correlation of zone residuals with the measured climate parameters, namely external air temperature, global solar radiation, wind speed and wind direction. One of the major factors affecting the thermal performance of buildings is the external climate. The use of measured site climate data in the simulation is one of the fundamental necessities in the empirical validation process.
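Both residual correlation steps, against adjacent zones and against climate parameters, amount to correlating one zone's residual series with a set of candidate series. The sketch below shows one way this might be organised; the function names, zone names and all sample values are invented for illustration and do not come from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_residual_correlations(zone_residuals, candidates):
    """Correlate one zone's residuals with each candidate series.

    Returns (name, r) pairs ordered from strongest to weakest |r|.
    """
    pairs = [
        (name, pearson_r(zone_residuals, series))
        for name, series in candidates.items()
    ]
    return sorted(pairs, key=lambda item: abs(item[1]), reverse=True)


# Invented residuals for one zone and three candidate series.
living_residuals = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "bedroom residuals": [2.0, 4.0, 6.0, 8.0],
    "external air temperature": [1.0, 3.0, 2.0, 4.0],
    "global solar radiation": [400.0, 300.0, 200.0, 100.0],
}
ranked = rank_residual_correlations(living_residuals, candidates)
for name, r in ranked:
    print(f"{name}: r = {r:.2f}")
```

A strong correlation here only indicates where to look; it does not by itself establish which input or algorithm causes the error.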

In document An empirical validation of the house energy rating software AccuRate for residential buildings in cool temperate climates of Australia (Page 102-107)