5.3 Analysis Methodology
5.3.3 Cross-validation
Cross-validation (Geisser, 1993) was used to evaluate how well air temperature could be predicted from the relationship defined by the linear regression of EST and air temperatures. Cross-validation allows predictive relationships derived from a correlation between two data sets to be tested without the need for additional test data (Geisser, 1993). It was selected for this study because all available pre-processed AVHRR and MIDAS data were used to define the EST-air temperature relationship, meaning that no additional data were available for validation. Cross-validation works by removing a subset of the original data (the validation sample) and using the remaining data points (the construction sample) to derive a new line of best fit to act as the model. The new model is then used to predict values of the dependent variable for the validation sample, and the predicted values are compared with the observed values in the validation sample to assess the quality of prediction (Geisser, 1993).
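The construction/validation procedure described above can be illustrated with a minimal Python sketch. It is not the SDAT implementation (Appendix I): the function names `rmse` and `holdout_rmse`, the use of NumPy's `polyfit` for the line of best fit, and the choice of air temperature as the dependent variable are all assumptions made for illustration.

```python
import numpy as np


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


def holdout_rmse(est, air, validation_mask):
    """Fit a line on the construction sample and score it on the validation sample.

    est, air        : paired temperatures (hypothetical inputs, same length)
    validation_mask : True where a pair belongs to the validation sample
    """
    est = np.asarray(est, dtype=float)
    air = np.asarray(air, dtype=float)
    validation_mask = np.asarray(validation_mask, dtype=bool)
    construction_mask = ~validation_mask

    # New line of best fit derived from the construction sample only.
    slope, intercept = np.polyfit(est[construction_mask], air[construction_mask], 1)

    # Predict the dependent variable (assumed here to be air temperature)
    # for the validation sample and compare with the observed values.
    predicted = slope * est[validation_mask] + intercept
    return rmse(predicted, air[validation_mask])
```

Any rule for choosing the validation sample (a random half, a single observation, one station or one summer) can be expressed as a different `validation_mask` passed to the same function.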
To test the EST-air temperature relationship, four different cross-validation tests were performed, each generating root mean square error (RMSE) as the test metric to quantify the resulting errors. RMSE was selected as it is a widely used measure for succinctly quantifying the error of residuals (Ebdon, 1985). Each of the four cross-validation tests was designed to assess the response of the model to different characteristics. The first test was a standard two-fold cross-validation, which was used as a benchmark for the other cross-validation tests. In the two-fold test the paired EST and air temperatures were randomly divided into two halves, with the first half used as construction data for the model and the second half as validation data (Geisser, 1993). The process was then repeated in a second pass with the construction and validation data swapped, and the average of the RMSE values from both passes was calculated. Figure 5.6 shows the first pass of the two-fold cross-validation process. A two-fold cross-validation function was created as part of the SDAT module and can be seen in Appendix I.
Figure 5.6: Flow diagram showing the two-fold cross-validation (first pass) testing of the air temperature model.
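As an illustration of the two-fold procedure, the following Python sketch randomly splits the paired data into two halves, runs both passes and averages the two RMSE values. It is a sketch under stated assumptions rather than the SDAT function in Appendix I: the names `fit_and_score` and `two_fold_rmse`, the `seed` parameter and the use of NumPy are hypothetical choices.

```python
import numpy as np


def fit_and_score(est_fit, air_fit, est_val, air_val):
    """Fit a line on the construction data and return RMSE on the validation data."""
    slope, intercept = np.polyfit(est_fit, air_fit, 1)
    residuals = slope * est_val + intercept - air_val
    return float(np.sqrt(np.mean(residuals ** 2)))


def two_fold_rmse(est, air, seed=None):
    """Average RMSE over the two passes of a two-fold cross-validation."""
    est = np.asarray(est, dtype=float)
    air = np.asarray(air, dtype=float)

    # Randomly divide the paired observations into two halves.
    rng = np.random.default_rng(seed)
    order = rng.permutation(est.size)
    half = est.size // 2
    first, second = order[:half], order[half:]

    # First pass: first half is construction data, second half is validation data.
    pass_one = fit_and_score(est[first], air[first], est[second], air[second])
    # Second pass: the roles of the two halves are swapped.
    pass_two = fit_and_score(est[second], air[second], est[first], air[first])

    return (pass_one + pass_two) / 2.0
```

Fixing `seed` makes the random split reproducible, which is useful when the two-fold result is used as a benchmark for other tests.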
The second cross-validation test was a leave-one-out test, in which a single observation is removed from the paired data and the remaining observations are used to construct a new line of best fit. The removed observation is then used for validation, generating an RMSE value. This process is iterated over all the data and the average of all RMSE values forms the test statistic. The advantage of the leave-one-out test is that all observations are used for both training and validation. An RMSE higher than that of the two-fold test would indicate an uneven weighting in the data, with some points having a considerably greater effect on the fit than others. The leave-one-out cross-validation test was also implemented as part of the SDAT module and is included in Appendix I.
The third test was similar to the two-fold test, except that the data removed from the model to act as validation data were manually selected by station. All the data from each of the four stations were removed in turn to act as the validation sample, and the remaining three stations were used for the construction of the model. The objective of this test was to quantify the model's dependency on each of the stations. A large RMSE for a given station would indicate that the model was highly dependent on the relationship between EST and air temperatures at the removed station, and that the three stations used to build the model were unable to predict air temperature at the removed station.
The fourth test used paired EST and air temperature measurements from six summers in the time series as validation samples (1989, 1990, 1995, 2002, 2003, 2006). The six summers were selected because they had high numbers of paired measurements (n > 100) and so could be used to form robust validation samples. Each summer was tested in turn, with the rest of the time series used for model construction. The results were compared to the maximum, average and minimum EST and air temperatures for each of the six tested summers to identify relationships between cross-validation results and different summer temperature regimes (e.g. cool, average and warm summers). The objective of this test was to assess whether the derived models could predict air temperatures for the removed summers. In particular, this test examined how accurately the model estimated air temperatures during the extreme heatwave summer of 2003.
Given the extreme temperatures recorded during the summer of 2003 (Burt, 2004a), it is reasonable to predict that these outlying temperatures will be less well characterised by the model than mean temperatures during the time series. Therefore, the results from this test provide a key indicator as to whether the model is capable of estimating air temperatures during heatwave events.
The third and fourth cross-validation tests were created as Python scripts which used the cross-validation functions in the SDAT module (Appendix I); for brevity, these scripts have not been included.
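Since the scripts are not reproduced, the following Python sketch indicates how the leave-one-out, station and summer tests could be expressed with one shared helper. It does not reproduce the SDAT functions or the omitted scripts: the function names, the `labels` argument (station identifiers or summer years) and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def _fit_and_score(est_fit, air_fit, est_val, air_val):
    """Fit a line on the construction data and return RMSE on the validation data."""
    slope, intercept = np.polyfit(est_fit, air_fit, 1)
    residuals = slope * est_val + intercept - air_val
    return float(np.sqrt(np.mean(residuals ** 2)))


def leave_one_out_rmse(est, air):
    """Average of the RMSE values obtained by removing each observation in turn."""
    est = np.asarray(est, dtype=float)
    air = np.asarray(air, dtype=float)
    scores = []
    for i in range(est.size):
        keep = np.ones(est.size, dtype=bool)
        keep[i] = False  # observation i becomes the validation sample
        scores.append(_fit_and_score(est[keep], air[keep], est[~keep], air[~keep]))
    return float(np.mean(scores))


def leave_group_out_rmse(est, air, labels):
    """RMSE for each held-out group, e.g. one station or one summer.

    labels : one hypothetical group label per observation (station name or year)
    """
    est = np.asarray(est, dtype=float)
    air = np.asarray(air, dtype=float)
    labels = np.asarray(labels)
    results = {}
    for label in np.unique(labels):
        held_out = labels == label  # validation sample: all data for this group
        results[label.item()] = _fit_and_score(
            est[~held_out], air[~held_out], est[held_out], air[held_out]
        )
    return results
```

With a single held-out observation the RMSE reduces to the absolute prediction error, so the leave-one-out statistic computed this way is the mean absolute error of the held-out predictions. For the station and summer tests, `leave_group_out_rmse` returns one RMSE per removed group, which allows each station or summer to be inspected separately rather than averaged.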