
5. A Synoptic Investigation into Lapse-Rate Variability at Vestari

5.3. Data and Methods

5.4.4. Cross validation

The parameters initially selected by the stepwise regression model (Section 5.4.2) proved robust during the cross-validation procedure: the same six terms reported in Table 5.2 were included in each of the final models. The coefficients for these retained variables were significant in each year (p < 0.01) and their values exhibited little variability (Table 5.3), which gives confidence that the initial model had not been overfitted.

Table 5.3. Values for the regression coefficients obtained through the cross-validation procedure. Column ‘σ’ provides the standard deviation of the coefficient’s value. |t̄| is the mean absolute value of the t-statistic, and p̄ is the corresponding mean p value. The intercept obtained from the cross validation, which completes the regression equation, is equivalent to the mean lapse rate (-0.53°C 100 m^-1).

Parameter    Mean    σ    |t̄|

When comparing the performance of the synoptic and regression models applied during the cross-validation, it is evident that there is relatively little difference in their ability to simulate the lapse rate, as the average RMSEs are in reasonably close agreement (Table 5.4). The range of each model’s ability to capture lapse-rate variability is illustrated further in Figure 5.4, which gives the correlation coefficient between the observed and modelled series on an annual basis. These results are consistent with the RMSEs in showing that the regression model tracks lapse-rate variability slightly more closely: only in 2003 and 2008 does the weather-category model achieve a lower RMSE, and only in 2008 is its correlation with observations higher than that of the regression model. For context, it should be noted that both models offer significant improvements over the approach of forecasting lapse rates using temperature as the solitary predictor in a regression equation (cf. Gardner et al., 2009; Hodgkins et al., 2012a, 2012b): simulating lapse rates in the cross-validation procedure with this technique resulted in a mean correlation coefficient of only 0.18; this compares to coefficients of 0.51 and 0.62 achieved by the weather-category and regression models respectively (i.e. the dotted lines in Figure 5.4). Thus, the results presented here offer a significant improvement in capturing daily lapse-rate variability at this location.
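The annual comparison described above can be illustrated with a minimal sketch. It assumes daily observed and modelled lapse rates are held in plain lists alongside their years; the function names (`rmse`, `pearson_r`, `annual_scores`) are hypothetical and are not taken from the original analysis.

```python
import math


def rmse(observed, modelled):
    """Root-mean-square error between two equal-length series."""
    n = len(observed)
    return math.sqrt(sum((m - o) ** 2 for o, m in zip(observed, modelled)) / n)


def pearson_r(observed, modelled):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_m = sum(modelled) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, modelled))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_m = sum((m - mean_m) ** 2 for m in modelled)
    return cov / math.sqrt(var_o * var_m)


def annual_scores(years, observed, modelled):
    """Group daily values by year and return {year: (rmse, r)}."""
    by_year = {}
    for year, o, m in zip(years, observed, modelled):
        obs, mod = by_year.setdefault(year, ([], []))
        obs.append(o)
        mod.append(m)
    return {
        year: (rmse(obs, mod), pearson_r(obs, mod))
        for year, (obs, mod) in by_year.items()
    }
```

Averaging the per-year values returned by `annual_scores` would give mean annual metrics of the kind quoted in the text, under the assumption that each year is weighted equally.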

Table 5.4. Annual error metrics for the regression and weather-category lapse-rate models. Note that all units are °C 100 m^-1.

Year    RMSE (Weather Category)    RMSE (Regression)    MSE (Weather Category)    MSE (Regression)
2001    0.147                      0.143                -0.016                    -0.026

Figure 5.4. Correlation coefficients between annual modelled and observed lapse rates.

Whilst the weather-category lapse-rate model explained an encouraging amount of variance, it was considered likely that its performance could be improved further by optimizing the analogue approach. The motivation is that synoptic weather types are not completely homogeneous with regard to their meteorology, so some within-type lapse-rate variability can be expected to remain. This may be a limiting factor on the performance of the weather-category analogue model.

Consequently, within the cross-validation scheme, the effect of using analogues that were more synoptically similar to the day being simulated was investigated. More specifically, this technique involved parameterizing the lapse rate as a function of the most synoptically similar day(s), rather than as the mean of the synoptic type.

That is, for each day ($i$) for which the lapse rate is simulated, a day ($j$) in the calibration years (i.e. not in the year being simulated) is sought which minimizes:

$d_{ij} = \lVert \mathbf{s}_i - \mathbf{s}_j \rVert$. (5.7)

This is similar to Equation 5.5, in that $\mathbf{s}$ still denotes the PC scores for the respective days; however, rather than minimizing the distance to the nearest synoptic weather type (as in Equation 5.5), the nearest day(s) is (are) sought. This technique addresses the effect of synoptic dissimilarity within groups by using only the most similar synoptic conditions as an analogue. In addition, by relaxing the condition set by Equation 5.7 to include the $n$ most similar days, and then prescribing the lapse rate as the mean of these days, an estimate of the sensitivity of the synoptic model to group size may be obtained.
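The analogue search described above can be sketched as follows. This is a minimal illustration, not the original implementation: it assumes a Euclidean distance between PC-score vectors, and the names `distance` and `analogue_lapse_rate`, together with the tuple layout of `days`, are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two PC-score vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def analogue_lapse_rate(target_scores, target_year, days, n=1):
    """Mean lapse rate of the n calibration days nearest to the target day.

    days is a list of (year, pc_scores, lapse_rate) tuples. Days from
    target_year are excluded, so only calibration years supply analogues.
    """
    candidates = [
        (distance(target_scores, scores), lapse_rate)
        for year, scores, lapse_rate in days
        if year != target_year
    ]
    if not candidates:
        raise ValueError("no calibration days available")
    candidates.sort(key=lambda pair: pair[0])  # nearest day first
    nearest = candidates[:n]
    return sum(lapse_rate for _, lapse_rate in nearest) / len(nearest)
```

Calling `analogue_lapse_rate` with `n=1` corresponds to using the single most similar day; repeating the simulation for increasing `n` gives the sensitivity to group size discussed in the text.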

The results from application of this procedure, plotted in Figure 5.5, suggest that using the most synoptically similar day as a solitary analogue actually leads to a considerably poorer model performance in the lapse rate simulation. However, as the number of synoptically similar days is increased slightly, both the RMSE and the correlation metrics rapidly improve; performance then monotonically declines as the number of similar days is increased further.

Figure 5.5. Performance of the optimized analogue approach as a function of the number of analogue days used to simulate the lapse rate (x-axis). The grey line indicates the best solution according to Equation 5.8.


The optimum number of analogue days over which to average the lapse rate is found by defining a skill score ($S_n$) according to:

, 5.8

where $\mathrm{RMSE}_{\max}$ is the highest RMSE observed, $r$ is the correlation coefficient, and the subscript $n$ denotes the number of most synoptically similar days over which the lapse rate was averaged during the cross-validation procedure (for example, when $n$ = 3, each day’s lapse rate was calculated as the mean of the 3 most synoptically similar days). Application of Equation 5.8 indicates that the optimum performance (highest skill score $S_n$) is found when the lapse rate is forecast as the average of the 15 most similar synoptic analogues: this yields mean annual correlation coefficients and RMSEs of 0.58 and 0.127°C 100 m^-1, respectively. The optimized analogue technique therefore results in a mean correlation coefficient which is higher than that achieved by the original weather-category model, but still slightly lower than that obtained with the regression model. However, the RMSE for the optimized model is slightly lower than either of those recorded in Table 5.4, so it is the best-performing model in this regard. The performance of the optimized model during the cross validation is compared visually to that of the stepwise regression and the original weather-category model in Figure 5.6.

An interesting aspect of the models’ performance is that their ability to simulate interannual variability is limited (Figure 5.7). In particular, year-to-year changes in the modelled lapse rates are muted relative to observations. Furthermore, the models fail to reproduce satisfactorily the progressive shallowing of lapse rates that occurred during the measurement period. This is demonstrated quantitatively with the aid of regression: when regressed upon year, the observed trend in the lapse rate over 2001-2010 is 0.018°C 100 m^-1 a^-1, which is significant at p = 0.05. None of the models produces a similarly significant trend. Possible reasons for this shortcoming of the models are suggested in Sections 5.5.2 and 5.5.3.
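The trend estimate referred to above amounts to an ordinary least-squares regression of lapse rate on year. A minimal sketch is given below; the function name `linear_trend` is hypothetical, the input is assumed to be one mean lapse rate per year, and no significance test is included.

```python
def linear_trend(years, values):
    """Ordinary least-squares slope and intercept of values regressed on year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, purely illustrative series (not the observed data): a lapse rate
# that shallows by 0.018 °C per 100 m each year from an arbitrary start value.
years = list(range(2001, 2011))
synthetic = [-0.60 + 0.018 * (year - 2001) for year in years]
slope, intercept = linear_trend(years, synthetic)
```

Applying the same function to each model's annual series would allow the modelled trends to be compared with the observed one on a like-for-like basis.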

Despite these limitations, the advantages of parameterizing lapse rates with the regression and analogue models are evident when considering the potential effect on ablation. Table 5.5 provides the sum of positive temperatures calculated by extrapolating the temperature up/down glacier based on the simulated lapse rates. Positive temperatures (frequently called positive degree days or ‘PDDs’) are given because this is a parameter often used in empirical melt models (Hock 2003, 2005), and Hodgkins et al. (2012a) previously demonstrated an improvement in forecasting PDDs at this location when the lapse rate was parameterized through regression, with temperature as a solitary predictor. The errors presented in Table 5.5 for the models developed in this chapter therefore represent an improvement in this regard.
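The PDD calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the original procedure: lapse rates are taken in °C per 100 m with negative values meaning cooling with height, the percentage error is assumed to be modelled minus observed relative to observed, and all function names are hypothetical.

```python
def extrapolate(temp, lapse_rate, from_elev, to_elev):
    """Temperature at to_elev (m) from the temperature at from_elev (m).

    lapse_rate is in °C per 100 m; negative values mean cooling with height.
    """
    return temp + lapse_rate * (to_elev - from_elev) / 100.0


def pdd_sum(temps):
    """Sum of positive daily temperatures (positive degree days)."""
    return sum(t for t in temps if t > 0)


def pdd_error_percent(observed_temps, source_temps, lapse_rates, from_elev, to_elev):
    """Percentage error of modelled PDDs at to_elev relative to observed PDDs.

    Assumed sign convention: positive means the model overestimates PDDs.
    """
    modelled = [
        extrapolate(t, g, from_elev, to_elev)
        for t, g in zip(source_temps, lapse_rates)
    ]
    observed_pdd = pdd_sum(observed_temps)
    return 100.0 * (pdd_sum(modelled) - observed_pdd) / observed_pdd
```

For example, with a lapse rate of -0.53°C 100 m^-1, `extrapolate(2.0, -0.53, 1100, 500)` warms a 2.0°C temperature at 1100 m by 3.18°C when it is carried down to 500 m.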

Figure 5.6. Observed lapse rates versus those predicted by the models discussed in the text. Colour ramp indicates relative bivariate density; orange line illustrates the 1:1 relationship.

Figure 5.7. Interannual variability of observed and modelled lapse rates.

Table 5.5. PDD errors using lapse rates modelled through the cross-validation procedure. ^a ‘Elevation’ label indicates location (AWS station) to which temperatures were extrapolated (i.e. ‘500’ indicates PDDs calculated by extrapolating 1100 m temperature to 500 m using modelled lapse rates). ^b Shows results from using temperature as the only predictor in a regression model, following the method of Hodgkins et al. (2012a), but employing 2 m reanalysis air temperature, which correlates more strongly with the lapse rate than 750 hPa air temperatures.

Model                        ^aElevation (m)    Error (%)
Stepwise Regression          500                0.0
Stepwise Regression          1100               -0.7
Weather Category Analogue    500                0.2
Weather Category Analogue    1100               -1.0
Optimized Analogue           500                -1.9
Optimized Analogue           1100               -2.8
^bTemperature Regression     500                6.7
^bTemperature Regression     1100               -1.9
ELR                          500                15.1
ELR                          1100               -38.0

In document Glacier-climate interactions: a synoptic approach (Page 160-167)