Previous sections made several claims about which method performs better. But how can different methods be compared on performance and accuracy?
Estimation and validation
The accuracy of a forecasting method is often checked by forecasting recent periods for which the actual values are known (Hanke & Reitsch, 1998). Data can be held out to validate the estimated model and to assess its forecasting accuracy. The data that are not held out are used for parameter estimation (for example, of α and β). The model with these parameters is then tested on the data withheld for the validation period. When those results are satisfactory, forecasts are made for future periods, for which no values are known yet (Kuchru, 2009). Figure 2.12 visualises the estimation and validation periods.
Figure 2.12: Estimation- and Validation period (Nau, 2017)
Performance on data withheld for validation is one of the best indications of how accurately the model will forecast the future. At least 20 percent of the data should be held out for validation purposes (Kuchru, 2009). Normally, a 1-step-ahead forecast is computed in the estimation period and an n-step-ahead forecast in the validation period. In our research, however, we do this slightly differently. We compute a 1-step-ahead forecast in the validation period with a so-called rolling horizon: after each 1-step-ahead prediction, we act as if the actual value for that period becomes available (as if the estimation period becomes one step longer and the validation period one step shorter).
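The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this research: it assumes simple exponential smoothing (SES) as the model, a coarse grid search for α, and initialisation of the level at the first observation. The function names and the `demand` series are hypothetical.

```python
def ses_one_step_forecasts(series, alpha, initial_level):
    """Return the 1-step-ahead SES forecast made for every period in `series`."""
    level = initial_level
    forecasts = []
    for actual in series:
        forecasts.append(level)  # forecast for this period, made before seeing it
        level = alpha * actual + (1 - alpha) * level  # update once the actual is known
    return forecasts


def fit_alpha(estimation, candidates):
    """Pick the candidate alpha with the smallest sum of squared 1-step errors."""
    def sse(alpha):
        forecasts = ses_one_step_forecasts(estimation, alpha, estimation[0])
        return sum((y - f) ** 2 for y, f in zip(estimation, forecasts))
    return min(candidates, key=sse)


def rolling_horizon_errors(series, holdout_fraction=0.2,
                           candidates=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Estimate alpha on the first part, then roll 1-step forecasts over the rest."""
    n_validation = max(1, round(len(series) * holdout_fraction))
    split = len(series) - n_validation
    alpha = fit_alpha(series[:split], candidates)  # parameters use estimation data only
    # Alpha stays fixed, but the level keeps being updated with each new actual,
    # which is what makes the horizon "roll" through the validation period.
    forecasts = ses_one_step_forecasts(series, alpha, series[0])
    errors = [y - f for y, f in zip(series[split:], forecasts[split:])]
    return alpha, errors


demand = [12, 15, 14, 16, 18, 17, 19, 21, 20, 22]  # made-up example data
alpha, validation_errors = rolling_horizon_errors(demand)
```

With ten observations and a 20 percent holdout, the last two periods form the validation period, so `validation_errors` contains two 1-step-ahead errors.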
Performance indicators
There are several methods that calculate the accuracy of a forecast. Let us define $e_t = Y_t - F_t$, called the one-step-ahead forecast error. When comparing forecasts on a single series, several common methods could be used: the mean absolute deviation (MAD), the root mean squared error (RMSE), the mean absolute percentage error (MAPE), and the mean error (ME).
- Mean absolute deviation (MAD) $= \text{mean}(|e_t|)$
- Root mean squared error (RMSE) $= \sqrt{\frac{1}{n}\sum_{t=1}^{n} e_t^2}$
- Mean absolute percentage error (MAPE) $= \text{mean}(|p_t|)$, where $p_t = 100\,e_t / Y_t$
- Mean error (bias) $= \text{mean}(e_t)$
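As an illustration, the four measures listed above can be computed as follows. This is a plain-Python sketch using the definition of the error as actual minus forecast. The function names are chosen here for illustration and the numbers are made up.

```python
import math


def forecast_errors(actuals, forecasts):
    """One-step-ahead forecast errors e_t = Y_t - F_t."""
    return [y - f for y, f in zip(actuals, forecasts)]


def mad(errors):
    """Mean absolute deviation: mean(|e_t|)."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error: sqrt(mean(e_t^2))."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mape(errors, actuals):
    """Mean absolute percentage error; undefined if any actual is zero."""
    return sum(abs(100 * e / y) for e, y in zip(errors, actuals)) / len(errors)


def mean_error(errors):
    """Mean error (bias): mean(e_t)."""
    return sum(errors) / len(errors)


actuals = [100, 200, 400]    # made-up example data
forecasts = [110, 190, 400]
errors = forecast_errors(actuals, forecasts)  # [-10, 10, 0]
```

For these numbers the MAD is 20/3, the RMSE is the square root of 200/3, the MAPE is 5 percent, and the ME is 0.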
Hyndman et al. (2008), Gardner (1985), Price & Sharp (1986), Taylor (2003) and many others use the MAPE, the MSE, or both as accuracy measures. A good model should have small errors in both the estimation and validation periods, and its statistics in both periods should be similar (Kuchru, 2009). With the ME, positive and negative errors cancel each other out, so it measures bias rather than accuracy. The RMSE is more informative in that respect, since it squares the errors and therefore does not let positive and negative errors cancel each other out. A disadvantage is that the RMSE is scale-dependent (an error of one is far more serious on an actual observation of two than on an actual observation of a thousand, but the RMSE treats both alike), so it can only be used to compare the forecast performance of different methods on the same time series. The MAPE is similar to the MAD except that it is expressed in percentage terms (Hoshmand, 2009). The advantage of this is that it takes into account the size of the error relative to the actual observation. The MAPE also comes with a big disadvantage: it is sensitive to small actual values. Since the actual observation is in the denominator of the equation, the MAPE is not defined when actual usage is zero. Moreover, when actual usage is low, the MAPE can take extreme values. Therefore, the MAPE should not be used for low-volume data.
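A small numerical sketch may make these properties concrete. The data below are made up, the helper functions are hypothetical, and the error is taken as actual minus forecast.

```python
import math


def mean_error(actuals, forecasts):
    errors = [y - f for y, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)


def rmse(actuals, forecasts):
    errors = [y - f for y, f in zip(actuals, forecasts)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mape(actuals, forecasts):
    return sum(abs(100 * (y - f) / y) for y, f in zip(actuals, forecasts)) / len(actuals)


# 1. Alternating under- and over-forecasts of size 5: the signed errors cancel
#    in the ME, while the RMSE still reports the typical error size.
actuals = [100, 100, 100, 100]
forecasts = [95, 105, 95, 105]
me_alternating = mean_error(actuals, forecasts)   # 0
rmse_alternating = rmse(actuals, forecasts)       # 5

# 2. Expressing the same series in units that are 10 times smaller multiplies
#    the RMSE by 10 (scale-dependent) but leaves the MAPE unchanged.
scaled_actuals = [10 * y for y in actuals]
scaled_forecasts = [10 * f for f in forecasts]
rmse_scaled = rmse(scaled_actuals, scaled_forecasts)
mape_original = mape(actuals, forecasts)
mape_scaled = mape(scaled_actuals, scaled_forecasts)

# 3. The same absolute error of 1 gives very different percentage errors
#    when the actual values are low.
low_actuals = [1, 2, 1000]
low_forecasts = [0, 1, 999]
percentage_errors = [abs(100 * (y - f) / y)
                     for y, f in zip(low_actuals, low_forecasts)]
```

In the third case the percentage errors are 100, 50 and 0.1, so the two low-volume periods dominate the MAPE even though the absolute error is identical in all three.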
These error measures are used in three different ways: first, to compare the accuracy of two different methods; second, to find out whether a method is useful or reliable; third, to select the optimal technique (Hanke & Reitsch, 1998).
However, errors are not the only aspect to take into account. The choice of model should also be based on the principle of parsimony which states that, other things being equal, simple methods are preferable to complex ones (Hoshmand, 2009).
Tracking signal
When a forecast model is chosen, it is important to monitor whether the system remains in control (Trigg, 1964; Gardner, 1983). For example, when SES is chosen, but after a while, a trend appears in the series, the user might want to change the forecasting model or change the value of the parameter(s). In other words, we want to monitor whether biased errors occur. A widely used method for this is to compute a tracking signal (Trigg, 1964). The updating equations are as follows:
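$\text{Smoothed error}_t = (1-\alpha)\,\text{Smoothed error}_{t-1} + \alpha e_t$ (2.41)
$\text{MAD}_t = (1-\alpha)\,\text{MAD}_{t-1} + \alpha |e_t|$ (2.42)
where $e_t$ is the error at period $t$. The tracking signal is computed as follows:
$\text{TS}_t = \text{Smoothed error}_t / \text{MAD}_t$ (2.43)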
If the system is so far out of control that all errors have the same sign, this tracking signal will approach plus or minus one (Trigg, 1964). Both Trigg (1964) and Gardner (1983) advise using α = 0.1, since at higher values of α the performance of the smoothed error signal deteriorates badly. For α = 0.1, Trigg (1964) proposes limits of ±0.55. As long as the tracking signal is between these limits, the system is in control. When it is outside these limits, updating is advisable (where updating means either using another model or updating the parameters).
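The updating equations (2.41) to (2.43) can be sketched as follows. This is a minimal illustration with hypothetical function names. Starting both smoothed quantities at zero is an assumption made here, since the text does not specify the initialisation.

```python
def tracking_signals(errors, alpha=0.1):
    """Return the tracking signal TS_t for each forecast error e_t."""
    smoothed_error = 0.0  # assumed starting value
    smoothed_mad = 0.0    # assumed starting value
    signals = []
    for e in errors:
        smoothed_error = (1 - alpha) * smoothed_error + alpha * e   # (2.41)
        smoothed_mad = (1 - alpha) * smoothed_mad + alpha * abs(e)  # (2.42)
        # (2.43); the signal is reported as 0 while all errors so far are zero
        signals.append(smoothed_error / smoothed_mad if smoothed_mad > 0 else 0.0)
    return signals


def out_of_control(signal, limit=0.55):
    """True when the signal lies outside the limits proposed for alpha = 0.1."""
    return abs(signal) > limit


biased = tracking_signals([2, 3, 1, 4, 2, 3])   # every error has the same sign
unbiased = tracking_signals([1, -1] * 50)       # errors alternate in sign
```

When every error has the same sign, the smoothed error equals the smoothed MAD in absolute value, so the signal sits at plus or minus one and is flagged. For the alternating errors the signal settles well inside the ±0.55 limits.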
To give an example, Figure 2.13 shows the forecast made by the degree-days method for a certain time series (Storage 6 in Appendix G) and the corresponding tracking signal. Except for the first few observations (the tracking signal needs some warming up, since in the beginning the smoothed error is equal to the MAD), the forecasting system remains in control. From week 79 to week 82, the forecast is structurally below the actual demand. This is visible in the tracking signal, which rises in this period and almost hits the control limit.
Figure 2.13: Estimation- and Validation period (Nau, 2017)