Measuring forecast accuracy

The most important criterion for assessing a forecasting model is its accuracy. In this section we list the accuracy metrics that are commonly used by researchers. In this list, $Y_a(i)$ is the actual value of Y for the ith observation, $Y_e(i)$ is the expected value of Y as forecast by the model, $\bar{Y}_a$ is the mean of the actual observations of Y, N is the total number of observations and p is the number of parameters. The list consists of the Relative Error (RE), Mean Absolute Error (MAE), symmetric Mean Absolute Percentage Error (sMAPE), Normalized Mean Absolute Percentage Error (NMAPE), Root Mean Squared Error (RMSE), the Coefficient of Determination (R²) and, lastly, the Standard Error of Regression (S).

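$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_e(i) - Y_a(i)\right)^2} \tag{2.18}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(Y_e(i) - Y_a(i)\right)^2}{\sum_{i=1}^{N}\left(Y_a(i) - \bar{Y}_a\right)^2} \tag{2.19}$$

$$S = \sqrt{\frac{\sum_{i=1}^{N}\left(Y_e(i) - Y_a(i)\right)^2}{N - p - 1}} \tag{2.20}$$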

The first metric in the list, RE, shows the error for a single observation, so it is not suited to measuring accuracy over a whole sample (Hyndman & Koehler, 2006). MAE applies to the entire sample and is easy to interpret. However, it is scale-dependent: its value grows as the scale of the data (here, the maximum power output) increases. RMSE is scale-dependent as well. Because it squares the errors, RMSE is also more sensitive to outliers than MAE, which has led some researchers to recommend against using RMSE in accuracy evaluation (Hyndman & Koehler, 2006).
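The difference in outlier sensitivity can be illustrated with a minimal Python sketch. RMSE follows Equation 2.18; MAE is written in its usual textbook form, which is an assumption because its equation is not reproduced in this section. The power readings are hypothetical and are not data from this study.

```python
import math

def mae(actual, forecast):
    """Mean absolute error, in the units of the data (textbook form, assumed)."""
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error, following Equation 2.18."""
    squared = sum((f - a) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(squared / len(actual))

# Hypothetical power readings (kW) and two sets of forecasts.
actual = [100.0, 120.0, 90.0, 110.0]
forecast = [110.0, 110.0, 100.0, 100.0]      # every error is 10 kW
with_outlier = [110.0, 110.0, 100.0, 60.0]   # last error is 50 kW

print(mae(actual, forecast), rmse(actual, forecast))
print(mae(actual, with_outlier), rmse(actual, with_outlier))
```

With equal errors the two metrics coincide at 10 kW. With the single 50 kW error, MAE doubles to 20 kW while RMSE rises to roughly 26.5 kW, which shows the heavier weight that squaring gives to large errors.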

Percentage errors such as the Mean Absolute Percentage Error (MAPE) have the advantage of being scale-independent (Hyndman & Koehler, 2006). For this reason they are frequently used to compare forecast performance across different data sets. However, MAPE is undefined if $Y_a(i) = 0$ for any observation and extremely skewed if $Y_a(i)$ is close to zero. MAPE also puts a heavier penalty on positive errors than on negative errors. To avoid this, the sMAPE can be used. However, according to Hyndman and Koehler (2006), the sMAPE is not as symmetrical as its name suggests: for the same value of $Y_a(i)$, it gives a heavier penalty when forecasts are low than when forecasts are high. NMAPE expresses the mean absolute error as a percentage of the maximum actual value, which makes the metric simple and easy to interpret (Hyndman & Koehler, 2006).
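The behaviour of the percentage errors can be checked with a short Python sketch. The equations for these metrics are not reproduced in this section, so the forms below are assumptions: MAPE in its usual form, sMAPE in the 200·|a − f| / (a + f) form discussed by Hyndman and Koehler (2006), and NMAPE as the mean absolute error divided by the maximum actual value. All numbers are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error; undefined when any actual value is 0."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

def smape(actual, forecast):
    """Symmetric MAPE (assumed form), for positive actuals and forecasts."""
    total = sum(200.0 * abs(a - f) / (a + f) for a, f in zip(actual, forecast))
    return total / len(actual)

def nmape(actual, forecast):
    """Mean absolute error as a percentage of the maximum actual value (assumed form)."""
    mean_abs_error = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return 100.0 * mean_abs_error / max(actual)

# Same actual value (100), same absolute error (50), opposite directions.
print(mape([100.0], [50.0]), mape([100.0], [150.0]))
print(smape([100.0], [50.0]), smape([100.0], [150.0]))
print(nmape([100.0, 200.0], [90.0, 220.0]))
```

For an actual value of 100, MAPE is 50% in both directions, whereas sMAPE is about 66.7 for the low forecast and 40.0 for the high forecast. This is the asymmetry described above. Calling `mape` with a zero actual value raises an error instead of dividing by zero, which matters for wind power data where zero output can occur.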

The coefficient of determination, R², expresses the fraction of variance that can be explained by the model. It is a statistic that gives information about the goodness of fit of a model. In regression, for example, R² indicates how well the regression line fits the data: a value of 1 indicates a perfect fit, which means that 100% of the variance can be explained by the model. However, R² is only valid for linear regression models, and only when the assumptions of the linear regression model are met; otherwise its interpretation can lead to misleading conclusions. R² should not be used for nonlinear regression models (Frost, 2014; Spiess & Neumeyer, 2010).
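As a minimal sketch, Equation 2.19 can be written in Python as follows. The function name and the data are illustrative assumptions.

```python
def r_squared(actual, forecast):
    """Coefficient of determination, following Equation 2.19."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((f - a) ** 2 for a, f in zip(actual, forecast))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_residual / ss_total

actual = [1.0, 2.0, 3.0, 4.0]                     # hypothetical observations
print(r_squared(actual, actual))                  # forecasts equal to the actual values
print(r_squared(actual, [2.5, 2.5, 2.5, 2.5]))    # always forecasting the mean
```

A forecast that reproduces every observation gives 1, and a forecast that always returns the sample mean gives 0, since it explains none of the variance.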

Lastly, we discuss the standard error of regression, S, which is also called the standard error of estimate (Hoshmand, 2009). In contrast to R², S can be used for both linear and nonlinear regression. The S statistic is an absolute measure of the typical distance that the data points fall from the regression model, and it is measured in the units of the dependent variable. It is interpreted like any other standard deviation: if the dependent variable is distributed normally around the regression plane, approximately 68% of its values fall within ±S (Hoshmand, 2009) and approximately 95% fall within ±2S. Consequently, if the error terms are normally distributed with a mean of 0, S can be used to calculate a 95% prediction interval (Frost, 2017). Frost (2017) prefers the standard error of regression over the coefficient of determination for both linear and nonlinear regression, because it is better at evaluating the precision of the predictions.
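A minimal Python sketch of Equation 2.20 and of the approximate ±2S interval is given below. It assumes normally distributed errors with mean 0, as stated above; the function names and the data are hypothetical.

```python
import math

def standard_error_of_regression(actual, forecast, p):
    """Standard error of regression S, following Equation 2.20."""
    degrees_of_freedom = len(actual) - p - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more observations than parameters plus one")
    ss_residual = sum((f - a) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(ss_residual / degrees_of_freedom)

def approx_95_interval(point_forecast, s):
    """Approximate 95% prediction interval as the forecast plus or minus 2S."""
    return point_forecast - 2.0 * s, point_forecast + 2.0 * s

# Hypothetical observations and fitted values from a model with p = 1 parameter.
actual = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
fitted = [11.0, 11.0, 15.0, 15.0, 19.0, 19.0]

s = standard_error_of_regression(actual, fitted, p=1)
print(s)
print(approx_95_interval(15.0, s))
```

Here every residual is 1 in absolute value, so the sum of squares is 6 and S is the square root of 6/4, roughly 1.22 in the units of the dependent variable. The interval around a forecast of 15 is then approximately 12.6 to 17.4.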


Model validation

To validate a regression model, the dataset can be split into training data and test data. Training data are used to estimate parameters and the test data are used to evaluate the model accuracy. When calculating the forecast accuracy, always use test data that were not used when computing the forecasts (Hyndman, 2014). If there is a big difference in accuracy between the training data and the test data, then we are probably overfitting the model to the training data.

Figure 2.13: A time series divided into training and test data (Hyndman, 2014).

The size of the test data is typically 20% of the total sample, although this value depends on the sample size and the forecast horizon (Hyndman, 2014). The size of the test set should be at least as large as the forecast horizon.
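The split described above can be sketched in Python as follows. The function name, the default fraction of 0.2 and the `horizon` parameter are illustrative assumptions; the split is chronological, so the most recent observations form the test set.

```python
def split_time_series(series, test_fraction=0.2, horizon=1):
    """Split an ordered series into training data and a trailing test set.

    The test set holds about `test_fraction` of the observations, but never
    fewer than the forecast horizon.
    """
    n_test = max(round(len(series) * test_fraction), horizon)
    if n_test >= len(series):
        raise ValueError("series is too short for the requested test set")
    return series[:-n_test], series[-n_test:]

series = list(range(10))  # hypothetical ordered observations 0..9

train, test = split_time_series(series)
print(train, test)

train_h3, test_h3 = split_time_series(series, horizon=3)
print(train_h3, test_h3)
```

With ten observations the default split keeps the last two as test data. When the forecast horizon is three steps, the test set is enlarged to three observations so that it is at least as long as the horizon.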

With a small sample or a short time series, a single split is undesirable, because conclusions drawn from accuracy measures on such a small test set are not very reliable. To avoid this problem, cross-validation can be used. Many types of cross-validation are available, but they all share the same underlying idea: the entire dataset is split into training and test data several times, each time using a different part of the dataset for training and for testing. Cross-validation then combines the resulting measures of fit to derive a more accurate estimate of model performance. If the sample size is large, there is no need for cross-validation and the dataset can simply be split into training and test data.
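One variant that respects the time order of the data is rolling-origin cross-validation with an expanding training window. The Python sketch below is only an illustration of the idea; the text does not state which variant is meant. The "last observed value" forecast stands in for a real model, and the function names and data are hypothetical.

```python
def rolling_origin_splits(n_obs, min_train, horizon=1):
    """Yield (train_indices, test_indices) with an expanding training window."""
    for end in range(min_train, n_obs - horizon + 1):
        yield list(range(end)), list(range(end, end + horizon))

def cross_validated_mae(series, min_train, horizon=1):
    """Average the MAE over all folds, using a naive last-value forecast."""
    fold_errors = []
    for train_idx, test_idx in rolling_origin_splits(len(series), min_train, horizon):
        last_value = series[train_idx[-1]]   # placeholder model: repeat last value
        errors = [abs(series[i] - last_value) for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)

series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]   # hypothetical short time series

splits = list(rolling_origin_splits(len(series), min_train=3))
print(splits)
print(cross_validated_mae(series, min_train=3))
```

With six observations and a minimum training size of three, this produces three folds, each testing on the observation that follows the training window. The fold errors are 2, 1 and 2, so the combined estimate is 5/3, which uses more of the short series than a single split would.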

In document Developing a Forecasting Model for the Power Production of Wind Turbines (Page 39-41)