The present thesis uses the same evaluation values for most of the developed methods, interval forecasts being the one exception. These values determine how well the methods estimate quantiles of the values being forecast, which in turn reflects the accuracy of their forecast uncertainty description. The quantile estimates used are, for instance, the values of the quantile regressions based on the NNQF (cf. Section 2.2.1), the quantiles of the non-parametric distribution forecast (cf. Section 2.2.3), the quantiles of the parametric distribution forecasts (cf. Section 2.2.4), or the empirical quantiles of the values forming a scenario forecast at a specific point in time (cf. Section 2.2.5). These quantile estimates are then evaluated on a test set comprising $N_T \in \mathbb{N}_{>0}$ input vectors and their corresponding desired outputs, i.e. $\mathbf{x}_n$ and $y_n$ with $n = 1, \ldots, N_T$. Just as in Equation (1.21), $y_n$ represents the time series values to be forecast.
As already mentioned, quantile estimates of the desired outputs, i.e. $\hat{\tilde{y}}_{(q),n}$, can be obtained with the methods developed as part of the present thesis⁵. Afterwards, the accuracy of these quantile estimates can be measured using several values. One of the values used in the present thesis is the average pinball-loss $Q_{PL,(q)}$, an error measure that describes how large the deviation of the estimates from the true quantiles is. It is given as:
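\[
Q_{PL,(q)} = \frac{1}{N_T} \sum_{n=1}^{N_T}
\begin{cases}
(q - 1) \cdot (y_n - \hat{\tilde{y}}_{(q),n}), & \text{if } y_n \le \hat{\tilde{y}}_{(q),n} \\
q \cdot (y_n - \hat{\tilde{y}}_{(q),n}), & \text{else.}
\end{cases}
\tag{2.22}
\]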
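As a minimal sketch of Equation (2.22), the average pinball-loss for a single probability q could be computed as follows. The function name and the toy data are hypothetical and serve only as an illustration; plain Python lists are assumed as inputs.

```python
def average_pinball_loss(y_true, y_quantile, q):
    """Average pinball-loss of quantile estimates for probability q (cf. Equation (2.22))."""
    if len(y_true) != len(y_quantile) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(y_true, y_quantile):
        diff = y - y_hat
        # Estimates at or above the desired output are weighted by (1 - q),
        # estimates below it by q; both branches yield a non-negative loss.
        total += (q - 1.0) * diff if y <= y_hat else q * diff
    return total / len(y_true)


# Hypothetical toy data: four desired outputs and a constant 0.9-quantile estimate.
y_true = [1.0, 2.0, 3.0, 4.0]
y_q90 = [3.5, 3.5, 3.5, 3.5]
loss = average_pinball_loss(y_true, y_q90, q=0.9)
print(loss)  # approximately 0.225
```

In this toy example the single desired output above the estimate contributes 0.45 to the sum, as much as the three outputs below it combined, which reflects the asymmetric weighting for q = 0.9.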
Another value used in the present work is the reliability deviation $Q_{RD,(q)}$, which is based on the following: if an estimate of a quantile with a probability $q$ of being greater than or equal to a desired output is given, then
⁵ Please note that the tilde superscript is again used to denote the fact that all developed methods are based on quantile regressions obtained via the NNQF.
2.3 Evaluation Values
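the difference between the share of desired outputs that are actually lower than or equal to that estimate and $q$ should be close to zero. The reliability deviation is defined as:
\[
Q_{RD,(q)} = \frac{1}{N_T} \sum_{n=1}^{N_T} I\left(y_n \le \hat{\tilde{y}}_{(q),n}\right) - q.
\tag{2.23}
\]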
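The reliability deviation can be sketched in a few lines of Python. The sketch assumes that the deviation is taken as the empirical share of desired outputs at or below the estimate minus q; the function name and the toy data are hypothetical.

```python
def reliability_deviation(y_true, y_quantile, q):
    """Share of desired outputs not exceeding their quantile estimate, minus q."""
    if len(y_true) != len(y_quantile) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # The indicator function I(.) counts outputs lying at or below the estimate.
    hits = sum(1 for y, y_hat in zip(y_true, y_quantile) if y <= y_hat)
    return hits / len(y_true) - q


# Hypothetical toy data: estimates meant to be 0.6-quantiles.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_q60 = [2.5, 2.5, 4.5, 4.5, 4.5]
deviation = reliability_deviation(y_true, y_q60, q=0.6)
print(deviation)  # approximately 0.2
```

Here four of the five desired outputs lie at or below their estimates, so the estimates behave like 0.8-quantiles rather than 0.6-quantiles, and the deviation is about +0.2.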
A major disadvantage of the reliability deviation is that it may consider a trivial quantile regression (i.e. a model that estimates the same value regardless of the input vector used) to be perfect, even though that is not necessarily the case; this effect is later shown in Section 2.4.2 and Figure 2.17.
Therefore, a modified version of the reliability deviation, $Q_{MRD,(q)}$, is also used in the present thesis. The value $Q_{MRD,(q)}$ is an average of the absolute values of reliability deviations calculated separately on $S_T \in \mathbb{N}_{>0}$ segments of the used test set, i.e.:
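\[
Q_{MRD,(q)} = \frac{1}{S_T} \sum_{i=1}^{S_T} \left| Q_{RD,(q),i} \right|,
\quad \text{with } n_T = \operatorname{floor}\left(\frac{N_T}{S_T}\right).
\tag{2.24}
\]
In the previous equation, $Q_{RD,(q),i}$ represents the reliability deviation of the $i$th segment, i.e. the reliability deviation calculated only on the values of that segment, $n_T$ is the minimal number of values in a given segment, floor(·) is a function that rounds its input down to the nearest integer, $I(\cdot)$ is the indicator function used to calculate the reliability deviations, and $S_T$ is a free parameter representing the number of segments to be tested. In Section 2.4, $S_T = 10$ is used.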
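The following sketch illustrates why the segment-wise evaluation is useful. It assumes consecutive segments of floor(N_T / S_T) values, with any leftover values assigned to the last segment; this segmentation, the function names, and the toy data are assumptions made only for illustration.

```python
def reliability_deviation(y_true, y_quantile, q):
    """Share of desired outputs not exceeding their quantile estimate, minus q."""
    hits = sum(1 for y, y_hat in zip(y_true, y_quantile) if y <= y_hat)
    return hits / len(y_true) - q


def modified_reliability_deviation(y_true, y_quantile, q, n_segments=10):
    """Mean absolute reliability deviation over consecutive test-set segments."""
    n_t = len(y_true) // n_segments  # minimal number of values per segment
    if n_t == 0:
        raise ValueError("test set is too short for the requested number of segments")
    total = 0.0
    for i in range(n_segments):
        start = i * n_t
        # Assumption: the last segment also receives the leftover values.
        stop = (i + 1) * n_t if i < n_segments - 1 else len(y_true)
        total += abs(reliability_deviation(y_true[start:stop], y_quantile[start:stop], q))
    return total / n_segments


# Hypothetical rising series and a trivial model that always returns 10.5 as the median.
y_true = [float(v) for v in range(1, 21)]
y_median = [10.5] * 20
rd = reliability_deviation(y_true, y_median, q=0.5)
mrd = modified_reliability_deviation(y_true, y_median, q=0.5, n_segments=4)
print(rd, mrd)  # 0.0 0.5
```

On the whole test set the trivial estimate appears perfect, since exactly half of the values lie below it. On each of the four segments, however, either all or none of the values lie below it, which the segment-wise average exposes.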
Evaluating a single quantile estimate is not enough to assess the possibility of obtaining quantile regressions with a given data mining technique, nor to determine the quality of the forecast uncertainty described as a non-parametric CDF, a parametric CDF, or a scenario forecast. Therefore, the
2 Probabilistic Forecasting
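averages of the values of $Q_{PL,(q)}$, the absolute values of $Q_{RD,(q)}$, and the values of $Q_{MRD,(q)}$ obtained across $L \in \mathbb{N}_{>0}$ different quantile estimates are also calculated; in other words:
\begin{align}
Q_{PL} &= \frac{1}{L} \sum_{l=1}^{L} Q_{PL,(q_l)}, \tag{2.25} \\
Q_{RD} &= \frac{1}{L} \sum_{l=1}^{L} \left| Q_{RD,(q_l)} \right|, \tag{2.26} \\
Q_{MRD} &= \frac{1}{L} \sum_{l=1}^{L} Q_{MRD,(q_l)}. \tag{2.27}
\end{align}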
In the previous equations, $q_l$ represents the probability corresponding to the $l$th quantile estimate, while $Q_{PL}$, $Q_{RD}$, and $Q_{MRD}$ are the average pinball-loss⁶, average reliability deviation, and average modified reliability deviation, respectively. Since all average values obtained represent a mean deviation from an optimum, the closer they are to zero, the better the quantile regressions and the forecasts are.
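The averaging across quantile estimates can be sketched as follows. The example also illustrates why the absolute values of the reliability deviations are averaged: signed deviations of opposite sign would otherwise cancel. The helper name, the dictionary layout mapping each probability to its estimates, and the toy data are hypothetical.

```python
def reliability_deviation(y_true, y_quantile, q):
    """Share of desired outputs not exceeding their quantile estimate, minus q."""
    hits = sum(1 for y, y_hat in zip(y_true, y_quantile) if y <= y_hat)
    return hits / len(y_true) - q


def average_over_quantiles(metric, y_true, estimates_by_q, absolute=False):
    """Average a per-quantile metric over all probabilities in estimates_by_q."""
    values = []
    for q, y_quantile in estimates_by_q.items():
        value = metric(y_true, y_quantile, q)
        values.append(abs(value) if absolute else value)
    return sum(values) / len(values)


# Hypothetical toy data: the 0.2-quantile is estimated too high, the 0.8-quantile too low.
y_true = [float(v) for v in range(1, 11)]
estimates_by_q = {0.2: [3.5] * 10, 0.8: [7.5] * 10}
signed_mean = average_over_quantiles(reliability_deviation, y_true, estimates_by_q)
abs_mean = average_over_quantiles(reliability_deviation, y_true, estimates_by_q, absolute=True)
print(signed_mean, abs_mean)  # approximately 0.0 and 0.1
```

The signed average is practically zero although both quantile estimates are off by about 0.1, whereas the average of the absolute values retains that deviation.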
In the case of evaluating interval forecasts, other values have to be used: values that evaluate the intervals formed by pairs of quantile estimates (cf. Equation (2.5)) and not only the quantile estimates themselves. Therefore, the present thesis uses the following values to assess the quality of an interval forecast. The first value is the interval width $Q_{IW,(q_u,q_l)}$, which is given as follows:
\[
Q_{IW,(q_u,q_l)} = \frac{1}{N_T} \sum_{n=1}^{N_T} \left( \hat{\tilde{y}}_{(q_u),n} - \hat{\tilde{y}}_{(q_l),n} \right),
\tag{2.28}
\]
⁶ The non-parametric and parametric distribution forecasts are evaluated with Equation (2.25) and not with the more traditional continuous ranked probability score (CRPS) [28], since the former is related to the latter [29] and even approximates it if the results of several quantiles are used (as shown in Appendix A.4).
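with $\hat{\tilde{y}}_{(q_u),n}$ and $\hat{\tilde{y}}_{(q_l),n}$ representing the quantile estimates forming the upper and lower interval bounds (cf. Equation (2.5)) of the given interval forecast for the $n$th desired output. Since broad intervals are undesired, the lower $Q_{IW,(q_u,q_l)}$ is, the better the interval is considered.

Using Equation (2.28), a value taken from a definition in [28] and referred to in the present work as the interval score $Q_{IS,(q_u,q_l)}$ can then be calculated. This value is given by the next equation:
\[
Q_{IS,(q_u,q_l)} = Q_{IW,(q_u,q_l)} + \frac{2}{1 - (q_u - q_l)} \cdot \frac{1}{N_T} \sum_{n=1}^{N_T}
\begin{cases}
\hat{\tilde{y}}_{(q_l),n} - y_n, & \text{if } y_n < \hat{\tilde{y}}_{(q_l),n} \\
y_n - \hat{\tilde{y}}_{(q_u),n}, & \text{if } y_n > \hat{\tilde{y}}_{(q_u),n} \\
0, & \text{else.}
\end{cases}
\tag{2.29}
\]
As shown in Equation (2.29), $Q_{IS,(q_u,q_l)}$ considers not only the deviations outside the given interval, but also its width.

Finally, the last value used is referred to as the modified interval reliability deviation $Q_{MIRD,(q_u,q_l)}$; it is utilized to determine whether an interval forecast with a desired probability $(q_u - q_l)$ of containing the future time series values actually fulfills that goal. Similarly to Equation (2.24), the modified interval reliability deviation is given as the average of the absolute values of the interval reliability deviations $Q_{IRD,(q_u,q_l),i}$, $i \in [1, S_T]$, obtained on $S_T$ segments of the used test set, i.e.:
\[
Q_{MIRD,(q_u,q_l)} = \frac{1}{S_T} \sum_{i=1}^{S_T} \left| Q_{IRD,(q_u,q_l),i} \right|,
\tag{2.30}
\]
where each $Q_{IRD,(q_u,q_l),i}$ is calculated on a segment whose size is determined by $n_T$, with $n_T$ defined as in Equation (2.24).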
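The interval values can be sketched in Python under two stated assumptions: the interval score follows the definition of the interval score in [28], averaged over the test set, and the interval reliability deviation is taken as the share of desired outputs inside the interval minus (q_u - q_l). All function names and the toy data are hypothetical.

```python
def interval_width(lower, upper):
    """Mean distance between the upper and lower interval bounds."""
    return sum(u - l for l, u in zip(lower, upper)) / len(lower)


def interval_score(y_true, lower, upper, q_l, q_u):
    """Interval width plus a scaled penalty for desired outputs outside the interval."""
    alpha = 1.0 - (q_u - q_l)  # probability mass expected outside the interval
    penalty = 0.0
    for y, l, u in zip(y_true, lower, upper):
        if y < l:
            penalty += l - y
        elif y > u:
            penalty += y - u
    return interval_width(lower, upper) + 2.0 / alpha * penalty / len(y_true)


def interval_reliability_deviation(y_true, lower, upper, q_l, q_u):
    """Share of desired outputs inside the interval, minus the desired probability."""
    inside = sum(1 for y, l, u in zip(y_true, lower, upper) if l <= y <= u)
    return inside / len(y_true) - (q_u - q_l)


# Hypothetical toy data: a constant interval meant to contain 80% of the values.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
lower = [1.5] * 5
upper = [4.5] * 5
width = interval_width(lower, upper)
score = interval_score(y_true, lower, upper, q_l=0.1, q_u=0.9)
ird = interval_reliability_deviation(y_true, lower, upper, q_l=0.1, q_u=0.9)
print(width, score, ird)  # approximately 3.0, 5.0 and -0.2
```

The two desired outputs outside the interval raise the score from the pure width of 3.0 to about 5.0, and the negative deviation indicates that the interval contains fewer values than desired. A segment-wise version would apply the last function to consecutive segments and average the absolute values.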
The capability of a data mining technique to create accurate interval forecasts cannot be determined by looking at a single interval. Therefore, the evaluation of a data mining technique should consist of averaging the $Q_{IW,(q_u,q_l)}$, $Q_{IS,(q_u,q_l)}$, and $Q_{MIRD,(q_u,q_l)}$ values for $L \in \mathbb{N}_{>0}$ different interval forecasts, i.e.:
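\begin{align}
Q_{IW} &= \frac{1}{L} \sum_{j=1}^{L} Q_{IW,(q_{u,j},q_{l,j})}, \tag{2.31} \\
Q_{IS} &= \frac{1}{L} \sum_{j=1}^{L} Q_{IS,(q_{u,j},q_{l,j})}, \tag{2.32} \\
Q_{MIRD} &= \frac{1}{L} \sum_{j=1}^{L} Q_{MIRD,(q_{u,j},q_{l,j})}; \tag{2.33}
\end{align}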
with $q_{u,j}$ and $q_{l,j}$ representing the probabilities of the upper and lower bounds of the $j$th interval tested. Furthermore, $Q_{IW}$ is the average interval width, $Q_{IS}$ is the average interval score, and $Q_{MIRD}$ is the average modified interval reliability deviation. Just as before, the closer these average values are to zero, the better the interval forecasts are considered.