1.2 Theoretical Background
1.2.3 Time Series Forecasting
A finite time series $\{y[k];\ k = 1, \dots, K\}$ can be described as a set of $K \in \mathbb{N}_{>0}$ observations measured at equidistant points in time [47]. These observations form an ordered sequence in which the position of each observation is determined by its timestep $k$. A time series forecasting model uses the available information to estimate the unknown future of a desired time series at a forecast horizon $H \in \mathbb{N}_{>0}$. For example, a forecasting model that estimates a future time series value from current and past values of the time series in question and of several exogenous time series can be given as:
$$\hat{y}[k+H] = f\left(y[k], \dots, y[k-H_1], \mathbf{u}^T[k], \dots, \mathbf{u}^T[k-H_1]; \boldsymbol{\theta}_f\right); \quad k > H_1. \tag{1.17}$$

In the previous equation, the vector $\boldsymbol{\theta}_f$ describes the forecasting model parameters, the vector $\mathbf{u}[k]$ contains the observations of various exogenous time series at timestep $k$, and the value $H_1 \in \mathbb{N}_0$ represents the number of used time lags.
Two of the most common time series forecasting techniques are exponential smoothing and the autoregressive integrated moving average (ARIMA) models [25]. A simple exponential smoothing model is given as a weighted average of past observations of the desired time series⁶, i.e.:
$$\hat{y}[k+H] = \begin{cases} y[1], & \text{if } k = 1 \\ \sum\limits_{i=0}^{k-2} \alpha(1-\alpha)^{i}\, y[k-i] + (1-\alpha)^{k-1}\, y[1], & \text{else,} \end{cases} \tag{1.18}$$

with $\alpha \in [0, 1]$ being the smoothing parameter. As Equation (1.18) shows, the weights decay exponentially as the observations get older, therefore giving more recent observations a greater influence [14, 25]. Additionally, since the simple exponential smoothing in Equation (1.18) uses only past observations to conduct its forecast, it can also be defined as an autoregressive time series model. More complex and non-linear exponential smoothing variants can be found in [48].
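As an illustration, Equation (1.18) can be evaluated directly with a few lines of Python. The sketch below is a minimal example, not a reference implementation: the function names `ses_forecast` and `ses_recursive` are hypothetical, and the list `y` is assumed to hold $y[1], \dots, y[k]$ at Python indices $0, \dots, k-1$. The recursive form is the well-known equivalent of the weighted sum and is included only as a cross-check.

```python
def ses_forecast(y, alpha):
    """Simple exponential smoothing forecast following Equation (1.18).

    y holds y[1], ..., y[k] (Python index 0 is timestep 1). The returned
    value does not depend on the horizon H, since the forecast is flat.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    k = len(y)
    if k == 0:
        raise ValueError("need at least one observation")
    if k == 1:
        return y[0]
    # Weighted sum over y[k], ..., y[2] with exponentially decaying weights.
    weighted = sum(alpha * (1 - alpha) ** i * y[k - 1 - i] for i in range(k - 1))
    # The remaining weight is assigned to the first observation y[1].
    return weighted + (1 - alpha) ** (k - 1) * y[0]


def ses_recursive(y, alpha):
    """Equivalent recursion: level = alpha * y[k] + (1 - alpha) * level."""
    level = y[0]
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return level
```

For example, `ses_forecast([3.0, 5.0, 4.0], 0.5)` weights the three observations with 0.25, 0.25, and 0.5 (oldest to newest), and `ses_recursive` yields the same value for the same arguments.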
⁶ Equation (1.18) delivers accurate forecasts for $H > 1$ only if the forecast time series has no trend or seasonal component.

An autoregressive (AR) integrated (I) moving average (MA) model, i.e. ARIMA, is a generalization of the autoregressive moving average (ARMA) model [47] to non-stationary time series; the ARMA model can in turn be divided into its autoregressive (AR) and moving average (MA) parts. These models are based on the idea that time series are realizations of a stochastic process [49]. Moreover, the ARIMA model and all of its simplifications allow the use of exogenous time series as input; this inclusion is denoted by the letter X at the end of their names, e.g., ARIMAX [50]. For example, an ARIMAX model of first order difference can be given as:

[Equation (1.19): not recoverable from the source text]

with $a_i$, $b_j$, and $c_l$ representing the model parameters. Note that while Equation (1.19) defines the ARIMA(X) model for a generic forecast horizon $H$, it is traditionally used for $H = 1$ (i.e. $\hat{y}[k+H-1] = y[k]$). In that case, forecasts for greater forecast horizons are obtained by applying the ARIMA(X) model iteratively and setting all unknown residuals (i.e. those at timesteps $k+1$, $k+2$, etc.) equal to zero, as exemplified in [25]. Interested readers are referred to [47] and [51] for additional information regarding ARIMA(X).
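The iterative procedure for horizons greater than one can be sketched with a deliberately reduced model. The example below is only an assumption-laden illustration and does not reproduce Equation (1.19): it assumes a first order difference with a single autoregressive coefficient `a1` and no moving average or exogenous terms, and the function name `iterative_forecast` is hypothetical. Unknown future residuals are set to zero, so each one-step forecast is fed back as if it were an observation.

```python
def iterative_forecast(y, a1, horizon):
    """Iterate a one-step forecast of a differenced AR(1)-style model.

    Assumed one-step model (illustrative only):
        y_hat[k + 1] = y[k] + a1 * (y[k] - y[k - 1])
    Future residuals are unknown and therefore set to zero.
    """
    if len(y) < 2:
        raise ValueError("need at least two observations to form a difference")
    if horizon < 1:
        raise ValueError("horizon must be a positive integer")
    history = list(y)
    forecasts = []
    for _ in range(horizon):
        last_diff = history[-1] - history[-2]
        next_value = history[-1] + a1 * last_diff
        forecasts.append(next_value)
        # Treat the forecast as an observation for the next iteration.
        history.append(next_value)
    return forecasts
```

With `y = [1.0, 2.0]`, `a1 = 0.5`, and `horizon = 3`, the forecast increments shrink by the factor `a1` at every step, giving `[2.5, 2.75, 2.875]`.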
According to [25], time series can be decomposed into three distinct components: a trend-cycle component, a seasonal component, and a remainder. The way in which these components form the original time series depends on the assumption made. For instance, if an additive decomposition is assumed, an observation of a given time series can be described as:

$$y[k] = y_T[k] + y_S[k] + y_R[k]; \tag{1.20}$$

with $y_T[k]$ representing the trend-cycle component, $y_S[k]$ the seasonal component, and $y_R[k]$ the remainder. These components can be used to obtain models such as the Holt-Winters model [48] (i.e. an expansion of the traditional exponential smoothing technique that estimates a trend and a seasonality) or the seasonal ARIMA(X) model (SARIMA(X)).
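One possible way to obtain the three components of Equation (1.20) is a classical moving-average decomposition, sketched below. This is an assumed procedure chosen for illustration, not necessarily the one used in the cited references; the function name `additive_decompose` is hypothetical, and the sketch is restricted to odd seasonal periods so that the centred moving average stays symmetric.

```python
def additive_decompose(y, period):
    """Classical additive decomposition sketch for an odd seasonal period.

    Returns (trend, seasonal, remainder). Trend and remainder are None at
    the edges, where the centred moving average is undefined.
    """
    if period % 2 == 0 or period < 3:
        raise ValueError("this sketch only handles odd periods >= 3")
    n = len(y)
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2
    # Trend-cycle: centred moving average over one full seasonal period.
    trend = [None] * n
    for k in range(half, n - half):
        trend[k] = sum(y[k - half:k + half + 1]) / period
    # Seasonal: mean detrended value per position in the period.
    buckets = [[] for _ in range(period)]
    for k in range(n):
        if trend[k] is not None:
            buckets[k % period].append(y[k] - trend[k])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # centre the seasonal pattern at zero
    seasonal = [means[k % period] - offset for k in range(n)]
    # Remainder: what the trend and seasonal components do not explain.
    remainder = [
        None if trend[k] is None else y[k] - trend[k] - seasonal[k]
        for k in range(n)
    ]
    return trend, seasonal, remainder
```

Wherever the trend is defined, the three returned components add up to the original observation, as required by Equation (1.20).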
Since linear models can be inadequate for some real-world applications [49], non-linear techniques, such as artificial neural networks (ANN) [14] and support vector regressions (SVR) [52], have also been found useful in forecasting [53, 54]. Note that models obtained by non-linear techniques can also be classified as NARIMA(X) models (with the letter N denoting their non-linearity), depending on the input values they use [55].
Time series forecasting models can in general be separated into white-box, data-driven (i.e. black-box), and gray-box models [30]. The first use known relations, expert knowledge, etc. to determine the relation between the used inputs and the future of the time series of interest; the second try to estimate this relation by applying data mining techniques (e.g., linear regressions, ANNs); and the third are a combination of the previous two.
In the present thesis, forecasting models are generalized as regressions, since they map, just as a regression does, a given input to an estimate of a desired output. For example, to estimate Equation (1.17) using regression data mining techniques, the input and the desired output have to be defined as:
$$\begin{aligned} y &:= y[k+H] \\ \mathbf{x} &:= \left[y[k], \dots, y[k-H_1], \mathbf{u}^T[k], \dots, \mathbf{u}^T[k-H_1]\right]^T; \quad k \in [H_1+1, K-H]. \end{aligned} \tag{1.21}$$
Likewise, the forecasting model parameters (cf. Equation (1.17)) then become the estimated regression parameters, i.e. $\boldsymbol{\theta}_f := \hat{\boldsymbol{\theta}}$.
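The definition in Equation (1.21) amounts to sliding a window over the time series. The following sketch shows one way to build the input and output pairs; the function name `build_regression_dataset` and the list-based data layout are assumptions made for illustration. The list `y` holds $y[1], \dots, y[K]$ at Python indices $0, \dots, K-1$, and `u` holds one exogenous vector per timestep.

```python
def build_regression_dataset(y, u, horizon, lags):
    """Build (input, target) pairs following Equation (1.21).

    y: list of K observations, y[0] holding timestep 1.
    u: list of K exogenous vectors (each a list), aligned with y.
    horizon: forecast horizon H >= 1; lags: number of time lags H1 >= 0.
    """
    if len(y) != len(u):
        raise ValueError("y and u must have the same length")
    if horizon < 1 or lags < 0:
        raise ValueError("horizon must be >= 1 and lags must be >= 0")
    K = len(y)
    inputs, targets = [], []
    # Timesteps k = H1 + 1, ..., K - H (1-indexed), as in Equation (1.21).
    for k in range(lags + 1, K - horizon + 1):
        # y[k], ..., y[k - H1]
        row = [y[k - 1 - i] for i in range(lags + 1)]
        # u[k], ..., u[k - H1], flattened into the same input vector
        for i in range(lags + 1):
            row.extend(u[k - 1 - i])
        inputs.append(row)
        targets.append(y[k + horizon - 1])  # desired output y[k + H]
    return inputs, targets
```

Each row of `inputs` together with the matching entry of `targets` can then be passed to any regression technique to estimate the parameters.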
Readers interested in other forecasting approaches, such as state space models, autoregressive conditional heteroscedasticity (ARCH) models, generalized ARCH (GARCH) models, deep learning, Gaussian process regression, and techniques using compressed sensing, are referred to [49], [56], [57], [58], and [59].
In addition, forecasts describing the future of various time series and their aggregation at different aggregation levels, i.e. hierarchical forecasts, are also relevant in the present thesis. Their importance stems from the fact that aggregating single time series (e.g., load time series of individual households) results in values whose forecast is in some cases of interest (e.g., a substation's load time series). Hierarchical forecasts are mostly divided into bottom-up and top-down approaches [60]. The former start by obtaining forecasts at the lowest aggregation level and then aggregate them accordingly, while the latter begin with a forecast for the highest aggregation level that is later distributed to the lower levels. Of course, it is also possible to forecast each time series at every aggregation level independently. Nonetheless, such an approach does not guarantee coherent hierarchical forecasts, i.e. that the forecasts of time series at higher aggregation levels actually equal the sum of those at the lower levels.
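The notion of coherence can be made concrete with a two-level hierarchy. The sketch below is a minimal illustration under stated assumptions: the function names `bottom_up`, `top_down`, and `is_coherent` are hypothetical, and the top-down variant assumes fixed shares that sum to one, which is only one of several ways of distributing a top-level forecast.

```python
def bottom_up(bottom_forecasts):
    """Sum lowest-level forecasts into a coherent top-level forecast.

    bottom_forecasts maps a series name to its list of forecast values;
    all lists must cover the same timesteps.
    """
    series = list(bottom_forecasts.values())
    if not series:
        raise ValueError("need at least one bottom-level forecast")
    if len({len(s) for s in series}) != 1:
        raise ValueError("all forecasts must have the same length")
    return [sum(values) for values in zip(*series)]


def top_down(top_forecast, shares):
    """Distribute a top-level forecast using fixed shares (an assumption)."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to one")
    return {
        name: [share * value for value in top_forecast]
        for name, share in shares.items()
    }


def is_coherent(top_forecast, bottom_forecasts, tol=1e-9):
    """Check that the top-level forecast equals the sum of the lower level."""
    total = bottom_up(bottom_forecasts)
    return len(total) == len(top_forecast) and all(
        abs(t - s) <= tol for t, s in zip(top_forecast, total)
    )
```

Forecasts produced by `bottom_up` or `top_down` pass `is_coherent` by construction, whereas independently obtained forecasts for each level generally need this check.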