4.2 Multi-Model Forecasting Using RBF-ANN with an On-line
4.2.1 Discrete Derivative as a Feature Extraction Method
The Feature Extraction module transforms raw data into a set of useful features (GE06). Its purpose is to improve the classification performance on the data. The discrete derivative (or differencing) of the time series is used as the feature extraction method, described by

$$ Y'_t = Y_t - Y_{t-1} $$

where $Y'_t$ is the difference between the observation at time $t$ and the observation at lag $t-1$.
This transformation removes the mean and the trend by taking the relative increments of the time series, which are then used for the mode identification (MG08).
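As a minimal sketch of this transformation, the Python snippet below computes the first difference of a short series. The function name `discrete_derivative` and the sample values are illustrative assumptions, not part of the original implementation.

```python
def discrete_derivative(series):
    """Return the first difference Y'_t = Y_t - Y_{t-1} of a sequence."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


# Hypothetical measurements, used only for illustration.
y = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
dy = discrete_derivative(y)

# The same series shifted by a constant offset of 100.
shifted = [value + 100.0 for value in y]
dy_shifted = discrete_derivative(shifted)

print(dy)
print(dy == dy_shifted)
```

The differenced series has one element fewer than the input, and a constant offset added to the series does not change it, which is the sense in which the mean is removed.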
4.2.2 On-line Mode Recognition for the Multi-Model Predictor Approach
The proposed methodology for building the Multi-Model Predictor (MMP) uses machine learning regression methods that learn, from a subset of samples, a model associating the input vectors with the output vectors (AYK+11). The specific algorithms for training the MMP and for implementing the On-line Mode Recognition with the Multi-Model Predictor (RBFMMP+OR) are described next.
Preparation Phase
The Multi-Model Predictor is trained in the preparation phase. This phase comprises the Decomposition, Identification and Training steps, all performed off-line. The implementation of each step is described next.
Decomposition:
This step decomposes the time series so that its different behavior patterns can be identified. The decomposition assumes the availability of a univariate time series described by Equation 4.3,

$$ Y = \{Y_1, \ldots, Y_i, \ldots, Y_n\} \tag{4.3} $$

where $Y_1$, $Y_i$ and $Y_n$ are the first, the $i$-th and the last element of the time series, respectively. The time series $Y$ is decomposed into samples $S$ as presented in Equation 4.4,

$$ S = \{S_1, \ldots, S_i, \ldots, S_{N'}\} \tag{4.4} $$

where $N' = \lfloor (n-h-m+1)/\tau \rfloor$ is the total number of samples and $S_i$ is the sample obtained by splitting the original time series $Y$ according to Equation 4.5,

$$ S_i = \{Y_{\tau \cdot i-m+1}, \ldots, Y_{\tau \cdot i}, \ldots, Y_{\tau \cdot i+h}\} \tag{4.5} $$
where $m$ is the number of observations, $h$ is the desired prediction horizon, and $\tau$ is the period parameter that defines a split point applied every $\tau$ displacements. The $\tau$ displacements are performed by a sliding window of size $m+h$ inside each sample $S_i$ in order to construct the training samples $TS$. The data organisation used to construct the training set is shown in Figure 20, where $S_1$ and $S_2$ are samples of size $m+h+\tau-1$, which allows $\tau$ displacements of the sliding window inside each sample.
Figure 20: Time series processing data (window segments labelled $m$ and $h$).
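The decomposition step can be sketched in Python as follows. This is an illustrative reading rather than the original implementation: it assumes 0-based offsets, takes one sample every τ positions, and gives each sample the length m + h + τ − 1 stated for Figure 20, so that a window of size m + h can be displaced τ times inside it. The function names `decompose` and `sliding_windows` and the toy series are hypothetical.

```python
def decompose(series, m, h, tau):
    """Split a series into N' = floor((n - h - m + 1) / tau) samples.

    Assumption: sample i starts at offset tau * i (0-based) and has
    length m + h + tau - 1, the sample size stated in the text.
    """
    n = len(series)
    n_samples = max(0, (n - h - m + 1) // tau)
    size = m + h + tau - 1
    return [series[tau * i: tau * i + size] for i in range(n_samples)]


def sliding_windows(sample, m, h, tau):
    """Return the tau windows of size m + h obtained inside one sample."""
    return [sample[j: j + m + h] for j in range(tau)]


# Toy series Y_1..Y_20 with m = 3 observations, horizon h = 2, period tau = 4.
series = list(range(1, 21))
m, h, tau = 3, 2, 4
samples = decompose(series, m, h, tau)
windows = sliding_windows(samples[0], m, h, tau)

print(len(samples), samples[0])
print(windows)
```

With these toy values the series yields four samples of eight elements each, and every sample yields four windows of five elements.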
Identification and training
The objective of the identification analysis is to detect the number of different modes in the samples $S$. The samples $S$ are classified into a suitable number of classes $k$ using a clustering algorithm that groups them according to their similarity. Then, a local forecast model is trained with the classified data for each specific mode. The number of clusters $k$ is chosen by maximizing the mean silhouette coefficient shown in Equation 3.89. When the Feature Extraction module is used, the Pattern Classifier classifies $S$ using the samples $S'$, which are the samples transformed into the same feature space. After the classification of $S$, labeled samples $LS_i = \{C_i, S_i\}$ are obtained, where $C_i \in K$ is the label of $S_i$ and $K = \{1, \ldots, k\}$ is the set of possible labels. The class information is then incorporated using extended input vectors that concatenate the $m$ last observations with the codified class information. The procedure for generating the training samples $TS$ is implemented in the Data Formatting module. A sliding window of size $m+h$ is applied to the labeled samples $LS$ to obtain the training samples $TS = \{\{C', Input\}, y\}$, where $C'_i = \{I_{C_i,1}, \ldots, I_{C_i,k}\}$ is the codification of the class $C_i$ using the $C_i$-th row of the identity matrix $I_{k \times k}$, as shown in Equation 4.6:

$$ \begin{matrix} k_1 \to k'_1 \\ \vdots \\ k_k \to k'_k \end{matrix} = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix}_{k,k} = I_{k,k} \tag{4.6} $$
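The class codification can be illustrated with a short Python sketch that returns the row of the identity matrix corresponding to a class label. The function name `encode_class` is a hypothetical choice, and labels are assumed to be 1-based, matching K = {1, ..., k}.

```python
def encode_class(label, k):
    """Return the label-th row (1-based) of the k x k identity matrix."""
    if not 1 <= label <= k:
        raise ValueError("label must lie in {1, ..., k}")
    return [1 if column == label else 0 for column in range(1, k + 1)]


# Codification of every label for k = 3 classes.
k = 3
codes = [encode_class(label, k) for label in range(1, k + 1)]
print(codes)
```

Stacking the codes of all k labels reproduces the identity matrix, so each class is represented by a distinct unit vector.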
The vector $Input_i = \{S_{i,j}, \ldots, S_{i,j+m-1}\}$ is a sequence of $m$ inputs and $y_i = \{S_{i,j+m}, \ldots, S_{i,j+m+h}\}$ is the output or target vector of size $h$, where $j \in \{1, \ldots, \tau\}$ is the index that moves the sliding window $\tau$ times. The training set is composed by extending each input vector to $x_i = \{C'_i, Input_i\}$, which yields a set of training samples denoted by $TS = \{x_i, y_i\}_{i=1}^{N'}$.
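The construction of the training samples can be sketched as below. This is a hedged illustration, not the Data Formatting module itself: it assumes each labeled sample is a `(label, values)` pair with at least m + h + τ − 1 values, uses 0-based window offsets, and takes exactly h target values per window, as the text states for the target size. The function name `build_training_samples` and the toy data are hypothetical.

```python
def build_training_samples(labeled_samples, m, h, tau, k):
    """Build (x, y) pairs from labeled samples with a sliding window.

    x concatenates the one-hot class code with the m inputs of the window;
    y holds the h values that follow those inputs.
    """
    training = []
    for label, values in labeled_samples:
        if len(values) < m + h + tau - 1:
            raise ValueError("sample too short for tau displacements")
        # One-hot code: the label-th row (1-based) of the identity matrix.
        code = [1 if column == label else 0 for column in range(1, k + 1)]
        for j in range(tau):
            window = values[j: j + m + h]
            x = code + window[:m]   # extended input vector
            y = window[m:]          # target vector of size h
            training.append((x, y))
    return training


# Two toy labeled samples for m = 3, h = 2, tau = 4 and k = 2 classes.
labeled = [(1, list(range(1, 9))), (2, list(range(5, 13)))]
ts = build_training_samples(labeled, m=3, h=2, tau=4, k=2)
print(len(ts))
print(ts[0])
```

Each labeled sample contributes τ training pairs, so the two toy samples produce eight pairs in total.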
Training
The machine learning method is trained on the training samples $TS$ to optimize the objective function

$$ \underset{w}{\text{minimise}} \; \sum_{i=1}^{N'} \left( f(x_i) - y_i \right)^2 \tag{4.7} $$

where $w$ is the set of weights that minimises the sum of the squared errors between the output estimated by the neural network, denoted by $f(x_i)$, and the real output vectors $y_i$.
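To make the objective concrete, the following sketch evaluates the sum of squared errors for a given predictor over a set of training pairs. It assumes that the square of a vector-valued error is read as the squared Euclidean norm. The predictor `persistence`, which simply repeats the last input value, is a hypothetical stand-in for the trained network and is used only to exercise the function.

```python
def sum_squared_errors(predict, training_samples):
    """Sum of squared errors between predict(x) and y over all pairs."""
    total = 0.0
    for x, y in training_samples:
        y_hat = predict(x)
        total += sum((p - t) ** 2 for p, t in zip(y_hat, y))
    return total


def persistence(x, h=2):
    """Hypothetical baseline: repeat the last input value h times."""
    return [x[-1]] * h


# Two toy training pairs: class code plus m = 3 inputs, and h = 2 targets.
pairs = [([1, 0, 1, 2, 3], [4, 5]), ([0, 1, 5, 6, 7], [8, 9])]
sse = sum_squared_errors(persistence, pairs)
print(sse)
```

A training procedure would search for the weights of the predictor that make this quantity as small as possible.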
Operational Phase
The operational phase consists of executing the Multi-Model Predictor. After the RBF-ANN has been trained, the challenge is to design a mechanism for the Mode Recognition module that estimates the current operation mode. For this module, a mode discovery method based on the nearest neighbour rule over a variable-size queue of observations is proposed. The idea is to exploit the current and limited historical information to estimate the class associated with the observed measurements using the Euclidean distance. The implementation consists of a queue $W'$ of variable size that stores from $m$ to $m+\tau-1$ measurements, as given in Equation 4.8,

$$ \begin{pmatrix} W_1 \\ W_2 \\ \vdots \\ W_{m+\mathrm{mod}(t,\tau)} \end{pmatrix} = \begin{pmatrix} Y_{t-(m+\mathrm{mod}(t,\tau))+1} \\ Y_{t-(m+\mathrm{mod}(t,\tau))+2} \\ \vdots \\ Y_t \end{pmatrix} \tag{4.8} $$
where $W_1$ is the first element of the queue, which stores the delayed measurement $Y_{t-(m+\mathrm{mod}(t,\tau))+1}$, and $W_{m+\mathrm{mod}(t,\tau)}$ stores the current measurement $Y_t$. Notice that $W_1$ is realigned every period defined by $\tau$. To simplify the notation, Equations 4.9 and 4.10 define the queue $\{W'\}_{\mathrm{sup}(t)}$ and the prototype vectors $\{P^k\}_{\mathrm{sup}(t)}$,

$$ \{W'\}_{\mathrm{sup}(t)} = \{W_1, \ldots, W_{\mathrm{sup}(t)}\}, \tag{4.9} $$

$$ \{P^k\}_{\mathrm{sup}(t)} = \{P^k_1, \ldots, P^k_{\mathrm{sup}(t)}\}, \tag{4.10} $$

where $\mathrm{sup}(t) = m + \mathrm{mod}(t,\tau)$. The class associated with the current observations at time $t$, stored in $\{W'\}_{\mathrm{sup}(t)}$, is estimated using the nearest neighbour rule defined by Equation 4.11,

$$ \underset{k \in K}{\arg\min} \; \left\| \{P^k\}_{\mathrm{sup}(t)} - \{Z\}_{\mathrm{sup}(t)} \right\| \tag{4.11} $$

where the selected class is the argument $k$ that minimises the distance between the prototype $\{P^k\}_{\mathrm{sup}(t)}$ and the current measurements stored in $\{W'\}_{\mathrm{sup}(t)}$.
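The variable-size queue and the nearest neighbour rule can be sketched together in Python. The sketch makes several assumptions: time t is 1-based, the queue is compared on raw measurements without feature extraction, and the prototypes are supplied as a hypothetical dictionary mapping each class label to a vector of at least m + τ − 1 values (how the prototypes are obtained is not covered here). All function names are illustrative.

```python
import math


def queue_size(t, m, tau):
    """Size of the queue at time t: sup(t) = m + mod(t, tau)."""
    return m + (t % tau)


def current_queue(series, t, m, tau):
    """Return the last sup(t) measurements up to and including Y_t (t is 1-based)."""
    size = queue_size(t, m, tau)
    if t < size or t > len(series):
        raise ValueError("not enough measurements for the queue at time t")
    return series[t - size: t]


def recognise_mode(queue, prototypes):
    """Pick the class whose prototype prefix is nearest to the queue."""
    best_label, best_distance = None, math.inf
    for label, prototype in prototypes.items():
        # Compare only the first sup(t) elements of the prototype.
        distance = math.dist(prototype[:len(queue)], queue)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


# Toy measurements Y_1..Y_12 with m = 3 and tau = 4.
series = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
m, tau = 3, 4
sizes = [queue_size(t, m, tau) for t in range(8, 12)]
queue = current_queue(series, 9, m, tau)

# Hypothetical prototypes for two classes.
prototypes = {1: [5, 6, 7, 8, 9, 10], 2: [0, 0, 0, 0, 0, 0]}
mode = recognise_mode(queue, prototypes)
print(sizes, queue, mode)
```

Over one period the queue grows from m to m + τ − 1 elements and then shrinks back to m, which is how the first element is realigned every τ steps.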