
4.2 Multi-Model Forecasting Using RBF-ANN with an On-line

4.2.1 Discrete Derivative as a Feature Extraction Method

The Feature Extraction module transforms the raw data into a set of useful features (GE06). Its purpose is to improve the classification performance. The discrete derivative (or differentiation) of the time series is implemented as the feature extraction method, described by

$$Y'_t = Y_t - Y_{t-1}$$

where $Y'_t$ is the difference between the observation at time t and the observation at lag t − 1.

This transformation removes the mean and the trend by taking the relative increments of the time series used for mode identification (MG08).
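The lag-1 difference can be illustrated with a minimal NumPy sketch, assuming the series is held in a one-dimensional array; the function name `discrete_derivative` is hypothetical. The example shows how a linear trend is reduced to a constant increment.

```python
import numpy as np

def discrete_derivative(y):
    """Return Y'_t = Y_t - Y_{t-1} for t = 2, ..., n (the result has n - 1 values)."""
    y = np.asarray(y, dtype=float)
    return np.diff(y)  # first-order difference at lag 1

# A linear trend becomes a constant series after differencing.
series = [3.0, 5.0, 7.0, 9.0, 11.0]
print(discrete_derivative(series))  # [2. 2. 2. 2.]
```

Note that the transformed series is one observation shorter than the original one.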

4.2.2 On-line Mode Recognition for the Multi-Model Predictor Approach

The proposed methodology for building the Multi-Model Predictor is designed to use machine learning regression methods that learn a model associating the input vectors with the output vectors from a subset of samples (AYK+11). The specific algorithms for training the Multi-Model Predictor (MMP) and for the On-line Mode Recognition with the Multi-Model Predictor (RBFMMP+OR) implementation are described next.

Preparation Phase

The preparation phase performs the training of the Multi-Model Predictor. It involves the Decomposition, Identification and Training steps, all performed off-line. The implementation of each step is described next.

Decomposition:

This step decomposes the time series so that the different behavior patterns can be identified. The decomposition assumes the availability of a univariate time series, described by Equation 4.3,

$$Y = \{Y_1, \dots, Y_i, \dots, Y_n\} \quad (4.3)$$

where $Y_1$, $Y_i$ and $Y_n$ are the first, the i-th and the last element of the time series, respectively. The time series Y is decomposed into samples S as presented in Equation 4.4.

$$S = \{S_1, \dots, S_i, \dots, S_{N'}\} \quad (4.4)$$

where $N'$ is the total number of samples, defined by $\lfloor (n-h-m+1)/\tau \rfloor$, and $S_i$ is the sample obtained by splitting the original time series Y according to Equation 4.5:

$$S_i = \{Y_{\tau \cdot i - m + 1}, \dots, Y_{\tau \cdot i}, \dots, Y_{\tau \cdot i + h}\} \quad (4.5)$$

where m is the number of observations used as input, h is the desired prediction horizon, and τ is the period parameter that defines a split point applied every τ displacements. The τ displacements are performed by a sliding window of size m + h inside each sample $S_i$ in order to construct the training samples TS. A graphical description of the data organisation used to construct the training set is shown in Figure 20, where $S_1$ and $S_2$ are samples of size m + h + τ − 1 that allow τ displacements of the sliding window inside each sample.

Figure 20: Time series processing data (window segments labelled m and h)
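The decomposition can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the function name `decompose` is hypothetical, and the start offset of each sample is assumed (consecutive samples start τ observations apart) so that exactly N' = ⌊(n − h − m + 1)/τ⌋ samples of size m + h + τ − 1 fit in the series, as described for Figure 20. The exact index offsets of Equation 4.5 are not reproduced.

```python
import numpy as np

def decompose(y, m, h, tau):
    """Split y into N' = floor((n - h - m + 1) / tau) samples.

    Assumed layout (hypothetical): sample i starts tau observations after
    sample i - 1 and holds m + h + tau - 1 observations, so a sliding window
    of size m + h can be displaced tau times inside it.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    n_samples = (n - h - m + 1) // tau  # N'
    size = m + h + tau - 1               # observations per sample
    return [y[i * tau : i * tau + size] for i in range(n_samples)]

samples = decompose(np.arange(1, 21), m=4, h=2, tau=5)
print(len(samples), samples[0])
```

With n = 20, m = 4, h = 2 and τ = 5, this yields three samples of ten observations each; the last sample ends exactly at the final observation.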

Identification and training

The objective of the identification analysis is to detect the number of different modes in the samples S. The samples S are classified into a suitable number of classes k using a clustering algorithm that groups them according to their similarity. Then, a local forecast model is trained with the classified data for each specific mode. To choose the number of clusters k, the classification is validated by maximizing the mean silhouette coefficient shown in Equation 3.89. When the Feature Extraction module is used, the Pattern Classifier classifies S through the samples $S'$, which are transformed into the same feature space. After the classification of S, labeled samples $LS_i = \{C_i, S_i\}$ are obtained, where $C_i \in K$ is the label of $S_i$ and $K = \{1, \dots, k\}$ is the set of possible labels. The class information is then incorporated using extended input vectors that concatenate the m last observations with the codified class information. The procedure for generating the training samples TS is implemented in the Data Formatting module. To generate TS, a sliding window of size m + h is used to obtain, from the labeled samples LS, the training samples $TS = \{\{C'_i, Input_i\}, y_i\}$, where $C'_i = \{I_{C_i,1}, \dots, I_{C_i,k}\}$ is the codification of the class $C_i$ using the $C_i$-th row of the identity matrix $I_{k \times k}$, as shown in:

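$$\begin{bmatrix} k_1 \rightarrow k'_1 \\ \vdots \\ k_k \rightarrow k'_k \end{bmatrix} = \begin{bmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{bmatrix}_{k,k} = I_{k,k} \quad (4.6)$$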

The vector $Input_i = \{S_{i,j}, \dots, S_{i,j+m-1}\}$ is a sequence of m inputs and $y_i = \{S_{i,j+m}, \dots, S_{i,j+m+h-1}\}$ is the output, or target, vector of size h, where $j \in \{1, \dots, \tau\}$ is the index that moves the sliding window τ times. The training set is composed by extending each input vector to $x_i = \{C'_i, Input_i\}$, which yields the set of training samples $TS = \{x_i, y_i\}_{i}^{N}$.
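The Data Formatting step can be sketched as below, combining the one-hot codification of Equation 4.6 with the τ displacements of the window of size m + h. This is a minimal sketch under assumptions: the function name `format_training_set` is hypothetical, labels are integers in {1, ..., k}, each sample holds at least m + h + τ − 1 observations, and the target is taken as the h values that follow the m inputs.

```python
import numpy as np

def format_training_set(labeled_samples, k, m, h, tau):
    """Build TS = {x_i, y_i} from labeled samples LS_i = (C_i, S_i).

    Hypothetical helper: each sample must hold at least m + h + tau - 1
    observations and each label must lie in {1, ..., k}.
    """
    identity = np.eye(k)
    X, Y = [], []
    for label, sample in labeled_samples:
        code = identity[label - 1]  # C'_i: the C_i-th row of I_{k x k}
        for j in range(tau):  # tau displacements of the window of size m + h
            inputs = sample[j : j + m]           # m inputs
            target = sample[j + m : j + m + h]   # h outputs
            X.append(np.concatenate([code, inputs]))  # x_i = {C'_i, Input_i}
            Y.append(target)
    return np.array(X), np.array(Y)

ls = [(2, np.arange(1.0, 11.0))]  # one sample of size m + h + tau - 1 = 10, class 2
X, Y = format_training_set(ls, k=3, m=4, h=2, tau=5)
print(X.shape, Y.shape)  # (5, 7) (5, 2)
```

Each extended input vector therefore has k + m entries: the first k encode the class and the remaining m are the observations.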

Training

The machine learning method is trained using the training samples TS to optimize the objective function

$$\min_{w} \sum_{i=1}^{N'} \left( f(x_i) - y_i \right)^2 \quad (4.7)$$

where w is the set of weights that minimises the sum of squared errors between the output estimated by the neural network, $f(x_i)$, and the real output vectors $y_i$.
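One common way to minimise Equation 4.7 for an RBF network is to fix the centres and the width of the Gaussian basis functions and solve for the linear output weights by least squares. The sketch below follows that approach; it is an assumption for illustration, not necessarily the training procedure used in this work. The class name `RBFNetwork`, the choice of the first `n_centres` training inputs as centres, and the shared width `sigma` are all hypothetical.

```python
import numpy as np

class RBFNetwork:
    """Gaussian RBF network whose output weights w minimise sum_i ||f(x_i) - y_i||^2.

    Assumptions: centres and width are fixed before training (here, the first
    n_centres training inputs and a user-given sigma); only the linear output
    layer, plus a bias, is fitted by ordinary least squares.
    """

    def __init__(self, n_centres, sigma):
        self.n_centres = n_centres
        self.sigma = sigma

    def _design(self, X):
        # Squared Euclidean distance from every input to every centre.
        d2 = ((X[:, None, :] - self.centres[None, :, :]) ** 2).sum(axis=2)
        phi = np.exp(-d2 / (2.0 * self.sigma ** 2))  # Gaussian activations
        return np.hstack([phi, np.ones((X.shape[0], 1))])  # append bias column

    def fit(self, X, Y):
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        self.centres = X[: self.n_centres]
        # Least-squares solution of the sum of squared errors over the output weights.
        self.w, *_ = np.linalg.lstsq(self._design(X), Y, rcond=None)
        return self

    def predict(self, X):
        return self._design(np.asarray(X, dtype=float)) @ self.w

X = np.linspace(0.0, 1.0, 20)[:, None]
Y = np.hstack([np.sin(2 * np.pi * X), np.cos(2 * np.pi * X)])  # vector targets
model = RBFNetwork(n_centres=20, sigma=0.05).fit(X, Y)
print(np.abs(model.predict(X) - Y).max())
```

With one centre per training input, the least-squares solution interpolates the training targets; in practice fewer centres (for example, obtained by clustering) are typically used to avoid overfitting.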

Operational Phase

The operational phase consists of executing the Multi-Model Predictor. Once the RBF-ANN has been trained, the challenge is to design a mechanism that allows the Mode Recognition module to estimate the current operation mode. For this module, we propose a mode discovery based on the nearest neighbour rule using a queue of observations of variable size. The idea is to exploit the current and limited historical information to estimate the class of the observed measurements using the Euclidean distance. The implementation uses a queue $W'$ of variable size that stores from m to m + τ − 1 measurements, as shown in Equation 4.8:

$$\begin{bmatrix} W_1 \\ W_2 \\ \vdots \\ W_{m+\mathrm{mod}(t,\tau)} \end{bmatrix} = \begin{bmatrix} Y_{t-(m+\mathrm{mod}(t,\tau))+1} \\ Y_{t-(m+\mathrm{mod}(t,\tau))+2} \\ \vdots \\ Y_t \end{bmatrix} \quad (4.8)$$

where $W_1$ is the first element of the queue, which stores the delayed measurement $Y_{t-(m+\mathrm{mod}(t,\tau))+1}$, and $W_{m+\mathrm{mod}(t,\tau)}$ stores the current measurement $Y_t$. Notice that $W_1$ is realigned every period defined by τ. To simplify the notation, we define in Equations 4.9 and 4.10 the queue $\{W'\}^{\mathrm{sup}(t)}$ and the prototype vectors $\{P^k\}^{\mathrm{sup}(t)}$:

$$\{W'\}^{\mathrm{sup}(t)} = \{W_1, \dots, W_{\mathrm{sup}(t)}\} \quad (4.9)$$

$$\{P^k\}^{\mathrm{sup}(t)} = \{P^k_1, \dots, P^k_{\mathrm{sup}(t)}\} \quad (4.10)$$

where $\mathrm{sup}(t) = m + \mathrm{mod}(t, \tau)$. The class associated with the current observations at time t, stored in $\{W'\}^{\mathrm{sup}(t)}$, is estimated using the nearest neighbour rule defined by Equation 4.11:

$$\arg\min_{k \in K} \left\lVert \{P^k\}^{\mathrm{sup}(t)} - \{W'\}^{\mathrm{sup}(t)} \right\rVert \quad (4.11)$$

where the selected class is the argument k that minimises the Euclidean distance between the prototype $\{P^k\}^{\mathrm{sup}(t)}$ and the current measurements stored in $\{W'\}^{\mathrm{sup}(t)}$.
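The on-line Mode Recognition step can be sketched as two small functions: one builds the variable-size queue of Equation 4.8 and the other applies the nearest neighbour rule of Equation 4.11. This is a minimal sketch under assumptions: the function names `current_queue` and `recognise_mode` are hypothetical, observations $Y_1, \dots, Y_n$ are stored in a 0-based array, and each class prototype is assumed to be a vector of at least m + τ − 1 values (for instance, a class centroid) whose first sup(t) entries are compared with the queue, as in Equation 4.10.

```python
import numpy as np

def current_queue(y, t, m, tau):
    """Queue W' of Equation 4.8: the last sup(t) = m + mod(t, tau) observations up to Y_t.

    y holds Y_1, ..., Y_n in a 0-based array, so Y_t is y[t - 1].
    """
    size = m + t % tau  # sup(t), between m and m + tau - 1
    if t < size:
        raise ValueError("not enough history to fill the queue at time t")
    return np.asarray(y[t - size : t], dtype=float)

def recognise_mode(queue, prototypes):
    """Nearest neighbour rule of Equation 4.11 using the Euclidean distance.

    prototypes maps each class k in {1, ..., k} to a (hypothetical) prototype
    vector; only its first sup(t) = len(queue) entries are compared.
    """
    size = len(queue)
    distances = {
        k: np.linalg.norm(np.asarray(p[:size], dtype=float) - queue)
        for k, p in prototypes.items()
    }
    return min(distances, key=distances.get)  # class with the smallest distance

y = np.array([0.1, 0.2, 0.1, 0.9, 1.1, 1.0, 0.9, 1.2])
q = current_queue(y, t=6, m=3, tau=4)  # sup(6) = 3 + 2 = 5 observations
protos = {1: np.zeros(6), 2: np.array([0.0, 0.0, 1.0, 1.0, 1.0, 1.0])}
print(recognise_mode(q, protos))  # 2
```

Because sup(t) returns to m whenever t is a multiple of τ, the queue shrinks back to m observations at the start of each period, which matches the realignment of $W_1$ described above.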

In document Time series forecasting based on classification of dynamic patterns (Page 82-87)