Chapter 3 Short-term Traffic Prediction Frameworks
3.2 Data smoothing
3.2.2 The SSA method
Singular Spectrum Analysis (SSA) is a data smoothing and de-noising method used in the analysis of time series (Broomhead & King, 1986). It is widely used in many fields such as hydrology (e.g., Sivapragasam et al. (2001); Simões et al. (2011)) and atmospheric and geophysical research (e.g., Ghil & Vautard (1991)) but has not been applied to short-term traffic prediction.
SSA is a model-free, adaptive noise-reduction algorithm based on the Karhunen-Loeve transform (Sivapragasam et al., 2001) and was first published by Broomhead & King (1986). It can be used as a data de-noising method by decomposing an original time series into a smoothed trend curve and a noise series (Hassani, 2007). Mineva & Popivanov (1996) present a comprehensive description and discussion of the SSA method and identify a number of advantages of SSA compared to other data smoothing techniques. These advantages include the ability to characterise both trend and oscillatory components, the capability to reduce local noise, enhanced pattern recognition and computational efficiency. Therefore, SSA is chosen as an example of data smoothing and de-noising methods in this research.
A detailed explanation of the SSA method can be found in Chapter 1 of Golyandina et al. (2001). The basic SSA algorithm considers only one-dimensional real-valued time series. SSA is based on the singular value decomposition of a specific matrix constructed from the time series (Zhigljavsky, 2010). The SSA method can be summarised in the following four steps:
Step 1: Embedding
This step transforms the original one-dimensional time series into a sequence of multi-dimensional lagged vectors, which together form the trajectory matrix.
Let $X = \{x_1, x_2, \dots, x_N\}$ be an original real nonzero series, where $N$ is the length of the time series. The embedding procedure forms the $K = N - L + 1$ lagged vectors $X_i = [x_i, x_{i+1}, \dots, x_{i+L-1}]$, $i = 1, \dots, K$, where the value of $L$ is the embedding dimension, also called the window length. The embedding transfers the original series into a trajectory matrix $T_X = [X_1, X_2, \dots, X_K]^{\top}$ of size $K \times L$. A trajectory matrix is a Hankel matrix, in which all the elements along each anti-diagonal ($i + j = \text{const}$) are equal. Each lagged vector is a row of this matrix. In other words, the trajectory matrix is written as

$$T_X = \begin{pmatrix} x_1 & x_2 & \cdots & x_L \\ x_2 & x_3 & \cdots & x_{L+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_K & x_{K+1} & \cdots & x_N \end{pmatrix} \qquad (3.1)$$
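As an illustration of the embedding step, the following is a minimal Python sketch, assuming NumPy is available. The function name `trajectory_matrix` and the argument names `x` (the series) and `L` (the window length) are chosen here for illustration and are not taken from the cited references. Following the description above, each lagged vector is stored as a row of the matrix.

```python
import numpy as np

def trajectory_matrix(x, L):
    """Return the K x L trajectory matrix whose rows are the lagged vectors."""
    x = np.asarray(x, dtype=float)
    N = x.size
    if not 1 < L < N:
        raise ValueError("window length L must satisfy 1 < L < N")
    K = N - L + 1
    # Row i holds the lagged vector (x_i, ..., x_{i+L-1}).
    return np.array([x[i:i + L] for i in range(K)])

# Illustrative series 1, 2, ..., 6 with window length 3.
T = trajectory_matrix([1, 2, 3, 4, 5, 6], L=3)
```

For this six-point series and a window length of 3, the result is a 4 x 3 matrix whose anti-diagonals are constant, which is the Hankel property.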
Step 2: Singular Value Decomposition (SVD)
This step uses Singular Value Decomposition (SVD) to decompose the trajectory matrix formed in Step 1.
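Applying SVD to the trajectory matrix, the matrix is decomposed into $T_X = U \Sigma V^{\top}$, where $U$ is a $K \times L$ orthonormal matrix, $V$ is an $L \times L$ square orthonormal matrix, and $\Sigma$ is an $L \times L$ diagonal matrix. In this step, $\lambda_i$ denotes the non-zero eigenvalues of $T_X^{\top} T_X$ in decreasing order, $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_d > 0$. The corresponding singular value of the trajectory matrix is $\sqrt{\lambda_i}$ ($i = 1, \dots, d$) and $d$ is the rank of $T_X$. The diagonal matrix can be rewritten as a sum of $d$ matrices, each containing a single non-zero singular value:

$$\Sigma = \sum_{i=1}^{d} \Sigma_i, \qquad \Sigma_i = \mathrm{diag}(0, \dots, 0, \sqrt{\lambda_i}, 0, \dots, 0) \qquad (3.2)$$

where $\sqrt{\lambda_i}$ occupies the $i$th diagonal position of $\Sigma_i$. Therefore, the trajectory matrix can be written as

$$T_X = \sum_{i=1}^{d} \sqrt{\lambda_i}\, U_i V_i^{\top} \qquad (3.3)$$

where $U_i$ and $V_i$ are the $i$th columns of $U$ and $V$, that is, the left and right singular vectors of the trajectory matrix (eigenvectors of $T_X T_X^{\top}$ and $T_X^{\top} T_X$, respectively). The triple $(\sqrt{\lambda_i}, U_i, V_i)$ is called the $i$th eigentriple of the SVD.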
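The decomposition in this step can be sketched numerically as follows. This is an illustrative example rather than the implementation used in this research: the short series, the window length of 3 and all variable names are assumptions, and `numpy.linalg.svd` is used to obtain the thin SVD.

```python
import numpy as np

# Illustrative series and window length (assumed values, not traffic data).
x = np.array([2.0, 4.0, 3.0, 7.0, 5.0, 9.0, 6.0])
L = 3
K = x.size - L + 1
T = np.array([x[i:i + L] for i in range(K)])  # K x L trajectory matrix

# Thin SVD: T = U @ diag(s) @ Vt, with singular values s in decreasing order.
U, s, Vt = np.linalg.svd(T, full_matrices=False)

# The eigenvalues of T^T T (sorted in decreasing order) are the squared singular values.
eigvals = np.linalg.eigvalsh(T.T @ T)[::-1]

# One rank-one elementary matrix per eigentriple (sqrt(lambda_i), U_i, V_i).
d = np.linalg.matrix_rank(T)
elementary = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(d)]
```

Summing the matrices in `elementary` recovers `T` up to floating-point error, which is the content of the expansion of the trajectory matrix into eigentriples.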
Step 3: Grouping
The decomposed trajectory matrix will be reconstructed in this step.
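This step is a grouping step and corresponds to splitting the elementary matrices computed at the SVD step into several groups and summing the matrices within each group. Writing $T_i = \sqrt{\lambda_i}\, U_i V_i^{\top}$ for the elementary matrix of the $i$th eigentriple, the grouping procedure partitions the set of indices $\{1, \dots, d\}$ into a collection of $m$ disjoint subsets $I_1, \dots, I_m$, which is called eigentriple grouping. For each group $I_j$, $T_{I_j}$ is the sum of the matrices $T_i$ with $i \in I_j$. Thus, the expansion of $T_X$ can be written as

$$T_X = T_{I_1} + T_{I_2} + \dots + T_{I_m} \qquad (3.4)$$

Assume that there are only two groups of the eigentriples of the trajectory matrix, namely $I_1$ and $I_2$, and $I_1 \cup I_2 = I$, where $I$ is the entire set $\{1, \dots, d\}$. Therefore,

$T_X = T_{I_1} + T_{I_2}$, $T_{I_1} = \sum_{i \in I_1} T_i$ and $T_{I_2} = \sum_{i \in I_2} T_i$.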
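Grouping can be sketched as below for the two-group case. The helper `group_matrices`, its 0-based index lists and the example series are hypothetical and serve only to illustrate the summation; which eigentriples belong to which group remains a modelling choice.

```python
import numpy as np

def group_matrices(T, groups):
    """Sum the elementary matrices of T over each group of 0-based indices."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # (U[:, I] * s[I]) @ Vt[I] equals the sum of s_i * U_i V_i^T over i in I.
    return [(U[:, I] * s[I]) @ Vt[I] for I in groups]

# Illustrative K x L trajectory matrix (assumed series, window length 3).
x = np.array([2.0, 4.0, 3.0, 7.0, 5.0, 9.0, 6.0])
T = np.array([x[i:i + 3] for i in range(x.size - 3 + 1)])

# Two disjoint groups covering all eigentriples: the leading one and the rest.
T_first, T_rest = group_matrices(T, [[0], [1, 2]])
```

Because the two groups are disjoint and together cover every eigentriple, `T_first + T_rest` reproduces `T` up to floating-point error.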
Step 4: Reconstruction using diagonal averaging
A new time series of length $N$ is created from each of the grouped matrices obtained in Step 3. The corresponding operation in this step uses diagonal averaging for recovery: the elements of a grouped matrix that lie on the same anti-diagonal are averaged to give one element of the reconstructed series. It is a linear operation and maps the trajectory matrix of the initial series into the initial series itself. In this way, a decomposition of the initial series into several additive components can be obtained.
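Diagonal averaging can be sketched as follows, assuming the K x L row-wise layout of the trajectory matrix described in Step 1. The function name `diagonal_average` is an illustrative choice.

```python
import numpy as np

def diagonal_average(M):
    """Average a K x L matrix over its anti-diagonals to get a series of length K + L - 1."""
    M = np.asarray(M, dtype=float)
    K, L = M.shape
    N = K + L - 1
    total = np.zeros(N)
    count = np.zeros(N)
    for i in range(K):
        # Row i contributes to series positions i, ..., i + L - 1.
        total[i:i + L] += M[i]
        count[i:i + L] += 1
    return total / count

# A Hankel matrix is mapped back to the series it was built from.
hankel = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]])
recovered = diagonal_average(hankel)

# A non-Hankel matrix is averaged along each anti-diagonal.
averaged = diagonal_average([[1, 2], [3, 4]])
```

Here `recovered` is the series 1, 2, ..., 6, and `averaged` is 1, 2.5, 4, since the middle anti-diagonal holds the values 2 and 3.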
The basic SSA algorithm can be summarised in two main stages: decomposition and reconstruction. The basic idea of the SSA approach is to undertake a spectral analysis of the raw input data in order to separate out high-frequency “noisy” components, thus allowing the remaining components to be reconstructed into a smoothed version of the original series. Step 1 and Step 2 form the decomposition stage; the reconstruction stage comprises Step 3 and Step 4. Figure 3.2 shows the outline and procedural steps of the SSA method described above (Golyandina et al., 2001).
[Figure 3.2 flow-chart. Decomposition stage: Time series X, then Embedding: lagged trajectory matrix Tx, then Decomposition using SVD. Reconstruction stage: Grouping of components, then Reconstruction of time series.]
Figure 3.2: Flow-chart of a basic SSA method (adapted from Golyandina et al. (2001))
In this research, the SSA method introduced above is used in the data smoothing step before a machine learning tool is applied for prediction. The original traffic time series can be divided into two parts: the smoothed series and the residuals. Figure 3.3 shows an example plot of 24-hour traffic flow time series data, together with its smoothed part and its residuals obtained using SSA.
Figure 3.3: Traffic data, smoothed series and residuals
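The split of a series into a smoothed part and residuals can be sketched end to end as below. This is a minimal illustration under stated assumptions, not the implementation used in this research: the data are synthetic (a sinusoidal daily profile with added Gaussian noise at a hypothetical 5-minute resolution), and the function name `ssa_smooth`, the window length `L=48` and the number of retained eigentriples `r=3` are arbitrary illustrative choices.

```python
import numpy as np

def ssa_smooth(x, L, r):
    """Basic SSA smoothing: keep the r leading eigentriples, return (smoothed, residuals)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    K = N - L + 1
    # Step 1: embedding into a K x L trajectory matrix (rows are lagged vectors).
    T = np.array([x[i:i + L] for i in range(K)])
    # Step 2: singular value decomposition.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Step 3: grouping; the first group holds the r leading eigentriples.
    T_smooth = (U[:, :r] * s[:r]) @ Vt[:r]
    # Step 4: diagonal averaging back to a series of length N.
    total = np.zeros(N)
    count = np.zeros(N)
    for i in range(K):
        total[i:i + L] += T_smooth[i]
        count[i:i + L] += 1
    smoothed = total / count
    return smoothed, x - smoothed

# Synthetic example: 288 points, e.g. 5-minute counts over 24 hours (assumed).
rng = np.random.default_rng(0)
t = np.arange(288)
signal = 100 + 50 * np.sin(2 * np.pi * t / 288)
noisy = signal + rng.normal(scale=10, size=t.size)

smoothed, residuals = ssa_smooth(noisy, L=48, r=3)
```

By construction, `smoothed + residuals` equals the input series, which mirrors the division of the original series into a smoothed part and residuals. Retaining all eigentriples (`r = L`) returns the input unchanged, so the amount of smoothing is governed by how few components are kept.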