SINGULAR SPECTRUM ANALYSIS – HYBRID FORECASTING
METHODS WITH APPLICATION TO AIR TRANSPORT DEMAND
K. Adjenughwure, Delft University of Technology, Transport Institute, Ph.D. candidate
V. Balopoulos, Democritus Thrace University, Dep. of Civil Engineering, Associate Professor
G. Botzoris, Democritus Thrace University, Dep. of Civil Engineering, Assistant Professor
TITLE OF THE SLIDE
TRANSPORTATION DEMAND FORECASTING
• Transportation demand forecasting is the process of estimating
the number of people or vehicles that will use a specific
transport facility over a particular time interval.
• Accurate forecasting of demand is particularly important in air
transport, influencing decisions such as ticket pricing, operation
of new or closing of existing routes, aircraft purchase, building
of new or abandoning of old terminals, etc.
• The numerous methods that have been developed for or
employed in air transport demand forecasting may be classified
as qualitative (such as market surveys, Delphi method, and
expert meetings), or quantitative (such as econometric, time
series, etc.).
• Statistical time-series prediction methods, such as Autoregressive
Integrated Moving Average, have long been preferred for modeling
of airport passenger demand, but recently artificial intelligence
methods, such as Artificial Neural Networks, Fuzzy Logic, and the
Adaptive Neuro-Fuzzy Inference System, have gained recognition
and have been applied to the same task.
• All time-series prediction methods are reasonably accurate, but are
inherently sensitive to noise. To increase the accuracy of
time-series prediction, various methods have been developed to remove
noise from raw data and to decompose a time series into its
trend, its oscillatory components and its noise components. One of
these methods is Singular Spectrum Analysis, which decomposes a
time series into such components.
• Singular Spectrum Analysis (SSA) has been combined with
other classical time-series prediction methods to help improve
their results. Most related research uses SSA as a noise-removal
tool. A very recent hybrid approach, however, is to first use
SSA to decompose a time series into many component time
series (trend, seasonal and noise), then predict each non-noise
component separately with a chosen time-series prediction model,
and finally employ SSA to aggregate the predicted components
into predictions for the original time series.
SINGULAR SPECTRUM ANALYSIS
$Y_t = T_t + C_t + S_t + R_t$
where $T_t$ is the trend, $C_t$ the cyclical variation, $S_t$ the seasonal variation and $R_t$ the random variation.
TIME-SERIES OF A VARIABLE – SINGLE DECOMPOSITION
[Figure: Heathrow airport, monthly passenger demand (thousands), Jan-05 to Sep-13, shown as the sum of a TREND panel and an OSCILLATION panel]
• The contribution of this paper is to show that SSA decomposition of a
time series and the subsequent prediction of its components can
improve forecasting results. We demonstrate this using the statistical
data of two international airports (Heathrow, London and El. Venizelos,
Athens) with very different traffic volume and characteristics. ANFIS
was chosen as the prediction method to allow easy comparison with the
work of Xiao et al. (2014).
SCOPE OF THE PAPER
[Figure: Passengers (in thousands), LHR airport, 2005 to 2013, with the series divided into Training and Testing periods]
• ANFIS = ANN + FIS
• ANFIS stands for adaptive neuro-fuzzy inference system. Using a
given input/output data set, ANFIS constructs a Fuzzy Inference System
(FIS) whose membership function parameters are tuned (adjusted) using
either a backpropagation algorithm alone (as in an Artificial Neural
Network) or backpropagation in combination with a least-squares type
of method. This adjustment allows the fuzzy system to learn from the
data it is modeling.
ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM (ANFIS)
[Figure: ANFIS architecture with inputs x and y, membership functions A1, A2, B1, B2, rule weights w1, w2 and output f]
Layer 0: inputs x, y
Layer 1: Fuzzification Layer
Layer 2: Rule Layer
Layer 3: Normalization Layer
Layer 4: Defuzzification Layer
Layer 5: Summation Layer
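The five layers listed above can be traced in a few lines of code. The following is only a minimal sketch of a forward pass through a two-input, two-rule first-order Sugeno system; the Gaussian membership functions, the pairing of A1 with B1 and A2 with B2, and all parameter values are assumptions made for illustration and are not taken from the slides.

```python
import numpy as np

def gaussian_mf(v, center, sigma):
    """Membership degree of v under a Gaussian membership function (assumed shape)."""
    return np.exp(-0.5 * ((v - center) / sigma) ** 2)

def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-input, two-rule first-order Sugeno system.

    premise:    dict mapping "A1", "A2", "B1", "B2" to (center, sigma) pairs.
    consequent: array of shape (2, 3) with (p_i, q_i, r_i) per rule,
                so that f_i = p_i * x + q_i * y + r_i.
    Returns the output f and the normalized rule weights.
    """
    # Layer 1 (fuzzification): membership degrees of the two inputs.
    mu_a = np.array([gaussian_mf(x, *premise["A1"]), gaussian_mf(x, *premise["A2"])])
    mu_b = np.array([gaussian_mf(y, *premise["B1"]), gaussian_mf(y, *premise["B2"])])
    # Layer 2 (rules): firing strength w_i = mu_Ai(x) * mu_Bi(y).
    w = mu_a * mu_b
    # Layer 3 (normalization): each weight divided by the sum of all weights.
    w_bar = w / w.sum()
    # Layer 4 (defuzzification): normalized weight times the linear rule output f_i.
    f_rules = np.asarray(consequent, dtype=float) @ np.array([x, y, 1.0])
    weighted = w_bar * f_rules
    # Layer 5 (summation): overall output.
    return weighted.sum(), w_bar

# Illustrative parameters only (hypothetical values).
premise = {"A1": (0.0, 1.0), "A2": (1.0, 1.0), "B1": (0.0, 1.0), "B2": (1.0, 1.0)}
consequent = np.array([[1.0, 1.0, 0.0], [2.0, -1.0, 0.5]])
output, weights = anfis_forward(0.5, 0.5, premise, consequent)
```

Training would adjust the premise parameters (layer 1) and the consequent parameters (layer 4); that step is not shown here.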
• To improve the generalization capability of an ANFIS model, a method
known as cross-validation is used. In this method, the available data
are split into three sets: a training set, a validation (or checking)
set and a testing set. The training set is used to fit the model, while
the validation set is used to prevent overfitting by monitoring the
model's error on data it has not been fitted to. Training is stopped
when the validation error reaches its minimum. Note that the validation
data are never used to fit the model parameters; they only provide an
independent check on how well the model trained so far is doing. After
training and validation, the test set is used as a second independent
test of the generalization ability of the model. The final model chosen
is the one that gives the minimum error on the test set.
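The split and the stopping rule described above can be sketched as follows. This is only an illustration: the 60/20/20 proportions, the `patience` parameter and the function names are assumptions, since the slides do not state how the data were divided or how the minimum of the validation error was detected.

```python
import numpy as np

def chronological_split(series, train_frac=0.6, val_frac=0.2):
    """Split a series into training, validation and testing parts, in time order.

    The fractions are illustrative assumptions, not values from the slides.
    """
    series = np.asarray(series, dtype=float)
    n_train = int(len(series) * train_frac)
    n_val = int(len(series) * val_frac)
    return (series[:n_train],
            series[n_train:n_train + n_val],
            series[n_train + n_val:])

def early_stopping_epoch(val_errors, patience=5):
    """Return the epoch with the lowest validation error seen before stopping.

    Scanning stops once the error has failed to improve for `patience`
    consecutive epochs (`patience` is a hypothetical tuning knob).
    """
    best_error, best_epoch, waited = float("inf"), -1, 0
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_error, best_epoch, waited = error, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Example: validation error falls, then rises as the model starts to overfit.
stop_at = early_stopping_epoch([5.0, 4.0, 3.0, 3.5, 3.2, 2.9, 3.1], patience=2)
```

The data are split in time order rather than shuffled, so that the validation and testing parts always lie after the training part.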
• SSA consists of two stages: the first is the decomposition of the
series, and the second is the reconstruction of the decomposed series
to recover the original series.
• The three parameters to be selected for the SSA algorithm are the
window length L, the number of elementary matrices to use for the
reconstruction r, and the number of groups m. The most important
parameter is the window length L. The other two parameters can be
omitted, depending on the way the SSA will be used (for pure
decomposition only the window length is required, and for noise removal
the grouping stage can be omitted).
• The window length is the only parameter needed for the decomposition
of the time series. There is currently no algorithm for selecting the
window length, but many researchers have suggested choosing L < N/2 as
a general rule, where N is the number of observations in the time series.
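The two stages can be illustrated with a compact NumPy sketch of basic SSA: embedding the series into a trajectory matrix, taking its singular value decomposition, and turning each elementary matrix back into a series by diagonal averaging. This is a generic textbook formulation written for illustration, not the authors' code; the function name and the synthetic test series are assumptions.

```python
import numpy as np

def ssa_decompose(series, L):
    """Decompose a series into min(L, N - L + 1) elementary components.

    The components are returned as rows and sum to the original series.
    """
    x = np.asarray(series, dtype=float)
    N = len(x)
    if not 1 < L < N:
        raise ValueError("window length L must satisfy 1 < L < N")
    K = N - L + 1
    # Stage 1a (embedding): L x K trajectory matrix of lagged windows.
    X = np.column_stack([x[j:j + L] for j in range(K)])
    # Stage 1b (SVD): X is the sum of the rank-one elementary matrices.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    components = np.zeros((len(s), N))
    for i in range(len(s)):
        elementary = s[i] * np.outer(U[:, i], Vt[i])
        # Stage 2 (diagonal averaging): average each anti-diagonal.
        flipped = elementary[::-1]
        components[i] = [flipped.diagonal(k).mean() for k in range(-(L - 1), K)]
    return components

# Synthetic monthly-like series (illustrative data, not airport data).
t = np.arange(96)
demo = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
parts = ssa_decompose(demo, L=24)       # only the window length is needed
smoothed = parts[:4].sum(axis=0)        # reconstruction from the first r = 4 components
```

Summing all rows recovers the input series; keeping only the first r rows is the noise-removal use mentioned above, and summing rows in m groups gives the grouped components.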
• For time series data with a known period T, Golyandina et al.
(2001) recommend choosing L such that L/T is an integer. For
instance, if the time series data is seasonal and the period is
4, then choosing L to be a multiple of 4 (4, 8, 12, 16, ...) will help
capture the periodic components with period 4. If the series has
multiple periods ($T_1$, $T_2$, $T_3$, …), then L should be chosen
such that $L/T_i$ is an integer for all i.
• To extract only a trend component, L should be chosen large
enough so that the trend is separable from other components
such as the noise, but not too large, because large values of L
mix up the trend with other components. In conclusion, L should be
chosen such that all the components from the decomposition of
the time series are separable or non-correlated.
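As a small illustration of the two rules of thumb for the window length (L < N/2, and L/T an integer for every known period T), the sketch below lists the window lengths that satisfy both. The function name is hypothetical, and the final choice among the candidates still has to be made by checking separability of the resulting components.

```python
from math import lcm  # multi-argument lcm requires Python 3.9+

def candidate_window_lengths(n_obs, periods):
    """Window lengths L with L < n_obs / 2 that are multiples of every period."""
    step = lcm(*periods)
    return [L for L in range(step, n_obs, step) if 2 * L < n_obs]

# Period 4 with N = 40 observations, as in the example above.
print(candidate_window_lengths(40, [4]))      # [4, 8, 12, 16]
# Two periods, 4 and 6: L must be a multiple of 12.
print(candidate_window_lengths(50, [4, 6]))   # [12, 24]
```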
• The proposed hybrid models combine SSA with ANFIS. The goal is to
improve the performance of the ANFIS model by first decomposing the time series into a sum of simple components (time series) that are easier to predict with ANFIS, and then combining the predictions of the components.
THE HYBRID MODELS
[Figure: flowchart of the hybrid models. The original time series is decomposed with Singular Spectrum Analysis (SSA) into time series components PC1, PC2, ..., PCL. Either each component is predicted with ANFIS, or the components are grouped into GC1, GC2, ..., GCm and each grouped component is predicted with ANFIS, giving PGC1, PGC2, ..., PGCm. Summation with SSA of the predicted components gives the predicted time series.]
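The decompose, predict and sum pipeline can be sketched end to end as follows. ANFIS is not available in the standard scientific Python stack, so this sketch swaps in a plain least-squares autoregression as a stand-in predictor; that substitution, the function names, the grouping indices and the synthetic data are all assumptions made for illustration and do not reproduce the authors' models.

```python
import numpy as np

def ssa_components(x, L):
    """Basic SSA: elementary components (rows) that sum to the series x."""
    x = np.asarray(x, dtype=float)
    K = len(x) - L + 1
    X = np.column_stack([x[j:j + L] for j in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    comps = np.zeros((len(s), len(x)))
    for i in range(len(s)):
        flipped = (s[i] * np.outer(U[:, i], Vt[i]))[::-1]
        comps[i] = [flipped.diagonal(k).mean() for k in range(-(L - 1), K)]
    return comps

def ar_forecast(series, order, steps):
    """Stand-in predictor (NOT ANFIS): least-squares autoregression, iterated."""
    series = np.asarray(series, dtype=float)
    lagged = np.array([series[t - order:t] for t in range(order, len(series))])
    coef, *_ = np.linalg.lstsq(lagged, series[order:], rcond=None)
    history = list(series)
    for _ in range(steps):
        history.append(float(np.dot(coef, history[-order:])))
    return np.array(history[-steps:])

def hybrid_forecast(x, L, groups, order, steps):
    """Decompose with SSA, predict each grouped component, sum the predictions."""
    comps = ssa_components(x, L)
    forecast = np.zeros(steps)
    for indices in groups:
        grouped = comps[list(indices)].sum(axis=0)       # one grouped component GC
        forecast += ar_forecast(grouped, order, steps)   # its prediction PGC
    return forecast

# Synthetic series with trend and period-12 seasonality (not airport data).
t = np.arange(120)
series = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
# Hypothetical grouping: components 0-1 and 2-3; the rest is treated as noise.
prediction = hybrid_forecast(series[:108], L=24, groups=[(0, 1), (2, 3)],
                             order=12, steps=12)
```

Components left out of `groups` are simply discarded, which corresponds to the noise-removal role of SSA in the hybrid scheme.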
[Figure panels: LHR and ATH]
THE TIME SERIES CHARACTERISTICS OF THE LONDON
HEATHROW (LHR) AND ATHENS (ATH) AIRPORT
COMPARISON OF RESULTS BETWEEN PURE ANFIS AND
HYBRID SSA – ANFIS MODELS
The results of the pure ANFIS model re-emphasise the advantages of using the hybrid models. The pure models did not perform well on either airport, with MAPE of 4.38% (Heathrow) and 8.69% (Athens), whereas the hybrid SSA–ANFIS models gave far better predictions, with MAPE below 2% for both airports. In terms of RMSE, the predictions made by the hybrid models were on average 5.3 times better than those of the pure ANFIS models. The coefficient of determination R2 also improved by an average of 21% across both airports.
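The two summary figures can be reproduced from the RMSE and R2 values in the results table on this slide. The sketch below assumes that "5.3 times better" is the mean of the per-airport RMSE ratios and that "21%" is the mean of the per-airport relative R2 gains; the slides do not state how the averages were formed, so this reading is an assumption that happens to match the reported numbers.

```python
# (pure ANFIS, hybrid SSA-ANFIS) values copied from the results table.
rmse = {"Heathrow": (335.49, 89.68), "Athens": (112.96, 16.26)}
r2 = {"Heathrow": (0.77, 0.98), "Athens": (0.85, 0.98)}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Mean of the per-airport RMSE ratios pure / hybrid.
rmse_ratio = mean(pure / hybrid for pure, hybrid in rmse.values())
# Mean of the per-airport relative R2 gains, in percent.
r2_gain = mean((hybrid - pure) / pure * 100 for pure, hybrid in r2.values())

print(f"Average RMSE ratio: {rmse_ratio:.1f}")       # Average RMSE ratio: 5.3
print(f"Average R2 improvement: {r2_gain:.0f}%")     # Average R2 improvement: 21%
```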
IMPROVEMENT OF THE FORECASTING ABILITY BY USING
THE HYBRID SSA - ANFIS MODEL
Statistics                                 Pure ANFIS model   Hybrid SSA–ANFIS model   Airport
Root Mean Square Error (RMSE)              335.49             89.68                    Heathrow
                                           112.96             16.26                    Athens
Mean Absolute Error (MAE)                  263.99             72.27                    Heathrow
                                           73.70              14.32                    Athens
Mean Absolute Percentage Error (MAPE), %   4.38               1.21                     Heathrow
                                           8.69               1.52                     Athens
Coefficient of determination, R2           0.77               0.98                     Heathrow
                                           0.85               0.98                     Athens
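For reference, the four statistics in the table can be computed as in the sketch below. These are the usual textbook definitions; the slides do not give the formulas, so the exact variants used by the authors (in particular for R2) are an assumption, and the function name and sample numbers are illustrative.

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """RMSE, MAE, MAPE (in percent) and R2 under their common definitions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = actual - predicted
    rmse = np.sqrt(np.mean(error ** 2))
    mae = np.mean(np.abs(error))
    # MAPE is undefined when an actual value is zero.
    mape = 100.0 * np.mean(np.abs(error / actual))
    # R2 as one minus residual sum of squares over total sum of squares.
    r2 = 1.0 - np.sum(error ** 2) / np.sum((actual - actual.mean()) ** 2)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

# Made-up numbers purely to show the call; not airport data.
scores = forecast_metrics([100, 200, 300, 400], [110, 190, 310, 390])
```

RMSE and MAE are in the units of the series, while MAPE is scale-free, which is why MAPE is the statistic that can be compared directly between the two airports.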
• Although econometric methods are currently being used to forecast
transport demand, the success of time series forecasting models, especially for short-term demand forecasting, has shifted research focus to
development of methods to improve the forecasting ability of these
models. Consequently, specialized statistical models like ARIMA and more recently artificial intelligence (AI) methods like ANN and ANFIS have been applied successfully to forecast air transport demand time series.
• Despite the success of AI models, their poor performance when used to predict noisy and seasonal time-series data, like monthly passenger
demand of airports, has necessitated better forecasting models that can forecast in the presence of noise and also exploit the seasonality of the data to improve forecasting results. Methods like seasonal ARIMA have been
used to forecast seasonal data, while Singular Spectrum Analysis (SSA) has been used as a noise removal tool to forecast noisy data.
• In this paper, hybrid models that combine SSA and ANFIS have been calibrated to forecast the passenger demand of two international
airports, London Heathrow and Athens. Forecast results have shown that decomposing a time series by means of SSA into simpler
components, predicting the future values of the components using any established prediction method, and then summing the predictions using SSA, can greatly improve forecasting performance.
• The main reasons for the remarkably improved forecasting achieved by the SSA-hybrid prediction methods are: simplicity, since the component time series are simpler and hence easier to predict; exploitation of
seasonality, since each seasonal component is predicted separately; and noise removal, since noise in the data is reduced by removing components
with no seasonality or no significant contribution.