CHAPTER 3: RESEARCH METHODOLOGY
3.12 Transfer function-noise model
It is typically assumed that effect of the intervention model is gained from a
deterministic dummy variable. However, the effects of the safety measures
(interventions) may be gained from something other than a deterministic dummy variable
due to the fact that the factors which influence the level of the intervention may be
different from time to time. For instance, the effect of the safety helmet law on the number
and rate of road traffic casualties is influenced by the percentage of helmet usage. The
level of enforcement also contributes significantly to the effectiveness of the law. Hence,
it is assumed that the intervention variable can be any exogenous stochastic process. With
this assumption, the transfer function-noise model is suitable to estimate the impact of the
safety measures. In addition, the impact of legislations, standards, and road safety
programmes follows a gradual, permanent step function of the intervention model.
Consequently, the patterns follow the transfer function-noise model (Box & Jenkins,
1976), otherwise known as the dynamic regression model (Pankratz, 1991). The general
ππ‘ = π + βππ(π΅) πΏπ(π΅)π΅ππ πΌπ,π‘ π + (1 β πππ΅) (1 β β ππ΅) (1 β π΅)π ππ‘ (3.46) where: ΞΌ = Constant mean
πΌπ,π‘ = ith input time series or a difference of the ith input series at time t ππ = Pure time delay for the effect of the ith input series
ππ(π΅) = Numerator polynomial of the transfer function for the ith input series
πΏπ(π΅) = Denominator polynomial of the transfer function for the ith input series
ππ‘ = white noise term.
The final part is the noise process that can be represented by the ARIMA model. The
initial impact is given by π0 while the long-term effect (steady-state gain) is given by
(π0β π1π΅ β β― β πππ΅π)
(1 β πΏ1π΅ β β― β πΏππ΅π) .
In general, the transfer function can be written as (Yaffee & McGee, 2000):
π£(π΅) = (π0β π1π΅ β π2π΅ 2β β― β π π π΅π ) (1 β πΏ1π΅ β πΏ2π΅2β β― β πΏ ππ΅π) πΌπ‘βπ (3.47)
The order of the transfer function refers to the levels of b, s and r. The order of b
presents the delay or dead time, which is the time delay between incidences of changes
in input, It, and the apparent impact on response, Yt. The delay time b determines the pause
before the input begins to have an effect on the response variable. The order of the
regression is also represented by the values of s, which denotes the number of lags for
unpatterned spikes in the transfer function. The order of decay is also denoted by the value
of r. This parameter represents the patterned changes in the slope of the function. The
function. The denominator of the transfer function ratio consists of decay weights, ο€r, in
which the time, t, varies from 1 to r. The magnitude of these weights controls the rate of
attenuation in the slope. If there is more than one decay rate, the rate of attenuation may
fluctuate.
There are several fundamental modelling strategies used in intervention analysis. In
this study, the impact of the intervention is modelled on the entire data series before the
residual noise is modelled. The SAS software is used to execute the transfer function-
noise model, which involves determining the orders of the transfer function (b, s, r) and
the noise process. The steps taken for intervention modelling are listed as follows (Box
et al., 1994; Brocklebank & Dickey, 2003 and Montgomery et al., 2008).
Step 1. Prewhiten the It series to attain stationary series, e.g., by some combination of
Box-Cox and differencing transformation and generate the ARMA model. This
step is similar to the step involved in ARIMA modelling.
Step 2. Obtain the estimates of ο·0 and ο€1.
Step 2. Determine the orders of the transfer function, b, s and r. Tentatively use the
sample cross-correlations function (CCF) or refer to the basic transfer function
model structures shown in Figure 3.8. Box et al. (1994) have provided the
details on calculating the CCF and the way to determine the orders of b, s and
r based on the CCF. In this study, the CCF diagram from the software and the
basic transfer function model structures are used to determine the orders.
Step 4. Model the noise in order to determine ARMA(p, q).
Step 5. Fit the overall model to attain the representation given in Equation (3.46).
Step 6. Use SBC to select the best model among the competing models.
Step 7. Check the adequacy of the model with the lowest SBC. If it is adequate, the
process using the model with the second lowest SBC and so forth until a model
that fulfils the adequacy check is attained.
Step 8. Use the best model for forecasting.
r, s, b Impulse response Step response
003 013 023 103 113 123
Figure 3.8: Examples of impulse and step response functions Source: Adapted from Box et al. (1994)
r, s, b Impulse response Step response 203 213 223 Figure 3.8, continued 3.13 State-space model
The state-space model is based on the Markov property, which implies the
independence of the future of a process from its past, given the state of the present system.
In a state-space model, the state of the process at a present time contains all of the past
information that is required to predict the process behaviour in the future. According to
Yaffee and McGee (2000), state-space model is a model that consists of stationary
multivariate time series processes with dynamic interactions. This model is constructed
from two basic equations. The state transition equation consists of a state vector of
auxiliary variables as a function of a transition matrix and an input matrix, whereas the
measurement equation consists of a state vector which is canonically extracted from
highlighted that state-space models are an extremely rich class of models for time series,
which are well beyond the linear ARIMA and classical decomposition models.
A state-space model for a multivariate time series (Yt) consists of two equations. The
first equation is given by (Brockwell & Davis, 2002):
ππ‘ = πΊπ‘ ππ‘+ ππ‘ π‘ = 1, 2, 3, β¦ (3.48)
This equation is known as the observation equation which expresses the w-
dimensional observation Yt as a linear function of a v-dimensional state variable Xt (the
explanatory variables that have strong relationship to number of fatalities) and noise. {Wt}
βΌ WN(0, {Rt}) and {Gt} is a sequence of w Γ v matrices. The second equation is known
as the state equation or state vector that determines the state Xt+1 at time t + 1 in terms of
the previous state Xt and a noise term. The state vector is given by:
ππ‘+1 = πΉπ‘ ππ‘+ ππ‘ π‘ = 1, 2, 3, β¦ (3.49)
where {Ft} is a sequence of v Γ v matrices. It shall be noted that {Vt} βΌ WN(0, {Qt}),
and {Vt} is uncorrelated with {Wt} (i.e. E(Wtππ β²) = 0 for all s and t). It is assumed that the
initial state X1 is uncorrelated with all of the noise terms {Vt} and {Wt}.
The SAS software is used to determine the state vector and the state-space model
representation. This procedure is similar to the modelling strategy proposed by Akaike
(1976), which involves a canonical correlation analysis to identify the state-space model.
Firstly, a sequence of unrestricted vector autoregressive (VAR) models are fitted and the
AIC for each model is computed. The vector autoregressive models are estimated using
model which yields the smallest AIC is chosen as the order (number of lags into the past)
for use in the canonical correlation analysis.
The elements of the state vector are then determined from a sequence of canonical
correlation analyses of the sample autocovariance matrices using the selected order. The
sample canonical correlations of the past are computed with an increasing number of steps
into the future. The variables that yield significant correlations are added to the state
vector whereas those that yield insignificant correlations are excluded. Once the state
vector is determined, the state-space model is fitted to the data. The free parameters in
the F, G, and W matrices are estimated using the approximate maximum likelihood.
In summary, the following steps are taken to develop the state-space model.
Step 1. Identify the independent and explanatory variables.
Step 2. Perform stepwise regression analysis to obtain the explanatory variables that
can be used to represent the model without collinearity problems.
Step 3. Prewhiten the data series to attain stationary series, e.g., by some combination
of Box-Cox and differencing transformation.
Step 4. Execute the STATESPACE procedure in the SAS program to determine the
state vector and the state-space model representations.
Step 5. Select the state-space model representation and preliminary estimates.
Step 6. Estimate the final state-space model representation.
Step 7. Use the final model for forecasting.
In state-space modelling, the data series should be stationary. Checks for stationarity
are conducted using the Augmented Dickey-Fuller (ADF) test. The differencing
3.14 Chapter summary
ARIMA, transfer function-noise and state-space models will be used in this study. The
multiple regression is used to select the explanatory variables that are correlated
significantly with the number of road traffic fatalities. Then, these variables are the input
variables for the state-space model. The effectiveness of a road safety measure is checked
with the ARIMA and transfer function-noise model. Whilst, for forecasting the fatalities
up to year 2020, the ARIMA, transfer function-noise and state-space models are
employed, respectively.
The software packages are used to analyse the data, namely Microsoft Excel (2010),
Minitab (Version 16), Statgraphics and SAS (Version 9). Minitab, Statgraphics and SAS
are leading software packages in statistical analysis. Minitab is convenient for univariate
analysis whereas SAS package is capable of performing numerous statistical operations.
In addition, Eviews (Version 6) is used to crosscheck the unit root test results and is more
likely to produce consistent results.
The forecasting period is chosen to be 8 years, which can be classified as medium-
term forecast. In this study, it is deemed necessary to develop and evaluate several
different models in order to determine the best forecasting model. The forecasting error
serves as the basis for model construction and evaluation. Extrapolation is conducted to
generate data from the observed historical values in order to check the accuracy of models
and determine forecasting errors. Forecasting errors consist of the sum of the absolute
values of errors, average forecast error and the sum of squared errors, which are used to
compare the accuracy of a particular method from the alternative forecasting methods
considered. These error patterns will reveal whether the method is suitable for forecasting,
In order to make a series stationary, either one of the following techniques can be used
such as log transformation, power transformation, or Box Cox transformation techniques.
Box-Cox transformation is the most commonly used variance-stabilizing transformation
technique. When there is high autocorrelation in the data, the model itself changes rather
than the levels, which often eliminates serial correlation. In this study, differencing is
considered when the probability obtained from the ADF test is greater or equal to 0.1.
Further analysis is required to select a satisfactory model. This can be done by computing
the mean square error for each tentative model. A model which generates the smallest
mean square error is generally preferable. In this study, the AIC and SBC are used to
select the best model. The best model is the one which has minimum AIC and SBC. In
most cases, the AIC and SBC will produce the same results. However, since the SBC
criterion imposes a greater penalty for a number of parameters compared to the AIC