Transfer function-noise model - RESEARCH METHODOLOGY

CHAPTER 3: RESEARCH METHODOLOGY

3.12 Transfer function-noise model

It is typically assumed that effect of the intervention model is gained from a

deterministic dummy variable. However, the effects of the safety measures

(interventions) may be gained from something other than a deterministic dummy variable

due to the fact that the factors which influence the level of the intervention may be

different from time to time. For instance, the effect of the safety helmet law on the number

and rate of road traffic casualties is influenced by the percentage of helmet usage. The

level of enforcement also contributes significantly to the effectiveness of the law. Hence,

it is assumed that the intervention variable can be any exogenous stochastic process. With

this assumption, the transfer function-noise model is suitable to estimate the impact of the

safety measures. In addition, the impact of legislations, standards, and road safety

programmes follows a gradual, permanent step function of the intervention model.

Consequently, the patterns follow the transfer function-noise model (Box & Jenkins,

1976), otherwise known as the dynamic regression model (Pankratz, 1991). The general

𝑌_𝑡 = 𝜇 + ∑𝜔𝑖(𝐵) 𝛿_𝑖(𝐵)𝐵𝑘𝑖 𝐼𝑖,𝑡 𝑖 + (1 − 𝜃𝑞𝐵) (1 − ∅_𝑝𝐵) (1 − 𝐵)𝑑 𝑎𝑡 (3.46) where: μ = Constant mean

𝐼_𝑖,𝑡 = ith input time series or a difference of the ith input series at time t 𝑘_𝑖 = Pure time delay for the effect of the ith input series

𝜔_𝑖(𝐵) = Numerator polynomial of the transfer function for the ith_{input series}

𝛿_𝑖(𝐵) = Denominator polynomial of the transfer function for the ith_{input series}

𝑎_𝑡 = white noise term.

The final part is the noise process that can be represented by the ARIMA model. The

initial impact is given by 𝜔0 while the long-term effect (steady-state gain) is given by

(𝜔₀− 𝜔1𝐵 − ⋯ − 𝜔𝑖𝐵𝑖)

(1 − 𝛿₁𝐵 − ⋯ − 𝛿_𝑖𝐵𝑖₎ .

In general, the transfer function can be written as (Yaffee & McGee, 2000):

𝑣(𝐵) = (𝜔0− 𝜔1𝐵 − 𝜔2𝐵 2_{− ⋯ − 𝜔} 𝑠𝐵𝑠) (1 − 𝛿₁𝐵 − 𝛿₂𝐵2_{− ⋯ − 𝛿} 𝑟𝐵𝑟) 𝐼_𝑡−𝑏 (3.47)

The order of the transfer function refers to the levels of b, s and r. The order of b

presents the delay or dead time, which is the time delay between incidences of changes

in input, It, and the apparent impact on response, Yt. The delay time b determines the pause

before the input begins to have an effect on the response variable. The order of the

regression is also represented by the values of s, which denotes the number of lags for

unpatterned spikes in the transfer function. The order of decay is also denoted by the value

of r. This parameter represents the patterned changes in the slope of the function. The

function. The denominator of the transfer function ratio consists of decay weights, r, in

which the time, t, varies from 1 to r. The magnitude of these weights controls the rate of

attenuation in the slope. If there is more than one decay rate, the rate of attenuation may

fluctuate.

There are several fundamental modelling strategies used in intervention analysis. In

this study, the impact of the intervention is modelled on the entire data series before the

residual noise is modelled. The SAS software is used to execute the transfer function-

noise model, which involves determining the orders of the transfer function (b, s, r) and

the noise process. The steps taken for intervention modelling are listed as follows (Box

et al., 1994; Brocklebank & Dickey, 2003 and Montgomery et al., 2008).

Step 1. Prewhiten the It series to attain stationary series, e.g., by some combination of

Box-Cox and differencing transformation and generate the ARMA model. This

step is similar to the step involved in ARIMA modelling.

Step 2. Obtain the estimates of 0 and 1.

Step 2. Determine the orders of the transfer function, b, s and r. Tentatively use the

sample cross-correlations function (CCF) or refer to the basic transfer function

model structures shown in Figure 3.8. Box et al. (1994) have provided the

details on calculating the CCF and the way to determine the orders of b, s and

r based on the CCF. In this study, the CCF diagram from the software and the

basic transfer function model structures are used to determine the orders.

Step 4. Model the noise in order to determine ARMA(p, q).

Step 5. Fit the overall model to attain the representation given in Equation (3.46).

Step 6. Use SBC to select the best model among the competing models.

Step 7. Check the adequacy of the model with the lowest SBC. If it is adequate, the

process using the model with the second lowest SBC and so forth until a model

that fulfils the adequacy check is attained.

Step 8. Use the best model for forecasting.

r, s, b Impulse response Step response

003 013 023 103 113 123

Figure 3.8: Examples of impulse and step response functions Source: Adapted from Box et al. (1994)

r, s, b Impulse response Step response 203 213 223 Figure 3.8, continued 3.13 State-space model

The state-space model is based on the Markov property, which implies the

independence of the future of a process from its past, given the state of the present system.

In a state-space model, the state of the process at a present time contains all of the past

information that is required to predict the process behaviour in the future. According to

Yaffee and McGee (2000), state-space model is a model that consists of stationary

multivariate time series processes with dynamic interactions. This model is constructed

from two basic equations. The state transition equation consists of a state vector of

auxiliary variables as a function of a transition matrix and an input matrix, whereas the

measurement equation consists of a state vector which is canonically extracted from

highlighted that state-space models are an extremely rich class of models for time series,

which are well beyond the linear ARIMA and classical decomposition models.

A state-space model for a multivariate time series (Yt) consists of two equations. The

first equation is given by (Brockwell & Davis, 2002):

𝑌_𝑡 = 𝐺_𝑡 𝑋_𝑡+ 𝑊_𝑡 𝑡 = 1, 2, 3, … (3.48)

This equation is known as the observation equation which expresses the w-

dimensional observation Yt as a linear function of a v-dimensional state variable Xt (the

explanatory variables that have strong relationship to number of fatalities) and noise. {Wt}

∼ WN(0, {Rt}) and {Gt} is a sequence of w × v matrices. The second equation is known

as the state equation or state vector that determines the state Xt+1 at time t + 1 in terms of

the previous state Xt and a noise term. The state vector is given by:

𝑋_𝑡+1 = 𝐹_𝑡 𝑋_𝑡+ 𝑉_𝑡 𝑡 = 1, 2, 3, … (3.49)

where {Ft} is a sequence of v × v matrices. It shall be noted that {Vt} ∼ WN(0, {Qt}),

and {Vt} is uncorrelated with {Wt} (i.e. E(Wt𝑉𝑠′) = 0 for all s and t). It is assumed that the

initial state X1 is uncorrelated with all of the noise terms {Vt} and {Wt}.

The SAS software is used to determine the state vector and the state-space model

representation. This procedure is similar to the modelling strategy proposed by Akaike

(1976), which involves a canonical correlation analysis to identify the state-space model.

Firstly, a sequence of unrestricted vector autoregressive (VAR) models are fitted and the

AIC for each model is computed. The vector autoregressive models are estimated using

model which yields the smallest AIC is chosen as the order (number of lags into the past)

for use in the canonical correlation analysis.

The elements of the state vector are then determined from a sequence of canonical

correlation analyses of the sample autocovariance matrices using the selected order. The

sample canonical correlations of the past are computed with an increasing number of steps

into the future. The variables that yield significant correlations are added to the state

vector whereas those that yield insignificant correlations are excluded. Once the state

vector is determined, the state-space model is fitted to the data. The free parameters in

the F, G, and W matrices are estimated using the approximate maximum likelihood.

In summary, the following steps are taken to develop the state-space model.

Step 1. Identify the independent and explanatory variables.

Step 2. Perform stepwise regression analysis to obtain the explanatory variables that

can be used to represent the model without collinearity problems.

Step 3. Prewhiten the data series to attain stationary series, e.g., by some combination

of Box-Cox and differencing transformation.

Step 4. Execute the STATESPACE procedure in the SAS program to determine the

state vector and the state-space model representations.

Step 5. Select the state-space model representation and preliminary estimates.

Step 6. Estimate the final state-space model representation.

Step 7. Use the final model for forecasting.

In state-space modelling, the data series should be stationary. Checks for stationarity

are conducted using the Augmented Dickey-Fuller (ADF) test. The differencing

3.14 Chapter summary

ARIMA, transfer function-noise and state-space models will be used in this study. The

multiple regression is used to select the explanatory variables that are correlated

significantly with the number of road traffic fatalities. Then, these variables are the input

variables for the state-space model. The effectiveness of a road safety measure is checked

with the ARIMA and transfer function-noise model. Whilst, for forecasting the fatalities

up to year 2020, the ARIMA, transfer function-noise and state-space models are

employed, respectively.

The software packages are used to analyse the data, namely Microsoft Excel (2010),

Minitab (Version 16), Statgraphics and SAS (Version 9). Minitab, Statgraphics and SAS

are leading software packages in statistical analysis. Minitab is convenient for univariate

analysis whereas SAS package is capable of performing numerous statistical operations.

In addition, Eviews (Version 6) is used to crosscheck the unit root test results and is more

likely to produce consistent results.

The forecasting period is chosen to be 8 years, which can be classified as medium-

term forecast. In this study, it is deemed necessary to develop and evaluate several

different models in order to determine the best forecasting model. The forecasting error

serves as the basis for model construction and evaluation. Extrapolation is conducted to

generate data from the observed historical values in order to check the accuracy of models

and determine forecasting errors. Forecasting errors consist of the sum of the absolute

values of errors, average forecast error and the sum of squared errors, which are used to

compare the accuracy of a particular method from the alternative forecasting methods

considered. These error patterns will reveal whether the method is suitable for forecasting,

In order to make a series stationary, either one of the following techniques can be used

such as log transformation, power transformation, or Box Cox transformation techniques.

Box-Cox transformation is the most commonly used variance-stabilizing transformation

technique. When there is high autocorrelation in the data, the model itself changes rather

than the levels, which often eliminates serial correlation. In this study, differencing is

considered when the probability obtained from the ADF test is greater or equal to 0.1.

Further analysis is required to select a satisfactory model. This can be done by computing

the mean square error for each tentative model. A model which generates the smallest

mean square error is generally preferable. In this study, the AIC and SBC are used to

select the best model. The best model is the one which has minimum AIC and SBC. In

most cases, the AIC and SBC will produce the same results. However, since the SBC

criterion imposes a greater penalty for a number of parameters compared to the AIC

In document A time series analysis of road traffic fatalities in Malaysia / Yusria Darma (Page 187-196)