Empirical (partial) autocorrelation functions

In practice, for a given economic or business time series $y_t$, the autocorrelation and partial autocorrelation functions have to be estimated. The $k$-th order autocorrelation can be estimated by means of

$$\hat{\rho}_k = \frac{\hat{\gamma}_k}{\hat{\gamma}_0}, \qquad (3.83)$$

where $\hat{\gamma}_k$ is an estimate of the $k$-th order autocovariance, that is,

$$\hat{\gamma}_k = \frac{1}{T} \sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y}), \qquad (3.84)$$

where $\bar{y}$ denotes the sample mean of $y_t$, $t = 1, 2, 3, \ldots, T$. The $\hat{\rho}_k$ for $k = 0, 1, 2, \ldots$ form the empirical ACF [EACF]. As an illustration, consider $\hat{\rho}_k$ for $k = 1, \ldots, 20$ for annual differences of log monthly US industrial production over the period 1959–2012, as shown in Figure 3.7. It is clear from this graph that the EACF values die out quite quickly.
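Below is a minimal sketch of (3.83) and (3.84). It assumes the observations are held in a plain Python list of floats, and the function `eacf` is a hypothetical name used only for illustration. The function computes $\hat{\rho}_0, \ldots, \hat{\rho}_K$. Following (3.84), every autocovariance is divided by $T$, not by the number of terms in the sum.

```python
def eacf(y, max_lag):
    """Empirical ACF rho_hat_k = gamma_hat_k / gamma_hat_0 for k = 0..max_lag, per (3.83)-(3.84)."""
    T = len(y)
    y_bar = sum(y) / T
    dev = [v - y_bar for v in y]

    def gamma_hat(k):
        # (1/T) * sum_{t=k+1}^{T} (y_t - y_bar)(y_{t-k} - y_bar); 0-based t runs k..T-1
        return sum(dev[t] * dev[t - k] for t in range(k, T)) / T

    g0 = gamma_hat(0)
    return [gamma_hat(k) / g0 for k in range(max_lag + 1)]


print(eacf([1.0, 2.0, 3.0, 4.0, 5.0], 2))  # → [1.0, 0.4, -0.1]
```

Because the divisor stays at $T$ for every lag, the estimated autocorrelations are shrunk towards zero at long lags.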

The sample equivalents of $\psi_k$, which form the empirical partial ACF [EPACF], can be obtained by applying ordinary least squares [OLS] to

$$y_t - \bar{y} = \eta_1 (y_{t-1} - \bar{y}) + \cdots + \eta_{k-1} (y_{t-k+1} - \bar{y}) + \psi_k (y_{t-k} - \bar{y}) + v_t, \qquad (3.85)$$

for any value of $k$, where $v_t$ is not necessarily a white noise time series. Notice that (3.85) only yields an estimate $\hat{\psi}_k$ of the $k$-th order partial autocorrelation. To obtain the complete EPACF, (3.85) should be estimated for $k = 1, 2, 3, \ldots$.

Figure 3.7: Empirical autocorrelation function of annual differences of the log monthly US industrial production (not seasonally adjusted), 1959–2012.
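The regression in (3.85) can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions and is not the book's code:

- the names `epacf` and `solve` are hypothetical;
- the normal equations are solved with a small Gaussian elimination routine so that no external library is needed;
- each order $k$ uses the observations $t = k+1, \ldots, T$ and has no intercept, because the series is already demeaned.

```python
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def epacf(y, max_lag):
    """psi_hat_k for k = 1..max_lag: the last OLS coefficient in regression (3.85)."""
    T = len(y)
    y_bar = sum(y) / T
    d = [v - y_bar for v in y]
    psi = []
    for k in range(1, max_lag + 1):
        rows = range(k, T)  # 0-based indices of t = k+1, ..., T
        # Regressors (y_{t-1} - y_bar), ..., (y_{t-k} - y_bar); no intercept, as in (3.85)
        X = [[d[t - j] for j in range(1, k + 1)] for t in rows]
        z = [d[t] for t in rows]
        XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
        Xtz = [sum(x[i] * z_t for x, z_t in zip(X, z)) for i in range(k)]
        psi.append(solve(XtX, Xtz)[-1])  # coefficient on (y_{t-k} - y_bar)
    return psi


# Usage: a simulated AR(1) with coefficient 0.7 (illustrative data, not from the text)
random.seed(1)
y = [0.0]
for _ in range(2000):
    y.append(0.7 * y[-1] + random.gauss(0.0, 1.0))
print([round(p, 2) for p in epacf(y, 4)])
```

For this simulated AR(1), $\hat{\psi}_1$ should lie close to 0.7 and the higher-order values close to zero. The partial autocorrelations of an AR(1) vanish beyond lag 1, so this is the pattern to expect.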

In principle, we may consider the estimated t-statistics for $\psi_k$ in (3.85) to establish the significance of the EPACF values. For the EACF values we should consider the distribution of the $\hat{\rho}_k$. This distribution can be shown to depend on the underlying true model; see Box and Jenkins (1970), among others. In practice, we usually approximate the distributions of $\hat{\rho}_k$ and $\hat{\psi}_k$ by setting their asymptotic standard errors equal to $1/\sqrt{T}$.

Details of how good this approximation is can be found in, for example, Granger and Newbold (1986). In this book, we follow the usual approach and say that $\rho_k$ and $\psi_k$ are significant at the 5 percent level if the intervals $(\hat{\rho}_k - 2/\sqrt{T},\ \hat{\rho}_k + 2/\sqrt{T})$ and $(\hat{\psi}_k - 2/\sqrt{T},\ \hat{\psi}_k + 2/\sqrt{T})$, respectively, do not include zero.
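The 5 percent rule above can be applied mechanically. An interval $(\hat{\rho}_k - 2/\sqrt{T}, \hat{\rho}_k + 2/\sqrt{T})$ excludes zero exactly when $|\hat{\rho}_k| > 2/\sqrt{T}$. The sketch below uses that equivalence. The helper name `significant` is hypothetical. The inputs are the full-sample EACF values reported in Table 3.1, with $T = 156$.

```python
import math


def significant(estimates, T):
    """Return the bound 2/sqrt(T) and, per lag, whether estimate +/- bound excludes zero."""
    bound = 2.0 / math.sqrt(T)
    return bound, [abs(e) > bound for e in estimates]


# Full-sample EACF values for lags 1-12, copied from Table 3.1 (T = 156)
eacf_full = [0.803, 0.598, 0.429, 0.301, 0.269, 0.266,
             0.272, 0.298, 0.242, 0.152, 0.057, -0.069]
bound, flags = significant(eacf_full, 156)
print(round(bound, 3))                                 # → 0.16
print([k for k, f in enumerate(flags, start=1) if f])  # → [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With $T = 156$ the bound is $2/\sqrt{156} \approx 0.160$. The flagged lags 1 to 9 match the asterisks in the full-sample EACF column of Table 3.1.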

As an illustration of the EACF and EPACF, consider their first 12 values, given in Table 3.1, for the annual differences of log monthly revenue-passenger kilometres of European airlines over the period 1994.1–2006.12. The E(P)ACFs are computed twice: once using all 156 observations in this period, and once omitting the observations 2001.9–2001.12.

Table 3.1: Empirical (partial) autocorrelation functions for annual differences of log monthly revenue-passenger kilometres of European airlines, 1994.1–2006.12

      All observations         Without 2001.9–2001.12
Lag   EACF       EPACF        EACF       EPACF
1     0.803∗     0.803∗       0.713∗     0.713∗
2     0.598∗     −0.131       0.478∗     −0.064
3     0.429∗     −0.023       0.299∗     −0.038
4     0.301∗     −0.012       0.090      −0.189∗
5     0.269∗     0.178∗       −0.001     0.061
6     0.266∗     0.038        −0.048     −0.013
7     0.272∗     0.043        −0.045     0.059
8     0.298∗     0.093        0.047      0.136
9     0.242∗     −0.156       0.078      −0.043
10    0.152      −0.073       0.148      0.118
11    0.057      −0.067       0.187∗     −0.001
12    −0.069     −0.162       0.109      −0.121

Note: An asterisk indicates significance at the 5% level. The approximate 5% significance bound (twice the estimated standard error) is 0.160 for the full sample, and 0.163 for the sample without the observations 2001.9–2001.12.

The first obvious feature of the EACF in Table 3.1 is that these four aberrant data points have a large effect on the EACF. In fact, $\hat{\rho}_1$ is 0.803 for the complete sample, while it is 0.713 for the sample without the last four months of 2001. Furthermore, the EACF declines much faster towards zero for the interrupted sample. For the full sample, the empirical autocorrelations stay significant until lag 9. Hence it is difficult to suggest an AR-type model, as there is no exponential decrease. In contrast, the EACF and EPACF patterns of the interrupted sample seem to suggest the possible adequacy of an AR(1).

In practice, we usually do not go through all the models that the EACF and EPACF indicate as possibly useful. In fact, the key issues are often:

- (i) whether the EACF values die out sufficiently quickly, where sufficiency is not a formal concept but merely a rule based on experience;
- (ii) whether the EACF signals overdifferencing;
- (iii) whether the EACF and EPACF show any significant and easily interpretable peaks at certain lags, preferably at short horizons.

The main reason for this less formal approach is as follows. Each variant of an ARMA model implies certain properties of the ACF and PACF. Because these functions have to be estimated, however, a given set of EACF and EPACF values may suggest a wealth of possibly useful models. Hence, we usually select a set of seemingly reasonable tentative models (that is, we pick values for p and/or q). We then estimate their parameters and apply diagnostic checks to see whether the models capture the dynamics of the time series sufficiently well. If so, we may employ additional criteria to select a final model, as discussed in Section 3.4.
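How overdifferencing shows up in the EACF is not detailed here. One well-known signature, taken as the assumption behind this sketch, is a lag-1 autocorrelation near $-0.5$. Differencing a white noise series $\varepsilon_t$ gives $\varepsilon_t - \varepsilon_{t-1}$. Its population autocovariances are $\gamma_0 = 2\sigma^2$ and $\gamma_1 = -\sigma^2$, so $\rho_1 = -0.5$. The simulated data and the helper `acf` are illustrative only.

```python
import random


def acf(y, max_lag):
    """Empirical ACF as in (3.83)-(3.84): divisor T at every lag."""
    T = len(y)
    m = sum(y) / T
    d = [v - m for v in y]
    g0 = sum(v * v for v in d) / T
    return [sum(d[t] * d[t - k] for t in range(k, T)) / T / g0
            for k in range(max_lag + 1)]


random.seed(2)
eps = [random.gauss(0.0, 1.0) for _ in range(5000)]       # stationary white noise
over = [eps[t] - eps[t - 1] for t in range(1, len(eps))]  # differenced although already stationary
r = acf(over, 3)
print([round(v, 2) for v in r[1:]])  # lag 1 should be close to -0.5, lags 2-3 close to 0
```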

3.3 Estimation and diagnostic measures

A useful specification strategy for ARMA time series models proceeds in four steps:

1. Inspect the EACF and EPACF values to see which are significant, so that reasonably simple ARMA model structures can be hypothesized.
2. Estimate the parameters of the various models.
3. Investigate whether the estimated residuals can be viewed approximately as white noise.
4. Modify the models where the residual checks suggest it.

This strategy amounts to a subtle interplay between identification, estimation and modification, and practical experience is needed to acquire some skill. Usually, it does not make much sense to start off with a very large ARMA model and to simplify it by deleting insignificant parameters. The reason is that we are likely to encounter a situation in which parts of the AR and MA components cancel out. Intuitively, suppose the data are generated from an AR(1) model, but we estimate the parameters in the ARMA(2,1) model

$$(1 - \alpha_1 L)(1 - \alpha_2 L)\, y_t = (1 + \theta_1 L)\, \varepsilon_t. \qquad (3.86)$$

Then the true parameter values are $\alpha_2 = \phi_1$, $\alpha_1 = 0$, and $\theta_1 = 0$ (where $\alpha_1$ and $\alpha_2$ could be interchanged, of course). However, (3.86) holds for any values of $\alpha_1$ and $\theta_1$ with $\alpha_1 = -\theta_1$. In that case the common factor $(1 - \alpha_1 L)$ cancels from both sides, and the model reduces to the true AR(1) specification. Hence, we may expect estimation problems for the parameters, and also problems with the distribution of the t-test statistics for $\alpha_1$ and $\theta_1$. Of course, when we only consider AR models, we may start off with an AR($p^*$) with $p^*$ large, and work downwards to an AR($p$), where $p$ is smaller than the initial $p^*$. Notice that this constitutes an additional advantage of AR models over ARMA models.
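The cancellation argument for (3.86) can be written out step by step. The derivation below assumes $|\alpha_1| < 1$, so that the common factor $(1 - \alpha_1 L)$ can be divided out of both sides:

```latex
\begin{aligned}
(1-\alpha_1 L)(1-\alpha_2 L)\,y_t &= (1+\theta_1 L)\,\varepsilon_t
  && \text{ARMA(2,1), as in (3.86)} \\
(1-\alpha_1 L)(1-\alpha_2 L)\,y_t &= (1-\alpha_1 L)\,\varepsilon_t
  && \text{impose } \theta_1 = -\alpha_1 \\
(1-\alpha_2 L)\,y_t &= \varepsilon_t
  && \text{cancel } (1-\alpha_1 L),\ |\alpha_1| < 1 \\
y_t &= \phi_1 y_{t-1} + \varepsilon_t
  && \text{with } \alpha_2 = \phi_1
\end{aligned}
```

Every pair $(\alpha_1, \theta_1 = -\alpha_1)$ therefore gives the same model for $y_t$. This is why the parameters in the overfitted ARMA(2,1) are not separately identified.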

Once the parameters have been estimated, the residuals $\hat{\varepsilon}_t$ are usually inspected for any remaining autocorrelation. Again, this differs from regression models based on cross-sectional data: for ARMA models, the results of the diagnostic checks can provide clear-cut suggestions for modifying the model.
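The sketch below illustrates such a residual check informally. It is not one of the formal tests discussed in this section. It fits an AR(1) by OLS to data simulated from an AR(2) and then applies the $2/\sqrt{T}$ rule to the EACF of the residuals. The AR(2) coefficients, the seed, and the helper names `fit_ar1` and `flagged_lags` are assumptions made for illustration.

```python
import math
import random


def acf(y, max_lag):
    """Empirical ACF as in (3.83)-(3.84)."""
    T = len(y)
    m = sum(y) / T
    d = [v - m for v in y]
    g0 = sum(v * v for v in d) / T
    return [sum(d[t] * d[t - k] for t in range(k, T)) / T / g0
            for k in range(max_lag + 1)]


def fit_ar1(y):
    """OLS of (y_t - y_bar) on (y_{t-1} - y_bar); returns phi_hat and the residuals."""
    m = sum(y) / len(y)
    d = [v - m for v in y]
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(d[t - 1] ** 2 for t in range(1, len(d)))
    phi_hat = num / den
    resid = [d[t] - phi_hat * d[t - 1] for t in range(1, len(d))]
    return phi_hat, resid


def flagged_lags(resid, max_lag):
    """Lags k >= 1 at which the residual EACF lies outside +/- 2/sqrt(T)."""
    bound = 2.0 / math.sqrt(len(resid))
    r = acf(resid, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]


# Illustrative data: an AR(2) with coefficients 0.5 and 0.3 (assumed for this example)
random.seed(3)
y = [0.0, 0.0]
for _ in range(1000):
    y.append(0.5 * y[-1] + 0.3 * y[-2] + random.gauss(0.0, 1.0))

phi_hat, resid = fit_ar1(y)  # deliberately misspecified AR(1)
print(round(phi_hat, 2), flagged_lags(resid, 6))
```

The AR(1) omits the second lag. The residual autocorrelations at lags 1 and 2 are therefore expected to fall outside the band (their population values are roughly $-0.21$ and $0.19$). Adding a second AR lag is the natural modification that this diagnostic suggests.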

In this section, we discuss methods for estimating the unknown parameters in AR and ARMA models. Other estimation routines can be found in the advanced literature on ARMA models; see Hamilton (1994), among others. Furthermore, we consider two often applied tests for correlation in the $\hat{\varepsilon}_t$ time series. Usually several other diagnostic measures are applied to check the adequacy of the estimated model. These include tests for aberrant observations, heteroskedasticity, and non-linearity, which are discussed in later chapters.

In document Rwd11.Time.series.models.for.Business.and.Economic.forecasting.2.Edition (Page 69-73)