Autoregressive moving-average (ARMA) pro cesses

Replacing the wn errors of an AR model by an MA term defines the ‘mixed’ ARMA model (autoregressive moving-average), which is an important gen- eralization of the AR and MA classes:

Xt=µ+φ1Xt−1+. . .+φpXt−p+εt+θ1εt−1+. . .+θqεt−q. (23)

The thus defined process is formally denoted as ARMA(p, q). It inherits is properties from its two components.

Property. The ARMA process (23) is asymptotically stationary (stable), if all zeros of the AR polynomial φ(z) = 1₋φ1z−. . . φpzp have modulus

larger than one. It is uniquely defined, if all zeros of the MA polynomial

θ(z) = 1 +θ1z+. . .+θqzq have modulus ≥ 1 and if the polynomials φ(z)

and θ(z) have no common zeros.

Example. The ARMA(1,1) process Xt = φXt−1 +εt+θεt−1 has the

polynomials φ(z) = 1 ₋φz and θ(z) = 1 +θz. For φ+θ = 0 there is a common root at 1/φ. It is easily seen that the process is simply wn in this case. Therefore, neither φ nor θ can be determined empirically. Problems in estimation, particularly failure of convergence in non-linear optimization steps, of ARMA models are often rooted in nearly ‘cancelling’ roots.

The ACF of an ARMA process is not as easily determined as the ACF of an MA process. As an exercise, we recommend doing it for the ARMA(1,1). It follows directly from their definition and construction that neither ACF nor PACF can cut off. Because for stable processes the series of (Wold) MA coefficients must converge quadratically, it is easily seen that both the ACF and the PACF must decrease to zero for large values. This suggests to classify time series as mixed ARMA, if neither the ACF nor the PACF cuts off. In finite samples, however, the distinction becomes often blurred. Furthermore, it is quite difficult to determine the correct orderspandqfrom the correlogram for the mixed ARMA model.

For the identification and estimation of ARMA models, various sugges- tions are available. According to Box&Jenkins, one may try the following pattern.

Example. The data were actually generated by the ARMA process

Xt = 0.5Xt−1 +εt+ 0.8εt−1. Estimation yields the correlogram and partial

correlogram in Figures 17 and 18.

Figure 17: Correlogram of a time series with 100 observations that was generated from the process Xt = 0.5Xt−1+εt+ 0.8εt−1.

Visual inspection of the plots may suggest an AR(2) or maybe an AR(3) model, as the third partial autocorrelation is close to the significance bound- ary. The ACF decays relatively fast without cutting off sharply, such that an MA model is not plausible. Thus, one may now estimate the two AR model specifications and inspect the correlograms of their residuals. Doing so, we note that the residuals of an AR(1) model still show significant ACF and PACF values, while these are marginally significant for AR(2) and insignificant for AR(3). Additionally, all 3 coefficients of an AR(3) model are well established on the basis of their t–statistics. Thus, AR(3) is an acceptable model for the data.

Next, we may try ARMA(3, q) models, with q_≤3. Indeed estimation of ARMA(3,1) yields a highly significant MA term and insignificant coefficients

φ2andφ3. When these are eliminated and an ARMA(1,1) model is estimate,

we obtain ˆθ1 = 0.85 and ˆφ1 = 0.54, which are good approximations to the—

Figure 18: Empirical PACF of a time series with 100 observations that was generated form the process Xt= 0.5Xt−1+εt+ 0.8εt−1.

R2_{using one parameter less than AR(3) and is to be preferred for that reason,}

although both models give an acceptable description of the data.

It is obvious that this trial-and-error–procedure is cumbersome. There- fore, many current time-series analysts prefer more mechanical techniques. For example, one may estimate a sizeable set of different ARMA models for

p= 0, . . . , P andq= 0, . . . , Q. Some criterion is evaluated for each estimated model, and the accordingly best model is then used. Some researchers recommend to inspect the two or three best models more closely and to subject them to specification tests. The most customary criteria are the information criteria AIC (by Akaike) and BIC (by Schwarz). They are defined by

AIC = log ˆσ2(p, q) + 2

T (p+q), BIC = log ˆσ2(p, q) + logT

T (p+q),

or alternatively via the likelihood. ˆσ2₍_{p, q}_{) is the error variance estimated}

from an ARMA(p, q) model. If we increase the lag ordersp and q, the first term ˆσ2 _{will decrease due to a closer fit, while the second term grows, a}

a good model. AIC tends to select slightly larger orders than BIC. Whereas BIC yields the true pandq for T _{→ ∞} under certain conditions (it is a con- sistent selection criterion), AIC will over-estimate lag orders asymptotically. Some maintain that AIC has better properties in intermediate samples, while some authors suggest modifications in very small samples (B&D support the AICC, a modified AIC).

The textbook by Hamilton ignores these criteria and recommends se- quences of statistical hypothesis tests, for example a t–test for the coefficient

φp in an AR(p) model. If the null hypothesis φp = 0 is accepted, we prefer

the AR(p₋1) model. Because of the danger of common zeros in the ARMA model, this procedure can become unreliable in some cases.

In document Applied Time Series Analysis Part I (Page 34-37)