2 Univariate Time Series Analysis
A more extensive discussion of ARCH-type processes is provided in Chapter 5. For the remainder of this chapter, a basic knowledge of these models is sufficient.
2.3.5 Deterministic Terms
So far we have considered purely stochastic processes with zero mean. In practice, such processes are rarely sufficient for an adequate representation of real-life time series. Consider, for instance, the U.S. investment series in Figure 2.1, which may be generated by a stationary process. Its mean is not likely to be zero, however. Consequently, we have to allow at least for a nonzero mean term. For many series, more general deterministic terms may be required. For example, a polynomial trend term or seasonal dummy variables may have to be included.
We will do so simply by adding such deterministic terms to the stochastic part of the process; that is, we assume that the observable process $y_t$ is equal to $\mu_t + x_t$, where $\mu_t$ is a purely deterministic part and $x_t$ is a purely stochastic process.
For example, $x_t$ may be an ARIMA process, whereas $\mu_t = \mu$, $\mu_t = \mu_0 + \mu_1 t$, or $\mu_t = \mu_0 + \mu_1 t + \delta_1 s_{1t} + \cdots + \delta_q s_{qt}$ are examples of deterministic terms.
Here $s_{it}$ represents a seasonal dummy variable that has the value 1 if $t$ refers to the $i$th season and is zero otherwise. The number of seasons is assumed to be $q$.
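The deterministic terms just described can be illustrated with a short Python sketch. It evaluates a constant, a linear trend, and seasonal dummies for given parameter values. The function names and the parameter values are hypothetical, and the sketch assumes that observation t = 1 falls in the first season.

```python
import numpy as np

def seasonal_dummies(T, q):
    """Return a T x q matrix whose (t, i) entry is 1 if period t is in season i."""
    t = np.arange(1, T + 1)
    season = (t - 1) % q  # 0-based season index; assumes t = 1 is season 1
    return (season[:, None] == np.arange(q)[None, :]).astype(float)

def deterministic_term(T, mu0, mu1, delta):
    """Evaluate mu_t = mu0 + mu1 * t + delta_1 * s_1t + ... + delta_q * s_qt."""
    t = np.arange(1, T + 1)
    S = seasonal_dummies(T, len(delta))
    return mu0 + mu1 * t + S @ np.asarray(delta, dtype=float)

# Illustrative (made-up) parameter values for quarterly data, q = 4
S = seasonal_dummies(8, 4)
mu = deterministic_term(T=8, mu0=1.0, mu1=0.5, delta=[0.0, 0.2, -0.1, 0.3])
```

Note that if such a term is estimated by regression, a constant together with all q seasonal dummies is perfectly collinear, so one dummy (or the constant) is usually dropped or a restriction is imposed on the dummy coefficients.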
Although there may be series for which our assumption of an additive relation between the deterministic and stochastic parts of the DGP does not hold, the assumption is often not very restrictive in practice and it is therefore usually supposed to hold in the following chapters.
2.4 Parameter Estimation
2.4.1 Estimation of AR Models
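Estimation of AR processes is particularly easy because it can be done by ordinary least squares (OLS). Therefore, it will be considered first before we comment on the estimation of more complicated models. If the deterministic term is linear in the unknown parameters, it can be included in a straightforward way in the regression model used for estimation. To simplify the presentation, we assume that the deterministic term consists of a constant only, that is, $\mu_t = \mu$ and thus $y_t = \mu + x_t$; hence, $\alpha(L)y_t = \alpha(L)\mu + \alpha(L)x_t = \alpha(1)\mu + u_t$. The estimation equation then becomes
$y_t = \nu + \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p} + u_t$, (2.10)
where $\nu = \alpha(1)\mu = (1 - \alpha_1 - \cdots - \alpha_p)\mu$. If presample values $y_{-p+1}, \ldots, y_0$ are available in addition to the sample values $y_1, \ldots, y_T$, the OLS estimator of $\alpha = (\nu, \alpha_1, \ldots, \alpha_p)'$ is
$\hat\alpha = \left(\sum_{t=1}^{T} Y_{t-1}Y_{t-1}'\right)^{-1} \sum_{t=1}^{T} Y_{t-1}y_t$,
where $Y_{t-1} = (1, y_{t-1}, \ldots, y_{t-p})'$. If the process is stationary and does not have unit roots, then, under standard assumptions [see, e.g., Brockwell & Davis (1987)],
$\sqrt{T}(\hat\alpha - \alpha) \xrightarrow{d} N\left(0, \sigma_u^2 \, \mathrm{plim}\left(T^{-1}\sum_{t=1}^{T} Y_{t-1}Y_{t-1}'\right)^{-1}\right)$
or, written in a more intuitive although less precise way,
$\hat\alpha \approx N\left(\alpha, \sigma_u^2\left(\sum_{t=1}^{T} Y_{t-1}Y_{t-1}'\right)^{-1}\right)$.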
The residual variance may be estimated consistently as $\hat\sigma_u^2 = \frac{1}{T-p-1}\sum_{t=1}^{T} \hat u_t^2$, where the $\hat u_t$ are the OLS residuals.
As an example we have estimated an AR(4) model for the U.S. investment series. The first four observations are set aside as presample values, and consequently we have sample values for 1948Q2 to 1972Q4; hence, $T = 99$. The resulting estimated model is [...] (the accompanying statistics are given in parentheses). Here $\hat\sigma_{\hat\alpha_i}$ denotes an estimator of the standard deviation of $\hat\alpha_i$. In other words, $\hat\sigma_{\hat\alpha_i}$ is the square root of the diagonal element of $\hat\sigma_u^2\left(\sum_{t=1}^{T} Y_{t-1}Y_{t-1}'\right)^{-1}$ that corresponds to $\hat\alpha_i$.
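The OLS estimation of an AR(p) model with a constant, together with the estimated standard deviations of the coefficients, may be sketched in Python as follows. The function name `fit_ar_ols`, the simulated AR(2) series, and its parameter values are illustrative assumptions and are unrelated to the U.S. investment data. The residual variance is computed with a degrees-of-freedom divisor $T - p - 1$, which is also an assumption of this sketch.

```python
import numpy as np

def fit_ar_ols(y, p):
    """OLS fit of y_t = nu + alpha_1 y_{t-1} + ... + alpha_p y_{t-p} + u_t.

    The first p observations are used as presample values.
    Returns (coefficients, standard errors, residual variance).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    T = n - p  # effective sample size
    # Each row is Y_{t-1}' = (1, y_{t-1}, ..., y_{t-p})
    X = np.column_stack([np.ones(T)] + [y[p - j:n - j] for j in range(1, p + 1)])
    z = y[p:]
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ (X.T @ z)
    resid = z - X @ coef
    sigma2 = resid @ resid / (T - p - 1)  # assumed degrees-of-freedom correction
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    return coef, se, sigma2

# Simulated stationary AR(2) process with made-up parameters
rng = np.random.default_rng(0)
n = 5000
u = rng.standard_normal(n)
y_sim = np.zeros(n)
for t in range(2, n):
    y_sim[t] = 1.0 + 0.5 * y_sim[t - 1] - 0.3 * y_sim[t - 2] + u[t]

coef, se, sigma2 = fit_ar_ols(y_sim, p=2)
t_ratios = coef / se  # ratios of estimates to their estimated standard deviations
```

For a long simulated series, the estimates should lie close to the values used in the simulation, and the standard errors shrink roughly at the rate $1/\sqrt{T}$.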
It may be worth noting that OLS estimation of the model (2.10) is equivalent to maximum likelihood (ML) estimation conditional on the initial values if the process is normally distributed (Gaussian). In that case, the estimators have asymptotic optimality properties. Moreover, the results for the AR coefficients also hold if $y_t$ is I(1) and the AR order is greater than one ($p > 1$). In that case, the covariance matrix of the asymptotic distribution is singular, however [see, e.g., Sims, Stock & Watson (1990)]. This fact has, for instance, implications for setting up F-tests for hypotheses regarding the coefficients. Therefore, although the asymptotic theory remains largely intact for unit root processes, it may still be preferable to treat them in a different way, in particular, if inference regarding the unit root is of interest. This issue is discussed in more detail in Section 2.7.
If $y_t$ is known to be I($d$), then it is preferable to set up a stable model for $\Delta^d y_t$.
2.4.2 Estimation of ARMA Models
If a model for the DGP of a time series involves MA or GARCH terms, estimation becomes more difficult because the model is then nonlinear in the parameters. It is still possible to set up the Gaussian likelihood function and use ML or, if the conditional distributions of the observations are not Gaussian (normally distributed), quasi-ML estimation. The joint density of the random variables $y_1, \ldots, y_T$ may be written as a product of conditional densities
$f(y_1, \ldots, y_T) = f(y_1) \cdot f(y_2 \mid y_1) \cdots f(y_T \mid y_{T-1}, \ldots, y_1)$.
Hence, the log-likelihood function for an ARMA($p$, $q$) process $\alpha(L)y_t = m(L)u_t$ has the form
$l(\alpha_1, \ldots, \alpha_p, m_1, \ldots, m_q) = \sum_{t=1}^{T} l_t(\cdot)$, (2.11)
where
$l_t(\cdot) = -\frac{1}{2}\log 2\pi - \frac{1}{2}\log \sigma_u^2 - \left(m(L)^{-1}\alpha(L)y_t\right)^2 / (2\sigma_u^2)$
if the conditional distributions of the $y_t$ are normally distributed. Maximizing the log-likelihood results in ML or quasi-ML estimators in the usual way.
The optimization problem is highly nonlinear and should observe inequality restrictions that ensure a unique, stable ARMA representation. Notice that, for uniqueness, the model must be such that cancellation of parts of the MA term with parts of the AR operator is not possible. Under general conditions, the resulting estimators will then have an asymptotic normal distribution, which may be used for inference.
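A minimal Python sketch of evaluating the Gaussian log-likelihood in (2.11) is given next. It computes $u_t = m(L)^{-1}\alpha(L)y_t$ recursively, using the conventions $\alpha(L) = 1 - \alpha_1 L - \cdots - \alpha_p L^p$ and $m(L) = 1 + m_1 L + \cdots + m_q L^q$. As simplifying assumptions of this sketch, the first $p$ observations are treated as presample values and presample innovations are set to zero, so the likelihood is a conditional one. The function names and the simulated ARMA(1,1) series are hypothetical.

```python
import numpy as np

def arma_residuals(y, alpha, m):
    """Recursively compute u_t = y_t - sum_i alpha_i y_{t-i} - sum_j m_j u_{t-j}.

    Presample innovations are set to zero (an assumption of this sketch);
    the first p observations only serve as starting values.
    """
    y = np.asarray(y, dtype=float)
    p, q = len(alpha), len(m)
    u = np.zeros(len(y))
    for t in range(p, len(y)):
        ar = sum(alpha[i] * y[t - 1 - i] for i in range(p))
        ma = sum(m[j] * u[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        u[t] = y[t] - ar - ma
    return u[p:]

def arma_loglik(y, alpha, m, sigma2):
    """Conditional Gaussian log-likelihood: sum of the terms l_t."""
    u = arma_residuals(y, alpha, m)
    lt = -0.5 * np.log(2 * np.pi) - 0.5 * np.log(sigma2) - u**2 / (2 * sigma2)
    return float(np.sum(lt))

# Simulated ARMA(1,1) series with made-up parameters alpha_1 = 0.6, m_1 = 0.4
rng = np.random.default_rng(1)
n = 500
u_true = rng.standard_normal(n)
y_sim = np.zeros(n)
y_sim[0] = u_true[0]
for t in range(1, n):
    y_sim[t] = 0.6 * y_sim[t - 1] + u_true[t] + 0.4 * u_true[t - 1]

u_hat = arma_residuals(y_sim, [0.6], [0.4])
ll_true = arma_loglik(y_sim, [0.6], [0.4], 1.0)
ll_wrong = arma_loglik(y_sim, [-0.6], [-0.4], 1.0)
```

In practice this function would be passed (with its sign reversed) to a numerical optimizer that respects the stability and invertibility restrictions. The effect of the zero presample innovations dies out quickly when the MA operator is invertible.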
Because iterative algorithms usually have to be used in optimizing the log-likelihood, start-up values for the parameters are required. Different procedures may be used for this purpose. They depend on the model under consideration.
For example, for an ARMA model, one may first fit a pure AR model with long order $h$ by OLS. Denoting the residuals by $\hat u_t(h)$ ($t = 1, \ldots, T$), one may then obtain OLS estimates of the parameters from the regression equation
$y_t = \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p} + u_t + m_1 \hat u_{t-1}(h) + \cdots + m_q \hat u_{t-q}(h)$.
These estimates may be used for starting up an iterative maximization of the log-likelihood function.
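The two-step procedure for obtaining start-up values may be sketched in Python as follows. The function names, the choice $h = 20$, and the simulated ARMA(1,1) series are illustrative assumptions. The sketch assumes a zero-mean process with $p \geq 1$, $q \geq 1$, and $h \geq p$, and it drops the first $h + q$ observations in the second regression so that all lagged residuals are available.

```python
import numpy as np

def lagmat(x, lags, start):
    """Columns x_{t-1}, ..., x_{t-lags} for t = start, ..., len(x) - 1."""
    return np.column_stack([x[start - j:len(x) - j] for j in range(1, lags + 1)])

def ols(X, z):
    """Least-squares coefficients of the regression of z on X."""
    return np.linalg.lstsq(X, z, rcond=None)[0]

def arma_start_values(y, p, q, h):
    """Start-up values for (alpha_1..alpha_p, m_1..m_q) via a long AR(h) fit."""
    y = np.asarray(y, dtype=float)
    assert p >= 1 and q >= 1 and h >= p
    # Step 1: long autoregression of order h, residuals u_hat_t(h)
    Xh = lagmat(y, h, h)
    u_hat = np.full(len(y), np.nan)
    u_hat[h:] = y[h:] - Xh @ ols(Xh, y[h:])
    # Step 2: regress y_t on its own lags and on lagged residuals
    start = h + q  # first t for which u_hat_{t-q}(h) exists
    X = np.column_stack([lagmat(y, p, start), lagmat(u_hat, q, start)])
    coef = ols(X, y[start:])
    return coef[:p], coef[p:]

# Simulated ARMA(1,1) series with made-up parameters alpha_1 = 0.6, m_1 = 0.4
rng = np.random.default_rng(2)
n = 5000
u = rng.standard_normal(n)
y_sim = np.zeros(n)
for t in range(1, n):
    y_sim[t] = 0.6 * y_sim[t - 1] + u[t] + 0.4 * u[t - 1]

alpha_start, m_start = arma_start_values(y_sim, p=1, q=1, h=20)
```

The resulting values are only meant as a starting point for the iterative maximization; they are not guaranteed to satisfy the stability and invertibility restrictions in every sample.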