3 Vector Autoregressive and Vector Error Correction
3.2 VARs and VECMs
In this section, we first introduce the basic vector autoregressive and error correction models, neglecting deterministic terms and exogenous variables.
How to account for such terms will be discussed afterwards.
3.2.1 The Models
For a set of $K$ time series variables $y_t = (y_{1t}, \dots, y_{Kt})'$, a VAR model captures their dynamic interactions. The basic model of order $p$ (VAR($p$)) has the form
$$y_t = A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t, \qquad (3.1)$$
where the $A_i$'s are $(K \times K)$ coefficient matrices and $u_t = (u_{1t}, \dots, u_{Kt})'$ is an unobservable error term. It is usually assumed to be a zero-mean independent white noise process with time-invariant, positive definite covariance matrix $E(u_t u_t') = \Sigma_u$. In other words, the $u_t$'s are independent stochastic vectors with $u_t \sim (0, \Sigma_u)$.
The process is stable if
$$\det(I_K - A_1 z - \cdots - A_p z^p) \neq 0 \quad \text{for } |z| \le 1, \qquad (3.2)$$
that is, the polynomial defined by the determinant of the autoregressive operator has no roots in and on the complex unit circle. On the assumption that the process has been initiated in the infinite past ($t = 0, \pm 1, \pm 2, \dots$), it generates stationary time series that have time-invariant means, variances, and covariance structure. If the polynomial in (3.2) has a unit root (i.e., the determinant is zero for $z = 1$), then some or all of the variables are integrated. For convenience we assume for the moment that they are at most I(1). If the variables have a common stochastic trend, it is possible there are linear combinations of them that are I(0). In that case they are cointegrated. In other words, a set of I(1) variables is called cointegrated if a linear combination exists that is I(0). Occasionally it is convenient to consider systems with both I(1) and I(0) variables. Thereby the concept of cointegration is extended by calling any linear combination that is I(0) a cointegration relation, although this terminology is not in the spirit of the original definition because it can happen that a linear combination of I(0) variables is called a cointegration relation.
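The stability condition (3.2) can be checked numerically. A minimal sketch, relying on the standard equivalence that (3.2) holds exactly when all eigenvalues of the companion matrix of the VAR($p$) lie strictly inside the unit circle; the function names and the example coefficient matrices are hypothetical and chosen only for illustration.

```python
import numpy as np

def companion_matrix(coefs):
    """Stack the (K x K) matrices A_1, ..., A_p into the (Kp x Kp) companion matrix."""
    K = coefs[0].shape[0]
    p = len(coefs)
    top = np.hstack(coefs)                       # [A_1 : ... : A_p]
    if p == 1:
        return top
    # Identity block that shifts y_{t-1}, ..., y_{t-p+1} down by one lag
    bottom = np.hstack([np.eye(K * (p - 1)), np.zeros((K * (p - 1), K))])
    return np.vstack([top, bottom])

def is_stable(coefs, tol=1e-10):
    """True if all companion eigenvalues have modulus strictly below one."""
    moduli = np.abs(np.linalg.eigvals(companion_matrix(coefs)))
    return bool(np.all(moduli < 1 - tol))

# Hypothetical bivariate examples (assumed values, not taken from the text)
A_stable = [np.array([[0.5, 0.1], [0.0, 0.3]])]
A_unit_root = [np.array([[1.0, 0.0], [0.0, 0.5]])]      # det(I - A z) = 0 at z = 1
A_var2 = [0.5 * np.eye(2), 0.2 * np.eye(2)]

print(is_stable(A_stable), is_stable(A_unit_root), is_stable(A_var2))
```

The second example has a root at $z = 1$, so it violates (3.2) and corresponds to a system with an integrated variable.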
Although the model (3.1) is general enough to accommodate variables with stochastic trends, it is not the most suitable type of model if interest centers on the cointegration relations because they do not appear explicitly. The VECM form
$$\Delta y_t = \Pi y_{t-1} + \Gamma_1 \Delta y_{t-1} + \cdots + \Gamma_{p-1} \Delta y_{t-p+1} + u_t \qquad (3.3)$$
is a more convenient model setup for cointegration analysis. Here $\Pi = -(I_K - A_1 - \cdots - A_p)$ and $\Gamma_i = -(A_{i+1} + \cdots + A_p)$ for $i = 1, \dots, p-1$.
The VECM is obtained from the levels VAR form (3.1) by subtracting $y_{t-1}$ from both sides and rearranging terms. Because $\Delta y_t$ does not contain stochastic trends by our assumption that all variables can be at most I(1), the term $\Pi y_{t-1}$ is the only one that includes I(1) variables. Hence, $\Pi y_{t-1}$ must also be I(0). Thus, it contains the cointegrating relations. The $\Gamma_j$'s ($j = 1, \dots, p-1$) are often referred to as the short-run or short-term parameters, and $\Pi y_{t-1}$ is sometimes called the long-run or long-term part. The model in (3.3) will be abbreviated as VECM($p-1$). To distinguish the VECM from the VAR model, we sometimes call the latter the levels version. Of course, it is also possible to determine the $A_j$ levels parameter matrices from the coefficients of the VECM.
More precisely, $A_1 = \Gamma_1 + \Pi + I_K$, $A_i = \Gamma_i - \Gamma_{i-1}$ for $i = 2, \dots, p-1$, and $A_p = -\Gamma_{p-1}$.
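The mapping between the levels VAR and the VECM parameters can be sketched in a few lines. This is only an illustration of the formulas above; the function names and the example matrices are assumptions and not part of the text.

```python
import numpy as np

def var_to_vecm(A):
    """A = [A_1, ..., A_p]. Returns Pi and [Gamma_1, ..., Gamma_{p-1}]."""
    K = A[0].shape[0]
    Pi = -(np.eye(K) - sum(A))
    # A[i] holds A_{i+1}, so -sum(A[i:]) equals Gamma_i = -(A_{i+1} + ... + A_p)
    Gammas = [-sum(A[i:]) for i in range(1, len(A))]
    return Pi, Gammas

def vecm_to_var(Pi, Gammas):
    """Inverse mapping: recover [A_1, ..., A_p] from Pi and the Gamma_i."""
    K = Pi.shape[0]
    p = len(Gammas) + 1
    if p == 1:
        return [Pi + np.eye(K)]
    A = [Gammas[0] + Pi + np.eye(K)]                            # A_1
    A += [Gammas[i] - Gammas[i - 1] for i in range(1, p - 1)]   # A_2, ..., A_{p-1}
    A.append(-Gammas[-1])                                       # A_p
    return A

# Hypothetical VAR(3) coefficients, used only to check the round trip
A = [np.array([[0.5, 0.1], [0.2, 0.4]]),
     np.array([[0.2, 0.0], [0.1, 0.1]]),
     np.array([[0.1, 0.0], [0.0, 0.2]])]
Pi, Gammas = var_to_vecm(A)
A_back = vecm_to_var(Pi, Gammas)
```

Converting to the VECM form and back returns the original $A_i$, which is a convenient consistency check when moving between the two parameterizations.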
If the VAR($p$) process has unit roots, that is, $\det(I_K - A_1 z - \cdots - A_p z^p) = 0$ for $z = 1$, the matrix $\Pi = -(I_K - A_1 - \cdots - A_p)$ is singular.
Suppose $\mathrm{rk}(\Pi) = r$. Then $\Pi$ can be written as a product of $(K \times r)$ matrices $\alpha$ and $\beta$ with $\mathrm{rk}(\alpha) = \mathrm{rk}(\beta) = r$ as follows: $\Pi = \alpha\beta'$. Premultiplying an I(0) vector by some matrix results again in an I(0) process. Thus, $\beta' y_{t-1}$ is I(0) because it can be obtained by premultiplying $\Pi y_{t-1} = \alpha\beta' y_{t-1}$ with $(\alpha'\alpha)^{-1}\alpha'$. Hence, $\beta' y_{t-1}$ contains cointegrating relations. It follows that there are $r = \mathrm{rk}(\Pi)$ linearly independent cointegrating relations among the components of $y_t$. The rank of $\Pi$ is therefore referred to as the cointegrating rank of the system, and $\beta$ is a cointegration matrix. For example, if there are three variables with two
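cointegration relations ($r = 2$), we have
$$\Pi y_{t-1} = \alpha\beta' y_{t-1} = \begin{bmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \\ \alpha_{31} & \alpha_{32} \end{bmatrix} \begin{bmatrix} \beta_{11} & \beta_{21} & \beta_{31} \\ \beta_{12} & \beta_{22} & \beta_{32} \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \\ y_{3,t-1} \end{bmatrix}.$$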
The matrix $\alpha$ is sometimes called the loading matrix. It contains the weights attached to the cointegrating relations in the individual equations of the model.
The matrices $\alpha$ and $\beta$ are not unique, and thus there are many possible $\alpha$ and $\beta$ matrices that contain the cointegrating relations or linear transformations of them. In fact, using any nonsingular $(r \times r)$ matrix $B$, we obtain a new loading matrix $\alpha B$ and cointegration matrix $\beta (B')^{-1}$, which satisfy $\Pi = \alpha B(\beta (B')^{-1})'$. Consequently, cointegrating relations with economic content cannot be extracted purely from the observed time series. Some nonsample information is required to identify them uniquely.
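The non-uniqueness of the decomposition can be seen in a small numerical sketch with three variables and cointegrating rank two. All matrices below are hypothetical values chosen for illustration; only the algebra comes from the text.

```python
import numpy as np

# Hypothetical loading and cointegration matrices (K = 3, r = 2)
alpha = np.array([[-0.2, 0.0],
                  [0.1, -0.3],
                  [0.0, 0.1]])
beta = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [-1.0, -0.5]])
Pi = alpha @ beta.T                        # Pi = alpha beta'

# Any nonsingular (r x r) matrix B gives another valid decomposition
B = np.array([[2.0, 1.0],
              [0.0, 1.0]])
alpha_new = alpha @ B                      # alpha B
beta_new = beta @ np.linalg.inv(B.T)       # beta (B')^{-1}
Pi_new = alpha_new @ beta_new.T            # equals alpha B B^{-1} beta' = Pi

# beta' is recovered from Pi by premultiplying with (alpha' alpha)^{-1} alpha'
beta_t_recovered = np.linalg.inv(alpha.T @ alpha) @ alpha.T @ Pi

print(np.linalg.matrix_rank(Pi), np.allclose(Pi, Pi_new))
```

Both pairs reproduce the same $\Pi$, so the data alone cannot distinguish between them.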
The model (3.3) contains several special cases that deserve to be pointed out.
If all variables are I(0), $r = K$ and the process is stationary. If $r = 0$, the term $\Pi y_{t-1}$ disappears in (3.3). In that case, $\Delta y_t$ has a stable VAR representation. In other words, a stable VAR representation exists for the first differences of the variables rather than the levels variables. Clearly, these boundary cases do not represent cointegrated systems in the usual sense of having a common trend. There are also other cases in which no cointegration in the original sense is present, although the model (3.3) has a cointegrating rank strictly between 0 and $K$. Suppose, for instance, that all variables but one are I(0); then, the cointegrating rank is $K - 1$, although the I(1) variable is not cointegrated with the other variables. Similarly, there could be $K - r$ unrelated I(1) variables and $r$ I(0) components. Generally, for each I(0) variable in the system there can be a column in the matrix $\beta$ with a unit in one position and zeros elsewhere.
These cases do not represent a cointegrating relation in the original sense of the term. Still it is convenient to include these cases in the present framework because they can be accommodated easily as far as estimation and inference are concerned. Of course, the special properties of the variables may be important in the interpretation of a system and, hence, a different treatment of the special cases may be necessary in this respect. The VECM in (3.3) also indicates that, for a cointegrating rank $r > 0$, the vector of first differences of the variables, $\Delta y_t$, does not have a finite order VAR representation.
3.2.2 Deterministic Terms
Several extensions of the basic models (3.1) and (3.3) are usually necessary to represent the main characteristics of a data set of interest. From Figure 3.1, it is clear that including deterministic terms, such as an intercept, a linear trend term, or seasonal dummy variables, may be required for a proper representation of the DGP. One way to include deterministic terms is simply to add them to the stochastic part,
$$y_t = \mu_t + x_t. \qquad (3.4)$$
Here $\mu_t$ is the deterministic part, and $x_t$ is a stochastic process that may have a VAR or VECM representation, as in (3.1) or (3.3). In other words, $x_t = A_1 x_{t-1} + \cdots + A_p x_{t-p} + u_t$ or $\Delta x_t = \Pi x_{t-1} + \Gamma_1 \Delta x_{t-1} + \cdots + \Gamma_{p-1} \Delta x_{t-p+1} + u_t$. On the assumption, for instance, that $\mu_t$ is a linear trend term, that is, $\mu_t = \mu_0 + \mu_1 t$, such a model setup implies the following VAR($p$) representation for $y_t$:
$$y_t = \nu_0 + \nu_1 t + A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t. \qquad (3.5)$$
This representation is easily derived by left-multiplying (3.4) with $A(L) = I_K - A_1 L - \cdots - A_p L^p$, where $L$ is the lag operator, as usual. Noting that $A(L) x_t = u_t$ and rearranging terms, we find that $\nu_0 = A(1)\mu_0 + \bigl(\sum_{j=1}^{p} j A_j\bigr)\mu_1$ and $\nu_1 = A(1)\mu_1$. Hence, $\nu_0$ and $\nu_1$ satisfy a set of restrictions implied by the trend parameters $\mu_0$ and $\mu_1$ and the VAR coefficients.
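The step from (3.4) to (3.5) can be spelled out as follows. This is a sketch of the algebra using only the definitions above, with $A(1) = I_K - A_1 - \cdots - A_p$ and $L^j t = t - j$.

```latex
\begin{aligned}
A(L)\,y_t &= A(L)(\mu_0 + \mu_1 t) + A(L)\,x_t \\
          &= \mu_0 + \mu_1 t - \sum_{j=1}^{p} A_j \bigl(\mu_0 + \mu_1 (t - j)\bigr) + u_t \\
          &= \underbrace{A(1)\mu_0 + \Bigl(\sum_{j=1}^{p} j A_j\Bigr)\mu_1}_{\nu_0}
             \;+\; \underbrace{A(1)\mu_1}_{\nu_1}\, t \;+\; u_t .
\end{aligned}
```

Moving the lagged terms $A_1 y_{t-1} + \cdots + A_p y_{t-p}$ to the right-hand side then gives (3.5).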
Alternatively, one may view (3.5) as the basic model without restrictions for $\nu_i$ ($i = 0, 1$). In that case, the model can, in principle, generate quadratic trends if I(1) variables are included, whereas in (3.4), with a deterministic term $\mu_t = \mu_0 + \mu_1 t$, only a linear trend is permitted. In theoretical derivations it is sometimes advantageous that (3.4) partitions the process clearly into a deterministic and a stochastic component. In some instances it is desirable to subtract the deterministic term first because the stochastic part is of primary interest in econometric analyses. Then the analysis can focus on the stochastic part containing the behavioral relations.
Of course, a VECM($p-1$) representation equivalent to (3.5) also exists. It has the form
$$\Delta y_t = \nu_0 + \nu_1 t + \Pi y_{t-1} + \Gamma_1 \Delta y_{t-1} + \cdots + \Gamma_{p-1} \Delta y_{t-p+1} + u_t.$$
We will see in Section 3.4.2 that the restrictions on $\nu_0$ and $\nu_1$ sometimes allow absorption of the deterministic part into the cointegrating relations.
3.2.3 Exogenous Variables
Further generalizations of the model are often desirable in practice. For example, one may wish to include further stochastic variables in addition to the deterministic part. A rather general VECM form that includes all these terms is
$$\Delta y_t = \Pi y_{t-1} + \Gamma_1 \Delta y_{t-1} + \cdots + \Gamma_{p-1} \Delta y_{t-p+1} + C D_t + B z_t + u_t, \qquad (3.6)$$
where the $z_t$'s are “unmodeled” stochastic variables, $D_t$ contains all regressors associated with deterministic terms, and $C$ and $B$ are parameter matrices. The $z_t$'s are considered unmodeled because there are no explanatory equations for them in the system (3.6). For example, if interest centers on a money demand relation, sometimes a single-equation model for $m_t$ is set up and no separate equations are set up for the explanatory variables such as $gnp_t$ and $R_t$.
Including unmodeled stochastic variables may be problematic for inference and analysis purposes unless the variables satisfy exogeneity requirements. Different concepts of exogeneity have been considered in the literature [see Engle, Hendry & Richard (1983)]. A set of variables $z_t$ is said to be weakly exogenous for a parameter vector of interest, for instance $\theta$, if estimating $\theta$ within a conditional model (conditional on $z_t$) does not entail a loss of information relative to estimating the vector in a full model that does not condition on $z_t$. Furthermore, $z_t$ is said to be strongly exogenous if it is weakly exogenous for the parameters of the conditional model and forecasts of $y_t$ can be made conditional on $z_t$ without loss of forecast precision. Finally, $z_t$ is termed super exogenous for $\theta$ if $z_t$ is weakly exogenous for $\theta$ and policy actions that affect the marginal process of $z_t$ do not affect the parameters of the conditional process. Hence, weak, strong, and super exogeneity are the relevant concepts for estimation, forecasting, and policy analysis, respectively [Ericsson, Hendry & Mizon (1998)]. In this chapter the term exogeneity refers to the relevant concept for the respective context if no specific form of exogeneity is mentioned.
All the models we have presented so far do not explicitly include instantaneous relations between the endogenous variables $y_t$. Therefore, they are reduced form models. In practice, it is often desirable to model the contemporaneous relations as well, and therefore it is useful to consider a structural form
$$A \Delta y_t = \Pi^* y_{t-1} + \Gamma_1^* \Delta y_{t-1} + \cdots + \Gamma_{p-1}^* \Delta y_{t-p+1} + C^* D_t + B^* z_t + v_t, \qquad (3.7)$$
where the $\Pi^*$, $\Gamma_j^*$ ($j = 1, \dots, p-1$), $C^*$, and $B^*$ are structural form parameter matrices and $v_t$ is a $(K \times 1)$ structural form error term that is typically a zero mean white noise process with time-invariant covariance matrix $\Sigma_v$. The matrix $A$ contains the instantaneous relations between the left-hand-side variables. It has to be invertible. The reduced form corresponding to the structural model (3.7) is given in (3.6) with $\Gamma_j = A^{-1}\Gamma_j^*$ ($j = 1, \dots, p-1$), $C = A^{-1}C^*$, $\Pi = A^{-1}\Pi^*$, $B = A^{-1}B^*$, and $u_t = A^{-1}v_t$. In this chapter we will primarily focus on reduced form models. Structural form models are discussed in more detail in Chapter 4. Estimation of the model parameters will be considered next.
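The passage from the structural form (3.7) to the reduced form (3.6) amounts to premultiplying by $A^{-1}$. A minimal sketch with hypothetical bivariate matrices follows; the numerical values are assumptions, and the covariance relation $\Sigma_u = A^{-1}\Sigma_v (A^{-1})'$ is simply the implication of $u_t = A^{-1}v_t$.

```python
import numpy as np

# Hypothetical structural-form quantities (K = 2), for illustration only
A = np.array([[1.0, 0.0],
              [-0.5, 1.0]])              # instantaneous relations, must be invertible
Pi_star = np.array([[-0.2, 0.2],
                    [0.1, -0.1]])
Sigma_v = np.diag([1.0, 0.5])            # structural error covariance

A_inv = np.linalg.inv(A)
Pi = A_inv @ Pi_star                     # Pi = A^{-1} Pi*
Sigma_u = A_inv @ Sigma_v @ A_inv.T      # covariance of u_t = A^{-1} v_t

print(Pi)
print(Sigma_u)
```

Even with uncorrelated structural errors, the reduced form errors are contemporaneously correlated here, which is how the instantaneous relations show up in a reduced form model.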
3.3 Estimation
Because estimation of the unrestricted levels VAR representation (3.1) and the VECM (3.3) is particularly easy, these models are considered first. Afterwards, estimation under various restrictions is discussed. In this section we make the simplifying assumption that the lag order and, where used, the cointegrating rank are known. Of course, in practice these quantities also have to be specified from the data. Statistical procedures for doing so will be presented in Section 3.4.
Estimation is discussed first because it is needed in the model specification procedures.
3.3.1 Estimation of an Unrestricted VAR
Given a sample $y_1, \dots, y_T$ and presample values $y_{-p+1}, \dots, y_0$, the $K$ equations of the VAR model (3.1) may be estimated separately by ordinary least squares (OLS). The resulting estimator has the same efficiency as a generalized LS (GLS) estimator, as shown by Zellner (1962). Following Lütkepohl (1991), we use the notation $Y = [y_1, \dots, y_T]$, $A = [A_1 : \cdots : A_p]$, $U = [u_1, \dots, u_T]$ and $Z = [Z_0, \dots, Z_{T-1}]$, where
$$Z_{t-1} = \begin{bmatrix} y_{t-1} \\ \vdots \\ y_{t-p} \end{bmatrix}.$$
Then the model (3.1) can be written as
$$Y = AZ + U \qquad (3.8)$$
and the OLS estimator of $A$ is
$$\hat A = [\hat A_1 : \cdots : \hat A_p] = YZ'(ZZ')^{-1}. \qquad (3.9)$$
Under standard assumptions [see, e.g., Lütkepohl (1991)], the OLS estimator $\hat A$ is consistent and asymptotically normally distributed,
$$\sqrt{T}\,\mathrm{vec}(\hat A - A) \xrightarrow{d} N(0, \Sigma_{\hat A}). \qquad (3.10)$$
Here vec denotes the column stacking operator that stacks the columns of a matrix in a column vector, and $\xrightarrow{d}$ signifies convergence in distribution. A more intuitive notation for the result in (3.10) is
$$\mathrm{vec}(\hat A) \overset{a}{\sim} N(\mathrm{vec}(A), \Sigma_{\hat A}/T),$$
where $\overset{a}{\sim}$ indicates “asymptotically distributed as”. The covariance matrix of the asymptotic distribution is $\Sigma_{\hat A} = \mathrm{plim}(ZZ'/T)^{-1} \otimes \Sigma_u$ and thus an even more intuitive, albeit imprecise, way of writing the result in (3.10) is
$$\mathrm{vec}(\hat A) \approx N(\mathrm{vec}(A), (ZZ')^{-1} \otimes \Sigma_u).$$
For a normally distributed (Gaussian) I(0) process yt, the OLS estimator in (3.9) is identical to the maximum likelihood (ML) estimator conditional on the initial values.
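The estimator (3.9) and the approximate covariance $(ZZ')^{-1} \otimes \Sigma_u$ can be sketched with NumPy as follows. The function names, the simulated data, and the "true" coefficient matrix are hypothetical; the residual covariance is divided by $T$ here purely as an assumption of this sketch.

```python
import numpy as np

def build_var_matrices(y, p):
    """y: (T + p, K) array whose first p rows are presample values.
    Returns Y (K x T) and Z (Kp x T) so that Y = A Z + U."""
    n, K = y.shape
    Y = y[p:].T
    # Column t of Z stacks y_{t-1}, ..., y_{t-p}
    Z = np.vstack([y[p - j:n - j].T for j in range(1, p + 1)])
    return Y, Z

def ols_var(y, p):
    """Multivariate OLS: A_hat = Y Z'(Z Z')^{-1}, plus residuals and standard errors."""
    Y, Z = build_var_matrices(y, p)
    T = Y.shape[1]
    ZZ_inv = np.linalg.inv(Z @ Z.T)
    A_hat = Y @ Z.T @ ZZ_inv
    U_hat = Y - A_hat @ Z
    Sigma_u = U_hat @ U_hat.T / T                  # divisor T is an assumption
    cov_vec = np.kron(ZZ_inv, Sigma_u)             # approx. covariance of vec(A_hat)
    # vec stacks columns, so reshape the diagonal in column-major order
    se = np.sqrt(np.diag(cov_vec)).reshape(A_hat.shape, order="F")
    return A_hat, U_hat, se

# Simulated data from a hypothetical stable bivariate VAR(1)
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.1], [0.0, 0.3]])
T, p = 5000, 1
y = np.zeros((T + p, 2))
for t in range(1, T + p):
    y[t] = A_true @ y[t - 1] + rng.standard_normal(2)

A_hat, U_hat, se = ols_var(y, p)
t_ratios = A_hat / se
```

Because every equation has the same regressors, this single matrix formula gives the same estimates as running OLS on each of the $K$ equations separately.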
The OLS estimator also has the asymptotic distribution in (3.10) for nonstationary systems with integrated variables [see Park & Phillips (1988, 1989), Sims et al. (1990) and Lütkepohl (1991, Chapter 11)]. In that case it is important to note, however, that the covariance matrix $\Sigma_{\hat A}$ is singular, whereas it is nonsingular in the usual I(0) case. In other words, if there are integrated or cointegrated variables, some estimated coefficients or linear combinations of coefficients converge at a faster rate than $T^{1/2}$. Therefore, the usual $t$-, $\chi^2$-, and $F$-tests for inference regarding the VAR parameters may not be valid in this case, as shown, for example, by Toda & Phillips (1993). As an example consider a univariate first-order autoregressive process $y_t = \rho y_{t-1} + u_t$. If $y_t$ is I(1) and, hence, $\rho = 1$, the OLS estimator $\hat\rho$ of $\rho$ has a nonstandard limiting distribution. The quantity $\sqrt{T}(\hat\rho - \rho)$ converges to zero in probability, that is, the limiting distribution has zero variance and is degenerate, whereas $T(\hat\rho - \rho)$ has a nondegenerate nonnormal limiting distribution (see Chapter 2). It is perhaps worth noting, however, that even in VAR models with I(1) variables, there are also many cases where no inference problems occur. As shown by Toda & Yamamoto (1995) and Dolado & Lütkepohl (1996), if all variables are I(1) or I(0) and if a null hypothesis is considered that does not restrict elements of each of the $A_i$'s ($i = 1, \dots, p$), the usual tests have their standard asymptotic properties. For example, if the VAR order $p \ge 2$, the $t$-ratios have their usual asymptotic standard normal distributions because they are suitable statistics for testing that a single coefficient is zero. In other words, they test a null hypothesis constraining one coefficient only in one of the parameter matrices while leaving the other parameter matrices unrestricted.
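The covariance matrix $\Sigma_u$ may be estimated in the usual way. Denoting by $\hat u_t$ the OLS residuals, that is, $\hat u_t = y_t - \hat A Z_{t-1}$, the matrices
$$\hat\Sigma_u = \frac{1}{T - Kp}\sum_{t=1}^{T} \hat u_t \hat u_t' \quad \text{and} \quad \tilde\Sigma_u = \frac{1}{T}\sum_{t=1}^{T} \hat u_t \hat u_t' \qquad (3.11)$$
are possible estimators. Both estimators are consistent and asymptotically normally distributed independently of $\hat A$, that is, $\sqrt{T}(\hat\Sigma_u - \Sigma_u)$ and $\sqrt{T}(\tilde\Sigma_u - \Sigma_u)$ have asymptotic normal distributions if sufficient moment conditions are imposed [see Lütkepohl (1991) and Lütkepohl & Saikkonen (1997)]. These properties are convenient for inference purposes.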
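Two residual covariance estimators that differ only in the divisor can be sketched as follows. The divisors $T - Kp$ (a degrees-of-freedom adjustment) and $T$ (the ML-type version) are an assumption made for this illustration, as are the function name and the toy residual matrix.

```python
import numpy as np

def residual_covariances(U_hat, p):
    """U_hat: (K x T) matrix of OLS residuals from a VAR(p) without deterministic terms.
    Returns the degrees-of-freedom-adjusted and the unadjusted covariance estimates."""
    K, T = U_hat.shape
    S = U_hat @ U_hat.T                   # sum over t of u_t u_t'
    Sigma_adj = S / (T - K * p)           # assumed divisor T - Kp
    Sigma_ml = S / T                      # assumed divisor T
    return Sigma_adj, Sigma_ml

# Toy residuals (hypothetical), K = 2, T = 6, p = 1
U_toy = np.array([[1.0, -1.0, 1.0, -1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0, -1.0, -1.0]])
Sigma_adj, Sigma_ml = residual_covariances(U_toy, p=1)
```

The two versions differ by the factor $T/(T - Kp)$, which becomes negligible as $T$ grows, so both can be consistent at the same time.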
As an example consider a system consisting of the German long-term interest (Rt) and inflation rate (Dpt) plotted in Figure 3.1. Obviously, both series appear to fluctuate around a nonzero mean and, in addition, the inflation rate has a clear seasonal pattern. Therefore, in contrast to the theoretical situation just discussed, it seems useful to include deterministic components in the model.
We just mention here that when they are included, the general estimation strategy remains unchanged. In other words, estimation is still done by OLS for each equation separately if the same deterministic terms are added to each equation.
They can be included by extending the $Z_{t-1}$ vectors in the foregoing formulas straightforwardly. Adding such terms does not affect the general asymptotic properties of the VAR coefficient estimators mentioned earlier. Of course, these properties are in general valid only if the model is specified properly. Hence, deleting the deterministic terms from a system for which they are needed for a proper specification may have an impact on the asymptotic properties of the estimators.
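Using data from 1972Q2–1998Q4 and estimating a model for $(R_t, Dp_t)'$ of order $p = 4$ with constant terms and seasonal dummies gives
[Equation (3.12): estimated VAR(4) system for $(R_t, Dp_t)'$ with t-values in parentheses; the coefficient estimates are not recoverable here.]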
Notice that, owing to the four lagged values on the right-hand side, only data from 1973Q2–1998Q4 are actually used as sample values, and thus the sample size is $T = 103$. The values for 1972Q2–1973Q1 are treated as presample values. In Equation (3.12) t-values are given in parentheses underneath the coefficient estimates. If the series are generated by a stationary process, the ratios actually have asymptotic standard normal distributions; thus, the t-values have the usual interpretation. For example, the coefficient estimates are significant (more precisely: significantly different from zero) at the 5% level if the t-ratios have absolute values greater than 1.96. Using this rule, one finds, for example, that the coefficient of $Dp_{t-1}$ in the first equation is significant, whereas the one in the second equation is not. Generally, there are many insignificant coefficients under this rule. Therefore, model reductions may be possible. On the other hand, two of the t-ratios in the coefficient matrix attached to lag 4 are larger than 2 in absolute value. Consequently, simply reducing the VAR order and thus dropping the larger lags may not be a good strategy here. We will discuss estimation and specification of models with parameter constraints of various forms later on.
In fact, a univariate analysis of the two series reveals that both variables are well described as I(1) variables. The earlier discussion of integrated variables implies that the t-ratios maintain their usual interpretation for the VAR coefficient estimates even in this case because we have estimated a model of order greater than 1. Notice that adding deterministic terms into the model does not affect these results. The t-ratios of the parameters associated with the deterministic part may not be asymptotically standard normal, however. Therefore, the proper interpretation of the t-ratios of the coefficients in the last parameter matrix in (3.12) is not clear. It makes sense, however, that the t-ratios of the seasonal dummy variables in the inflation equation have larger absolute values than the ones in the first equation of the estimated system because $Dp_t$ has a seasonal pattern whereas $R_t$ is free of obvious seasonality. In general, seasonal dummies may be needed in an equation for a nonseasonal variable if some of the right-hand-side variables have a seasonal pattern, as is the case in the present model, where lags of $Dp_t$ also appear in the $R_t$ equation.
The estimated residual correlation matrix $\mathrm{Corr}(u_t)$ is the one corresponding to the estimated covariance matrix $\hat\Sigma_u$, as given in JMulTi. In the present example system, the instantaneous correlation between the two variables is obviously quite small and is not significantly different from zero (at a 5% level) because it is within an interval $\pm 1.96/\sqrt{T} \approx \pm 0.2$ around zero.
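The rule of thumb used here is easy to reproduce. The sketch below converts a covariance matrix into a correlation matrix and computes the band $\pm 1.96/\sqrt{T}$ for $T = 103$; the covariance matrix is a hypothetical stand-in, since the estimated values are not reproduced in this section.

```python
import numpy as np

def corr_from_cov(Sigma):
    """Correlation matrix implied by a covariance matrix."""
    d = np.sqrt(np.diag(Sigma))
    return Sigma / np.outer(d, d)

T = 103                                   # sample size stated in the text
band = 1.96 / np.sqrt(T)                  # approximate 5% band around zero

# Hypothetical residual covariance matrix, for illustration only
Sigma_example = np.array([[4.0, 1.0],
                          [1.0, 1.0]])
corr = corr_from_cov(Sigma_example)
significant = abs(corr[0, 1]) > band

print(round(band, 3), corr[0, 1], significant)
```

For $T = 103$ the band is roughly $\pm 0.19$, which the text rounds to $\pm 0.2$.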
3.3.2 Estimation of VECMs
Reduced rank ML estimation. If the cointegrating rank of the system under consideration is known, working with the VECM form (3.3) is convenient for imposing a corresponding restriction. In deriving estimators for the parameters of (3.3), the following notation is used: $\Delta Y = [\Delta y_1, \dots, \Delta y_T]$, $Y_{-1} = [y_0, \dots, y_{T-1}]$, $U = [u_1, \dots, u_T]$, $\Gamma = [\Gamma_1 : \cdots : \Gamma_{p-1}]$, and $X = [X_0, \dots, X_{T-1}]$ with
$$X_{t-1} = \begin{bmatrix} \Delta y_{t-1} \\ \vdots \\ \Delta y_{t-p+1} \end{bmatrix}.$$
For a sample with $T$ observations and $p$ presample values, the VECM (3.3) can now be written compactly as
$$\Delta Y = \Pi Y_{-1} + \Gamma X + U. \qquad (3.13)$$
Given a specific matrix $\Pi$, the equationwise OLS estimator of $\Gamma$ is easily seen to be
$$\hat\Gamma = (\Delta Y - \Pi Y_{-1}) X'(XX')^{-1}. \qquad (3.14)$$
Substituting in (3.13) and rearranging terms gives
$$\Delta Y M = \Pi Y_{-1} M + \hat U, \qquad (3.15)$$
where $M = I - X'(XX')^{-1}X$. For a given integer $r$, $0 < r < K$, an estimator $\hat\Pi$ of $\Pi$ with $\mathrm{rk}(\hat\Pi) = r$ can be obtained by a method known as canonical correlation analysis [see Anderson (1984)] or, equivalently, a reduced rank (RR) regression based on the model (3.15). Following Johansen (1995a), the estimator may be determined by defining
$$S_{00} = T^{-1} \Delta Y M \Delta Y', \quad S_{01} = T^{-1} \Delta Y M Y_{-1}', \quad S_{11} = T^{-1} Y_{-1} M Y_{-1}'$$
and solving the generalized eigenvalue problem
$$\det(\lambda S_{11} - S_{01}' S_{00}^{-1} S_{01}) = 0. \qquad (3.16)$$
Let the ordered eigenvalues be $\lambda_1 \ge \cdots \ge \lambda_K$ with corresponding matrix of eigenvectors $V = [b_1, \dots, b_K]$ satisfying $\lambda_i S_{11} b_i = S_{01}' S_{00}^{-1} S_{01} b_i$ and normalized such that $V' S_{11} V = I_K$. The reduced-rank estimator of $\Pi = \alpha\beta'$ is then obtained by choosing
$$\hat\beta = [b_1, \dots, b_r]$$
and
$$\hat\alpha = \Delta Y M Y_{-1}' \hat\beta (\hat\beta' Y_{-1} M Y_{-1}' \hat\beta)^{-1}, \qquad (3.17)$$
that is, $\hat\alpha$ may be viewed as the OLS estimator from the model
$$\Delta Y M = \alpha \hat\beta' Y_{-1} M + \tilde U.$$
The corresponding estimator of $\Pi$ is $\hat\Pi = \hat\alpha\hat\beta'$. Using (3.14), we find that a feasible estimator of $\Gamma$ is $\hat\Gamma = (\Delta Y - \hat\Pi Y_{-1}) X'(XX')^{-1}$. Under Gaussian assumptions these estimators are ML estimators conditional on the presample values [Johansen (1988, 1991, 1995a)]. They are consistent and jointly asymptotically normal under general assumptions,
$$\sqrt{T}\,\mathrm{vec}([\hat\Gamma_1 : \cdots : \hat\Gamma_{p-1}] - [\Gamma_1 : \cdots : \Gamma_{p-1}]) \xrightarrow{d} N(0, \Sigma_{\hat\Gamma})$$
and
$$\sqrt{T}\,\mathrm{vec}(\hat\Pi - \Pi) \xrightarrow{d} N(0, \Sigma_{\hat\Pi}).$$
Here the asymptotic distribution of $\hat\Gamma$ is nonsingular; thus, standard inference may be used for the short-term parameters $\Gamma_j$. On the other hand, the $(K^2 \times K^2)$ covariance matrix $\Sigma_{\hat\Pi}$ can be shown to have rank $Kr$ and is therefore singular if $r < K$. This result is due to two factors. On the one hand, imposing the rank constraint in estimating $\Pi$ restricts the parameter space and, on the other hand, $\Pi$ involves the cointegration relations whose estimators have specific asymptotic properties.
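The reduced rank regression steps (3.13)–(3.17) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the simulated cointegrated system, and the choice to solve the generalized eigenvalue problem through a Cholesky factor of $S_{11}$ are all assumptions of this sketch.

```python
import numpy as np

def build_vecm_matrices(y, p):
    """y: (T + p, K) levels data whose first p rows are presample values."""
    n, K = y.shape
    dy = np.diff(y, axis=0)                      # row i holds y[i+1] - y[i]
    dY = dy[p - 1:].T                            # (K x T): Delta y_1, ..., Delta y_T
    Y_lag = y[p - 1:n - 1].T                     # (K x T): y_0, ..., y_{T-1}
    lagged = [dy[p - 1 - j:n - 1 - j].T for j in range(1, p)]
    X = np.vstack(lagged) if lagged else np.zeros((0, n - p))
    return dY, Y_lag, X

def reduced_rank_vecm(y, p, r):
    dY, Y_lag, X = build_vecm_matrices(y, p)
    K, T = dY.shape
    M = np.eye(T)
    if X.shape[0] > 0:                           # M = I - X'(XX')^{-1}X
        M = M - X.T @ np.linalg.solve(X @ X.T, X)
    S00 = dY @ M @ dY.T / T
    S01 = dY @ M @ Y_lag.T / T
    S11 = Y_lag @ M @ Y_lag.T / T
    # Turn lambda*S11*b = S01' S00^{-1} S01 b into a symmetric problem via S11 = L L'
    L_inv = np.linalg.inv(np.linalg.cholesky(S11))
    C = L_inv @ S01.T @ np.linalg.solve(S00, S01) @ L_inv.T
    lam, W = np.linalg.eigh((C + C.T) / 2)       # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], L_inv.T @ W[:, order]   # V satisfies V' S11 V = I_K
    beta_hat = V[:, :r]
    alpha_hat = S01 @ beta_hat @ np.linalg.inv(beta_hat.T @ S11 @ beta_hat)
    Pi_hat = alpha_hat @ beta_hat.T
    if X.shape[0] > 0:
        Gamma_hat = (dY - Pi_hat @ Y_lag) @ X.T @ np.linalg.inv(X @ X.T)
    else:
        Gamma_hat = np.zeros((K, 0))
    return {"eigenvalues": lam, "V": V, "S11": S11, "alpha": alpha_hat,
            "beta": beta_hat, "Pi": Pi_hat, "Gamma": Gamma_hat}

# Hypothetical cointegrated bivariate system: y1 - y2 is stationary, y2 is a random walk
rng = np.random.default_rng(1)
Pi_true = np.array([[-0.5], [0.0]]) @ np.array([[1.0, -1.0]])
T, p = 500, 2
y = np.zeros((T + p, 2))
for t in range(1, T + p):
    y[t] = y[t - 1] + Pi_true @ y[t - 1] + rng.standard_normal(2)

res = reduced_rank_vecm(y, p=2, r=1)
```

The factors in `res["alpha"]` and `res["beta"]` depend on the eigenvector normalization, so only their product `res["Pi"]` should be compared with the matrix used in the simulation.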
In this approach the parameter estimator $\hat\beta$ is made unique by the normalization of the eigenvectors, and $\hat\alpha$ is adjusted accordingly. However, these are not econometric identification restrictions. Therefore, only the cointegration space but not the cointegration parameters are estimated consistently. To estimate the matrices $\alpha$ and $\beta$ consistently, it is necessary to impose identifying (uniqueness) restrictions. Without such restrictions only the product $\alpha\beta' = \Pi$ can be estimated consistently. An example of identifying restrictions that has received some attention in the literature assumes that the first part of $\beta$ is an identity matrix, that is, $\beta' = [I_r : \beta_{(K-r)}']$, where $\beta_{(K-r)}$ is a $((K - r) \times r)$ matrix. For $r = 1$, this restriction amounts to normalizing the coefficient of the