5.2
Non-linearity Testing
5.2.1
Why Multivariate Tests?
Many routinely applied non-linearity tests are the so called neglected non-linearity tests. It means that non-linearity is inspected from residuals obtained from a linear filter. The most often applied filter in the time series literature is an ARMA model. Although this type of models may be perfectly reasonable for exogenous stochastic processes (e.g. sunspots), it might be questionable for economic time series, which are dependent (co-integrated/correlated) in nature. The problem is that applying linear ARMA models to individual components of a multivariate time series process is subject to misspecification (due to omitting some explanatory variables). This fact can lead to size and/or power distortions of the standard non-linearity tests discussed in Chapter 2, which in turn, may lead to misleading inference. In order to make this point clear, we consider the following two examples.
A size distortion: Let us consider a bivariate stationary VAR(1) model xt =
ξ1xt−1+at, where xt= (X1t, X2t)0and at = (a1t, a2t)0is a vector of model innovations
such that ait ∼ IID(0, σi2), for i ∈ {1, 2}, and innovations are independent each
other, regardless of the time index. In addition to that, let us consider a simple, yet very general, definition of (conditional) linearity used in Lee et al. (1993). According to their definition, the process xt is called linear in the conditional mean if and only
if
P(E(xt|xt−1) = Υ0+ Υ1xt−1) = 1.
The alternative hypothesis is that xt is not conditionally linear
P(E(xt|xt−1) = Υ0+ Υ1xt−1) < 1.
Obviously, the system is linear since Υ0 = 0 and Υ1 = ξ1 in our case. Now suppose
that non-linearity is not tested for the whole vector xt but all variables are treated
individually using an univariate version of the definition above. For example, let us consider the variable X1t and suppose that an appropriate filter is, only for simplic-
5.2 Non-linearity Testing 134
E(X1t|xt−1), it also holds that E(X1t|X1t−1) = ξ11X1t−1 + ξ12E(X2t−1|X1t−1) =
(ξ11 + ξ12ρ)X1t−1. Therefore, the incorrect conditioning does not have to lead to
a serious size distortion under the null hypothesis of linearity.3
A power distortion: Now let us consider a specific non-linear VAR(1) model xt = g1(xt−1) + at, where xt = (X1t, X2t)0 and at = (a1t, a2t)0 is a vector of model
innovations such that ait ∼ IID(0, σi2), for i ∈ {1, 2}, and innovations are indepen-
dent each other, regardless of the time index. The functional form of the model is given by
X1t = ξ12X2t−12 + a1t,
X2t = ξ22X2t−1+ a2t.
Obviously, the system is non-linear since E(xt|xt−1) 6= Υ0+Υ1xt−1in our case. Now
suppose that non-linearity is not tested for the whole vector xt but all variables are
treated individually using an univariate version of the definition above. Only for simplicity of exposition, suppose that an appropriate filter is an AR(1) process for both variables. It is clear that E(X2t|X2t−1) = ξ22X2t−1 and the linearity condition
is satisfied. It holds that E(X1t|X1t−1) = ξ12σ22 + ξ222 X1t−1− ξ222 a1t−1 and the lin-
earity condition is not formally satisfied in this case. However, the condition fails only due to some IID error term. As a results, we might expect a significant power loss of standard univariate non-linearity tests. This example shows how easily the probability of type II error may arise due to incorrect filtration (conditioning) under the alternative hypothesis.
Before we proceed to a formal testing procedure, we state an assumption about a stochastic process under consideration. The assumption is of the crucial importance for setting the null hypothesis of linearity.
3However, it is worth noting that a size distortion can be expected to increase especially in large-dimensional time series models. The author is very grateful to Professor Timo Ter¨asvirta from the Aarhus University for making this point.
5.2 Non-linearity Testing 135
Assumption 5 Let us assume the following stationary real-valued finite-order linear VAR model under the null hypothesis
xt= ξ0+ P
X
i=1
ξixt−i+ at, (5.1)
where xtdenotes a (k×1) vector, {at : t ∈ Z} is a sequence of multivariate WN(0,Σ)
innovations with zero means and the variance-covariance matrix Σ, which is sym- metric and positive definite, such that E(katk8) < ∞. Let β = (ξ00, vec(ξ1)0, . . . , vec(ξp)0)0
be a (k2P + k × 1) parameter vector, which is assumed to lie in the interior of the parameter space given by
B = {β ∈ Rk2P +k : det(I − P X i=1 ξizi) 6= 0 for all |z| ≤ 1}.
The assumption ensures that a given linear process is stationary, parameters do not lie on the boundary, and all moment conditions are satisfied. These conditions are sufficient to ensure consistency of the estimated parameters in β, the estimated residuals, and subsequently, the non-linearity test statistics. Note that the null hy- pothesis of linearity can be extended to VARMA models as well.4
Although there are many different non-linearity tests in the literature, special at- tention is paid to a multivariate version of the ARCH test proposed by Engle (1982) and the TSAY test proposed by Tsay (1986). There are three good reasons for considering this couple of tests: (i) It is shown in Chapter 2 that both test statistics capture rather different types of non-linear features. The TSAY test is a simple test for non-linearity in the conditional mean, whereas the ARCH test for non-linearity in the conditional variance; (ii) Both tests suffer from a dimensionality problem in a similar way; (iii) Both tests are very well known, they are easy to construct and follow standard limiting distributions.
4Recall that identifying and estimating VARMA models is computationally expensive. For this reason, only a VAR model is considered under the null hypothesis.
5.2 Non-linearity Testing 136
5.2.2
Multivariate TSAY Test
Harvill and Ray (1999) proposed a multivariate version of the TSAY test. The test is based on running the following auxiliary equation
ˆ
at= b0+ B1zt+ B2vt+ ut, (5.2)
where ˆat is a (k × 1) vector of residuals from a particular model under the null
hypothesis (e.g. a VAR model), zt = (y0t−1, . . . , y 0 t−P)
0 denotes a (kP × 1) vector of
aggregated predetermined variables, vt = vech(zt⊗ z0t) represents an (s × 1) final
vector of predetermined variables consisting of all square and cross-product elements (s = kP (kP +1)/2). b0denotes a (k×1) vector of constants, B1represents a (k×kP )
matrix of parameters, and finally, B2 is a (k × s) matrix of parameters. The null
hypothesis of linearity of the vector xt is given by: H0 : B2 = 0 versus H1 : B2 6= 0.
The null hypothesis can be tested both by the LM- and LR-based test statistics.5
The appropriate LR-type test statistic is given by6
M T SAY (org) = (T − τ )(log(| ˆΣr|) − log(| ˆΣu|)) d
−→ χ2(sk), (5.3)
where | · | denotes the determinant of a square matrix, ˆΣr and ˆΣu represent the
estimated variance-covariance matrix of the restricted model, unrestricted model respectively, T is the sample size, and τ = (k + s + 1)/2 is a small sample correction term recommended by Anderson (2003, p. 321-3), where k is the number of vari- ables, and s represents the number of additional variables. He argues that the small sample correction works well, provided that k2 + s2 < T /3. Note that the model
in (5.2) can be easily estimated by a multivariate LS method, see L¨utkepohl (2005, Ch. 3) for details. The proof of the limiting distribution can be found in Anderson (2003, Ch. 8.5).
5Godfrey (1988, Chapter 2) shows that for a linear regression model such as (5.2), the LR-based test is slightly more powerful as compared to the LM-based test. In addition, both the LR- and LM-based tests are computed in the same way (using the auxiliary equation) in this particular case and have the same limiting distribution, see Davidson and MacKinnon (1999, p. 423–428). The LM-based test for multivariate systems is discussed in Deschamps (1993). Monte Carlo comparison of both LM- and LR-based tests for multivariate systems can be found in Deschamps (1996).
5.2 Non-linearity Testing 137
It is worth mentioning, however, that testing for non-linearity using the above de- fined MTSAY(org) test statistic displays at least two shortcomings: (i) Due to a large number of terms in the vector vt, the original test requires a large number of
observations; (ii) In addition, terms in the vector vt are highly collinear, which sig-
nificantly increases the degrees of freedom of the test, but actually does not improve the fit in (5.2). As a result, multicollinearity in the vector vt can reduce the power
of the multivariate tests. It is clear from the dimension of the vt vector that, for a
given set of k variables, the test requires T > s observations. For example, consider a small model consisting of k = 5 different economic variables and the moderate lag order P = 4 of a VAR model under the null. Then, the MTSAY(org) test requires T > 210 observations but the degrees of freedom surge to 1050. Table 5.1 depicts some other examples of the degrees of freedom and the number of observations re- quired by the original TSAY(org) test.
Table 5.1 Requirements of the original multivariate TSAY test
number of observations degrees of freedom
variables k/lags P 1 2 3 4 5 1 2 3 4 5
2 3 10 21 36 55 6 20 42 72 110
3 6 21 45 78 120 18 63 135 234 360
4 10 36 78 136 210 40 144 312 544 840
5 15 55 120 210 325 75 275 600 1050 1625
A dimensionality problem of the MTSAY test can be efficiently diminished using a principal component analysis (PCA). Details about PCA are discussed in the next section. The modified multivariate TSAY test is based on running the following auxiliary equation
ˆ
at= c0+ C1zt+ C2wt+ ut, (5.4)
where ˆatis a (k ×1) vector of residuals from a particular filter, wtis an (n×1) vector
of principal components, such that k ≤ n ≤ s.7, c
0 is a (k × 1) vector of constants, 7Note that the upper bound of the number of components might be restricted to s = [T /2], where [·] denotes an integer part.
5.2 Non-linearity Testing 138
C1 and C2 are (k × kP ) and (k × n) coefficient matrices. The null hypothesis of
linearity of the vector xt, is given by: H0 : C2 = 0 versus H1 : C2 6= 0. The
appropriate LR-based test statistic is given by
T SAY = (T − τ )(log(| ˆΣr|) − log(| ˆΣu|)) d
−→ χ2(nk). (5.5)
Although principal component analysis can reduce a dimensionality problem, in practice, there is still an ultimate question of how many components to retain. De- tails about a principal component analysis and various stopping rules for determining the number of principal components are discussed in Section 5.2.4.
5.2.3
Multivariate ARCH Test
L¨utkepohl (2005, Ch. 16) considers a multivariate version of the ARCH test. The test is based on running the following auxiliary equation
vech(ˆat⊗ ˆa0t) = b0+ Bvt+ ut, (5.6)
where vech(·) is a half-stacking operator, ˆatis a (k ×1) vector of residuals from a par-
ticular VAR model under the null hypothesis, vt = (vech(ˆat−1⊗ˆa0t−1)0, . . . , vech(ˆat−Q⊗
ˆ
a0t−Q)0)0 denotes an (s × 1) vector of the predetermined variables of the test, b0 is an
(m×1) vector of constants, and B is an (m×s) matrix of parameters (m = k(k+1)/2 and s = mQ) matrix of parameters. The null hypothesis of homoscedasticity of the vector at is given by: H0 : B = 0 versus H1 : B 6= 0. The null hypothesis can
be tested both by the LM- and LR-based test statistics in the multivariate setup. In order to be consistent with the multivariate TSAY test discussed earlier, the LR-based test is implemented. The appropriate LR-type test statistic is given by8
M ARCH(org) = (T − τ )(log(| ˆΣr|) − log(| ˆΣu|)) d
−→ χ2(sm), (5.7)
where | · | denotes the determinant of a square matrix, ˆΣr and ˆΣu represent the
estimated variance-covariance matrix of the restricted model, unrestricted model 8Note that the LR-based multivariate ARCH test is considered in other studies as well, see Hacker and Hatemi-J (2005), among others. The LM version of the test can be found in L¨utkepohl (2005, Ch. 16).
5.2 Non-linearity Testing 139
respectively, T is the sample size, and τ = (m + s + 1)/2 is a small sample correc- tion recommended by Anderson (2003, p. 321-3), where T stands for the sample size, k is the number of variables, and n represents the number of principal compo- nents. He argues that the small sample correction works very well, provided that m2+ s2 < T /3. Note that the model in (5.6) can be easily estimated by a multivari- ate LS method, see L¨utkepohl (2005, Ch. 3) for details. The proof of the limiting distribution can be found in Anderson (2003, Ch. 8.5).
It is worth mentioning, however, that testing the conditional variance using the above defined MARCH(org) test statistic displays similar shortcomings like the MT- SAY test. The main problem does not lie in the number of observations directly required for running the test but in the rapidly increasing degrees of freedom of the test. It is clear from the dimension of the vt vector that, for a given set of k vari-
ables, the test requires T > s observations but sm degrees of freedom. For example, consider a small model consisting of k = 5 different variables and the moderate lag order Q = 4 of the MARCH test. As a result, the test requires T > 60 observations but the degrees of freedom surge to 900. Table 5.2 depicts some other examples of the degrees of freedom and number of observations required by the original MARCH test.
Table 5.2 Requirements of the original multivariate ARCH test
number of observations degrees of freedom
variables k/lags Q 1 2 3 4 5 1 2 3 4 5
2 3 6 9 12 15 9 18 27 36 45
3 6 12 18 24 30 36 72 108 144 180
4 10 20 30 40 50 100 200 300 400 500
5 15 30 45 60 75 225 450 675 900 1125
Another problem is related to the specification of the alternative hypothesis H1. Lee
and King (1993) show that the alternative hypothesis should be configured as one- sided hypothesis (i.e. H1 : B > 0) in order to reflect the fact that the conditional
5.2 Non-linearity Testing 140
proposed a locally most mean powerful test for (G)ARCH models. Nevertheless, their results indicate that, although the size and power properties of the locally most mean powerful test are better as compared to standard LM-based ARCH test, the standard LM-based ARCH test can still be considered as a conservative test with good size and power properties. Since a multivariate extension of their results is not known, at least to the best of our knowledge, the standard two-sided form of the alternative hypothesis is used in this paper. We leave this interesting issue to further research.
Two modifications of the MARCH test are introduced. First, note that the original test is based on running the auxiliary equation for vech(ˆat ⊗ ˆa
0
t), which contains
many cross products. For example, in a bivariate case (i.e. k = 2), vech(ˆat⊗ ˆa0t) =
(ˆa21t, ˆa1taˆ2t, ˆa21t)0. The cross-elements might be important for the modelling purposes
but not necessarily for testing heteroscedasticity itself. The main argument is that provided diag(ˆat⊗ ˆa
0
t) is homoscedastic, then the cross elements are very likely to be
homoscedastic as well. This useful property holds for many multivariate (G)ARCH models (e.g. VEC, BEKK, CCC-GARCH models), but not all (e.g. a DCC-GARCH model). This diagonal modification immediately reduces the required degrees of freedom from Qm2 to Qkm. Using the same example as above, the required degrees of freedom would decrease from 900 to 300. Second, a principal component analysis is implemented in order to further diminish the dimensionality problem. The modified multivariate ARCH test is based on running the following auxiliary equation
diag(ˆat⊗ ˆa 0
t) = c0+ C1wt+ ut, (5.8)
where diag(ˆat⊗ ˆa0t) is a (k × 1) vector of diagonal elements, c0 is a (k × 1) vector of
constants, C1 is an appropriate (k × n) matrix of coefficients, and wt is an (n × 1)
vector of principal components, such that k ≤ n ≤ s.9 The null hypothesis of
homoscedasticity of the vector at, is given by: H0 : C1 = 0 versus H1 : C1 6= 0. The 9Note that the upper bound of the number of components might be restricted to s = [T /2], where [·] denotes an integer part.
5.2 Non-linearity Testing 141
appropriate LR-based test statistic is given by
M ARCH = (T − τ )(log(| ˆΣr|) − log(| ˆΣu|)) d
−→ χ2(nk). (5.9)
Although principal component analysis can reduce the dimensionality problem, in practice, there is still an ultimate question of how many components to retain. De- tails about a principal component analysis and various stopping rules for determining the number of principal components are discussed in Section 5.2.4.
5.2.4
Principal Component Analysis
A principal component analysis (PCA) is concerned with explaining the variance- covariance or correlation structure of a set of variables by a few linear combinations of original variables. Formally, the principal components are defined as follows
wjt = e0jvt, for j = 1, . . . , s, t = 1, . . . , T, (5.10)
where wjt is the jth-principal component, ej is a particular eigenvector associated
with the eigenvalue λj estimated from the variance-covariance or correlation matrix.
It is important to point out that there is no one-to-one mapping between the roots calculated from the variance-covariance matrix and correlation matrix. A problem is that, unlike the correlation matrix, the variance-covariance matrix is not scale invariant and, hence, neither the calculated roots. Therefore, comfortable or not, the use of the correlation matrix is often recommended, especially for heterogenous data sets and/or indicators originally measured in different units, see Jackson (1991, 64–65) for details. For this reason, the correlation matrix is used in this chapter unless otherwise stated. For instance, a vector of additional variables vt takes the
following form for the MTSAY test based on a VAR(P ) filter vt = vech(zt⊗ z0t),
where zt = (y0t−1, . . . , y 0 t−p)
0 is a vector of predetermined variables. Note that the
vector vt for the MARCH test is defined in a similar way, see the previous section.
A characteristic feature of PCA is that components are uncorrelated linear combi- nations of original variables due to orthogonality of the estimated eigenvectors (i.e. e0iej = 0 for all i 6= j). The advantage of this approach is that principal components
5.2 Non-linearity Testing 142
basically eliminate multicollinearity from a testing procedure.10 Another advantage of this approach is that, at least for testing purposes, no interpretation of the calcu- lated principal components is required, which significantly simplifies the use of PCA. Another interesting property is that var(wj) = e0jvar(vt)ej = λj, for j = 1, . . . , s,
which immediately implies that
s X j=1 var(wj) = s X j=1 var(vj) = s X j=1 λj. (5.11)
It is clear from (5.11) that s components are required to reproduce the total variance of the original data set. In practice, however, most of the variance can be accounted for by just a small number of the first components, say n < s.
Although PCA can reduce a dimensionality problem, there is still an ultimate ques- tion of how many components to retain in practice. Unfortunately, there is no definitive answer to this question. Stopping rules can be theoretically split into four basic categories: (i) purely statistical rules (e.g. a Bartlett test); (ii) graph- ical rules (e.g. a scree plot); (iii) rule-of-thumb stopping rules; and finally (iv) simulation/bootstrap-based rules. The interested reader is referred to Peres-Neto et al. (2005) for details.
We omit all statistical rules since they might be problematic in the context of a time series analysis. For example, let us consider the Bartlett test of the equivalence of the last s − n roots calculated from the variance-covariance matrix. There are two shortcomings of this test. The main disadvantage is that the limiting distribution