Capturing Cross-Sectional Correlation with Time Series: with an Application to Unit Root Test

Chor-yiu SIN

Wang Yanan Institute of Studies in Economics (WISE)

Xiamen University, Xiamen, Fujian, P.R.China, 361005

Email: [email protected]

January 2, 2007

Abstract

Traditional panel data analysis assumes cross-sectional uncorrelatedness. This is plausible when the cross-sectional units are households. Recently, however, panel data analysis has been applied to cross-country data, notably in the study of growth empirics and gravity models, in which cross-sectional dependence and unit roots are common. Existing asymptotic theory assumes either that (i) T (the number of time-series units) goes to infinity while N (the number of cross-sectional units) is fixed; or that (ii) the cross-sectional units are well-ordered or well-indexed with an "economic distance". In this paper, we assume that N goes to infinity while T is fixed (as long as T is greater than q, where q is the number of restrictions under the null), which is more plausible in many panel data sets. Moreover, no prior knowledge of the ordering or the indexation is assumed. Using the central limit theorem for stationary mixing random variables, we first show the √N-consistency of an OLS estimator. We then construct a simple robust testing procedure that is insensitive to many possible cross-sectional correlations. The asymptotic critical values of this test can be simulated by the Monte Carlo method. A number of Monte Carlo experiments suggest that our test has reasonable size in finite samples, even when N and T are as small as 50 and 2 respectively. It has non-trivial power when T is greater than or equal to 10.

Key Words: Correlation-insensitive test; Cross-sectional correlation; Economic distance; Robust testing; Stationary mixing random fields; Unit root.

JEL Classification: C12; C21; C23

Acknowledgments: SIN thanks the participants at the Workshop on Sequential Analysis, Time Series and Related Topics held in Academia Sinica on December 27-28, 2004, and those at the Departmental Seminar of National Taipei University on January 29, 2007, for their comments. The usual disclaimers apply.


1 Introduction

Throughout the paper, we consider the following linear regression model:

$$y_{it} = x_{it}'\beta + u_{it}, \tag{1.1}$$

where $i = 1, \ldots, N$ and $t = 1, \ldots, T$, $T \geq 2$, $x_{it}$ is a $k \times 1$ vector, while both $y_{it}$ and $u_{it}$ are scalars.

By now there is a huge literature covering cases in which the time-series dimension $T$ goes to infinity. In this paper, we focus on cases in which the cross-section dimension $N$ goes to infinity. Given this assumption, it turns out that our analysis is much easier when we assume that $T$ is fixed. We maintain these two assumptions in the remainder of the paper.

One major difficulty in making inference on the parameter $\beta$ is that, as far as (asymptotic) distributions are concerned, it is hard to model and estimate the cross-sectional correlations. More precisely, for one estimator or another, one may need to model and estimate, for $t = 1, \ldots, T$, the following $N(N-1)/2$ terms:

$$E[x_{it} u_{it} u_{jt} x_{jt}'], \tag{1.2}$$

where $i < j$ and $i, j = 1, \ldots, N$.

In a purely time-series context, the time-series correlations can be modelled using the natural ordering (that is, time) of the series. In a purely cross-section context, Conley (1999) models the cross-sectional correlations using a metric of economic distance. Because the use of economic distance may be controversial, a large number of recent papers capture the cross-sectional correlations, in one way or another, by further assuming that $T$ also goes to infinity. See, for instance, Bai (2003). While the models suggested in these papers are applicable to many data sets, it is unclear whether they perform well when $T$ is rather small.

In this paper, following the lines in Conley (1999), we first invoke some moment conditions and some mixing conditions to justify the asymptotic normality of our estimators. For making statistical inference on $\beta$, instead of using a metric of economic distance, we capture the cross-sectional correlation with the time-series units. In principle, $T$ can be as small as 2. This contrasts sharply with the existing literature, in which $T$ is assumed to go to infinity.

After constructing a generic Wald test with an OLS estimator in Section 2, we show in Section 3 that a unit root test and a cointegration test can be cast as special cases of a Wald test. Some generalizations and extensions can be found in Section 4. Unlike the conventional Wald test, our test statistic is not distributed as $\chi^2$. The critical values need to be simulated by the Monte Carlo method. Some values for certain special cases are tabulated in Section 5. We close this paper with some Monte Carlo experiments in Section 6 and conclusions and discussions in Section 7.

2 The OLS Estimator and the Wald Test

Recall the linear regression model:

$$y_{it} = x_{it}'\beta + u_{it}, \tag{2.1}$$

where $i = 1, \ldots, N$ and $t = 1, \ldots, T$, $T \geq 2$, $x_{it}$ is a $k \times 1$ vector, while both $y_{it}$ and $u_{it}$ are scalars.

Assumption (a). $N \to \infty$ and $T$ is fixed. □

In an extension of this paper, we will let $T \to \infty$ at an appropriate rate relative to $N$. For the sake of exposition, we first consider the case in which the time period is divided into two parts. The first part (with $T_1$ observations) is used to estimate the parameter $\beta$ (the estimator is denoted $\hat\beta$), while the second part (with $T_2$ observations) is used to estimate the "variance-covariance" matrix of $\hat\beta$. Note that $T = T_1 + T_2$. As one can see in the subsequent discussion, it is possible for the estimation period and the testing period to overlap, though we still require that $T \geq 2$.

Assumption (b). For $t = 1, \ldots, T$,

$$N^{-1/2} \sum_{i=1}^{N} x_{it} u_{it} \xrightarrow{L} \Gamma W_t^k, \tag{2.2}$$

where $\Gamma$ is a positive definite matrix and $W_t^k$ is a $k$-dimensional standard normal random vector. □

Though not necessary, at some point we may assume that the $W_t^k$'s are independent across $t$.

In a purely cross-sectional context, Conley (1999) shows that Assumption (b) holds in an even more general setting (see the proof of Proposition 2 there). Conley (1999) applies the CLT (central limit theorem) for stationary mixing random fields suggested in Bolthausen (1982).

In an extension to, say, a fluctuation test, we will modify Assumption (b) as:

Assumption (b*). For $t = 1, \ldots, T$,

$$N^{-1/2} \sum_{i=1}^{[rN]} x_{it} u_{it} \xrightarrow{L} \Gamma W_t^k(r), \quad \forall r \in [0, 1].$$ □

Assumption (c). For $t = 1, \ldots, T$,

$$N^{-1} \sum_{i=1}^{N} x_{it} x_{it}' \to M \quad \text{a.s.}, \tag{2.3}$$

where $M$ is a $k \times k$ invertible constant matrix. □

Note that the homogeneity in Assumption (c) can be relaxed a little, at the expense of a more complicated estimation of the "variance-covariance" matrix.

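Given Assumption (c), for $N$ sufficiently large, $\left(\sum_{s=1}^{T_1} \sum_{i=1}^{N} x_{is} x_{is}'\right)^{-1}$ exists. We may consider the usual OLS estimator for $\beta$ in this context (see, for instance, Hsiao, 2003):

$$\hat\beta = \left(\sum_{s=1}^{T_1} \sum_{i=1}^{N} x_{is} x_{is}'\right)^{-1} \left(\sum_{s=1}^{T_1} \sum_{i=1}^{N} x_{is} y_{is}\right). \tag{2.4}$$

It is not difficult to see that:

$$\left(N^{-1} \sum_{s=1}^{T_1} \sum_{i=1}^{N} x_{is} x_{is}'\right) \sqrt{N}(\hat\beta - \beta) = \sum_{s=1}^{T_1} N^{-1/2} \sum_{i=1}^{N} x_{is} u_{is} \xrightarrow{L} \Gamma \sum_{s=1}^{T_1} W_s^k.$$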

Thus we have the following theorem for the limiting distribution of $\hat\beta$.

Theorem 2.1. Suppose Assumptions (a)-(c) hold. Then

$$\sqrt{N}(\hat\beta - \beta) \xrightarrow{L} M^{-1} \Gamma \left(\frac{1}{T_1} \sum_{s=1}^{T_1} W_s^k\right). \tag{2.5}$$ □
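As an illustration, the pooled OLS estimator in (2.4) can be sketched in a few lines of Python. The function name `pooled_ols` and the array layout (a regressor array of shape (N, T, k) and a response array of shape (N, T)) are assumptions made for this sketch only; they are not part of the paper.

```python
import numpy as np

def pooled_ols(x, y, T1):
    """Pooled OLS over the first T1 periods, following (2.4).

    x : array of shape (N, T, k), regressors x_it (hypothetical layout)
    y : array of shape (N, T), dependent variable y_it
    T1: number of periods used for estimation
    """
    k = x.shape[2]
    xs = x[:, :T1, :].reshape(-1, k)   # stack the N*T1 observations row-wise
    ys = y[:, :T1].reshape(-1)         # same (i, s) ordering as xs
    # Solve (sum x x') beta = (sum x y) rather than forming the inverse.
    return np.linalg.solve(xs.T @ xs, xs.T @ ys)
```

Solving the normal equations with `np.linalg.solve` avoids an explicit matrix inverse; with noise-free data the sketch recovers the true coefficient vector up to rounding.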

The difficult part in (2.5) is the estimation of $\Gamma$. In a time series context, there are numerous methods of estimation. See, for instance, de Jong and Davidson (2000) and the references therein. Those references essentially assume some kind of mixing conditions in an ordered time series. In a cross-sectional context, Conley (1999) models the spatial correlation with a metric of economic distance, while Bai (2003) (and the references therein) assumes that the time-series dimension ($T$ in our context) goes to $\infty$. In this paper, we do not model the economic distance on the one hand, and we allow $T$ to be fixed on the other hand.

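First note that, given the OLS estimator $\hat\beta$, we may define the residual in a straightforward way:

$$\hat u_{it} = y_{it} - x_{it}'\hat\beta,$$

where $t = 1, \ldots, T$. For $t = T_1 + 1, \ldots, T_1 + T_2$, consider the following term:

$$\begin{aligned}
x_{it} \hat u_{it} &= x_{it}(y_{it} - x_{it}'\hat\beta) \\
&= x_{it}(u_{it} + x_{it}'\beta - x_{it}'\hat\beta) \\
&= x_{it} u_{it} - x_{it} x_{it}'(\hat\beta - \beta) \\
&= x_{it} u_{it} - x_{it} x_{it}' \left(\sum_{s=1}^{T_1} \sum_{j=1}^{N} x_{js} x_{js}'\right)^{-1} \left(\sum_{s=1}^{T_1} \sum_{j=1}^{N} x_{js} u_{js}\right).
\end{aligned}$$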

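Therefore, in view of Assumptions (b)-(c),

$$\begin{aligned}
N^{-1/2} \sum_{i=1}^{N} x_{it} \hat u_{it}
&= N^{-1/2} \sum_{i=1}^{N} x_{it} u_{it} - \left(N^{-1} \sum_{i=1}^{N} x_{it} x_{it}'\right) \left(\sum_{s=1}^{T_1} N^{-1} \sum_{i=1}^{N} x_{is} x_{is}'\right)^{-1} \left(N^{-1/2} \sum_{s=1}^{T_1} \sum_{i=1}^{N} x_{is} u_{is}\right) \\
&= N^{-1/2} \sum_{i=1}^{N} x_{it} u_{it} - \frac{1}{T_1} \left(N^{-1/2} \sum_{s=1}^{T_1} \sum_{i=1}^{N} x_{is} u_{is}\right) + o_p(1) \\
&= \left(N^{-1/2} \sum_{i=1}^{N} x_{it} u_{it} - \frac{1}{T_1} \sum_{s=1}^{T_1} N^{-1/2} \sum_{i=1}^{N} x_{is} u_{is}\right) + o_p(1) \\
&\xrightarrow{L} \Gamma \left(W_t^k - \frac{1}{T_1} \sum_{s=1}^{T_1} W_s^k\right).
\end{aligned} \tag{2.6}$$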

By (2.5)-(2.6), we are ready to construct a Wald test statistic for the null hypothesis $H_0: \beta = \beta_0$. We need an additional assumption:

Assumption (d). $\sum_{t=T_1+1}^{T} \left(W_t^k - \frac{1}{T_1} \sum_{s=1}^{T_1} W_s^k\right) \left(W_t^k - \frac{1}{T_1} \sum_{s=1}^{T_1} W_s^k\right)'$ is positive definite a.s. □

Assumption (d) is non-trivial. Consider the simple case in which $T_1 = T_2 = 1$. If $W_1^k = W_2^k$ a.s., the term $\left(W_t^k - \frac{1}{T_1} \sum_{s=1}^{T_1} W_s^k\right)$ is identically zero a.s. and Assumption (d) is not satisfied. In fact, Assumption (d) tells us that our method does not apply to a single time point. We require that $T \geq 2$.

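Assumption (d) assures that, a.s., $\left[\sum_{t=T_1+1}^{T} \left(N^{-1/2} \sum_{i=1}^{N} x_{it} \hat u_{it}\right) \left(N^{-1/2} \sum_{i=1}^{N} x_{it} \hat u_{it}\right)'\right]^{-1}$ exists for $N$ sufficiently large. Thus we may consider the following Wald test statistic:

$$\hat W = \sqrt{N}(\hat\beta - \beta_0)' \hat V^{-1} \sqrt{N}(\hat\beta - \beta_0), \tag{2.7}$$

where

$$\hat V^{-1} = \left(N^{-1} \sum_{s=1}^{T_1} \sum_{i=1}^{N} x_{is} x_{is}'\right) \left[\sum_{t=T_1+1}^{T} \left(N^{-1/2} \sum_{i=1}^{N} x_{it} \hat u_{it}\right) \left(N^{-1/2} \sum_{i=1}^{N} x_{it} \hat u_{it}\right)'\right]^{-1} \left(N^{-1} \sum_{s=1}^{T_1} \sum_{i=1}^{N} x_{is} x_{is}'\right).$$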
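A minimal Python sketch of the Wald statistic in (2.7) follows, for the case in which the first $T_1$ periods are used for estimation and the remaining periods for the "variance-covariance" term. The function name `wald_statistic` and the (N, T, k) array layout are assumptions for illustration. In finite samples the middle matrix is a sum of $T_2$ outer products, so the sketch needs $T_2 \geq k$ for it to be invertible.

```python
import numpy as np

def wald_statistic(x, y, T1, beta0):
    """Wald statistic (2.7) with estimation on periods 1..T1 and
    the remaining T2 = T - T1 periods used for the middle matrix.

    x : (N, T, k) regressors, y : (N, T) responses (hypothetical layout)
    beta0 : (k,) hypothesised value of beta under H0
    """
    N, T, k = x.shape
    xs = x[:, :T1, :].reshape(-1, k)
    ys = y[:, :T1].reshape(-1)
    sxx = xs.T @ xs
    beta_hat = np.linalg.solve(sxx, xs.T @ ys)      # OLS estimator (2.4)

    resid = y - x @ beta_hat                        # residuals for all t, shape (N, T)
    # g[t] = N^{-1/2} * sum_i x_it * uhat_it for t in the second part, shape (T2, k)
    g = np.einsum('itk,it->tk', x[:, T1:, :], resid[:, T1:]) / np.sqrt(N)
    middle = g.T @ g                                # sum over t of g_t g_t'

    a = sxx / N                                     # N^{-1} sum_s sum_i x_is x_is'
    d = np.sqrt(N) * (beta_hat - beta0)
    v_inv = a @ np.linalg.solve(middle, a)          # a * middle^{-1} * a
    return float(d @ v_inv @ d)
```

The returned value would be compared with a simulated critical value of the limiting distribution, not with a chi-square quantile.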

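Now we are able to state the major theorem in this section:

Theorem 2.2. Suppose Assumptions (a)-(d) hold. Then

$$\hat W \xrightarrow{L} \left(\sum_{s=1}^{T_1} W_s^k\right)' \left[\sum_{t=T_1+1}^{T} \left(W_t^k - \frac{1}{T_1} \sum_{s=1}^{T_1} W_s^k\right) \left(W_t^k - \frac{1}{T_1} \sum_{s=1}^{T_1} W_s^k\right)'\right]^{-1} \left(\sum_{s=1}^{T_1} W_s^k\right). \tag{2.8}$$ □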
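One way to see why the limit in Theorem 2.2 is free of the nuisance parameters $\Gamma$ and $M$ is sketched below. The shorthand $S$ and $D$ is introduced here for illustration only. The sketch assumes that the convergence in Assumption (b) holds jointly across $t$, and it uses the symmetry of $M$, which holds because $M$ is the limit of $N^{-1} \sum_i x_{it} x_{it}'$.

```latex
\begin{align*}
S &\equiv \sum_{s=1}^{T_1} W_s^k, \qquad
D \equiv \sum_{t=T_1+1}^{T}\Big(W_t^k-\tfrac{1}{T_1}S\Big)\Big(W_t^k-\tfrac{1}{T_1}S\Big)',\\
\sqrt{N}(\hat\beta-\beta_0) &\xrightarrow{L} \tfrac{1}{T_1}M^{-1}\Gamma S
  \quad\text{(by (2.5), under } H_0\text{)},\\
\hat V^{-1} &\xrightarrow{L} (T_1 M)\,(\Gamma D\Gamma')^{-1}\,(T_1 M)
  \quad\text{(by Assumption (c) and (2.6))},\\
\hat W &\xrightarrow{L} \tfrac{1}{T_1}\,S'\Gamma' M^{-1}\,(T_1 M)\,\Gamma'^{-1}D^{-1}\Gamma^{-1}\,(T_1 M)\,M^{-1}\Gamma S\,\tfrac{1}{T_1}
  = S' D^{-1} S .
\end{align*}
```

The factors of $T_1$, $M$ and $\Gamma$ cancel in pairs, leaving a quadratic form in standard normal vectors only.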

It should be noted that, despite the fact that a CLT is assumed for $N^{-1/2} \sum_{i=1}^{N} x_{it} u_{it}$ (see Assumption (b)), unlike the conventional Wald test, the limiting distribution is not $\chi^2_k$ even when $T_1 = 1$. This is because, unlike the usual estimator for the variance-covariance matrix of $\sqrt{N}(\hat\beta - \beta)$, the $\hat V$ in (2.7) does not converge in probability to a constant matrix. Instead, as with a term in Abadir and Paruolo (1997), it converges in distribution to:

$$M^{-1} \Gamma \left[\frac{1}{T_1^2} \sum_{t=T_1+1}^{T} \left(W_t^k - \frac{1}{T_1} \sum_{s=1}^{T_1} W_s^k\right) \left(W_t^k - \frac{1}{T_1} \sum_{s=1}^{T_1} W_s^k\right)'\right] \Gamma' M^{-1\prime}. \tag{2.9}$$

As a result, the critical values of the limiting distribution in Theorem 2.2 need to be simulated. Some critical values for certain special cases will be given in Section 5, after we discuss a general version of OLS and an extension to IV.

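Consider a general null hypothesis $H_0: R\beta = r_0$, where $R$ is a $q \times k$ matrix with full row rank $q$ and $r_0$ is a $q \times 1$ vector, $q \leq k$. It is easy to see from (2.5) that under the null:

$$\sqrt{N}(R\hat\beta - r_0) \xrightarrow{L} R M^{-1} \Gamma \left(\frac{1}{T_1} \sum_{s=1}^{T_1} W_s^k\right).$$

$R M^{-1} \Gamma W_s^k$ is normally distributed with mean zero and variance $R M^{-1} \Gamma \Gamma' M^{-1\prime} R'$. Note that there exists a $q \times q$ positive semi-definite matrix $\Lambda$ such that $\Lambda \Lambda' = R M^{-1} \Gamma \Gamma' M^{-1\prime} R'$. Abusing the notation, we can write:

$$\sqrt{N}(R\hat\beta - r_0) \xrightarrow{L} \Lambda \left(\frac{1}{T_1} \sum_{s=1}^{T_1} W_s^q\right), \tag{2.10}$$

where $W_s^q$ is a $q$-dimensional standard normal random vector. Similar to (2.9),

$$\begin{aligned}
\hat V_R &\equiv R \left(N^{-1} \sum_{s=1}^{T_1} \sum_{i=1}^{N} x_{is} x_{is}'\right)^{-1} \left[\sum_{t=T_1+1}^{T} \left(N^{-1/2} \sum_{i=1}^{N} x_{it} \hat u_{it}\right) \left(N^{-1/2} \sum_{i=1}^{N} x_{it} \hat u_{it}\right)'\right] \left(N^{-1} \sum_{s=1}^{T_1} \sum_{i=1}^{N} x_{is} x_{is}'\right)^{-1} R' \\
&\xrightarrow{L} R M^{-1} \Gamma \left[\frac{1}{T_1^2} \sum_{t=T_1+1}^{T} \left(W_t^k - \frac{1}{T_1} \sum_{s=1}^{T_1} W_s^k\right) \left(W_t^k - \frac{1}{T_1} \sum_{s=1}^{T_1} W_s^k\right)'\right] \Gamma' M^{-1\prime} R'.
\end{aligned}$$

By arguments similar to those for (2.10),

$$\hat V_R \xrightarrow{L} \Lambda \left[\frac{1}{T_1^2} \sum_{t=T_1+1}^{T} \left(W_t^q - \frac{1}{T_1} \sum_{s=1}^{T_1} W_s^q\right) \left(W_t^q - \frac{1}{T_1} \sum_{s=1}^{T_1} W_s^q\right)'\right] \Lambda'. \tag{2.11}$$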

All in all, instead of Assumption (d), we impose the following condition, which suffices for the a.s. invertibility of $\hat V_R$.

Assumption (e). $\Lambda$ defined around (2.10) is positive definite. $\sum_{t=T_1+1}^{T} \left(W_t^q - \frac{1}{T_1} \sum_{s=1}^{T_1} W_s^q\right) \left(W_t^q - \frac{1}{T_1} \sum_{s=1}^{T_1} W_s^q\right)'$ is positive definite a.s. □

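Thus, for the general null hypothesis $H_0: R\beta = r_0$, we may consider the following Wald test statistic:

$$\hat W_R = \sqrt{N}(R\hat\beta - r_0)' \hat V_R^{-1} \sqrt{N}(R\hat\beta - r_0). \tag{2.12}$$

The limiting distribution of the test statistic is presented in the next theorem.

Theorem 2.3. Suppose Assumptions (a)-(c) and Assumption (e) hold. Then

$$\hat W_R \xrightarrow{L} \left(\sum_{s=1}^{T_1} W_s^q\right)' \left[\sum_{t=T_1+1}^{T} \left(W_t^q - \frac{1}{T_1} \sum_{s=1}^{T_1} W_s^q\right) \left(W_t^q - \frac{1}{T_1} \sum_{s=1}^{T_1} W_s^q\right)'\right]^{-1} \left(\sum_{s=1}^{T_1} W_s^q\right). \tag{2.13}$$ □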

3 Unit Root Test and Test for Cointegration

In this section, we investigate a unit root test for the time series $\{w_{it}\}$. Following the lines in Fuller (1996), we assume that for each $i$, $\{w_{it}\}$ follows an AR($k$). In other words, we have a linear regression model:

$$\Delta w_{it} = x_{it}'\beta + u_{it}, \tag{3.1}$$

where $x_{it} = (w_{it-1}, \Delta w_{it-1}, \ldots, \Delta w_{it-k+1})'$, $t = 1, \ldots, T$ and $i = 1, \ldots, N$.

The Augmented Dickey-Fuller test in this setting is then simply a test of $\beta_1 = 0$, or $R\beta = 0$, where $R$ is a $1 \times k$ vector whose first element equals 1 and whose other elements all equal 0.
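To make the regression concrete, the following Python sketch builds the dependent variable and the regressor array from a panel of levels. It assumes the standard augmented Dickey-Fuller form, in which the dependent variable is the first difference and the regressors are the lagged level followed by $k-1$ lagged differences. The function name `adf_design` and the array layout are hypothetical.

```python
import numpy as np

def adf_design(w, k):
    """Build ADF-type regression arrays from levels w of shape (N, P).

    Returns
    -------
    dy : (N, P - k) array of first differences, the dependent variable
    x  : (N, P - k, k) array with x_it = (w_{i,t-1}, dw_{i,t-1}, ..., dw_{i,t-k+1})
    """
    P = w.shape[1]
    dw = np.diff(w, axis=1)            # dw[:, j] = w[:, j+1] - w[:, j]
    ts = np.arange(k, P)               # level indices t with all k-1 lagged differences available
    dy = dw[:, ts - 1]                 # difference dated t: w_t - w_{t-1}
    cols = [w[:, ts - 1]]              # lagged level w_{t-1}
    cols += [dw[:, ts - 1 - j] for j in range(1, k)]   # lagged differences dated t-1, ..., t-k+1
    x = np.stack(cols, axis=2)
    return dy, x
```

The first $k$ observations of each series are consumed by the lags, so a panel with $P$ level observations yields $T = P - k$ usable periods in this sketch.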

If Assumption (a) holds, following the lines in Conley (1999), we may assume that some moment conditions and some mixing conditions hold for the cross-section series $\{x_{it} u_{it}\}$, $t = 1, \ldots, T$. In other words, Assumptions (b)-(c) hold. In addition, if we are willing to assume that Assumption (e) holds, Theorem 2.3 applies. In fact, although the first part of Assumption (e) (the part about $\Lambda$) is hard to check, it may be easy to justify Assumption (d), which implies the second part of Assumption (e). First we let $\mathcal{F}_t^N = \sigma\{\ldots, u_{t-1}^N, u_t^N\}$, where $u_t^N = (u_{1t}, \ldots, u_{Nt})'$. It is plausible to assume that for all $i = 1, \ldots, N$,

$$E[u_t^N \mid \mathcal{F}_{t-1}^N] = 0.$$

Thus, for all $t, s \in \mathcal{T}$, $t \neq s$,

$$\lim_{N \to \infty} E\left[\left(N^{-1/2} \sum_{i=1}^{N} x_{it} u_{it}\right) \left(N^{-1/2} \sum_{i=1}^{N} x_{is} u_{is}\right)'\right] = 0, \tag{3.2}$$

which justifies the independence between the $W_t^k$'s and thus Assumption (d). In a similar setting, Levin and Lin (1992, 1993), Quah (1994) and Levin, Lin and Chu (2002) construct similar tests, under the assumption that both $N$ and $T$ go to infinity. Their approaches may not be applicable to cases in which $T$ is rather small. More importantly, all the papers mentioned above assume that the data are independently and identically distributed across $i$. The approach adopted in this paper dispenses with the assumption of independence across $i$. Furthermore, it is our conjecture that, with more elaborate analysis, the assumption of identical distribution in this paper can be relaxed, following the lines in Im, Pesaran and Shin (2003).

It is interesting to note that, unlike in pure time-series analysis, the rate of convergence of this unit root test is the same as that of tests of the other parameters $\beta_2, \ldots, \beta_k$. In this paper we assume that $T$ is fixed and only $N$ goes to infinity. It is unclear whether we obtain the same results on convergence when $T$ also goes to infinity.

Next we consider a test for cointegration among a $(k+1) \times 1$ vector $w_{it} \equiv (w_{it0}, w_{it1}, \ldots, w_{itk})'$. Following the lines in Phillips and Durlauf (1986), as well as a long series of subsequent papers, we first consider the following linear regression model:

$$w_{it0} = x_{it}'\beta + u_{it}, \tag{3.3}$$

Presumably all the elements of $w_{it}$ are $I(1)$. Based on (3.3), one form of testing for no cointegration can be cast as $H_0: \beta = 0$. As a result, provided that Assumptions (a)-(d) hold, the limiting distribution of the Wald test defined in (2.7) (with $\beta_0 = 0$) can be found in Theorem 2.2. Having said that, as suggested in Phillips and Durlauf (1986), it is difficult to justify (3.2), and thus the $W_t^k$'s in Theorem 2.2 are likely not independent. Our results here are similar to many in the literature. See, for instance, Pedroni (2004). However, as we argue in the discussion of our unit root test, those papers are deficient in two respects, namely, in assuming that $T$ also goes to infinity, and in assuming that the data are independent across $i$.

4 Generalization and Extension

In Section 2, we consider the case in which the time period is divided into two parts, the first part of which is used to estimate the parameter $\beta$ (the OLS estimator is denoted $\hat\beta$), while the second part is used to estimate the "variance-covariance" matrix of $\hat\beta$. In this section, we first generalize our test to a more flexible case. Then we consider an IV (instrumental variable) estimator.

Define $\mathcal{T} \equiv \{1, \ldots, T\}$, where $T \geq 2$. Consider two subsets of $\mathcal{T}$, $\mathcal{T}_1$ and $\mathcal{T}_2$, not necessarily disjoint. The numbers of elements in $\mathcal{T}_1$ and $\mathcal{T}_2$, denoted $\#\mathcal{T}_1$ and $\#\mathcal{T}_2$ respectively, are non-zero.

Instead of the OLS estimator in (2.4) above, we may consider an alternative one:

$$\hat{\hat\beta} = \left(\sum_{s \in \mathcal{T}_1} \sum_{i=1}^{N} x_{is} x_{is}'\right)^{-1} \left(\sum_{s \in \mathcal{T}_1} \sum_{i=1}^{N} x_{is} y_{is}\right). \tag{4.1}$$

We have the following theorem for the limiting distribution of $\hat{\hat\beta}$.

Theorem 4.1. Suppose Assumptions (a)-(c) hold. Then

$$\sqrt{N}(\hat{\hat\beta} - \beta) \xrightarrow{L} M^{-1} \Gamma \left(\frac{1}{\#\mathcal{T}_1} \sum_{s \in \mathcal{T}_1} W_s^k\right). \tag{4.2}$$ □

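Correspondingly, we define the residual as:

$$\hat{\hat u}_{it} = y_{it} - x_{it}'\hat{\hat\beta},$$

where $t = 1, \ldots, T$. Correspondingly, we replace Assumption (e) with the following:

Assumption (e'). $\Lambda$ defined around (2.10) is positive definite. $\sum_{t \in \mathcal{T}_2} \left(W_t^q - \frac{1}{\#\mathcal{T}_1} \sum_{s \in \mathcal{T}_1} W_s^q\right) \left(W_t^q - \frac{1}{\#\mathcal{T}_1} \sum_{s \in \mathcal{T}_1} W_s^q\right)'$ is positive definite a.s. □

A Wald test statistic for the null hypothesis $H_0: R\beta = r_0$ is defined as:

$$\hat{\hat W}_R = \sqrt{N}(R\hat{\hat\beta} - r_0)' \hat{\hat V}_R^{-1} \sqrt{N}(R\hat{\hat\beta} - r_0), \tag{4.3}$$

where

$$\hat{\hat V}_R = R \left(N^{-1} \sum_{s \in \mathcal{T}_1} \sum_{i=1}^{N} x_{is} x_{is}'\right)^{-1} \left[\sum_{t \in \mathcal{T}_2} \left(N^{-1/2} \sum_{i=1}^{N} x_{it} \hat{\hat u}_{it}\right) \left(N^{-1/2} \sum_{i=1}^{N} x_{it} \hat{\hat u}_{it}\right)'\right] \left(N^{-1} \sum_{s \in \mathcal{T}_1} \sum_{i=1}^{N} x_{is} x_{is}'\right)^{-1} R'.$$

The limiting distribution of $\hat{\hat W}_R$ is presented in the next theorem.

Theorem 4.2. Suppose Assumptions (a)-(c) and Assumption (e') hold. Then

$$\hat{\hat W}_R \xrightarrow{L} \left(\sum_{s \in \mathcal{T}_1} W_s^q\right)' \left[\sum_{t \in \mathcal{T}_2} \left(W_t^q - \frac{1}{\#\mathcal{T}_1} \sum_{s \in \mathcal{T}_1} W_s^q\right) \left(W_t^q - \frac{1}{\#\mathcal{T}_1} \sum_{s \in \mathcal{T}_1} W_s^q\right)'\right]^{-1} \left(\sum_{s \in \mathcal{T}_1} W_s^q\right). \tag{4.4}$$ □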

Next we consider a Wald test derived from an IV estimator. Instead of the OLS estimator in (4.1) above, suppose we have an instrument $z_{it}$, which is also a $k \times 1$ vector. Define the following IV estimator:

$$\tilde\beta = \left(\sum_{s \in \mathcal{T}_1} \sum_{i=1}^{N} z_{is} x_{is}'\right)^{-1} \left(\sum_{s \in \mathcal{T}_1} \sum_{i=1}^{N} z_{is} y_{is}\right). \tag{4.5}$$
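The IV estimator in (4.5) can be sketched in Python in the same spirit. The function name `pooled_iv`, the (N, T, k) array layout, and the convention of passing the estimation periods as a list of 0-based indices are assumptions made for this illustration.

```python
import numpy as np

def pooled_iv(z, x, y, periods):
    """Pooled IV estimator over a chosen set of periods, following (4.5).

    z, x    : (N, T, k) instruments and regressors (hypothetical layout)
    y       : (N, T) dependent variable
    periods : list of 0-based time indices playing the role of the estimation set
    """
    k = x.shape[2]
    zs = z[:, periods, :].reshape(-1, k)   # stack observations over i and s
    xs = x[:, periods, :].reshape(-1, k)
    ys = y[:, periods].reshape(-1)
    # Solve (sum z x') beta = (sum z y); requires sum z x' to be invertible.
    return np.linalg.solve(zs.T @ xs, zs.T @ ys)
```

Setting `z = x` reduces the sketch to the OLS estimator in (4.1) over the same set of periods.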

Correspondingly, we replace Assumptions (b) and (c) with the following:

Assumption (b'). For $t = 1, \ldots, T$,

$$N^{-1/2} \sum_{i=1}^{N} z_{it} u_{it} \xrightarrow{L} \Gamma W_t^k, \tag{4.6}$$

where $\Gamma$ is a positive definite matrix and $W_t^k$ is a $k$-dimensional standard normal random vector. □

Assumption (c'). For $t = 1, \ldots, T$,

$$N^{-1} \sum_{i=1}^{N} z_{it} x_{it}' \to M \quad \text{a.s.}, \tag{4.7}$$

where $M$ is a $k \times k$ invertible constant matrix. □

We have the following theorem for the limiting distribution of $\tilde\beta$.

Theorem 4.3. Suppose Assumption (a) and Assumptions (b')-(c') hold. Then

$$\sqrt{N}(\tilde\beta - \beta) \xrightarrow{L} M^{-1} \Gamma \left(\frac{1}{\#\mathcal{T}_1} \sum_{s \in \mathcal{T}_1} W_s^k\right). \tag{4.8}$$ □

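Define the residual as:

$$\tilde u_{it} = y_{it} - x_{it}'\tilde\beta,$$

where $t = 1, \ldots, T$. A Wald test statistic for the null hypothesis $H_0: R\beta = r_0$ is defined correspondingly as:

$$\tilde W_R = \sqrt{N}(R\tilde\beta - r_0)' \tilde V_R^{-1} \sqrt{N}(R\tilde\beta - r_0), \tag{4.9}$$

where

$$\tilde V_R = R \left(N^{-1} \sum_{s \in \mathcal{T}_1} \sum_{i=1}^{N} z_{is} x_{is}'\right)^{-1} \left[\sum_{t \in \mathcal{T}_2} \left(N^{-1/2} \sum_{i=1}^{N} z_{it} \tilde u_{it}\right) \left(N^{-1/2} \sum_{i=1}^{N} z_{it} \tilde u_{it}\right)'\right] \left(N^{-1} \sum_{s \in \mathcal{T}_1} \sum_{i=1}^{N} x_{is} z_{is}'\right)^{-1} R'.$$

The limiting distribution of $\tilde W_R$ is presented in the next theorem.

Theorem 4.4. Suppose Assumption (a), Assumptions (b')-(c'), and Assumption (e') hold. Then

$$\tilde W_R \xrightarrow{L} \left(\sum_{s \in \mathcal{T}_1} W_s^q\right)' \left[\sum_{t \in \mathcal{T}_2} \left(W_t^q - \frac{1}{\#\mathcal{T}_1} \sum_{s \in \mathcal{T}_1} W_s^q\right) \left(W_t^q - \frac{1}{\#\mathcal{T}_1} \sum_{s \in \mathcal{T}_1} W_s^q\right)'\right]^{-1} \left(\sum_{s \in \mathcal{T}_1} W_s^q\right). \tag{4.10}$$ □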

It should be noted that the limiting distribution in Theorem 4.4 is the same as that in Theorem 4.2, which is free from the nuisance parameters $\Gamma$ and $M$. Nevertheless, it depends on the dependence between the $W_t^q$'s as well as on the way we choose $\mathcal{T}_1$ and $\mathcal{T}_2$. In the next section, assuming that the $W_t^q$'s are independent across $t$, with $q = 1, 2$, we tabulate the critical values for the following two cases: (i) the disjoint case, in which $\mathcal{T}_1 = \{1, \ldots, T_1\}$ and $\mathcal{T}_2 = \{T_1 + 1, \ldots, T_1 + T_2\}$; (ii) the fully overlapping case, in which $\mathcal{T}_1 = \mathcal{T}_2 = \mathcal{T}$.
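The quantiles of the limiting distribution can be approximated by direct simulation. The Python sketch below draws independent standard normal vectors and evaluates the quadratic form in (4.4) and (4.10). The function name, the number of replications, the seed, and the use of 0-based index lists for the two subsets are assumptions of this sketch; the paper does not state how its simulations were set up.

```python
import numpy as np

def simulate_limit_quantiles(T, T1_idx, T2_idx, q=1, reps=20000,
                             probs=(0.80, 0.90, 0.95, 0.98, 0.99), seed=0):
    """Monte Carlo quantiles of S' D^{-1} S, where
    S = sum of W_s over the first index set and
    D = sum over the second index set of (W_t - S/#T1)(W_t - S/#T1)'.
    The W_t are drawn as independent q-dimensional standard normals.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((reps, T, q))
    S = W[:, T1_idx, :].sum(axis=1)                          # (reps, q)
    dev = W[:, T2_idx, :] - (S / len(T1_idx))[:, None, :]    # (reps, #T2, q)
    D = np.einsum('rtq,rtp->rqp', dev, dev)                  # (reps, q, q)
    sol = np.linalg.solve(D, S[..., None])[..., 0]           # D^{-1} S per replication
    stat = np.einsum('rq,rq->r', S, sol)
    return np.quantile(stat, probs)

# Fully overlapping case with T = 10 and q = 1.
overlap = simulate_limit_quantiles(10, list(range(10)), list(range(10)))
# One possible disjoint split of T = 10 into two halves (an assumption).
disjoint = simulate_limit_quantiles(10, list(range(5)), list(range(5, 10)))
```

In the fully overlapping case with $q = 1$, the simulated statistic equals $T/(T-1)$ times the square of a Student $t$ statistic with $T-1$ degrees of freedom, which gives a convenient check on the simulation.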


5 Simulating the Critical Values

TABLE 5.1

Quantiles of the Limiting Distribution in (5) or (8), k = 1.

                                 α-th simulated quantiles
  T    rv                      .800      .900      .950      .980      .990
  2    DISJ                   2.806    10.502    40.500   267.384  1063.563
       OVER                  18.948    79.502   320.144  2118.335  8564.449
       2 z_1^2               18.948    79.733   322.885  2025.152  8104.427
  3    DISJ                   9.273    36.517   147.250   947.310  3452.401
       OVER                   5.375    12.882    27.866    74.468   151.616
       (3/2) z_2^2            5.335    12.790    27.774    72.767   147.758
  4    DISJ                   1.775     3.593     7.110    17.652    35.444
       OVER                   3.579     7.386    13.491    27.004    44.591
       (4/3) z_3^2            3.577     7.382    13.500    27.494    45.490
  5    DISJ                   3.225     6.918    14.079    34.709    70.060
       OVER                   2.927     5.680     9.597    17.531    26.328
       (5/4) z_4^2            2.938     5.682     9.633    17.550    26.496
  6    DISJ                   1.639     2.906     4.834     9.117    14.410
       OVER                   2.625     4.853     7.914    13.631    19.782
       (6/5) z_5^2            2.614     4.872     7.932    13.588    19.508
  7    DISJ                   2.423     4.411     7.426    14.067    22.642
       OVER                   2.412     4.410     6.974    11.525    15.841
       (7/6) z_6^2            2.419     4.404     6.986    11.525    16.032
  8    DISJ                   1.627     2.719     4.119     7.006    10.005
       OVER                   2.233     3.978     6.058    10.145    13.845
       (8/7) z_7^2            2.288     4.104     6.392    10.272    13.992
  9    DISJ                   2.156     3.693     5.742     9.869    15.029
       OVER                   2.177     3.827     5.812     9.132    12.295
       (9/8) z_8^2            2.195     3.892     5.982     8.858    12.663
  10   DISJ                   1.615     2.637     3.957     6.100     8.419
       OVER                   2.090     3.694     5.617     8.636    11.601
       (10/9) z_9^2           2.125     3.733     5.685     8.842    11.736
  20   DISJ                   1.614     2.570     3.607     5.020     6.104
       OVER                   1.835     3.073     4.518     6.654     8.273
       (20/19) z_19^2         1.856     3.147     4.611     6.786     8.616
  30   DISJ                   1.600     2.560     3.577     4.964     6.215
       OVER                   1.761     2.963     4.232     6.082     7.589
       (30/29) z_29^2         1.778     2.986     4.326     6.270     7.857
  50   DISJ                   1.602     2.572     3.676     5.066     6.084
       OVER                   1.715     2.854     4.108     5.852     7.247
       (50/49) z_49^2         1.722     2.868     4.121     5.902     7.329
  100  DISJ                   1.627     2.643     3.732     5.208     6.266
       OVER                   1.667     2.782     3.981     5.692     6.824
       (100/99) z_99^2        1.681     2.785     3.977     5.648     6.968
  121  DISJ                   1.643     2.721     3.846     5.249     6.173
       OVER                   1.670     2.792     4.023     5.702     7.004
       (121/120) z_120^2      1.652     2.772     3.953     5.606     6.906
       χ²_1                   1.642     2.706     3.841     5.412     6.635

6 Monte Carlo experiments

TABLE 6.1(a)

Rejection Percentage under $H_0: \beta_1 = 0$, $\rho = 0$.

                        Size
  T    Test       10%      5%      1%
  2    DISJ     10.00    4.65    0.75
       OVER      9.85    4.65    0.70
       WHITE    11.45    6.95    1.75
  10   DISJ     10.00    5.05    1.00
       OVER     10.15    4.80    0.90
       WHITE    10.65    4.80    1.25
  50   DISJ     10.45    5.15    0.90
       OVER     10.05    4.25    1.05
       WHITE     9.95    5.15    0.75

TABLE 6.1(b)

Rejection Percentage under $H_0: \beta_1 = 0$, $\rho = 0.5$.

                        Size
  T    Test       10%      5%      1%
  2    DISJ      9.75    5.60    1.35
       OVER      9.65    5.35    1.35
       WHITE    22.35   15.85    5.80
  10   DISJ     10.80    5.65    1.05
       OVER     10.40    5.60    1.55
       WHITE    20.65   13.20    4.55
  50   DISJ     11.25    5.75    1.45
       OVER     11.30    5.20    1.25
       WHITE    20.80   13.55    4.85

TABLE 6.1(c)

Rejection Percentage under $H_0: \beta_1 = 0$, $\rho = 0.9$.

                        Size
  T    Test       10%      5%      1%
  2    DISJ     12.45    6.45    1.10
       OVER     11.85    6.05    1.00
       WHITE    62.05   55.45   44.30
  10   DISJ     12.10    7.40    2.25
       OVER     11.35    5.85    1.65
       WHITE    58.00   51.70   39.60
  50   DISJ     11.05    6.10    2.10
       OVER     10.75    5.80    1.35
       WHITE    57.30   49.70   36.70

TABLE 6.2(a)

Rejection Percentage under $H_a: \beta_1 = 0.1$, $\rho = 0$.

                        Size
  T    Test       10%      5%      1%
  2    DISJ     15.10    7.35    1.35
       OVER     14.60    7.15    1.40
       WHITE    29.50   20.00    7.80
  10   DISJ     49.25   37.65   14.95
       OVER     67.70   52.80   24.35
       WHITE    73.50   62.65   39.40
  50   DISJ     95.70   92.90   83.75
       OVER     99.95   99.90   98.65
       WHITE    99.95   99.95   99.20

TABLE 6.2(b)

Rejection Percentage under $H_a: \beta_1 = 0.5$, $\rho = 0$.

                        Size
  T    Test        10%       5%       1%
  2    DISJ      56.05    31.45     5.60
       OVER      57.20    31.15     5.45
       WHITE     99.95    99.65    99.05
  10   DISJ     100.00   100.00    99.50
       OVER     100.00   100.00   100.00
       WHITE    100.00   100.00   100.00
  50   DISJ     100.00   100.00   100.00
       OVER     100.00   100.00   100.00
       WHITE    100.00   100.00   100.00

TABLE 6.2(c)

Rejection Percentage under $H_a: \beta_1 = 0.9$, $\rho = 0$.

                        Size
  T    Test        10%       5%       1%
  2    DISJ      80.25    52.60    11.50
       OVER      83.80    53.30    11.20
       WHITE    100.00   100.00   100.00
  10   DISJ     100.00   100.00   100.00
       OVER     100.00   100.00   100.00
       WHITE    100.00   100.00   100.00
  50   DISJ     100.00   100.00   100.00
       OVER     100.00   100.00   100.00
       WHITE    100.00   100.00   100.00

7 Conclusions and Discussions

In this paper, we propose a Wald test for the parameters in a linear regression model in which the correlations between the $N$ cross-sectional units (where $N$ goes to infinity) are hard to model and hard to estimate. We need $T$ time-series units, where $T$ is fixed and in principle may be as small as 2. This is in contrast to cases in the literature in which either (i) the correlations between the cross-sectional units do not exist (see, for instance, Anderson (1978), Anderson and Hsiao (1981), Holtz-Eakin, Newey and Rosen (1988), and Phillips and Moon (1999)); (ii) the cross-sectional correlations are modelled with the geographical distance (see, for instance, Kelejian and Prucha, 1999, and the references in the field of spatial statistics therein) or the economic distance proposed by Conley (1999); or (iii) $T$ also goes to infinity (see, for instance, Kao (1999), Bai and Ng (2002), and Bai (2003)).

In Section 3, we also consider a unit root test and a test for cointegration when we have data in both the time-series dimension and the cross-sectional dimension. Both topics have attracted a great deal of attention in economics over the past ten years. See, for instance, Levin and Lin (1992, 1993), Quah (1994), Levin, Lin and Chu (2002), Im, Pesaran and Shin (2003), and Pedroni (2004), all of which assume that both $N$ and $T$ go to infinity. As in the aforementioned papers, in Section 3 of this paper, these two tests are found to be special cases of the Wald test in the linear regression model developed in Section 2. In addition, we extend our OLS estimation to IV (instrumental variable) estimation in Section 4.

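The method proposed in this paper has an interesting analogy with the classical $z$-test for the population mean. Consider a special case of Model (1.1), in which $k = 1$, $x_{it} = 1$ and $\mathcal{T}_1 = \mathcal{T}_2 = \mathcal{T} = \{1, \ldots, T\}$:

$$y_{it} = \beta + u_{it}. \tag{7.1}$$

Summing (7.1) over $i$ and multiplying by $N^{-1/2}$, we get:

$$v_{Nt} = \sqrt{N}\beta + N^{-1/2} \sum_{i=1}^{N} u_{it}, \tag{7.2}$$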

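where $v_{Nt} \equiv N^{-1/2} \sum_{i=1}^{N} y_{it}$. Given Assumption (b), for $t = 1, \ldots, T$,

$$v_{Nt} - \sqrt{N}\beta = N^{-1/2} \sum_{i=1}^{N} u_{it} \xrightarrow{L} \Gamma W_t^1, \tag{7.3}$$

where $\Gamma^2 = \lim_{N \to \infty} E\left(N^{-1/2} \sum_{i=1}^{N} u_{it}\right)^2$. Refer to the Wald test statistic in (2.7). First, we re-write the null hypothesis as $H_0: \sqrt{N}\beta = \sqrt{N}\beta_0$. Second, it is easy to see that:

$$\sqrt{N}\hat\beta = \frac{1}{T} \sum_{s=1}^{T} v_{Ns} \equiv \bar v_N, \qquad \hat V = \frac{1}{T^2} \sum_{t=1}^{T} (v_{Nt} - \bar v_N)^2.$$

If we further assume that the $W_t^1$ in (7.3) are identically distributed, then by Theorem 4.1 (for the case $k = 1$), as $N \to \infty$,

$$\frac{T(\bar v_N - \sqrt{N}\beta_0)}{\sqrt{\sum_{t=1}^{T} (v_{Nt} - \bar v_N)^2}} = \sqrt{\frac{T}{T-1}} \, \frac{\sqrt{T}(\bar v_N - \sqrt{N}\beta_0)}{\sqrt{\sum_{t=1}^{T} (v_{Nt} - \bar v_N)^2 / (T-1)}} \xrightarrow{L} \sqrt{\frac{T}{T-1}} \, t_{T-1}, \tag{7.4}$$

where $t_{T-1}$ denotes a random variable which is $t$ distributed with $T - 1$ degrees of freedom.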
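The algebra behind this analogy can be checked numerically. In the Python sketch below, the function names and the test vector are hypothetical. The sketch verifies that, for any vector of period averages, the Wald statistic built from $\bar v_N$ and $\hat V$ equals $T/(T-1)$ times the squared one-sample $t$ statistic.

```python
import numpy as np

def wald_mean(v, mu0):
    """Wald statistic (vbar - mu0)^2 / V with V = T^{-2} * sum (v_t - vbar)^2."""
    T = v.size
    vbar = v.mean()
    V = ((v - vbar) ** 2).sum() / T ** 2
    return (vbar - mu0) ** 2 / V

def t_statistic(v, mu0):
    """Classical one-sample t statistic with T - 1 degrees of freedom."""
    T = v.size
    s = np.sqrt(v.var(ddof=1))          # sample standard deviation
    return np.sqrt(T) * (v.mean() - mu0) / s

rng = np.random.default_rng(3)
v = rng.normal(loc=0.2, size=7)         # hypothetical period averages v_Nt
T = v.size
lhs = wald_mean(v, 0.0)
rhs = T / (T - 1) * t_statistic(v, 0.0) ** 2
```

The identity is exact in finite samples, which is why, in the fully overlapping case with $k = 1$, the simulated quantiles can be compared with those of a scaled squared $t$ variable.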

In spite of the analogy, hypothesis testing in the linear regression model (1.1) is much more general than a classical $z$-test. In further study, we will extend Model (1.1) to cases with more heterogeneity. In particular, we will allow for fixed effects, random effects, random coefficients (see, for instance, Hsiao, 2003), and common factors (see, for instance, Bai, 2003).

In a classical $z$-test, as $T \to \infty$, the limiting distribution is standard normal. It would be interesting to derive the limiting distribution of our Wald test statistic when, apart from $N \to \infty$, $\#\mathcal{T}_1$ and/or $\#\mathcal{T}_2$ also go to infinity. Furthermore, it has been shown that, under certain assumptions, the classical $z$-test is optimal in testing for the population mean. It would also be interesting to obtain optimality results, especially those that refer to the choice of $\mathcal{T}_1$ and $\mathcal{T}_2$.

REFERENCES

Abadir, K.M., Paruolo, P., 1997. Two Mixed Normal Densities from Cointegration Analysis. Econometrica, 65, 671-680.

Anderson, T.W., 1978. Repeated Measurement on Autoregressive Processes. Journal of the American Statistical Association, 73, 371-378.

Anderson, T.W., Hsiao, C., 1981. Estimation of Dynamic Models with Error Components. Journal of the American Statistical Association, 76, 598-606.

Bai, J., 2003. Inference on Factor Models of Large Dimension. Econometrica, 71, 135-171.

Bai, J., Ng, S., 2002. Determining the Number of Factors in Approximate Factor Models. Econometrica, 70, 191-221.

Bolthausen, E., 1982. On the Central Limit Theorem for Stationary Mixing Random Fields. The Annals of Probability, 10, 1047-1050.

Conley, T.G., 1999. GMM Estimation with Cross Sectional Dependence. Journal of Econometrics, 92, 1-45.

Fuller, W.A., 1996. Introduction to Statistical Time Series, 2nd Edition. New York: Wiley.

de Jong, R.M., Davidson, J., 2000. Consistency of Kernel Estimators of Heteroscedastic and Autocorrelated Covariance Matrices. Econometrica, 68, 407-423.

Holtz-Eakin, D., Newey, W., Rosen, H., 1988. Estimating VARs with Panel Data. Econometrica, 56, 1371-1395.

Hsiao, C., 2003. Analysis of Panel Data, 2nd Edition. New York: Cambridge University Press.

Im, K., Pesaran, H., Shin, Y., 2003. Testing for Unit Roots in Heterogeneous Panels. Journal of Econometrics, 115, 53-74.

Kao, C., 1999. Spurious Regression and Residual-Based Tests for Cointegration in Panel Data when the Cross-Section and Time Series Dimensions are Comparable. Journal of Econometrics, 90, 1-44.

Kelejian, H.H., Prucha, I.R., 1999. A Generalized Moments Estimator for the Autoregressive Parameter in a Spatial Model. International Economic Review, 40, 509-533.

Kotz, S., Balakrishnan, N., Johnson, N.L., 2000. Continuous Multivariate Distributions. New York: Wiley.

Levin, A., Lin, C.-f., 1992. Unit Root Tests in Panel Data: Asymptotic and Finite-Sample Properties. UCSD Department of Economics Discussion Paper 92-23, downloadable from http://www.econ.ucsd.edu/papers/dp93.html#92-23.

Levin, A., Lin, C.-f., 1993. Unit Root Tests in Panel Data: New Results. UCSD Department of Economics Discussion Paper 93-56, downloadable from http://www.econ.ucsd.edu/papers/dp93.html#93-56.

Levin, A., Lin, C.-f., Chu, C.-s., 2002. Unit Root Tests in Panel Data: Asymptotic and Finite-Sample Properties. Journal of Econometrics, 108, 1-24.

Newey, W.K., West, K.D., 1987. A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica, 55, 703-708.

Pedroni, P., 2004. Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time Series Tests with an Application to the PPP Hypothesis. Econometric Theory, 20, 597-625.

Phillips, P.C.B., Durlauf, S.N., 1986. Multiple Time Series Regression with Integrated Processes. Review of Economic Studies, 53, 473-495.

Phillips, P.C.B., Moon, H., 1999. Linear Regression Limit Theory for Nonstationary Panel Data. Econometrica, 67, 1057-1112.

Quah, D., 1994. Exploiting Cross-Section Variation for Unit Root Inference in Dynamic Data. Economics Letters, 44, 9-19.

White, H., 1980. A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica, 48, 817-838.