Panel VAR Models with Spatial Dependence
*
Jan Mutl
Graduate Assistant, University of Maryland College Park
March 20, 2002
Abstract
I consider a panel vector autoregressive (panel VAR) model with cross-sectional dependence of the model disturbances that can be characterized by a first-order spatial autoregressive process. I derive asymptotic properties of a constrained maximum likelihood estimator that uses a consistent estimate of the degree of spatial autocorrelation to concentrate the likelihood function. The asymptotic properties are derived taking the time dimension of the panel as fixed and letting the cross-sectional dimension tend to infinity.
1. Introduction
Vector autoregressive (VAR) models are extensively used in econometric applications in a wide variety of fields. The extension to panel data represents an interesting challenge due to the likely presence of cross-sectional heterogeneity. I consider a panel VAR model with fixed time dimension $T$ and derive asymptotic properties of a proposed estimation procedure with respect to the cross-sectional dimension $N$. When the cross-sectional dimension is fixed, one has to parsimoniously parameterize the correlations across cross-sectional units in order to avoid the incidental parameters problem. In this paper I follow the spatial econometrics literature and study a first-order spatial autocorrelation model with a known spatial weighting matrix. The panel spatial autocorrelation model is a generalization of the single cross-section models that include the single equation models, e.g., Cliff and Ord (1973, 1981), and the simultaneous equation models, such as Whittle (1954), Anselin (1988) or Kelejian and Prucha (1998, 1999 and 2001).
* I am grateful to Michael Binder, Harry Kelejian and Ingmar Prucha for helpful advice and comments. However, all errors and omissions are my responsibility. Correspondence address: Department of Economics, University of Maryland, Tydings Hall, College Park, MD 20742, Email: [email protected].
On the other hand, the model extends the panel VAR literature to allow for cross-sectional dependence of the model disturbances; for models with homogeneous disturbances see, e.g., Binder et al. (2001) for the quasi maximum likelihood (QML) and minimum distance (MD) estimators or Arellano and Bond (1991), Ahn and Schmidt (1995) or Arellano and Bover (1995) for the generalized method of moments (GMM) approach.
Existing extensions of panel models to cross-sectional dependence of the model disturbances include O'Connell (1998), who uses generalized least squares to test for unit roots in panel data (although without deriving any asymptotic properties of the estimator); Chen and Conley (2000), who propose a two-step sieve least squares procedure to estimate a panel VAR model with a nondiagonal cross-sectional covariance matrix that is proportional to an observed economic distance measure, and who look at asymptotics when the cross-sectional dimension is fixed; and, finally, Chang (2001), who derives asymptotic properties of a univariate panel model with a general unrestricted form of cross-sectional heterogeneity when the cross-sectional dimension of the panel is also fixed. These approaches are complementary to the present paper, which considers asymptotics with respect to the cross-sectional dimension while keeping the time dimension fixed.
2. The Panel VAR Model
In this section I specify the model and discuss the main assumptions that will be maintained in the consistency proofs. The specification adopted here follows the spatial autoregressive framework with a known spatial weighting matrix. In such models the correlation across agents is conveniently parameterized with only one parameter. The model can be expressed as
$$y_{it} = (I_m - \Phi)\mu_i + \Phi y_{i,t-1} + u_{it} \tag{1}$$
$$u_{it} = \lambda \sum_{j=1}^{N} w_{ij,t}\, u_{jt} + \varepsilon_{it} \tag{2}$$
where the first subscript $i \in \{1, \ldots, N\}$ refers to the cross-sectional dimension and the second subscript $t \in \{1, \ldots, T\}$ refers to the time dimension of the panel of observations $\{y_{it}\}_{1 \le i \le N,\, 1 \le t \le T}$. I also allow the model to contain more than one equation, so the observations $y_{it}$, the individual-specific effects $\mu_i$ and the disturbances $u_{it}$ and $\varepsilon_{it}$ are $m \times 1$ vectors, and the known weighting parameters $w_{ij}$, the unknown model parameters $\Phi$ and the identity matrix $I_m$ are $m \times m$ matrices. The degree of spatial autocorrelation is captured by the scalar parameter $\lambda$. Stacking across individuals we obtain
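$$y_t = (I_N \otimes [I_m - \Phi])\mu + (I_N \otimes \Phi)\, y_{t-1} + u_t \tag{3}$$
$$u_t = \lambda W_t u_t + \varepsilon_t \tag{4}$$
where
$$y_t = \begin{pmatrix} y_{1t} \\ \vdots \\ y_{Nt} \end{pmatrix}_{mN \times 1}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_N \end{pmatrix}_{mN \times 1}, \tag{5}$$
$$u_t = \begin{pmatrix} u_{1t} \\ \vdots \\ u_{Nt} \end{pmatrix}_{mN \times 1}, \qquad \varepsilon_t = \begin{pmatrix} \varepsilon_{1t} \\ \vdots \\ \varepsilon_{Nt} \end{pmatrix}_{mN \times 1}$$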
and the $mN \times mN$ weighting matrix $W_t$ is
$$W_t = \begin{pmatrix} w_{11,t} & \cdots & w_{1N,t} \\ \vdots & \ddots & \vdots \\ w_{N1,t} & \cdots & w_{NN,t} \end{pmatrix}_{mN \times mN} \tag{6}$$
Solving for the disturbance terms yields
$$u_t = (I_{mN} - \lambda W_t)^{-1} \varepsilon_t \tag{7}$$
To facilitate identification of the model, I assume that there is no spatial correlation across equations, that is, each $m \times m$ matrix $w_{ij}$ is diagonal. However, the model allows for contemporaneous correlation across equations in different cross-sections because the variance-covariance matrix of the error terms $\varepsilon_{it}$ is left unrestricted.[1]
[1] There is cross-equation correlation for a single cross-section and, since the cross-sections are spatially correlated, the error terms in different equations for different cross-sections will be contemporaneously correlated.
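As a rough numerical illustration of equations (4), (6) and (7), the sketch below builds a small weighting matrix and solves for the spatially correlated disturbances. The ring-shaped neighbourhood structure, the function name `ring_weights` and the values of `N`, `m` and `lam` are assumptions made for the example only; they are not taken from the paper.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical row-normalized N x N weights: each unit's neighbours
    are the two adjacent units on a ring (zero diagonal)."""
    w = np.zeros((n, n))
    idx = np.arange(n)
    w[idx, (idx - 1) % n] = 0.5
    w[idx, (idx + 1) % n] = 0.5
    return w

N, m, lam = 6, 2, 0.4              # illustrative values, not from the paper
rng = np.random.default_rng(0)

# Each m x m block w_ij is diagonal; here it is taken to be w_ij * I_m,
# so the stacked mN x mN matrix is the Kronecker product w (x) I_m.
W = np.kron(ring_weights(N), np.eye(m))

eps = rng.standard_normal(N * m)

# u_t = (I_mN - lam W)^{-1} eps_t, computed with a linear solve
# rather than an explicit matrix inverse.
u = np.linalg.solve(np.eye(N * m) - lam * W, eps)

# The solution satisfies the spatial autoregression u_t = lam W u_t + eps_t.
residual = u - (lam * W @ u + eps)
print(np.max(np.abs(residual)))
```

Taking every block to be a scalar multiple of $I_m$ is the simplest case consistent with diagonal $w_{ij}$; equation-specific weights would replace the Kronecker product with a block-wise construction.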
2.1. Random vs. Fixed Effects Specification
Allowing for individual effects without any additional restrictions, such as a random or fixed effects specification, leads to an incidental parameters problem. As the time dimension of the panel is fixed, one cannot consistently estimate a general form of the individual-specific effects with a finite number of observations per parameter. There are two options to resolve this problem: either assume that there is a well-behaved distribution (e.g. with finite fourth moments) from which the individual-specific effects are generated (the random effects specification), or transform the data to obtain a specification that does not contain the individual-specific effects (the fixed effects specification). The usual approach in the fixed effects specification is to first-difference the data; see the argument in Hsiao, Pesaran and Tahmiscioglu (2001), who show in a univariate context that the QML estimator is invariant to the choice of the transformation matrix that eliminates the individual-specific effects. The argument is readily extended to the multivariate setting in this paper. However, the fixed effects specification with first-differencing does not eliminate the incidental parameters problem unless we assume that the spatial weighting matrices are constant over time. Hence the choice between the fixed and random effects specifications depends on which of the two assumptions (a constant weighting matrix or the existence of a distribution that generates the individual-specific effects) is more appropriate.
In this paper I work out the case of fixed effects with a constant spatial weighting matrix. However, the extension to random effects with a time-varying spatial weighting matrix is straightforward.
2.2. Initial Disturbances Specification
Instead of conditioning on initial observations, I explicitly treat the initial conditions when defining the likelihood function. There are several assumptions one can make. The least restrictive case is worked out in this paper. Denote the vector of initial model disturbances as
$$u_0 = y_0 - \mu \tag{8}$$
I assume that $u_0$ is spatially correlated and is generated by
$$u_0 = \lambda W u_0 + \xi \tag{9}$$
where $\xi$ is an $N \times 1$ vector of independently and identically distributed (in $N$) initial random disturbances.
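Hence the initial observations are
$$\begin{aligned} \Delta y_1 &= u_1 - [I_N \otimes (I_m - \Phi)]\, u_0 \\ &= (I_N - \lambda W)^{-1}\left(\varepsilon_1 - [I_N \otimes (I_m - \Phi)]\,\xi\right) \end{aligned} \tag{10}$$
Notice that this implies that
$$\left(\varepsilon_1 - [I_N \otimes (I_m - \Phi)]\,\xi\right) = (I_N - \lambda W)\,\Delta y_1 \tag{11}$$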
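We assume that the initial disturbances are independent of subsequent error terms. We use the notation
$$\mathrm{var}(\varepsilon_{i1} - (I_m - \Phi)\xi_i) = \Psi \quad \text{and} \quad \mathrm{var}(\varepsilon_{it}) = \Omega_\varepsilon \tag{12}$$
Given that $\Phi \neq I_m$, we have that
$$\Psi = \Omega_\varepsilon + [I_N \otimes (I_m - \Phi)]\,\mathrm{var}(\xi_i)\,[I_N \otimes (I_m - \Phi)]' \tag{13}$$
and hence $\Psi$ is unconstrained.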
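In general, if the eigenvalues of $\Phi$ are inside the unit circle, one could make further assumptions on the $\xi$ disturbances and express $\Psi$ in terms of $\Phi$ and $\Omega_\varepsilon$. In particular, in this case the data generating process is stationary and, therefore, one could assume that it started in the infinite past. This implies that the initial observations $y_0$ are drawn from the limiting stationary distribution of the process, i.e. that:
$$y_0 = \sum_{j=0}^{\infty} [I_N \otimes \Phi]^{j-1} \varepsilon_{-j} \tag{14}$$
Therefore
$$\Delta y_1 = (I_N - \lambda W)^{-1} \sum_{j=0}^{\infty} [I_N \otimes \Phi]^{j-1} \Delta\varepsilon_{-j} \tag{15}$$
and
$$\begin{aligned} \mathrm{var}\left(\varepsilon_1 - [I_N \otimes (I_m - \Phi)]\,\xi\right) &= \mathrm{var}\left([I_N - \lambda W]\,\Delta y_1\right) \\ &= \mathrm{var}\left(\sum_{j=0}^{\infty} [I_N \otimes \Phi]^{j-1} \Delta\varepsilon_{-j}\right) \\ &= \mathrm{var}\left(\varepsilon_0 + (I_N \otimes [I_m - \Phi]) \sum_{j=0}^{\infty} [I_N \otimes \Phi]^{j} \varepsilon_{-j-1}\right) \\ &= \Omega_\varepsilon + (I_N \otimes [I_m - \Phi]) \left(\sum_{j=1}^{\infty} \Phi^{j} (I_N \otimes \Omega_\varepsilon) \Phi'^{\,j}\right) (I_N \otimes [I_m - \Phi])' \end{aligned} \tag{16}$$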
Such an assumption complicates the algebra and we leave this for further extensions of our model. In the following we treat $\mathrm{vech}\,\Psi$ as a vector of additional free parameters.
2.3. Maintained Assumptions
To be able to derive the asymptotic properties of the model I make the following assumptions about the disturbances and the spatial weighting matrices.
Assumption 1. The disturbance vector $\varepsilon_{it}$ is identically and independently distributed with zero mean, finite positive-definite variance matrix $\Omega_\varepsilon$ and finite fourth moments.
The above assumption is needed to ensure that the observable data, which are a transformation of the $\varepsilon_{it}$ process, have well-defined asymptotic properties.
The following assumption is necessary for identification of the model:
Assumption 2. The diagonal elements of each $W_t$ are zero and each $w_{ij,t}$ matrix is diagonal.
The next two assumptions ensure that the weighting matrices do not 'explode' as the sample size increases.
Assumption 3. The matrices $(I_{mN} - \lambda W_t)$ are nonsingular for all $|\lambda| < 1$.
Assumption 4. The row and column sums of the matrices $W_t$ and $(I_N - \lambda W_t)^{-1}$ are bounded uniformly in absolute value.
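To make Assumptions 3 and 4 concrete, the following sketch checks them numerically for one particular family of weighting matrices as $N$ grows. The ring-shaped, row-normalized weights and the value of `lam` are assumptions chosen for illustration. For a row-stochastic $W$ with nonnegative entries and $|\lambda| < 1$, the Neumann series $\sum_k \lambda^k W^k$ implies that the absolute row sums of $(I_N - \lambda W)^{-1}$ are at most $1/(1 - |\lambda|)$, which the code compares against.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical row-normalized weights: two neighbours on a ring."""
    w = np.zeros((n, n))
    idx = np.arange(n)
    w[idx, (idx - 1) % n] = 0.5
    w[idx, (idx + 1) % n] = 0.5
    return w

def max_abs_row_col_sums(a):
    """Largest absolute row sum and largest absolute column sum of a."""
    a = np.abs(a)
    return a.sum(axis=1).max(), a.sum(axis=0).max()

lam = 0.6                               # illustrative value with |lam| < 1
bound = 1.0 / (1.0 - abs(lam))          # Neumann-series bound for row-stochastic W

results = []
for n in (10, 50, 200):
    w = ring_weights(n)
    inv = np.linalg.inv(np.eye(n) - lam * w)    # exists because |lam| < 1
    results.append((n, max_abs_row_col_sums(w), max_abs_row_col_sums(inv)))

for n, w_sums, inv_sums in results:
    print(n, w_sums, inv_sums, bound)
```

The sums stay flat as `n` increases, which is the uniform boundedness the assumptions require; a weighting matrix whose row sums grow with $N$ would violate it.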
3. Estimation
The model can be estimated using a variety of approaches. Straightforward least squares estimation of the first differences of the observations on their lagged values is not consistent because the error term $\Delta u_t$ is correlated with the explanatory variable $\Delta y_{t-1}$. However, I show that instrumental variable (IV) estimation leads to consistent estimates of the spatially correlated disturbances. We can then use a method of moments estimation and obtain a consistent estimator of the spatial parameter $\lambda$; e.g. use the moment conditions based on the estimated disturbances:
$$\Delta\hat u_t = \Delta y_t - (I_N \otimes \hat\Phi_{IV})\,\Delta y_{t-1} \tag{17}$$
where $\hat\Phi_{IV}$ is the IV estimator of $\Phi$. I show that this two-stage procedure leads to a consistent estimator of $\lambda$. Kelejian and Prucha (1999) show consistency of a similar two-stage procedure for a model with spatial lags in both the dependent variable and the error term.
Finally, we can use the spatial Cochrane-Orcutt transformation and write the model as
$$(I_{mN} - \lambda W)\,\Delta y_t = (I_{mN} - \lambda W)(I_N \otimes \Phi)\,\Delta y_{t-1} + \Delta\varepsilon_t \tag{18}$$
If $\lambda$ is known, the transformed model can be estimated with standard techniques, such as the QML method in Binder, Hsiao, Mutl and Pesaran (2002) or the GMM approach as in Arellano and Bond (1991), Ahn and Schmidt (1995) or Arellano and Bover (1995). However, since $\lambda$ has to be estimated, we need to prove that it is a nuisance parameter.
In the following, I first define the IV estimator and show that it produces consistent estimates of the disturbance terms. I then discuss the moments estimator of the spatial parameter. Finally, I define the full as well as the constrained QML procedures and show that the spatial parameter is a nuisance parameter.
3.1. Instrumental Variable Estimation
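To be able to define the IV estimator, it turns out to be convenient to stack the model differently. Our model is:
$$\Delta y_{it} = \Phi\,\Delta y_{i,t-1} + \Delta u_{it} \tag{19}$$
where $\Delta y_{it}$ and $\Delta u_{it}$ are $m \times 1$ vectors. After taking the transpose and stacking the observations at different times for a given cross-section, we have
$$\begin{pmatrix} \Delta y_{i1}' \\ \vdots \\ \Delta y_{iT}' \end{pmatrix}_{T \times m} = \begin{pmatrix} \Delta y_{i0}' \\ \vdots \\ \Delta y_{i,T-1}' \end{pmatrix}_{T \times m} \Phi'_{m \times m} + \begin{pmatrix} \Delta u_{i1}' \\ \vdots \\ \Delta u_{iT}' \end{pmatrix}_{T \times m} \tag{20}$$
or with the obvious notation
$$\Delta Y_i = \Delta Y_{i,-1}\Phi' + \Delta U_i \tag{21}$$
Stacking the cross-sections yields
$$\Delta Y = \Delta Y_{-1}\Phi' + \Delta U \tag{22}$$
where $\Delta Y = (\Delta Y_1', \ldots, \Delta Y_N')'$, $\Delta Y_{-1} = (\Delta Y_{1,-1}', \ldots, \Delta Y_{N,-1}')'$ and $\Delta U = (\Delta U_1', \ldots, \Delta U_N')'$.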
We define the IV estimator of $\Phi$ as
$$\hat\Phi_{IV} = \left[\hat Z'\hat Z\right]^{-1} \hat Z'\,\Delta Y \tag{23}$$
where $\hat Z = P_H \Delta Y$ with $P_H = H(H'H)^{-1}H'$, where $H$ is the vector of instruments used for $\Delta Y_{-1}$. I suggest the use of the instruments $H = Y_{-2} = (Y_{1,-2}', \ldots, Y_{N,-2}')'$ where $Y_{i,-2} = (y_{i,-1}, \ldots, y_{i,T-2})'$. However, any instruments that satisfy the following conditions lead to consistent estimates of the spatially correlated disturbances:
Assumption 5. The instrument matrix $H$ has full column rank.
Assumption 6. The instruments satisfy the following:
1. $\mathrm{plim}\, \frac{1}{N} H'H = Q_{HH}$ where $Q_{HH}$ is finite and nonsingular;
2. $\mathrm{plim}\, \frac{1}{N} H'\Delta Y = Q_{HY}$ where $Q_{HY}$ is finite and has full column rank;
3. The instruments $H$ can be expressed as $H = F(\varsigma_1, \ldots, \varsigma_m)$ where each $\varsigma_i$ is an $NT \times 1$ vector of identically and independently distributed random variables and $F$ is an $NT \times NT$ nonstochastic absolutely summable matrix. Furthermore, each $\varsigma_i$ is independent of $\varepsilon_{it}$.
The first two assumptions guarantee that the instruments are not degenerate and that they are asymptotically correlated with the variables they replace. The last assumption implies that the instruments are not correlated with the error terms and that our central limit theorem can be applied. Given these additional assumptions we can assert that the IV estimation produces consistent estimates:
Proposition 1. Given the setup and Assumptions 1-6, the IV estimator is consistent and the rate of convergence is $N^{-1/2}$; that is, $\hat\Phi_{IV} = \Phi + O_p(N^{-1/2})$.
Remark: The rate of convergence is important for consistency of the estimator of $\lambda$ (the degree of spatial correlation in the residuals) in the second step of the procedure.
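The first step can be illustrated with a small simulation. The sketch below is not the paper's implementation: it assumes the univariate case $m = 1$, a hypothetical ring-shaped weighting matrix, a burn-in to approximate a stationary start, and the single instrument $y_{i,t-2}$ for the regressor $\Delta y_{i,t-1}$, so that the IV estimator reduces to a ratio of cross-moments. All function names and parameter values are illustrative.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical row-normalized weights: two neighbours on a ring."""
    w = np.zeros((n, n))
    idx = np.arange(n)
    w[idx, (idx - 1) % n] = 0.5
    w[idx, (idx + 1) % n] = 0.5
    return w

def simulate_panel(n, t_periods, phi, lam, rng, burn=50):
    """Simulate y_it = (1 - phi) mu_i + phi y_{i,t-1} + u_it with
    u_t = (I - lam W)^{-1} eps_t; returns an array of shape (t_periods, n)."""
    a_inv = np.linalg.inv(np.eye(n) - lam * ring_weights(n))
    mu = rng.standard_normal(n)          # individual-specific effects
    y = mu.copy()
    path = []
    for _ in range(burn + t_periods):
        u = a_inv @ rng.standard_normal(n)
        y = (1.0 - phi) * mu + phi * y + u
        path.append(y)
    return np.array(path[burn:])

def iv_estimate(y):
    """IV estimate of phi in Delta y_t = phi Delta y_{t-1} + Delta u_t,
    instrumenting Delta y_{t-1} with the level y_{t-2}."""
    dy = np.diff(y, axis=0)              # dy[k] = y[k+1] - y[k]
    dep = dy[1:].ravel()                 # Delta y_t
    reg = dy[:-1].ravel()                # Delta y_{t-1}
    inst = y[:-2].ravel()                # y_{t-2}
    return (inst @ dep) / (inst @ reg)

rng = np.random.default_rng(1)
y = simulate_panel(n=2000, t_periods=10, phi=0.5, lam=0.4, rng=rng)
phi_hat = iv_estimate(y)

# Estimated differenced disturbances, as in equation (17) with m = 1.
dy = np.diff(y, axis=0)
du_hat = dy[1:] - phi_hat * dy[:-1]
print(phi_hat, du_hat.shape)
```

First-differencing removes $\mu_i$, and $y_{i,t-2}$ depends only on disturbances dated $t-2$ and earlier, so it is uncorrelated with $\Delta u_{it}$ even when the disturbances are spatially correlated within a period.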
Proof of Proposition 1.
Substituting for the instruments in the model yields
$$\begin{aligned} \hat\Phi_{IV} &= \Phi + \left[\hat Z'\hat Z\right]^{-1} \hat Z'\,\Delta U \\ &= \Phi + \left[\hat Z'\hat Z\right]^{-1} \Delta Y' H (H'H)^{-1} H'\,\Delta U \end{aligned} \tag{24}$$
To show consistency we prove that
$$\mathrm{plim}\, \frac{1}{N} \hat Z'\hat Z = Q_{ZZ} \tag{25}$$
where $Q_{ZZ}$ is finite and nonsingular, and provide a central limit theorem (CLM) for the remaining $N^{-1/2} H'\Delta U$ term. Together, these results imply that
$$N^{1/2}(\hat\Phi_{IV} - \Phi) \to N(0, Q) \tag{26}$$
in distribution, where
$$Q = Q_{ZZ}^{-1} Q_{HY}' Q_{HY} Q_{ZZ}^{-1}\; \underset{N \to \infty}{\mathrm{plim}}\; \mathrm{tr}(\Delta U \Delta U') \tag{27}$$
The existence and nonsingularity of the $Q_{xx}$ matrices is due to the correct choice of instruments. The CLM utilized here is a modification of the CLM for quadratic forms of triangular arrays given, for example, in Kelejian and Prucha (2001) or in Pinkse (1999).
Theorem 2. Central Limit Theorem (CLM)
Let $\varsigma_1$ and $\varsigma_2$ be two vectors each consisting of $n$ independent and identically distributed zero mean random disturbances (with finite fourth moments); furthermore, let $\varsigma_1$ and $\varsigma_2$ be independent of each other (hence the expected value of the quadratic form is zero). Let $A_n$ be an $n \times n$ nonstochastic absolutely summable matrix. Then
$$\frac{\varsigma_1' A_n \varsigma_2}{\sqrt{\mathrm{var}(\varsigma_1' A_n \varsigma_2)}} \to N(0, 1) \tag{28}$$
in distribution.
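To be able to apply the CLM we first express the instruments as in Assumption 6 and then apply the above CLM to each element of $H'\Delta U$ separately. We have
$$\begin{aligned} H'\Delta U &= (\varsigma_1, \ldots, \varsigma_m)' F' \Delta U \\ &= (\varsigma_1, \ldots, \varsigma_m)' F' (\Delta U_1', \ldots, \Delta U_N')' \end{aligned} \tag{29}$$
Hence the $(i,j)$-th element of $H'\Delta U$ is of the form
$$\varsigma_i' F' \begin{pmatrix} \Delta U_1 \\ \vdots \\ \Delta U_N \end{pmatrix}_{:,j} = \varsigma_i' F' T (I_{mN} - \lambda W)\,\varepsilon_t \tag{30}$$
where $T$ is a finite and absolutely summable transformation matrix whose elements depend only on $N$. Since the elements of $\varepsilon_{it}$ are independent of the elements of $\varsigma_j$, the conditions of our CLM are met and $N^{-1/2} H'\Delta U$ converges in distribution.
QED.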
Note that our suggested instruments meet the required conditions. By backward substitution we can eliminate the lagged dependent variables and express the instruments as a function of lagged disturbance terms and lagged explanatory variables. It is easily verified that
$$\Delta y_t = (I_{mN} - \lambda W)^{-1} \left( \sum_{j=0}^{t-2} (I_N \otimes \Phi)^{j-1} \Delta\varepsilon_{t-j} + (I_N \otimes \Phi)^{t-1} [\varepsilon_1 - (I - \Phi)\xi] \right) \tag{31}$$
and hence we have that
$$H = (Y_{1,-2}', \ldots, Y_{N,-2}')' \tag{32}$$
$$\phantom{H} = F\,(\varepsilon_1 - (I - \Phi)\xi,\; \Delta\varepsilon_2,\; \ldots,\; \Delta\varepsilon_{T-2})' \tag{33}$$
where
$$F = [I_T \otimes (I_N - \lambda W')]^{-1} \left[ I_T \otimes (I_N - \lambda W)^{-1} \right] \tag{34}$$
Our assumptions on the spatial weighting matrices imply that the $N \times N$ matrix $F$ is absolutely summable.
3.2. Estimation of $\lambda$
The second step in the proposed estimation procedure is to use moment conditions based on the estimated disturbances:
$$\Delta\hat u_t = \Delta y_t - (I_N \otimes \hat\Phi_{IV})\,\Delta y_{t-1} \tag{35}$$
where $\hat\Phi_{IV}$ is the IV estimator of $\Phi$. Kelejian and Prucha (1999) show consistency of a similar two-stage procedure for a model with spatial lags in both the dependent variable and the error term. The conditions of their Theorem 2 are met in the present setup and hence their moment estimator produces a consistent estimate of the spatial parameter $\lambda$. In the appendix, I show that $\lambda$ is a nuisance parameter in a model with $m = 1$. The generalization to $m > 1$ involves somewhat tedious notation and is omitted here. To demonstrate that $\lambda$ is a nuisance parameter, I show that the off-diagonal elements of the Hessian of the likelihood function corresponding to the parameter $\lambda$ are $o_p(1)$.
3.3. Quasi Maximum Likelihood (QML) Estimation
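The likelihood function for the panel VAR model is easily derived under the assumption that $\varepsilon_{it} \sim N(0, \Omega_\varepsilon)$ where $\Omega_\varepsilon$ is the $m \times m$ variance-covariance matrix of $\varepsilon_{it}$. I specify the exact distribution of the initial observations as in Binder et al. (2001) and derive the QML function taking this into account. We can define the $mN(T+1) \times 1$ vector
$$\Delta\eta = \begin{pmatrix} \Delta y_0 \\ \Delta\varepsilon_1 \\ \vdots \\ \Delta\varepsilon_T \end{pmatrix} \tag{36}$$
We then have that $E(\Delta\eta) = 0$ and $\mathrm{var}(\Delta\eta) = \Sigma_{\Delta\eta}$ where
$$\Sigma_{\Delta\eta} = \begin{pmatrix} \Psi & -\Omega_\varepsilon & & 0 \\ -\Omega_\varepsilon & 2\Omega_\varepsilon & \ddots & \\ & \ddots & \ddots & -\Omega_\varepsilon \\ 0 & & -\Omega_\varepsilon & 2\Omega_\varepsilon \end{pmatrix} \otimes I_N \tag{37}$$
with $\Psi$ being an $m \times m$ symmetric matrix of parameters. This specification leaves the variance-covariance matrix of the initial observations unrestricted, i.e. there are $m(m+1)/2$ free parameters.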
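The block-tridiagonal pattern in (37) is easy to assemble with Kronecker products. The sketch below builds only the $m(T+1) \times m(T+1)$ factor that multiplies $I_N$; the function name `sigma_delta_eta` and the numerical values of $\Psi$ and $\Omega_\varepsilon$ are hypothetical and serve only to illustrate the structure.

```python
import numpy as np

def sigma_delta_eta(psi, omega, t_periods):
    """Block-tridiagonal factor of (37): Psi in the top-left block,
    2*Omega on the remaining diagonal blocks, -Omega next to the diagonal."""
    k = t_periods + 1
    first = np.zeros((k, k))
    first[0, 0] = 1.0                                  # selects the initial block
    tri = 2.0 * np.eye(k) - np.eye(k, k=1) - np.eye(k, k=-1)
    tri[0, 0] = 0.0                                    # top-left block is Psi, not 2*Omega
    return np.kron(first, psi) + np.kron(tri, omega)

# Illustrative parameter values (assumptions, not estimates from the paper).
omega = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
psi = omega + 0.5 * np.eye(2)        # Psi = Omega plus a positive semi-definite term
sigma = sigma_delta_eta(psi, omega, t_periods=3)

print(sigma.shape)
print(np.linalg.eigvalsh(sigma).min())
```

Because only this factor depends on the parameters, its determinant and inverse can be computed at dimension $m(T+1)$ and then combined with $I_N$, which is the kind of structure the computational remarks in the next subsection rely on.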
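The likelihood function for the entire sample is then
$$\begin{aligned} L_N(\theta) = {} & \mathrm{const} - \frac{N}{2} \ln|\Sigma_{\Delta\eta}| + \ln\left|I_{mN(T+1)} - \lambda\bar W\right| \\ & - \frac{N}{2} \mathrm{tr}\left[ R'(I_{mN(T+1)} - \lambda\bar W)\,\Sigma_{\Delta\eta}^{-1}\,(I_{mN(T+1)} - \lambda\bar W')\, R\, S_N \right] \end{aligned} \tag{38}$$
where $\theta = (\mathrm{vech}(\Psi)', \mathrm{vech}(\Omega_\varepsilon)', \mathrm{vec}(\Phi)')$ is the vector of parameters. The $mN(T+1) \times mN(T+1)$ observable time-invariant spatial weighting matrix $\bar W$ is
$$\bar W = \begin{pmatrix} I_{mN} & 0 & \cdots & 0 \\ 0 & W_1 & & \vdots \\ \vdots & & \ddots & \\ 0 & \cdots & & W_T \end{pmatrix} \tag{39}$$
The $mN(T+1) \times mN(T+1)$ matrix $R$ is defined as
$$R = I_N \otimes \begin{pmatrix} I_m & & & 0 \\ -\Phi & I_m & & \\ & \ddots & \ddots & \\ 0 & & -\Phi & I_m \end{pmatrix} \tag{40}$$
and the matrix $S_N$ is
$$S_N = I_N \otimes \frac{1}{N} \sum_{i=1}^{N} \Delta y_i \Delta y_i' \tag{41}$$
with $\Delta y_i = (\Delta y_{i0}', \ldots, \Delta y_{iT}')'$ being the vector of the first differences of the observations for the $i$-th cross-section.
3.3.1. Computational Issues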
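The computation of the likelihood function should exploit the structure of the $[I_T \otimes (I_N - \lambda W)]$ and $\Sigma_{\Delta\eta}$ matrices when evaluating their determinants and inverses. In particular, we can express $\Sigma_{\Delta\eta}$ as
$$\Sigma_{\Delta\eta} = \begin{pmatrix} \Psi & (A_1 \otimes \Omega_\varepsilon) \\ (A_1' \otimes \Omega_\varepsilon) & (A_2 \otimes \Omega_\varepsilon) \end{pmatrix} \tag{42}$$
where $A_1$ and $A_2$ are matrices of constants. The inverse of $\Sigma_{\Delta\eta}$ is then
$$\Sigma_{\Delta\eta}^{-1} = \begin{pmatrix} D^{-1} & -D^{-1}(A_1 A_2^{-1} \otimes \Omega_\varepsilon) \\ (A_2^{-1} A_1' \otimes \Omega_\varepsilon) D^{-1} & D^{-1} - (A_2^{-1} A_1' \otimes \Omega_\varepsilon) D^{-1} (A_1 A_2^{-1} \otimes \Omega_\varepsilon) \end{pmatrix} \tag{43}$$
where $D = \Psi - (A_1 A_2^{-1} A_1 \otimes \Omega_\varepsilon)$.
3.4. Constrained QML Estimation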
Although the QML estimation based on the likelihood function (38) is feasible[2], it is extremely difficult to prove its consistency and asymptotic normality. In this paper, I propose an alternative approach that takes a consistent estimator of the spatial correlation parameter $\lambda$ and maximizes a constrained likelihood function. That is, maximize
$$\begin{aligned} Q_N(\tilde\theta) = {} & \mathrm{const} - \frac{N}{2} \ln|\Sigma_{\Delta\eta}| + \ln\left|I_{mN(T+1)} - \hat\lambda\bar W\right| \\ & - \frac{N}{2} \mathrm{tr}\left[ R'(I_{mN(T+1)} - \hat\lambda\bar W)\,\Sigma_{\Delta\eta}^{-1}\,(I_{mN(T+1)} - \hat\lambda\bar W')\, R\, S_N \right] \end{aligned} \tag{44}$$
with respect to $\tilde\theta = (\mathrm{vech}(\Psi)', \mathrm{vech}(\Omega_\varepsilon)', \mathrm{vec}(\Phi)')'$, taking the consistent estimator $\hat\lambda$ of $\lambda$ as given. The consistent estimator of the spatial correlation parameter can be based on the two-step procedure proposed above.
[2] The QML estimator is likely to be computationally expensive due to the necessity to calculate eigenvalues of a sparse matrix $(I - \lambda W_t)$ which is of dimension $N$. With large $N$ this [...]
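The eigenvalue computation mentioned in the footnote is usually organized so that the eigenvalues of $W$ are computed once and reused: since $\det(I_N - \lambda W) = \prod_i (1 - \lambda\omega_i)$ for eigenvalues $\omega_i$ of $W$, the log-determinant term is a cheap sum for every trial value of $\lambda$. The sketch below illustrates this standard identity under assumed inputs (a hypothetical ring-shaped weighting matrix and arbitrary values of `lam`); it is not taken from the paper.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical row-normalized weights: two neighbours on a ring."""
    w = np.zeros((n, n))
    idx = np.arange(n)
    w[idx, (idx - 1) % n] = 0.5
    w[idx, (idx + 1) % n] = 0.5
    return w

def logdet_spatial(eigs, lam):
    """ln|det(I - lam W)| from precomputed eigenvalues of W.
    Complex logs are used so that a non-symmetric W with complex
    eigenvalues is handled; the real part is the log absolute determinant."""
    return float(np.sum(np.log(1.0 - lam * eigs.astype(complex))).real)

n = 300
w = ring_weights(n)
eigs = np.linalg.eigvals(w)           # computed once, reused for every lam

comparisons = []
for lam in (-0.5, 0.2, 0.8):
    _, direct = np.linalg.slogdet(np.eye(n) - lam * w)
    comparisons.append((lam, logdet_spatial(eigs, lam), direct))

for row in comparisons:
    print(row)
```

In the constrained procedure $\hat\lambda$ is fixed, so this term is evaluated only once; the saving matters mainly for the full QML estimator, where $\lambda$ changes at every iteration of the optimizer.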
4. Asymptotics
To prove consistency of the constrained QML estimator, I first prove that the likelihood function converges point-wise in probability to some function (which is the limit of the expected likelihood). I then use identification conditions to show consistency.
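4.1. Point-wise Convergence of the Likelihood Function
The constrained likelihood function is
$$\begin{aligned} Q_N(\tilde\theta) = {} & \mathrm{const} - \frac{N}{2} \ln|\Sigma_{\Delta\eta}| + \ln\left|I_{mN(T+1)} - \hat\lambda\bar W\right| \\ & - \frac{N}{2} \mathrm{tr}\left[ R'(I_{mN(T+1)} - \hat\lambda\bar W)\,\Sigma_{\Delta\eta}^{-1}\,(I_{mN(T+1)} - \hat\lambda\bar W')\, R\, S_N \right] \end{aligned} \tag{45}$$
For the first part we need to show that
$$\sup_{\theta \in \Theta} \left| \frac{1}{N} Q_N(\tilde\theta) - E\left( \frac{1}{N} L_N(\lambda, \theta) \right) \right| \to 0 \quad \text{in probability} \tag{46}$$
where $\tilde\theta = (\mathrm{vech}(\Psi)', \mathrm{vech}(\Omega_\varepsilon)', \mathrm{vec}(\Phi)')'$ is a vector of parameters from the admissible parameter space $\Theta$, e.g. $\Omega_\varepsilon$ and $\Psi$ are symmetric positive-definite, etc. Note that
$$\frac{1}{N} Q_N(\tilde\theta) - E\left( \frac{1}{N} L_N(\lambda, \theta) \right) = \frac{N}{2} \mathrm{tr}\left[ R'(I_{mN(T+1)} - \hat\lambda\bar W)\,\Sigma_{\Delta\eta}^{-1}\,(I_{mN(T+1)} - \hat\lambda\bar W')\, R\, (E(S_N) - S_N) \right] \tag{47}$$
$$< \frac{N}{2} \mathrm{tr}\left[ R'(I_{mN(T+1)} - \hat\lambda\bar W)\,\Sigma_{\Delta\eta}^{-1}\,(I_{mN(T+1)} - \hat\lambda\bar W')\, R \right] \cdot \mathrm{tr}(E(S_N) - S_N) \tag{48}$$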
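To complete the proof, we need to show that
(i) $\mathrm{tr}\, R'(I_{mN(T+1)} - \hat\lambda\bar W)\,\Sigma_{\Delta\eta}^{-1}\,(I_{mN(T+1)} - \hat\lambda\bar W')\, R$ is bounded as $N \to \infty$, and that
(ii) $\mathrm{tr}\, S_N = \frac{1}{N} \sum_{i=1}^{N} \Delta y_i \Delta y_i'$ converges to its expected value $\mathrm{tr}\, R^{-1}(I_{mN(T+1)} - \hat\lambda\bar W)^{-1}\,\Sigma_{\Delta\eta}\,(I_{mN(T+1)} - \hat\lambda\bar W')^{-1} R'^{-1}$ in probability as $N \to \infty$.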
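The former follows from the assumptions I have made on the weighting matrices, specifically the absolute summability of the $W$ matrix. The latter follows from the i.i.d. assumptions on the disturbance term and the restrictions on the weighting matrices. We have that
$$\mathrm{tr}\, S_N = \mathrm{tr}\, \frac{1}{N} \Delta y \Delta y' \tag{49}$$
$$= \mathrm{tr}\left[ R^{-1}(I_{mN(T+1)} - \hat\lambda\bar W)^{-1} \left( \frac{1}{N} \varepsilon\varepsilon' \right) (I_{mN(T+1)} - \hat\lambda\bar W')^{-1} R'^{-1} \right] \tag{50}$$
$$= \mathrm{tr}\left[ (I_{mN(T+1)} - \hat\lambda\bar W')^{-1} R'^{-1} R^{-1} (I_{mN(T+1)} - \hat\lambda\bar W)^{-1} \left( \frac{1}{N} \varepsilon\varepsilon' \right) \right] \tag{51}$$
$$< \mathrm{tr}\left[ (I_{mN(T+1)} - \hat\lambda\bar W')^{-1} R'^{-1} R^{-1} (I_{mN(T+1)} - \hat\lambda\bar W)^{-1} \right] \cdot \mathrm{tr}\left( \frac{1}{N} \varepsilon\varepsilon' \right) \tag{52}$$
We can show that $\mathrm{tr}\,(I_{mN(T+1)} - \hat\lambda\bar W')^{-1} R'^{-1} R^{-1} (I_{mN(T+1)} - \hat\lambda\bar W)^{-1}$ is also bounded. Given that the assumptions on the disturbances imply $\frac{1}{N}\varepsilon\varepsilon' \to \Sigma_{\Delta\eta}$, we have the result (ii).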
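The step from (51) to (52) uses a trace bound that holds for positive semi-definite matrices: $\mathrm{tr}(AB) \le \lambda_{\max}(A)\,\mathrm{tr}(B) \le \mathrm{tr}(A)\,\mathrm{tr}(B)$. Both factors in (51) are of that form, since one is a product $C'C$ and the other an outer product. The sketch below checks the bound on randomly generated positive semi-definite matrices; the matrix size and number of draws are arbitrary choices for illustration.

```python
import numpy as np

def random_psd(k, rng):
    """Random positive semi-definite k x k matrix of the form C C'."""
    c = rng.standard_normal((k, k))
    return c @ c.T

rng = np.random.default_rng(0)
checks = []
for _ in range(1000):
    a = random_psd(5, rng)
    b = random_psd(5, rng)
    # tr(AB) <= tr(A) tr(B) for positive semi-definite A and B
    checks.append(np.trace(a @ b) <= np.trace(a) * np.trace(b) + 1e-9)

print(all(checks))
```

The bound need not hold for matrices that are not positive semi-definite, which is worth keeping in mind when the same device is applied to a difference such as $E(S_N) - S_N$.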
4.2. Consistency
Given the point-wise convergence of the constrained likelihood function, any sequence of $\hat\theta_N = \arg\max_{\theta \in \Theta} \left[ Q_N(\tilde\theta) \right]$ has to converge to a set of global maxima of the limiting function $Q(\cdot) = \mathrm{plim}_{N \to \infty} E[L_N(\cdot)]$. The set of global maxima of this function contains the true parameter vector (this is a direct consequence of Jensen's inequality); furthermore, if some identification conditions are met, it only contains the true value of the parameter and we have consistency.
To judge whether the likelihood has a flat top one needs to inspect the asymptotic behavior of the Hessian, i.e. inspect the limits of the $H_{ij}$ matrices (defined in the appendix). If the probability limit as $N \to \infty$ of some of the $H_{ij}$ matrices is singular, then the corresponding parameter is asymptotically not identified. The terms in these matrices are similar to the ones discussed in the proofs above, e.g. $\Sigma_{\Delta\eta}'^{-1}(I_{mN(T+1)} - \hat\lambda\bar W') R S_N' R'(I_{mN(T+1)} - \hat\lambda\bar W)\Sigma_{\Delta\eta}'^{-1} \otimes \Sigma_{\Delta\eta}^{-1}(I_{mN(T+1)} - \hat\lambda\bar W') R S_N$, $\Sigma_{\Delta\eta}'^{-1}(I_{mN(T+1)} - \hat\lambda\bar W') \otimes (I_{mN(T+1)} - \hat\lambda\bar W)\Sigma_{\Delta\eta}'^{-1}(I_{mN(T+1)} - \hat\lambda\bar W')$ and $R'(I_{mN(T+1)} - \hat\lambda\bar W)\Sigma_{\Delta\eta}'^{-1}(I_{mN(T+1)} - \hat\lambda\bar W')$.
Let us take the expression $\Sigma_{\Delta\eta}^{-1}(I_{mN(T+1)} - \hat\lambda\bar W') R S_N$ as an example. We need to show that
$$\lim_{N \to \infty} N\, \Sigma_{\Delta\eta}^{-1}(I_{mN(T+1)} - \hat\lambda\bar W') R > 0 \tag{53}$$
and that
$$\underset{N \to \infty}{\mathrm{plim}}\; \frac{1}{N} S_N > 0 \tag{54}$$
The latter plim was discussed above. The former follows from the assumptions on the weighting matrices and the parameter space; since $\hat\lambda$ is consistent, the matrix $(I_{mN(T+1)} - \hat\lambda\bar W')$ is not singular as $N$ tends to infinity.
Inspection of the Hessians does not rule out multiplicity of peaks of the likelihood function. However, since the likelihood function is smooth, it does imply local identification and hence consistency of the constrained QML estimator. The following proposition summarizes the main result.
Proposition 3. Under Assumptions 1-4, there exists a neighborhood of the true parameter value $\Theta_1 \subseteq \Theta$ such that maximization of the constrained maximum likelihood function over $\Theta_1$ gives a consistent estimate of $\tilde\theta$.
The above proposition implies that as long as the admissible parameter space is compact within some neighborhood of the true parameter value and we can obtain starting values from this compact neighborhood, the constrained maximum likelihood estimator will be consistent.
5. Conclusion
This paper develops an estimation approach for a panel VAR model with spatial dependence. I suggest a three-step estimation procedure. In the first step, an instrumental variables procedure is used to consistently estimate the spatially correlated disturbances. In the second step, a method of moments estimation is used to obtain a consistent estimate of the spatial parameter. The final step of the procedure could be either a constrained maximum likelihood procedure or moments estimation based on a model transformed by a spatial Cochrane-Orcutt transformation.
I introduce the constrained maximum likelihood estimator based on a consistent estimate of the spatial dependence parameter and sketch a proof of its consistency when the time dimension of the panel is fixed. In future versions of this paper, I plan to explore the small sample properties of the QML and constrained QML estimators with a Monte Carlo study. It would also be of interest to prove asymptotic normality of the proposed estimator as well as to derive the asymptotic properties of the QML estimator under some reasonable set of assumptions.
6. Appendix A - Derivatives of the QML Function
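To judge whether the model is asymptotically identified, I inspect the Hessian of the concentrated likelihood function $Q_N(\theta)$. The first differential is
$$\begin{aligned} dQ_N = {} & -\frac{N}{2} \mathrm{tr}\, \Sigma_{\Delta\eta}^{-1}(d\Sigma_{\Delta\eta}) \\ & + \frac{N}{2} \mathrm{tr}\left[ R'(I_{mN(T+1)} - \hat\lambda W)\Sigma_{\Delta\eta}^{-1}\, d\Sigma_{\Delta\eta}\, \Sigma_{\Delta\eta}^{-1}(I_{mN(T+1)} - \hat\lambda W') R S_N \right] \\ & - \frac{N}{2} \mathrm{tr}\left[ dR'(I_{mN(T+1)} - \hat\lambda W)\Sigma_{\Delta\eta}^{-1}(I_{mN(T+1)} - \hat\lambda W') R S_N \right] \\ & - \frac{N}{2} \mathrm{tr}\left[ R'(I_{mN(T+1)} - \hat\lambda W)\Sigma_{\Delta\eta}^{-1}(I_{mN(T+1)} - \hat\lambda W')\, dR\, S_N \right] \\ = {} & \frac{N}{2} \mathrm{vec}\left[ -\Sigma_{\Delta\eta}'^{-1} + \Sigma_{\Delta\eta}'^{-1}(I_{mN(T+1)} - \hat\lambda W') R S_N' R'(I_{mN(T+1)} - \hat\lambda W)\Sigma_{\Delta\eta}'^{-1} \right]' D_{mT}\, d\mathrm{vech}\,\Sigma_{\Delta\eta}^{-1} \\ & + N\, \mathrm{vec}\left[ -S_N' R'(I_{mN(T+1)} - \hat\lambda W)\Sigma_{\Delta\eta}'^{-1}(I_{mN(T+1)} - \hat\lambda W') \right]' d\mathrm{vec}\,R \end{aligned} \tag{55}$$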
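where $D_{mT}$ is a duplication matrix (such that $D_k \mathrm{vech}(X) = \mathrm{vec}(X)$ for any symmetric $k \times k$ matrix $X$), $K_{sq}$ is a commutation matrix (such that $K_{sq} \mathrm{vec}(X) = \mathrm{vec}(X')$ for any $s \times q$ matrix $X$),
$$d\mathrm{vech}\,\Sigma_{\Delta\eta}^{-1} = \mathrm{vech}\left[ (A_1 \otimes d\Psi) + (A_2 \otimes d\Omega) \right] \tag{56}$$
$$\begin{aligned} \phantom{d\mathrm{vech}\,\Sigma_{\Delta\eta}^{-1}} = {} & D_{mT}^{-1}(I_T \otimes K_{m,T} \otimes I_m)(\mathrm{vec}A_1 \otimes I_{m^2}) D_{mT}\, \mathrm{vech}\Psi \\ & + D_{mT}^{-1}(I_T \otimes K_{m,T} \otimes I_m)(\mathrm{vec}A_2 \otimes I_{m^2}) D_{mT}\, \mathrm{vech}\Omega \end{aligned} \tag{57}$$
$$= D_{mT}^{-1} B_1 \mathrm{vech}\Psi + D_{mT}^{-1} B_2 \mathrm{vech}\Omega \tag{58}$$
and
$$d\mathrm{vec}\,R = \mathrm{vec}(A_3 \otimes d\Phi) \tag{59}$$
$$= (I_T \otimes K_{m,T} \otimes I_m)(\mathrm{vec}A_3 \otimes I_{m^2})\, d\mathrm{vec}\Phi \tag{60}$$
$$= B_3\, d\mathrm{vec}\Phi \tag{61}$$
with $A_1$, $A_2$, $A_3$, $B_1$, $B_2$ and $B_3$ being matrices of constants reflecting the [...]
In particular,
$$A_1 = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & & \vdots \\ \vdots & & \ddots & \\ 0 & \cdots & & 0 \end{pmatrix} \tag{62}$$
$$A_2 = I_T - A_1 \tag{63}$$
and
$$A_3 = \begin{pmatrix} 0 & \cdots & & & 0 \\ -1 & 0 & & & \vdots \\ 0 & -1 & \ddots & & \\ \vdots & \ddots & \ddots & 0 & \\ 0 & \cdots & 0 & -1 & 0 \end{pmatrix} \tag{64}$$
Defining
$$M_1 = -\Sigma_{\Delta\eta}'^{-1} + \Sigma_{\Delta\eta}'^{-1}(I_{mN(T+1)} - \hat\lambda W') R S_N' R'(I_{mN(T+1)} - \hat\lambda W)\Sigma_{\Delta\eta}'^{-1} \tag{65}$$
and
$$M_2 = -S_N' R'(I_{mN(T+1)} - \hat\lambda W)\Sigma_{\Delta\eta}'^{-1}(I_{mN(T+1)} - \hat\lambda W') \tag{66}$$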
we can write the Jacobian of $Q_N(\vartheta)$ in a partitioned form as
$$DQ_N(\vartheta) = \frac{N}{2} \left[\; \mathrm{vec}(M_1)' B_1 \;:\; \mathrm{vec}(M_1)' B_2 \;:\; 2\,\mathrm{vec}(M_2)' B_3 \;\right] \tag{67}$$
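The duplication and commutation matrices used in these derivatives can be constructed explicitly, which is a convenient way to check vec and vech manipulations numerically. The sketch below assumes column-major (column-stacking) vec and a vech that stacks the lower triangle column by column; the helper names are hypothetical. Note that $D_k \mathrm{vech}(X) = \mathrm{vec}(X)$ holds for symmetric $X$.

```python
import numpy as np

def vec(x):
    """Stack the columns of x into one vector."""
    return x.reshape(-1, order="F")

def vech(x):
    """Stack the lower-triangular part of a square matrix, column by column."""
    k = x.shape[0]
    return np.concatenate([x[j:, j] for j in range(k)])

def duplication_matrix(k):
    """D_k with D_k vech(X) = vec(X) for symmetric k x k X."""
    d = np.zeros((k * k, k * (k + 1) // 2))
    col = 0
    for j in range(k):
        for i in range(j, k):
            d[j * k + i, col] = 1.0   # position of x[i, j] in vec(x)
            d[i * k + j, col] = 1.0   # position of x[j, i] in vec(x)
            col += 1
    return d

def commutation_matrix(s, q):
    """K_sq with K_sq vec(X) = vec(X') for any s x q X."""
    kmat = np.zeros((s * q, s * q))
    for i in range(s):
        for j in range(q):
            # x[i, j] sits at j*s + i in vec(x) and at i*q + j in vec(x')
            kmat[i * q + j, j * s + i] = 1.0
    return kmat

rng = np.random.default_rng(0)
x_sym = rng.standard_normal((3, 3))
x_sym = x_sym + x_sym.T               # symmetric test matrix
z = rng.standard_normal((2, 4))       # rectangular test matrix

print(np.allclose(duplication_matrix(3) @ vech(x_sym), vec(x_sym)))
print(np.allclose(commutation_matrix(2, 4) @ vec(z), vec(z.T)))
```

With these helpers, identities such as (59)-(61) can be checked on small random matrices before being used in analytical derivatives.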
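The second-order differential is
$$\begin{aligned} d^2Q_N = {} & -\frac{N}{2} \mathrm{tr}\, \Sigma_{\Delta\eta}^{-1}(d\Sigma_{\Delta\eta})\Sigma_{\Delta\eta}^{-1}(d\Sigma_{\Delta\eta}) \\ & + N\, \mathrm{tr}\left[ R'(I_{mN(T+1)} - \hat\lambda W)\Sigma_{\Delta\eta}^{-1}\, d\Sigma_{\Delta\eta}\, \Sigma_{\Delta\eta}^{-1}\, d\Sigma_{\Delta\eta}\, \Sigma_{\Delta\eta}^{-1}(I_{mN(T+1)} - \hat\lambda W') R S_N \right] \\ & + 2N\, \mathrm{tr}\left[ dR'(I_{mN(T+1)} - \hat\lambda W)\Sigma_{\Delta\eta}^{-1}\, d\Sigma_{\Delta\eta}\, \Sigma_{\Delta\eta}^{-1}(I_{mN(T+1)} - \hat\lambda W') R S_N \right] \\ & - N\, \mathrm{tr}\left[ S_N' R'(I_{mN(T+1)} - \hat\lambda W)\Sigma_{\Delta\eta}^{-1}\, d\Sigma_{\Delta\eta}\, \Sigma_{\Delta\eta}'^{-1}(I_{mN(T+1)} - \hat\lambda W')\, dR \right] \\ & - N\, \mathrm{tr}\left[ S_N'\, dR'(I_{mN(T+1)} - \hat\lambda W)\Sigma_{\Delta\eta}'^{-1}(I_{mN(T+1)} - \hat\lambda W')\, dR \right] \\ = {} & -\frac{N}{2} (d\mathrm{vech}\,\Sigma_{\Delta\eta}^{-1})' D_{mT}' (\Sigma_{\Delta\eta} \otimes \Sigma_{\Delta\eta}) D_{mT} (d\mathrm{vech}\,\Sigma_{\Delta\eta}) \\ & + N (d\mathrm{vech}\,\Sigma_{\Delta\eta})' D_{mT}' \left[ \Sigma_{\Delta\eta}^{-1}(I_{mN(T+1)} - \hat\lambda W') R S_N R'(I_{mN(T+1)} - \hat\lambda W)\Sigma_{\Delta\eta}^{-1} \otimes \Sigma_{\Delta\eta}^{-1} \right] D_{mT} (d\mathrm{vech}\,\Sigma_{\Delta\eta}) \\ & + 2N (d\mathrm{vec}\,R) \left[ \Sigma_{\Delta\eta}^{-1}(I_{mN(T+1)} - \hat\lambda W') \otimes \Sigma_{\Delta\eta}^{-1}(I_{mN(T+1)} - \hat\lambda W') R S_N \right] D_{mT} (d\mathrm{vech}\,\Sigma_{\Delta\eta}) \\ & - N (d\mathrm{vech}\,\Sigma_{\Delta\eta})' D_{mT}' \left[ \Sigma_{\Delta\eta}^{-1}(I_{mN(T+1)} - \hat\lambda W') R S_N \otimes \Sigma_{\Delta\eta}'^{-1}(I_{mN(T+1)} - \hat\lambda W') \right] (d\mathrm{vec}\,R) \\ & - N (d\mathrm{vec}\,R)' \left[ S_N \otimes (I_{mN(T+1)} - \hat\lambda W)\Sigma_{\Delta\eta}'^{-1}(I_{mN(T+1)} - \hat\lambda W') \right] (d\mathrm{vec}\,R) \\ = {} & \frac{N}{2} \begin{pmatrix} D_{mT}\, d\mathrm{vech}\,\Sigma_{\Delta\eta} \\ d\mathrm{vec}\,R \end{pmatrix}' H \begin{pmatrix} D_{mT}\, d\mathrm{vech}\,\Sigma_{\Delta\eta} \\ d\mathrm{vec}\,R \end{pmatrix} \end{aligned} \tag{68}$$
where
$$H = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix} \tag{69}$$
$$H_{11} = \left( [2M_1 - \Sigma_{\Delta\eta}] \otimes \Sigma_{\Delta\eta} \right) \tag{70}$$
$$H_{12} = \left[ \Sigma_{\Delta\eta}^{-1}(I_{mN(T+1)} - \hat\lambda W') R S_N \otimes \Sigma_{\Delta\eta}'^{-1}(I_{mN(T+1)} - \hat\lambda W') \right] \tag{71}$$
$$H_{21} = S_N \otimes (I_{mN(T+1)} - \hat\lambda W)\Sigma_{\Delta\eta}'^{-1}(I_{mN(T+1)} - \hat\lambda W') \tag{72}$$
$$H_{22} = 2 S_N \otimes S_N^{-1} M_2 \tag{73}$$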
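Using our previous results, we have that
$$\begin{aligned} d^2Q_N &= \frac{N}{2} \begin{pmatrix} B_1 \mathrm{vech}\Psi + B_2 \mathrm{vech}\Omega \\ B_3\, d\mathrm{vec}\Phi \end{pmatrix}' H \begin{pmatrix} B_1 \mathrm{vech}\Psi + B_2 \mathrm{vech}\Omega \\ B_3\, d\mathrm{vec}\Phi \end{pmatrix} \\ &= \frac{N}{2} \begin{pmatrix} B_1 \mathrm{vech}\Psi \\ B_2 \mathrm{vech}\Omega \\ B_3\, d\mathrm{vec}\Phi \end{pmatrix}' \begin{pmatrix} H_{11} & H_{11} & H_{12} \\ H_{11} & H_{11} & H_{12} \\ H_{21} & H_{21} & H_{22} \end{pmatrix} \begin{pmatrix} B_1 \mathrm{vech}\Psi \\ B_2 \mathrm{vech}\Omega \\ B_3\, d\mathrm{vec}\Phi \end{pmatrix} \\ &= \frac{N}{2} \begin{pmatrix} \mathrm{vech}\Psi \\ \mathrm{vech}\Omega \\ d\mathrm{vec}\Phi \end{pmatrix}' \begin{pmatrix} B_1 H_{11} B_1 & B_1 H_{11} B_2 & B_1 H_{12} B_3 \\ B_2 H_{11} B_1 & B_2 H_{11} B_2 & B_2 H_{12} B_3 \\ B_3 H_{21} B_1 & B_3 H_{21} B_2 & B_3 H_{22} B_3 \end{pmatrix} \begin{pmatrix} \mathrm{vech}\Psi \\ \mathrm{vech}\Omega \\ d\mathrm{vec}\Phi \end{pmatrix} \\ &= \frac{N}{2} \begin{pmatrix} \mathrm{vech}\Psi \\ \mathrm{vech}\Omega \\ d\mathrm{vec}\Phi \end{pmatrix}' H^{*} \begin{pmatrix} \mathrm{vech}\Psi \\ \mathrm{vech}\Omega \\ d\mathrm{vec}\Phi \end{pmatrix} \end{aligned} \tag{74}$$
Hence the Hessian is
$$H_{Q_N} = \frac{1}{2} \left[ H^{*} + H^{*\prime} \right] \tag{75}$$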
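7. Appendix B - Nuisance Property of $\lambda$
For the $m = 1$ case, the notation simplifies to:
$$\Sigma_{\Delta\eta} = \begin{pmatrix} \psi & -\sigma^2 & & 0 \\ -\sigma^2 & 2\sigma^2 & \ddots & \\ & \ddots & \ddots & -\sigma^2 \\ 0 & & -\sigma^2 & 2\sigma^2 \end{pmatrix} \tag{76}$$
and $\mathrm{var}[\varepsilon_{i1} - (1 - \phi)\xi_i] = \psi$ and $\mathrm{var}(\varepsilon_{it}) = \sigma^2$. If we assume that $\varepsilon_t \sim N(0, \sigma^2 I_N)$ and $[\varepsilon_1 - (1 - \phi)\xi] \sim N(0, \psi I_N)$ independent of $\varepsilon_t$ for all $1 \le t \le T$, then the log-likelihood function is:
$$\ln L_N(\theta \mid \Delta X_t, \Delta y_t, \Delta y_{t-1}) = -\frac{NT}{2} \ln(2\pi) - \frac{N}{2} \ln|\Sigma_{\Delta\eta}| + T \ln|I_N - \lambda W| + \frac{1}{2} V(\theta)'(\Sigma_{\Delta\eta}^{-1} \otimes I_N) V(\theta) \tag{77}$$
where we define the sample counterpart of $\Delta\eta$ as [...]
with the vector of parameters $\theta = (\phi, \sigma^2, \psi, \lambda)'$. As before, we can express $\Sigma_{\Delta\eta}$ as
$$\Sigma_{\Delta\eta} = \begin{pmatrix} \psi & \sigma^2 a_1' \\ \sigma^2 a_1 & \sigma^2 A_2 \end{pmatrix} \tag{79}$$
where $a_1 = (-1, 0, \ldots, 0)'$ and $A_2$ are a vector and a matrix of constants. The inverse of $\Sigma_{\Delta\eta}$ is then
$$\Sigma_{\Delta\eta}^{-1} = \frac{1}{d} A^{-1} \tag{80}$$
where
$$A^{-1} = \begin{pmatrix} 1 & -a_1' A_2^{-1} \\ -A_2^{-1} a_1 & I_{T-1} - A_2^{-1} a_1 a_1' A_2^{-1} \end{pmatrix} \tag{81}$$
and $d = \psi - \sigma^2 a_1' A_2^{-1} a_1$. Using the same partitioning of $\Sigma_{\Delta\eta}$, we can express its determinant as
$$|\Sigma_{\Delta\eta}| = (\sigma^2)^{T-1} \psi |A_2| + (\sigma^2)^{T} |A_3| \tag{82}$$
where $A_3$ is equal to $A_2$ with the first row replaced by $a_1'$.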
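7.1. Partial Derivatives
The first-order partial derivatives of the likelihood function are:
$$\frac{\partial \ln L}{\partial \phi} = -V'(\Sigma_{\Delta\eta}^{-1} \otimes I_N)[I_T \otimes (I_N - \lambda W)]\,\Delta Y_{-1} \tag{83}$$
$$\frac{\partial \ln L}{\partial \psi} = -\frac{N}{2|\Sigma_{\Delta\eta}|} \left[ (T-1)(\sigma^2)^{T-2} \psi |A_2| + T(\sigma^2)^{T-1} |A_3| \right] + \frac{a_1' A_2^{-1} a_1}{2d^2} V'(A^{-1} \otimes I_N) V \tag{84}$$
where
$$A^{-1} = \begin{pmatrix} 1 & -a_1' A_2^{-1} \\ -A_2^{-1} a_1 & I_{T-1} - A_2^{-1} a_1 a_1' A_2^{-1} \end{pmatrix} \tag{85}$$
and finally
$$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{N}{2|\Sigma_{\Delta\eta}|} \left[ (\sigma^2)^{T-1} |A_2| \right] - \frac{1}{2d^2} V'(A^{-1} \otimes I_N) V \tag{86}$$
The second-order partial derivatives are:
$$\frac{\partial^2 \ln L}{\partial \phi\, \partial \lambda} = V'(\Sigma_{\Delta\eta}^{-1} \otimes I_N)[I_T \otimes W]\,\Delta Y_{-1} \tag{87}$$
$$\frac{\partial^2 \ln L}{\partial \sigma^2\, \partial \lambda} = \frac{a_1' A_2^{-1} a_1}{2d^2} V'(A^{-1} \otimes I_N)[I_T \otimes W(I_N - \lambda W)^{-1}] V \tag{88}$$
$$\frac{\partial^2 \ln L}{\partial \psi\, \partial \lambda} = -\frac{1}{2d^2} V'(A^{-1} \otimes I_N)[I_T \otimes W(I_N - \lambda W)^{-1}] V \tag{89}$$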
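7.2. Probability Limits of the Hessian
The probability limits of the off-diagonal elements of the Hessian are:
7.2.1. $\partial\phi\,\partial\lambda$
$$\underset{N \to \infty}{\mathrm{plim}}\; \frac{1}{N} \frac{\partial^2 \ln L}{\partial \phi\, \partial \lambda} = \underset{N \to \infty}{\mathrm{plim}}\; \frac{1}{N} \Delta\eta'(\Sigma_{\Delta\eta}^{-1} \otimes I_N)(I_T \otimes W)\,\Delta Y_{-1} \tag{90}$$
The expected value of the first part of the above expression is
$$E\, \frac{1}{N} \Delta\eta'(\Sigma_{\Delta\eta}^{-1} \otimes I_N)(I_T \otimes W)\,\Delta Y_{-1} = \frac{1}{N} \mathrm{tr}\left[ E(\Delta Y_{-1} \Delta\eta')(\Sigma_{\Delta\eta}^{-1} \otimes I_N)(I_T \otimes W) \right] \tag{91}$$
To evaluate $E(\Delta Y_{-1} \Delta\eta')$, we first express $\Delta Y_{-1}$ as a function of the model disturbances by recursive substitution:
$$\Delta Y_{-1} = [I_T \otimes (I_N - \lambda W)]^{-1} \sum_{k=1}^{T-1} \phi^{k-1} \Delta\eta_{-k} \tag{92}$$
and hence
$$E(\Delta Y_{-1} \Delta\eta') = [I_T \otimes (I_N - \lambda W)]^{-1} \sum_{k=1}^{T-1} \phi^{k-1} E(\Delta\eta_{-k} \Delta\eta') \tag{93}$$
Now, $E(\Delta\eta_{-k} \Delta\eta')$ has a structure similar to $(\Sigma_{\Delta\eta} \otimes I_N)$:
$$E(\Delta\eta_{-k} \Delta\eta') = (\Sigma_{\Delta\eta_{-k}} \otimes I_N) \tag{94}$$
where the first $kN$ rows of $\Sigma_{\Delta\eta_{-k}}$ are zeros and the remaining $(T-k)N$ rows are the first $(T-k)N$ rows of $\Sigma_{\Delta\eta}$. Therefore, we have
$$\begin{aligned} E(\Delta\eta_{-k} \Delta\eta')(\Sigma_{\Delta\eta}^{-1} \otimes I_N) &= (\Sigma_{\Delta\eta_{-k}} \otimes I_N)(\Sigma_{\Delta\eta}^{-1} \otimes I_N) \\ &= (I_{T-k} \otimes I_N) \end{aligned} \tag{95}$$
where the first $k$ rows of $I_{T-k}$ are zeros and the remaining $(T-k)$ rows are the first $(T-k)$ rows of a $T \times T$ identity matrix. Collecting the results, we have that
$$\begin{aligned} E(\Delta Y_{-1} \Delta\eta')(\Sigma_{\Delta\eta}^{-1} \otimes I_N) &= [I_T \otimes (I_N - \lambda W)]^{-1} \sum_{k=1}^{T-1} \phi^{k-1} E(\Delta\eta_{-k} \Delta\eta')(\Sigma_{\Delta\eta}^{-1} \otimes I_N) \\ &= [I_T \otimes (I_N - \lambda W)]^{-1} \sum_{k=1}^{T-1} \phi^{k-1} (\Sigma_{\Delta\eta_{-k}} \otimes I_N)(\Sigma_{\Delta\eta}^{-1} \otimes I_N) \\ &= [I_T \otimes (I_N - \lambda W)]^{-1} \sum_{k=1}^{T-1} \phi^{k-1} (I_{T-k} \otimes I_N) \\ &= [I_T \otimes (I_N - \lambda W)]^{-1} (\Phi \otimes I_N) \end{aligned} \tag{96}$$
where $\Phi$ is a matrix of zeros except for $\frac{T(T-1)}{2}$ elements below the main diagonal which are powers of $\phi$. Therefore,
$$\begin{aligned} \mathrm{tr}\left[ E(\Delta Y_{-1} \Delta\eta')(\Sigma_{\Delta\eta}^{-1} \otimes I_N)(I_T \otimes W) \right] &= \mathrm{tr}\left[ [I_T \otimes (I_N - \lambda W)]^{-1} (\Phi \otimes I_N)(I_T \otimes W) \right] \\ &= \mathrm{tr}\left[ (I_T \otimes W)[I_T \otimes (I_N - \lambda W)]^{-1} (\Phi \otimes I_N) \right] \\ &= \mathrm{tr}\left[ (I_T \otimes W(I_N - \lambda W)^{-1})(\Phi \otimes I_N) \right] \\ &= \mathrm{tr}\left[ (I_T \Phi \otimes W(I_N - \lambda W)^{-1}) \right] \\ &= T \cdot \mathrm{tr}\left[ W(I_N - \lambda W)^{-1} \right] < T \cdot \bar w \end{aligned} \tag{97}$$
where $\bar w$ is such that $\forall N: \forall\, 0 \le i \le N: \sum_{i=1}^{N} |a_{ij}| < \bar w$, where $a_{ij}$ are elements of $W(I_N - \lambda W)^{-1}$. Existence of such $\bar w$ is due to absolute summability of $W(I_N - \lambda W)^{-1}$. Hence we can conclude that
$$E\, \frac{1}{N} \Delta\eta'(\Sigma_{\Delta\eta}^{-1} \otimes I_N)(I_T \otimes W)\,\Delta Y_{-1} < \frac{T}{N} \bar w \to 0 \tag{98}$$
The limit of the variance of the same expression is
$$\begin{aligned} \underset{N \to \infty}{\mathrm{plim}}\; \mathrm{var}\, \frac{1}{N} \Delta\eta'(\Sigma_{\Delta\eta}^{-1} \otimes I_N)(I_T \otimes W)\,\Delta Y_{-1} &= \underset{N \to \infty}{\mathrm{plim}}\; \frac{1}{N^2} \Delta Y_{-1}'(I_T \otimes W)'(\Sigma_{\Delta\eta}^{-1} \otimes I_N)(I_T \otimes W)\,\Delta Y_{-1} \\ &= \mathrm{tr}\left[ \underset{N \to \infty}{\mathrm{plim}}\; \frac{1}{N^2} \Delta Y_{-1} \Delta Y_{-1}'(I_T \otimes W)'(\Sigma_{\Delta\eta}^{-1} \otimes I_N)(I_T \otimes W) \right] = 0 \end{aligned} \tag{99}$$
by assumption (2) below. Therefore, by the Chebyshev Lemma, the probability limit of $\frac{1}{N} \frac{\partial^2 \ln L}{\partial \phi\, \partial \lambda}$ is zero.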
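7.2.2. $\partial\sigma^2\,\partial\lambda$
For the term $\frac{\partial^2 \ln L}{\partial \sigma^2\, \partial \lambda}$ we have
$$\underset{N \to \infty}{\mathrm{plim}}\; \frac{1}{N} \frac{\partial^2 \ln L}{\partial \sigma^2\, \partial \lambda} = \frac{a_1' A_2^{-1} a_1}{2d^2}\; \underset{N \to \infty}{\mathrm{plim}}\; \frac{1}{N} \Delta\eta'(A^{-1} \otimes I_N)[I_T \otimes W(I_N - \lambda W)^{-1}]\,\Delta\eta \tag{100}$$
To get the needed result, we again show that the expected value and variance of the above expression are zero in the limit, and hence by the Chebyshev Lemma its probability limit is zero:
$$\begin{aligned} E\, \frac{1}{N} \frac{\partial^2 \ln L}{\partial \sigma^2\, \partial \lambda} &= \frac{1}{N} \frac{a_1' A_2^{-1} a_1}{2d^2}\, E\, \mathrm{tr}\left[ \frac{1}{N} (\Delta\eta \Delta\eta')(A^{-1} \otimes I_N)(I_T \otimes W(I_N - \lambda W)^{-1}) \right] \\ &= \frac{1}{N} \frac{a_1' A_2^{-1} a_1}{2d^2}\, \mathrm{tr}\left[ E(\Delta\eta \Delta\eta')(A^{-1} \otimes I_N)(I_T \otimes W(I_N - \lambda W)^{-1}) \right] \\ &= \frac{1}{N} \frac{a_1' A_2^{-1} a_1}{2d^2}\, \mathrm{tr}\left[ (dA \otimes I_N)(A^{-1} \otimes I_N)(I_T \otimes W(I_N - \lambda W)^{-1}) \right] \\ &= \frac{1}{N} \frac{a_1' A_2^{-1} a_1}{2d}\, \mathrm{tr}\left[ I_T \otimes W(I_N - \lambda W)^{-1} \right] \to 0 \end{aligned} \tag{101}$$
since both $A^{-1}$ and $W(I_N - \lambda W)^{-1}$ are absolutely summable. The variance is
$$\begin{aligned} \mathrm{var}\, \frac{1}{N} \frac{\partial^2 \ln L}{\partial \sigma^2\, \partial \lambda} &= \mathrm{var}\, \frac{1}{N} \frac{a_1' A_2^{-1} a_1}{2d^2}\, \mathrm{tr}\left[ (\Delta\eta \Delta\eta')(A^{-1} \otimes I_N)[I_T \otimes W(I_N - \lambda W)^{-1}] \right] \\ &= \frac{1}{N^2} \left( \frac{a_1' A_2^{-1} a_1}{2d^2} \right)^2 \cdot \mathrm{tr}\left[ (I_T \otimes (I_N - \lambda W')^{-1} W')(A'^{-1} \otimes I_N)\, \mathrm{var}(\Delta\eta \Delta\eta')\, (A^{-1} \otimes I_N)[I_T \otimes W(I_N - \lambda W)^{-1}] \right] \end{aligned} \tag{102}$$
If $\Delta\varepsilon_t$ has finite fourth moments, we have
$$\begin{aligned} \mathrm{var}\, \frac{1}{N} \frac{\partial^2 \ln L}{\partial \sigma^2\, \partial \lambda} &< \mathrm{const} \cdot \frac{1}{N^2} \mathrm{tr}\left[ (I_T \otimes (I_N - \lambda W')^{-1} W')(A'^{-1} \otimes I_N)(A^{-1} \otimes I_N)[I_T \otimes W(I_N - \lambda W)^{-1}] \right] \\ &= \mathrm{const} \cdot \frac{1}{N^2} \mathrm{tr}\left[ (I_T \otimes (I_N - \lambda W')^{-1} W' W (I_N - \lambda W)^{-1})(A'^{-1} A^{-1} \otimes I_N) \right] \to 0 \end{aligned} \tag{103}$$
because $(I_N - \lambda W')^{-1} W' W (I_N - \lambda W)$ [...]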
8. References
Ahn, S.C. and P. Schmidt (1995): Efficient Estimation of Models for Dynamic Panel Data, Journal of Econometrics, 68, 5-27.
Anselin, L. (1988): Spatial Econometrics: Methods and Models, Kluwer Academic Publishers, Boston.
Arellano, M. and S.R. Bond (1991): Some Tests of Specification for Panel Data: Monte Carlo Evidence and Application to Employment Equations, Review of Economic Studies, 58, 277-297.
Arellano, M. and O. Bover (1995): Another Look at the Instrumental Variable Estimation of Error-Component Models, Journal of Econometrics, 68, 28-51.
Binder, Michael, Cheng Hsiao and M. Hashem Pesaran (2001): Estimation and Inference in Short Panel Vector Autoregressions with Unit Roots and Coin-tegration, mimeo.
Binder, Michael, Cheng Hsiao, Jan Mutl and M. Hashem Pesaran (2002): Computational Issues in Estimation of Higher-Order Panel Vector Autoregres-sions, mimeo.
Chang, Yoosoon (2001): Nonlinear IV Unit Root Tests in Panels with Cross-Sectional Dependency, mimeo.
Chen, Xiaohong and Timothy Conley (2000): A New Semiparametric Spatial Model for Panel Time Series, mimeo.
Cliff, A. and J. Ord (1973): Spatial Autocorrelation, Pion, London.
Cliff, A. and J. Ord (1981): Spatial Processes, Models and Applications, Pion, London.
Hsiao, C., M. H. Pesaran and A. K. Tahmiscioglu (2001): Maximum Likelihood Estimation of Fixed Effects Dynamic Panel Data Models Covering Short Time Periods, Journal of Econometrics, forthcoming.
Kelejian, Harry H. and Ingmar R. Prucha (1998): A Generalized Spatial Two-Stage Least Squares Procedure for Estimating a Spatial Autoregressive Model with Autoregressive Disturbances, Journal of Real Estate Finance and Economics, 17, 99-121.
Kelejian, Harry H. and Ingmar R. Prucha (1999): A Generalized Moments Estimator for the Autoregressive Parameter in a Spatial Model, International Economic Review, 40, 509-533.
Kelejian, Harry H. and Ingmar R. Prucha (2001): Estimation of Simultaneous Systems of Spatially Interrelated Cross Sectional Equations, mimeo.
Kelejian, Harry H. and Ingmar R. Prucha (2001): On the Asymptotic Distribution of the Moran I Test Statistic with Applications, Journal of Econometrics, 104, 219-257.
O'Connell, Paul G. J. (1998): The Overvaluation of Purchasing Power Parity, Journal of International Economics, 44, 1-19.
Pinkse, Joris (1999): Asymptotic Properties of Moran and Related Tests and Testing for Spatial Correlation in Probit Models, mimeo.
Whittle, P. (1954): On Stationary Processes in the Plane, Biometrika, 41, 434-449.