Estimation and Testing of Higher-Order Spatial Autoregressive Panel Data Error Component Models

(1)

ePub

WU

Institutional Repository

Harald Badinger and Peter Egger

Estimation and Testing of Higher-Order Spatial Autoregressive Panel Data

Error Component Models

Article (Accepted for Publication) (Refereed)

Original Citation:

Badinger, Harald and Egger, Peter (2013) Estimation and Testing of Higher-Order Spatial

Autoregressive Panel Data Error Component Models. Journal of Geographical Systems, 15 (4).

pp. 453-489. ISSN 1435-5930

This version is available at: http://epub.wu.ac.at/5468/

Available in ePubWU: March 2017

ePubWU, the institutional repository of the WU Vienna University of Economics and Business, is

provided by the University Library and the IT-Services. The aim is to enable open access to the scholarly output of the WU.

This document is the version accepted for publication and — in case of peer review — incorporates referee comments.

(2)

Estimation and Testing of Higher-Order Spatial Autoregressive

Panel Data Error Component Models

Harald Badinger

Department of Economics, Vienna University of Economics and Business

Peter Egger

Department of Management, Technology, and Economics at ETH Zürich; CEPR

May 2012

Abstract: This paper develops an estimator for higher-order spatial autoregressive panel data error component models with spatial autoregressive disturbances, SARAR(R,S). We derive the moment conditions and optimal weighting matrix without distributional assumptions for a generalized moments (GM) estimation procedure of the spatial autoregressive parameters of the disturbance process and define a generalized two-stage least squares estimator for the regression parameters of the model. We prove consistency of the proposed estimators, derive their joint asymptotic distribution, and provide Monte Carlo evidence on their small sample performance.

JEL-code: C13, C21, C23

Keywords: Higher-order spatial dependence; Generalized moments estimation; Two-stage least squares; Asymptotic statistics

(3)

I. Introduction

This paper considers the estimation of panel data models with higher-order spatially autocorrelated error components and spatially autocorrelated dependent variables (SARAR). Spatial interactions in data may originate from various sources such as strategic interaction between jurisdictions (to attract firms or other mobile agents) and firms (in their price, quantity, or quality setting) or general equilibrium effects which disseminate with spatial decay due to their transmission through trade flows, migration, or input-output relationships.1

Data sets used in empirical studies often share two features: first, they are available in the form of panel data, with a large cross-sectional and a small time series dimension. Second, spatial interactions of various kinds co-exist – such as geography-related, trade-related, migration-related interactions – or the decay function of a single spatial interaction is unknown. The estimator proposed here addresses these two features in a unified framework and provides a flexible setup for applied work, allowing specification tests, estimation, and inference in higher-order random effects panel data models.

There are two main motivations for the use of higher-order models. First, the distance between two cross-sectional units is not necessarily (only) geographical in nature, a point prominently made in the political science literature by Beck, Gleditsch, and Beardsley (2006). Possible candidates for channels of spatial dependence beyond adjacency or geographical distance between units (such as countries) relate to i) economic distance (e.g., trade, cross-border lending, migration, input-output relationships, profit shifting of multinational firms), ii) socio-economic distance (e.g., differences in per capita income, age structure, ethnic composition of population), iii) cultural distance (language, index of individualism, religion), or iv) political/institutional distance (electoral systems, degree of federalism, voting in international organisations). A higher-order approach allows including several weights matrices that are based on alternative concepts of distance, whose relative importance to each other is unknown. Apart from the fact that assessing the relevance of alternative transmission channels of spillovers is of interest in itself, wrongly imposing a first-order spatial regressive process may misattribute part of the spatial dependence in the data (due to omitted transmission channels) to the single transmission channel included in the model. As a consequence, the estimates of the spatial regressive parameters and their standard errors will then be biased and inconsistent.

Second, even with only one channel of interdependence (e.g., related to geographical distance), the functional form of the true distance decay function, reflected in the elements of the weights matrix, is typically unknown. The standard approach to assume a known spatial

1

See Anselin (2007) and Pinkse and Slade (2010) for surveys on the past, present, and future of spatial econometrics. A recent textbook on the matter is Le Sage and Pace (2009).

(4)

weights matrix with binary elements (e.g., for nearest neighbours) or elements that are specified as a decreasing function of distance (with known functional form and known distance decay parameter) seems highly restrictive. In fact, the distance decay function may exhibit discontinuities (e.g., border effects for interactions between units of different jurisdictions or countries) or a decay which is different from the one that is assumed by the researcher. Then, allowing subsets of the elements of the weights matrix to bear different spatial regressive parameters may significantly reduce the bias rooting in the assumption of an inadequate, preimposed decay function in a single weights matrix

The specification of higher-order models introduces non-trivial issues regarding the specification, interpretation, and estimation of these models. See Elhorst, Lacombe, and Piras (2012) and LeSage and Pace (2012) for a discussion of potential pitfalls in the use of higher-order models. However, these complications can be dealt with, while the alternative to stick with a misspecified first-order model for the sake of simplicity may result in inferior estimates. At the very least, the robustness of the results from a first order specification should be thoroughly explored, not only against variations in the specification of a single weights matrix (as is common in applied work), but also against the inclusion of further weights matrices, whenever economic theory suggests various channels of interdependence.

Estimation and testing of both random and fixed effects spatial regressive panel data models has been considered in the recent literature by Baltagi, Song, and Koh (2003) and Lee and Yu (2010) in a maximum likelihood framework and by Kapoor, Kelejain and Prucha (2007) as well as Mutl and Pfaffermayr (2011) using the generalized moments (GM) approach introduced by Kelejian and Prucha (1999). Obvious advantages of GM over ML estimation are that it does not rely on distributional assumptions and its computational simplicity. Moreover, comprehensive (cross-sectional) Monte Carlo Evidence by Arraiz, Drukker, Kelejian and Prucha (2010) shows that the large sample distribution provides a good approximation to the actual small sample distribution of the GM estimators of spatial regressive models.

The present paper builds on Kapoor, Kelejian, and Prucha (2007). They propose a GM estimator for the parameters of the spatial regressive error process in a random effects panel data model without endogenous explanatory variables (such as spatial lags of the dependent variable), derive a simplified weighting matrix for the moment conditions under the assumption of normally distributed error components, and prove consistency of the GM estimates. They also establish the asymptotic distribution of the regression parameters of the feasible generalized least squares (FGLS) estimates of the parameters of the main equation.

The present paper extends the estimation framework in Kapoor, Kelejian, and Prucha (2007) in several respects. In particular, it makes the following contributions:

(5)

estimates of the regression parameters and the GM estimates of the parameters of the spatial regressive disturbance process.

 Second, we allow for endogenous variables, including spatial lags of the dependent variable in the main equation, which is shown to affect the optimal weighting matrix for the moment conditions and the distribution of the GM estimates.

 Third, we dispense with the assumption of normally distributed error components, used by Kapoor, Kelejian, and Prucha (2007) to derive a simplified weighting matrix of the moments, retaining one of the main advantages of the GM approach of maximum-likelihood estimation.

 Fourth, we allow for higher-order rather than only first-order spatial regressive processes in both the dependent variable and the error process. This enables a more flexible design of the ‘spatial’ interdependence decay function and allows for the co-existence of more than one mode of interdependence as often suggested by economic theory (see Lee and Liu, 2010; and Badinger and Egger, 2011; for a treatment of higher-order spatial models with cross-section data).

 Finally, we provide some Monte Carlo evidence on the small sample performance of the proposed estimation procedure.

The remainder of the paper is organized as follows. Section II introduces the basic model specification. Section III proposes GM estimators of the parameters of spatial dependence in the error components. Section IV derives a two-stage least-squares (TSLS) routine to estimate the regression parameters of the model and derives the asymptotic distribution of all model parameters. Section V presents the results of a Monte Carlo simulation and section VI concludes. The detailed proofs are relegated to a technical appendix.

II. Basic Model Specification and Notation

The basic specification is a generalization of Kapoor, Kelejian, and Prucha (2007), who consider a panel data error components model with nonstochastic explanatory variables and first-order spatial autoregressive disturbances, i.e., a SAR(1) model. The present paper allows for an R-th order spatial autoregressive process in the dependent variable and an S-th order spatial process in the disturbances, i.e., we consider a SARAR(R,S) panel data error components model with i1,...,N cross-sectional units and t1,...,T time periods.2 For time period t, the model reads

2

Except for the ones on error components, the catalogue of assumptions in this paper extends to the case of fixed effects estimation (see Mundlak, 1978, for an early treatment of "within" parameter estimation in the context of an error components model).

(6)

) ( ) ( ) ( ) ( 1 , , t t t t _N R r N N r N r N N N X β W y u y  



   , or (1a) ) ( ) ( ) (t N t N N t N Z δ u y   , (1b)

where yN(t) is an N1 vector with cross-sectional observations of the dependent variable in

year t, XN(t) is an NK matrix of observations on K non-stochastic explanatory variables, i.e., X_N(t)[x₁_,_N(t),...,x_K_,_N(t)] with each N1 vector x_k_,_N(t) denoting the observations on the k-th explanatory variable. The structure of spatial dependence in yN(t) is determined by

the time-invariant NN matrices W_r_,_N, r1,...,R, whose elements w_ij_,_r_,_N are assumed to be known (and often specified as decreasing function of geographical distance). The expression yr,N(t)Wr,NyN(t) is referred to as the r-th spatial lag of yN. The specification

of a higher-order process allows the strength of spatial interdependence in the dependent variable (reflected in the spatial autoregressive parameters _r_,_N, r1,...,R) to vary across a fixed number of R subsets of relations between cross-sectional units.

In Equ. (1b), the N(KR) design matrix is given by Z_N(t)[X_N(t),Y_N(t)], with )] ( ),..., ( [ ) (t 1,N t R,N t N y y

Y  , and δN (βN,λN), where the K1 parameter vector of the

exogenous variables is given by β_N (β₁_,_N,...,β_K_,_N) and the R1 vector of spatial autoregressive parameters of y_N is defined as λ_N (₁_,_N,...,_R_,_N).

The N1 vector of error terms uN(t)[u1,N(t),...,uN,N(t)] is assumed to follow a spatial

autoregressive process given by

) ( ) ( ) ( 1 , , t t t _N S m N N m N m N M u ε u 



   , (1c) ) ( ) (t N N t N μ v ε   , (1d)

where m,N and Mm,N denote the time-invariant, unknown parameters and the known NN

matrix of spatial interdependence, respectively. The structure of spatial correlation in the disturbances is determined by the S different, time-invariant NN matrices M_m_,_N. The expression u_m_,_N(t)M_m_,_Nu_N(t) is referred to as the m-th spatial lag of u_N. The S1 vector of the spatial autoregressive parameters of uN(t) is defined as ρN (1,N,...,S,N).

(7)

) (t

N

ε and v_N(t) are the scalars _it_,_N and v_it_,_N, respectively, and the N1 vector of unit-specific error components is given by μ_N (μ₁_,_N,...,μ_N_,_N).

Stacking observations for all time periods such that t is the slow index and i is the fast index with all vectors and matrices, the model reads

N N N N N N X β Y λ u y    , or (2a) N N N N Z δ u y   , (2b)

with the NTK regressor matrix XN [XN(1),...,XN(T)], and YN (y1,N,...,yR,N), where ] ) ( ),..., 1 ( [ _, _, ,N  rN rN T  r y y

y is the NT1 vector of observations on the r-th spatial lag of the dependent variable y_r_,_N. The NT1 vector of disturbances u_N [u_N(1),...,u_N(T)] for the spatial autoregressive process of order S is given by

N S m N N m T N m N I M u ε u 



  1 , , ( )  , (2c)

where I_T is an identity matrix of dimension TT. The NT1 vector ε_N [ε_N(1),...,ε_N(T)]

is specified as N N N T N e I μ v ε (  )  , (3a)

where e_T is a unit vector of dimension T1 and I_N is an identity matrix of dimension N

N . In light of (2c), the error term can also be written as



        S m N N m N m N T S m N N m T N m N N 1 , , 1 , , (I M )u I (I M )u u ε   . (3b) It follows that



     S m N N m N m N T N 1 1 , , ) ] ( [I I M ε u  , and (4a) N R r N r N r N T N N R r N r N r N T N I I W X t β I I W u y [ ( ) ] ( ) [ ( ) 1] 1 , , 1 1 , ,    



        , (4b)

The following assumptions are maintained throughout this paper. Assumption 1.

(8)

Let T be a fixed positive integer. (a) For all 1tT and 1iN,N1, the error components vit,N are identically and (mutually) independently distributed with E(vit,N)0,

2 2

, )

(v_it_N _v

E  , where 0_v2 b_v , and Ev_it_,_N4  for some 0. (b) For all

1 ,

1iN N , the unit-specific error components _i_,_N are identically and (mutually) independently distributed with E(i,N)0,

2 2 , ) (iN _ E , where 0_2 b_ , and     4 ,N i

E for some  0. (c) The processes {v_it_,_N} and {_i_,_N} are independent of each other. Assumption 1 is slightly stronger than that in Kapoor, Kelejian, and Prucha (2007), since it requires not only the fourth but also the (4)-th moments of the error components to be finite for some 0. This is required to invoke the central limit theorem of Kelejian and Prucha (2010) in the derivation of the asymptotic distribution in section III.

Assumption 1 implies that

2 2 ,

, )

( _it_N _js_N _v

E   _  for i jand ts, (5a)

2 , , ) (_it_N_js_N _ E for i j and ts, (5b) 0 ) ( _it_,_N _js_,_N  E   , otherwise. (5c)

As a consequence, the variance-covariance matrix of the stacked error term εN reads

NT v N T N N N E ε ε J I I Ω_ε 2 2 ,  (  )(  ) , (6a)

where JT eTeT is a TT matrix with unitary elements and INT is an identity matrix of

dimension NTNT. Eq. (6a) can also be written as

N N v N 1, 2 1 , 0 2 , Q Q Ω_ε   , (6b)

where 12 v2 T2. The two matrices Q0,N and Q1,N, which are central to the estimation

of error component models and the moment conditions of the GM estimator, are defined as

N T T N T I J I Q₀_, (  ) , (7) N T N T I J Q1,   . (8)

(9)

Notice that Q₀_,_N and Q₁_,_N are both of order NT NT, symmetric, idempotent, orthogonal to each other, and sum up to I_NT.

Assumption 2.

(a) All diagonal elements of W_r_,_N, r1,...,R, and M_s_,_N, s1,...,S, are zero. Without loss of generality, we assume that they are row-normalized in the following. b) The parameters _r_,_N, r = 1, …, R, and s,N, s = 1, …, S, are finite and contained in the admissible parameter spaces

) , ( , r r N N N r a a      and _, ( s, s) N N N s a a  

   ; with row-normalized matrices, we have 1

1 , 



 R r N r  and 1 1 , 



 S s N s  .3

Assumption 2 ensures invertibility of ( )

1 , ,



  S m N m N m N M I  and ( ) 1 , ,



  R r N r N r N W I  and thus

that y and _N u are uniquely identified by (4a) and (4b). _N

We emphasize that all results of the present paper hold under alternative normalizations of the weights matrices with corresponding modifications of the admissible parameter space (Lee and Liu, 2010). Notice further that the assumptions regarding the admissible parameters space given here and in Lee and Liu (2010) are sufficient but not necessary and might be overly restrictive. A detailed discussion of a possible relaxation of the constraints on the admissible parameter space is provided by Koch (2011a,b), Elhorst, Lacome, and Piras (2012), and LeSage and Pace (2012).

Assumption 3.

The row and column sums of W_r_,_N, r1,...,R, M_s_,_N, s1,...,S, 1

1 , , ) (  



 R r N r N r N W I  , and 1 1 , , ) (  



 S m N m N m N M

I  are bounded uniformly in absolute value.

By Assumptions 1-3 and Remark A.1 in the Appendix, it follows that E(u_N)0 and the variance-covariance matrix of u_N is given by



    _ _ _      S m N m N m N T S m N N m N m N T N N N E 1 1 , , 1 , 1 , , , (u u ) [I (I M ) ]Ω [I (I M ) ] Ωu    , and (9a) 3

All results of the present paper hold under alternative normalizations of the weights matrices with corresponding modifications of the admissible parameter space (Lee and Liu, 2010).

(10)



    _ _     S m N m N m N S m N m N m N v N N t t E 1 1 , , 1 1 , , 2 2 ) ( ) )( ( )] ( ) ( [u u _  I  M I  M . (9b)

All variables (including X_N) and parameters except for the variances of the error components are allowed to depend on sample size N. As a result, the model specification in Eqs. (1a)-(1c) allows for higher-order spatial dependence in the dependent variable, the explanatory variables, and the disturbances.

III. GM Estimation of a SAR(S) Model

Below, we derive GM estimators for the spatial autoregressive parameters of the disturbance process (1c) and the asymptotic joint distribution of all model parameters.

1. Moment Conditions

With an S-th order process (SAR(S), with S 1), GM estimators of 1,N,...,S,N,

2

v

 , and 12

are obtained by recognizing that – under Assumptions 1 and 2 – the moment conditions used by Kapoor, Kelejian, and Prucha (2007) hold for each matrix M_s_,_N, s1,...,S. Define for each Ms,N, s1,...,S ] ) ( )[ ( ) ( 1 , , , , ,



       S m N N m T N m N N s T N N s T N s I M ε I M u I M u ε  . (10)

The moment conditions are then given by

Ma 2 , 0 , 0 ] ) 1 ( 1 [ ] ) 1 ( 1 [ N N N N N N v T N E T N E       ε Q ε v Q v , (11) M1,s ( ) 1 ] ) ( ) 1 ( 1 [ ] ) 1 ( 1 [ _s_,_N ₀_,_N _s_,_N _N ₀_,_N _T _s_,_N _s_,_N ₀_,_N _N _v2 tr _s_,_N _s_,_N N T N E T N E ε Q ε vQ I M M Q v  M M      , M2,s ( ) ] 0 ) 1 ( 1 [ ] ) 1 ( 1 [ _, ₀_,  ₀_,  _, ₀_,      sN N N N N T sN N N T N E T N E ε Q ε v Q I M Q v , Mb 1, 1, ) 12 1 ( ] ) ( 1 [ ) 1 ( N N N  N T T  N N  N N N  N E N E N E ε Q ε μ e e I μ v Q v , M3,s ( ) ] 1 [ ] ) ( 1 [ ) 1 ( _s_,_N ₁_,_N _s_,_N _N _T _T _s_,_N _s_,_N _N _N ₁_,_N _T _s_,_N _s_,_N ₁_,_N _N N E N E N E ε Q ε  μ ee M M μ  vQ I M M Q v ) ( 1 , , 2 1 tr sN sN N M M  , M4,s (1 s,N 1,N N) [1 N( T T  s,N) N] [1 N 1,N( T  s,N) 1,N N]0 N E N E N E ε Q ε μ e e M μ v Q I M Q v ,

(11)

where 12 v2 T2. The moment conditions associated with matrices Ms,N, s1,...,S,

through (10), are indexed with subscripts 1 to 4. The remaining two moment conditions are independent of s and denoted as Ma and Mb. For an S-th order process as in (2c), we thus have (4S2) moment conditions.

It is apparent that under Assumptions 1-3 from M1,s and M3,s that there are potentially

) 1 (S

S further moment conditions, namely s s tr N T N E sN N sN  v sN sN     (  ), 1 ] ) 1 ( 1 [ , , 2 , , 0 , Q ε M M ε  from M1,s and s s tr N N E sN N sN  ( sN sN),   1 ) 1 ( , , 2 1 , , 1 , Q ε M M ε  from M3,s 4

. For simplicity, we use only the moment conditions in (11) below, which are always available for weights matrices satisfying Assumptions 1 and 2. However, the results carry over to the more general estimator using all

2 ) 1 (

4SS S  moment conditions.

Substituting (3b), (10), and (1c) into the 4S2 moment conditions (11) yields a (4S 2) equation system in ( ,..., , , 12)

2 , ,

1   

 N S N v , which can be written as

0 Γ

γN  NbN  , (12)

where b_N is a [2SS(S1)/22]1 vector, given by

) , , ,..., ,..., , ,..., , ,..., ( 12 2 , , 1 , , 1 , 2 , 1 2 , 2 , 1 , , 1    N SN  N SN  N N  NSN S_ NSN v  N b ,

i.e., b_N contains S linear terms _m_,_N, m1,...,S, S quadratic terms _m2_,_N, m1,...,S,

2 / ) 1 (S

S cross products _m_,_N_l_,_N, m1,...,S1,lm1,...,S, as well as _v2 and ₁2. For later reference, we define the (S2)1 vector of all parameters as

) , , ,..., ( ) , , ( 12 2 , , 1 2 1 2     N v   N SN v  N ρ θ . N

γ is a (4S2)1 vector with elements [i,N], i1,...,(4S2), and ΓN is a )

2 4

( S [2SS(S1)/22] matrix with elements [i,j,N], i1,...,(4S2),

4

The efficiency gain from using these additional moment conditions depends on the properties of the weights matrices. If two weights matrices are orthogonal, i.e., M_s_,_NM_s__,_N 0, the corresponding moment condition is trivially satisfied for any set of (finite) parameter values and does not add any information.

(12)

] 2 2 / ) 1 ( 2 [ ,..., 1     S S S

j . The elements _i_,_N and _i_,_j_,_N will be defined below. The row-index of the elements γ_N and Γ_N will be chosen such that the equation system (12) has the following order. The first four rows correspond to moment restrictions M1,1 to M4,1 associated with matrix M₁_,_N through (10); rows five to eight correspond to M1,2 to M4,2 associated with matrix M₂_,_N, and so forth; rows (S4) to 4S correspond to the M1,S to M4,S associated with matrix MS,N . Finally, rows (4S1) and (4S2) correspond to moment conditions Ma and

Mb, respectively, which are independent of s.

The sample analogue to (12) is given by

) ( ~ ~ N N N N N Γ θ γ  b  , (13)

where the elements of γ~ and N ΓN

~

are equal to those of γN and ΓN with the expectations

operator suppressed and the disturbances uN replaced by (consistent) estimates u~ . N

GM estimates of parameters 1,N,...,S,N,

2

v

 and 12are then obtained as the solution to

)] ( ~ ) ( [ )] ~ ~ ( ~ ) ~ ~ [( min arg 2 1 2 2 1, ,.., , , N N N N N N N N N N N N v S θ Θ θ Γ γ Θ Γ γ             b b , (14)

i.e., the parameter estimates can be obtained from a (weighted) non-linear least squares regression of γ~ on the columns of N ΓN

~

. The optimal choice of the (4S2)(4S2)

weighting matrix ΘN will be discussed below.

Below, we define the elements of γ_N and Γ_N, grouped by the corresponding moment conditions, using N N s T N s I M u u_, (  _, ) , s1,...,S, and (15a) N N m N s T N N m T N s T N sm I M I M u I M M u u _, (  _, )(  _, ) (  _, _, ) , s1,...,S, m1,...,S. (15b) M1,s delivers s1,...,S rows 4(s1)1 in (12): ) 1 ( 1 , 1 ) 1 ( 4    _ T N N s  E(u_s_,_NQ₀_,_Nu_s_,_N), (16a) ) ( ) 1 ( 2 , , 0 , , , 1 ) 1 ( 4 s mN E sN N smN T N  u Q u     , m1,...,S,

(13)

) ( ) 1 ( 1 , , 0 , , , 1 ) 1 ( 4s S mN E smN N smN T N  u Q u       , m1,...,S, ) ( ) 1 ( 2 , , 0 , , 2 / ) 1 ( ) 1 ( , 1 ) 1 ( 4s S m mm l mN E smN N slN T N  u Q u           , m1,...,S1, lm1,...,S, ) ( 1 , , , 1 2 / ) 1 ( 2 , 1 ) 1 ( 4s S S S N tr sN sN N M M        , 0 , 2 2 / ) 1 ( 2 , 1 ) 1 ( 4s  SS S  N   . M2,s consists of s1,...,S rows 4(s1)2 in (12): ) ( ) 1 ( 1 , 0 , , 2 ) 1 ( 4s N E sN N N T N  u Q u     , (16b) ) ( ) 1 ( 1 , , 0 , , 0 , , , 2 ) 1 ( 4s mN E smN N N sN N mN T N  u Q u u Q u     , m1,...,S, ) ( ) 1 ( 1 , , 0 , , , 2 ) 1 ( 4s S mN E smN N mN T N  u Q u       , m1,...,S, ) ( ) 1 ( 1 , , 0 , , , 0 , , 2 / ) 1 ( ) 1 ( , 2 ) 1 ( 4s S m mm l mN E slN N mN smN N lN T N  u Q u u Q u           , m1,...,S1, S m l 1,..., , 0 , 1 2 / ) 1 ( 2 , 2 ) 1 ( 4s  SS S  N   , 0 , 2 2 / ) 1 ( 2 , 2 ) 1 ( 4s  SS S  N   . M3,s corresponds to s1,...,S rows 4(s1)3 in (12): N N s 1 , 3 ) 1 ( 4     E(u_s_,_NQ₁_,_Nu_s_,_N), (16c) ) ( 2 , , 1 , , , 3 ) 1 ( 4 s mN E sN N smN N u Q u     , m1,...,S, ) ( 1 , , 1 , , , 3 ) 1 ( 4s S mN E smN N smN N u Q u       , m1,...,S, ) ( 2 , , 1 , , 2 / ) 1 ( ) 1 ( , 3 ) 1 ( 4s S m mm l mN E smN N slN N u Q u           , m1,...,S1, lm1,...,S, 0 , 1 2 / ) 1 ( 2 , 3 ) 1 ( 4s  SS S  N   , ) ( 1 , , , 2 2 / ) 1 ( 2 , 3 ) 1 ( 4s S S S N tr sN sN N M M        . M4,s represents s1,...,S rows 4(s1)4 in (12): ) ( 1 , 1 , , 4 ) 1 ( 4s N E sN N N N u Q u     , (16d) ) ( 1 , , 1 , , 1 , , , 4 ) 1 ( 4s mN E smN N N sN N mN N u Q u u Q u     , m1,...,S,

(14)

) ( 1 , , 1 , , , 4 ) 1 ( 4s S mN E smN N mN N u Q u       , m1,...,S, ) ( 1 , , 1 , , , 1 , , 2 / ) 1 ( ) 1 ( , 4 ) 1 ( 4s S m mm l mN E slN N mN smN N lN N u Q u u Q u           , m1,...,S1, lm1,...,S, 0 , 1 2 / ) 1 ( 2 , 4 ) 1 ( 4s  SS S  N   , 0 , 2 2 / ) 1 ( 2 , 4 ) 1 ( 4s  SS S  N   .

Ma reflects the equation in row (4S1) of (12):

) ( ) 1 ( 1 , 0 , 1 4S N E N N N T N  uQ u    , (16e) ) ( ) 1 ( 2 , 0 , , , 1 4S mN E mN N N T N  u Q u    , m1,...,S, ) ( ) 1 ( 1 , , 0 , , , 1 4S S mN E mN N mN T N  u Q u      , m1,...,S, ) ( ) 1 ( 2 , , 0 , , 2 / ) 1 ( ) 1 ( , 1 4S S m mm l mN E mN N lN T N  u Q u          , m1,...,S1, lm1,...,S, 1 , 1 2 / ) 1 ( 2 , 1 4S SS S  N   , 0 , 2 2 / ) 1 ( 2 , 1 4S SS S  N   .

Mb is associated with row (4S2) of (12):

) ( 1 , 1 , 2 4S N E N N N N uQ u    , (16f) ) ( 2 , 1 , , , 2 4S mN E mN N N N u Q u    , m1,...,S, ) ( 1 , , 1 , , , 2 4S S mN E mN N mN N u Q u      , m1,...,S, ) ( 2 , , 1 , , 2 / ) 1 ( ) 1 ( , 2 4S S m mm l mN E mN N lN N u Q u          , m1,...,S1, lm1,...,S, 0 , 1 2 / ) 1 ( 2 , 2 4S SS S  N   , 1 , 2 2 / ) 1 ( 2 , 2 4S SS S  N   .

For future reference, we define the (2S1)1 vector γ0_N as the sub-vector containing rows s and (s1), s 1,...,S and row (4S1)of γ_N, corresponding to M1,s, M2,s, and Ma. Moreover, we define the (2S1)[2SS(S1)/21] matrix Γ0N as the sub-matrix

containing rows s and (s1), s 1,...,S, and row (4S1) of Γ_N, corresponding to M1,s, M2,s, and Ma.

(15)

Analogously, we define the (2S1)1 vector γ1N as the sub-vector containing rows 2s,

) 1 2

( s , s1,...,S, and row (4S2) of γN, corresponding to M3,s, M4,s, and Mb. Finally, we

define the (2S1)[2SS(S1)/21] matrix Γ1_N as the sub-matrix containing rows 2s,

) 1 2

( s , s1,...,S, and (4S2) of Γ_N, corresponding to M3,s, M4,s, and Mb.

2. Definition of GM Estimators

We next define three alternative GM estimators for the spatial autoregressive parameters of the disturbance process given by (1c) and the variances of the error components.5

2.1. Initial GM Estimation

The initial GM estimator is a special case of (14), using the identity matrix as weighting matrix Θ_N and a subset of moment conditions (Ma, M1,s and M2,s) only. It is based on γ0N and

0

N

Γ . Define θ0_N as the corresponding parameter vector that excludes ₁2, i.e., ) , ,..., ( ) , ( 1, , 2 2 0 v N S N v N        ρ θ , and accordingly ) , ,..., ,..., , ,..., , ,..., ( ₁_, _, ₁2_, 2_, ₁_, ₂_, ₁_, _, ₁_, _, 2 0    N SN v S N S N N N N S N N S N N            b .

The initial GM estimator is then obtained as the solution to

} ] , 0 [ , ), ( ) ( min{ arg ) , ,..., ( 2, 0 0 0 0 2 , , 1,N S N vN  N N N N    vN bv    a ρ a θ θ    , (17a) with N0(θ0)N0(ρ,2v) ) ~ ~ (γ_N0 Γ0_Nb0 .

Using these initial estimates of (₁_,_N,...,_S_,_N) and _v2, ₁2 can be estimated from moment condition Mb:



      S m N m N m N N S m N m N m N N N 1 , , , 1 1 , , 2 , 1 ) ~ ~ ( ) ~ ~ ( 1 u u Q u u      (17b) ... ~ ~ ... ~ ~ 2 , 1 1 , 2 4 , , 2 4 , 1 1 , 2 4 2 4S γ S N γ S S SN γ S S N γ _  _    _   _ _   . ~ ... ~ ~ , , 1 2 / ) 1 ( 2 , 2 4 , 2 , 1 1 2 , 2 4 2 , 2 , 2 4S S SN γ S S N N γ S S S S S N S N γ _   _ _    _ _ _  _   2.2. Weighted GM Estimation

While the initial GM estimator as defined in (17) is consistent, it is inefficient. First, it ignores the information contained in moment conditions (Mb, M3,s and M4,s). Second, as is known from the literature on GMM-estimation, it is optimal to use as weighting matrix the inverse of the (properly normalized) variance-covariance matrix of the moments, evaluated at true

5

See Kapoor, Kelejian, and Prucha (2007) for analogous conditions under SARAR(0,1) estimation, assuming only nonstochastic regressors in equation (1a).

(16)

parameter values. Denote the optimal weighting matrix, which will be derived in Subsection 3.2, by Ψ_N1 and its estimate by Ψ~_N1. The optimally weighted GM estimator is based on all

) 2 4

( S moment conditions and uses Θ~_N Ψ~_N1 as the weighting matrix. It is defined as } ] , 0 [ ], , 0 [ , ), ( ~ ) ( { min arg ) ~ , ~ , ~ ,..., ~ ( 12 2 2 1, 2 , , 1,N SN vN  N  N  NN    v bv   c    a ρ a θ Θ θ , with cbvTb_, and ( ) ( , , ) 2 1 2    N θ N ρ v ) ~ ~ (γN ΓNb . (18)

As already mentioned, the optimal weighting matrix is derived without distributional assumptions and involves third and fourth moments of the error components v_it_,_N and _i_,_N. Kapoor, Kelejian, and Prucha (2008) use the assumption that ε_it_,_N is normally distributed to obtain a simplified weighting matrix as an approximation of the true optimal weighting matrix. For comparison, we also consider such a weighting matrix, which is a special case of

1



N

Ψ (see the Appendix) and referred to as (Ψ_N)1. The simplified weighted GM estimator is defined as the weighted GM estimator given in (18), using Θ~_N (Ψ~_N)1.

3. Asymptotic Properties of the GM Estimator for θ _N 3.1 Consistency

For proving consistency, the following additional assumptions are introduced:

Assumption 4.

Assume that u~_N u_N D_NΔ_N, i.e., u~_i_,_N u_i_,_N d_i_.,_NΔ_N, for i1,...,NT,6 where D_N is an P

NT matrix, the 1P vector d_i_.,_N denotes the i-th row of D_N and Δ_N is a P1 vector. Let dij,N be the j-th element of di.,N. For some  0, we assume that  

 d N ij t c d E , ( )2  ,

where c_d does not depend on N, and that N1/2 Δ_N O_p(1).

Assumption 4 will hold in many settings, e.g., if model (1a) contains endogenous variables (such as spatial lags of yN) and is estimated using 2SLS. In that case, ΔN denotes the

difference between the parameter estimates and the true parameter values and d_i_.,_N is the (negative of the) i-th row of the design matrix Z_N (compare Lemma 1 in Subsection 2 of Section IV).

Assumption 5.

6

(17)

(a) The smallest eigenvalues of Γ0_NΓ0_N and Γ1_NΓ1_N are bounded away from zero, i.e., 0 ) ( * min     i N i N Γ

Γ for i = 1, 2. (b) Θ~N ΘN op(1), where ΘN are (4S2)(4S2)

nonstochastic, symmetric, positive definite matrices. (c) The largest eigenvalues of Θ_N are bounded uniformly from above, and the smallest eigenvalues of Θ_N are bounded uniformly away from zero.

Assumption 5 implies that the smallest eigenvalues of Γ_NΓ_N and Γ_NΘ_NΓ_N are bounded uniformly away from zero, ensuring that the true parameter vector θ_N is identifiable unique. Moreover, by the equivalence of matrix norms, it follows from Assumption 5 that Θ_N and

1



N

Θ are O(1).

Assumptions 1-5 ensure consistency of the GM estimators for θ_N (ρ_N,_v2,₁2) as summarized in the following theorems (see Appendix B for a proof).

Theorem 1a. Consistency of Initial GM Estimator ~θ_N0

Suppose Assumptions 1-5 hold. Then, provided the optimization space contains the parameter space, the initial GM estimators θ0_N (ρ₁_,_N,...,ρ_S,_v2_,_N) defined by (17a), and ₁2_,_N, defined by (17b) are consistent for _1,_N,...,_S_,_N, _v2, and ₁2, i.e.,

0 , s, p N s N    , s1,...,S, _v2_,_N _v2p 0 , and ₁2_,_N ₁2p 0 as N.

Theorem 1b. Consistency of Weighted GM Estimator θ~_N

Suppose Assumptions 1-5 hold. Then, provided the optimization space contains the parameter space, the weighted GM estimators ~ (~ ) [~ (~ ),...,~ (~ ),~ (~ ),~2 (~ )]

, 1 2 , , , 1   N N SN N vN N N N N N Θ ρ Θ ρ Θ Θ Θ θ  

defined by (18) are consistent for 1,N,...,S,N,v2, and

2 1  , i.e., 0 ) ~ ( ~ , s, p N s N N    Θ , s1,...,S, ~2, (~ ) 2 0 p v N N v    Θ , and ~ (~ ) 12 0 2 , 1 p N N    Θ as N .

This result holds for an arbitrary weighting matrix (satisfying Assumption 5). Hence, it applies to both the optimally weighted GM estimator defined by (18) with Θ~_N (Ψ~_N)1and its simplified variant θ~_N with Θ~_N (Ψ~_N)1.

3.2 Asymptotic Distribution of GM Estimator for θ_N

In the following we consider the asymptotic distribution of the optimally weighted GM estimator θ~_N. To establish asymptotic normality of θ~_N (~ρ_N,~_v2_,_N,~₁2_,_N), we introduce some additional assumptions.

(18)

Assumption 6.

Let DN be defined as in Assumption 4, such that u~N uN DNΔN. For any real NTNT

matrix AN, whose row and column sums are bounded uniformly in absolute value, it holds

that N 1 N N N N 1E( N N N)op(1)   u A D u A D .

A sufficient condition for Assumption 6 is, e.g., that the columns of D_N are of the form

N N

N Π ε

π  , where the elements of π_N are bounded uniformly in absolute value and the row and column sums of Π_N are bounded uniformly in absolute value (see Kelejian and Prucha, 2010, Lemma C.2). This will be the case in many applications, e.g., for the model in Eq. (1a), if D_N equals (the negative of) matrix Z_N (compare Lemma 1 in Section IV).

Assumption 7.

Let Δ_N be defined as in Assumption 4. Then, ) 1 ( ) ( ) (NT 1/2Δ_N  NT 1/2T_Nξ_N o_p , with T_N (T_v_,_N,T__,_N), ξ_N (v_N,μ_N), i.e., ) 1 ( ) ( ) ( ) (NT 1/2Δ_N  NT 1/2T_v_,_Nv_N  NT 1/2T__,_Nμ_N o_p ,

where TN is an (NTN)P-dimensional real nonstochastic matrix whose elements are

bounded uniformly in absolute value; T_v_,_N is of dimension (NTP) and T__,_N is of dimension (NP). As remarked above, Δ_N typically denotes the difference between the parameter estimates and the true parameter values. Assumption 7 will be satisfied by many estimators. In Section IV, we verify that it holds if the model in Eq. (1a) is estimated by TSLS or feasible generalized TSLS (FGTSLS).

The limiting distribution of the GM estimator of θNwill be shown to depend on (the inverse

of) the matrix J_NΘ_NJ_N and the variance-covariance matrix of a vector of quadratic forms in

N

v and μN, denoted as qN. We consider each of these expressions in the following. The ) 2 ( ) 2 4

( S  S matrix JN of derivatives of the (4S2)1 vector of moment conditions in

(11) is given by θ Γ γ θ J      ( ) ) ( N N N N N b ) , , ,..., ( _,₁_, _, _, _, _, _, _, 1 N i N i N S i N i j j j j v    , with (19a)  N s i j_, _, s N N i N i    (γ., Γ., b ) , i1,...,(4S2), s1,...,S,

(19)

 N i v j_,_ _, v N N i N i    (γ_., Γ_., b ) , i1,...,(4S2),  N i j,₁, 1 ., ., ) (     γ_i _N Γ_i _Nb_N , i1,...,(4S2),

where γ_i_.,_N and Γ_i_.,_N denote the i-th row of γ_N and Γ_N respectively.

Using 0 θ γ     _N

and ignoring the negative sign, we have

N N N N N Γ b Γ B θ ρ J      ) ( , (19b)

where Γ_N is defined above and of dimension (4S2)[2SS(S1)/22] and B_N is a

) 2 ( ] 2 2 / ) 1 ( 2

[ SS S   S matrix of the form

) , , , ( 1 2, 3, 4,   N N N N B B B B B , (20a) with B1 (IS,0S2), 2, [ 1(2 s,N), S2)] S s N diag  0 B , and ] , ) ,..., [( ₃_,₁_, ₃_, ₁_, ₍ ₁₎_/₂ ₂ , 3N  B N BS N 0S S  B is an S(S1)/2(S2) matrix. The (S1) vertically arranged blocks, B₃_,_m_,_N, m1,...,(S1), have the following structure:

) , , ( _, _, _, , , 3mN CmN dmN EmN B  , (20b)

where C_m_,_N is a (Sm)(m1) matrix of zeros,7 d_m_,_N is a (Sm)1 vector, defined as )

,...,

( ₁_, _,

,N  m N S N 

m  

d , and E_m_,_N _m_,_NI_S__m. Finally, B₄_,_N is a 2(S2)matrix, defined as           1 , 0 , 1 , 1 1 1 , 4 S S N 0 0 B . (20c)

We next consider the vector q_N and its limiting distribution. First, define q_N(θ_N,Δ_N) as the

1 ) 2 4

( S  vector of sample moments as given by (11) with the expectation operator suppressed, evaluated at the true parameter values, and ignoring the deterministic constants:

7

(20)

 ) , ( N N N θ Δ q                                              N N b N N N a N N N S N N N S N N N S N N N S N N N N N N N N N N N N N N u C u u C u u C u u C u u C u u C u u C u u C u u C u u C u ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ . ~ ~ ~ ~ ~ ~ ~ ~ , , , , 4 , , 3 , , 2 , , 1 , 1 , 4 , 1 , 3 , 1 , 2 , 1 , 1 1 , (21) where



           S m N m N m N T N N s N s T N S m N m N m N T N s T 1 , , , 0 , , , 0 1 , , , , 1 [ ( )] ( ) [ ( )] ) 1 ( 1 M I I Q M M I Q M I I C   ,



            S m N m N m N T N N s N s T N S m N m N m N T N s T 1 , , , 0 , , , 0 1 , , , , 2 [ ( )] [ ( )] [ ( )] ) 1 ( 2 1 M I I Q M M I Q M I I C   ,



          S m N m N m N T N N s N s T N S m N m N m N T N s 1 , , , 1 , , , 1 1 , , , , 3 [I (I M )]Q (I M M )Q [I (I M )] C   ,



           S m N m N m N T N N s N s T N S m N m N m N T N s 1 , , , 1 , , , 1 1 , , , , 4 [ ( )] [ ( )] [ ( )] 2 1 M I I Q M M I Q M I I C   ,



         S m N m N m N T N S m N m N m N T N a T 1 , , , 0 1 , , , [ ( )] [ ( )] ) 1 ( 1 M I I Q M I I C   ,



        S m N m N m N T N S m N m N m N T N b 1 , , , 1 1 , , , [I (I M )]Q [I (I M )] C   . (22)

By Assumption 3 and Remark A.1 in Appendix A, the row and column sums of the symmetric NTNT matrices C_p_,_s_,_N, p1,...,4, s1,...,S, C_a_,_N, and C_b_,_N are bounded uniformly in absolute value. Also, note that C₃_,_s_,_N, C₄_,_s_,_N, C_b_,_N differ from C₁_,_s_,_N, C₂_,_s_,_N,

N a,

C only by the normalization and the use of Q₁_,_N versus Q₀_,_N.

In light of (21) and Lemma B.1 (see Appendix B), the elements of N1/2qN(ρN,ΔN) can be expressed as

(21)

 ) , ( 2 / 1 N N N N q θ Δ (1) . 2 / 1 , , 2 / 1 2 / 1 , , 2 / 1 2 / 1 , , 4 , , 4 2 / 1 2 / 1 , , 3 , , 3 2 / 1 2 / 1 , , 2 , , 2 2 / 1 2 / 1 , , 1 , , 1 2 / 1 2 / 1 , 1 , 4 , 1 , 4 2 / 1 2 / 1 , 1 , 3 , 1 , 3 2 / 1 2 / 1 , 1 , 2 , 1 , 2 2 / 1 2 / 1 , 1 , 1 , 1 , 1 2 / 1 p N N b N N a N N N a N N a N N N S N N S N N N S N N S N N N S N N S N N N S N N S N N N N N N N N N N N N N N N N N N N N N o N N N N N N N N N N N N N N N N N N N N                                                                              Δ α u C u Δ α u C u Δ α u C u Δ α u C u Δ α u C u Δ α u C u Δ α u C u Δ α u C u Δ α u C u Δ α u C u , (23) where 2 ( ,, ) 1 , ,sN N psN N p N E D C u α    , p1,...,4, s1,...,S, 2 ( , ) 1 ,N N aN N a N E D C u α    , and ) ( 2 , 1 ,N N bN N b N E D C u α   

. By Lemma B.1 the elements of the P1 vectors αp,s,N,

4 ,..., 1



p , s1,...,S, α_a_,_N and α_b_,_N are bounded uniformly in absolute value.

Using (22), (3c), Assumption 7, and Q₀_,_Nε_N Q₀_,_Nv_N we obtain:

) , ( 2 / 1 N N N N q θ Δ (1) ) 1 ( 1 . )] ( [ 2 1 ) ( )] ( [ ) 1 ( 2 1 ) ( ) 1 ( 1 , , 1 , , 0 , , 4 , 1 , , , 1 , , 3 , 1 , , , 1 , , 2 , 0 , , , 0 , , 1 , 0 , , , 0 2 / 1 p N N b N N N N N a N N N N N s N N N s N s T N N N N s N N N s N s T N N N N s N N N s N s T N N N N s N N N s N s T N N o T T T N                                                                 ξ a ε Q ε ξ a v Q v ξ a ε Q M M I Q ε ξ a ε Q M M I Q ε ξ a v Q M M I Q v ξ a v Q M M I Q v , (25)

for s1,...,S. The (NT N)1 vector ξ_N (v_N,μ_N), a_p_,_s_,_N T1T_Nα_p_,_s_,_N, p1,...,4, S

s1,..., , a_a_,_N T1T_Nα_a_,_N, and a_b_,_N T1T_Nα_b_,_N, which can also be written as ] ) ( , ) [( ) , ( _,_, _, _, 1 _, _,_, _, _,_, , ,        N s p N N s p N v N s p v N s p N s p a a T T α T α a  _ , s1,...,S, p1,...,4, and ] ) ( , ) [( ) , ( _, _, 1 _, _, _, _, ,        N a N N a N v N a v N a N a a a T T α T α a  _ , ] ) ( , ) [( ) , ( , , , , 1 , , ,        N b N N b N v