Parameter estimation and inference with spatial lags and cointegration

(1)

Full Terms & Conditions of access and use can be found at

https://www.tandfonline.com/action/journalInformation?journalCode=lecr20

Econometric Reviews

ISSN: 0747-4938 (Print) 1532-4168 (Online) Journal homepage: https://www.tandfonline.com/loi/lecr20

Parameter estimation and inference with spatial

lags and cointegration

Jan Mutl & Leopold Sögner

To cite this article: Jan Mutl & Leopold Sögner (2019) Parameter estimation and inference with spatial lags and cointegration, Econometric Reviews, 38:6, 597-635, DOI: 10.1080/07474938.2017.1382803

To link to this article: https://doi.org/10.1080/07474938.2017.1382803

View supplementary material

Accepted author version posted online: 21 Sep 2017.

Published online: 08 Nov 2017. Submit your article to this journal

Article views: 451

(2)

Parameter estimation and inference with spatial lags and

cointegration

Jan Mutla_{and Leopold Sögner}b

a_{EBS Real Estate Management Institute, EBS Business School, Germany;}b_{Institute for Advanced Studies and Vienna} Graduate School of Finance, Austria

ABSTRACT

This article studies dynamic panel data models in which the long run outcome for a particular cross-section is affected by a weighted average of the outcomes in the other cross-sections. We show that imposing such a structure implies a model with several cointegrating relationships that, unlike in the standard case, are nonlinear in the coefficients to be estimated. Assuming that the weights are exogenously given, we extend the dynamic ordinary least squares methodology and provide a dynamic two-stage least squares estimator. We derive the large sample properties of our proposed estimator under a set of low-level assumptions. Then our methodology is applied to US financial market data, which consist ofcredit defaultswap spreads, as well as firm-specific and industry data. We construct the economic space using a “closeness” measure for firms based on input–output matrices. Our estimates show that this partic-ular form of spatial correlation of credit default swap spreads is substantial and highly significant.

KEYWORDS

Cointegration; credit risk; dynamic ordinary least squares; spatial autocorrelation

JEL CLASSIFICATION

C31; C33; G13

1. Introduction

This article investigates the estimation of nonstandard cointegrating relationships under the presence of regressor endogeneity and serial correlation in the disturbances. Following literature on panel cointegration, we augment the cointegrating vectors by peer or neighborhood effects, which are modeled as spatial lags following Cliff and Ord (1973). In addition to the kind of endogeneity typically dealt with in panel cointegration models (Mark and Sul,2003; Mark et al.,2005; Baltagi,2008; Pesaran,2015), this article shows that the spatial lag results infurther regressor endogeneityof a different type.

Several approaches have emerged in the literature to estimate cointegrating relationships and to perform statistical inference. One possibility is to use a simple estimation routine, e.g.,ordinary least squares(OLS) and then work out the (sometimes complicated) large sample distribution of the estimated parameters (Phillips and Hansen,1990; Phillips and Loretan,1991). Another opportunity is to adjust the estimation routine, such that the large sample distribution is either simpler or free of nuisance parameters. Examples along these lines are thefully modified least squaresestimator (see, e.g., Phillips and Hansen, 1990; Phillips and Moon, 1999; Pedroni, 2000), the integrated modified least squares estimator (Vogelsang and Wagner, 2014, in which integrated modified least squares estimation is linked to fixed-b inference) and thedynamic least squaresapproach. Dynamic least squares estimation includes time-series leads and lags of the first differences of the regressors to control for the serial correlation and regressor endogeneity. This kind of estimator has been proposed by Phillips and Loretan (1991), Saikkonen (1991), and Stock and Watson (1993). It has been applied to panel data, e.g., in

CONTACTLeopold Sögner [email protected] Institute for Advanced Studies and Vienna Graduate School of Finance, Josefstädter Straße 39, 1080 Vienna, Austria.

Supplemental data for this article can be accessed on thepublisher’s website.

Published with license by Taylor & Francis Group, LLC © Jan Mutl and Leopold Sögner

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http:// creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.

2019, VOL. 38, NO. 6, 597–635

(3)

Kao and Chiang (2000), Mark and Sul (2003) and Mark et al. (2005), in which a seemingly unrelated regression (SUR) type model is considered (for an overview on panel cointegration, see, e.g., Pesaran,

2015, Chapter 31). A further alternative is provided by the autoregressive distributed lag approach, in which cointegrating relationships including lagged variables can be investigated (the reader is referred to Pesaran and Shin,1995; Binder et al.,2005; Chudik and Pesaran,2013a,b). Almost recently, spatial correlation and cointegration have been investigated in Yu et al. (2008) and Sögner and Wagner (2017). While Yu et al. (2008) considered maximum-likelihood estimation in a spatial dynamic panel model, Sögner and Wagner (2017) and Sögner et al. (2017) investigatefully modifiedas well asintegrated modified OLSestimation in a model close to the model presented in this article.

This article develops an econometric tool suitable for investigating situations, in which the long run outcome for a particular cross-section cannot be assumed to be independent of the outcomes of the other cross-sections and, at the same time, autocorrelation of the disturbances and regressor endogeneity are present. To adequately cope with the endogeneity arising from the inclusion of the spatial lags, we propose using adynamic two-stage least squares(D2SLS) estimator, which combinesdynamic least squares(DOLS) andtwo stage least squares(2SLS) estimation. Section2describes the model assumptions and Section3obtains the large sample distribution of our estimator and shows how to correctly conduct inference. The finite sample properties are investigated by a Monte Carlo study in Section4.

Finally, Section5applies our methodology to a financial dataset, in which we construct the economic space using a “closeness” measure for firms based on input–output matrices. The weights matrix obtained from input–output data should approximate possible correlation patterns due to technology and demand shocks working their way through the economy. We find that our particular form of cross-sectional spillovers is substantial and highly significant.

2. The model and assumptions

Suppose that the data are generated from1 yit=ρ

n X

j=1

Wijyjt+β′IxIit+β′CxCt+β′LxLi+αi+λt+u†_it =ρyit∗+β′xit+β′LxLi+αi+λt+u†_it, (1)

whereyitis the scalar response random variable andxitis ak×1 column vector of prediction random

variables. The vectorxit is integrated of order one (I(1)) and permitted to consist ofindividual specific

regressorsxIitof dimensionkI ≥ 1, andcross-sectionally common regressorsxCtof dimensionkC ≥ 0.

The regressorsxLt, of dimensionkL≥0, are time invariant. Hence,β:= β′I,β′C _′ ∈Rk_{, with}_β I ∈RkI, β_C _∈ RkC _and_k ₌ _k_I₊_k_C_{. In addition,}_x_it _:₌ _x′ Iit,x′Ct _′

∈ Rk_{. The time index is}_t ₌ _1,_{. . .}_,_T,

whilei₌1,. . .,nis the cross-sectional index. The individual and time effects,αi,i=1,. . .,n, andλt,

t ₌1,. . .,T, are (as in fixed effects model) allowed to be correlated withxit andxLi. The termy∗it :=

Pn

j=1Wijyjtis referred to as aspatial lagand represents the contemporaneous impact of the neighboring

observations onyit(see, e.g., Cliff and Ord,1973; Kelejian and Prucha,1998,1999; Kapoor et al.,2007). 1_{In this article, the following notation will be applied: For vectors and matrices we use boldface. If not otherwise stated, the} vectors considered are column vectors. Given arM×cMmatrixM, the termM(ra:rb,ca:cb)=[M](ra:rb,ca:cb)stands for “from

rowrato rowrband from columncato columncbof matrixM”. Thei,jelement ofMis [M](i,j)or in shorter notationMij. 0(a×b)and1(a×b)stands fora×bmatrix of zeros and ones;0(a)and is used to abbreviate0(a×1);Iais thea×aidentity matrix, whileI(·)stands for an indicator function. Given a vectorx∈Rn,diag(x)transformsxinto an×ndiagonal matrix,

while forxi∈Rℓ×k(i=1,. . .,n),diag(x1,. . .,xn)yields anℓ×nkblock diagonal matrix.⊗denotes the Kronecker product. 2 E-3 stands for 2_·10−3₌_{0.002. [}_Tr_{] denotes the integer part of}_Tr_,_⇒_{stands for}_{weak convergence}_{(see, e.g., Klenke,}₂₀₀₈_, Chapter 13), while_→P anda.s._→denote convergence in probability and almost sure convergence (see, e.g., Klenke,2008, Chapter 6).ν_∼N₍a,A)denotes thatνis a normally distributed vector with meanaand covarianceA, whileχ_n2denotes a

χ2random variable withndegrees of freedom. Variables where the within-transform described in (9), (12), or (14) is applied, are—if not otherwise stated—denoted by the superscripts_˜,_`, and_˘, respectively.1xt=xt−xt−1abbreviates the first difference ofxt.

(4)

We collect the weightsWijinto ann×nspatial weights matrixWand follow the spatial econometrics

literature by maintaining the following assumption regarding the cross-sectional (or spatial) structure of the model2:

Assumption 1(Spatial Lag). The spatial weights Wijare nonstochastic and observable with Wii=0and W6=0(n×n). The parameterρis such that the largest absolute eigenvalue ofρWis smaller than one.

The restriction thatWii =0 is a normalization of the model, which requires that no observation is its

own neighbor. The second part of the assumption guarantees that the matrix(In−ρW)is invertible (see,

e.g., Horn and Johnson,1985, Corollary 5.6.16). That is, the inverseK:=(In−ρW)−1is well-defined.3

The invertibility of the matrix(In−ρW)is needed to provide a unique solution of the model and rule

out multiple solutions foryitthat would be consistent with the explanatory variables and disturbances.

To obtain the prediction variables, letxIt :=

x′_Iit_i₌_1,_..._,_n _′

∈RkI·n_and_x

t := x′_It,x_Ct′ ′ ∈RkIn+kC.

We assume thatxtfollows a vector random walk process, i.e.,xt =xt−1+vt, wherev_It:=1xIt,v_Ct:=

1xCtandvt := v′It,v′Ct _′

. In addition, we definev_Iit:₌1xIitandvit := v′Iit,v′Ct _′

∈Rk_{. Together with}

the noise termsu†_t :₌ u†₁_t,. . .,u†_nt′ _∈ Rn_{, we define}_η†

Iit := u†it,vIit′ ′ ∈ RkI+1_,_η† It := u†t′,v′It ′ ∈ R(kI+1)·n_and_η† t := u†t′,v′It,v′Ct ′

∈R(kI+1)·n+kC_{. The (auto-)}_covariance_{matrices of}_η†

t, abbreviated by

Ŵ†_ℓ _∈R((kI+1)·n+kC)×((kI+1)·n+kC)_{, for any lag}_ℓ_∈_Z_{, are}

Ŵ†_ℓ :₌E η_t†₋_ℓη†_t′₌ Ŵ†_ℓ_,_uuŴ†_ℓ_,_uv Ŵ†_ℓ_,_vu Ŵ_ℓ_,_vv = _E u†_t₋_ℓu†_t′E _u† t−ℓv′t E _v t−ℓu †′ t _E vt−ℓv_t′ , where Ŵ†_ℓ_,_uu:₌E _u† t−ℓu †′ t =diagŴ_ℓ†_,_u 1u1,. . .,Ŵ † ℓ,unun ∈Rn×n_. ₍₂₎ Let3†:=P∞ℓ=1E η†t−ℓη †′ t

∈R((kI+1)_·n+kC)_×((kI+1)_·n+kC)_{. Then, the}_{long run covariance matrix}†_∈ R((kI+1)·n+kC)×((kI+1)·n+kC)_{and the}_{half-long run covariance matrix}₁†_∈_R((kI+1)·n+kC)×((kI+1)·n+kC)_of

η†_t are given by (see, e.g., Phillips and Hansen,1990) †:₌ ∞ X ℓ=−∞ E η†_t₋_ℓη†_t′₌Ŵ†₀₊3†₊3†′ and 1†:₌ ∞ X ℓ=0 E η†_t₋_ℓη†_t′₌Ŵ†₀₊3†. (3)

† contains the submatrices: _vv := Pℓ∞=−∞E v†t−ℓv

†′ t ∈ R(kI·n+kC)_×(kI·n+kC)_, vivj := P_∞ ℓ=−∞E v†i,t−ℓv †′ j,t ∈R(kI+kC)_×(kI+kC)_{, and}† uu:= P_∞ ℓ=−∞E u†t−ℓu †′ t =diag †_u 1u1,. . ., † unun ∈

Rn×n_{. The same notation is applied to}₁†_{. In order to fully specify the model, we augment our set of} assumptions by

Assumption 2(Error Dynamics I). η†_Iitandη_Ijt† are independent for all i,j = 1,. . .,n with i6= j. The common componentsvCt∈RkC are permitted to be correlated withη†_Iit. The stochastic process η†t

t∈Zis

weakly stationary and obeys a functional central limit theorem. That is, 1 √ T [Tr] X t=1 η†_t _⇒B†₍_r₎₌ †1/2_W†₍_r₎_, ₍₄₎ where r _{∈ [}0, 1_]andB†₍_r₎_{is a Brownian motion in}_R(kI+1)·n+kC_{, while}W†₍_r₎_{is a standard Brownian} motion inR(kI+1)·n+kC_{. The long run covariance matrix}†_∈R((kI+1)·n+kC)×((kI+1)·n+kC)_{is finite and of} 2_{Throughout the analysis we only consider one spatial lag term. However, the theory considered in this article can also be} applied to a model wherey_it₌ρ1Pnj=1W1,ijyjt+ · · · +ρkρPjn=1Wkρ,ijyjt+β′xit+αi+u†itin a straightforward way. The restriction that only one matrixWis included is used to keep the notation simple.

3_{The spectral radius is the lower bound for every induced matrix norm (cf. Theorem 5.6.9 in Horn and Johnson,}₁₉₈₅_{). Our} assumption will, for example, be satisfied when the maximum absolute row or column sums ofρWare less than one.

(5)

full rank;0<†<_∞in short form. Moreover, 1 T T X t=2   t−1 X j=1 v_j  u†_t _⇒ Z 1 0 B v(s)dB†u(s)′+3†vu. (5)

In addition, for η†_tη†_t′ a weak law of large numbers holds, i.e. _T1 PT_t₌₁η†_tη†_t′ _→P E η†_tη†_t′. To v_tu†_it a central limit theorem can be applied. That is, √1

T PT t=1 vtu†it−E vtu†it ⇒ ν₍_vu†₎, where ν₍_vu†₎ ∼ N 0(kIn+kC×1),D_v tu†it and 0 < D vtu†it _< _∞_{. Convergence in (}₄_{), (}₅_{) and} 1 √ T PT t=1 vtu†it−E vtu†it is joint.

The parameter vectorβ ₆₌0(k×1)and k≥ kI ≥ 1. If kL >0the time invariant regressorsxLi ∈RkL

are independent ofxjsas well as u†jsfor all i,j=1,. . .,n and all t=1,. . .,T.

The Brownian motionB†₍_r₎_∈_R(kI+1)·n+kc_{consists of the pairwise independent Brownian motions}

B

vIi(r), BvIj(r)∈RkI, fori,j=1,. . .,n, the Brownian motionBvC(r)∈RkCarising from the common

componentvCt, and the Brownian motionsBui†(r) ∈ Rarising from the common components u†it.

Then,B_v(r) := B

vI1(r)

′_,_{. . .}_,B

vIn(r)′,BvC(r)′

_′

∈ RkI·n+kC_{. The same notation is applied to}_W†₍_r₎_. In addition,B_v i(r):= B vIi(r)′,BvC(r)′ _′ ∈ Rk_{, where, in general,}_B† ui(r)andBvi(r)as well asB † ui(r) andB v(r)are correlated.

Conditions on the stochastic process (η†_t)t∈Z, where a functional central limit theorem holds, are provided in de Jong and Davidson (2000), Davidson (1994), and White (2001), as well as in Mark and Sul (2003), Mark et al. (2005), and Phillips (2014). In the latter articles(η†_t)has a moving average representationη†_t ₌ 9†(L)ε†_t, where9†(L)is a lag polynomial andε_t† ₌ ε†_I₁′_t,. . .,ε†_Int′ ,ε′_Ct′ _∈

R(kI+1)n+kC_{is a noise term. This also allows for heteroscedastic}_ε†

t, which can be important if financial

data sets are analyzed (Section5).

Weak stationary of(η†_t), in particular of the componentsu†_it, is necessary to obtain a cointegrating relationship in (1). To see this, by Assumption2, (xt)follows a vector random walk process and is

therefore integrated of order one.yitarises from a weighted sum ofI(1)random variables, the fixed effects

α_iandλ_t, the (stationary) termβ′_LxLias well as the stationary noise termu†it. To exclude cointegration

relationships between the components ofxitand to guarantee thatyitisI(1), we imposed the assumption

thatβ₆₌0(k)and that the long run covariance matrix is of full rank. Hence, by Assumption2the rank of

_vviskI·n+kC. The regressorsxLiare assumed to be independent ofη†t and therefore strictly exogenous,

such thatu†_jtandxLiare uncorrelated for all pairsi,j=1,. . .,nandt=1,. . .,T. Remark 1. Byyt := y1t,. . .,ynt′,y∗t := y∗1t,. . .,y∗nt

′

,u†_t := u†₁_t,. . .,u†_nt′,x_L := x′_L₁,. . .,x′_Ln′∈

RnkL_,_λ_:₌_(λ

1,. . .,λT)′α:=(α1,. . .,αn)′,β˜ :=In⊗β′∈ Rn×nk, andβ˜L :=In⊗β′L ∈Rn×nkL, we

obtain thestructural form(triangular system)

yt =ρy∗t + ˜βxt+ ˜βLxL+α+λt1(n×1)+u†t , where xt =xt−1+vt. (6)

Assumption1guarantees thatIn−ρWhas the full rankn. This allows us to obtain thereduced form yt=(In−ρW)−1 ˜ βxt+ ˜βLxL+α+λt1(n×1)+u†t , wherext =xt−1+vt. (7)

Observe that the system constitutesncointegrating equations. The cointegrating relationships do not have the usual linear form in the sense that the solution foryitis a nonlinear function of the parameter

(6)

Assumption2does not exclude correlation betweenvit andu†it. (ii) Forρ 6= 0,yjtdepends onyitand

vice versa. (iii)u†_it andu†_jtare independent by Assumption2. (iv) Sinceyjtdepends onyit we know that

y∗_it =Wijyjtandu†itare correlated in general. (v) By Assumption2there is no correlation betweenu†jt

andxL.

3. Estimation procedure and large sample results

The goal of the following analysis is to construct a dynamic two-stage least squares (D2SLS) estimator and to show that it leads to asymptotically unbiased estimates of the parametersρ,β_Iandβ_Corβ_L. In particular, Section 3.1 applies the methodology of projecting on the leads and lags of the (first differenced) dependent variablesxit ∈ Rk orxt ∈ Rkn introduced in Saikkonen (1991). We shall

observe that these projections eliminate the correlation betweenxit and u†it but not the y∗it and u†it

correlation discussed in Remark1. To get rid of the latter type of correlation, instrumental variables will be applied. Sinceβ_Ias well asβ_Lorαas well asλare not separately estimable (see, e.g., Hsiao,

2015, Section 3.6.1), we consider different cases: Section3.2considers a model without time effectsλ and without longitudinally common regressorsxLi, i.e.,kL=0. In this case model (1) becomes

yit =ρ n X j=1 Wijyjt+β′IxIit+β′CxCt+αi+u†it =ρy∗it+β′xit+αi+u†it, where xt =xt−1+vt, xt =(x′I1t,. . .,x′Int,x′Ct)′∈RnkI+kC, β= β′I,β′C _′ ∈Rk_{, and}_k₌_k I+kC. (8)

Motivated by the empirical example discussed in Section5, model (8) will be considered to be the leading case in the following. To simplify the algebra, we apply the within-transformation (see, e.g., Baltagi,2008, p. 11) and derive the asymptotic distribution of the estimates of the slope coefficientsρ andβusing within-transformed data. That is, the variables in deviations from their individual means (means taken with respect to the time series dimension) are

eyit :=yit− 1 T T X t=1 yit , exit :=xit− 1 T⋆ T X t=1 xit , ey∗it := n X j=1 Wijeyjtandeu†it :=u†it− 1 T T X t=1 u†_it , (9) such that (8) after applying the within-transformation (9) reads as follows:

eyit =ρ n X

j=1

Wijeyjt+β′IexIit+β′CexCt+eu†_it⋆ =ρey∗it+β′exit+eu†_it⋆. (10)

For this model Section3.2obtains theT_{→ ∞}-limit distribution for the ordinary least squares (OLS), the dynamic ordinary least squares (DOLS) and the two stage least squares (2SLS) estimator, where second-order asymptotic bias terms show up. Then, theT _{→ ∞}-limit distribution of theD2SLSestimator is provided, where no second-order bias term shows up and the asymptotic limit distribution is a zero mean Gaussian mixture distribution (for a definition of the Gaussian mixture distribution see, e.g., Johansen,

1995, Chapter 13.1).

In a second step, Section3.3will consider the case where “kC =0,kL >0, no cross-sectional fixed

effects are present (α₌0(n×1)) but time fixed effects are included (i.e.,λ6=0(T×1)).” In this case model

(1) becomes yit =ρ n X j=1 Wijyjt+β′IxIit+β′LxLi+λt+u†it =ρy∗it+β′xit+β′LxLi+λt+u†it,

(7)

Another within-transformation can be applied to model (11) to get rid of the time fixed effectsλ(see, e.g., Baltagi,2008). That is,

` yit :=yit− 1 n n X i=1 yit, yìt∗:=y∗it− 1 n n X j=1 y∗_jt, x_ìt := `xIit=xIit− 1 n n X j=1 xIjt, ` xLi:=xLi− 1 n n X j=1 xLj and u`†it :=uit− 1 n n X j=1 u†_jt, such that ` yit = ρ n X j=1 Wij`yjt+β′IxÌit+β′Lx`Li+ ù†_it =ρy`∗it+β′IxÌit+β′Lx`Li+ ù†_it. (12)

For this within-transformed model a D2SLS estimator can be developed. However, in this article we shall consider model (11) jointly with the case where “kL =kC =0 and cross-sectional fixed effects as well

as time fixed effects are allowed.” In this case model (1) becomes equal to4 yit =ρ

n X

j=1

Wijyjt+β′IxIit+αi+λt+u†it =ρy∗it+β′IxIit+αi+λt+u†it,

xt =xt−1+vt, where xt=xIt=(x′I1t,. . .,x′Int)′∈Rk, β=βI, k=kI, andvt =1xt. (13)

To model (13) the within-transformation (see, e.g., Baltagi,2008) ˘ yit :=yit− 1 T T X t=1 yit− 1 n n X j=1 yjt+ 1 Tn n X j=1 T X t=1 yjt, y˘∗it := n X j=1 Wijy˘jt, ˘ xit := ˘xIit:=xIit− 1 T T X t=1 xit− 1 n n X j=1 xjt+ 1 Tn n X j=1 T X t=1 xjt, ˘ u†_it :=u†_it − 1 T T X t=1 u†_it−1 n n X j=1 u†_jt+ 1 Tn n X j=1 T X t=1 u†_jt is applied to obtain ˘ yit = ρ n X j=1 Wijy˘jt+βI′x˘Iit+ ˘u†_it =ρy˘∗it+β′Ix˘Iit+ ˘u†_it. (14) In addition, sincex_˘Li :=xLi−_T1 PTt=1xLi−1_nPnj=1xLj+_Tn1 Pnj=1 PT t=1xLj =xLi−T_TxLi−1_nPnj=1xLj+ T Tn Pn

j=1xLj =0(kL×1), applying the within-transform used in (14) to model (11) also results iny˘it =

ρy_˘∗_it₊β′_Ix_˘Iit+ ˘u†it. This allows for a joint treatment of the models (11) and (13). For model (14) the limit

distribution for the parametersρandβ_Iwill be obtained whenTor(n,T)become large in Section3.3. For the model (11), given the estimates ofρ andβ_I, an estimate ofβ_L will be derived by means of a further ordinary least squares regression.

4_{To distinguish a model where the econometrician decides to include different kinds for fixed effects (given that}_k

C=0 or

k_L₌0), indicator functions of the formI(α)andI(λ)can be included into model (1). To simplify the notation and to improve

the readability we assume that ifk_C₌ 0 andk_L ₌0, then time fixed effects as well as cross-sectionally fixed effects are included.

(8)

3.1. Projection facility

Assumption2implies that potentially all leads and lags of1xIit=vIitand1xCt=vCtare correlated with

u†_it. In the next step, we followDOLSliterature and remove the correlation ofu†_itandv_itby projecting on the leads and lags of the prediction variables. This implies that for eachi, we project on1xit−s =

1x′_Iit₋_s,1x′_Ct₋_s′ = v′_Iit₋_s,v′_Ct₋_s′fors = −p,. . ., 0,. . .,p. The projection ofu†_iton thepleads and lags of1xityields a truncation componentP+s=−p pδ′i,s1xi,t−s, vectors of projection coefficientsδi,s∈Rk

(fors _{= −}p,. . .,p), a truncation errore_p_;_it :₌P_s_>_p_,_s_<₋_pδ′_i_,_s1xi,t−splus anewdisturbanceuit, such

that

u†_it = δ′_p_;_iζ_p_;_it₊e_p_;_it₊uit =δ′p;iζp;it+up;it and up;it :=ep;it+uit, where

ζ_p_;_it :=v_Ii′_,_t₋_p,v_C′_,_t₋_p,. . .,v′_Ii_,_t,v_C′_,_t,. . .,v′_Ii_,_t₊_p,v_C′_,_t₊_p′∈R(2p+1)k_and

δ_p_;_i:₌δ′_i_,₋_p,. . .,δ′_i_,0,. . .,δ′_i_,₊_p′_∈R(2p+1)k _. ₍₁₅₎

The subscriptpdenotes that the truncation errore_p_;_itas well as the noise termu_p_;_it ₌e_p_;_it₊uitdepend

on the number of leads and lagsp.ζ_p_;_itis by construction orthogonal to the new noise termuit, while

the termu_p_;_it =e_p_;_it₊uitcan still be correlated with1xitfor somep<∞.

A further alternative is to follow the system DOLS approach (see, e.g., Park and Ogaki,1991) and project on the leads and lags ofallcross-sections. That is,ζ_♯_p_;_it ₌ζ_♯_p_;_t :₌ v_t′₋_p,. . .,v′_t,. . .v_t′₋_p′ _∈ R(2p+1)(kIn+kC)_{, where in Section}₂_{, we already defined}_v

t = 1xt = v′_I₁_t,. . .,v′_Int,v′_Ct′ ∈ Rn·kI+kC. Then, u†_it ₌ δ′_♯_p_;_iζ_♯_p_;_t₊e♯p;it+uit =δ′♯p;iζ♯p;t+u♯p;it, ζ_♯_p_;_t ₌ v_t′₋_p,. . .,v′_t,. . .,v′_t₊_p′_∈R(2p+1)(kIn+kC)_, u_♯_p_;_it :=e♯p;it+uit, and δ♯p;i:= δ′_♯_i_,₋_p,. . .,δ′_♯_i_,0,. . .,δ′_♯_i_,₊_p′_∈R(2p+1)(kIn+kC) _. ₍₁₆₎

We shall observe that projecting on the own leads and lags (15) is sufficient to eliminate the correlation asymptotically between the regressors and the noise term in model (8), while with time fixed effects or kL >0, where the within-transform (14) is applied, the projection on all leads and lags (16) is used to

get rid of the correlation betweenx_˘Iit andu˘†_it.5 Now we impose an additional restriction on the error

dynamics that will guarantee that the truncation errore_p_;_it(ore♯p;it) converges to zero whenTbecomes

large:

Assumption 3(Error Dynamics II; see Saikkonen (1991), Mark et al. (2005)). Suppose that p₌p(T). Then p(T)has to fulfillp(T_T)3 →0and√TP_|_s_|_>_p₍_T₎kδi,sk2 → 0(or√TP_|s|>p(T)kδ♯i,sk2 →0when projecting on the full cross-section of leads and lags) as T→ ∞, wherek.k2stands for the Euclidean norm. Assumption3requires thatp(T)does not grow too fast, while the second part restricts the depen-dence between the noise term and the regressors. Based on Assumptions2–3, for model (10), if T becomes large then—due to the increase in the number of leads and lagsp(T)—the truncation errore_p_;_it

becomes small. As a result, the difference betweenu_p_;_itanduitbecomes small, such thatup;it becomes

5_{By contrast, when skipping the assumption of independent cross-sections, Online Appendix A-4 demonstrates that in this} case a projection on all leads and lags becomes necessary already for model (8). The Online Appendix can be downloaded fromhttps://papers.ssrn.com/sol3/papers.cfm?abstract_id=2256929.

(9)

orthogonal toζ_p_;_it asT → ∞. For the transformed model (14),x˘Iit as well asu˘†it contain weighted

elements from the other cross-sectionsj=1,. . .,n,j6=i. The projections on the full cross-section of leads and lags is applied to the transformed model (14). Then the truncation errore♯p;itbecomes small

such thatu˘_♯_p_,_itconverges tou˘_itand thex˘Iit,u˘♯p,it-correlation gets small for largeTdue to Assumption3.

After applying the projections (15) for the transformed model (10) as well as (16) with the transformed model (14), we arrive at the new covariance stationary process η_t_t

∈Z, where ηt := u′_t,v_t′′ ∈ R(kI+1)n+kC_, _η

It := (η′it)i=1,...,n′ ∈ R(kI+1)n, ηit := uit,v′_it′ ∈ RkI+1, vt =

v′_I₁_t,. . .,v′_Int,v′_Cit′ ∈ Rn·kI+kC_{, and} _u

t := (u1t,. . .,unt)′ ∈ Rn). For any ℓ ∈ Z we getŴℓ :=

E η_t₋_ℓη′_t_∈R((kI+1)·n+kC)×((kI+1)·n+kC)_{, the new long run covariance matrix}_:₌P∞

ℓ=−∞E ηt−ℓη′t

and new half long covariance1:₌P∞_ℓ₌₀E η_t₋_ℓη′_tu†_it(16) the projection on all leads and lags is used to get rid of the correlation betweenx_˘Iitandu†_it(16). By applying the notation introduced in (2) and (3),

we getŴℓ,vu =vu =1vu =0(nk×n)as well asuiuj =Ŵℓ,uiuj =0, fori6=jby Assumption2. By the

construction of the noise processuit, the correlation betweenuitandvj,t−ℓis zero for allj = 1,. . .,n

andℓ_∈Z.

In addition, while the time indextgoes from 1 toTin (1), after the projection facilities are applied only the observationsp₊1,. . .,T₋pcan be used to estimate the model parameters. By definingt⋆ :=

t₋p₊1 we obtain a new time index which accounts for the projection facility. ThenT⋆=T−2pand

t⋆ =1,. . .,T⋆.

Given the assumptions on the error dynamics, afunctional central limit theoremcan be applied, such that ifT⋆ → ∞: 1 √ T⋆ [_XT⋆r] t⋆=1 η_t⋆ _⇒B(r)₌1/2W(r). (17)

W(r)is a standard Brownian motion inRkI·n+kC_{, while}_B₍_r₎ ₌1/2_W₍_r₎_{. These Brownian motions} contain the components,B_v

i = BvIi(r)′,BvC(r)′ ′ andW_v i := WvIi(r)′,WvC(r)′ ′ . By Assumption2

and the construction ofη_t⋆,B_u

i(r)andBvi(r)as well asBui(r)andBv(r)are independent for alli =

1,. . .,n. This also yieldsB_u

i(r) = p uiuiWui(r). By Davidson (1994, Theorem 30.2), 1 √ T⋆x˜i[rT⋆] ⇒ B_vi₍_r₎₋R1 0 Bvi(s)dsand T12 ⋆ P_[rT⋆] t⋆=1x˜it⋆x˜′it⋆ ⇒ Rr

0B˜vi(s)B˜vi′(s)ds, forT⋆ → ∞, where thedemeaned

Brownian motionB_v

i(r)− R1

0 Bvi(s)dsis abbreviated byB˜vi(r). Assumption2and some algebra results

in _T⋆1 P_t⋆[T⋆₌]₁x_˜it⋆u˜it⋆ ⇒ p

uiui R1

0 B˜vi(s)dWui(s)+1viui (see also Davidson,1994, Theorem 30.13). Since,vit⋆anduit⋆ are uncorrelated,1viui =0(k)and

1 T⋆ P[T⋆] t⋆=1x˜it⋆u˜it⋆ ⇒puiui R1 0 B˜vi(s)dWui(s). By

contrast, _T⋆1 P[_t⋆T⋆₌]₁x˜it⋆u˜†it⋆ ⇒ p _uiuiR₀1B˜_vi(s)dW† ui(s)+1 † viui, where - in general -1 † viui 6= 0(k). In addition, we derive√1 T⋆x`i[rT⋆]⇒Bvi(r)− 1 n Pn

j=1Bvj(r)=:B`vi(r), whereB`ui(r)andB`v(r)are defined

in the same way, and√1

T⋆x˘i[rT⋆]⇒Bvi(r)− R1 0 Bvi(s)ds−n1 Pn j=1Bvj(r)+_n1Pnj=1 R1 0 Bvj(s)ds=:B˘vi(r), whereB˘_ui₍_r₎_andB˘_v₍_r₎_{are defined in an equivalent way (for more details see Online Appendix A-2).}

3.2. Large sample properties of some parameter estimators for model (8)

This subsection investigates the model defined in (8), where kL = 0 and no time effects are

included. To write down our estimator in a compact way, we define the model in a stacked notation. Defineey := ey11,. . .,ey1T⋆,. . .,eyn1,. . .,eynT⋆′, y∗ := ey∗₁₁,. . .,ey∗₁_T⋆,. . .,ey∗_n₁,. . .,ey∗_nT⋆ _′ , ex := ex₁₁,. . .,ex₁_T⋆,. . .,ex_n₁,. . .,ex_nT⋆ _′ , andeu_p := eu_p_;11,. . .,eu_p_;1_T⋆,. . .,eu_p_;_n₁,. . .,eu_p_;_nT⋆ _′ , whereey,ey∗,

(10)

andeu_pare of dimensionnT⋆×1, whileexis annT⋆×kmatrix. Furthermore, we have eζ_p:₌            eζ′_p_;11 0(1×(2p+1)k) 0(1×(2p+1)k) .. . eζ′_p_;1_T⋆ 0(1×(2p+1)k) 0(1×(2p+1)k) 0(1×(2p+1)k) eζp;21 0(1×(2p+1)k) . .. 0(1×(2p+1)k)0(1×(2p+1)k) eζp;nT⋆            ∈RnT⋆×(2p+1)k·n _and δ_p:₌    δ_p_;1 .. . δ_p_;_n   ∈R(2p+1)k·n×1_. ₍₁₈₎

This provides us with model (10) in stacked form:

ey₌ρey∗₊exβ₊eζ_pδ_p₊_eu_p₌ ey∗,exγ₊eζ_pδ_p₊_eu_p₌eXp γ′,δ′_p′₊_eu_p₌eXpθ′p+eup, (19) whereγ :₌ ρ,β′′ _∈ R1+k _and_θ p :=

γ′,δ′_p′ _∈ R1+k+(2p+1)·kn_{. The right-hand side variables are}

collected ineXp =

ey∗,ex,ζ_p

∈ RnT⋆×k+1+(2p+1)·kn_{. The transpose of the rows of the matrix}_e_X

pare

the column vectorseXp;it⋆ :=

˜

y∗_it⋆,x˜′_it⋆,0(1×(2p+1)k·(i−1)),eζ′p;it⋆,0(1×(2p+1)k·(n−i−1))

_′

∈ Rk+1+(2p+1)·kn_.

Including the projection facility (15), model (8) can be written asy˜it⋆ =eX_p′_;_it⋆θp+eup;it⋆. In addition, we

apply the following notation: LetWi ∈R1×nstand for theith row ofW. Then·(kI+kC)×nkI+kC

matrixCtransformsx˜t = ˜x′I1t,. . .,x˜Int′ ,x˜′Ct _′

∈ RkIn+kC _into _x_˜′

I1t,x˜′Ct,. . .,x˜′Int,x˜′Ct _′

∈ R(kI+kC)_·n_.

Note that each row ofCcontains exactly one element equal to 1, while the other elements are zero. In addition,Chas full column ranknkI+kC. ForkC =0,C=InkI. Moreover, letu˜t := ˜u1t,. . .,u˜nt

′_and ˜ ζ_p_,_t ₌diag(ζ˜′_p_,1_t,. . .,ζ˜′_p_,_nt)_∈Rn×(2p+1)·kn. Then,y_˜∗_it ₌Pn_j₌₁WijPnl=1Kjl βx_˜lt+δ′_p_;_lζ˜p;lt∗+ ˜ult can be expressed more compactly byWiK

˜

βCx˜t+ ˜ζp,tδp+ ˜ut

. In the same way as x˜′₁_t,. . .,x˜′_nt′ =Cx˜t

we proceed withv_˜, resulting inCv˜t =C1x˜t the half-long covariance matrixC1vu ∈ Rkn×n, and the

superposition of the components of the demeaned Brownian motionCB˜_v_∈Rkn_.

Let us start with theOLS-estimator, wherep= ∅,u˜_it = ˜u†_it,T=T⋆,t=t⋆,eXOLS_it := ˜y∗_it,x˜_it′

_′

and

bγ_OLS:₌ρ_ˆOLS,βˆ′OLS _′ = n X i=1 T X t=1 e XOLS_it eXOLS_it ′ !−1 _n X i=1 T X t=1 e XOLS_it y_˜_it. (20)

To obtain theT→ ∞-limit distribution,eX_itOLSis scaled by 1/T. This yields

Proposition 1. Consider the fixed effects spatial correlation model (8) and the OLS estimator (20) based on the within-transformed model (10). Suppose that the Assumptions1and2hold.

Then, for n fixed and T _{→ ∞}, it follows that the T _{→ ∞}limits ofMOLS_˜

XX˜,Tni := 1 T2 PT t=1eXOLSit eXitOLS′ andMOLS_˜ XX˜,Tn:= Pn

i=1MOLS_X_˜_X_˜_,_Tniare

M_X_˜_X_˜_,_ni:₌ Z 1 0 gi(r)gi(r)′dr, gi(r):= WiKβ˜CB˜v(r) ˜ B_vi(r) , and M_X_˜_X_˜_,_n:₌ n X i=1 M_X_˜_X_˜_,_ni, (21)

(11)

while the T_{→ ∞}limits ofmOLS_˜ Xu˜†_,_ni:= 1 T PT t=1eXOLSit u˜ † itandmOLSX˜u˜†_,_n:= Pn i=1mOLS_X_˜_u_˜†_,_niare m_X_˜_u_˜†_,_ni:= WiK R1 0 β˜CB˜v(r)dBui†(r) q †_uiuiR₀1B˜_v i(r)dWu†i(r) ! + WiK h ˜ βC1†_vui₊Ŵ†_0,_uuii 1†_v iui ! , and m_X_˜_u_˜†_,_n:= n X i=1 m_X_˜_u_˜†_,_ni. For the centered and scaled ordinary least squares estimator ofγwe observe:

T γ_ˆ_OLS₋γ_⇒M−_˜1

XX˜,nmX˜u˜†,n. (22)

Proof. See AppendixA.

Note that m_X_˜_u_˜†_,_n contains the “usual” second-order bias term

Pn

i=11†viui in the coordinates

2 to k ₊ 1, while in the first component of m_X_˜_u_˜†_,_n we observe the second-order bias term

Pn i=1WiK ˜ βC1†_vui₊Ŵ†_0,_uu i

arising from a spatial lag. Next, the panelDOLSestimator, derived in Mark and Sul (2003) as

ˆ θDOLS;p:= e X′_peXp ₋1 e X′_pey, (23)

results inθˆ_DOLS_;_p₋θ_p₌Pn_i₌₁PT⋆_t⋆₌₁eX_p_;_it⋆eX′_p_;_it⋆−1Pn_i₌₁PT⋆_t⋆₌₁eX_p_;_it⋆u˜_p_;_it⋆. To obtain theT_{→ ∞} -limit distribution ofγ_ˆ_DOLS_;_p :₌ρ_ˆDOLS;p,βˆ′DOLS;p

′

, the firstk₊1 components ofeXp;it⋆ are scaled by

1/T⋆, while the remaining components are scaled by 1/√T⋆, resulting in the scaling matrixA_Xp_˜ :=

diag T_⋆−1·Ik+1,T⋆−0.5·I(2p+1)nk

∈Rk+1+(2p+1)nk×k+1+(2p+1)nk_{. For the}_DOLS_{estimator we obtain:} Proposition 2. Consider the fixed effects spatial correlation model (8) and the DOLS estimator (23) based on the within-transformed model (10). Suppose that the Assumptions1to3hold. Let T⋆ =T−2p(T).

Then, for n fixed and T_{→ ∞}, it follows that: (a) T⋆(γˆDOLS;p−γ)and

√

T⋆(δˆDOLS,p−δp)are asymptotically independent.

(b)T⋆

ˆ

γ_DOLS_;_p₋γ_⇒M−_˜1

XX˜,nmX˜u˜,n, whereMX˜X˜,nfollows from (21),

m_X_˜_u_˜_,_ni:₌ WiK R1 0 β˜CB˜v(r)dBui(r) p uiui R1 0 B˜vi(r)dWui(r) ! + WiKŴ0,uui 0(k) , and m_X_˜_u_˜_,_n:₌ n X i=1 m_X_˜_u_˜_,_ni.

M_X_˜_X_˜_,_n,m_X_˜_u_˜_,_niandm_X_˜_u_˜_,_nare the T _{→ ∞}-limits ofhPn_i₌₁PT⋆_t⋆₌₁A_Xp_˜ eX_p_;_it⋆eX′_p_;_it⋆A_Xp_˜ i

(1:k+1,1:k+1) = M_X_˜_X_˜_,_nT, hPT⋆_t⋆₌₁A_Xp_˜ eX_p_;_it⋆u_˜it⋆ i (1:k+1,1) =: mX˜u˜,nTi and hPn i=1 PT⋆ t⋆=1AXp˜ eXp;it⋆u˜it⋆ i (1:k+1,1) =: m_X_˜_u_˜_,_nT.

SinceŴ0,uiui >0 projecting on the leads and lags1xit⋆+s,s= −p(T),. . .,−1, 0, 1,. . .,p(T), is not

sufficient to obtain convergence to a zero mean Gaussian mixture distribution. Comparing theOLSlimit M−_˜1

XX˜,nmX˜u˜†,nto theDOLSlimitM

−1 ˜

XX˜,nmX˜u˜,n, we observe that by projecting on the leads and lags, the

second-order bias termsC1†_vui and1_viui† are removed, but not the correlation termWiKŴ0,uui arising

from the spatial lag.

To obtain convergence to a mean zero Gaussian mixture distribution, we shall estimate the model using instruments for the endogenous variableey∗_it=Pnj=1Wijeyjt=Wieyt, for which we assume:

(12)

Assumption 4(Valid Instruments; see Kitamura and Phillips (1997)). The instrumentsz_˜∗_it _∈Rqρ _fulfill

the requirements for instrumental variable estimation as stated, e.g., in Ruud (2000, Chapter 20), Phillips and Hansen (1990) and Kitamura and Phillips (1997). In particular, (i) the number of instruments is larger or equal to the number of parameters (order condition), (ii)_T12

⋆ PT∗

t⋆=1(y˜it⋆∗ ,x˜′it⋆)′((ez∗it⋆)′,x˜′it⋆)weakly

converges to a matrix of rank k₊1(almost surely) as T⋆→ ∞, and (iii)_T12

⋆

PT⋆

t⋆=1((ez∗it⋆)′,x˜′it⋆)′((ez∗it⋆)′,x˜′it⋆)

weakly converges to a matrix of rank k+qρ(almost surely) as T⋆→ ∞.

By following Kelejian and Prucha (1998), we base the instruments on the spatial lags of the explana-tory variables. In more detail, our model can be solved asey=IT⊗(In−ρW)−1 exβ+eζpδp+eup

. The matrix(In−ρW)−1can then be expanded as(In−ρW)−1 =P∞s=0(ρW)s(see, e.g., Horn and Johnson,1985, Corollary 5.6.16). This implies that variables of the formPn_j₌₁Wijexjt⋆κ,Pn_j₌₁W_ij2exjt⋆κ,. . .

are suitable instruments forWy˜.x˜jt⋆κis the coordinateκofexjt⋆, whileWijτstands for [Wτ](i,j)andWτi

for [Wτ](i,1:n), whereτ ∈ N. Ifx˜jt⋆κ is a component specific variable,x˜jt⋆κ andeu_it⋆, fori 6= j, are

independent by Assumption2, while ifx˜jt⋆κis a common variable, then asymptotic independence will

be established for theD2SLSestimator whenT → ∞. Note that these instruments have an intuitive interpretation: we instrument theWijweighted sum of the neighbors/peerseyjt⋆by theWijweighted sum

of the characteristics of the neighbors (theirexit⋆ values). The higher-order spatial lags as instruments

then use the characteristics of the neighbors of the neighbors, etc. Hence, we work with the instruments

ez∗_it⋆_κ=ex∗_it⋆_κ:= n X j=1 Wτκ it⋆exjt⋆κ, (24)

whereκ _∈K_{⊂ {}_1,. . .,k},K_{is an index set collecting the indices of the instruments used, and}τκ∈N.

LetK₍_l₎_{stand for the}_{lth element of the set}K_{. Then,}_e_z∗

it⋆= ex_it⋆∗_K (1),. . .,ex ∗ it⋆K(qρ ) _′ ∈Rqρ_{, the exponents}

τκareτK(l),l=1,. . .,qρ. By then×nkselector matricesC(K(j))(where the coefficients

C_K(j)

(ι,(ι−1)n+nι)

are equal for allι₌1,. . .,n), we getez_it⋆∗_K

(j) =ex

∗

it⋆K(j) =W

τ_K_(j)

i C(K(j))x˜t⋆, forj=1,. . .,qρ. To keep the

notation simple, we consider - as already stated at the beginning of Section2- a model with one spatial lag (kρ=1). Hence, withqρ≥1 the order condition is met.

We collect the variablesez∗_it⋆ ₌ WτK(1)

i C(K(1))x˜t⋆,. . .,W τK_{(qρ )} i C(K(qρ ))x˜t⋆ ′ ∈ Rqρ _{in the} nT⋆ × qρ matrixz∗ := ez∗1t⋆,. . .,ez∗nt⋆ ′

. The set of our instruments is theneZp :=

z∗,x,ζ_p

∈

RT⋆n×qρ+k+(2p+1)k·n_{. The rows of}_e_Z

p∈RnT×qρ+k+(2p+1)k·nare the transpose of theqρ+k+(2p+1)k·

n-dimensional column vectorseZp;it⋆ :=

˜

z∗′_it⋆,x˜′_it⋆,0(1×(2p+1)k·(i−1)),ζ˜′p;it⋆,0(1×(2p+1)k·(n−i−1))

_′

. Next we considertwo-stage least squares(2SLS) estimation. Since no projection on leads and lags is applied with 2SLS, we getp _{= ∅},T ₌T⋆andt=t⋆as well aseX2_itSLS := ˜y∗_it,x˜_it′

_′

=eXOLS_it andeZ2_itSLS :_{= ˜}z∗′_it,x_˜′_it′. CollectingeX2_itSLSandeZ_it2SLSresults ineX2SLSandeZ2SLS. The noise term and the projection operator are given byeu†_itandP₂_SLS_:₌e_Z2SLS _e_Z2SLS′_e_Z2SLS−1_e_Z2SLS′_{. The term}_x_˜

itcontained ineZ2itSLS, is still correlated

witheu†_it. This correlation does not vanish ifT → ∞. To see this, consider the two stage least squares estimator

ˆ

γ₂_SLS:= eX′₂_SLSP₂_SLSe_X₂′_SLS−1e_X₂′_SLSP₂_SLS_y_˜_. ₍₂₅₎ By scalingeX2_itSLSandeZ2_itSLSby 1/Twe obtain:

Proposition 3. Consider the fixed effects spatial correlation model (8) and the2SLS estimator (25) based on the within-transformed model (10). Suppose that the Assumptions1to4hold. Instruments are based on (24).

(13)

Then, for n fixed and T _{→ ∞}, it follows that the T_{→ ∞}-limits ofM2_˜SLS XZ˜,nTi:= 1 T2 PT t=1eX2itSLSeZ2itSLS′, M2_˜SLS XZ˜,nT := Pn i=1M2_X_˜SLS_Z_˜_,_ni,M2_Z_˜SLS_Z_˜_,_nTi := T12 PT t=1eZ2itSLSeZit2SLS′,M2Z˜SLSZ˜,nT := Pn i=1M2_Z_˜SLS_Z_˜_,_ni, m2_Z_˜SLS_u_˜_,_nTi := 1 T PT t=1eZ2itSLSu˜ † it, andm2Z˜SLSu˜,nT := Pn

i=1m2_Z_˜SLS_u_˜_,_nTiare provided by

M_X_˜_Z_˜_,_ni:= Z 1 0 gi(r)hi(r)′dr, M_X_˜_Z_˜_,_n:= n X i=1 M_X_˜_Z_˜_,_ni, where hi(r):= Wτ_iK(1)C(K(1))B˜v(r),. . .,W τ_K_(j) i C(K(j))B˜v(r),. . .,W τ_K_{(qρ )} i C(K(qρ ))B˜v(r),B˜vi(r)′ ′ ∈Rqρ+k_, M_Z_˜_Z_˜_,_ni:₌ Z 1 0 hi(r)hi(r)′dr, MZ˜Z˜,n:= n X i=1 M_Z_˜_Z_˜_,_niand m_Z_˜_u_˜†_,_ni:=          WτK(1) i R1 0 C(K(1))B˜v(r)dB † ui(r) .. . Wτ_iK(qρ )R₀1C(K(qρ ))B˜v(r)dB † ui(r) q †_ui_uiR₀1B˜_vi(r)dW† ui(r)          +         Wτ_iK(1)C(K(1))1 † vui .. . W τ_K_{(qρ )} i C(K(qρ ))1 † vui 1†_v iui         , as well asm_Z_˜_u_˜†_,_n:= n X i=1 m_Z_˜_u_˜†_,_ni. (26)

The asymptotic limit distributed of the centered and scaled 2SLS estimator ofγis provided by: T γˆ₂_SLS₋γ_⇒M_X_˜_Z_˜_,_nM−_˜1 ZZ˜,nM ′ ˜ XZ˜,n −1 M_X_˜_Z_˜_,_nM−_˜1 ZZ˜,nmZ˜u˜†,n. (27)

Note that 2SLSeliminates the correlation termPn_i₌₁WiKŴ0,†uui inmX˜u˜†,ni, while the second-order

bias arising from serial correlation is still present. We do not attain convergence to a zero mean Gaussian mixture distribution.

We now construct a two stage-least square procedure for our panel setting where leads and lags of1x_˜it as well as instrumentsez∗it are included. Let us define theprojection operatorPHp projecting

on the column space spanned byeZp (see, e.g., Ruud, 2000, Chapter 3). In formal terms PHp := eZp

eZ′_peZp

₋1

eZp. SinceeZpis aT⋆n×qρ+k+(2p+1)k·nmatrix,PHphas to be aT⋆n×T⋆nmatrix.

Thedynamic two-stage least squaresestimator ofθ_p₌(ρ,β′,δ_p′)′₌(γ′,δ′_p)′is defined as follows:

bθ_D₂_SLS_;_p:₌eX′_pP_Hpe_X_p−1e_X′ pPHpey=(γ′,δ′p)′+ e X′_pP_Hpe_X_p−1e_X′ pPHpeup. (28)

LeteXit⋆,,(1:k+1) andeZit⋆,,(1:k+qρ) stand for the first k +1 and k +qρ elements ofeXp;it⋆ and eZp;it⋆.

For the asymptotic analysis we apply the scaling matrixA_Xp_˜ as well as the scaling matrix A_Zp_˜ := diag T−1

⋆ ·Ik+qρ,T−⋆0.5·I(2p+1)nk∈ Rk+qρ+(2p+1)nk×k+qρ+(2p+1)nk. The matrixAXp˜ is diagonal with 1/T⋆in the firstk+1 elements while the matrixA_Zp_˜ is diagonal with 1/T⋆in the firstk+qρelements.

The other scaling factors, i.e. all the further elements on the main diagonals ofA_Xp_˜ andA_Zp_˜ areT_⋆−0.5. LetM⋆ ˜ XX˜,nTi := PT⋆ t⋆=1AXp˜ eXit⋆eXit⋆′ AXp˜ ,M⋆X˜X˜,nT := Pn i=1M⋆_X_˜_X_˜_,_nTi,M⋆_X_˜_Z_˜_,_nTi := PT⋆ t⋆=1AXp˜ eXit⋆eZ′it⋆AZp˜ , M⋆_˜ XZ˜,nT := Pn i=1M⋆_X_˜_Z_˜_,_nTi,M⋆_Z_˜_Z_˜_,_nTi := PT⋆ t⋆=1AZp˜ eZit⋆,eZ′it⋆,AZp˜ ,M⋆X˜Z˜,nT := Pn i=1M⋆_X_˜_Z_˜_,_nTi,m⋆_Z_˜_u_˜_,_nTi :=

(14)

PT⋆

t⋆=1AZp˜ eZit⋆,(1:k+1)u˜_it⋆, andm⋆_Zu_,_nT :=Pn_i₌₁m⋆_Z_˜

˜ u,nTi, as well as M_X_˜_X_˜_,_nTi:= 1 T2 ⋆ T⋆ X t⋆=1 e Xit⋆,(1:k+1)eX′_it⋆_,(1:k+1 =hM⋆_X_˜_X_˜_,_nTi i (1:k+1,1:k+1) , M_X_˜_X_˜_,_nT := n X i=1 M_X_˜_X_˜_,_nTi∈Rk+1×k+1_, M_˜ XZ˜,nTi:= 1 T2 ⋆ T⋆ X t⋆=1 eXit⋆,(1:k+1)eZ′it⋆,(1:k+qρ) =hM⋆_˜ XZ˜,nTi i (1:k+1,1:k+qρ) , M_˜ XZ˜,nT := n X i=1 M_˜ XZ˜,nTi∈R k+1×k+qρ_, M_Z_˜_Z_˜_,_nTi:= 1 T2 ⋆ T⋆ X t⋆=1 eZit⋆,,(1:k+qρ)eZ′_it⋆_,,(1:k+qρ) =hM⋆_Z_˜_Z_˜_,_nTi i (1:k+qρ,1:k+qρ) , M_Z_˜_Z_˜_,_nT:= n X i=1 M_Z_˜_Z_˜_,_nTi∈Rk+qρ×k+qρ _, m_˜ Zu˜,nTi:= 1 T⋆ T⋆ X t⋆=1 eZit⋆,(1:k+1)u˜it⋆ =hm⋆_˜ Zu˜,nTi i (1:k+qρ,1) and m_Z_˜_u_˜_,_nT :₌ n X i=1 m_˜ Zu˜,nTi= h m⋆_˜ Zu˜,nT i (1:k+qρ,1)∈ Rk+qρ_.

Next, we summarize the large sample properties of theD2SLSestimator:

Theorem 1(T_{→ ∞}limits forD2SLSEstimation). Consider the fixed effects spatial correlation model (8) and the D2SLS estimator (28) based on the within-transformed model (10). Suppose that Assumptions1–4

hold. Let T⋆=T−2p(T). Then, for n fixed and T→ ∞, it follows that

1.T⋆(γˆD2SLS;p−γ)and

√

T⋆(δˆ_D₂_SLS_;_p−δ_p)are asymptotically independent.

2.M_X_˜_X_˜_,_nTi,M_X_˜_X_˜_,_nT,M_Z_˜_Z_˜_,_nTi,M_Z_˜_Z_˜_,_nT,M_X_˜_Z_˜_,_nT,M_X_˜_X_˜_,_nT,m_Z_˜_u_˜_,_nTi, andm_Z_˜_u_˜_,_nTconverge weakly toM_X_˜_X_˜_,_ni, M_X_˜_X_˜_,_n,M_Z_˜_Z_˜_,_ni,M_Z_˜_Z_˜_,_n,M_X_˜_Z_˜_,_n,M_X_˜_X_˜_,_n,m_Z_˜_u_˜_,_ni, andm_Z_˜_u_˜_,_n, where m_Z_˜_u_˜_,_ni:₌ Z 1 0 hi(r)dBui(r)= p uiui Z 1 0 hi(r)dWui(r) and mZ˜u˜,n:= n X i=1 m_Z_˜_u_˜_,_ni. (29) M_X_˜_X_˜_,_nandM_X_˜_X_˜_,_niare provided in (21), whileM_X_˜_Z_˜_,_ni,M_X_˜_Z_˜_,_n,M_Z_˜_Z_˜_,_n, andM_Z_˜_Z_˜_,_niare provided in (26). In addition, T⋆

ˆ

γ_D₂_SLS_;_p₋γ

converges weakly toM−_n1mn, where Mn :=M_X_˜_Z_˜_,_nM−_Z_˜_Z_˜1_,_nM′_X_˜_Z_˜_,_n and mn:= n X i=1 M_X_˜_Z_˜_,_nM−_˜1 ZZ˜,nmZ˜u˜,ni=MX˜Z˜,nM −1 ˜ ZZ˜,nmZ˜u˜,n. (30)

3.Suppose thatˆuuis a consistent estimator ofuu=diag(uiui)i=1,...,n, then VnT := h M_X_˜_Z_˜_,_nTM−_˜1 ZZ˜,nTM ′ ˜ XZ˜,nT i−1 DnT h M_X_˜_Z_˜_,_nTM−_˜1 ZZ˜,nTM ′ ˜ XZ˜,nT i−1 ⇒ hM_X_˜_Z_˜_,_nM−_˜1 ZZ˜,nM ′ ˜ XZ˜,n i₋1 Dn h M_X_˜_Z_˜_,_nM−_˜1 ZZ˜,nM ′ ˜ XZ˜,n i₋1 =:Vn, where

(15)

DnT:=MX˜Z˜,nTM− 1 ˜ ZZ˜,nT n X i=1 ˆ uiuiMZ˜Z˜,nTi ! M−_˜1 ZZ˜,nTM ′ ˜ XZ˜,nT, and Dn:=M_X_˜_Z_˜_,_nM−_Z_˜_Z_˜1_,_n n X i=1 uiuiMZ˜Z˜,ni ! M−_˜1 ZZ˜,nM ′ ˜ XZ˜,n.

Given ans_×(k+1)restriction matrixR, the Wald type statistic

W_γ_,_nT ₌_T_⋆_R_bγ_D₂_SLS_;_p₋γ′ RVnTR′

₋1

T⋆R

bγ_D₂_SLS_;_p₋γ , (31)

converges in distribution to aχ2random variable withsdegrees of freedom.

Remark 2. Since the signal to noise ratio goes to infinity, we observe that the OLS, theDOLS, the 2SLS, and theD2SLSestimator is consistent, when consideringT _{→ ∞}-limits. Sufficient conditions for consistent estimation of the covariance matrix_uuare discussed in Jansson (2002) and in Online Appendix A-6.

In addition, observe that ifβ₌0(k×1)ork=0, the variableyitbecomesI(0)(see, e.g., Eqs. (1) and

(7)). In this case the signal to noise ratio does not go to infinity and the ordinary least squares estimator is not consistent (for a proof see Sögner and Wagner,2017).

Remark 3. Projecting onallleads and lags as proposed in system-DOLS(see Park and Ogaki,1991) and theDSUR approach (see Mark et al.,2005) does not eliminate this bias. Based on Mark et al. (2005),D2SLScan be augmented to a richer correlation structure by projecting on the leads and lags of all regressors as described in (16). However, by this projection facility the dimension of the nuisance parameter becomes(2p₊1)_·(kIn2+kCn). Due to numerical constraints, this estimator can hardly be

implemented whennis large (see Section A-4 in the Online Appendix and Section4).

Remark 4. The following subsection also derives limits when both nand T become large. For the

model (10), where x_˜Ct and x˜Iit are used as regressors, we observe that _nT12

⋆ Pn j=1 PT⋆ t⋆=1x˜Ctx˜′Ct = 1 T2 ⋆ PT⋆

t⋆=1x˜Ctx˜′_Ctsuch that theT→ ∞limit as well as the(n,T)→ ∞limit remain random variables.

By this fact, the(n,T)_{→ ∞}limit of the centered and scaled parameter vector is not a mean zero normal vector ifkC>0.6

3.3. Large sample properties of the D2SLS parameter estimator for model (13)

Using the within-transform (14) and the projection facility (16), we get byδ˘_♯_p_;_i:₌δ_♯_p_;_j₋1_nPn_j₌₁δ_♯_p_;_j and some algebra (see Online Appendix A-3)7

˘

yit =ρy˘∗it⋆+β′Ix˘Iit⋆+ ˘δ♯′p;iζ˜♯p;t⋆+ ˘up;it⋆ =WiK ˜ β_Ix_˘t⋆+ In⊗ ˜ζ′♯p;t⋆ ˘ δ_♯_p_{+ ˘}u_t⋆ , (32)

6_{In a former version we obtained}₍_n_,_T₎_{→ ∞}_{-limits for model with locally common variables, in which case the joint limit is} a normally distributed zero mean random vector. These results are available on request.

7_{To simplify the notation we write}_u_˘

it⋆ andu˘t⋆ when the within-transform (14) is applied tou♯it⋆ andu♯t⋆(i.e., we skip the symbol♯). In addition, althoughδ_♯_p;j− 1n

Pn

j=1δ♯p;jcorresponds to the within-transform defined in (12) we use the abbreviationδ˘_♯_p;ito emphasize that this parameter is related to the model based on the within-transform (14). To obtain (32) note that withk_C ₌ 0, we haveβ = β_I, β˜_I = ˜β = In ⊗β_I ∈ Rn×kIn_{, and}_C ₌ _I

nkI such that WiK ˜ βCx˘t⋆+ In⊗ ˜ζ′♯p;t⋆ + ˘u_t⋆=Kβ˜_Ix˘t⋆+ In⊗ ˜ζ′♯p;t⋆ + ˘u_t⋆.