1 Spatial econometric models and estimation strategies

1.4 Cross-sectional models with spatial autocorrelation

When spatial autocorrelation is present in the data, the assumption of independence between observations is violated, and inference based on Ordinary Least Squares (OLS) estimation is therefore unreliable. For this reason, the presence of spatial autocorrelation must be examined carefully when estimating econometric models. To address this issue, a taxonomy of the most popular cross-sectional spatial models is given, followed by a discussion of the estimation strategies that are usually employed.

1.4.1 A taxonomy for cross-sectional spatial models

Given a classical linear regression model, such as

$$y = X\beta + \varepsilon \tag{1.3}$$

where $y$ is the $n \times 1$ vector of the dependent variable, $X$ is the $n \times k$ matrix of observations on the $k$ independent variables, $\beta$ is the $k \times 1$ vector of unknown coefficients and $\varepsilon$ is the $n \times 1$ vector of errors, OLS estimation is based on the following assumptions, which make it the Best Linear Unbiased Estimator (BLUE):

OLS_1. Exogeneity of the regressors $X$.

OLS_2. $E(\varepsilon) = 0$.

OLS_3. $E(\varepsilon\varepsilon') = \sigma^2 I$.

The presence of spatial dependence violates some of these assumptions, as the following sections make clear, so that OLS estimates become inefficient or even biased. Spatial dependence can be incorporated in the specification of a linear regression model in different ways: either through a spatially lagged variable (the spatial lag of the dependent variable, $Wy$, or the spatial lag of an exogenous variable, $Wx$), or through the error structure, so that $E[\varepsilon_i \varepsilon_j] \neq 0$ for $i \neq j$.

These two forms can also be combined in the more complex Cliff-Ord model.
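As a small illustration of what a spatial lag is, the sketch below builds a row-standardized weight matrix and computes $Wy$. The four-region line layout, the function name `row_standardize` and the numbers are assumptions made for the example, not taken from the text.

```python
import numpy as np

def row_standardize(adjacency):
    """Scale each row of a non-negative adjacency matrix to sum to one."""
    adjacency = np.asarray(adjacency, dtype=float)
    row_sums = adjacency.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every unit needs at least one neighbour")
    return adjacency / row_sums

# Hypothetical layout: four regions on a line; adjacent regions are neighbours.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
W = row_standardize(adjacency)

y = np.array([10.0, 20.0, 30.0, 40.0])
Wy = W @ y  # spatial lag: average of y over each unit's neighbours
print(Wy)
```

With row-standardized weights, each element of `Wy` is the average of `y` over that unit's neighbours.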

1.4.2 SAR models

The Spatial Autoregressive (SAR) model incorporates spatial dependence through a spatial lag of the dependent variable:

$$y = \rho_1 W y + X\beta + \varepsilon \tag{1.4}$$

where $\rho_1$ is the so-called spatial autoregressive coefficient and the other notation is unchanged. For the sake of simplicity the error terms are assumed to be i.i.d., although heteroskedasticity can be incorporated in various ways (Anselin 1988a).

Introducing the spatial lag of the dependent variable makes it possible to evaluate the effect of spatial dependence once the other regressors are controlled for and, conversely, to evaluate the impact of the other regressors once the effect of spatial dependence has been filtered out.

It is important to note that the term $Wy$ is correlated with the error terms in model (1.4); it is therefore an endogenous regressor, which makes OLS estimates of the model biased and inconsistent. This becomes clear from the following rearrangement of equation (1.4):

$$y = (I - \rho_1 W)^{-1}(X\beta + \varepsilon). \tag{1.5}$$

Expression (1.5) shows how a shock occurring in unit $i$ affects not only the value of $y$ in that unit, but also that of the other units, through the inverse spatial transformation (Anselin 2001). The matrix $(I - \rho_1 W)^{-1}$ also determines the parameter space for this model, because $(I - \rho_1 W)$ must be non-singular for the inverse to exist. When the spatial weight matrix is row-standardized, this always holds for $|\rho_1| < 1$.¹
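The propagation of a shock through the inverse spatial transformation can be sketched numerically. The example below is a minimal illustration under assumed values: the four-region line layout, the coefficients and the helper names `is_strictly_diagonally_dominant` and `sar_reduced_form` are hypothetical. It checks the diagonal-dominance condition and compares $y$ with and without a unit shock to the error of the first region.

```python
import numpy as np

def is_strictly_diagonally_dominant(M):
    """True if |m_ii| exceeds the sum of |m_ij|, j != i, in every row."""
    M = np.abs(np.asarray(M, dtype=float))
    diag = np.diag(M)
    off_diag = M.sum(axis=1) - diag
    return bool(np.all(diag > off_diag))

def sar_reduced_form(W, X, beta, eps, rho1):
    """Solve (I - rho1 W) y = X beta + eps for y, i.e. equation (1.5)."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - rho1 * W, X @ beta + eps)

# Hypothetical row-standardized weights for four regions on a line.
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])
X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])
beta = np.array([1.0, 0.5])
rho1 = 0.5

dominant = is_strictly_diagonally_dominant(np.eye(4) - rho1 * W)

# Unit shock to the error of region 0 only.
eps_base = np.zeros(4)
eps_shock = np.array([1.0, 0.0, 0.0, 0.0])
y_base = sar_reduced_form(W, X, beta, eps_base, rho1)
y_shock = sar_reduced_form(W, X, beta, eps_shock, rho1)

effect = y_shock - y_base  # equals the first column of (I - rho1 W)^{-1}
print(dominant, effect)
```

In this setup every element of `effect` is positive, so the shock in region 0 reaches all regions, and the effect on region 0 itself exceeds one because part of the shock feeds back through its neighbours.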

1.4.3 SARE models

When spatial dependence is incorporated in the error term, $\varepsilon$ becomes non-spherical and the structure of the spatial dependence is expressed by the off-diagonal elements of the covariance matrix. The OLS estimates are therefore unbiased but inefficient. This type of model, the spatial error model, can be specified in different ways. The most common ones incorporate spatial dependence in the error terms by defining them as spatial moving average or spatial autoregressive error (SARE) processes.

¹ Since the diagonal elements of $W$ are equal to 0, the diagonal elements of $(I - \rho_1 W)$ are 1 and, under the condition $|\rho_1| < 1$, strictly exceed the sum of the absolute values of the other elements in the row, which equals $|\rho_1|$ when $W$ is row-standardized with non-negative weights. This makes the matrix $(I - \rho_1 W)$ strictly diagonally dominant and therefore always invertible.

The latter is probably the most widely used and is specified as

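$$\begin{aligned} y &= X\beta + \varepsilon \\ \varepsilon &= \rho_2 W \varepsilon + u, \end{aligned} \tag{1.6}$$

where $u$ is an i.i.d. error term and $\rho_2$ is a spatial coefficient that measures spatial dependence between the errors $\varepsilon$.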

The reduced form for model (1.6) is expressed as:

$$y = X\beta + (I - \rho_2 W)^{-1} u \tag{1.7}$$

and requires the matrix $(I - \rho_2 W)$ to be non-singular. This condition always holds under the assumption $|\rho_2| < 1$ when the spatial weight matrix is row-standardized. It follows that $\varepsilon = (I - \rho_2 W)^{-1} u$ and therefore $E(\varepsilon) = 0$ and $E(\varepsilon\varepsilon') = \Omega(\rho_2)$, where $\Omega(\rho_2)$ depends on the value of $\rho_2$:

$$\Omega(\rho_2) = \sigma^2 \left[(I - \rho_2 W)'(I - \rho_2 W)\right]^{-1}. \tag{1.8}$$

An important feature of this kind of model is that spatial autocorrelation in the error terms can be interpreted as the effect of relevant, spatially autocorrelated omitted variables (Fingleton 1999), which will likely result in biased estimates if not properly modeled.

From this perspective, the SARE model can capture the effect of omitted variables, a common problem in economic modeling.
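The error covariance matrix of the SARE model can be checked numerically. The sketch below is illustrative only: the weight matrix, the parameter values and the function name `sare_error_covariance` are assumptions. It computes the covariance from expression (1.8) and compares it with the one implied by writing the errors as the inverse spatial transformation of an i.i.d. term with variance `sigma2`.

```python
import numpy as np

def sare_error_covariance(W, rho2, sigma2):
    """Omega(rho2) = sigma2 * [(I - rho2 W)'(I - rho2 W)]^{-1}, as in (1.8)."""
    n = W.shape[0]
    B = np.eye(n) - rho2 * W
    return sigma2 * np.linalg.inv(B.T @ B)

# Hypothetical row-standardized weights for four regions on a line.
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])
rho2, sigma2 = 0.4, 2.0

omega = sare_error_covariance(W, rho2, sigma2)

# Same matrix obtained from eps = (I - rho2 W)^{-1} u with Var(u) = sigma2 * I.
B_inv = np.linalg.inv(np.eye(4) - rho2 * W)
omega_direct = sigma2 * B_inv @ B_inv.T

# Without spatial dependence the errors are spherical again.
omega_zero = sare_error_covariance(W, 0.0, sigma2)
print(np.round(omega, 3))
```

The non-zero off-diagonal elements of `omega` are what makes the errors non-spherical; setting `rho2` to zero returns the spherical case of assumption OLS_3.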

Model (1.6) can also be rewritten in a way such that a spatial lag of the dependent variable appears, as:

$$y = \rho_2 W y + X\beta - \rho_2 W X\beta + u. \tag{1.9}$$

This is the so-called “Spatial Durbin model” (Anselin 1988a), which imposes some non-linear constraints on the coefficients: the coefficient on $WX$ must equal $-\rho_2\beta$. The presence of a spatial lag of the dependent variable in this specification complicates the testing procedure for spatial autocorrelation, making it difficult to distinguish between the spatial lag and the spatial error alternatives.
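The step from the SARE model to its spatial Durbin form can be written out explicitly. The derivation below uses only the symbols already defined, except for $\theta$, which is introduced here purely for illustration as the unrestricted coefficient vector on $WX$.

```latex
\begin{align*}
y &= X\beta + (I - \rho_2 W)^{-1} u
  && \text{reduced form (1.7)} \\
(I - \rho_2 W)\, y &= (I - \rho_2 W)\, X\beta + u
  && \text{premultiply by } (I - \rho_2 W) \\
y &= \rho_2 W y + X\beta - \rho_2 W X\beta + u
  && \text{equation (1.9)} \\[4pt]
y &= \rho_2 W y + X\beta + W X\theta + u
  && \text{unrestricted form, with constraint } \theta = -\rho_2 \beta
\end{align*}
```

The constraint in the last line is non-linear because it involves the product of two unknown parameters.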

1.4.4 Cross-regressive models

When a spatial lag of the exogenous variable(s) is included in a classical linear regression, a cross-regressive model is specified as

$$y = X\beta + WZ\gamma + \varepsilon, \tag{1.10}$$

where $Z$ is a matrix of exogenous variables with $n$ rows, which may correspond, totally or partially, to the variables included in $X$, and $\gamma$ is the corresponding vector of spatial parameters. This kind of model is particularly useful for measuring the effects on $y$ of spatial spill-overs of exogenous variables.

As regards estimation, since $Z$ contains only exogenous variables, model (1.10) can be estimated via OLS, as long as assumption OLS_1 holds for the augmented regressor matrix $[X \;\; WZ]$ and assumptions OLS_2 and OLS_3 hold for the error terms $\varepsilon$. Cross-regressive terms can also be added to the previous specifications.
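OLS estimation of a cross-regressive model amounts to an ordinary regression on the original regressors augmented with the spatially lagged ones. The sketch below simulates data under assumed values (a chain of 200 units, one regressor, hypothetical coefficients and the helper `chain_weights`) and recovers the parameters with NumPy's least-squares routine.

```python
import numpy as np

def chain_weights(n):
    """Row-standardized contiguity weights for n units on a line (assumed layout)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return A / A.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n = 200
W = chain_weights(n)

x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])  # intercept and one regressor
# Z holds only the non-constant regressor: with row-standardized W the lag
# of a constant is the constant itself, which would be collinear with X.
Z = x.reshape(-1, 1)

beta_true = np.array([1.0, 2.0])   # hypothetical values
gamma_true = np.array([0.5])       # hypothetical value
eps = rng.normal(scale=0.1, size=n)
y = X @ beta_true + W @ Z @ gamma_true + eps

# OLS on the augmented regressor matrix [X, WZ].
X_aug = np.column_stack([X, W @ Z])
coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
beta_hat, gamma_hat = coef[:2], coef[2:]
print(beta_hat, gamma_hat)
```

Because the lagged regressor is exogenous by construction here, the estimates should lie close to the values used to generate the data.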

1.4.5 Spatial Cliff-Ord model

Cliff and Ord type models, also known as SARAR(1,1) by analogy with the time series literature, contain a spatial lag of both the dependent variable and the error term (Kelejian and Prucha 1998):

$$\begin{aligned} y &= \rho_1 W_1 y + X\beta + \varepsilon, & |\rho_1| &< 1 \\ \varepsilon &= \rho_2 W_2 \varepsilon + u, & |\rho_2| &< 1 \end{aligned} \tag{1.11}$$

where $W_1$ and $W_2$ may or may not be the same spatial weight matrix. In particular, the two must differ from each other as a requirement for identification when applying Maximum Likelihood (ML) estimators,² whereas an advantage of Instrumental Variables (IV) / Generalized Method of Moments (GMM) estimators is that the same spatial weight matrix can be used (Elhorst 2010). The model may also contain cross-regressive terms.

² The identification problems that may arise in the ML estimation of this kind of model are such that almost no empirical application exists.
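To make the two-equation structure of the Cliff-Ord model concrete, the sketch below generates data from it by solving the error equation first and the outcome equation second. It covers data generation only, not ML or IV/GMM estimation. The two weight matrices, the parameter values and the function name `simulate_sarar` are assumptions chosen for the example.

```python
import numpy as np

def simulate_sarar(W1, W2, X, beta, u, rho1, rho2):
    """Generate (y, eps) from model (1.11) by solving the two linear systems."""
    n = X.shape[0]
    identity = np.eye(n)
    eps = np.linalg.solve(identity - rho2 * W2, u)             # eps = rho2 W2 eps + u
    y = np.linalg.solve(identity - rho1 * W1, X @ beta + eps)  # y = rho1 W1 y + X beta + eps
    return y, eps

n = 5
# W1 (assumed): row-standardized contiguity for five regions on a line.
A = np.zeros((n, n))
idx = np.arange(n - 1)
A[idx, idx + 1] = 1.0
A[idx + 1, idx] = 1.0
W1 = A / A.sum(axis=1, keepdims=True)
# W2 (assumed): every other region is an equally weighted neighbour.
W2 = (np.ones((n, n)) - np.eye(n)) / (n - 1)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 0.5])  # hypothetical values
u = rng.normal(size=n)
rho1, rho2 = 0.3, 0.6

y, eps = simulate_sarar(W1, W2, X, beta, u, rho1, rho2)
print(y)
```

Passing the same matrix for `W1` and `W2` is equally possible in this function; the text's remarks on identification concern estimation, not simulation.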