Spatial regression analysis and its models

Chapter 6: Spatial regression analysis

6.2 Spatial regression analysis and its models

This section briefly describes the background to, and some components of, spatial regression analysis. I then explain how spatial regression analysis was applied to quantify the spatial relationships in the PIPEN model.

6.2.1. Some terms related to the concept of spatial data analysis

According to Fotheringham and Rogerson (2009, p. 1), spatial data are different from other data in that they contain ‘locational information as well as attribute information’; or, in other words, they are recorded at different locations which are coded as part of the data. In this sense, it is important to take into account spatial distribution in these data. Spatial analysis is ‘one of the techniques using this locational information to better understand the processes generating the observed attribute value’ (ibid). Haining (2003, p. 4) defines spatial analysis as ‘a collection of techniques and models that explicitly use the spatial referencing associated with each data describing the spatial relationships or spatial interactions between the cases’. He describes the three main elements of spatial analysis as follows:

… [F]irst, it includes cartographic modelling. Each data set is represented as a map and map-based operations (or implementing map algebras) generate new maps…[S]econd, it includes forms of mathematical modelling where model outcomes are dependent on the forms of spatial interaction between objects in the model, or spatial relationships or the geographical positioning of objects within the model…[F]inally, it includes the development and application of statistical techniques of the proper analysis of spatial data which, as a consequence, make use of the spatial referencing in the data. This is the area of spatial analysis that we refer to as spatial data analysis… (Haining, 2003, pp. 4-5, original emphasis).

A useful study of spatial data analysis in the social sciences was conducted by Anselin (1992). According to his study, location has two spatial effects, namely spatial dependence and spatial heterogeneity. Building upon Tobler’s first law of geography, which states that “everything is related to everything else, but near things are more related than distant things” (1970, p. 236), Anselin, who equates spatial dependence with spatial autocorrelation or association, claims that similar values of a variable tend to occur in nearby locations, leading to spatial clusters. Spatial heterogeneity implies regional differentiation, as each location has its own intrinsic uniqueness. To this end, Anselin suggests treating location as crucial in both an absolute sense (coordinates) and a relative sense (spatial arrangement, distance) when conducting statistical analyses of spatial data. Otherwise, the results of the analyses may prove invalid.
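Spatial dependence of the kind described above is often summarised with a global autocorrelation statistic. The text does not name a specific statistic; the sketch below uses Moran's I as one common choice. The four-district layout, the binary contiguity weights and the attribute values are hypothetical and serve only to show that clustered values give a positive statistic while alternating values give a negative one.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for n values and an n x n spatial weights matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    z = x - x.mean()                   # deviations from the mean
    num = (w * np.outer(z, z)).sum()   # sum over i, j of w_ij * z_i * z_j
    den = (z ** 2).sum()
    return (n / w.sum()) * (num / den)

# Hypothetical example: four districts in a row; adjacent districts are neighbours.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

clustered = [1.0, 2.0, 3.0, 4.0]     # similar values sit next to each other
alternating = [1.0, 4.0, 1.0, 4.0]   # neighbouring values are dissimilar

i_clustered = morans_i(clustered, w)      # positive: spatial clustering
i_alternating = morans_i(alternating, w)  # negative: neighbours differ
```

Any other weights specification (for example inverse distance) could be substituted for the contiguity matrix without changing the function.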

Following Anselin’s study, and because spatial data depart from the standard assumptions of independence and homogeneity, special techniques are needed. These fall under three rubrics: spatial statistics, geostatistics and spatial econometrics. Anselin considered ‘spatial statistics’ the most general of the three. During the 1980s, spatial data analysis was not commonly undertaken even though its techniques were considered important, owing to the lack of operational software. Subsequently, however, many attempts were made to add spatial analysis features to existing data analysis software packages, and these techniques later became widely applied by social scientists (see Anselin, 1992).

According to Anselin, spatial data may depend on neighbouring observations; for this reason, it may be inappropriate to analyse spatial data with the ordinary regression model. Fotheringham, Brunsdon, and Charlton (2002, p. 21) note that approaches attempting to generate a regression framework taking into account spatial dependency are referred to as ‘spatial regression models’. Fotheringham et al. (2002) introduced the geographically weighted regression (GWR) technique, which builds on the traditional regression framework and incorporates local spatial relationships into it in an intuitive and explicit manner. Charlton and Fotheringham (2009a, 2009b) have applied statistical methods to GWR in an attempt to capture both spatial dependence and spatial heterogeneity. Both methods, ordinary least squares (OLS) regression and GWR, are described below.

The OLS and GWR techniques are both used to estimate coefficients describing the relationships between dependent and explanatory variables; but, as suggested in Chapter 4, they are different kinds of spatial data analysis (see Fotheringham et al., 2002). On the one hand, OLS is treated as a global model because the estimated coefficients hold constant across the entire study area. Its residuals are assumed to be independent and normally distributed with a mean of zero. This is because OLS rests on the classical linear regression assumptions and on stationary data, which may not be realistic. On the other hand, GWR is viewed as a local model because it can yield different estimated coefficients for different locations within the study region. GWR relaxes the classical assumptions and can deal with non-stationary data, which is more realistic (see Chapters 1 and 2 in Fotheringham et al., 2002). More importantly, the single estimated coefficient of each explanatory variable in OLS cannot be mapped, whereas the location-specific coefficients in GWR can be mapped to show the spatial pattern of each coefficient in the model. While acknowledging their differences, it can be useful to experiment with both techniques to understand the various analytical methods employed in this chapter.
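The global versus local distinction can be illustrated with a minimal sketch. It assumes a fixed Gaussian distance-decay kernel and synthetic data in which the slope increases from west to east; the function names, the bandwidth of 1.5 and the data are all hypothetical and are not taken from the thesis or from ArcGIS. The global fit returns one slope for the whole area, whereas the local fit returns a different slope at each location.

```python
import numpy as np

def ols(X, y):
    """Global OLS: a single coefficient vector for the whole study area."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def gwr_at(point, coords, X, y, bandwidth):
    """Local fit at `point`: least squares with Gaussian distance weights."""
    d = np.linalg.norm(coords - point, axis=1)   # distance to every observation
    w = np.exp(-0.5 * (d / bandwidth) ** 2)      # nearer observations weigh more
    sw = np.sqrt(w)
    # Weighted least squares via row scaling by the square root of the weights.
    return np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]

# Synthetic, non-stationary data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
true_slope = 0.5 + 0.2 * coords[:, 0]            # slope grows from west to east
y = 1.0 + true_slope * x + rng.normal(scale=0.1, size=n)
X = np.column_stack([np.ones(n), x])             # intercept + one explanatory variable

b_global = ols(X, y)
b_west = gwr_at(np.array([1.0, 5.0]), coords, X, y, bandwidth=1.5)
b_east = gwr_at(np.array([9.0, 5.0]), coords, X, y, bandwidth=1.5)
```

Evaluating `gwr_at` at every observation would produce one coefficient per location, which is what makes the GWR coefficients mappable.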

In studies dealing with spatial data, Charlton & Fotheringham (2009a, 2009b) and Fotheringham et al. (2002) use both OLS and GWR to investigate spatial relationships. They first estimate the OLS model in order to understand the predicted values; the analysis then shifts from OLS to GWR, which incorporates location and space into the model. The results of the two methods are compared, and their statistical diagnostics are examined to identify the better-performing model. These methods have since been widely used in empirical studies (see Charlton & Fotheringham, 2009a; Fotheringham et al., 2002). Both the OLS and GWR tools have been incorporated into software packages such as ArcGIS, which makes them more convenient to apply (Charlton & Fotheringham, 2009a). As suggested in Chapter 4, this chapter follows these methods to quantify the spatial relationships in the PIPEN model.

6.2.2. Spatial regression of PIPEN modelling

In line with the research analytical framework in Chapter 1 (see Figure 1.3), a model has been constructed to investigate the spatial relationships of private investment (PI) with PEN. As shown in Chapter 2 (see Figure 2.5), this model is named ‘PIPEN’. To estimate the spatial relationships of this model, two relevant equations were modelled and named Models 1 and 2, respectively (see Chapter 4).

$\text{Poverty}_i = b_0 + b_1\,\text{Investment}_i + b_2\,\text{Deforestation}_i + \varepsilon_{1i}$ (Model 1)

$\text{Deforestation}_i = c_0 + c_1\,\text{Investment}_i + c_2\,\text{Poverty}_i + \varepsilon_{2i}$ (Model 2)

In Model 1, the coefficient $b_1$ is expected to be negative, as additional investment is assumed to reduce the poverty rate. Based on the vicious circle concept of PEN, the coefficient $b_2$ is expected to be positive, as a higher deforestation rate is predicted to increase the poverty rate. In Model 2, the coefficient $c_1$ is expected to be positive, as added investment is assumed to increase the deforestation rate. Similarly, the coefficient $c_2$ is expected to be positive, as a higher poverty rate would be associated with an increase in the deforestation rate. Drawing on these two models, the estimated coefficients $b_1$ and $c_1$ will reveal the spatial patterns of the relationship of investment with poverty and deforestation, while the estimated coefficients $b_2$ and $c_2$ will represent the spatial relationships and locations of PEN. It is important to note that these coefficients do not imply causality.
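For reference, the GWR counterpart of Model 1 can be written with location-specific coefficients. The notation below, with $(u_i, v_i)$ denoting the coordinates of district $i$ and $W(u_i, v_i)$ a diagonal matrix of geographical weights centred on that district, follows the general form given by Fotheringham et al. (2002); it is an assumption about how the models above would be expressed locally, and Model 2 follows by analogy.

```latex
\begin{aligned}
\text{Poverty}_i &= b_0(u_i, v_i) + b_1(u_i, v_i)\,\text{Investment}_i
                  + b_2(u_i, v_i)\,\text{Deforestation}_i + \varepsilon_{1i} \\
\hat{\mathbf{b}}(u_i, v_i) &= \left( X^{\top} W(u_i, v_i)\, X \right)^{-1}
                              X^{\top} W(u_i, v_i)\, \mathbf{y}
\end{aligned}
```

Setting every weight to one makes $W(u_i, v_i)$ the identity matrix, and the estimator reduces to the single global OLS estimate.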

When applying GWR, it is important to select the spatial options in the ArcGIS tool carefully so that they meet the study objective, as many options for geographical weighting are available. In this chapter, the GWR dependent and explanatory variables are the same as those in the OLS. For the geographical weighting options, as stated in Chapter 4, the ADAPTIVE method was chosen for the ‘Kernel type’ and the corrected Akaike Information Criterion (AICc) was selected for the ‘Bandwidth method’ in the interests of simplicity. To support the estimation of the above two models, the statistical dispersion of the three map data sets related to the PIPEN model is provided in Table 6.1 as a measure of data variation. For example, investment, measured as the number of investments per district, shows a wide range, with a mean of 6.92, a median of 4.00, a standard deviation of 7.81, and minimum and maximum values of 0 and 50, respectively.

Table 6.1: Statistical dispersion of investment, poverty and deforestation

Variable        Mean    Median  Std Deviation   Min     Max
Investment      6.92    4.00    7.814           0       50
Deforestation   -0.02   -0.02   0.027           -0.087  0.055
Poverty         0.38    0.38    0.147           0.064   0.752

ArcGIS version 10 was used for the analysis. Its OLS and GWR tools for modelling spatial relationships, both available in ArcToolbox under ArcMap, were employed. The next section begins by estimating Model 1 with the OLS tool and then with the GWR tool, the aim being to estimate the coefficients that describe the relationships between the dependent and explanatory variables in the above two models.
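The adaptive kernel and AICc bandwidth options mentioned in this section can be sketched outside ArcGIS as follows. This is a minimal illustration under stated assumptions, not a reproduction of the ArcGIS tool: it assumes Gaussian weights whose bandwidth at each location equals the distance to the k-th nearest neighbour, uses the AICc expression given by Fotheringham et al. (2002), and runs on synthetic data. The function names and the candidate neighbour counts are hypothetical.

```python
import numpy as np

def adaptive_weights(coords, i, k):
    """Gaussian weights at location i; bandwidth = distance to its k-th neighbour."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    h = np.sort(d)[k]              # index 0 is the location itself (distance 0)
    return np.exp(-0.5 * (d / h) ** 2)

def gwr_aicc(coords, X, y, k):
    """Fit a local regression at every observation and return the AICc."""
    n = len(y)
    S = np.zeros((n, n))           # hat matrix: fitted values are S @ y
    for i in range(n):
        w = adaptive_weights(coords, i, k)
        XtW = X.T * w              # X' W without forming the diagonal matrix
        S[i] = X[i] @ np.linalg.solve(XtW @ X, XtW)
    resid = y - S @ y
    sigma = np.sqrt(resid @ resid / n)   # ML estimate of the error std deviation
    tr = np.trace(S)                     # effective number of parameters
    return 2 * n * np.log(sigma) + n * np.log(2 * np.pi) + n * (n + tr) / (n - 2 - tr)

def select_neighbours(coords, X, y, candidates):
    """Pick the neighbour count with the lowest AICc."""
    scores = {k: gwr_aicc(coords, X, y, k) for k in candidates}
    return min(scores, key=scores.get), scores

# Synthetic, non-stationary data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
y = 1.0 + (0.5 + 0.2 * coords[:, 0]) * x + rng.normal(scale=0.1, size=n)
X = np.column_stack([np.ones(n), x])

best_k, scores = select_neighbours(coords, X, y, candidates=(20, 40, 80, 160))
```

Because the bandwidth is defined by a neighbour count rather than a fixed distance, the kernel widens where observations are sparse and narrows where they are dense, which is the practical appeal of an adaptive kernel for unevenly sized districts.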

In document Private Investment in the Resources Sector and the Poverty-Environment Nexus (PEN) in Laos (Page 166-170)