Methods for spatial regression analysis

Chapter 4: Research Methodology

4.4 Methods for spatial regression analysis

Following on from the above section, a more complex statistical method is needed to test spatially the relationships between poverty, deforestation and private investment. Studies of spatial statistics, including Anselin (1992); Fotheringham, Brunsdon, and Charlton (2002); Haining (2003); and Charlton and Fotheringham (2009a, 2009b), present spatial regression analysis as one of the more powerful tools available to social scientists. Spatial regression analysis includes the geo-statistical technique of Geographically Weighted Regression (GWR), which is considered more advanced than traditional Ordinary Least Squares (OLS) regression because it takes into account the spatial locations and spatial attributes of the data and can therefore provide more accurate analytical results. Many academic studies have applied spatial regression analysis with GWR because its results are considered more reliable and its analytical tools are available in many software packages, notably the Spatial Statistics tools in ArcGIS (Charlton & Fotheringham, 2009a).

This section describes the method of spatial regression analysis, applied with GIS software, that is used to examine how private investment is spatially related to poverty and deforestation. The aim is to understand the spatial relationships between private investment and issues related to PEN. The findings from these methods are elaborated in Chapter 6. In this section, I first explain the models and variables employed for spatial data analysis; I then describe two regression methods, the OLS and GWR techniques, and the differences between them. Finally, I set out the analytical procedure adopted.

4.4.1 Models and variables for spatial data analysis

In general, regression is a method for modelling the relationship between a dependent variable (the y-variable) and a set of one or more independent variables (the x-variables, predictor variables, or regressors). A simple linear regression model takes the following form:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \dots, n$$

where:

$y_i$ is the dependent variable at location $i$;

$x_i$ is the independent variable;

$\varepsilon_i$ is the error term; and

$\beta_0$ and $\beta_1$ are the parameters or coefficients.

A multiple linear regression, which simply extends the simple regression, can be written as $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \dots + \beta_m x_{mi} + \varepsilon_i$, where $m$ is the number of independent variables. This type of model is usually fitted using a procedure known as OLS (Charlton & Fotheringham, 2009b). Each coefficient $\beta$ is an estimated value that describes the relationship between the dependent variable and the corresponding independent variable.
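As a minimal sketch of how such a model is fitted by OLS, the following Python example estimates the coefficients of a multiple linear regression with NumPy. It is illustrative only: the function name `fit_ols`, the synthetic data and the chosen "true" coefficients are assumptions made for the example, and the thesis itself uses the OLS tool in ArcMap rather than this code.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bm*xm by ordinary least squares.

    Returns the coefficient vector (intercept first) and the residuals.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:                      # allow a single predictor
        X = X[:, None]
    # Prepend a column of ones so that the first coefficient is b0.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return beta, residuals


# Synthetic data (hypothetical, not the thesis dataset).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

beta, residuals = fit_ols(np.column_stack([x1, x2]), y)
print("estimated coefficients (b0, b1, b2):", np.round(beta, 3))
```

Because the model includes an intercept, the fitted residuals average to zero, and the estimated coefficients should lie close to the values used to generate the data.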

In Section 2.6 of Chapter 2, I introduced the PIPEN model, which was employed to investigate the spatial relationships between private investment and issues related to PEN. By combining the linear regression model with the PIPEN model, this study proposes two linear models to investigate these spatial relationships:

$$\text{Poverty}_i = b_0 + b_1\,\text{Investment}_i + b_2\,\text{Deforestation}_i + \varepsilon_{1i} \qquad (1)$$

$$\text{Deforestation}_i = c_0 + c_1\,\text{Investment}_i + c_2\,\text{Poverty}_i + \varepsilon_{2i} \qquad (2)$$

where:

Poverty is the rate of poverty incidence in each district in 2005;

Investment is private investment in the resource sector during 2000-2009;

Deforestation is the change in percent forest cover between 2000 and 2005;

$\varepsilon$ is the error term; and

$i$ is the district location.

In the first model, coefficient $b_1$ was expected to have a negative sign, as investment was expected to reduce poverty, while $b_2$ was expected to have a positive sign, as an increasing rate of deforestation would determine higher poverty incidence. Similarly, in the second model, $c_1$ was expected to have a negative sign, as additional investment would be expected to reduce deforestation, and $c_2$ was expected to have a positive sign, as the poverty rate was predicted to increase along with the deforestation rate. Across the two proposed models, coefficients $b_1$ and $c_1$ were expected to suggest the spatial patterns of the relationship of investment with poverty and deforestation, patterns that delineate the magnitude of the impact of private investment on the rates of poverty and deforestation. In addition, coefficients $b_2$ and $c_2$ would reveal the spatial relationship patterns between poverty and deforestation, signifying the district locations of PEN. It is important to note that the coefficients in both models do not imply causality but only relationships among the variables. The two models were not linked to each other.

4.4.2 OLS, GWR and their differences

Various regression techniques can be used to analyse spatial data. This thesis experiments with both the OLS and GWR methods to estimate the two models proposed above. OLS (also known as linear least squares), a global form of linear regression, is used to generate predictions or to model a dependent variable in terms of its relationships to a set of explanatory variables. GWR, a local form of linear regression, models spatially varying relationships; like OLS, it predicts the values of a dependent (response) variable from a set of one or more independent (predictor) variables. GWR is one of several spatial regression techniques increasingly used in geography and other disciplines. It provides a local model of the variable or process, in order to understand or predict the strength of the relationship between the variables, by fitting a regression equation to every feature in the dataset (see Fotheringham et al., 2002).

The main differences between the two methods lie in their analytical features. OLS is a global model: it assumes constant relationships between the dependent and explanatory variables, and its residuals are assumed to be independent and normally distributed with a mean of zero. OLS thus rests on the classical linear assumptions, such as stationarity, no autocorrelation and normality. GWR, a local model, estimates different values for different locations within the study region, allows the relationships to vary over space, does not rely on the classical assumption of stationarity, and can therefore deal with non-stationary data. Unlike OLS, GWR can integrate statistical analysis with geographic information and incorporate the spatial location of the data into the study (see Charlton & Fotheringham, 2009a, 2009b; Fotheringham et al., 2002). Thus, the GWR results may prove better than those of OLS. Nevertheless, I experiment with both techniques in an attempt to explain the complex spatial relationships that characterise the PIPEN model.
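The contrast between a global and a local model can be illustrated with a short Python sketch. It fits one global OLS equation and then one distance-weighted regression per location, which is the core idea of GWR. The function names, the synthetic coordinates, the Gaussian weighting form exp(-0.5 (d/b)^2) and the bandwidth value of 0.15 are assumptions chosen for illustration; this is not the ArcMap implementation used in the thesis.

```python
import numpy as np


def gaussian_kernel(dist, bandwidth):
    """Gaussian distance-decay weights: exp(-0.5 * (d / b)^2)."""
    return np.exp(-0.5 * (dist / bandwidth) ** 2)


def fit_gwr(coords, X, y, bandwidth):
    """Fit one weighted least-squares regression per location.

    Returns an (n, m + 1) array: row i holds the local intercept and
    slopes estimated at location i.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(y)), X])
    local_betas = np.empty(design.shape)
    for i in range(len(y)):
        # Nearby observations receive weights close to 1, distant ones close to 0.
        dist = np.linalg.norm(coords - coords[i], axis=1)
        sqrt_w = np.sqrt(gaussian_kernel(dist, bandwidth))
        # Scaling rows by sqrt(w) turns weighted LS into ordinary LS.
        local_betas[i], *_ = np.linalg.lstsq(
            design * sqrt_w[:, None], y * sqrt_w, rcond=None
        )
    return local_betas


# Synthetic example (hypothetical): the slope increases from west to east,
# so the relationship is non-stationary.
rng = np.random.default_rng(1)
n = 400
coords = rng.uniform(0.0, 1.0, size=(n, 2))
x = rng.normal(size=n)
true_slope = 1.0 + 2.0 * coords[:, 0]
y = 0.5 + true_slope * x + rng.normal(scale=0.1, size=n)

# Global model: a single slope for the whole study region.
global_design = np.column_stack([np.ones(n), x])
global_beta, *_ = np.linalg.lstsq(global_design, y, rcond=None)

# Local model: one slope per location.
local_betas = fit_gwr(coords, x, y, bandwidth=0.15)

print("global slope:", round(float(global_beta[1]), 3))
print("local slope range:",
      round(float(local_betas[:, 1].min()), 3),
      "to", round(float(local_betas[:, 1].max()), 3))
```

In this sketch the global model returns a single average slope, whereas the local estimates follow the west-to-east gradient built into the data, which is the kind of spatial variation a global model cannot show.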

Nowadays, both techniques can be run on large datasets using statistical software. One popular program is ArcMap (ArcGIS Version 10), whose spatial statistics tools combine statistical and geographical analysis, an approach suited to the OLS and GWR techniques. In ArcMap, OLS is as simple to run as in other software because it has few feature commands; GWR, on the other hand, requires more attention to its optional feature commands and can be configured in many ways according to the user's choice. Some of these feature commands are defined in the GWR usage description in ArcMap Version 10 [26] and are quoted directly below.

It is important to note that GWR constructs a separate equation for every feature in the dataset, incorporating dependent and explanatory variables of features falling within the bandwidth [27] of each target feature. The shape and extent of the bandwidth is dependent upon user input for the Kernel [28] type, Bandwidth method, Distance, and Number of neighbours parameters.

In the syntax of the GWR method, the Kernel type and Bandwidth method are important for user selection. Their descriptions appear below.

Kernel type specifies if the kernel is constructed as a fixed distance, or if it is allowed to vary in extent as a function of feature density.

• FIXED: the spatial context (the Gaussian kernel) used to solve each local regression analysis is a fixed distance.

• ADAPTIVE: the spatial context (the Gaussian kernel) is a function of a specified number of neighbours. Where feature distribution is dense, the spatial context is smaller; where feature distribution is sparse, the spatial context is larger.

Bandwidth method specifies how the extent of the kernel should be determined. When AICc or CV is selected, this option will find the optimal distance/neighbour parameter for the user.

• AICc: the extent of the kernel is determined using the Akaike Information Criterion (AICc).

• CV: the extent of the kernel is determined using Cross Validation.

• BANDWIDTH PARAMETER: the extent of the kernel is determined by a fixed distance or a fixed number of neighbours.

These feature commands allow users to vary their analytical options, which is why results from GWR can be matched more readily to reality.

[26] Available online in the ArcGIS Resource Center, Desktop 10, from: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//005p00000021000000

[27] The bandwidth is the distance between the data point and the regression point. In statistical terms, the bandwidth is a measure of the distance-decay in the weighting function (see Fotheringham et al., 2002, pp. 44-45).

[28] The kernel is a weighting function, that is, a function that weights the distance between the data point and the regression point (see Fotheringham et al., 2002, p. 44).
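The kernel type and bandwidth method options described above can be sketched in Python as follows. The example contrasts a FIXED kernel (one distance everywhere) with an ADAPTIVE kernel (distance to a given number of neighbours) and selects a fixed bandwidth by leave-one-out cross validation, in the spirit of the CV option. All function names, the synthetic point pattern, the candidate distances and the choice of 20 neighbours are assumptions for illustration; ArcMap's internal search procedure is not reproduced here, and the AICc option is omitted.

```python
import numpy as np


def pairwise_distances(coords):
    """Euclidean distance between every pair of locations."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def fixed_bandwidths(n, distance):
    """FIXED kernel: the same distance at every location."""
    return np.full(n, float(distance))


def adaptive_bandwidths(dist, n_neighbours):
    """ADAPTIVE kernel: distance to the k-th nearest neighbour.

    Column 0 of each sorted row is the point itself (distance 0).
    """
    return np.sort(dist, axis=1)[:, n_neighbours]


def cv_score(dist, design, y, bandwidths):
    """Leave-one-out sum of squared prediction errors."""
    total = 0.0
    for i in range(len(y)):
        w = np.exp(-0.5 * (dist[i] / bandwidths[i]) ** 2)
        w[i] = 0.0                       # exclude the target observation
        sqrt_w = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(
            design * sqrt_w[:, None], y * sqrt_w, rcond=None
        )
        total += (y[i] - design[i] @ beta) ** 2
    return total


# Synthetic locations (hypothetical): a dense cluster plus a sparse region.
rng = np.random.default_rng(2)
dense = rng.uniform(0.0, 0.2, size=(150, 2))
sparse = rng.uniform(0.5, 1.0, size=(50, 2))
coords = np.vstack([dense, sparse])
n = len(coords)
x = rng.normal(size=n)
y = 0.5 + (1.0 + 2.0 * coords[:, 0]) * x + rng.normal(scale=0.1, size=n)

dist = pairwise_distances(coords)
design = np.column_stack([np.ones(n), x])

# ADAPTIVE: the kernel shrinks where points are dense and grows where sparse.
adaptive = adaptive_bandwidths(dist, n_neighbours=20)
print("mean adaptive bandwidth, dense area :", round(float(adaptive[:150].mean()), 3))
print("mean adaptive bandwidth, sparse area:", round(float(adaptive[150:].mean()), 3))

# CV: choose the fixed distance with the smallest leave-one-out error.
candidates = [0.1, 0.2, 0.4, 0.8]
scores = {b: cv_score(dist, design, y, fixed_bandwidths(n, b)) for b in candidates}
best_fixed = min(scores, key=scores.get)
print("CV-selected fixed bandwidth:", best_fixed)
```

The design choice to pass one bandwidth per location to `cv_score` means the same function can score either kernel type: a constant vector for FIXED, or the output of `adaptive_bandwidths` for ADAPTIVE.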

4.4.3 Analytical procedure for spatial regression analysis

When dealing with spatial data, the GWR method is often employed to investigate spatial relationships. However, as an exercise, this study experiments with both the OLS and GWR methods and then compares their results. The analytical procedure described in this section is implemented in Chapter 6 through the four model estimations shown in Figure 4.2 below. The first and second estimations run the first model using the OLS and GWR techniques, respectively; the third and fourth run the second model using OLS and GWR.

Figure 4.2: Analytical procedure diagram for spatial data analysis. Source: Created by author.

As illustrated in Figure 4.2, four steps are taken in each estimation procedure. First, the OLS or GWR tool in ArcMap is run so that each model estimates its spatial relationships. Second, after the results are obtained, the residuals are tested for spatial autocorrelation using Moran's Index (explained below); the null hypothesis of this test states that the observed pattern is randomly distributed. Third, if the test concludes that the observed pattern is dispersed or clustered, the estimated results from OLS or GWR are rejected; if the observed pattern is random, the results are accepted. In the last step, only the accepted results are compared and evaluated for better performance by examining their statistical diagnostics. These steps have been widely used in empirical studies (see Fotheringham et al., 2002).

In this procedure, it is important to test for autocorrelation to ensure that the residuals are statistically independent of each other. If they are correlated, the results will be unreliable, especially those of OLS, whose assumption of independent residuals would be violated. Moran's Index is considered an appropriate test statistic because it measures the level of spatial autocorrelation in the residuals and is available as an ArcMap tool (Charlton & Fotheringham, 2009a). The test is described in more detail in Chapter 6, where the estimated results of OLS and GWR are presented.
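The residual test and the accept/reject step of the procedure can be sketched in Python as follows. The example computes Moran's Index for a set of residuals and applies a decision rule to the resulting p-value. Several elements are assumptions made for illustration: the binary k-nearest-neighbour weights, the permutation-based p-value (used here for brevity in place of the z-score that ArcMap reports), the 0.05 significance level, and all function names and synthetic data.

```python
import numpy as np


def knn_weights(coords, k):
    """Binary spatial weights: w_ij = 1 if j is one of i's k nearest neighbours."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)       # a point is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((len(coords), len(coords)))
    rows = np.repeat(np.arange(len(coords)), k)
    W[rows, nearest.ravel()] = 1.0
    return W


def morans_i(values, W):
    """Moran's I = (n / S0) * (z' W z) / (z' z), with z the centred values."""
    z = values - values.mean()
    return len(z) / W.sum() * (z @ W @ z) / (z @ z)


def permutation_test(values, W, n_perm=999, seed=0):
    """Observed Moran's I and a two-sided pseudo p-value from random permutations."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, W)
    expected = -1.0 / (len(values) - 1)  # expectation under spatial randomness
    sims = np.array([morans_i(rng.permutation(values), W) for _ in range(n_perm)])
    extreme = np.sum(np.abs(sims - expected) >= abs(observed - expected))
    return observed, (extreme + 1) / (n_perm + 1)


def residuals_accepted(p_value, alpha=0.05):
    """Accept the model when the null of a random pattern is not rejected."""
    return p_value >= alpha


# Synthetic residuals (hypothetical): one random set, one with a spatial trend.
rng = np.random.default_rng(3)
coords = rng.uniform(0.0, 1.0, size=(200, 2))
W = knn_weights(coords, k=8)

random_resid = rng.normal(size=200)
trend_resid = coords[:, 0] + rng.normal(scale=0.1, size=200)

i_random, p_random = permutation_test(random_resid, W)
i_trend, p_trend = permutation_test(trend_resid, W)

print("random residuals: I =", round(float(i_random), 3),
      "accepted:", residuals_accepted(p_random))
print("trend residuals:  I =", round(float(i_trend), 3),
      "accepted:", residuals_accepted(p_trend))
```

Residuals with a spatial trend give a Moran's Index well above zero, so the corresponding model would be rejected at the third step, whereas residuals without spatial structure give an index close to its expected value under randomness.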

In document Private Investment in the Resources Sector and the Poverty-Environment Nexus (PEN) in Laos (Page 123-128)