
Geographically Weighted Regression Models for Count Data

CHAPTER 2: LITERATURE REVIEW

2.4 Regression Methods in Count Data Modelling

2.4.2 Geographically Weighted Regression Models for Count Data

Location plays an important role in the frequency of crime and collisions because the sociological, economic and demographic variables that influence their occurrence vary from place to place. Crimes and collisions are therefore not homogeneous over geographic surfaces; they are heterogeneous in nature and vary with sociodemographic and land use factors.

Recognition of spatial dependency has been attributed to Tobler’s law, which holds that relationships exist across geographic space. This recognition has cast doubt on the accuracy of fixed parameter models such as the Poisson and negative binomial models, and has led to the development of spatial models that take into consideration the influence of nearby areas.

The increasing recognition of the spatial dependency and heterogeneity that characterize data such as crime and collision data has led to much research in recent decades, including the proposal of the geographically weighted regression method (Loo and Anderson, 2015). Geographically weighted regression is based on the idea that, given a set of dependent and independent variables, parameters can be estimated for different locations within a study area (Charlton and Fotheringham, 2009; Huang et al., 2010). It examines relationships that exist across geographically referenced observations and the factors causing them, on the premise that the factors causing crime and collisions are not stationary but vary geographically.

The first version of this type of model was proposed by Brunsdon et al. (1996). It assumes a Gaussian distribution for the response variable. The functional expression of the geographically weighted regression model is given as Equation (2-16):

$y_i = \beta_0(u_j, v_j) + \sum_{i=1}^{k} \beta_i(u_j, v_j)\, x_{ij} + \varepsilon_j$ (2-16)

where:

$(u_j, v_j)$ represents the coordinates at which observation $i$ is taken;

$y_i$ and $x_{ij}$ are the dependent and the independent variables respectively;

$k$ and $\varepsilon_j$ are the number of independent variables and the error term for observation location $j$. Equation (2-16) can be written as Equation (2-17) when exposure $t_j$, which represents the interval over which the event is studied, is taken into consideration.

$y_i \sim t_j \exp\left(\sum_{i=1}^{k} \beta_i(u_j, v_j)\, x_{ij}\right)$ (2-17)

The concept of geographically weighted regression is based on non-parametric smoothing and curve fitting techniques in which local regression parameters are determined from a subset of data around each estimation point. It calibrates the model using data from neighbouring locations in geographic space to explore spatially varying relationships (Wheeler and Paez, 2010). Most importantly, geographically weighted regression assumes that observations closer to a data point have more influence on its parameter estimates, thus incorporating the first law of geography: everything is related to everything else, but near things are more related than distant things (Thapa and Estoque, 2012).
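The local calibration described above can be illustrated with a minimal sketch, assuming a Gaussian distance-decay kernel with a fixed bandwidth and a weighted least-squares fit at every observation location. The function names `gaussian_weights` and `gwr_fit`, the bandwidth value and the synthetic data are illustrative assumptions and are not taken from the studies cited here.

```python
import numpy as np

def gaussian_weights(coords, target, bandwidth):
    """Distance-decay weights of all observations relative to one target point."""
    dist = np.linalg.norm(coords - target, axis=1)
    return np.exp(-0.5 * (dist / bandwidth) ** 2)

def gwr_fit(coords, X, y, bandwidth):
    """Fit one weighted least-squares regression per observation location.

    Returns an (n, k + 1) array: row j holds the intercept and the k slopes
    estimated at coords[j].
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])      # prepend the intercept column
    betas = np.empty((n, design.shape[1]))
    for j in range(n):
        w = gaussian_weights(coords, coords[j], bandwidth)
        xtw = design.T * w                         # same as design.T @ diag(w)
        betas[j] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas

# Synthetic illustration (assumed data): the slope on x grows from west to east.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(200, 2))
x = rng.normal(size=(200, 1))
y = 1.0 + (1.0 + 2.0 * coords[:, 0]) * x[:, 0] + rng.normal(scale=0.1, size=200)
local_betas = gwr_fit(coords, x, y, bandwidth=0.2)
```

Each row of `local_betas` is a separate set of coefficients, which is what distinguishes this approach from a fixed parameter model. In practice the bandwidth would normally be chosen by a criterion such as cross-validation rather than fixed in advance.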

Geographically weighted regression is a non-parametric technique that uses subsampling of observed data across space for statistical analysis, and it has become important in the study of spatially referenced data such as crime and collision data. However, the concept of subsampling statistical data, though innovative in spatial statistics, is not new to classical statistics. Its application in the field of spatial statistics may be attributed to Brunsdon et al.’s (1996) research that applied the concept of variable geographic space.

The original reference to the concept used by Brunsdon et al. (1996) could be linked to a smoothing technique for histograms. The idea of using distance weighting also existed in interpolating algorithms, but the advancement of weighted concepts in geography paved the way for a multivariate local spatial data approach. The advantages of understanding local relationships and patterns have made geographically weighted regression more popular (Paez et al., 2011). However, it can still be argued to be an extension of locally weighted regression, like quantile regression, which explores the variation in relationships among statistical data. Similarly, geographically weighted regression is related to conditionally parametric regression, which is also an extension of locally weighted regression. The difference is that the parameters (coefficients) of a geographically weighted model are non-parametric estimates based on longitude and latitude, i.e. on the distance between observations and the target points of a locally weighted regression (McMillen, 2012).

The need to accommodate spatial dependency and heterogeneity has led to increased use of geographically weighted regression, which has become a method for modelling spatial non-stationarity (Whigham and Hay, 2007). It has found wide application in various fields of study. Areas in which geographically weighted regression has been used include urban planning (Huang et al., 2010; Du and Molley, 2012; Kyratso and Yiorgos, 2004), demographic studies (Bajat et al., 2011), health sciences (Comber et al., 2011; Lin and Wen, 2011), ecological biology (Windle et al., 2009), social sciences, crime studies (Cahill and Mulligan, 2007), and transportation studies (Hadayeghi et al., 2010; Zhang et al., 2015; Zheng et al., 2011). However, the geographically weighted regression proposed by Brunsdon et al. (1996) is not suitable for non-negative count data such as crime and collision data because it assumes a Gaussian distribution for the response.

In the last decade, the geographically weighted regression method has been extended to accommodate the Poisson and negative binomial distributions, which are suitable for count data. These models are known as geographically weighted Poisson regression (GWPR) and geographically weighted negative binomial regression (GWNBR).

Geographically Weighted Poisson Regression (GWPR)

Geographically weighted regression has been extended from the version originally proposed by Brunsdon et al. (1996) to allow for a Poisson distribution on the response variable, thus making it suitable for count data that are not over-dispersed. This model is known as the GWPR and was proposed by Nakaya et al. (2005). It uses a conditional spatial kernel weighting function to estimate how the parameters of the Poisson regression vary across space (see Equation (2-18) for the functional form of the GWPR):

$y_i \sim \mathrm{Poisson}\left[t_j \exp\left(\sum_{i=1}^{k} \beta_i(u_j, v_j)\, x_{ij}\right)\right]$ (2-18)

The parameters in Equation (2-18) are the same as those of Equations (2-16) and (2-17), except that the response in Equation (2-18) follows a Poisson distribution.
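One way to see how a model of this form could be calibrated at a single location is the sketch below, which assumes a Gaussian kernel and a kernel-weighted version of the iteratively reweighted least-squares procedure used for ordinary Poisson regression, with the logarithm of the exposure entering as an offset. The function name `gwpr_fit_at`, the added intercept column, the starting values and the synthetic data are all illustrative assumptions; this is not presented as the exact estimation routine of Nakaya et al. (2005).

```python
import numpy as np

def gwpr_fit_at(target, coords, X, y, exposure, bandwidth, n_iter=25, tol=1e-8):
    """Local Poisson coefficients (intercept first) at one target location."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])       # assumed intercept column
    dist = np.linalg.norm(coords - target, axis=1)
    kernel = np.exp(-0.5 * (dist / bandwidth) ** 2)  # Gaussian spatial weights
    offset = np.log(exposure)

    # Start from the kernel-weighted mean rate and zero slopes.
    beta = np.zeros(design.shape[1])
    beta[0] = np.log(np.average(y, weights=kernel) / np.average(exposure, weights=kernel))

    for _ in range(n_iter):
        eta = design @ beta + offset
        mu = np.exp(eta)
        z = (eta - offset) + (y - mu) / mu           # working response, offset removed
        a = kernel * mu                              # spatial weight times IRLS weight
        xta = design.T * a
        beta_new = np.linalg.solve(xta @ design, xta @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Synthetic illustration (assumed data): counts with a spatially constant rate model.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(300, 2))
x = rng.normal(size=(300, 1))
exposure = rng.uniform(0.5, 2.0, size=300)
counts = rng.poisson(exposure * np.exp(0.5 + 0.3 * x[:, 0]))
beta_local = gwpr_fit_at(coords[0], coords, x, counts, exposure, bandwidth=0.3)
```

Repeating the call for every location yields a surface of local coefficients, analogous to the Gaussian case but with a Poisson likelihood for the counts.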

In a recent application of GWPR to the development of a collision prediction model, Pirdvanni et al. (2014a) established that the GWPR model outperformed the traditional negative binomial regression, which is regarded as the state of the art in road safety modelling. This was attributed to the ability of the GWPR model to capture different patterns (spatial dependency/heterogeneity) in the relationship between the aggregated collisions and the socio-demographic predictors.

Similarly, Li et al. (2013) used the GWPR to model the relationship between collisions and their spatial correlates using California data as a case study. The results were consistent with the study by Pirdvanni et al. (2014a), and the GWPR model was found useful for capturing the spatial relationship between collisions and their correlates and for reducing spatial correlation in the residuals of the models developed.


The GWPR model has also been used to explore the impact of teleworking on the frequency of collisions in the Flanders region of Belgium. The traffic safety benefit of teleworking was highlighted using this approach. As expected, teleworking reduced the total vehicle kilometers travelled, which resulted in a reduction in vehicular collisions (Pirdvanni et al., 2014b).

Most applications of the GWPR model are found in transportation studies. Most studies that have used geographically weighted regression in crime research have still used the Gaussian form proposed by Brunsdon et al. (1996).

Interestingly, most researchers who have used the GWPR model have compared its performance with that of negative binomial regression and have reported improved performance. However, the problem of over-dispersion that characterizes count data is downplayed in the GWPR model.

While accounting for spatial dependency could enhance the accuracy of models, over-dispersion in count data is not considered by the GWPR and could constitute a problem.
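A simple way to gauge whether over-dispersion is present is the Pearson dispersion statistic, which is close to 1 when the Poisson assumption of equal mean and variance holds and clearly above 1 when the variance exceeds the mean. The sketch below is illustrative only: the function name `pearson_dispersion`, the simulated counts and the chosen mean and over-dispersion values are assumptions, not results from any study cited here.

```python
import numpy as np

def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square statistic divided by the residual degrees of freedom."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return np.sum((y - mu) ** 2 / mu) / (len(y) - n_params)

# Simulated counts with the same mean (4) but different variance (assumed values).
rng = np.random.default_rng(0)
n_obs = 5000
poisson_counts = rng.poisson(4.0, size=n_obs)

alpha = 0.5                      # over-dispersion: variance = mean + alpha * mean**2
size = 1.0 / alpha               # NumPy's "n" parameter
prob = size / (size + 4.0)       # NumPy's "p" parameter, giving a mean of 4
nb_counts = rng.negative_binomial(size, prob, size=n_obs)

# Intercept-only fit: the fitted mean is the sample mean, so one parameter is used.
disp_poisson = pearson_dispersion(poisson_counts, np.full(n_obs, poisson_counts.mean()), 1)
disp_nb = pearson_dispersion(nb_counts, np.full(n_obs, nb_counts.mean()), 1)
```

A value of `disp_nb` well above 1 would indicate that a Poisson-based model understates the variability in the counts, which is the situation the negative binomial extension is intended to address.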

Geographically Weighted Negative Binomial Regression (GWNBR)

Another extension of geographically weighted regression is the GWNBR. This model is based on the idea that the over-dispersion that characterizes spatially referenced count data such as crime and collision data should be taken into consideration. The GWNBR was proposed by Da Silva and Rodriguez (2014). The concept of this model is similar to that of negative binomial regression. However, rather than using a fixed parameter to describe relationships across space, the parameters vary locally. This is similar to the GWPR except that over-dispersion is accounted for. The functional form of the GWNBR model is given as Equation (2-19):


The parameters in Equation (2-19) are similar to those in (2-18) except that $\alpha$ represents the over-dispersion parameter.
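By analogy with Equation (2-18), a model of this kind can be sketched as follows. The notation $\mathrm{NB}[\mu, \alpha]$ for a negative binomial distribution with mean $\mu$ and over-dispersion $\alpha$, and the treatment of $\alpha$ as a single parameter rather than one that varies by location, are assumptions made here for illustration and should be checked against Da Silva and Rodriguez (2014).

```latex
y_i \sim \mathrm{NB}\left[\, t_j \exp\left( \sum_{i=1}^{k} \beta_i(u_j, v_j)\, x_{ij} \right),\ \alpha \,\right]
```

Under the common parameterisation in which the variance equals $\mu + \alpha\mu^{2}$, the expression reduces to the Poisson form of Equation (2-18) as $\alpha$ approaches zero.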

Geographically weighted Poisson regression and the traditional version, which assumes a Gaussian distribution for the response, have been used extensively in crime and collision studies, but over-dispersion had not been thoroughly treated until Da Silva and Rodriguez (2014) proposed the GWNBR model. The performance of the GWNBR model relative to the GWPR model has not yet been comprehensively evaluated. This is a research gap that needs to be investigated.