Spatial Dependency and Heterogeneity - Technical Challenges in Macro-Level Collision and Crime

CHAPTER 2: LITERATURE REVIEW

2.3 Technical Challenges in Macro-Level Collision and Crime Prediction Modelling

2.3.1 Spatial Dependency and Heterogeneity

Spatial dependency and heterogeneity are problems characteristic of the aggregated data used in zonal-level analysis. Crime and collision data are among the data in which these spatial effects are known to occur: collision data are usually collected with reference to a location in space (Quddus, 2008), and so are incidences of crime. Models that do not take these spatial effects into consideration are described as global models. Global analysis uses fixed parameters to describe relationships across geographic space.

The assumption of a fixed parameter for geographically referenced data is often incorrect (Haining, 2009; Meng, 2014; Li et al., 2011; Lloyd and Shuttleworth, 2005). Accommodating spatial dependency and heterogeneity therefore becomes important when modelling variables with a spatial dimension. This is achieved by incorporating spatial weights for the observation locations.
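As an illustration of what incorporating spatial weights can look like in practice, the following minimal sketch builds a row-standardised contiguity weight matrix for four zones and uses it to compute the average value observed in each zone's neighbours. The zone layout, the collision counts, and the function name `row_standardize` are hypothetical and chosen only for illustration; contiguity is one of several possible weighting schemes.

```python
import numpy as np

# Hypothetical layout: four zones in a row (0 - 1 - 2 - 3).
# adjacency[i, j] = 1 if zones i and j share a boundary, else 0.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

def row_standardize(w):
    """Scale each row to sum to 1; rows with no neighbours stay all zero."""
    row_sums = w.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums == 0, 1.0, row_sums)
    return w / safe_sums

W = row_standardize(adjacency)

# Illustrative (made-up) collision counts per zone.
collisions = np.array([10.0, 20.0, 30.0, 40.0])

# Spatially lagged value: the weighted average of each zone's neighbours.
spatial_lag = W @ collisions
print(spatial_lag)
```

With row standardisation, each entry of `spatial_lag` is simply the mean count in the adjacent zones, which makes the weights comparable between zones with different numbers of neighbours.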

The spatial clustering seen in macro-level modelling could be attributed to boundaries imposed on continuous geographic surfaces. In most cases, the boundaries used to delineate neighbouring zones are artificial, so neighbouring zones in dense areas are likely to share similar characteristics when the area is examined as a continuous surface.

A common example of spatial clustering in collision studies relates to Vehicle Kilometers Travelled (VKMT), which is used as the exposure measure when modelling traffic collisions. It is generally accepted that as VKMT increases, the likelihood of collisions increases (Rolison and Moutari, 2017; Rhee et al., 2016; Soltani and Askari, 2017; Zegrass, 2010). Traffic analysis zones (TAZs) with higher VKMT often tend to cluster, which indicates spatial dependency between nearby locations and thus similarities in the numbers of collisions observed in zones that are close to one another. VKMT similarities in nearby zones could be linked to highly similar traffic characteristics in neighbouring zones. The artificially imposed boundaries therefore contribute to the observed spatial clustering.

For example, high-volume arterial roads usually cross proximal zones. Spatial dependency associated with VKMT could also be linked to the conditions used in aggregating the information used to determine the boundaries. In the case of VKMT and zones with an arterial in common, the assumption is that both zones contribute equally to the traffic on the arterial. This means that the boundary that separates the zones could result in the clustering of collisions in both zones, which in turn gives rise to spatial dependency between zones with a boundary and a major arterial in common. VKMT, together with socio-economic and land use characteristics, may therefore contribute to the clustering of traffic collisions (Rhee et al., 2016; Soltani and Askari, 2017).
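Clustering of the kind described above is commonly quantified with a global measure of spatial autocorrelation such as Moran's I. The sketch below is a minimal illustration and not a procedure taken from this review: the function name `morans_i`, the four-zone adjacency matrix, and the VKMT values are all hypothetical. Positive values indicate that similar values sit in neighbouring zones, and negative values indicate that neighbouring zones tend to differ.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for zone values under a spatial weight matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()          # deviations from the mean
    n = x.size                # number of zones
    s0 = w.sum()              # sum of all spatial weights
    return (n / s0) * (z @ w @ z) / (z @ z)

# Hypothetical layout: four zones in a row, binary contiguity weights.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Made-up VKMT values: one smooth gradient, one alternating pattern.
clustered_vkmt = [1.0, 2.0, 3.0, 4.0]
alternating_vkmt = [1.0, 4.0, 1.0, 4.0]

print(morans_i(clustered_vkmt, adjacency))    # positive: neighbours are alike
print(morans_i(alternating_vkmt, adjacency))  # negative: neighbours differ
```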

The presence of spatial dependency in crime data has been extensively discussed in crime studies (Deane et al., 2008; Light and Harris, 2012). According to Collins et al. (2006), the clustering of crimes at certain locations and areas within a city or neighbourhood strongly influences characteristics and social interaction within areas. The higher rates of crime (both violent and non-violent) seen in some neighbourhoods could be attributed to high clustering of unemployed persons and/or low-income characteristics of the area. This type of socio-economic characteristic is typically spatially clustered within an area, thus contributing to spatial dependency in the crime rate. Also, interactions not confined to an area may cause crime to spill over into adjacent locations, thereby leading to similarities in crime rates (Haining et al., 2009).

A solution to the challenges of spatial dependency requires allowing model coefficients to vary locally by incorporating additional information based on the spatial structure (Anselin, 2010). Specifying local relationships across space requires the use of spatial regression models, which can be classified into three broad categories (Bernasco and Elffers, 2010): spatial error models, spatial lag models, and geographically weighted regression models.

According to Anselin (2003), a spatial error model allows for spatial dependency in the error of a model and is dependent on the spatial weight matrix defined for the model. In a spatial error model, the error term is divided into uncorrelated and correlated parts. Bernasco and Elffers (2010) explained that a spatial error model is suitable if there is a possibility of spatial dependency in independent variables affecting a response variable. Spatial error models take into consideration spatial influences in unobserved variables, with spatial dependency specified on the error term (Ward and Gleditsch, 2018). A spatial error model is most suitable where there is a possibility of interdependence in the error term rather than in the independent and dependent variables.
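The division of the error term into correlated and uncorrelated parts can be written compactly as follows. This is the standard textbook form of the spatial error model rather than notation taken from the sources cited above; the symbols are assumptions introduced here: $y$ is the vector of zone responses, $X$ the matrix of independent variables, $\beta$ the fixed coefficients, $W$ the spatial weight matrix, $\lambda$ the spatial autoregressive coefficient of the error, and $\varepsilon$ the uncorrelated error.

```latex
\begin{aligned}
y &= X\beta + u, \\
u &= \lambda W u + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^{2} I_{n}).
\end{aligned}
```

Here $\lambda W u$ is the spatially correlated part of the error and $\varepsilon$ is the uncorrelated part. When $\lambda = 0$ the model reduces to an ordinary (global) regression.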

A spatial lag model evaluates spatial interdependency in variables across units of analysis in a geographic space. It uses the observations from proximal locations or areas to provide a reason for the occurrence of an event in nearby areas. Rather than specifying spatial dependency on the error term, spatial dependency is specified on the fixed parameter estimate of the independent variables of the model. A spatial lag regression model is like an ordinary regression model, especially when the lag is placed on the independent variables. While a spatial lag model allows for spatial dependency to be incorporated into modelling, interpretation of the results is considered more complex (Chi and Zhu, 2008).
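For reference, two common textbook formulations of a spatial lag specification are sketched below. They are not reproduced from the cited sources, and the symbols are assumptions introduced here: $W$ is the spatial weight matrix, $\rho$ the coefficient on the spatially lagged response, $\theta$ the coefficients on the spatially lagged independent variables, and $\varepsilon$ an uncorrelated error.

```latex
\begin{aligned}
\text{Lag on the response:} \quad & y = \rho W y + X\beta + \varepsilon, \\
\text{Lag on the independent variables:} \quad & y = X\beta + W X \theta + \varepsilon.
\end{aligned}
```

The second form can be estimated like an ordinary regression, because $WX$ simply enters as additional independent variables. In the first form the response appears on both sides of the equation, which is one reason the coefficients are harder to interpret.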

Neither the spatial error nor the spatial lag model calibrates a local regression equation for each data point. The models therefore do not give an indication of how the relationship between the dependent and the independent variables varies across space.

Another spatial regression model used in capturing spatial dependency in data is the geographically weighted regression model. This was proposed by Brunsdon et al. (1998) and allows model parameters to vary locally, with a known parametric family of distributions placed on the response. Geographically weighted regression models (ordinary least squares, logistic, Poisson or negative binomial) are calibrated using a bandwidth and by weighting observations based on their proximity to a reference point, such that observations closer to the reference point are given higher weights than observations at more distant locations (see chapter 3).
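The weighting scheme described above can be sketched for the ordinary least squares case as follows. This is a minimal illustration under stated assumptions, not the calibration procedure used in this dissertation: it assumes a Gaussian kernel with a fixed bandwidth, the function names `gaussian_weights` and `gwr_local_coefficients` are hypothetical, and the data are synthetic. Bandwidth selection and the Poisson or negative binomial variants are not covered.

```python
import numpy as np

def gaussian_weights(coords, ref, bandwidth):
    """Gaussian kernel weights: closer observations receive higher weight."""
    distances = np.linalg.norm(coords - ref, axis=1)
    return np.exp(-0.5 * (distances / bandwidth) ** 2)

def gwr_local_coefficients(coords, X, y, bandwidth):
    """Fit one weighted least squares regression per observation location.

    Returns an (n, p + 1) array; row i holds the intercept and slopes
    estimated with weights centred on observation i.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])   # add an intercept column
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        w = gaussian_weights(coords, coords[i], bandwidth)
        xtw = design.T * w                      # X'W for a diagonal W
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas

# Synthetic example: the slope of y on x increases from west to east.
rng = np.random.default_rng(0)
n = 200
coords = np.column_stack([np.linspace(0.0, 10.0, n), np.zeros(n)])
x = rng.normal(size=n)
true_slope = 1.0 + 0.2 * coords[:, 0]
y = true_slope * x

local_betas = gwr_local_coefficients(coords, x, y, bandwidth=1.0)
print(local_betas[0, 1], local_betas[-1, 1])    # west vs east local slope
```

A smaller bandwidth makes the local estimates follow spatial variation more closely but bases each estimate on fewer effective observations, so the choice of bandwidth involves a trade-off between bias and variance.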

The main advantage of these models over the spatial error or spatial lag model is that they allow various distributions to be specified on the response variable rather than requiring it to follow the normal distribution. The models also calibrate a separate regression model for each data point and provide an opportunity for spatial dependency in relationships to be thoroughly tested.

This dissertation uses and explores the geographically weighted regression model as its spatial model.

In document Development of Hotzone Identification Models for Simultaneous Crime and Collision Reduction (Page 50-54)