2.2 Exploratory space-time data-driven modelling
2.2.3 Quantifying change in spatial event data
The timings and locations at which spatio-temporal interaction in event data arises, together with its duration and geographic extent, have recently been the focus of a number of studies. Many of the tools used to analyse such effects are exploratory in nature: they, again, require few modelling assumptions, and the assumptions they do make are typically informed by aggregate statistics of the empirical data. Local perspectives on spatio-temporal interaction can be much more useful in a policy setting than the identification of global spatio-temporal interaction. The early identification of the local spread of a disease or of violence, for example, can inform targeted vaccination or policing strategies that help to minimise its adverse impact and further spread.
One example of a more local and dynamic spatio-temporal exploratory technique is Kulldorff’s space-time permutation scan statistic (Kulldorff, 2001; Kulldorff et al., 2005), which can be used to detect the emergence of hotspots of activity. This statistic, together with its associated Monte Carlo method for assessing statistical significance, has been shown to be robust to different spatial resolutions (Jones and Kulldorff, 2012) and to incomplete and inaccurate data (Malizia, 2013). Examples of its use with civil violence event data can be found in O’Loughlin et al. (2010a), O’Loughlin et al. (2010b) and O’Loughlin and Witmer (2010). The method deploys moving cylindrical space-time windows of varying spatial radius and temporal length over the study area and compares the counts of events within them with what would be expected under a null hypothesis (e.g. of spatial and temporal homogeneity). The statistic is the maximum, over all deployed cylinders, of the generalised likelihood ratio, a function that compares the empirical event counts both inside and outside the space-time window with the counts expected under the null hypothesis. Because the method is applied locally in space and time, it indicates where and when such hotspots emerge, not merely whether interaction is present. A number of other studies have considered change in event patterns at a local level by calculating, for each spatial region $j$, the tuple
$$\left( X_j,\ \sum_l W_{jl} X_l \right), \qquad (2.1)$$
where $X_j$ is a variable of interest, taken in previous studies to be either a standardised count of events or a binary indicator of event occurrence in spatial region $j$, and $W_{jl}$ is a row-standardised matrix of spatial weights with zero diagonal. For a suitable definition of the spatial weights matrix, $X_j$ provides information about event occurrence in spatial region $j$, and $Y_j = \sum_l W_{jl} X_l$ provides information about event occurrence in the areas near to region $j$. In Anselin (1995), using a standardised count of events, a comparison of $X_j$ and $Y_j$ is used to detect statistically significant areas of local spatial autocorrelation. The quadrant in which the point $(X_j, Y_j)$ lies in the plane indicates whether higher than average event occurrence is near in geographic space to other higher than average counts, whether low counts cluster near to other low counts, or whether there is negative autocorrelation, with low counts near to high counts. A variable $Z_j$ is then defined, which takes one of four values for each spatial region $j$. If there is a high number of events in both the focal region $j$ and its neighbouring regions, then $Z_j = HH$. Conversely, if there is a low number of events in both $j$ and its neighbouring regions, then $Z_j = LL$. If there is negative spatial autocorrelation, and a high count of events in the focal region is near to low counts, then $Z_j = HL$; $Z_j = LH$ is defined analogously, for a low count in the focal region near to high counts. Thus, $Z_j$ provides a simple indication of the local spatial autocorrelation near to spatial region $j$.
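The quadrant classification above can be sketched in a few lines of Python. This is a minimal sketch, not Anselin's (1995) full procedure: it assumes the weights are built from a binary adjacency matrix, standardises counts as z-scores, treats a value of exactly zero as "high", and omits the permutation-based significance testing used to flag statistically significant regions. The helper names `row_standardise` and `quadrant_labels` are hypothetical.

```python
import numpy as np


def row_standardise(A):
    """Turn a binary adjacency matrix into a row-standardised W with zero diagonal."""
    W = np.array(A, dtype=float)        # copy so the caller's matrix is untouched
    np.fill_diagonal(W, 0.0)            # a region is not its own neighbour
    row_sums = W.sum(axis=1, keepdims=True)
    # Rows of isolated regions (no neighbours) are left as zeros.
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)


def quadrant_labels(counts, W):
    """Return Z_j in {"HH", "HL", "LH", "LL"} for each region j."""
    counts = np.asarray(counts, dtype=float)
    sd = counts.std()
    if sd == 0:
        raise ValueError("counts are constant; standardisation is undefined")
    x = (counts - counts.mean()) / sd   # standardised counts X_j
    y = np.asarray(W) @ x               # spatial lag Y_j = sum_l W_jl X_l
    # First letter: focal region; second letter: its neighbours.
    return [("H" if xj >= 0 else "L") + ("H" if yj >= 0 else "L")
            for xj, yj in zip(x, y)]


# Four regions along a line: 0 - 1 - 2 - 3.
A = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
W = row_standardise(A)
print(quadrant_labels([10, 8, 1, 0], W))   # a high cluster next to a low cluster
print(quadrant_labels([10, 0, 10, 0], W))  # alternating counts: negative autocorrelation
```

The second call shows the negative-autocorrelation case: high counts sit next to low counts, so the regions alternate between HL and LH.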
Cohen and Tita (1999) extend the local indicators of spatial association described in Anselin (1995) to consider temporal effects. By choosing an appropriate temporal partition of the empirical data, the authors calculate
$$\left( X_j(t_k),\ \sum_l W_{jl} X_l(t_k) \right), \qquad (2.2)$$
for some time step $t_k$, where the variables $X_j(t_k)$ and $Y_j(t_k) = \sum_l W_{jl} X_l(t_k)$ are as in equation 2.1 but specific to the time interval $t_k$. By determining the quadrant of the plane in which this tuple lies for different areas $j$ and times $t_k$, the local characteristics of spatial autocorrelation in the event data at each time interval can be visualised and, moreover, categorised.
Defining $Z_j(t_k)$ analogously, and considering how it changes over different time intervals, gives insight into how the local spatial dependency in the event data changes. The transition $Z_j(t_k) \rightarrow Z_j(t_{k+1})$, which can take one of 16 possible values (e.g. HH → HH, HL → LH, etc.), can be interpreted as different dynamic processes in the event data. The transition HL → LH, for example, corresponds to the relocation of events from the focal region to neighbouring regions. Similarly, HL → HH corresponds to the escalation of event occurrence from a focal region to neighbouring regions. Identifying these patterns in event data can lead to a better appreciation of the range of mechanisms that might be at play. In many cases, the counts of each type of diffusion are compared against the counts that would be expected under a null hypothesis of event independence, which can be computed using a Monte Carlo simulation.
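Once $Z_j(t_k)$ is known for every time step, the transition counts can be tabulated directly. The sketch below is illustrative only: it assumes an event-count array indexed as `count_matrix[k][j]` (time interval, region), standardises counts separately at each time step (one of the normalisation choices whose pitfalls are discussed later in this section), and reuses the quadrant rule defined above. The function names are hypothetical, and the Monte Carlo comparison against a null of event independence is omitted.

```python
import numpy as np

QUADRANTS = ["HH", "HL", "LH", "LL"]
# All 4 x 4 = 16 possible transitions Z_j(t_k) -> Z_j(t_{k+1}).
ALL_TRANSITIONS = [f"{a}->{b}" for a in QUADRANTS for b in QUADRANTS]


def quadrant_labels(counts, W):
    """Z_j for one time step, using per-step standardised counts."""
    counts = np.asarray(counts, dtype=float)
    sd = counts.std()
    if sd == 0:
        raise ValueError("counts are constant; standardisation is undefined")
    x = (counts - counts.mean()) / sd   # X_j(t_k)
    y = np.asarray(W) @ x               # Y_j(t_k) = sum_l W_jl X_l(t_k)
    return [("H" if xj >= 0 else "L") + ("H" if yj >= 0 else "L")
            for xj, yj in zip(x, y)]


def transition_counts(count_matrix, W):
    """Count each transition type over all regions and consecutive time steps."""
    labels = [quadrant_labels(row, W) for row in np.asarray(count_matrix, dtype=float)]
    table = dict.fromkeys(ALL_TRANSITIONS, 0)
    for before, after in zip(labels[:-1], labels[1:]):
        for a, b in zip(before, after):
            table[f"{a}->{b}"] += 1
    return table


# Row-standardised weights for four regions along a line: 0 - 1 - 2 - 3.
W = np.array([[0, 1, 0, 0], [0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5], [0, 0, 1, 0]])
# Events move from regions 0 and 2 to their neighbours 1 and 3.
table = transition_counts([[10, 0, 10, 0], [0, 10, 0, 10]], W)
print({name: n for name, n in table.items() if n > 0})
```

In this toy example the events relocate to neighbouring regions between the two time steps, so the non-zero entries are HL → LH and LH → HL, each occurring twice.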
Using this framework, Cohen and Tita (1999) identify the geographic diffusion of homicide occurrence in Chicago; Hsueh et al. (2012) explore the different types of geographic diffusion in cases of dengue fever in Taiwan; and LaFree et al. (2012) consider whether a change in strategy of the Spanish terrorist organisation ETA coincided with a change in the nature of the spatial diffusion of event occurrence. Two further studies employ binary measures of event occurrence in each spatial region, known as join counts, rather than the standardised counts described in Anselin (1995): Rey et al. (2011), who investigate burglary events in Arizona, and Schutte and Weidmann (2011), who investigate conflict events during the civil wars in Bosnia, Kosovo, Burundi and Rwanda. That is, the variables $X_j(t_k)$ and $Y_j(t_k)$ indicate whether at least one event occurred at time $t_k$ in spatial region $j$ or in nearby regions, respectively. The transitions of the variable $Z_j(t_k) = (X_j(t_k), Y_j(t_k))$ are then considered. This approach is particularly well suited to events that are relatively rare in space and time, and it removes the need to transform the event data, for example by normalising counts. This matters because the choice of normalisation may have a significant influence on the resulting analysis. If event counts are normalised at each time step, then an area with a high level of events at one time step may appear to become an area of low intensity due to the onset of events elsewhere, and not due to any change in the original area. Conversely, if event counts are normalised across all time intervals considered in the analysis, then the identification of high-intensity locations is sensitive to variation in the overall intensity of events.
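With join-count style binary indicators, the states follow from event presence alone, so no normalisation choice is needed. The sketch below assumes a binary adjacency matrix in which region $l$ is "nearby" to region $j$ when `A[j][l] == 1`; the function names are hypothetical. The final helper illustrates the per-time-step normalisation pitfall described above, using "above that time step's mean count" as a simple stand-in for a high normalised count.

```python
import numpy as np
from collections import Counter


def binary_states(count_matrix, A):
    """states[k][j] = (X_j(t_k), Y_j(t_k)) as 0/1 tuples."""
    counts = np.asarray(count_matrix)
    adj = np.array(A, dtype=int)
    np.fill_diagonal(adj, 0)
    X = (counts > 0).astype(int)            # at least one event in region j
    Y = ((X @ adj.T) > 0).astype(int)       # at least one event in a neighbour of j
    return [list(zip(x_row.tolist(), y_row.tolist())) for x_row, y_row in zip(X, Y)]


def binary_transitions(count_matrix, A):
    """Count transitions Z_j(t_k) -> Z_j(t_{k+1}) of the binary states."""
    states = binary_states(count_matrix, A)
    return Counter((a, b)
                   for before, after in zip(states[:-1], states[1:])
                   for a, b in zip(before, after))


def per_step_high(count_matrix):
    """'High' flags under per-time-step normalisation (above that step's mean)."""
    counts = np.asarray(count_matrix, dtype=float)
    return counts > counts.mean(axis=1, keepdims=True)


A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]       # three regions along a line
counts = [[1, 0, 0], [0, 0, 2]]              # an event moves from region 0 to region 2
print(binary_states(counts, A))
print(binary_transitions(counts, A))

# Region 0 has 3 events at both steps, but events appear elsewhere at the second step.
print(per_step_high([[3, 0, 0, 0], [3, 9, 9, 9]])[:, 0])
```

Rescaling all counts leaves the binary states unchanged, whereas the last line shows region 0 switching from "high" to "low" under per-step normalisation even though its own count is constant.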
Importantly, the frameworks described in this section are all exploratory. The null models against which some of the statistics described above are compared can often be specified easily using Monte Carlo simulation. These models are constructed with minimal assumptions regarding the underlying mechanisms that generate the event data. One example of a Monte Carlo null model is complete spatio-temporal randomness, in which events are equally likely to occur within any spatial region and at any point in time over the entire study area; simulations then generate the same number of events as in the empirical data under this assumption. Often a more appropriate null model when considering spatio-temporal interaction is a Monte Carlo simulation that preserves both the spatial and the temporal distribution of the event data but removes any spatio-temporal dependence by randomly permuting the times associated with each event. This model enables a comparison of the data against a scenario with no spatio-temporal interaction and is useful for considering the effects of event interdependency.
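Kulldorff’s space-time permutation scan statistic, introduced earlier in this section, pairs a generalised likelihood ratio with exactly this kind of time-permutation null. The sketch below is a simplified, hypothetical implementation rather than the reference software: the expected count in region $z$ on day $d$ is taken as $N_z N_d / N$, as in the space-time permutation formulation (Kulldorff et al., 2005), but candidate cylinders are restricted here to each region plus its $k-1$ nearest neighbours (by centroid distance) over contiguous runs of at most `max_days` days, only excesses are scored, and the Monte Carlo $p$-value uses the usual $(1 + \#\{\text{simulated} \geq \text{observed}\})/(n_{\text{sim}} + 1)$ rule. A complete spatio-temporal randomness generator is included for comparison; all names and parameters are assumptions.

```python
import numpy as np


def count_matrix(regions, days, n_regions, n_days):
    """c[z, d] = number of events in region z on day d."""
    c = np.zeros((n_regions, n_days))
    np.add.at(c, (regions, days), 1)
    return c


def max_log_glr(regions, days, coords, n_days, max_regions=3, max_days=3):
    """Maximum log generalised likelihood ratio over the candidate cylinders."""
    coords = np.asarray(coords, dtype=float)
    n_regions = len(coords)
    c = count_matrix(regions, days, n_regions, n_days)
    total = c.sum()
    # Expected counts if space and time were independent: mu_zd = N_z * N_d / N.
    mu = np.outer(c.sum(axis=1), c.sum(axis=0)) / total
    # Hypothetical zones: each centroid plus its nearest neighbouring centroids.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    order = np.argsort(dist, axis=1)
    best = 0.0
    for centre in range(n_regions):
        for k in range(1, min(max_regions, n_regions) + 1):
            zone = order[centre, :k]
            c_zone = c[zone].sum(axis=0)    # observed events per day in the zone
            mu_zone = mu[zone].sum(axis=0)  # expected events per day in the zone
            for start in range(n_days):
                for end in range(start + 1, min(start + max_days, n_days) + 1):
                    c_in = c_zone[start:end].sum()
                    mu_in = mu_zone[start:end].sum()
                    if c_in <= mu_in:
                        continue  # only cylinders with more events than expected
                    llr = c_in * np.log(c_in / mu_in)
                    if total > c_in:
                        llr += (total - c_in) * np.log((total - c_in) / (total - mu_in))
                    best = max(best, llr)
    return best


def permute_times(regions, days, rng):
    """Space-time permutation null: keep each event's region, shuffle event times."""
    return np.asarray(regions), rng.permutation(days)


def complete_randomness(n_events, n_regions, n_days, rng):
    """Complete spatio-temporal randomness: every region and day equally likely."""
    return rng.integers(0, n_regions, n_events), rng.integers(0, n_days, n_events)


def monte_carlo_test(regions, days, coords, n_days, n_sim=99,
                     null="permutation", seed=0):
    rng = np.random.default_rng(seed)
    observed = max_log_glr(regions, days, coords, n_days)
    simulated = []
    for _ in range(n_sim):
        if null == "permutation":
            sim_regions, sim_days = permute_times(regions, days, rng)
        else:
            sim_regions, sim_days = complete_randomness(len(regions), len(coords), n_days, rng)
        simulated.append(max_log_glr(sim_regions, sim_days, coords, n_days))
    p_value = (1 + sum(s >= observed for s in simulated)) / (n_sim + 1)
    return observed, p_value


# Five regions on a line, six days, one background event per region per day.
regions = np.repeat(np.arange(5), 6)
days = np.tile(np.arange(6), 5)
coords = [[x, 0.0] for x in range(5)]
# Add a burst of six extra events in region 0 on the last day.
regions_c = np.concatenate([regions, np.zeros(6, dtype=int)])
days_c = np.concatenate([days, np.full(6, 5)])
observed, p_value = monte_carlo_test(regions_c, days_c, coords, 6, n_sim=19)
print(observed, p_value)
```

For the homogeneous background alone, every observed count equals its expectation, so the statistic is zero; the burst in region 0 produces a positive statistic, whose significance is judged against statistics recomputed on time-permuted copies of the data.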