The purpose of the argument presented in the latter part of this chapter was to motivate the study of criminal phenomena using the approach and tools of complexity science. The social systems in which crime occurs display many of the features which were outlined as being characteristic of complex systems, and the modelling challenges involved are similar to those highlighted. Indeed, many of the contexts in which complex systems research has previously been carried out, such as ecology, are frequently invoked as analogies in the criminology literature. On this basis, the treatment of criminal phenomena in these terms is justified, and the following chapters will apply a number of the techniques identified in previous sections. The five substantial chapters, exploring a range of issues arising in the study of crime, now follow.
Chapter 2
Characterisation of spatio-temporal clustering via network analysis
The primary subject of this chapter is the introduction of a novel method for the analysis of spatio-temporal event data, of the type commonly encountered during crime analysis. The work is motivated by the desire to characterise clustering phenomena in a more nuanced manner than previously possible, with the particular aim of developing the capability to identify types of clustering, rather than simply its existence. The terminology of complex networks provides a convenient and intuitive framework for clustering analysis, and allows existing approaches to be refined through the application of graph-theoretical techniques. Two main contributions are outlined: the introduction of the technique itself, along with a statistical approach specific to the particular context, and the application of the technique to real-world crime data, which reveals distinctive patterns in event distributions beyond those which have previously been observed empirically.
2.1 Introduction
The question of whether a set of events is clustered in space and time is a fundamental one in the analysis of crime, and one which is crucial to various aspects of theory and practice. As will be reviewed in more detail in Section 2.2, numerous empirical studies have shown that crime is distributed heterogeneously in space (e.g. Block et al., 1995), and that patterns can also be observed in the times at which it occurs, as typified by the existence of daily regularities and seasonal cycles (see, for example, Farrell & Pease, 1994). In both cases, such patterns can be reconciled with theoretical arguments concerning the behaviour of individuals and its relationship with crime (Cohen & Felson, 1979).
Such regularities are also of practical relevance, since areas or periods of disproportionate crime constitute an appealing target for interventions. ‘Hot-spot policing’ (Sherman et al., 1989; Ratcliffe, 2004), whereby policing effort is concentrated on identified areas of high crime, is a straightforward example of this, and has been shown to be effective in reducing crime (Braga, 2001). More generally, the direction of police resources to those areas or situations where they have the greatest likelihood of interrupting or discouraging crime is likely to be an efficient strategy for deployment.
While clustering in either time or space is well understood, a more subtle question concerns the interaction between the two dimensions; that is, whether there is dependence between the spatial and temporal separation of crimes. In its most immediate form, this corresponds to the notion that crimes which are close in space are more likely to be close in time (and vice versa), a relationship which is exemplified by the phenomenon of (near-)repeat victimisation (Pease, 1998; Bowers & Johnson, 2005). In situations where this is evident, the occurrence of an initial crime event implies an elevation of risk in the spatial vicinity for some period afterwards; that is, the spatial distribution of crime is dependent on recent events. Again, theoretical explanations have been offered for why this should be the case (Pease, 1998; Johnson & Bowers, 2004b), and the phenomenon has clear implications for crime prevention. In general, its existence implies that risk is communicable, in the sense that locations can be at increased risk of crime occurrence simply by virtue of their proximity to a recent victimisation.
Figure 2.1: Close pair relationships for simple hypothetical sets of events, shown in panels (a) and (b). Vertex labels represent the time at which events occur, and their location is given by their position on the underlying grid, shown in grey. Red arrows represent close pairs, defined in this case as incidents occurring within two time units and one grid spacing.
[...] majority of which involve comparison between the observed separation of events and that which would be expected if their spatial and temporal distributions were independent. Such methods typically involve the pair-wise comparison of events, where pairs are classified as being ‘close pairs’ if they lie within some specified thresholds in space and time (Knox, 1964; Mantel, 1967), or if they are nearest neighbours in space and time (Jacquez, 1996). These pair-wise relationships can then be used to characterise the data by, for example, comparing the number of close pairs against what would be expected if locations and timings were independent.
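As a rough illustration of the threshold-based approach, the following Python sketch counts close pairs and compares the count with those obtained when event times are shuffled across locations. Several things here are assumptions made for the example rather than details taken from the text: the function names `close_pairs` and `knox_p_value`, the use of Euclidean distance, and the use of Monte Carlo permutation of times as the way of approximating independence of locations and timings.

```python
from itertools import combinations
import random


def close_pairs(events, max_dist, max_dt):
    """Return index pairs (i, j), i < j, lying within both thresholds.

    events: list of (x, y, t) tuples. Euclidean distance is an assumption.
    """
    pairs = []
    for (i, (xi, yi, ti)), (j, (xj, yj, tj)) in combinations(enumerate(events), 2):
        near_in_space = (xi - xj) ** 2 + (yi - yj) ** 2 <= max_dist ** 2
        near_in_time = abs(ti - tj) <= max_dt
        if near_in_space and near_in_time:
            pairs.append((i, j))
    return pairs


def knox_p_value(events, max_dist, max_dt, n_perm=999, seed=0):
    """Observed close-pair count and a one-sided Monte Carlo p-value.

    Times are shuffled across fixed locations, so the marginal spatial
    and temporal distributions are preserved (illustrative null model).
    """
    rng = random.Random(seed)
    observed = len(close_pairs(events, max_dist, max_dt))
    times = [t for _, _, t in events]
    at_least_as_many = 0
    for _ in range(n_perm):
        rng.shuffle(times)
        shuffled = [(x, y, t) for (x, y, _), t in zip(events, times)]
        if len(close_pairs(shuffled, max_dist, max_dt)) >= observed:
            at_least_as_many += 1
    return observed, (at_least_as_many + 1) / (n_perm + 1)


# Hypothetical events on a grid, using the thresholds of Figure 2.1
# (one grid spacing, two time units).
events = [(0, 0, 1), (1, 0, 2), (5, 5, 3), (5, 6, 4), (9, 0, 10), (0, 9, 20)]
print(close_pairs(events, max_dist=1, max_dt=2))  # [(0, 1), (2, 3)]
```

The statistic is a single number, the count of close pairs, so any two datasets with the same count are treated identically by a test of this kind.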
Although such techniques are certainly adequate to determine the existence of clustering per se, they are unable to provide any additional insight into the structure of the data. When defined on the basis of pair-counting only, the notion of clustering still allows for significant variability in the character of datasets, even amongst those which are found to be equivalently clustered. The hypothetical examples shown in Figure 2.1 illustrate the variation which is possible. Both datasets have the same number of close pairs, yet they exhibit perceptible qualitative differences: Figure 2.1a shows a series of isolated pairs, whereas two larger identifiable clusters are present in Figure 2.1b. Since the marginal spatial and temporal distributions are identical in each case (the timings have simply been permuted), existing techniques would be unable to discriminate between the cases. The aim of the work presented here is to develop methods capable of identifying patterns such as these.
Considering Figure 2.1 in more depth, it can be seen that the essential difference between the cases is not in the number of close pairs, but rather their configuration. Any method capable of discriminating between them should therefore consider the set of close pairs as a whole, and examine the relationships between them. A method by which this can be achieved arises from the observation that the dyadic ‘close pair’ relationship can be interpreted as defining a network. In this formulation, referred to as an event network, events are represented as vertices and close pairs are joined by links, so that the analysis of the event data is translated to measurement of the network’s properties. Existing analytical techniques are trivially expressible in these terms (the number of close pairs, for instance, is simply the number of links). The framework, however, also allows significantly more sophisticated techniques of network analysis to be applied, and these have the potential to offer much greater insight into the structure of the underlying spatio-temporal data.
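The event network construction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: links are treated as undirected, distance is Euclidean, and the names `event_network`, `link_count` and `component_sizes` are hypothetical. The two small datasets are invented for the example; like the panels of Figure 2.1, they share locations and differ only by a permutation of the timings.

```python
from itertools import combinations


def event_network(events, max_dist, max_dt):
    """Adjacency sets of the event network: one vertex per event,
    one (undirected) link per close pair."""
    adjacency = {i: set() for i in range(len(events))}
    for i, j in combinations(range(len(events)), 2):
        xi, yi, ti = events[i]
        xj, yj, tj = events[j]
        if (xi - xj) ** 2 + (yi - yj) ** 2 <= max_dist ** 2 and abs(ti - tj) <= max_dt:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def link_count(adjacency):
    """Number of links, which equals the number of close pairs."""
    return sum(len(neighbours) for neighbours in adjacency.values()) // 2


def component_sizes(adjacency):
    """Sizes of connected components, largest first (depth-first search)."""
    seen, sizes = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adjacency[v] - seen:
                seen.add(w)
                stack.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)


locations = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (20, 0)]
times_a = [1, 2, 20, 3, 4, 30]   # two isolated pairs
times_b = [1, 2, 3, 20, 30, 4]   # same times permuted: one group of three

for times in (times_a, times_b):
    events = [(x, y, t) for (x, y), t in zip(locations, times)]
    net = event_network(events, max_dist=1, max_dt=2)
    print(link_count(net), component_sizes(net))
# 2 [2, 2, 1, 1]
# 2 [3, 1, 1, 1]
```

Both datasets yield the same number of links, so a pair-counting test cannot separate them, whereas even a simple network measure such as component size does.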
One such technique with particular potential in this area is that of ‘motif analysis’ (Milo et al., 2002). This refers to the identification of small subgraphs (connected groups of vertices) which occur with disproportionate frequency within a network: these are known as motifs since they represent recurring signatures in the composition of the network. Small structures such as these can be considered to be the fundamental ‘building blocks’ of the network, and thus reconciled with the micro-scale processes driving network formation. Analysis of this type has been successfully carried out in a variety of fields, including ecology and biology (e.g. Mangan & Alon, 2003), and also in a more abstract sense in the analysis of time series data (e.g. Xu et al., 2008). The approach is particularly appropriate in this context since motifs have a clear interpretation in terms of the spatio-temporal patterns they represent, and can be interpreted as signatures of the targeting processes of criminal actors.
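To make the idea of subgraph counting concrete, the sketch below enumerates the two connected three-vertex subgraphs of an undirected network (an open path and a triangle) and expresses an observed count relative to counts from a set of randomised networks. It is a simplified illustration only: the subgraphs considered in this chapter may be directed or larger, the brute-force enumeration is suitable only for small networks, and the names `triad_census` and `z_score` are hypothetical. The use of a z-score is a common convention in motif analysis and is assumed here rather than taken from the text.

```python
from itertools import combinations
from statistics import mean, stdev


def triad_census(edges):
    """Count induced connected 3-vertex subgraphs in an undirected network.

    edges: iterable of (u, v) pairs. Returns counts of open paths
    (exactly two links among three vertices) and triangles (three links).
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    counts = {"path": 0, "triangle": 0}
    for a, b, c in combinations(sorted(adjacency), 3):
        n_links = (b in adjacency[a]) + (c in adjacency[a]) + (c in adjacency[b])
        if n_links == 2:
            counts["path"] += 1
        elif n_links == 3:
            counts["triangle"] += 1
    return counts


def z_score(observed, null_counts):
    """Standardised excess of an observed count over a null sample.

    Returns None when the null counts have no spread.
    """
    spread = stdev(null_counts)
    if spread == 0:
        return None
    return (observed - mean(null_counts)) / spread


# Hypothetical network: a triangle (0, 1, 2) with a pendant vertex 3.
print(triad_census([(0, 1), (1, 2), (0, 2), (2, 3)]))
# {'path': 2, 'triangle': 1}
```

A subgraph would be regarded as a motif when its observed count is large relative to the null sample; how that sample should be generated for event networks is a separate question.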
In addition, further analysis will consider the existence, and length, of chains within event networks. These represent sequences of events in which each successive pair of events is a close pair, and their length corresponds to the number of events which can be serially linked in this way. Features such as these have been considered in the context of criminal data in the past (e.g. Johnson & Braithwaite, 2009), and again have a straightforward interpretation. Chain length can give an indication of the typical lifetime of an outbreak of events, or the characteristic ‘capacity’ of an area (the point at which continued offending is no longer profitable).
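One way of measuring chain length is sketched below, under the assumption that a chain is a time-ordered sequence: each close pair is treated as a link directed from the earlier event to the later one, and the longest chain is then the longest path in the resulting acyclic network. The function name `longest_chain`, the Euclidean distance, and the decision to ignore pairs of simultaneous events are all choices made for this illustration.

```python
def longest_chain(events, max_dist, max_dt):
    """Length (in events) of the longest time-ordered chain of close pairs.

    events: list of (x, y, t) tuples. Events are processed in time order,
    so best[i] is final before any later event looks it up.
    """
    order = sorted(range(len(events)), key=lambda i: events[i][2])
    best = {}  # best[i]: longest chain ending at event i
    for position, j in enumerate(order):
        xj, yj, tj = events[j]
        best[j] = 1  # an event on its own is a chain of length one
        for i in order[:position]:
            xi, yi, ti = events[i]
            strictly_earlier = ti < tj  # simultaneous events are not linked
            close_in_time = tj - ti <= max_dt
            close_in_space = (xi - xj) ** 2 + (yi - yj) ** 2 <= max_dist ** 2
            if strictly_earlier and close_in_time and close_in_space:
                best[j] = max(best[j], best[i] + 1)
    return max(best.values(), default=0)


# Hypothetical events: the first three form a chain 1 -> 2 -> 3.
events = [(0, 0, 1), (1, 0, 2), (2, 0, 3), (10, 0, 20), (11, 0, 30), (20, 0, 4)]
print(longest_chain(events, max_dist=1, max_dt=2))  # 3
```

The double loop is quadratic in the number of events, which is adequate for a sketch; spatial indexing would be needed for large datasets.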
Measurement of both of these features (motifs and chains) presents a technical challenge due to the particular nature of event networks. Since the aim of the analysis is to identify patterns over and above the presence of clustering itself, it is necessary to control for such clustering when generating null distributions for comparison. In practical terms, this requires randomised networks to be matched in terms of the number of links present; however, while issues such as this are well-known in motif analysis, the methods typically used to resolve them cannot be applied in this case. Since their construction is geometric, event networks are constrained in the form they can take, and the link-rewiring methods typically employed to generate random networks for comparison are not guaranteed to produce valid event networks. Furthermore, in order to provide a meaningful comparison, the generation of such networks should correspond to random sampling from all possible sets of events. This is not the case for standard methods, for which the sampling is over all networks: the correspondence between event-sets and networks is not one-to-one. Because of these issues, bespoke methods are required to guarantee the validity of random network generation, and these will be introduced prior to the empirical work.
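The requirement can be illustrated with a deliberately naive sketch, which is not the bespoke method developed in this chapter. Event times are shuffled across fixed locations, and a shuffle is kept only if it yields the same number of close pairs as the observed data. Every accepted sample is a genuine set of events, so its network is a valid event network with a matched number of links, in contrast to a rewired network. The function names, the Euclidean distance and the rejection scheme itself are assumptions for illustration, and the approach is likely to be impractically slow when matching counts are rare.

```python
import random
from itertools import combinations


def count_close_pairs(locations, times, max_dist, max_dt):
    """Number of event pairs within both the spatial and temporal thresholds."""
    total = 0
    for i, j in combinations(range(len(locations)), 2):
        (xi, yi), (xj, yj) = locations[i], locations[j]
        if (xi - xj) ** 2 + (yi - yj) ** 2 <= max_dist ** 2 and abs(times[i] - times[j]) <= max_dt:
            total += 1
    return total


def matched_time_permutations(locations, times, max_dist, max_dt,
                              n_samples, max_tries=10000, seed=0):
    """Rejection-sample permutations of the times that preserve the link count.

    May return fewer than n_samples if max_tries is exhausted.
    """
    rng = random.Random(seed)
    target = count_close_pairs(locations, times, max_dist, max_dt)
    shuffled = list(times)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_samples:
            break
        rng.shuffle(shuffled)
        if count_close_pairs(locations, shuffled, max_dist, max_dt) == target:
            accepted.append(list(shuffled))  # copy: shuffled is reused
    return accepted


# Hypothetical data: six events, two of the three spatially adjacent
# pairs are also close in time.
locations = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (20, 0)]
times = [1, 2, 3, 20, 30, 4]
for sample in matched_time_permutations(locations, times, 1, 2, n_samples=3):
    print(sample, count_close_pairs(locations, sample, 1, 2))
```

Subgraph counts or chain lengths computed on each accepted sample would then form a null distribution in which the overall level of clustering is held fixed.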
Application of the methods developed to real-world crime data demonstrates that they are indeed capable of identifying clustering phenomena at a higher resolution than previously possible. The method is applied to data for burglary and maritime piracy, and spatio-temporal signatures can be clearly identified in both cases. These give rise to two key results: that the clustering present for these crimes is more dense than previously shown, and that the techniques developed are capable of discriminating between patterns which are indistinguishable by established approaches.
These results have considerable implications for work in crime analysis, which will ultimately be discussed. The emphasis of the chapter as a whole, however, is on the technique itself, rather than its specific application to crime. The reason for this is that, while crime analysis provides the motivation for its development, the method itself is agnostic to the nature of the spatio-temporal data being considered, and could be applied in a number of other fields, such as epidemiology. With this versatility in mind, the theoretical context for these results in terms of criminal behaviour (e.g. routine activities, and ecological analogies) is not discussed in detail; indeed, since the precise context varies from crime to crime, the two examples considered here could not be discussed in a unified way. These criminological theories will be discussed fully in the particular context of burglary in the chapters which follow; here, however, the objective is simply to identify and characterise patterns of clustering.