Clustering - Theoretical background - Spatio-temporal modelling for issues in crime and security

4.2 Theoretical background

4.3.2 Clustering

The existence of space-time clustering is the fact upon which all further research is predicated; indeed, the achievability of short-term crime modelling in general depends on it. In the case of burglary, it has been demonstrated across a wide range of settings (Johnson et al., 2007) using a simple statistical method. In order to establish this basic property, such analysis is repeated here.

4.3.2.1 Euclidean distance

The Knox (1964) test for space-time clustering has been applied frequently in the context of crime, and recent results for the case of burglary have been found using a version employing Monte-Carlo simulation (see, for example, Johnson et al., 2007). This test was introduced, and defined formally, in Section 2.3.2; however, a more sophisticated form is used here and so the general form will be presented again.

The essence of the Knox test lies in the pairwise comparison of incidents, and the classification of each pair according to the proximity of the incidents in space and time. Prior to the analysis, an ordered system of bands is defined for both space (e.g. 0-100m, 101-200m, ...) and time (e.g. 0-6 days, 7-13 days, ...), which define various degrees of the concept of ‘closeness’ in each dimension. The incidents are then examined by comparing all possible pairs and calculating the spatial and temporal separation for each. These comparisons are used to populate a contingency table of the spatial and temporal bands (of the same shape as Table 4.1): the value for each cell is the number of pairs for which the separations fall within the corresponding two bands.
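The construction of the contingency table can be sketched as follows. This is a minimal illustration rather than the implementation used in the thesis: the function name `knox_table`, its parameters and the toy data are all hypothetical, and the bands are assumed to be closed at their upper limit, following the convention stated in the caption of Table 4.1. All pairs are held in memory at once, so the sketch is suitable only for small data sets.

```python
import numpy as np

def knox_table(xy, t, space_upper, time_upper):
    """Count incident pairs in each (temporal band, spatial band) cell.

    Bands are given by their upper limits and are closed on the right,
    e.g. space_upper=[100, 200] gives the bands [0, 100] and (100, 200].
    (Hypothetical helper; the band convention is an assumption.)
    """
    xy = np.asarray(xy, dtype=float)
    t = np.asarray(t, dtype=float)
    i, j = np.triu_indices(len(t), k=1)  # each unordered pair exactly once
    d_space = np.hypot(xy[i, 0] - xy[j, 0], xy[i, 1] - xy[j, 1])
    d_time = np.abs(t[i] - t[j])
    # Index of the first upper limit that is >= the separation
    s_band = np.searchsorted(space_upper, d_space, side="left")
    t_band = np.searchsorted(time_upper, d_time, side="left")
    # Pairs beyond the last band in either dimension are not counted
    keep = (s_band < len(space_upper)) & (t_band < len(time_upper))
    table = np.zeros((len(time_upper), len(space_upper)), dtype=int)
    np.add.at(table, (t_band[keep], s_band[keep]), 1)
    return table

# Toy data (invented for illustration): coordinates in metres, times in days
xy = [(0, 0), (50, 0), (150, 0), (1000, 1000)]
t = [0, 3, 10, 40]
table = knox_table(xy, t, space_upper=[100, 200], time_upper=[7, 14])
```

Rows of `table` correspond to temporal bands and columns to spatial bands, matching the layout of Table 4.1.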

When these observed counts have been found, the remaining task is to determine a suitable ‘null’ distribution against which the value for each cell can be compared. Though this can be done in several ways, a popular form involves a permutation approach. For some number n_K of iterations, the spatial locations of the incidents are permuted and a new contingency table (of the same form as that for the observed case) is produced for this modified set of incidents. When this process is complete, n_K tables have been produced, each of which represents a realisation of the classification process under the hypothesis that there is no association between the spatial and temporal distributions of the incidents. One of the advantages of the permutation approach is that the spatial and temporal distributions of offending are preserved at each iteration, so that the test is not distorted by clustering present in either of those dimensions independently.
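The permutation step can be illustrated for a single cell of the table. The sketch below is an assumption-laden example, not the procedure as implemented in the thesis: the helper names `count_close_pairs` and `permutation_null`, the thresholds and the random toy data are all hypothetical. It shuffles the assignment of locations to times, so the set of locations and the set of times are each left unchanged.

```python
import numpy as np

def count_close_pairs(xy, t, max_dist, max_days):
    """Number of pairs within max_dist in space and max_days in time."""
    i, j = np.triu_indices(len(t), k=1)
    near_space = np.hypot(xy[i, 0] - xy[j, 0], xy[i, 1] - xy[j, 1]) <= max_dist
    near_time = np.abs(t[i] - t[j]) <= max_days
    return int(np.sum(near_space & near_time))

def permutation_null(xy, t, max_dist, max_days, n_iter=99, seed=0):
    """Null counts obtained by permuting locations relative to times."""
    xy = np.asarray(xy, dtype=float)
    t = np.asarray(t, dtype=float)
    rng = np.random.default_rng(seed)
    null_counts = np.empty(n_iter, dtype=int)
    for k in range(n_iter):
        # Reassign locations to incidents; times stay in their original order
        shuffled_xy = xy[rng.permutation(len(xy))]
        null_counts[k] = count_close_pairs(shuffled_xy, t, max_dist, max_days)
    return null_counts

# Toy data (invented): 40 incidents in a 1 km square over 60 days
rng = np.random.default_rng(1)
xy = rng.uniform(0, 1000, size=(40, 2))
t = rng.uniform(0, 60, size=40)
observed = count_close_pairs(xy, t, max_dist=100, max_days=7)
null_counts = permutation_null(xy, t, max_dist=100, max_days=7)
```

Because only the pairing of locations with times changes, any purely spatial or purely temporal concentration in the data is present in every null realisation as well.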

For each cell of the contingency table, the observed value can be compared against the n_K corresponding values in the null-generated tables. The extremity of the observed value, relative to this set of values, is a measure of the extent to which the clustering of the observed incidents departs from what would be expected if their spatial and temporal characteristics were independent. Formally, the statistical significance of the observed value can be estimated by finding the position, R_K, which the observed value would occupy in a rank-ordered list of the null values for the cell and applying the formula

$$p = \frac{n_K - R_K + 1}{n_K + 1} \tag{4.1}$$

for the pseudo p-value (North et al., 2002). The magnitude of any effect can also be estimated either by computing a z-score for the observed value, relative to the distribution, or by finding the ratio of the observed value to the median of the distribution. The former option will be used here.
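As a brief illustration, the pseudo p-value of Equation 4.1 and the z-score can be computed for one cell as below. The function name `knox_cell_stats` is hypothetical, and two conventions are assumptions made for this sketch rather than details stated in the text: R_K is taken to be the number of null values strictly below the observed value, and the z-score uses the sample standard deviation of the null values.

```python
import numpy as np

def knox_cell_stats(observed, null_values):
    """Pseudo p-value (Equation 4.1) and z-score for one table cell."""
    null_values = np.asarray(null_values, dtype=float)
    n_k = null_values.size
    # Assumed rank convention: number of null values below the observed one
    r_k = int(np.sum(null_values < observed))
    p = (n_k - r_k + 1) / (n_k + 1)
    # Assumed: sample standard deviation (ddof=1) of the null distribution
    z = (observed - null_values.mean()) / null_values.std(ddof=1)
    return p, z

# Toy null distribution (invented): the integers 0..98, so n_K = 99
null_values = np.arange(99)
p_extreme, z_extreme = knox_cell_stats(200, null_values)  # above every null value
p_typical, _ = knox_cell_stats(10, null_values)           # well inside the null range
```

Under this convention, with n_K = 99 the smallest attainable pseudo p-value is 1/100 = 0.01, reached when the observed count exceeds every null value.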

In this way, the extent of clustering is quantified for every combination of spatial and temporal bands, e.g. “in the second week after a burglary has taken place, significantly more incidents tend to occur at a distance between 101 and 200 metres than would be expected on the basis of chance”. Because of this, the results can be used to assess the way in which the magnitude (and presence) of clustering varies over space and time, thereby providing an estimate of the spatial extent of risk elevation.

This test was carried out, with n_K = 99 iterations, for the Birmingham burglary data, and the results are shown in Table 4.1. Highly significant clustering is evident at almost all scales examined, and a general (though not monotonic) trend of decreasing influence can be seen at increasing levels of separation. These results are entirely in line with expectation and consistent with those found elsewhere (Johnson et al., 2007).

Temporal band -     Spatial band - upper limit (metres)
upper limit (days)       100     200     300     400     500     600     700     800     900    1000
7                       9.78    8.35    8.07    8.20    7.21    6.56    5.39    4.39    5.85    3.58
14                      7.79    7.25    6.19    5.56    6.90    6.25    4.35    4.34    5.16    3.63
21                      6.63    5.47    6.10    5.39    5.83    4.76    3.60    3.30    4.30    4.00
28                      5.93    5.07    5.19    5.72    3.78    3.05    4.67    3.80   2.29*   1.62*
35                      5.75    5.71    4.11    4.20    4.67    3.51    3.33    3.49   1.65*    2.24

Table 4.1: Z-scores for a Knox test performed on 26,614 incidents of burglary in Birmingham. Bands are denoted by their upper limit, so that, for example, the band denoted by 200 comprises all values in the range (100, 200]. Non-significant values are marked with *; all others are significant with a p-value of 0.01.

4.3.2.2 Network distance

Although the technical definition of space-time clustering - dependence between the spatial and temporal distributions of crime - is a universal one, its precise meaning does, naturally, depend on the way in which space and time are represented. For space in particular, there are well-motivated alternatives to the ‘as-the-crow-flies’ notion of distance commonly used, which might better reflect the role of space in human activity. As has been argued previously in this thesis, there are good reasons to consider that the locations of places in an urban environment might be most appropriately thought of in terms of their position on the street network, particularly when navigability is a concern.

Adapting the Knox test to a network framework is straightforward: all that is required is to alter the distance metric used in pairwise comparisons and to modify the spatial bands accordingly. Nevertheless, such an adaptation has not previously been used in published work, and so the issue of clustering has not been examined in network terms. Doing so here achieves this, and also provides a more immediate motivation for the work on directionality which follows. Of course, the two measures of distance - network and Euclidean - are highly correlated, and therefore a positive result is to be expected, given the findings of the previous section.

The notion of network distance used here is based on topological separation; i.e. the adjacency of streets is the fundamental concern. The distance between any pair of incidents is therefore defined as the number of topological steps between the street segments on which they occur: 0 if they occur on the same street, 1 if they occur on neighbouring streets, 2 if there are two degrees of separation, and so on. This is a discrete measurement, and so exact values can be used without the need for spatial bands.
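One way to compute such a step count is a breadth-first search over the adjacency structure of the street segments. The sketch below is illustrative only: the function name `step_distance`, the dictionary representation of adjacency and the toy network are assumptions, and the thesis does not state how the distances were actually computed.

```python
from collections import deque

def step_distance(adjacency, start, goal):
    """Fewest topological steps between two street segments.

    adjacency maps each segment to the segments it touches.
    Returns None if goal cannot be reached from start.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        segment, steps = queue.popleft()
        for neighbour in adjacency.get(segment, ()):
            if neighbour == goal:
                return steps + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, steps + 1))
    return None

# Toy street network (invented): a chain A-B-C-D with a side street E off A
adjacency = {
    "A": ["B", "E"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["A"],
}
same_street = step_distance(adjacency, "A", "A")
neighbouring = step_distance(adjacency, "A", "B")
three_steps = step_distance(adjacency, "A", "D")
```

Replacing the Euclidean separation in the pairwise comparison with this step count, and using the integer values directly in place of spatial bands, gives the network form of the test described above.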

Again, the choice to measure distance using these units is motivated by a combination of technical and theoretical concerns. Keeping in mind the ultimate aim of analysing (near-)repeat targeting, street segments represent a convenient unit for choice-based analysis, and so it is useful to understand clustering in discrete terms. As noted in Section 3.2.1.1, the inclusion of additional granularity (by using fractions of streets, for example) is likely to be counter-productive, since all significant variation in network metrics takes place at the segment level. More practically, though, results stated in terms of degrees of separation represent an appealingly parsimonious, and useful, outcome. A finding expressed in terms of ‘the risk to properties 2 streets away’, for example, provides a contrasting perspective to that of metric distance, and might suggest an easily-comprehensible heuristic for intervention.

The results of this test, using a process identical in all other respects to that of Section 4.3.2.1, are shown in Table 4.2. In this case, the results are found to be highly significant for all levels of separation, as is consistent with the results for the Euclidean case. Particularly strong effects can be seen for the one-week window, across several degrees of network separation, which provides a useful indication of the extent of such effects. One other observation is that, although there is an apparent decrease in effect magnitude with increasing spatial separation, the relationship is far from monotonic. The fact that this differs from a simple distance-decay relationship provides partial motivation for the study of whether the process is biased towards streets with certain characteristics. Overall, though, the main conclusion of the results is that space-time clustering is clearly evident, and can therefore be well identified in network terms.

Temporal band -     Spatial separation (steps)
upper limit (days)         0       1       2       3       4       5
7                       9.77    8.65    7.93    7.81    7.57    7.65
14                      7.99    4.71    5.21    6.12    4.65    5.79
21                      6.54    4.14    3.22    5.03    4.98    4.27
28                      5.46    3.27    3.69    5.68    4.20    4.42
35                      6.12    3.38    4.15    5.64    5.25    4.67

Table 4.2: Z-scores for a Knox test performed on 26,614 incidents of burglary in Birmingham, where distance is calculated in terms of degrees of separation between street segments. All values are statistically significant with a p-value of 0.01.
