Live Test Data - Machine learning algorithms for crime prevention and predictive policing

3.4 Results

3.4.2 Live Test Data

Hazard, Cumulative Hazard and Survival Functions

Survival Plots Firstly, the survival function plots for four different crime examples will be displayed. A side-by-side comparison of the survival plots for two crimes that led to a reoffence is displayed in Figure 3.9.

Figure 3.9: S(t) Plots, Failures at t = 2 (top) and t = 30 (bottom). Please note that the scale of each graph is different.

Here, the two survival functions appear to be very similar - both start off with around a 90% chance of "surviving" (i.e. not reoffending within) the first 4 weeks, which quickly drops to around 65% by the 40 week mark. Once again, the offender's probability of survival drops sharply in the first few weeks and flattens out to around 50% by the time the offender has reached about 2 years without an offence.

The only difference between the two plots is that the first has a slightly steeper line than the second - it would therefore be reasonable to say that the first offender (who reoffended between 4 and 8 weeks after the original offence) would be slightly more likely to reoffend at an earlier time than the second.

The survival functions of two censored offences will now be examined. One has been committed 4 weeks prior to the end of the dataset (in this case, 31st December 2017), while the other has been committed over 2 years before the final censor date. The two plots are shown in Figure 3.10.

Figure 3.10: S(t) Plots, Censored at t = 1 (top) and t = 26 (bottom). Please note that the scale of each graph is different.
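The time index t and the censoring indicator used in these plots can be illustrated with a minimal sketch. It assumes that each time step is a 4-week period numbered from 1, a convention inferred from the statement that t = 2 covers 4 to 8 weeks after the original offence; the function name `to_period` and the example dates are hypothetical and are not taken from the thesis.

```python
from datetime import date

CENSOR_DATE = date(2017, 12, 31)  # end of the dataset, as stated in the text
PERIOD_DAYS = 28                  # assumption: one time step is a 4-week period


def to_period(offence_date, reoffence_date=None, censor_date=CENSOR_DATE):
    """Return (t, event) for one offence.

    t is the 1-based index of the 4-week period in which follow-up ends.
    event is True if a reoffence was observed, False if the offence is censored.
    """
    event = reoffence_date is not None and reoffence_date <= censor_date
    end = reoffence_date if event else censor_date
    elapsed_days = (end - offence_date).days
    t = elapsed_days // PERIOD_DAYS + 1
    return t, event


# Hypothetical dates: a reoffence 35 days after the offence falls in period 2.
print(to_period(date(2017, 1, 1), date(2017, 2, 5)))
# Hypothetical date: no reoffence recorded, 20 days before the censor date.
print(to_period(date(2017, 12, 11)))
```

Under this convention an offence with no recorded reoffence contributes only the information that it "survived" up to its censoring period.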

A large difference can be observed between the two plots - the first censored individual has an eventual 65% chance of survival, whereas the second has an approximately 80% chance of survival. From these plots, it is unlikely the second offence (which has already reached approximately 2 years without resulting in any further offence activity) is going to result in a reoffence - the first offence is far more likely to lead to a reoffence both eventually and within the next 4 week period.
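The statement about the "next 4 week period" is a conditional one: given that an offence has survived to its censoring time, how likely is it to survive one more period? A minimal sketch of that calculation follows. The helper name `conditional_survival` and the survival values are hypothetical and are not read from Figure 3.10.

```python
def conditional_survival(surv, t_now, horizon=1):
    """P(no reoffence up to t_now + horizon | no reoffence up to t_now).

    surv[k] is the predicted S(k) for k = 0, 1, 2, ... with surv[0] = 1.0.
    """
    if surv[t_now] <= 0.0:
        raise ValueError("S(t_now) must be positive to condition on survival")
    return surv[t_now + horizon] / surv[t_now]


# Hypothetical survival curve for illustration only (not values from the thesis).
example_surv = [1.0, 0.90, 0.84, 0.80, 0.78]

# Probability of no reoffence in period 2, given survival through period 1.
p_next = conditional_survival(example_surv, t_now=1)
print(round(1.0 - p_next, 4))  # conditional probability of reoffending next period
```

A curve that has already flattened gives a ratio close to 1, which is why an offence that has gone about 2 years without a reoffence looks low risk over the next period.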

Cumulative Hazard Plots The cumulative hazard function Λ(t) will now be plotted for each of the four examples in turn. The Λ(t) plots for two selected offences that led to a reoffence are shown in Figure 3.11 below.

Figure 3.11: Λ(t) Plots, Failures at t = 2 (top) and t = 30 (bottom). Please note that the scale of each graph is different.
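The cumulative hazard plots can be related back to the survival plots. As a sketch, assuming the standard relation between the two functions (the estimator actually used by the forest is not stated in this section and may differ, for example a product-limit form):

```latex
S(t) = \exp\bigl(-\Lambda(t)\bigr)
\quad\Longleftrightarrow\quad
\Lambda(t) = -\ln S(t),
\qquad\text{so } S(t) = 0.5 \text{ corresponds to } \Lambda(t) = \ln 2 \approx 0.693 .
```

Under that assumption, a survival curve that flattens at around 50% corresponds to a cumulative hazard that levels off near 0.69.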

Once again, very little difference is observed between the cumulative hazard for the offender who committed a reoffence between 4 and 8 weeks after the original offence and that for the offender who committed a reoffence over 2 years after the original offence. A potential reason for the different reoffence times despite such similar predicted hazards could be the differential sentencing treatment between the two offenders - the first offender could have been released with a fine (enabling them to be "in the community" to offend much more quickly) while the second could have been put into prison for the offence for a number of months or even years, rendering them unable to offend for that time. It could also be that these were very similar risks that simply resulted in a reoffence at different times due to chance.

To see whether different conclusions should be drawn on censored data, refer to the plots in Figure 3.12 below.

Figure 3.12: Λ(t) Plots, Censoring at t = 1 (top) and t = 26 (bottom). Please note that the scale of each graph is different.

Unlike those offences that led to a reoffence, the cumulative hazards here are very different - the first offence appears to be much more likely to lead to a reoffence at every point in time.

Instantaneous Hazard Plots Here, the instantaneous hazard λ(t) is plotted for each of the available t values. These plots, beginning with the offences that led to a reoffence, are shown in Figure 3.13 below.

Figure 3.13: λ(t) Plots, Failures at t = 2 (top) and t = 30 (bottom). Please note that the scale of each graph is different.
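On a discrete grid of 4-week periods, one simple way to obtain an instantaneous hazard from a cumulative hazard is to take successive differences. The sketch below assumes this discrete-time convention, with Λ(0) = 0; it is not necessarily how the plotted λ(t) values were produced, and the input numbers are hypothetical.

```python
def hazard_increments(cum_hazard):
    """Return lambda(t) = Lambda(t) - Lambda(t - 1) for t = 1, 2, ...

    cum_hazard[k] holds Lambda(k + 1); Lambda(0) is taken to be 0.
    """
    increments = []
    previous = 0.0
    for value in cum_hazard:
        increments.append(value - previous)
        previous = value
    return increments


# Hypothetical cumulative hazard values for t = 1, 2, 3 (illustration only).
example_cum_hazard = [0.05, 0.11, 0.15]
print([round(h, 4) for h in hazard_increments(example_cum_hazard)])
```

A steep early rise in Λ(t) therefore shows up as a high early λ(t), and a flat Λ(t) as a hazard near zero.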

Here, large fluctuations in the instantaneous hazard are not present until around t = 20 (between 76 and 80 weeks since the original offence); before that point, the hazards follow a relatively continuous decreasing pattern. The differences between the two crimes are more evident here, and it is clear why the first offence may have led to a reoffence at t = 2 - the predicted instantaneous hazard of the first offence is much higher (0.06) at t = 2 than the instantaneous hazard of the second (0.02). The instantaneous hazard decreases much more sharply in the first case, but also starts much higher - this indicates that the first offence would be more likely to lead to an early reoffence than the second.

In order to see what sort of hazard plots can be expected from a censored offence, the plots for the two censored offences are shown in Figure 3.14 below.

Figure 3.14: λ(t) Plots, Censoring at t = 1 (top) and t = 26 (bottom). Please note that the scale of each graph is different.

As with the cumulative hazard and survival functions, there is a large difference between the instantaneous hazards here. While the t = 26 example is unlikely to ever lead to a reoffence, the t = 1 case may well still lead to one.

Metrics and Evaluation

Predictive Performance As in the "best case" scenario outlined in the previous section, trees are grown using the Maxstat splitrule.

Table 3.8: Random Survival Forest Evaluation Metrics. Parameters: No. Trees = 100, No. Variables per Split = 13, Alpha = 0.1, Minprop = 0.156

Metric                        Value
Training Set Error            0.2091
Validation Set (OOB) Error    0.2458
Test Set Error                0.2507

The training and test set errors have, similarly to those produced by the classification and probability forests, decreased slightly relative to those present in the original dataset. As before, this likely reflects the relative improvements in data quality (especially relating to the area variables) between the original and live datasets. From this perspective, it appears that the algorithm is suitable for use on live data and performs relatively well on the live test set as a whole, once again performing better than many comparable survival models.
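This section does not restate how the errors in Table 3.8 are defined. For survival forests the error is commonly reported as 1 minus Harrell's concordance index, and the sketch below assumes that definition purely for illustration; the function name `concordance_error` and the example data are hypothetical.

```python
def concordance_error(times, events, risk_scores):
    """Return 1 - Harrell's C for right-censored data.

    A pair (i, j) is comparable when offence i has an observed reoffence and
    times[i] < times[j]. It is concordant when offence i also has the higher
    predicted risk; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored offences cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return 1.0 - concordant / comparable


# Hypothetical example: two reoffences (t = 2, 30) and two censored cases (t = 1, 26).
times = [2, 30, 1, 26]
events = [True, True, False, False]
risk = [0.9, 0.2, 0.8, 0.1]  # made-up risk scores
print(concordance_error(times, events, risk))
```

Under this definition an error of 0.5 corresponds to random ranking and 0 to perfect ranking, so values around 0.25 would indicate that roughly three in four comparable pairs are ordered correctly.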

Variable Importances Now that it has been decided which of the splitrules is most appropriate for this task, permutation importances can be generated for each of the variables to be considered for prediction. For further details as to how these permutation importances are calculated, refer back to Chapter 2 of this thesis.
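The general idea behind a permutation importance can be sketched as follows: shuffle one variable, re-evaluate the error, and record the increase. This is a generic, simplified form and not necessarily the exact procedure described in Chapter 2 (forest implementations typically compute it per tree on out-of-bag data); `permutation_importance`, `error_fn` and the toy data are hypothetical.

```python
import random


def permutation_importance(error_fn, rows, column, n_repeats=10, seed=0):
    """Mean increase in error after shuffling one column.

    rows is a list of dicts (one per offence); error_fn maps such a list to a
    prediction error. The original rows are left unmodified.
    """
    rng = random.Random(seed)
    baseline = error_fn(rows)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        permuted = [{**row, column: value} for row, value in zip(rows, shuffled)]
        increases.append(error_fn(permuted) - baseline)
    return sum(increases) / n_repeats


# Toy stand-in for a fitted model: it predicts y from "a" and ignores "b".
def toy_error(rows):
    return sum(abs(row["a"] - row["y"]) for row in rows) / len(rows)


toy_rows = [{"a": i, "b": i % 2, "y": i} for i in range(8)]
print(permutation_importance(toy_error, toy_rows, "a") > 0)
print(permutation_importance(toy_error, toy_rows, "b"))
```

A variable the model does not rely on gives an importance of about zero, which is the sense in which the small values at the bottom of Table 3.9 indicate weak contributors.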

Table 3.9: Variable Importances. Parameters: No. Trees = 100, No. Variables per Split = 13,

Variables               Permutation Importances
Outcome                 0.0337
NoPreviousArrests       0.0326
PrevFine                0.0217
HaversineDist           0.0190
OffenceCat              0.0155
Fine                    0.01277
AgeCommitted            0.0111
MDAClass                0.0082
MultipleOffences        0.0072
Off EmpBenefits Perc    0.0069
Off Sex                 0.0069
Off ASB per100          0.0042
Off Income Perc         0.0044
PrevViolence            0.0041
Crime Income Perc       0.0038

The variable importances shown here are somewhat similar to those from the original dataset - once again, Outcome, NoPreviousArrests, PrevFine and OffenceCat are within the top 5. The Outcome of the offence does, however, appear to be much more important in the live dataset than it was in the original dataset and is now considered to be slightly more important than the offender's number of previous arrests.

Similarly to the Live and Original results from Chapter 2, the Haversine distance becomes more important in the Live dataset compared to the Original, reflecting the improvements in location recording within the database. Other area-related variables, however, are still not considered to be particularly important within the model.

3.5 Conclusions, Limitations and Further Research
