A hidden Markov model for criminal behaviour classification

Academic year: 2021

Francesco Bartolucci, Institute of economic sciences,

Urbino University, Italy.

Fulvia Pennoni, Department of Statistics,

(2)

Background

Analysis of criminal behaviour: we want to model offending patterns while taking into account the nature of offending and the sequence of offence types;

criminal histories are recorded as official histories: the England and Wales Offenders Index is a court-based record of the criminal histories of all offenders in England and Wales from 1963 to the current day;

general population sample of n = 5,470 individuals paroled from the cohort of those born in 1953 and followed through to 1993;

offences are combined into J = 10 major categories described in the Offenders Index Codebook (1998);

following Francis et al. (2004), we define T = 6 time windows or age strips: 10-15, 16-20, 21-25, 26-30, 31-35, 36-40.

(3)

Univariate Latent Markov model

Used by Bijleveld and Mooijaart (2003): the offending pattern of a subject within age strip $t$, $t = 1, \ldots, T$, is represented by a single discrete random variable $X_t$;

$\{X_t\}$ depends only on a random process $\{C_t\}$;

$\{C_t\}$ follows a first-order homogeneous Markov chain with $k$ states, initial probabilities $\pi_c$ and transition probabilities $\pi_{c_1 c_2}$;

the joint distribution of $\{X_t\}$ may be expressed as

$$p(X_1 = x_1, \ldots, X_T = x_T) = \sum_{c_1} \phi_{x_1|c_1} \pi_{c_1} \sum_{c_2} \phi_{x_2|c_2} \pi_{c_1 c_2} \cdots \sum_{c_T} \phi_{x_T|c_T} \pi_{c_{T-1} c_T},$$

where $\phi_{x|c} = p(X_t = x \mid C_t = c)$.
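The nested sums in the joint distribution can be evaluated one age strip at a time. The following is a minimal Python sketch of that recursion, not the authors' code; the function name `joint_probability` and all numerical values are assumptions chosen for illustration.

```python
import numpy as np

def joint_probability(x, init, trans, phi):
    """p(X_1 = x_1, ..., X_T = x_T) for a univariate latent Markov model.

    x     : sequence of observed categories (integers), length T
    init  : (k,) initial probabilities pi_c
    trans : (k, k) transition probabilities, trans[c1, c2] = pi_{c1 c2}
    phi   : (m, k) conditional probabilities, phi[x, c] = p(X_t = x | C_t = c)
    """
    # q[c] = p(x_1, ..., x_t, C_t = c), updated one age strip at a time
    q = phi[x[0]] * init
    for x_t in x[1:]:
        q = phi[x_t] * (q @ trans)
    return q.sum()

# Toy values (assumed, not taken from the slides): k = 2 states, binary outcome
init = np.array([0.7, 0.3])
trans = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
phi = np.array([[0.95, 0.40],   # p(X = 0 | c)
                [0.05, 0.60]])  # p(X = 1 | c)

p = joint_probability([0, 1, 1], init, trans, phi)
```

The cost grows linearly in T, whereas enumerating all latent paths would grow as k^T.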

(4)

Multivariate Extension

$X_{tj}$ is a binary random variable equal to 1 if the subject is convicted for an offence of type $j$ within age strip $t$ and to 0 otherwise;

we assume local independence, i.e. that for $t = 1, \ldots, T$ the variables $X_{t1}, \ldots, X_{tJ}$ are conditionally independent given $C_t$:

$$\phi_{x|c} = p(X_t = x \mid C_t = c) = \prod_{j=1}^{J} \lambda_{j|c}^{x_j} (1 - \lambda_{j|c})^{1 - x_j},$$

where $\lambda_{j|c} = p(X_{tj} = 1 \mid C_t = c)$, $X_t = (X_{t1}, \ldots, X_{tJ})$ and $x_j$ denotes the $j$-th element of the vector $x$.
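Under local independence the conditional probability of a whole offence vector is a product of Bernoulli terms. A small Python sketch of this computation follows; the helper name `emission_prob` and the values of `lam` are assumptions for illustration only.

```python
import numpy as np

def emission_prob(x, lam):
    """phi_{x|c} for every latent state c under local independence.

    x   : (J,) binary vector of offence indicators for one age strip
    lam : (k, J) array, lam[c, j] = lambda_{j|c} = p(X_tj = 1 | C_t = c)
    Returns a (k,) array with one probability per latent state.
    """
    x = np.asarray(x)
    return np.prod(lam ** x * (1.0 - lam) ** (1 - x), axis=1)

# Assumed toy values: k = 2 states, J = 3 offence types
lam = np.array([[0.01, 0.02, 0.05],
                [0.30, 0.10, 0.40]])
phi = emission_prob([1, 0, 1], lam)
```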

(5)

Restricted version of the model (unidimensional Rasch)

We assume that for each type of offence we have

$$\operatorname{logit}(\lambda_{j|c}) = \alpha_c + \beta_j, \qquad (1)$$

where

$\alpha_c$ is the tendency to commit crimes of a subject in latent class $c$ (i.e. an individual characteristic);

$\beta_j$ is the easiness of committing a crime of type $j$;

with an appropriate labelling of the latent classes, the model allows them to be ordered so that

$$\lambda_{j|1} \le \cdots \le \lambda_{j|k}, \quad j = 1, \ldots, J;$$

this constraint is used to formulate a latent class version of the Rasch (1961) model, which is well known in the psychometric literature.
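To illustrate parametrization (1), the sketch below computes the conviction probabilities from class tendencies and offence easiness parameters and checks the ordering across classes. The function name `rasch_lambda` and the parameter values are assumptions, not estimates from the application.

```python
import numpy as np

def rasch_lambda(alpha, beta):
    """lambda_{j|c} = expit(alpha_c + beta_j), returned as a (k, J) array."""
    eta = np.asarray(alpha, dtype=float)[:, None] + np.asarray(beta, dtype=float)[None, :]
    return 1.0 / (1.0 + np.exp(-eta))

alpha = np.array([0.0, 2.0, 4.0])      # assumed tendencies, increasing in c
beta = np.array([-5.0, -7.0, -6.0])    # assumed easiness parameters

lam = rasch_lambda(alpha, beta)

# With alpha sorted in increasing order, every column is non-decreasing in c
ordered = bool(np.all(np.diff(lam, axis=0) >= 0))
```

Because the same $\alpha_c$ enters every offence type, sorting the classes by $\alpha_c$ orders all J probabilities at once.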

(6)

Restricted version of the model (multidimensional Rasch)

The previous model assumes that every type of offence depends on the same latent trait: this may be too restrictive;

we consider that the crimes may be partitioned into $s$ homogeneous subgroups, so that

$$\operatorname{logit}(\lambda_{j|c}) = \sum_{d=1}^{s} \delta_{jd} \alpha_{cd} + \beta_j, \qquad (2)$$

where

$\alpha_{cd}$ is the tendency of a subject in latent class $c$ to commit crimes in subgroup $d$;

$\delta_{jd}$ is equal to 1 if crime $j$ is in subgroup $d$ and to 0 otherwise;

we can thus classify the offences into groups such that crimes belonging to the same group share the same latent trait.
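Parametrization (2) amounts to a matrix product between the class tendencies and the membership indicators. The following Python sketch shows this under assumed values; the function name `multidim_rasch_lambda`, the partition `delta` and the parameters are hypothetical and do not correspond to the fitted model.

```python
import numpy as np

def multidim_rasch_lambda(alpha, beta, delta):
    """lambda_{j|c} under the multidimensional Rasch parametrization.

    alpha : (k, s) tendencies alpha_{cd}
    beta  : (J,)   easiness parameters beta_j
    delta : (J, s) 0/1 indicators delta_{jd}, exactly one 1 per row
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    delta = np.asarray(delta, dtype=float)
    eta = alpha @ delta.T + beta        # (k, J): sum_d delta_jd alpha_cd + beta_j
    return 1.0 / (1.0 + np.exp(-eta))

# Assumed toy setting: k = 2 classes, J = 4 crimes, s = 2 subgroups
delta = np.array([[1, 0],
                  [1, 0],
                  [0, 1],
                  [0, 1]])
alpha = np.array([[0.0, 0.0],
                  [3.0, 1.0]])
beta = np.array([-5.0, -6.0, -4.0, -7.0])

lam = multidim_rasch_lambda(alpha, beta, delta)
```

With s = 1 and a column of ones for `delta`, the function reduces to the unidimensional model (1).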

(7)

Likelihood inference

The log-likelihood of the model for an observed cohort of $n$ subjects is

$$l(\theta) = \sum_{i=1}^{n} \log[L_i(\theta)],$$

where $\theta$ denotes all the parameters and $L_i(\theta)$ is the probability $p(x_{i1}, \ldots, x_{iT})$ defined above, evaluated at $\theta$.

$L_i(\theta)$ may be computed through the well-known recursions in the hidden Markov literature (see Levinson et al., 1983, and MacDonald and Zucchini, 1997, Sec. 2.2);

$l(\theta)$ is maximized with the EM algorithm, which requires the complete data log-likelihood.

(8)

The complete data log-likelihood may be expressed as

$$l^*(\theta) = \sum_{c} v_{\cdot 1 c} \log \pi_c + \sum_{c_1} \sum_{c_2} u_{c_1 c_2} \log \pi_{c_1 c_2} + \sum_{i} \sum_{t} \sum_{c} v_{itc} \sum_{j} \{ x_{itj} \log \lambda_{j|c} + (1 - x_{itj}) \log(1 - \lambda_{j|c}) \},$$

where $v_{itc}$ is a dummy variable, referring to the $i$-th subject, which is equal to 1 if $C_t = c$ and to 0 otherwise, $v_{\cdot tc} = \sum_i v_{itc}$ and $u_{c_1 c_2}$ is the number of transitions from the $c_1$-th to the $c_2$-th state.
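For the unconstrained model, each block of the complete data log-likelihood is maximized in closed form by a ratio of counts. The Python sketch below illustrates these standard updates on toy complete data; the function name `m_step`, the latent paths and the simulated responses are assumptions, and within EM the dummy variables would be replaced by their conditional expected values.

```python
import numpy as np

def m_step(x, v, u):
    """Closed-form maximizers of l*(theta) for the unconstrained model.

    x : (n, T, J) binary data x_itj
    v : (n, T, k) state indicators v_itc (or their expected values)
    u : (k, k)    transition counts u_{c1 c2} (or their expected values)
    """
    init = v[:, 0, :].sum(axis=0) / v.shape[0]      # pi_c = v_{.1c} / n
    trans = u / u.sum(axis=1, keepdims=True)        # pi_{c1 c2}, rows sum to 1
    # lam[c, j] = lambda_{j|c}: weighted share of convictions of type j in state c
    lam = np.einsum("itc,itj->cj", v, x) / v.sum(axis=(0, 1))[:, None]
    return init, trans, lam

# Toy complete data (assumed): n = 4 subjects, T = 3 strips, k = 2, J = 2
paths = np.array([[0, 0, 1],
                  [0, 1, 1],
                  [1, 1, 0],
                  [0, 0, 0]])
v = np.eye(2)[paths]                                # one-hot v_itc, shape (4, 3, 2)
u = np.zeros((2, 2))
np.add.at(u, (paths[:, :-1], paths[:, 1:]), 1)      # count observed transitions
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(4, 3, 2))              # simulated binary responses

init, trans, lam = m_step(x, v, u)
```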

(9)

EM algorithm

E: computes the conditional expected value of $l^*(\theta)$, given the observed data and the current value of the parameters.

M: updates the parameter estimates by maximizing the expected value of $l^*(\theta)$ computed above.

When the model is constrained (unidimensional or multidimensional Rasch), the parameters $\alpha_{cd}$ and $\beta_j$ are estimated by fitting to the data a logistic model with a suitable design matrix $Z$, defined according to the model of interest.

(10)

Choice of the number of classes (k)

The optimal number of latent classes can be chosen with the likelihood ratio between the model with $k$ states and that with $k + 1$ states, $D_k = -2(\hat{l}_k - \hat{l}_{k+1})$, for increasing values of $k$;

or using the Bayesian Information Criterion (Kass and Raftery, 1995), defined as

$$BIC_k = -2\hat{l}_k + r_k \log(n),$$

where $r_k$ is the number of parameters in the model with $k$ states.

According to this strategy, the optimal number of states is the one for which $BIC_k$ is minimum.

(11)

Choice of the number of latent traits

The crimes are clustered using a hierarchical algorithm.

At each step the algorithm aggregates the two clusters of crimes that are closest in terms of the deviance between the model fitted at the previous step and the multidimensional Rasch model fitted after the aggregation of the two clusters.

The steps are iterated until the BIC of the resulting model is lower than that of the unconstrained model.

(12)

An application

We applied the model to a sample of n = 5,470 males taken from the dataset illustrated above;

we used the estimated number of live births in the cohort year 1953 as reported by Prime et al. (2001).

For a number of classes between 1 and 7 we obtain

k    l_k        r_k    BIC_k
1    −21,341     10    42,768
2    −20,076     23    40,349
3    −19,643     38    39,612
4    −19,284     55    39,041
5    −19,142     74    38,921
6    −19,086     95    38,990
7    −19,010    118    39,036
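As a worked check of the selection rule, the sketch below recomputes the BIC from the log-likelihoods and parameter counts reported above, taking n = 5,470 as the sample size. The variable names are assumptions, and because the reported log-likelihoods are rounded the recomputed values may differ from the table in the last digit.

```python
import math

n = 5470  # sample size reported in the application

# k: (maximized log-likelihood l_k, number of parameters r_k), as reported
fits = {
    1: (-21341, 10),
    2: (-20076, 23),
    3: (-19643, 38),
    4: (-19284, 55),
    5: (-19142, 74),
    6: (-19086, 95),
    7: (-19010, 118),
}

def bic(loglik, n_params, n_obs):
    """BIC_k = -2 * l_k + r_k * log(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

bic_by_k = {k: bic(l, r, n) for k, (l, r) in fits.items()}

# The selected number of states is the one with the smallest BIC
best_k = min(bic_by_k, key=bic_by_k.get)
```

Here `best_k` is 5, the number of latent states used in the estimates that follow.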

(13)

Choice of the clusters

Using the hierarchical algorithm, the best fit (BIC = 35,433) was obtained for the following aggregation of the 10 types of crime into clusters; the estimates of the β's are also reported.

Offence category (j)               Latent trait (1, 2, 3)    β_j
Violence against the person        X                         −5.824
Sexual offences                    X                         −7.787
Burglary                           X                         −7.004
Robbery                            X                         −10.212
Theft and handling stolen goods    X                         −5.375
Fraud and forgery                  X                         −6.473
Criminal damage                    X                         −5.890
Drug offences                      X                         −6.720
Motoring offences                  X                         −8.170

(14)

Estimated α parameters

Values of the estimated tendencies of the subjects for each latent state in every subgroup

c    α_1       α_2      α_3
1    0.000     0.000    0.000
2    −0.134    2.860    −9.513
3    3.315     7.100    6.192
4    3.831     4.445    5.02
5    5.283     6.990    7.439

(15)

Estimates of π and Π

Initial probabilities π_c

π_1      π_2      π_3      π_4      π_5
0.393    0.552    0.054    0.000    0.000

Transition probabilities π_{c1 c2} of the Markov chain (rows: current state c1; columns: next state c2)

c    1        2        3        4        5
1    0.996    0.000    0.000    0.003    0.000
2    0.364    0.375    0.010    0.226    0.024
3    0.000    0.241    0.288    0.172    0.300
4    0.555    0.012    0.000    0.429    0.005
5    0.000    0.071    0.014    0.445    0.470
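Since the chain is homogeneous, the distribution of the latent state at age strip t is the initial distribution multiplied by the transition matrix t − 1 times. The Python sketch below applies this to the estimates reported above; the renormalization step is an assumption made here to absorb rounding in the published values, and the variable names are illustrative.

```python
import numpy as np

# Estimates as reported on the slide (rounded to three decimals)
init = np.array([0.393, 0.552, 0.054, 0.000, 0.000])
trans = np.array([
    [0.996, 0.000, 0.000, 0.003, 0.000],
    [0.364, 0.375, 0.010, 0.226, 0.024],
    [0.000, 0.241, 0.288, 0.172, 0.300],
    [0.555, 0.012, 0.000, 0.429, 0.005],
    [0.000, 0.071, 0.014, 0.445, 0.470],
])

# Assumption: renormalize so that rounding error does not accumulate
init = init / init.sum()
trans = trans / trans.sum(axis=1, keepdims=True)

T = 6  # number of age strips
marginals = [init]
for _ in range(T - 1):
    # p(C_{t+1} = c2) = sum_{c1} p(C_t = c1) * pi_{c1 c2}
    marginals.append(marginals[-1] @ trans)
marginals = np.vstack(marginals)   # row t-1 holds the distribution of C_t
```

Each row of `marginals` gives the estimated share of the cohort in each latent state at the corresponding age strip.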

(16)

Advantages of the proposed methodology

We achieve a parsimonious description of the dynamic process underlying the data;

the approach is based on a general population sample and not on an offender-based sample as in other studies;

it allows a wide range of models to be estimated and the best one to be chosen, ranging from the simple latent class model to the constrained model with subgroups;

it can provide important information for policy, such as incarceration or incapacitation policy against offenders.

(17)

Future extensions

Constrain the probabilities $\lambda_{j|c}$ to be equal to 0 for one latent class, so that this class may be identified as that of non-offending subjects;

consider also models in which the transition probabilities may vary with age (non-homogeneous Markov chains);

consider restricted models in which the transition matrix has a particular structure (e.g. triangular, symmetric).

(18)

References

Bijleveld, C. J. H., and Mooijaart, A. (2003). Latent Markov Modelling of Recidivism Data. Statistica Neerlandica, 57, 3, 305-320.

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Statist. Soc. series B, 39, 1-38.

Feng, Z. and McCulloch, C. E. (1996). Using Bootstrap Likelihood Ratios in Finite Mixture Models. J. R. Statist. Soc., B, 58, 3, 609-617.

Francis, B., Soothill, K. and Fligelstone, R. (2004). Identifying Patterns and Pathways of Offending Behaviour: A New Approach to Typologies of Crime. European Journal of Criminology, 1, 47-87.

Kass, R. E. and Raftery, A. (1995). Bayes factors. Journal of the American Statistical Association, 90 (430), 773-795.

Lazarsfeld, P. F. and Henry, N. W (1968). Latent Structure Analysis. Boston: Houghton Mifflin.

Levinson, S. E., Rabiner, L. R. and Sondhi, M. M. (1983). An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition. Bell System Technical Journal, 62, 1035-74.

Lindsay, B., Clogg, C. and Grego, J. (1991). Semiparametric estimation in the Rasch model and related exponential response models, including a simple latent class model for item analysis.

(19)

McCutcheon, A. L. and Thomas, G. (1995). Patterns of drug use among white institutionalized delinquents in Georgia. Evidence from a latent class analysis. Journal of Drug Education, 25, 61-71.

MacDonald I. and Zucchini W. (1997). Hidden Markov and Other Models for Discrete-valued Time Series. London: Chapman & Hall.

McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models. New York: John Wiley & Sons.

Research Development and Statistics Directorate (1998). Offenders Index Codebook. London: Home Office. Available at http://homeoffice.gov.uk/rds/pdfs/oicodes.pdf.

Prime, J., White, S., Liriano, S. and Patel, K. (2001). Criminal careers of those born between 1953 and 1978. Statistical Bulletin 4/01. London: Home Office.

Rasch, G. (1961). On general laws and the meaning of measurement in psychology, Proceedings of the IV Berkeley Symposium on Mathematical Statistics and Probability, 4, 321-333.

Wiggins, L. M. (1973). Panel Analysis: Latent Probability Models for Attitudes and Behavior Processes. Amsterdam: Elsevier.
