Forensic Science International


Different likelihood ratio approaches to evaluate the strength of evidence of MDMA tablet comparisons

Annabel Bolck (a,*), Céline Weyermann (b), Laurence Dujourdy (c), Pierre Esseiva (b), Jorrit van den Berg (a)

(a) Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, The Netherlands

(b) Institut de Police Scientifique, University of Lausanne, Bâtiment de Chimie, CH-1015 Lausanne-Dorigny, Switzerland

(c) Institut National de Police Scientifique, Laboratoire de Police Scientifique de Lyon, 31 Avenue Franklin Roosevelt, 69134 Ecully, France

Article info

Article history: Received 28 January 2009; received in revised form 8 June 2009; accepted 8 June 2009; available online 15 July 2009.

Keywords: Drug profiling; MDMA; Multivariate distributions; Pearson; Evidence evaluation; Likelihood ratio

Abstract

Two likelihood ratio (LR) approaches are presented to evaluate the strength of evidence of MDMA tablet comparisons. The first one is based on a more ‘traditional’ comparison of MDMA tablets by using distance measures (e.g., Pearson correlation distance or a Euclidean distance). In this approach, LRs are calculated using the distribution of distances between tablets of the same-batch and that of different-batches. The second approach is based on methods used in some other fields of forensic comparison. Here LRs are calculated based on the distribution of values of MDMA tablet characteristics within a specific batch and from all batches. The data used in this paper must be seen as examples to illustrate both methods. In future research the methods can be applied to other and more complex data. In this paper, the methods and their results are discussed, considering their performance in evidence evaluation and several practical aspects. With respect to evidence in favor of the correct hypothesis, the second method proved to be better than the first one: its LRs in same-batch comparisons are generally higher than those of the first method, and its LRs in different-batch comparisons are generally lower. On the other hand, for operational purposes (where quick information is needed), the first method may be preferred, because it is less time consuming. With this method a model has to be estimated only once in a while, which means that only a few measurements have to be done, while with the second method more measurements are needed because a new model has to be estimated each time.

* Corresponding author at: Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, The Netherlands. Tel.: +31 70 8886121/8886666; fax: +31 70 8886559.

E-mail address: a.bolck@nfi.minjus.nl (A. Bolck).

Journal homepage: www.elsevier.com/locate/forsciint

1. Introduction

Illicit production and trafficking of MDMA (3,4-methylenedioxymethamphetamine) tablets, also called Ecstasy (XTC), remain a significant problem in Europe [1,2]. The variation of physical and chemical properties of MDMA tablets from clandestine production is much higher than what is allowed in pharmaceutical tablets and formulations. Owing to this variation, valuable forensic information can be obtained to investigate relations between drug seizures.

In general, two main steps can be distinguished in the production of MDMA tablets: (1) the synthesis of the active substance [3], and (2) the mixing of this substance with excipients/cutting agents and subsequent compression into tablets [4]. These tablets have specific characteristics such as diameter, height, imprint (i.e., logo) and break line (i.e., score) that are well conserved.

The general procedure followed in the examination of tablets starts with the measurement of physical characteristics (i.e., post-tabletting characteristics) of the seized tablets. The shape, logo, colour and presence of a score on the tablets are described, while other characteristics such as diameter, weight and thickness are accurately measured. Then, qualitative and quantitative chemical analyses are performed to determine the chemical composition (active substance(s), excipients, cutting agents, etc.) and dosage (also called purity) of the synthetic drug (i.e., pre-tabletting characteristics). The utilization of physical and chemical profiles in a strategic and operational context has recently become popular [3–8]. MDMA-related organic impurities, elemental composition [9] and isotope ratios of the tablets are also investigated. These can be used for evidence evaluation purposes as well.

This study concentrates on the use of post-tabletting characteristics (also called physical characteristics) to investigate whether different seizures originate from the same-batch, for court and investigation purposes [4,5]. The characteristics that are assumed to be the most discriminative, namely diameter, thickness and weight, are considered. The diameter is a fixed parameter for a given set of tools (punches and dies), while the thickness and weight depend mainly on the composition of the tabletting mass, the quantity of powder per tablet (filling of cavity) and the set pressure at which the tabletting machine operates. The approach presented here can also be applied to pre-tabletting characteristics, which will be performed in future research.

Currently, comparison of characteristics of MDMA tablets from different seizures (or consignments) is mainly based on visual inspection and distance or similarity measures [3–12]. The main and easiest comparison method used so far has been the Pearson correlation distance [3–6,8]. If this distance is below a certain threshold, it is assumed that the compared seizures have a common source.

The forensic community is currently debating this way of drawing conclusions and suggests an approach based on Bayes' rule for evidence purposes [13–20]. In a Bayesian approach forensic experts do not draw any conclusion about whether tablets originate from the same-batch or not. Instead, they provide the ratio of the probability of finding specific (dis)similarities in characteristics when tablets have a common source to the probability of finding these (dis)similarities when tablets have different sources. This ratio is called the likelihood ratio (LR) and can be considered as a measure of the strength of evidence that the tablets have a common source versus that they have different sources.

This work assesses two methodologies using LRs to determine the strength of evidence in MDMA tablet comparison cases: (1) a distance method based on the Euclidean distance and Pearson correlation distance between tablets, traditionally used for comparisons [3–9], and (2) a method based on the distribution of one characteristic, sometimes referred to as a feature (i.e., univariate), or of more than one characteristic or feature (i.e., multivariate) [14,16,20,21]. An explanation of the Bayesian approach in forensic evidence evaluation is given in Section 2.

In Section 3, a detailed description of the methods is presented. The two methods are compared in Section 4, based on post-tabletting characteristics, and assessed considering their respective performance in the comparison of MDMA tablets. Finally, advantages and disadvantages in evidence evaluation and operational research are discussed in Section 5.

2. Bayesian approach

The Bayesian approach refers to a well-known theorem from probability theory that is named after Thomas Bayes. This theorem of Bayes in odds terms can be written using terminology appropriate for forensic science. This involves two hypotheses (Hp and Hd): one could be stated by the prosecution (Hp) and the other by the defense (Hd), e.g., in the case of MDMA tablet comparisons:

Hp. The tablets of consignments A and B come from the same-batch.

Hd. The tablets of consignments A and B come from different-batches.

Furthermore, it involves the evidence (E), such as the similarities and dissimilarities found in the features (or characteristics). Bayes' rule then says that the ratio of the probabilities of the hypotheses given the evidence is equal to the ratio of the probabilities of the evidence given the hypotheses times the ratio of the probabilities of the hypotheses before the evidence:

$$\frac{P(H_p \mid E)}{P(H_d \mid E)} = \frac{P(E \mid H_p)}{P(E \mid H_d)} \times \frac{P(H_p)}{P(H_d)} \qquad (1)$$

In other words, posterior odds = LR × prior odds. This formula makes clear that probabilities of the hypotheses given the evidence are not the same as probabilities of the evidence given the hypotheses. To get the posterior odds, that is the ratio of the probabilities of the hypotheses given the evidence, the LR, that is the ratio of the probability of the evidence if the hypothesis of the prosecution is true to the probability of the evidence if the hypothesis of the defense is true, has to be multiplied by the prior odds. The posterior odds and the LR are equal only when the prior odds equal one. The LR can be estimated by the experts, and is generally accepted as a measure of the strength of the evidence [13,14,17–19]. To determine the probability of the hypotheses after the evidence is known, information is needed about both the LR and the probability of the hypotheses before the evidence is known. These prior probabilities depend on circumstances and other evidence.
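The odds form of Bayes' rule can be illustrated with a minimal Python sketch. The function names and the numerical values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def posterior_odds(likelihood_ratio: float, prior_odds: float) -> float:
    """Odds form of Bayes' rule: posterior odds = LR x prior odds."""
    return likelihood_ratio * prior_odds


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of Hp into the probability of Hp."""
    return odds / (1.0 + odds)


# Hypothetical LR reported by the expert.
lr = 100.0

# The same LR leads to different posterior odds for different prior odds,
# which depend on circumstances and other evidence.
for prior in (0.01, 1.0, 10.0):
    post = posterior_odds(lr, prior)
    print(f"prior odds {prior:g}: posterior odds {post:g}, "
          f"P(Hp | E) = {odds_to_probability(post):.3f}")
```

The loop shows why the expert reports only the LR: the posterior odds coincide with the LR only in the row where the prior odds equal one.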

When comparing tablets, the evidence can best be divided into two parts: the values of certain characteristics, or the feature vector, (X) of the tablets of consignment A and the values of these characteristics (Y) of the tablets of consignment B. The LR can be calculated by

$$\frac{P(X, Y \mid H_p)}{P(X, Y \mid H_d)} = \frac{P(Y \mid X, H_p)}{P(Y \mid X, H_d)} \times \frac{P(X \mid H_p)}{P(X \mid H_d)} \qquad (2)$$

The last term equals one, because the probabilities of certain characteristics of one batch are independent of whether A and B come from the same-batch. The denominator of the first term on the right-hand side can be reduced to $P(Y \mid H_d)$, because the probability of certain values of characteristics of consignment B is independent of the values of these characteristics of consignment A if A and B come from different-batches.

2.1. Simple example

Let us assume that XTC tablets are always of the same shape when they originate from the same tabletting batch. This is a reasonable assumption, because the clandestine tabletting machines found in practice used one set of similar punches. Furthermore, in this simple example it is assumed that there are only three shapes (which is less reasonable, but keeps the example simple), of which some are rarer than others:

P(round) = 3/6
P(triangle) = 2/6
P(oval) = 1/6

X: tablets A are triangle.
Y: tablets B are triangle.

The strength of evidence, in the form of a LR, for consignments A and B both being triangle is then:

$$LR = \frac{P(X, Y \mid H_p)}{P(X, Y \mid H_d)} = \frac{P(X \mid H_p)}{P(X \mid H_d)} \times \frac{P(Y \mid X, H_p)}{P(Y \mid H_d)} = 1 \times \frac{1}{1/3} = 3 \qquad (3)$$
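The shape example can be reproduced with a short Python sketch. It uses the three shape frequencies given above; the function name `shape_lr` is hypothetical, and the value returned for mismatching shapes follows from the stated assumption that tablets of one batch always share a shape.

```python
from fractions import Fraction

# Population frequencies of the three shapes, as assumed in the example.
SHAPE_FREQ = {
    "round": Fraction(3, 6),
    "triangle": Fraction(2, 6),
    "oval": Fraction(1, 6),
}


def shape_lr(shape_a: str, shape_b: str, freq=SHAPE_FREQ) -> Fraction:
    """LR for the shapes of consignments A and B (discrete characteristic).

    Under Hp the shape of B equals that of A with probability 1 (assumption
    of the example), so P(Y | X, Hp) is 1 for a match and 0 otherwise.
    Under Hd the shape of B occurs with its population frequency.
    """
    if shape_a != shape_b:
        return Fraction(0)
    return 1 / freq[shape_b]


print(shape_lr("triangle", "triangle"))  # the case worked out in Eq. (3)
print(shape_lr("oval", "oval"))          # a rarer shape gives a higher LR
```

The rarer the shared shape is in the population, the larger the LR, which is the same rarity effect that the continuous methods below have to capture.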

Tablet shape is a discrete characteristic, because experts consider a finite number of shapes (here, only three). A LR of such a characteristic can easily be determined by using (estimates of) population frequencies (or rarities) of the various shapes. However, most characteristics, such as weight and purity, in comparing MDMA tablets are continuous. Continuous variables will be considered separately (univariate) in this paper and in combination (multivariate). Though the examples will concentrate on four characteristics, the Bayesian theory can be extended to many more characteristics, including organic impurities.

3. Methods

3.1. Variable selection

The data used in this work were collected during the CHAMP project (Collaborative Harmonisation of Methods for Profiling of Amphetamine Type Stimulants) funded by the sixth framework programme of the European Commission (contract no. 502126). Physical characteristics were previously tested and selected based on their reproducibility within sample batches and discriminating power between batches [4]: diameter, thickness and weight. Additionally, purity (MDMA concentration), a reliable chemical characteristic routinely analysed in most laboratories, was selected to calculate likelihood ratios of a combination of statistically independent characteristics. The purity can be considered independent because it is conferred to Ecstasy tablets during a different production stage than the physical characteristics. Statistically, no significant correlation of purity with the other three characteristics was found. Moreover, purity is generally measured from the mixed powder of a number of crushed tablets, while physical characteristics are measured on single tablets.

3.2. Sampling

Two types of MDMA samples (considered here as batches) were collected and analysed for the purpose of developing and validating harmonised profiling methods (to be used by the seven participating laboratories) [4]. The first set of samples consisted of 26 seizures originating from Finland and Germany. From each of the 26 seizures, the seven partner laboratories received six tablets and 10 g of homogenised powder from ground tablets. Replicate measurements for diameter, thickness and weight (with a balance) were carried out on each tablet, while the purity was quantified on the homogenised powder six times (with either HPLC, GC/MS or CE). These replicate analyses performed on each sample (i.e., six measurements by each of seven laboratories = 42 measurements per sample for each physical characteristic and purity) constituted the population of samples with corresponding post-tabletting characteristics (i.e., same-batch).

For the second set of samples, about 80 street samples were collected by each laboratory among seizures in their own country between 1996 and 2004; these constituted the population of samples with supposedly different characteristics (i.e., different-batch). For this study, five replicates were performed on samples collected by Finland, The Netherlands, France and Switzerland (320 samples in total) for diameter, thickness and weight, while three analyses on homogenised powder were carried out for the determination of the purity.

In this paper, for statistical purposes, the data were divided into three groups:

i. Reference data X, representing measurements from a reference consignment (denoted by consignment A), that may be stored in a database, with which other consignments of MDMA tablets will be compared. Each of the 26 batches (with 42 measurements each) can be considered as reference data X.

ii. Case data Y, from another consignment (B) than the reference data (A). The question is whether A and B originate from the same-batch. These data Y are compared to X (the reference data) and Z (the population data; see below). For research purposes there are two types of Y samples: simulated data from the reference batch with which they are compared (to create same-batch samples) and a randomly chosen half of the street samples (to create different-batch samples).

iii. Population (or background) data Z, representing the population of MDMA tablets over the various street sample batches (see Fig. 1). Z is composed of the other half of the street samples data.

3.3. Statistical methods

In this paper two statistical methods to estimate the strength of evidence with a likelihood ratio are compared: (1) the distance method, based on distance measures or similarity indices between tablets (where the distances or similarities are measured based on corresponding characteristics), traditionally used in drug investigations; (2) a method based on the distributions of the features or characteristics themselves. The numerical data were analysed statistically with two software packages: Microsoft Excel (Microsoft Corporation) and R (http://www.r-project.org/).

3.3.1. Distance method

The first method is based on distance or similarity measures. For short we will refer to this method as ‘the distance method’. A measure generally used in drug profiling is the Pearson correlation coefficient ($r$), which can be calculated between tablet measurements based on corresponding characteristics [3–5]. This coefficient can be modified to fit a distance scale from 0 (maximum correlation) to 100 (minimum correlation):

$$\rho = \frac{1 - r}{2} \times 100 \qquad (4)$$

This modified Pearson correlation coefficient ($\rho$), or Pearson correlation distance, is applied in this paper to the three selected physical characteristics: diameter ($x_d$), thickness ($x_{th}$) and weight ($x_w$). The Euclidean distance, that is the root of the sum of squared differences between the characteristic values of two tablets, is applied to the same three characteristics.

Moreover, a one-dimensional Euclidean distance, that is the root of the squared difference between the purity of tablets A ($x_p$) and tablets B ($y_p$), is used to compare purities of consignments. This is adequate for calculating a distance between two single (univariate) measurements only [22]. The distances between the purities have to be calculated separately from the distances between the physical characteristics, because the purities are measured based on a mixture of crushed tablets and not based on the same tablets as the measurements of the physical characteristics.
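The two distance measures can be sketched in Python as follows. The sketch assumes that the modified Pearson coefficient has the form (1 − r)/2 × 100, which maps r = 1 to 0 and r = −1 to 100; the function names and the tablet values are hypothetical, and no pre-treatment of the raw measurements is applied here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two feature vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_distance(x, y):
    """Pearson correlation distance on a 0-100 scale (assumed form of Eq. (4))."""
    return (1.0 - pearson_r(x, y)) / 2.0 * 100.0


def euclidean_distance(x, y):
    """Root of the sum of squared differences between characteristic values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical tablets: (diameter in mm, thickness in mm, weight in mg).
tablet_a = (8.05, 4.10, 250.3)
tablet_b = (8.04, 4.25, 262.1)

print(pearson_distance(tablet_a, tablet_b))
print(euclidean_distance(tablet_a, tablet_b))

# One-dimensional Euclidean distance between two hypothetical purities (%).
print(euclidean_distance((35.2,), (36.0,)))
```

For a single characteristic the Euclidean distance reduces to the absolute difference, which is why it is the measure used for purity.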

3.3.1.1. Traditional distance method (TDM). Distances (between samples, not characteristics) are calculated for same-batch samples X (over all 26 batches), representing the within-batch distribution of distances, and for different-batch samples Z, representing the between-batch distribution of distances [3–8,10–12].

A histogram can be used to visualize the distributions of the within-batch and between-batch distances [4,22]. To smooth the distributions represented by the histograms, several methods can be used, including kernel density estimation (KDE) and fitting a parametric model such as the beta distribution [23,24].

Here KDE with a Gaussian kernel for each data point (here, each distance measurement) is used. Parametric fitting with a beta distribution was also investigated [25], but the results were not satisfactory in comparison to KDE.

The likelihood ratio, based on the three physical characteristics, is then obtained not from the ratio of the frequencies of the characteristics themselves, but from the ratio of the densities of the distance ($d$) between two measurements $x$ and $y$ under each of the two competing hypotheses:

$$LR = \frac{f(d(x, y) \mid H_p)}{f(d(x, y) \mid H_d)} = \frac{f(d(\{x_d, x_{th}, x_w\}, \{y_d, y_{th}, y_w\}) \mid H_p)}{f(d(\{x_d, x_{th}, x_w\}, \{y_d, y_{th}, y_w\}) \mid H_d)} \qquad (5)$$

with $f$ the kernel density function.
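A minimal Python sketch of the distance-based LR of Eq. (5) is given below. It builds a Gaussian KDE by hand; the bandwidth rule (a normal reference rule of thumb), the function names and the distance values are assumptions made for illustration only and are not taken from the study.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a density function: one Gaussian kernel per data point."""
    n = len(samples)
    if bandwidth is None:
        # Normal reference rule of thumb; an assumption, the bandwidth
        # choice is not specified in the text.
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(d):
        return norm * sum(
            math.exp(-0.5 * ((d - s) / bandwidth) ** 2) for s in samples
        )

    return density


def distance_lr(d, within_density, between_density):
    """LR = f(d | Hp) / f(d | Hd), with f the kernel density of distances."""
    return within_density(d) / between_density(d)


# Made-up distances: same-batch pairs lie close to 0, different-batch pairs do not.
within_distances = [0.1, 0.2, 0.15, 0.3, 0.25, 0.05, 0.2, 0.1]
between_distances = [5.0, 12.0, 20.0, 8.0, 30.0, 15.0, 2.0, 25.0]

f_within = gaussian_kde(within_distances)
f_between = gaussian_kde(between_distances)

print(distance_lr(0.2, f_within, f_between))   # small distance: supports Hp
print(distance_lr(10.0, f_within, f_between))  # large distance: supports Hd
```

Because both densities are estimated once from the pooled distances, the same model serves every comparison, which is the practical attraction of this method.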

3.3.1.2. Refined distance method (RDM). The TDM does not take into account the fact that the distribution of distances between tablets of the same-batch may vary from batch to batch. Only the distances between (values of the same characteristics of compared) tablets in same-batch comparisons in general are considered. It was therefore decided to adopt a second method to evaluate that effect. This refined distance method (RDM) takes into account the distance particularities of each batch (i.e., the frequency of correlation distances within a batch). For each batch a within-batch distribution based on KDE is made. Therefore, there are as many models as analysed batches. The distribution of the between-batch distances is the same as in the traditional distance method (TDM).

3.3.2. Distribution method

Within the distribution method LRs are based on the distribution of the values of the characteristics of the tablets, rather than the distribution of distances between values of corresponding characteristics of two tablets. This is similar to what is done in the simple example of Section 2, where the shapes of tablets of two consignments were compared. Shape, however, is a discrete characteristic, while diameter, thickness, weight and purity are continuous. Therefore probability density functions $f(\cdot)$, instead of probabilities $P(\cdot)$, are used:

$$LR = \frac{f(y \mid x, H_p)}{f(y \mid H_d)} \qquad (6)$$

The numerator in Eq. (6) is the density of the measured value $y$ of a characteristic of tablets of consignment B given the distribution of the value $x$ of that same characteristic of consignment A, while the denominator is the density of the measured value $y$ given the distribution of the characteristic within the population of MDMA tablets.

The repeated measurements within each batch have a distribution representing not only frequency of occurrence (as in the shape example), but also measurement error (or noise). In general, a normal distribution, with mean $\theta$ and variance $\sigma^2$, is suitable for this variation. Furthermore, it is assumed that the means $\theta$ themselves have a distribution. In the simplest case we assume that they also follow a normal distribution with mean $\mu$ and variance $\tau^2$. In a more realistic case a KDE with Gaussian kernel is assumed. This structure, in which the within-batch means are themselves assumed to have a distribution, can best be described by a two-level random effect model [12,14–16,20]. In this model the first level is the level of the batches, and the second level is that of the repeated measurements on various tablets within each batch.

The general form for the LR using the two-level random effect model [14–16] now is:

$$LR = \frac{f(y \mid x, H_p)}{f(y \mid H_d)} = \frac{\int f(y \mid \theta, H_p)\, f(\theta \mid x, H_p)\, d\theta}{\int f(y \mid \theta, H_d)\, f(\theta \mid H_d)\, d\theta}, \qquad (7)$$

where the numerator can be seen as the posterior predictive distribution of a new value $y$. The denominator can be seen as the prior predictive distribution.

The above formulas have been worked out (mathematically) in the literature for specific situations [14–16]. Below this is done in a slightly different way, which provides formulas of a different (and new) appearance that are nevertheless mathematically equivalent to the ones presented in the literature (more details can be found in the paper presenting the theoretical background [20]).

3.3.2.1. Univariate distribution method. In the simplest case there is only one characteristic (e.g., weight) and this characteristic is assumed to come from a normal distribution. Thus, weight measurements $X$ on tablets of consignment A will vary, due to random error, according to a normal distribution around a mean $\theta_x$ and with variance $\sigma_x^2$. Weight measurements $Y$ on tablets of consignment B will vary according to a normal distribution around a mean $\theta_y$ and with variance $\sigma_y^2$. In symbols:

$$X_i \sim N(\theta_x, \sigma_x^2)$$

$$Y_i \sim N(\theta_y, \sigma_y^2).$$

Furthermore, the batch variances are assumed known, while the batch means are assumed to have a normal distribution. This is regarded as the prior distribution of batch means.

$$p(\theta_i) \sim N(\mu_0, \tau_0^2)$$

After updating this prior distribution of the batch means in a Bayesian way with the distribution of the mean ($\bar{x}$) of the $n_x$ measurements $x$ of consignment A, a posterior distribution is obtained according to standard Bayesian theory [26]:

$$p(\theta \mid \bar{x}) \sim N(\mu_n, \tau_n^2)$$

with

$$\mu_n = \frac{(1/\tau_0^2)\,\mu_0 + (n_x/\sigma^2)\,\bar{x}}{1/\tau_0^2 + n_x/\sigma^2} \quad \text{and} \quad \tau_n^2 = \frac{\tau_0^2\,\sigma^2}{\sigma^2 + n_x \tau_0^2}$$

Using these prior and posterior distributions in the prior and posterior predictive distributions of Eq. (7) results in the following representation of the likelihood ratio for the mean ($\bar{y}$) of measurements $y$ of consignment B (given the mean ($\bar{x}$) of measurements $x$ of consignment A) [20]:

$$LR = \frac{u_0}{u_n} \exp\left\{ \frac{1}{2} \left( \frac{(\bar{y} - \mu_0)^2}{u_0^2} - \frac{(\bar{y} - \mu_n)^2}{u_n^2} \right) \right\} \qquad (8)$$

with $u_0^2 = \tau_0^2 + \sigma_y^2/n_y$ and $u_n^2 = \tau_n^2 + \sigma_y^2/n_y$.

Without going into detail it can be said that the ﬁrst term between the brackets represents rarity and the second term represents random error. This means that the rarer the values of certain characteristics are the higher the LR will be. The more noise there is in the data the lower the LR will be.
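The univariate LR can be sketched in Python as the ratio of the posterior predictive to the prior predictive normal density of the case mean, following the normal-normal model described above with known within-batch variances. The function names and all numerical values (weights in mg) are hypothetical.

```python
import math


def posterior_batch_mean(x_bar, n_x, sigma_x2, mu0, tau0_2):
    """Update the normal prior N(mu0, tau0_2) on the batch mean with x_bar."""
    mu_n = (mu0 / tau0_2 + n_x * x_bar / sigma_x2) / (1.0 / tau0_2 + n_x / sigma_x2)
    tau_n2 = tau0_2 * sigma_x2 / (sigma_x2 + n_x * tau0_2)
    return mu_n, tau_n2


def univariate_lr(y_bar, n_y, sigma_y2, x_bar, n_x, sigma_x2, mu0, tau0_2):
    """LR for the case mean y_bar given the reference mean x_bar."""
    mu_n, tau_n2 = posterior_batch_mean(x_bar, n_x, sigma_x2, mu0, tau0_2)
    u0_2 = tau0_2 + sigma_y2 / n_y   # prior predictive variance of y_bar
    un_2 = tau_n2 + sigma_y2 / n_y   # posterior predictive variance of y_bar
    exponent = 0.5 * ((y_bar - mu0) ** 2 / u0_2 - (y_bar - mu_n) ** 2 / un_2)
    return math.sqrt(u0_2 / un_2) * math.exp(exponent)


# Hypothetical population of batch means: mean 250 mg, variance 1600.
# Hypothetical reference batch: 42 measurements, mean 300 mg, variance 25.
params = dict(n_y=5, sigma_y2=25.0, x_bar=300.0, n_x=42, sigma_x2=25.0,
              mu0=250.0, tau0_2=1600.0)

print(univariate_lr(y_bar=301.0, **params))  # case mean close to the reference
print(univariate_lr(y_bar=260.0, **params))  # case mean far from the reference
```

In this sketch a case mean close to a reference mean that is itself far from the population mean gives an LR above one, while a case mean far from the reference gives an LR below one, in line with the rarity and random error interpretation given above.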

Fitting a normal distribution to a characteristic such as weight within one batch is often a reasonable assumption (Fig. 1). In most cases it represents only random error. A normal distribution for the batch mean $\theta$ is less reasonable, in which case a KDE is selected as a better option in this paper. In the same way it can be discussed whether assumed known batch variances are always reasonable. For reasons of simplicity, variances that are assumed known but may vary from batch to batch are used in this paper. In future research priors on variances will be used [27]. Detailed information about LR formulas with kernel density estimates and other extensions can be found in related work [20]. Like Eq. (8), the construction of these LR formulas deviates slightly from previous work in the field [14–17], though it results in the same calculated values for the LRs.

3.3.2.2. Multivariate distribution method (MLR). When more than one characteristic is considered at the same time, multivariate instead of univariate distributions are used. The LR is constructed in the same way as above:

$$LR = \frac{f(y \mid x, H_p)}{f(y \mid H_d)} = \frac{f(\{y_d, y_{th}, \ldots, y_w\} \mid \{x_d, x_{th}, \ldots, x_w\}, H_p)}{f(\{y_d, y_{th}, \ldots, y_w\} \mid H_d)} \qquad (9)$$

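The LR calculation using multivariate normal distributions for the measurements within a batch ($X \sim N(\theta_x, \Sigma_x)$; $Y \sim N(\theta_y, \Sigma_y)$) and a multivariate normal distribution for the batch means ($\theta_i \mid \Sigma_i \sim N(\mu_0, T_0)$), using standard Bayesian theory (in contrast to the construction in the literature), then is [20]:

$$LR = \frac{|U_0|^{1/2}}{|U_n|^{1/2}} \exp\left[ \frac{1}{2} \left( (\bar{y} - \mu_0)' U_0^{-1} (\bar{y} - \mu_0) - (\bar{y} - \mu_n)' U_n^{-1} (\bar{y} - \mu_n) \right) \right] \qquad (10)$$

with $U_0 = T_0 + n_y^{-1} \Sigma_y$ and $U_n = T_n + n_y^{-1} \Sigma_y$, and

$$\mu_n = T_0 (T_0 + n_x^{-1} \Sigma_x)^{-1} \bar{x} + n_x^{-1} \Sigma_x (T_0 + n_x^{-1} \Sigma_x)^{-1} \mu_0$$

$$T_n = T_0 - T_0 (T_0 + n_x^{-1} \Sigma_x)^{-1} T_0$$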

Again, normality within a batch is a reasonable assumption, but the use of KDE for the batch means is more appropriate, resulting in adapted LRs [20]; see Appendix A.
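The multivariate normal version of the LR can be sketched with NumPy as the ratio of two multivariate normal predictive densities of the case mean, computed on the log scale for numerical stability. This covers only the normal case, not the KDE extension; the function name, the diagonal covariance matrices and all values are hypothetical.

```python
import numpy as np


def multivariate_lr(y_bar, n_y, sigma_y, x_bar, n_x, sigma_x, mu0, t0):
    """LR for the case mean vector y_bar given the reference mean vector x_bar."""
    y_bar, x_bar, mu0 = (np.asarray(v, dtype=float) for v in (y_bar, x_bar, mu0))
    sigma_y, sigma_x, t0 = (np.asarray(m, dtype=float) for m in (sigma_y, sigma_x, t0))

    # Posterior mean and covariance of the batch mean after observing x_bar.
    a_inv = np.linalg.inv(t0 + sigma_x / n_x)
    mu_n = t0 @ a_inv @ x_bar + (sigma_x / n_x) @ a_inv @ mu0
    t_n = t0 - t0 @ a_inv @ t0

    # Prior and posterior predictive covariances of y_bar.
    u0 = t0 + sigma_y / n_y
    un = t_n + sigma_y / n_y

    q0 = (y_bar - mu0) @ np.linalg.solve(u0, y_bar - mu0)
    qn = (y_bar - mu_n) @ np.linalg.solve(un, y_bar - mu_n)
    _, logdet0 = np.linalg.slogdet(u0)
    _, logdetn = np.linalg.slogdet(un)

    log_lr = 0.5 * (logdet0 - logdetn) + 0.5 * (q0 - qn)
    return float(np.exp(log_lr))


# Hypothetical values for (diameter mm, thickness mm, weight mg).
mu0 = [8.0, 4.5, 250.0]                # population mean of batch means
t0 = np.diag([0.5, 0.4, 1600.0])       # between-batch covariance
sigma = np.diag([0.0004, 0.01, 25.0])  # within-batch covariance
x_bar = [9.0, 5.0, 300.0]              # reference consignment A, 42 measurements
y_bar = [9.01, 5.02, 301.0]            # case consignment B, 5 measurements

print(multivariate_lr(y_bar, 5, sigma, x_bar, 42, sigma, mu0, t0))
```

With a single characteristic the matrices reduce to scalars and the function returns the same value as the univariate formula, which is one way to check an implementation.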

The univariate distribution method (a) can be seen as a special case of the multivariate distribution method (b), with only one characteristic instead of more than one. Therefore both situations will be referred to as MLR.

3.3.3. Combination of independent variables

The purity is assumed to be statistically independent of the physical characteristics and therefore the LR of the four characteristics (diameter, thickness, weight and purity together) equals the LR of the three physical characteristics times the LR of the purity:

$$LR(4) = LR(3) \times LR(\text{purity}) \qquad (11)$$
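The combination rule for independent characteristics can be written as a short Python sketch. The function names and LR values are hypothetical; the log10 variant is added only because very large or very small LRs are often easier to handle on a log scale.

```python
import math


def combine_independent_lrs(*lrs: float) -> float:
    """LR of several statistically independent characteristics: the product."""
    return math.prod(lrs)


def combined_log10_lr(*lrs: float) -> float:
    """Same combination on the log10 scale: logs of independent LRs add up."""
    return sum(math.log10(lr) for lr in lrs)


lr_physical = 2000.0  # hypothetical LR based on diameter, thickness and weight
lr_purity = 15.0      # hypothetical LR based on purity

print(combine_independent_lrs(lr_physical, lr_purity))
print(combined_log10_lr(lr_physical, lr_purity))
```

The product is valid only under the independence assumption stated above; correlated characteristics would have to be modelled jointly, as in the multivariate method.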

4. Results

The 26 batches (X) described in Section 3 were used to build models. For comparison of the LR calculation methods, a finer selection was made out of the 26 batches. The batches that showed within the batch the largest variation among each other were originally selected using descriptive statistical techniques and principal component analysis (PCA) (i.e., batches 1, 2, 3, 5, 7, 9, 12, 23, 24 and 25). These 10 batches were found to represent adequately the variation in the remaining batches. The different methods were compared by calculating LR values for samples coming from the same-batch (intra-batch comparisons) and LR values for samples supposed to come from different-batches (inter-batch comparisons), with the 10 selected batches as reference.

4.1. One characteristic (univariate)

The strength of evidence of a single characteristic is not very high, resulting in LR values (based on one characteristic and not on a distance) generally varying from 0 to at most 100. This means that similar values of a characteristic in two consignments are at most 100 times more likely when they are from the same-batch than when they are from different-batches. The LR values for weight and purity were generally lower than for thickness and diameter. Diameter is stable and thickness is a setting, while weight and purity depend on composition and mixing and are therefore likely to show more variation, resulting in lower LRs. However, the results differ from batch to batch and depend on the rarity of the characteristics in a given batch. As an example, the distributions of LRs of 144 different-batch comparisons and 159 same-batch comparisons for the weight of tablets compared to the 10 selected batches are presented in Fig. 2.

4.2. Three characteristics (weight, thickness, diameter)

The LR values obtained using the TDM, the RDM (both based on the Pearson correlation distance) and the multivariate likelihood approach (MLR) for the comparison of MDMA tablets based on three characteristics were compared (Fig. 3).

LR values varied from 0 to 60 for the traditional distance method (TDM) using the Pearson correlation distance, while the LR values for the reﬁned distance method (RDM) on the same distance measure were generally higher and reached 120 for batch 24. The Euclidean distance was also tested, and the results were comparable to those based on the Pearson correlation: LRs based on Euclidean distances varied from 0 to 40. MLR values varied between ca. 150 and 20000 for intra-batch comparison, while inter-batch comparison yielded much lower values.

Same-batch comparisons for batches 1, 2, 3, 9 and 12 yielded identical boxplots, and all batches yielded an equal maximum LR with TDM (Fig. 3, TDM). Values of the Pearson correlation distances for all these batches are close (mean values vary from $1.38 \times 10^{-5}$ to $1.56 \times 10^{-5}$). Furthermore, the maximum LR that can be obtained is the maximum density of the distribution of same-batch comparisons divided by the density of different-batch comparisons at the distance value where that maximum is found. This maximum LR with TDM is the same for all comparisons (and here equals 60) because the same distributions (the same model) are used for all comparisons with TDM. For the RDM and MLR methods a new model has to be constructed each time. Therefore this phenomenon of equal maximum LR values for various comparisons does not occur there.

The median MLR (Fig. 3, below) was about 3000 (ca. $e^8$), while batch 7 even had a median MLR around 400,000 (ca. $e^{13}$). Probably batch 7 has a rare combination of characteristics compared to the characteristics of the reference batch. The LR values obtained were the highest for the multivariate method, generally more than 1000 times higher than those of the traditional method used routinely in drug profiling. LRs for different-batch comparisons are generally lower for the MLR method than for the distance method. LR values calculated with the MLR method using KDE for the batch means are not equal to, but within the same range as, the results from this method using normal distributions.

Fig. 2. Boxplots representing the LRs calculated for comparisons with 10 selected batches labeled b1, b2, b3, b5, b7, b9, b12, b23, b24 and b25. The LRs are calculated based on the weight of MDMA tablets. Normality within the batches and of the batch mean is assumed. On the left, the same 144 cases (Y) that are assumed to originate from different-batches are each time compared with the mean of 42 measurements (X) of each of the 10 batches. On the right, the same 159 cases (Y) that are known to originate from the same-batch are each time compared with the mean of 42 measurements (X) of each of the 10 batches. The horizontal bars in the boxplots represent the median, the grey box all values between the 25th and 75th percentiles, and the whiskers extend to 1.5 times the interquartile range. Outliers are represented by circles.

Fig. 3. Boxplots representing the LRs calculated for comparisons with 10 selected batches labeled b1, b2, b3, b5, b7, b9, b12, b23, b24 and b25 based on three physical characteristics: diameter, thickness and weight. In the upper row the traditional distance method (TDM) based on the Pearson correlation distance is used, in the middle row the refined distance method (RDM) based on the Pearson correlation distance, and below the multivariate distribution method with normality assumed (MLR – log scale). LRs for different-batch comparisons (left side) and LRs for same-batch comparisons (right side) are shown. The horizontal bars in the boxplots represent the median, the grey box all values between the 25th and 75th percentiles, and the whiskers extend to 1.5 times the interquartile range. Outliers are represented by circles.

The false positive rate (different-batch comparisons with a LR>1) and the false negative rate (same-batch comparisons with a LR<1) were calculated to evaluate the potential of the methods in discriminating same-batch comparisons from different-batch comparisons (Table 1). A LR>1 is evidence in favor of the hypothesis that the consignments come from the same source, while a LR<1 is evidence in favor of the hypothesis that the consignments come from different sources. In some cases, the mean false positive rate is relatively high for TDM and the mean false negative rate is relatively high for RDM. This can partly be explained by the fact that several Ecstasy tablets from different-batches gave the same Pearson correlation (using three characteristics). LRs based on the Euclidean distance again gave results comparable to those based on the Pearson correlation distance. The MLR false positive rate was low and there were hardly any false negatives. MLR has a higher discriminating power than TDM and, to a lesser extent, than RDM.
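The two error rates defined above can be computed with a few lines of Python. The function name and the LR values are hypothetical and serve only to illustrate the definitions.

```python
def error_rates(same_batch_lrs, different_batch_lrs):
    """Return (false positive rate, false negative rate) in percent.

    False positive: a different-batch comparison with LR > 1.
    False negative: a same-batch comparison with LR < 1.
    """
    false_pos = sum(lr > 1 for lr in different_batch_lrs)
    false_neg = sum(lr < 1 for lr in same_batch_lrs)
    fp_rate = 100.0 * false_pos / len(different_batch_lrs)
    fn_rate = 100.0 * false_neg / len(same_batch_lrs)
    return fp_rate, fn_rate


# Hypothetical LRs from comparisons with known ground truth.
same_batch = [10.0, 0.5, 3.0, 200.0]
different_batch = [0.1, 2.0, 0.01, 0.3, 0.9]

fp, fn = error_rates(same_batch, different_batch)
print(f"false positive rate: {fp:.1f}%, false negative rate: {fn:.1f}%")
```

These rates only record on which side of one the LR falls; they say nothing about how strongly a misleading LR points in the wrong direction.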

The robustness of TDM and MLR was assessed by calculating the LR values as a function of the number of replicates in a reference batch. The variation in the LR values indicates the minimum number of replicates needed to obtain a more or less constant result. In practice, generally only five replicates (or fewer) are measured, whereas 42 replicates were used in this study. For TDM, the LR values for same-batch comparisons did not increase much with the number of replicates, and they stabilized after about 10 replicates. The MLR values increased with the number of replicates and also stabilized after about 10 replicates. Overall, they are much larger than the TDM values (Fig. 4).

4.3. Four characteristics (weight, thickness, diameter and purity)

The above models can be extended with an additional characteristic that is independent of the three physical characteristics. The combined LR can be calculated by multiplying the LR calculated above by the LR based on the independent characteristic. Purity is assumed to be such an independent characteristic since it is a pre-tabletting characteristic, determined by the chemical composition of the tabletting mass and not by the tabletting process. The LR based on purity can therefore be multiplied by the LR based on the three physical characteristics (Table 2).
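The multiplication rule can be sketched as follows. This is only an illustration under the independence assumption stated above; the function name and the two input LRs are hypothetical and are not taken from Table 2.

```python
import math


def combine_independent_lrs(*lrs):
    """Multiply LRs that are assumed to come from independent characteristics."""
    if any(lr < 0 for lr in lrs):
        raise ValueError("an LR cannot be negative")
    return math.prod(lrs)


# Invented values: an LR from the physical characteristics and one from purity.
lr_physical = 50.0
lr_purity = 4.0
lr_combined = combine_independent_lrs(lr_physical, lr_purity)

# On a log10 scale the same combination is a sum, which is convenient
# when LRs span many orders of magnitude.
log_sum = math.log10(lr_physical) + math.log10(lr_purity)
print(lr_combined, math.log10(lr_combined), log_sum)
```

If the characteristics were not independent, the product would in general misstate the strength of evidence, which is why the independence of purity is argued explicitly in the text.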

The TDM or RDM LR, based on the three physical characteristics, is multiplied by the LR based on purity, which is calculated using the Euclidean distance. The MLR, based on the three physical characteristics, is multiplied by the LR based on purity, which is calculated using the (univariate) distribution method. The combined LR values obtained for same-batch comparisons were generally comparable to or larger than the LR values obtained using three characteristics. In general, LRs were still under 1000 for TDM and RDM, while MLR values were much larger (Table 2, right). An extreme difference is observed for batch 24, with very low values for the traditional methods and a very high value for the MLR method.

Table 1

False positive and false negative rates obtained with three comparison methods: the traditional distance method (TDM), the refined distance method (RDM) and the multivariate likelihood ratio method (MLR). The false positive rate (FP) gives the percentage of all different-batch comparisons that have an LR pointing in the direction of "come from the same batch" (LR > 1). The false negative rate (FN) gives the percentage of all same-batch comparisons that have an LR pointing in the direction of "come from a different source" (LR < 1). The use of the Euclidean distance instead of the Pearson correlation distance gave comparable results.

Batch # FP TDM (%) FP RDM (%) FP MLR (%) FN TDM (%) FN RDM (%) FN MLR (%)

b1 32.7 1.0 3.5 0.4 28.2 0.0

b2 8.7 0.9 4.1 0.2 7.2 0.0

b3 25.2 1.6 7.6 0.6 21.9 0.0

b5 29.2 0.8 2.1 0.1 21.9 0.0

b7 0.2 5.2 1.4 7.0 0.6 0.0

b9 14.9 2.0 3.4 0.5 12.8 0.0

b11 9.7 0.3 5.6 4.2 17.5 0.0

b12 9.7 5.7 1.4 0.3 13.3 0.0

b23 10.0 0.9 1.4 14.3 21.0 0.0

b24 33.4 5.1 0.0 0.0 26.5 0.6

Table 2

Comparison of the median LR values for same-batch comparisons with three physical characteristics (diameter, thickness and weight) and with an additional fourth characteristic (purity) using the traditional distance method (TDM), the refined distance method (RDM) and the multivariate likelihood ratio method (MLR). The median for different-batch comparisons was close to zero for all methods.

Batch # TDM (three) RDM (three) MLR (three) TDM (four) RDM (four) MLR (four)

b1 54 65 1412 132 286 12,314

b2 54 68 3742 168 309 53,879

b3 54 65 610 186 317 7486

b5 54 91 4234 169 470 30,933

b7 29 28 434,305 105 137 4,223,969

b9 54 67 1923 130 276 10,750

b12 36 29 36,711 135 154 1,654,388

b23 54 54 5820 230 274 15,103

b24 16 15 5200 9 23 919,420

b25 54 122 679 335 546 26,684

Fig. 4. LR values as a function of the number of replicates for MDMA tablet comparison using diameter, thickness and weight with the traditional distance method (above) and the multivariate likelihood ratio method (below) for same-batch comparison (batch 9).

4.4. Case examples

In this section three cases are presented to illustrate the use of the above methods in various situations (Fig. 5; Table 3):

A–A: Same-batch comparison of one consignment (20 tablets from consignment A, divided in two groups of 10 tablets).

A–B: Same-batch comparison of two consignments, at first glance comparable (10 tablets of consignment A with 10 tablets of consignment B).

A–C: Different-batch comparison of two distinct consignments, with different appearance (10 tablets of consignment A with 10 tablets of consignment C).

The LR values (see Table 4) obtained using the MLR method are much higher than those obtained using the TDM method for same-batch comparisons. The LR values obtained using the TDM method for comparisons A–A and A–B are of the same order of magnitude, while the MLR method resulted in a lower LR for comparison A–B than for A–A. The TDM and RDM methods suggest that the calculated distances are 30–40 times more likely if the consignments are from the same batch than if they are from different batches. The MLR method suggests that the values of the characteristics for A–A are about 20,000 times more likely if the consignments are from the same batch than if they are from different batches, and for A–B about 500 times more likely. These cases are only examples, and the LR values may vary considerably when different batches are used for comparison.

5. Discussion

Several criteria were used to evaluate the different comparison methods: the LR values obtained (boxplots), their discriminating power (false positive and false negative rates), their robustness (influence of the number of replicate analyses), their simplicity (number of models or data needed), and their usability for operational or court purposes.

For a method to be useful in estimating the strength of evidence, the numbers of false positives and false negatives need to be as small as possible, i.e., the discriminating power needs to be high. Furthermore, the stronger the evidence, the larger the LR should be. The MLR method resulted in the highest LRs for same-batch comparisons and the lowest LRs for different-batch comparisons, and also showed the smallest number of false positives and false negatives.

Two main reasons may explain this. The first is that MLR uses the whole multivariate structure of the physical characteristics to calculate LRs, while distance methods reduce this structure to a univariate measure, the distance, before calculating LRs. With this reduction relevant information is lost. The second reason is that the MLR method includes the rarity of a set of characteristics in calculating the strength of evidence, while TDM considers the distance measurements only. Indeed, two tablets with a similar but rare combination of characteristics can have the same Pearson correlation coefficient as two tablets with a similar but common combination of characteristics. This means that two pairs of tablets that are equally close will produce the same LR with TDM, while LRs calculated with MLR depend on the rarity. The rarer the characteristics that two tablets share, the stronger the evidence, and thus the higher the LR.
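The loss of rarity information can be made concrete with a toy example. The sketch below is not the Pearson correlation distance used in this paper: it swaps in a simple absolute difference on a single characteristic, and the normal population of diameters (mean 8.0 mm, standard deviation 0.5 mm) is an invented assumption. It only shows that two pairs can be equally close while lying in regions of very different population density.

```python
from statistics import NormalDist

# Hypothetical population of tablet diameters in mm (parameters are invented).
population = NormalDist(mu=8.0, sigma=0.5)


def pair_summary(d1, d2):
    """Return (distance between two tablets, population density at their midpoint)."""
    distance = abs(d1 - d2)
    density = population.pdf((d1 + d2) / 2)
    return distance, density


common_pair = pair_summary(8.0, 8.125)   # diameters near the population mean
rare_pair = pair_summary(9.5, 9.625)     # diameters far out in the tail

# Both pairs are equally close, so a method that sees only the distance
# cannot tell them apart ...
print(common_pair[0], rare_pair[0])
# ... while the population density, which a distribution-based method uses,
# is much lower for the rare pair.
print(common_pair[1], rare_pair[1])
```

A distance-based LR would be identical for the two pairs, whereas a distribution-based LR would be expected to be larger for the rare pair.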

For batch 24 (see Table 2), a large difference in LRs is observed between TDM/RDM and MLR, with very low values for the traditional methods and a very high value for the MLR method. Apparently, for batch 24 the distances for same-batch comparisons are almost as common as for different-batch comparisons, while the combination of characteristics of batch 24 tablets is relatively rare compared to the reference batch. The mean diameter of 9.1 mm, the mean thickness of 4.2 mm (to a lesser extent), the mean weight of 345 mg and especially the mean purity of 9% are indeed rare in the reference batch (Z) considered (see also Fig. 1). When applied to case samples (see Section 4.4, comparisons A–A and A–B), high LRs are found with MLR because of the diameter of 7.6 mm, which is a rare value. The difference between the A–A comparison and the A–B comparison may perhaps be explained by the fact that A and B originate from the same tabletting facility, but not from the same production batch (as is the case with an A–A comparison). The RDM method also takes rarity into account by building a model for each comparison. However, the rarity considered is the rarity of the distance and not the rarity of the values of the characteristics themselves. Therefore the RDM method has the disadvantage of creating a new model each time without the advantage of accounting for the rarity of the values of the characteristics.

A point of consideration when taking into account the rarity of the characteristics is the dependence on the background data used to represent the population of MDMA tablets. The tablets in the background database are assumed to be representative of the entire population of MDMA tablets present on the market. However, the background data may not be randomly chosen from the population of ecstasy tablets on the illicit market. It actually represents the part of the tablet population that has been sent in for analysis. If a big network is targeted and all data are used, some characteristics may appear frequently in the database, while if only one random seizure is made of tablets that are on the market in large quantities, their characteristics may appear rare in the database. The frequency of occurrence of a specific tablet in the background data depends on the selection of consignments used for the construction of the background database. Therefore, care should be taken in selecting the background data. Random samples of 70% of the background data used in this paper did not alter the results substantially, suggesting some robustness of the background data used here.

Fig. 5. Photographs of MDMA tablets (face, back, side): consignments A, B and C.

Table 3

Physical characteristics of three selected consignments.

Consignment # Diameter (mm) Thickness (mm) Weight (mg)

A 7.6 5.0 280

B 7.6 5.0 278

C 9.0 3.0 243

Table 4

LR values obtained for batch comparisons with traditional distance (TDM) and multivariate likelihood ratio (MLR; normality and unequal variances assumed) methods using three characteristics (diameter, thickness and weight).

Compared batches TDM MLR

A–A 38 20,978

A–B 35 516
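The 70% subsampling check on the background data mentioned in the discussion could be organized along the following lines. The paper does not describe the exact procedure, so this is an assumed sketch: the function `subsample_spread`, the number of repeats and the background values are all hypothetical, and the median stands in for whatever result (for example an LR) is recomputed on each subsample.

```python
import random
import statistics


def subsample_spread(background, statistic, fraction=0.7, repeats=100, seed=1):
    """Recompute `statistic` on random subsamples of the background data.

    Returns the smallest and largest value found, as a crude indication of
    how sensitive the result is to the composition of the background data.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    k = max(1, round(fraction * len(background)))
    values = [statistic(rng.sample(background, k)) for _ in range(repeats)]
    return min(values), max(values)


# Invented batch-mean diameters (mm) standing in for a background database.
background_diameters = [7.0, 7.6, 8.0, 8.1, 8.1, 8.2, 9.0, 9.0, 9.1, 10.0]
low, high = subsample_spread(background_diameters, statistics.median)
print(f"median over 70% subsamples ranges from {low} to {high}")
```

A narrow range suggests that the result does not hinge on a few consignments in the database; it does not show that the database is representative of the market.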

For all methods above, little fluctuation in the LR was observed when 10 replicate analyses were used; all methods were shown to be comparatively robust.

With respect to simplicity, i.e., the number of models or data needed, the TDM method proved to be advantageous over MLR. It needs only one model for a given period of time, depending on how the tablet population changes. After the construction of this model, a relatively low number of repeated measurements of tablets from the consignments to be compared (either from a case or from an already existing database) is needed to determine the distances. This relatively simple method is fast and therefore suitable for operational purposes when forensic intelligence is needed quickly. When using the (multivariate) distribution method, a new model has to be constructed for each comparison. This in general requires a larger number of repeated measurements. It is not the construction of the model itself, but the large number of repeated measurements, that is time consuming. However, for court purposes, the MLR method, yielding higher LRs and fewer false positive and false negative results, represents the better alternative, provided that sound research is carried out on reference populations.

As shown, physical characteristics can be highly discriminating when comparing tablets, though a considerable percentage of tablets will still falsely show evidence in favor of coming from the same batch rather than from different batches (TDM 0–34%, MLR 0–14%), and the other way around. When investigating tabletting facilities in the Netherlands, it has been observed that tablets are produced with different imprints and colours but with comparable characteristics such as diameter, weight, purity and thickness. Based on the latter characteristics, tablets from different batches from these tabletting facilities could not be discriminated. The discriminating power of both methods can be improved by extension with chemical characteristics such as impurity profiles, elemental compositions and/or isotope ratios, i.e., different aspects of clandestine ecstasy tablet production. These are topics for future research. The physical characteristics used here were chosen only as a simple example of how LR calculations can be used for tablet comparisons.

6. Conclusion

To our knowledge, this publication is the first one on the calculation of numerical LRs for MDMA tablet comparisons. In this paper, two types of methodologies are presented to calculate LRs for evidence evaluation in MDMA tablet comparisons: one based on distances and the other based on (multivariate) statistical distributions of characteristics.

The traditional distance method (TDM) is considered to be a rather simple method. It requires a relatively low number of repeated measurements of tablets from the consignments to be compared (either from a case or already existing database) and is therefore suitable for operational purposes. When using the (multivariate) distribution method, a new model has to be constructed for each comparison. This in general requires a larger number of repeated measurements, which is time consuming.

The (multivariate) distribution method (MLR) proved to be more discriminating between same-batch and different-batch comparisons than the distance methods and yielded very high LR values for same-batch comparisons and very low LR values for different-batch comparisons. Therefore it is very suitable for court cases. This method uses the whole multivariate structure and incorporates the rarity of the characteristics. It should be realized that the rarity of the characteristics is highly dependent on the background data used to represent the population of MDMA tablets.

With the reﬁned distance method (RDM) a model is also needed for each reference batch. This is as time consuming as the multivariate distribution method. In addition the discriminating power is in the same order of magnitude as the traditional distance method. It is therefore not advantageous over the TDM (i.e., comparable LR values), while adding the disadvantages of the MLR (i.e., time consuming).

In summary, this paper presents two methodologies to calculate LRs as measures for the strength of evidence when comparing MDMA tablets with their advantages and disadvantages. Physical characteristics of the tablets are used as an example, but in future work this can be extended to chemical characteristics, which are expected to give better discriminating results.

Acknowledgments

The authors wish to thank the CHAMP project funded by the 6th framework of the European Commission (contract no. 502126), and the Federal Office of Education and Science in Switzerland (contract no. 04.0005). The data used in this project was collected by the CHAMP partner laboratories: Finnish National Bureau of Investigation (FI), Netherlands Forensic Institute (NL), Laboratoire de Police Scientifique de Lyon (FR), Institut de Police Scientifique de l'Université de Lausanne (CH), Police of Czech Republic (CZ), Bundeskriminalamt (DE), Drug Enforcement Administration (US).

Appendix A

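A formula corresponding to that of Eq. (8) for the univariate case (with only one characteristic considered), with kernel densities instead of normal distributions in the denominator, is

$$
\mathrm{LR} = m\,\frac{(\sigma_y^2/n_y + h^2)^{1/2}}{(\sigma_y^2/n_y + \tau_h^2)^{1/2}}\;
\frac{\sum_{i=1}^{m} \exp(-a_i/2)\,\exp\!\left(-\tfrac{1}{2}\,(\bar{y}-\mu_{hi})^2/(\sigma_y^2/n_y + \tau_h^2)\right)}
{\sum_{i=1}^{m} \exp(-a_i/2)\;\sum_{i=1}^{m} \exp(-b_i/2)}
$$

with

$$
\mu_{hi} = \frac{(1/h^2)\,\bar{z}_i + (n_x/\sigma_x^2)\,\bar{x}}{(1/h^2) + (n_x/\sigma_x^2)}, \qquad
\tau_h^2 = \frac{h^2\sigma_x^2}{\sigma_x^2 + n_x h^2}, \qquad
a_i = \frac{(\bar{x}-\bar{z}_i)^2}{\sigma_x^2/n_x + h^2}, \qquad
b_i = \frac{(\bar{y}-\bar{z}_i)^2}{\sigma_y^2/n_y + h^2}.
$$

Here we will use a Gaussian kernel. This results in the following kernel density for the batch means of the $m$ batches:

$$
\hat{f}(\theta) = \frac{1}{m}\sum_{i=1}^{m} \frac{1}{(2\pi)^{1/2}\,h}\,\exp\!\left\{-\frac{1}{2h^2}\,(\theta-\bar{z}_i)^2\right\},
$$

with as data points the sample batch means $\bar{z}_i$ of each of the $m$ batches, and the optimal bandwidth $h_{\mathrm{opt}} = \left(4/((p+2)m)\right)^{1/5}\tau_0 = 1.06\,m^{-1/5}\,\tau_0$ [23].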
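To make the univariate formula easier to check, it can be transcribed term by term into code. The following is a minimal sketch rather than the implementation used in this study. The function and variable names are hypothetical, the example numbers are invented, and $\tau_0$ is assumed here to be estimated by the sample standard deviation of the batch means.

```python
import math
import statistics


def rule_of_thumb_bandwidth(batch_means):
    """h = 1.06 * m**(-1/5) * tau_0.

    ASSUMPTION: tau_0 is taken to be the sample standard deviation of the
    batch means.
    """
    m = len(batch_means)
    return 1.06 * m ** (-1 / 5) * statistics.stdev(batch_means)


def kde_lr(x_bar, y_bar, n_x, n_y, var_x, var_y, batch_means, h):
    """Univariate LR with a Gaussian kernel density for the batch means."""
    m = len(batch_means)
    tau2_h = h**2 * var_x / (var_x + n_x * h**2)
    var_num = var_y / n_y + tau2_h  # variance in the numerator exponent
    var_den = var_y / n_y + h**2    # variance appearing in b_i
    sum_joint = sum_a = sum_b = 0.0
    for z in batch_means:
        a_i = (x_bar - z) ** 2 / (var_x / n_x + h**2)
        b_i = (y_bar - z) ** 2 / var_den
        mu_hi = (z / h**2 + n_x * x_bar / var_x) / (1 / h**2 + n_x / var_x)
        weight = math.exp(-a_i / 2)
        sum_joint += weight * math.exp(-0.5 * (y_bar - mu_hi) ** 2 / var_num)
        sum_a += weight
        sum_b += math.exp(-b_i / 2)
    return m * math.sqrt(var_den / var_num) * sum_joint / (sum_a * sum_b)


# Invented background batch means (diameter, mm) and measurement variances.
means = [7.0, 8.0, 8.1, 8.2, 9.0, 9.1, 10.0]
h = rule_of_thumb_bandwidth(means)
lr_close = kde_lr(7.6, 7.6, 10, 10, 0.01, 0.01, means, h)  # equal sample means
lr_far = kde_lr(7.6, 9.0, 10, 10, 0.01, 0.01, means, h)    # distant sample means
print(h, lr_close, lr_far)
```

With these invented inputs the LR exceeds 1 when the two sample means coincide and falls below 1 when they are far apart relative to the within-batch variation, which is the qualitative behaviour expected of the formula.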

The corresponding formula (to Eq. (10)) in the multivariate case (thus, with more than one characteristic) is

$$
\mathrm{LR} = \frac{f(y \mid x, H_d)}{f(y \mid H_p)}
= \frac{m\,|U_{hn}|^{-1/2}\,\sum_{i=1}^{m} \exp\!\left\{-\tfrac{1}{2}(\bar{x}-\bar{z}_i)^{T} U_{hx}^{-1} (\bar{x}-\bar{z}_i)\right\}\exp\!\left\{-\tfrac{1}{2}(\bar{y}-\mu_{hi})^{T} U_{hn}^{-1} (\bar{y}-\mu_{hi})\right\}}
{|U_{h0}|^{-1/2}\,\sum_{i=1}^{m} \exp\!\left\{-\tfrac{1}{2}(\bar{x}-\bar{z}_i)^{T} U_{hx}^{-1} (\bar{x}-\bar{z}_i)\right\}\,\sum_{i=1}^{m} \exp\!\left\{-\tfrac{1}{2}(\bar{y}-\bar{z}_i)^{T} U_{h0}^{-1} (\bar{y}-\bar{z}_i)\right\}}
$$

with $U_{hx} = h^2 T_0 + n_x^{-1}\Sigma_x$, $U_{h0} = h^2 T_0 + n_y^{-1}\Sigma_y$ and $U_{hn} = T_{hn} + n_y^{-1}\Sigma_y$, and

$$
\mu_{hi} = h^2 T_0\,(h^2 T_0 + n_x^{-1}\Sigma_x)^{-1}\,\bar{x} + n_x^{-1}\Sigma_x\,(h^2 T_0 + n_x^{-1}\Sigma_x)^{-1}\,\bar{z}_i, \qquad
T_{hn} = h^2 T_0 - h^2 T_0\,(h^2 T_0 + n_x^{-1}\Sigma_x)^{-1}\,h^2 T_0.
$$

References

[1] European Union, EU Drugs Action Plan 2005–2008, Official Journal, C168, 01 (2005). http://europa.eu/legislation_summaries/justice_freedom_security/combating_drugs/c22568_en.htm.

[2] UNODC, World Drug Report 2007. www.unodc.org/unodc/en/data-and-analysis/WDR-2007.html.

[3] C. Weyermann, R. Marquis, C. Delaporte, P. Esseiva, L. Dujourdy, E. Lock, L. Aalberg, S. Dieckmann, F. Zrcek, J. Bosenko, Drug intelligence based on MDMA tablets data: (1) organic impurities profiling, Forensic Science International 177 (1) (2008) 11–16.

[4] R. Marquis, C. Weyermann, C. Delaporte, P. Esseiva, L. Dujourdy, C. Koper, L. Aalberg, R. Dahlenburg, F. Zrcek, J. Bosenko, Drug intelligence based on MDMA tablets data: (2) physical characteristics profiling, Forensic Science International 178 (1) (2008) 34–39.

[5] Q. Milliet, C. Weyermann, P. Esseiva, The profiling of MDMA tablets: A study of the combination of physical characteristics and organic impurities as sources of information, Forensic Science International 187 (1–3) (2009) 58–65.

[6] E. Lock, Development of a Harmonised Method for the Profiling of Amphetamine. PhD Thesis, Institut de Police Scientifique, University of Lausanne, Switzerland, 2005.

[7] P. Esseiva, S. Ioset, F. Anglada, L. Gaste, O. Ribaux, P. Margot, A. Gallusser, A. Biedermann, Y. Specht, E. Ottinger, Forensic drug intelligence: an important tool in law enforcement, Forensic Science International 167 (2–3) (2007) 247–254.

[8] P. Esseiva, L. Dujourdy, F. Anglada, F. Taroni, P. Margot, A methodology for illicit heroin seizures comparison in a drug intelligence perspective using large databases, Forensic Science International 132 (2003) 139–152.

[9] C. Koper, C. van den Boom, W. Wiarda, M. Schrader, P. de Joode, G. van der Peijl, A. Bolck, Elemental analysis of 3,4-methylenedioxymethamphetamine (MDMA): a tool to determine the production method and trace links, Forensic Science International 171 (2–3) (2006) 171–179.

[10] S. Lociciro, P. Esseiva, P. Hayoz, L. Dujourdy, F. Besacier, P. Margot, Cocaine profiling for strategic intelligence, a cross-border project between France and Switzerland: part II. Validation of the statistical methodology for the profiling of cocaine, Forensic Science International 177 (2–3) (2008) 199–206.

[11] V. Dufey, L. Dujourdy, F. Besacier, H. Chaudron, A quick and automated method for profiling heroin samples for tactical intelligence purposes, Forensic Science International 169 (2–3) (2007) 108–117.

[12] K. Andersson, E. Lock, K. Jalava, H. Huizer, S. Jonson, E. Kaa, A. Lopes, A. Poortman-van der Meer, E. Sippola, L. Dujourdy, J. Dahlen, Development of a harmonised method for the profiling of amphetamines VI. Evaluation of methods for comparison of amphetamine, Forensic Science International 169 (2007) 86–99.

[13] I.W. Evett, Towards a uniform framework for reporting opinions in forensic science casework, Science & Justice 38 (3) (1998) 198–202.

[14] C.G.G. Aitken, F. Taroni, Statistics and the Evaluation of Evidence for Forensic Scientists, John Wiley & Sons, Chichester, UK, 2004.

[15] C.G.G. Aitken, D. Lucy, Evaluation of trace evidence in the form of multivariate data, Applied Statistics 53 (1) (2004) 109–122.

[16] C.G.G. Aitken, G. Zadorra, D. Lucy, A two-level model for evidence evaluation, Journal of Forensic Sciences 52 (2) (2007) 412–419.

[17] R. Royal, Statistical Evidence, The Johns Hopkins University Press, Baltimore, London, UK, 1992.

[18] B. Robertson, G.A. Vignaux, Interpreting Evidence: Evaluating Forensic Science in the Courtroom, John Wiley & Sons, Chichester, UK, 1995.

[19] C. Champod, Overview and Meaning of Identification, Encyclopedia of Forensic Sciences, Academic Press, 2000, pp. 1077–1084.

[20] A. Bolck, I. Alberink, Bayesian likelihood ratios for continuous variables: application on evidence evaluation in comparing MDMA tablets, Law, Probability and Risk (submitted for publication, 2009).

[21] D.V. Lindley, A problem in forensic science, Biometrika 64 (1977) 207–213.

[22] R.A. Johnson, D.W. Wichern, Applied Multivariate Statistical Analysis, 4th ed., Prentice Hall, 1998.

[23] B.W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman and Hall, London, UK, 1986.

[24] AMCTB no. 4, Royal Society of Chemistry. www.rsc.org/amc/.

[25] L. Dujourdy, G. Barbati, F. Taroni, O. Guéniat, P. Esseiva, F. Anglada, P. Margot, Evaluation of links in heroin seizures, Forensic Science International 131 (2–3) (2003) 171–183.

[26] A. Gelman, J.B. Carlin, H.S. Stern, D.B. Rubin, Bayesian Data Analysis, 2nd ed., Chapman and Hall/CRC, 2004.

[27] S. Bozza, F. Taroni, R. Marquis, M. Schmittbuhl, Probabilistic evaluation of handwriting evidence: likelihood ratio for authorship, Applied Statistics 57 (4) (2008) 329–341.