LR-based casework - The definition of the relevant population and the collection of data for likelihood ratio-based forensic voice comparison

Despite widespread acceptance of the LR in principle, the mass of LR-based research

and the developments in techniques for LR testing, only seven of the 34 (20.6%) experts

surveyed by Gold and French (2011) use the LR in FVC casework and, of those, just four (11.8%) use the numerical LR approach. Rose (2013b) is the only published report

of the application of the numerical LR to casework in which LR-based evidence was received by the court (reference is also made to the presentation of LR-based FVC

evidence in the courts in Australia in Morrison 2009a). The case came to trial in 2008 and involved a fraudulent telephone call (containing 14 seconds of offender speech)

made to an Australian bank requesting the transfer of $150 million. The suspect samples were a series of recordings made during police interviews and house searches, as well

as telephone intercepts of the suspect talking to a friend.

The comparison focused on the word yes and the phrase not too bad. From yes, the onset, midpoint and offset of the first three formants of /je/ were analysed along with a

“crude” (Rose 2013b: 304) analysis of the lower cut-off in the spectrum of /s/. From not too bad, Rose analysed time-normalised f0 contours sampled across their trajectory

and F1, F2 and F3 midpoints from /o/ in not, /u:/ in too and /æ/ in bad (using Rose’s phoneme symbols). In the absence of a specific alternative hypothesis, the relevant

population was defined as adult male speakers of General Australian English (AusEng) (based on assumptions about the offender; see §2.3). The reference data consisted

of 35 adult males aged between 20 and 70, recorded over the telephone. The analyst prompted responses of yes and not too bad using questions such as how’s it going? and

attempted to “indirectly prime” the speakers by producing the phrase not too bad with the “correct intonation” at the beginning of the conversation (Rose 2013b: 285). Each

participant was recorded twice to obtain non-contemporaneous (i.e. random variability introduced by recording speakers on two separate occasions separated by some period

of time) assessments of within-speaker variability.

Modelling the reference data both normally and with kernel density (KD), Rose achieved

a LR of 70 for the formant analysis of /je/. The low cut-off analysis of /s/ generated a roughly estimated LR of 2.5. The acoustic analysis of the f0 pattern in not too

bad generated a LR of 20, while a categorical analysis of the tonal structure of the phrase generated a rough LR estimate of marginally greater than one. For the formants

extracted from not too bad, LRs of 24 (/o/), five (/u:/) and 11 (/æ/) were estimated respectively. Despite calculating an OLR of 11 million using naïve Bayes, a more

conservative OLR of 300,000 was arrived at by “simply discard(ing) the putatively correlated LRs (e.g. from individual formants in not)” (2013b: 305).
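The naïve Bayes combination described above can be illustrated with a minimal sketch, assuming that the overall LR is simply the product of the individual LRs (equivalently, the sum of their logarithms). The rounded values quoted above are used as illustrative inputs only: they multiply to roughly 4.6 million rather than the reported 11 million, so the sketch shows the mechanics of the calculation and does not reproduce Rose's figures. The choice of which LR to discard is hypothetical, since the summary above does not list exactly which LRs were removed.

```python
import math

# Per-variable LRs as quoted in the summary above (rounded, illustrative only).
# The categorical tonal analysis (LR marginally above 1) is left out.
lrs = {
    "formants of /je/ in yes": 70.0,
    "low cut-off of /s/ in yes": 2.5,
    "f0 contour of not too bad": 20.0,
    "formants of /o/ in not": 24.0,
    "formants of /u:/ in too": 5.0,
    "formants of /ae/ in bad": 11.0,
}


def naive_bayes_olr(values):
    """Combine LRs under an independence assumption.

    Summing log10 LRs and exponentiating is equivalent to multiplying
    the LRs, but is more stable when many values are combined.
    """
    log10_olr = sum(math.log10(v) for v in values)
    return 10 ** log10_olr


# Overall LR when every variable is treated as independent.
olr_all = naive_bayes_olr(lrs.values())

# Hypothetical conservative variant: drop one LR that is assumed to be
# correlated with the others (this choice is an assumption of the sketch).
discarded = {"formants of /o/ in not"}
olr_conservative = naive_bayes_olr(
    v for name, v in lrs.items() if name not in discarded
)

print(f"naive Bayes OLR:   {olr_all:,.0f}")
print(f"after discarding:  {olr_conservative:,.0f}")
```

Discarding an LR greater than one can only lower the product, which is why the procedure yields a more conservative value. It does not, however, measure the correlation between variables, which is the limitation raised in the critique that follows.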

Rose (2013b) also provides a critique of the procedures applied, claiming that system performance should ideally have been presented to the court as a means of interpreting

the validity and reliability of the final OLR. The availability of data for pre-testing would also have allowed for the OLR to be calibrated (see §3.2.4), thus potentially

improving system validity. Rose’s analysis also fails to empirically account for between-variable correlations in determining a conservative OLR. As outlined in §2.2.2, since

2008 techniques for doing this have been developed for FVC. Finally, Rose highlights that the use of relatively small amounts of suspect and offender data means that the LR

estimate will be relatively imprecise and that, in such cases, “it is better, if possible, to try to avoid (absolute numerical values for the OLR)” (Rose 2013b: 305).

for the choice of variables is not made explicit. The analysis is based on a limited set of continuous, acoustic variables. However, within yes and not too bad there are other

variables that may have affected evidential support. There may also have been variables of evidential value in the other sections of the offender sample aside from yes and not

too bad. Secondly, the ecological validity of the procedures used to collect reference data is questionable. The context in which the samples for the reference data were

made is cognitively different from that of the evidential samples. Further, reference speakers were prompted to produce the target word and phrase and were primed to

produce the appropriate intonation contour. The potential effects of such mismatch, or of the non-Bayesian, subjective decisions made by the analyst, on LR output are not

explored in Rose (2013b). There are also a number of issues with the definition of the relevant population which are considered in §2.3.1.

There are no published guidelines for the application of the numerical LR to FVC casework. However, given the current state of methodological techniques, a set of

procedures can be determined based on the paradigm advocated in Morrison (2014) and its application in Enzinger and Morrison (2014). The procedures for computing a

LR for a single variable are:

1. Extraction of acoustic data from the variable of interest from the suspect and offender samples.

2. Decision regarding the relevant population (see §2.3).

3. Multiple recordings, matching the facts of the case at trial, from a sample of the relevant population collected for use as development, test and reference data.

4. Extraction of acoustic data from the variable of interest for the development, test and reference speakers.

5. SS and DS scores (prior to calibration, LRs are referred to as scores) computed (using an appropriate LR formula) for the development and test data using the reference data to assess typicality (feature-to-score stage; see §3.2.2).

6. Calibration coefficients generated by applying logistic regression to the scores from the development data (see §3.2.4.1).

7. Calibration coefficients applied to the scores from the test data to convert the scores into calibrated LRs (score-to-LR mapping; see §3.2.4).

8. Validity and reliability (see §3.2.3) calculated based on calibrated LRs from the test data (this is presented to the court as a means of interpreting the performance of the system under the conditions of the case at trial).

9. Score computed for the suspect and offender data using the same LR formula as in (5).

10. Calibration coefficients generated from the development data applied to the score for the suspect and offender data to convert the value into a calibrated LR.
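Stages (5) to (10) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in Morrison (2014) or Enzinger and Morrison (2014): the same-speaker (SS) and different-speaker (DS) scores are synthetic Gaussian values standing in for the output of the feature-to-score stage, the logistic regression is fitted by plain gradient descent, validity is summarised with the log-likelihood-ratio cost (Cllr), which is an assumed choice of metric, and the names `fit_calibration`, `cllr` and `case_score` are hypothetical.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def synthetic_scores(n_ss, n_ds):
    """Stand-in for stage (5): SS scores centred on +1, DS scores on -1."""
    ss = [rng.gauss(1.0, 1.0) for _ in range(n_ss)]
    ds = [rng.gauss(-1.0, 1.0) for _ in range(n_ds)]
    return ss, ds


def fit_calibration(ss, ds, steps=2000, rate=0.5):
    """Stage (6): fit ln LR = a * score + b by logistic regression.

    Each class carries a total weight of 0.5, so the fitted log odds do
    not depend on the proportion of SS to DS comparisons.
    """
    data = [(s, 1.0, 0.5 / len(ss)) for s in ss]
    data += [(s, 0.0, 0.5 / len(ds)) for s in ds]
    a, b = 1.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y, w in data:
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += w * (p - y) * s
            grad_b += w * (p - y)
        a -= rate * grad_a
        b -= rate * grad_b
    return a, b


def cllr(ss_log_lrs, ds_log_lrs):
    """Stage (8): log-LR cost from natural-log LRs (lower is better)."""
    ss_cost = sum(math.log2(1 + math.exp(-x)) for x in ss_log_lrs)
    ds_cost = sum(math.log2(1 + math.exp(x)) for x in ds_log_lrs)
    return 0.5 * (ss_cost / len(ss_log_lrs) + ds_cost / len(ds_log_lrs))


dev_ss, dev_ds = synthetic_scores(40, 200)
test_ss, test_ds = synthetic_scores(40, 200)

# Stage (6): coefficients come from the development scores only.
a, b = fit_calibration(dev_ss, dev_ds)

# Stages (7) and (8): calibrate the test scores and measure validity.
test_cllr = cllr([a * s + b for s in test_ss], [a * s + b for s in test_ds])

# Stages (9) and (10): the same coefficients convert the case score to an LR.
case_score = 1.3  # hypothetical suspect-offender score
case_lr = math.exp(a * case_score + b)

print(f"a={a:.2f}, b={b:.2f}, test Cllr={test_cllr:.2f}, case LR={case_lr:.1f}")
```

Keeping the development and test sets separate means that the validity figure reported to the court is not computed on the data used to fit the calibration coefficients.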

If multiple correlated variables are analysed, as is typical in linguistic-phonetic FVC, further stages of analysis are implemented:

1. Stages (1) to (5) repeated for each variable.

2. Logistic regression fusion coefficients (Brümmer et al. 2007) derived from the scores for the development set.

3. Fusion coefficients applied to the scores for the test set to convert the scores for individual variables into a calibrated OLR (which incorporates the correlation between the variables).

4. System validity and reliability metrics calculated based on the OLRs for the test data.

5. As in stage (9), scores computed for each variable using the suspect and offender data.

6. Scores for the suspect and offender data combined using the fusion coefficients from the development data to generate a calibrated OLR.
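The fusion stages can be sketched in the same way. The example below is an illustration under assumptions, not the procedure of Brümmer et al. (2007) as implemented in any particular toolkit: it fits ln OLR = offset + Σ weight_j × score_j by gradient descent on synthetic scores for two variables, where the second score is deliberately constructed as a noisy copy of the first so that the two are strongly correlated. The names `fit_fusion`, `synthetic_pair` and `case_scores` are hypothetical, and Cllr is again an assumed choice of validity metric.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable


def synthetic_pair(same_speaker):
    """Scores for two correlated variables from one comparison."""
    first = rng.gauss(1.0 if same_speaker else -1.0, 1.0)
    second = first + rng.gauss(0.0, 0.5)  # noisy copy of the first score
    return [first, second]


def synthetic_set(n_ss, n_ds):
    ss = [synthetic_pair(True) for _ in range(n_ss)]
    ds = [synthetic_pair(False) for _ in range(n_ds)]
    return ss, ds


def fit_fusion(ss, ds, steps=2000, rate=0.3):
    """Fit fusion weights and offset by class-balanced logistic regression."""
    n_vars = len(ss[0])
    data = [(x, 1.0, 0.5 / len(ss)) for x in ss]
    data += [(x, 0.0, 0.5 / len(ds)) for x in ds]
    weights, offset = [0.0] * n_vars, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_vars, 0.0
        for x, y, w in data:
            z = offset + sum(wj * xj for wj, xj in zip(weights, x))
            err = w * (1.0 / (1.0 + math.exp(-z)) - y)
            for j in range(n_vars):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [wj - rate * g for wj, g in zip(weights, grad_w)]
        offset -= rate * grad_b
    return weights, offset


def fuse(x, weights, offset):
    """Natural-log OLR for one vector of per-variable scores."""
    return offset + sum(wj * xj for wj, xj in zip(weights, x))


def cllr(ss_log_lrs, ds_log_lrs):
    """Log-LR cost from natural-log LRs (lower is better)."""
    ss_cost = sum(math.log2(1 + math.exp(-v)) for v in ss_log_lrs)
    ds_cost = sum(math.log2(1 + math.exp(v)) for v in ds_log_lrs)
    return 0.5 * (ss_cost / len(ss_log_lrs) + ds_cost / len(ds_log_lrs))


dev_ss, dev_ds = synthetic_set(40, 200)
test_ss, test_ds = synthetic_set(40, 200)

# Stage (2): fusion coefficients from the development scores.
weights, offset = fit_fusion(dev_ss, dev_ds)

# Stages (3) and (4): fused OLRs for the test set and their validity.
fused_cllr = cllr([fuse(x, weights, offset) for x in test_ss],
                  [fuse(x, weights, offset) for x in test_ds])

# Stages (5) and (6): fused, calibrated OLR for the case scores.
case_scores = [1.3, 0.9]  # hypothetical suspect-offender scores
case_olr = math.exp(fuse(case_scores, weights, offset))

print(f"weights={[round(w, 2) for w in weights]}, Cllr={fused_cllr:.2f}")
```

Because the second score adds little information beyond the first, the fitted weights need not be equal, which is how fusion avoids counting correlated evidence twice in the way a naïve Bayes product would.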
