5.3 Results
5.3.3 Experiment (3): Speaker-specific patterns
Figure 5.9 displays LLRs based on input from the first three formants of /aI/ analysed individually and in combination using DyViS speakers only (20 development / 20 test /
57 reference). The SS median LLRs based on F1-only and F2-only were within the same order of magnitude (limited support), although numerically the strength of evidence
was generally better using F2-only. The ranges of SS LLRs for F1-only and F2-only were also broadly equivalent, with values spread from marginally less than zero to
around +1. Although the median SS LLR for F3-only was also located within the zero to +1 range, the absolute numerical value was much closer to +1. Further, the maximum
strength of SS evidence for F3-only was +2.72 (moderately strong support), indicating that F3 in some cases outperformed F1 and F2 by up to two orders of magnitude. The
strength of SS evidence was, however, greatest when using a combination of all three formants, with LLRs generally one order of magnitude higher compared with any
formant individually (moderate support). The proportion of misses also decreased from a maximum of 15% using F1-only to 5% using all three formants.
Figure 5.9: Tippett plot of SS and DS LLRs using F1-only (blue), F2-only (red), F3-only
(green) and a combination of the three formants (orange) of /aI/ from DyViS
Similar results are revealed in the distributions of DS LLRs. Numerically, the weakest
DS LLRs were achieved using F1-only, followed by F2-only. The difference in median values was equivalent to one order of magnitude, from limited (F1-only) to moderate
(F2-only) support for the defence. However, unlike the SS comparisons, F3-only input generated generally stronger LLRs than the combination of the three formants. The
median DS LLR based on F3-only was -4.11 (very strong support), compared with -3.66 (strong support) using F1∼F3. Further, the range of DS LLRs for F3-only extended to -35.4, compared with -19.5 for F1∼F3. However, F3-only input also generated a higher false hit rate, as well as higher-magnitude contrary-to-fact DS LLRs, compared with the
combination of formants.
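The verbal labels and error rates reported above can be reproduced from a set of LLRs with a few lines of Python. The following is a minimal sketch, not the procedure used in this study. It assumes that LLRs are base-10 logarithms and that the verbal bands change at each whole order of magnitude, which is consistent with the labels quoted in this section (e.g. +2.72 as moderately strong, -3.66 as strong, -4.11 as very strong). The function names and the handling of values falling exactly on a band edge are illustrative assumptions.

```python
# Upper band edges (absolute log10 LR) and labels.
# ASSUMPTION: band edges inferred from the labels quoted in the text.
VERBAL_SCALE = [
    (1.0, "limited"),
    (2.0, "moderate"),
    (3.0, "moderately strong"),
    (4.0, "strong"),
]


def verbal_label(llr):
    """Map a log10 LR to a verbal label based on its magnitude only."""
    magnitude = abs(llr)
    for upper_edge, label in VERBAL_SCALE:
        if magnitude < upper_edge:
            return label
    return "very strong"


def miss_rate(ss_llrs):
    """Proportion of same-speaker (SS) comparisons with an LLR below zero."""
    return sum(llr < 0 for llr in ss_llrs) / len(ss_llrs)


def false_hit_rate(ds_llrs):
    """Proportion of different-speaker (DS) comparisons with an LLR above zero."""
    return sum(llr > 0 for llr in ds_llrs) / len(ds_llrs)
```

The sign of the LLR indicates which proposition is supported (positive for same-speaker, negative for different-speaker), while `verbal_label` describes only the strength of that support.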
Figure 5.10 displays EER and Cllr values for each of the four sets of formant data. Despite achieving somewhat weaker DS LLRs compared with F3-only, the combination
of formants produced the best performing system in terms of both EER and Cllr.
F1∼F3 outperformed F3-only by 5% in terms of EER and 0.2 in terms of Cllr. The [...]
achieving EER values of around 20% and Cllr values of around 0.6. Consistent with patterns in Experiments (1) and (2), the improved performance of the combination of
formants over F3-only in terms of the strength of SS LLRs and system validity provides evidence that F1 and F2 do carry speaker-specific information. However, given that
F1 and F2 encode so much speech information (i.e. they are carriers of contrast), their value as individual discriminants is relatively minimal. Clearly, in terms of individual
formants, F3 dominates with regard to speaker discrimination.
[Plot: Log LR Cost (Cllr) against EER (%); series: F1, F2 and F3; F1-only; F2-only; F3-only]
Figure 5.10: Log LR Cost (Cllr) plotted against EER (%) for different DyViS formant
input for /aI/
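For readers unfamiliar with the two validity metrics, the sketch below shows one way to compute Cllr and an approximate EER from lists of SS and DS LLRs. It is an illustration under stated assumptions rather than the implementation used to produce Figure 5.10. LLRs are assumed to be base-10 logarithms, Cllr follows its usual definition (the mean of two log2 penalty terms), and the EER is approximated by a simple threshold sweep without interpolation. All function names are hypothetical.

```python
import math


def _log2_one_plus_pow10(x):
    """Numerically stable log2(1 + 10**x)."""
    return max(x, 0.0) * math.log2(10) + math.log2(1 + 10 ** (-abs(x)))


def cllr(ss_llrs, ds_llrs):
    """Log LR cost from log10 LRs.

    SS comparisons are penalised via log2(1 + 1/LR) and
    DS comparisons via log2(1 + LR).
    """
    ss_penalty = sum(_log2_one_plus_pow10(-llr) for llr in ss_llrs) / len(ss_llrs)
    ds_penalty = sum(_log2_one_plus_pow10(llr) for llr in ds_llrs) / len(ds_llrs)
    return 0.5 * (ss_penalty + ds_penalty)


def approximate_eer(ss_llrs, ds_llrs):
    """Approximate equal error rate via a threshold sweep.

    ASSUMPTION: no interpolation between thresholds; the value returned is
    the mean of the miss and false hit rates at the threshold where the
    two rates are closest.
    """
    best_gap, best_rate = None, None
    for threshold in sorted(set(ss_llrs) | set(ds_llrs)):
        miss = sum(llr < threshold for llr in ss_llrs) / len(ss_llrs)
        false_hit = sum(llr >= threshold for llr in ds_llrs) / len(ds_llrs)
        gap = abs(miss - false_hit)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (miss + false_hit) / 2
    return best_rate
```

A system whose LLRs are all zero yields a Cllr of 1, which is why values well below 1 (such as those plotted in Figure 5.10) indicate that the system is providing useful information. Unlike EER, Cllr penalises contrary-to-fact LLRs in proportion to their magnitude.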
5.4 Discussion
The results of Experiment (1) revealed a number of effects of using regionally Matched and Mixed BrEng data at both the feature-to-score and score-to-LR stages of system
testing using /aI/. Consistent with predictions in §5.1, the effects of using regionally
Mixed system data were considerably more severe for /aI/ than for /u:/, owing primarily to the regional variation encoded in /aI/ in BrEng. The distributions of SS LLRs were [...]
Matched and Mixed system. However, DS LLRs were weaker by up to four orders of magnitude in the Mixed condition (using F1∼F3). Further, consistent with the results in §4.3.2, validity was consistently worse (by up to 7% EER and 0.15 Cllr) when using the Mixed system compared with the Matched system.
The removal of F1 and then F2 in Experiment (1) generated lower magnitude LLRs and generally worse system validity across both systems. This, along with the results
of Experiment (3), suggests that F1 and F2, which are primarily thought to encode phonetic contrast and systematic regional and social variation, are capable of carrying
considerable speaker discriminatory information. Further, the removal of F1 and F2 in Experiment (1) reduced the divergence between the Matched and Mixed systems
in terms of the distributions of LLRs, such that LLRs were most similar across systems when using F3-only input. These results suggest that there may be a trade-off between
the speaker discriminatory potential that lower formants (F1 and F2) provide and the regional sensitivity they introduce into the LR-based analysis. That is, with the removal
of F1 and F2, the strength of evidence and overall system performance may be lower, but the effects of regional variation, at least in terms of the magnitudes of the LLRs
themselves, may be minimised.
Somewhat different patterns were revealed in terms of the Matched and Mixed validity across the three sets of /aI/ input. The EER for the Mixed system was only marginally
higher than that of the Matched system when using all three formants and with the
removal of F1. However, the largest difference between the systems in terms of EER was found when using F3-only (c. 7%). Similarly, the smallest difference between the
systems in terms of Cllr was found using F1∼F3, followed by F2 and F3. As with EER,
the largest Cllr difference between systems was found using F3-only (c. 0.15). This
finding runs contrary to the earlier prediction that LR output based on F3 may be most robust to different definitions of the relevant population based on the hypothesis that
it encodes more information relating to the individual rather than regional and social information relating to the group (Garvin and Ladefoged 1963).
In Experiment (2), the cubic coefficients of F1 and F2 were both able to correctly
assign around 64% of the 320 tokens to the speaker's regional group (one of four), and both outperformed F3. This suggests, predictably, that F1 and F2 (and
in particular the intercept (absolute frequency) and slope elements of the trajectory) are primarily responsible for the differences between the four sets (as shown in Figure
5.2). F3 generated a classification rate of 40.6% which, although worse than F1 and F2, was better than chance (25%). Further, when analysing the individual elements of the
trajectory using DA, the intercept generated the highest classification rate compared with coefficients relating to the dynamics of the trajectory. This suggests that F3 does
encode some region-specific information primarily in the absolute frequency element of the trajectory. This may be due to intrinsic factors (i.e. an inherent property of F3
itself) such as VQ and vocal setting (see Stevens and French 2012), as well as extrinsic factors (i.e. extraneous) such as correlation with F2 (although no consistent correlations
between elements of F2 and F3 were found when this was tested using these data). Formal analysis of these factors was not possible, however, due to the small number of
speakers and regional sets available.
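The comparison of the F3 classification rate with chance can be made concrete with a short calculation. The sketch below is illustrative only: it treats the 320 tokens as independent trials, which is a simplifying assumption not made in this study (tokens from the same speaker are unlikely to be independent), and the token count for F3 is recovered by rounding 40.6% of 320. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p_chance):
    """Exact probability of at least `successes` correct out of `trials`
    if every classification were a guess with success rate `p_chance`."""
    return sum(
        comb(trials, k) * p_chance ** k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


n_tokens = 320
n_groups = 4
chance_rate = 1 / n_groups                # 0.25, i.e. the 25% chance level
expected_by_chance = chance_rate * n_tokens

# ASSUMPTION: count recovered from the reported rate of 40.6%.
f3_correct = round(0.406 * n_tokens)

# ASSUMPTION: tokens treated as independent trials (illustration only).
p_value = binomial_upper_tail(f3_correct, n_tokens, chance_rate)
```

Under these assumptions the F3 result corresponds to 130 correctly classified tokens against 80 expected by chance, and the probability of reaching that count by guessing alone is very small. Because of the independence assumption, this should be read as a rough indication rather than a formal test.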
Despite evidence of region-specific patterns of F3 variation, consistent with previous
studies, in Experiment (3) F3 outperformed F1 and F2 in terms of the magnitude of LLRs and system validity. There was also evidence of speaker-specificity in the lower
formants, with F1∼F3 generating higher magnitude SS LLRs and better overall system performance than any individual formants. However, the addition of F1 and F2 to F3
did generate lower magnitude DS LLRs. The combined results of Experiments (2) and (3) suggest that for F3, Garvin and Ladefoged’s (1963) group-individual distinction is a
continuum rather than a dichotomy, since F3 was found to encode at least some regional information along with considerable speaker discriminatory power. More importantly,
when considered in terms of the results of Experiment (1), it is clear that the inevitable regional and social information to which linguistic-phonetic variables respond may
affect different elements of LR output (e.g. magnitude of LLRs, validity) in potentially unpredictable ways and to unpredictable extents. Potential explanations for the results