7.2 General comparisons
7.2.3 Minimizing anatomical/physiological variation
To establish how well the 12 procedures minimize anatomical/physiological speaker-related variation in the acoustic data, four series of linear discriminant analyses (LDAs) were carried out: LDA 2-5. In these analyses, combinations of the four acoustic variables, F0, F1, F2, and F3, transformed through each of the 12 normalization procedures, were entered as predictors.
In LDA 2, the normalization procedures were evaluated on how well their output can be classified as produced by a male or by a female speaker. In this LDA, speaker-sex was the dependent variable, with two levels; (transformed versions of) F0, F1, F2, and F3 were entered as predictors. For LDA 2, F0 was expected to be the dominant predictor, especially for the baseline transformation (HZ), because large differences in F0 between the sexes were expected.
To investigate whether differences between the procedures found for LDA 2 can be attributed to differences in F0 or to differences in F1, F2, and F3, two additional LDAs were carried out. In both analyses, speaker-sex served as the dependent variable. In LDA 3, F0 was entered as the sole predictor, and in LDA 4, F1, F2, and F3 were entered as predictors.
In LDA 5, F0, F1, F2, and F3 were entered as predictors. Here, it was evaluated how well the output of the procedures allows vowel tokens to be classified as produced by a younger or an older speaker. The dependent variable was the factor age, also with two levels. Any procedure that effectively eliminates anatomical/physiological variation related to the speaker’s sex or age should not perform above chance level in the corresponding LDA. If a normalization procedure performs at chance level in classifying tokens as male or female, it has eliminated all anatomical/physiological variation related to the speaker’s sex.
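The logic of these analyses can be illustrated with a minimal sketch for the single-predictor case (as in LDA 3). Everything in it is an assumption made for illustration: the F0 values are synthetic, the class priors are taken to be equal, accuracy is computed on the same tokens used to estimate the class means, and LOBANOV is rendered as a within-speaker z-score using the population standard deviation. The function names `lda_1d_accuracy` and `lobanov` are hypothetical and do not reproduce the software used for the analyses reported here.

```python
import random
import statistics

def lda_1d_accuracy(values, labels):
    """Resubstitution accuracy of a two-class, one-predictor LDA with equal priors.

    With a shared (pooled) variance and equal priors, the LDA rule reduces to
    assigning each token to the class whose mean is nearest.
    """
    classes = sorted(set(labels))
    means = {
        c: statistics.fmean([v for v, l in zip(values, labels) if l == c])
        for c in classes
    }
    correct = sum(
        min(classes, key=lambda c: abs(v - means[c])) == l
        for v, l in zip(values, labels)
    )
    return correct / len(values)

def lobanov(values, speakers):
    """z-score each value within its speaker (assumed form of LOBANOV)."""
    by_speaker = {}
    for v, s in zip(values, speakers):
        by_speaker.setdefault(s, []).append(v)
    stats = {
        s: (statistics.fmean(vs), statistics.pstdev(vs))
        for s, vs in by_speaker.items()
    }
    return [(v - stats[s][0]) / stats[s][1] for v, s in zip(values, speakers)]

# Synthetic data (hypothetical): 10 male and 10 female speakers, 50 F0 tokens each.
rng = random.Random(0)
f0, sex, speaker = [], [], []
for i in range(20):
    label = "male" if i < 10 else "female"
    centre = rng.gauss(120 if label == "male" else 210, 15)  # speaker mean in Hz
    for _ in range(50):
        f0.append(rng.gauss(centre, 12))
        sex.append(label)
        speaker.append(i)

acc_hz = lda_1d_accuracy(f0, sex)                         # raw values (HZ)
acc_lobanov = lda_1d_accuracy(lobanov(f0, speaker), sex)  # after normalization
print(f"HZ: {acc_hz:.0%}  LOBANOV: {acc_lobanov:.0%}")
```

In this sketch the raw values separate the two sexes almost perfectly, whereas the within-speaker z-scores give both classes the same mean, so classification falls to roughly chance level. This is the behaviour a procedure must show to count as having eliminated sex-related variation.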
Table 7.3 presents the results for all four LDAs, i.e., the percentage of vowel tokens successfully classified as one of the two sexes (LDA 2, LDA 3, LDA 4) or as one of the two age groups (LDA 5). For SYRDAL & GOPAL and MILLER, the analyses could not be carried out for LDA 3 and LDA 4, because these two procedures do not use F0, or F1, F2, and F3, in the same way as the other procedures (cf. Chapter 2). For instance, SYRDAL & GOPAL use $D_1^B - D_0^B$ as their first dimension (see equation (2.6)).
Table 7.3 shows that, for LDA 2, 93% of the vowel tokens were categorized correctly on speaker-sex for HZ. This indicates that most of the anatomical/physiological variation was preserved in the raw data. If all speaker-sex-related variation had been eliminated, the percentage of correctly classified vowel tokens would have been 50%. LOBANOV (50%) and CLIHi4 (50%) performed best: they removed all variation related to the speaker’s sex. GERSTMAN (53%) and SYRDAL & GOPAL (53%) removed nearly all sex-related variation, followed by MILLER (79%), CLIHs4 (81%), and NORDSTRÖM & LINDBLOM (83%). The scale transformations LOG, BARK, MEL, and ERB did not eliminate any anatomical/physiological variation related to the speaker’s sex. Only three procedures perform at chance level for LDA 2: LOBANOV, CLIHi4, and GERSTMAN. None of the other procedures eliminates variation related to speaker-sex from the vowel data effectively enough.

Table 7.3: Percent correctly classified vowel tokens for LDA 2-5. For all four LDAs, the chance level was 50%. For LDA 2, all percentages lower than 92% are significantly different from the baseline (HZ). For LDA 3, this is 87%, and for LDA 4, this is 78%. For all LDAs, all percentages higher than 53% are significantly higher than chance level. All percentages were rounded to the nearest whole number.

%                      LDA 2            LDA 3         LDA 4         LDA 5
Dependent variable     Speaker-sex      Speaker-sex   Speaker-sex   Speaker-age
Predictor variables    F0, F1, F2, F3   F0            F1, F2, F3    F0, F1, F2, F3
HZ                     93               89            80            57
LOG                    93               89            80            57
BARK                   93               89            80            58
ERB                    93               89            80            57
MEL                    92               89            80            58
SYRDAL & GOPAL         53               -             -             51
LOBANOV                50               51            51            52
GERSTMAN               53               53            51            52
CLIHi4                 50               51            49            50
CLIHs4                 81               78            69            57
NORDSTRÖM & LINDBLOM   83               82            52            57
MILLER                 79               -             -             51
The results for LDA 3 and LDA 4 show that F0 contains considerable anatomical/physiological variation. This variation can be attributed to differences between male and female speakers, because 89% of the vowel tokens could be correctly classified in LDA 3 (in which F0 was entered as the sole predictor). The variation in F0 most likely stems from differences in the anatomy and physiology of the larynx between males and females. However, the three formant frequencies display anatomical/physiological sex-related variation as well, although less than F0. The variation in the formant frequencies is caused by differences in vocal-tract length between males and females. The results for LDA 4 show that NORDSTRÖM & LINDBLOM, a procedure designed to account for vocal-tract-length differences, succeeded in eliminating these differences. The results for NORDSTRÖM & LINDBLOM in LDA 2, LDA 3, and LDA 4 indicate that this procedure dealt effectively with (vocal-tract-related) anatomical/physiological variation in the formants, but that it did not succeed in eliminating the (larynx-related) anatomical/physiological variation in the fundamental frequency.
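One way to read the NORDSTRÖM & LINDBLOM pattern is to express each percentage as the share of the baseline's above-chance margin that survives normalization. The sketch below does this with the values reported in Table 7.3 for LDA 3 (F0 as sole predictor) and LDA 4 (F1, F2, and F3 as predictors). The `retained` helper and the margin measure itself are illustrative assumptions and not a statistic used in the analyses reported here.

```python
CHANCE = 50  # chance level for all LDAs, in percent (Table 7.3 caption)

# Percent correct from Table 7.3 (LDA 3: F0 only; LDA 4: F1, F2, F3).
hz = {"LDA 3": 89, "LDA 4": 80}                  # baseline (HZ)
nordstrom_lindblom = {"LDA 3": 82, "LDA 4": 52}

def retained(pct, baseline_pct, chance=CHANCE):
    """Share of the baseline's above-chance margin left after normalization.

    Hypothetical descriptive measure, introduced here for illustration only.
    """
    return (pct - chance) / (baseline_pct - chance)

shares = {lda: retained(nordstrom_lindblom[lda], hz[lda]) for lda in hz}
for lda, share in shares.items():
    print(f"{lda}: {share:.0%} of the sex-related margin retained")
```

Under this reading, most of the F0-based margin remains after NORDSTRÖM & LINDBLOM, while the formant-based margin almost disappears, which matches the interpretation given above.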
The results for LDA 5 in Table 7.3 show that less age-related than sex-related anatomical/physiological variation was present; the percentages of vowel tokens correctly classified according to speaker-age are considerably lower than the percentages for speaker-sex. Five procedures perform at chance level for LDA 5: SYRDAL & GOPAL, LOBANOV, GERSTMAN, CLIHi4, and MILLER. All other procedures perform (slightly) above chance level and did not eliminate all age-related anatomical/physiological variation from the vowel tokens.
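The split between procedures at chance level and procedures above it follows directly from the LDA 5 column of Table 7.3 and the 53% criterion stated in its caption. A minimal sketch of that tally, with the variable names chosen here for illustration:

```python
# Percent correct for LDA 5 (speaker-age), copied from Table 7.3.
lda5 = {
    "HZ": 57, "LOG": 57, "BARK": 58, "ERB": 57, "MEL": 58,
    "SYRDAL & GOPAL": 51, "LOBANOV": 52, "GERSTMAN": 52,
    "CLIHi4": 50, "CLIHs4": 57, "NORDSTRÖM & LINDBLOM": 57, "MILLER": 51,
}

# Table caption: percentages higher than 53% are significantly above chance.
CHANCE_CUTOFF = 53

at_chance = [name for name, pct in lda5.items() if pct <= CHANCE_CUTOFF]
above_chance = [name for name, pct in lda5.items() if pct > CHANCE_CUTOFF]

print("At chance level:", ", ".join(at_chance))
print("Above chance level:", ", ".join(above_chance))
```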
To sum up, the results in Table 7.3 show that the acoustic consequences of the anatomical/physiological differences related to speaker-age were overall considerably smaller than those related to speaker-sex. For age, the percentages across all procedures are at or just above chance level. Most of the variation in the acoustic signal seems to be related to the anatomical/physiological differences in the vocal tract and larynx of female and male speakers, whereas the differences related to speaker-age could not be attributed unequivocally to specific anatomical or physiological differences between younger and older speakers. Overall, LOBANOV, CLIHi4, and GERSTMAN eliminated anatomical/physiological variation from the acoustic measurements best of all 12 procedures.