Chapter four Data collection
4.5 Selecting representative speakers
4.5.1 Set-up of the speaker-selection test
As the most representative male and female speaker from each group of 20, we decided to select neither the best nor the poorest speaker but the most typical, i.e. average, speaker within the peer group. The most typical speaker can be located only by comparing his/her intelligibility with that of the other members of the group, so that, again, a very large, in fact unmanageably large, experiment would have to be run. We therefore decided to base our search for the most typical speakers only on the first two datasets we recorded, i.e. the vowel test and the simplex consonant test, since these are arguably the most severe tests of the quality of a speaker's pronunciation. These two tests present the stimuli without any lexical redundancy, i.e., knowledge of the lexicon or of sentence-level constraints does not help the listener here at all. The same would apply to the consonant cluster set, but preliminary experiments had already indicated that clusters were more easily identified than simplex consonants by all groups of listeners (Wang and Van Heuven, 2003). In order to reduce the materials further, and at the same time make the screening test more efficient, we decided not to include all 19 vowels and 24 simplex consonants, but to restrict the presentation to the ten most difficult vowels and the ten most difficult consonants within each speaker group.

6 This solution was preferred over the use of professional, high-quality recording equipment on the grounds that we needed recordings of uniform quality regardless of whether they were made in The Netherlands or in China. Since we knew beforehand that no recording studio or professional equipment would be available at Jilin University, we decided to downgrade the Leiden recording environment to make it comparable to the Chinese facilities.
The preliminary experiment (Wang and Van Heuven, 2003, see also footnote 4) produced complete confusion matrices for the vowels and simplex consonants for each of the nine combinations of speaker and hearer nationalities. We decided to use only the confusion matrices obtained for speaker-hearer groups that shared the same native language. As a result, the optimally representative Chinese speaker of English was selected on the basis of his/her intelligibility in English for fellow Chinese listeners. The same principle, mutatis mutandis, was applied to the selection of the Dutch and American speakers. The original confusion structures in the pilot experiments can be consulted in the literature, though for the vowels only (see Wang and Van Heuven, 2004). It is clear from these confusion matrices that the order of difficulty, as evidenced by the error percentages in the identifications, is not the same for the three speaker/listener groups.
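The text does not spell out how "most typical" was operationalised. The sketch below assumes that typicality means the smallest distance to the group mean percent error. A median-based criterion would be an equally plausible reading. The function name, speaker labels and scores are hypothetical and serve only as an illustration. In the study, the scores would come from listeners who share the speaker's native language.

```python
from statistics import mean

def most_typical(scores):
    """Return the speaker whose percent error lies closest to the group mean.

    scores: dict mapping a (hypothetical) speaker label to percent error.
    Assumption: "typical" = closest to the mean; ties go to the
    alphabetically first label, because min() keeps the first minimum.
    """
    centre = mean(scores.values())
    return min(sorted(scores), key=lambda spk: abs(scores[spk] - centre))

# Made-up percent-error scores, for illustration only (not study data).
female_scores = {"F01": 30.0, "F02": 45.0, "F03": 38.0, "F04": 52.0, "F05": 41.0}
male_scores = {"M01": 25.0, "M02": 60.0, "M03": 44.0, "M04": 35.0}

print(most_typical(female_scores), most_typical(male_scores))  # F05 M03
```

The selection is made separately for females and males, so that each group of 20 yields one representative of each sex.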
4.5.2 Stimuli
Tables 4.1 and 4.2 present the error percentages for all vowels and consonants, with asterisks marking the subsets of the ten most difficult vowels and consonants, respectively, for each of the three nationalities. In principle, the ten vowels or consonants selected are those with the ten highest error percentages. On some occasions, however, we had to replace one or two sounds with high error percentages by alternatives with much lower error percentages. This was necessary in order to include attractive distractors in the set of ten. For instance, /f/ turned out to be an easy consonant for Chinese speakers/listeners, but it was included in the set of ten to provide an attractive response alternative for /v/, which was a very difficult sound indeed. Moreover, (full) diphthongs and /r/-colored vowels were excluded from the selection of vowel sounds, so that only monophthongs could be selected.
Table 4.1. Percent error in vowel identification in the pilot experiment for Chinese, Dutch and American speakers of English. Listeners shared the language background of the speaker. Vowels marked with an asterisk were selected for the screening test.
Vowel      Chinese   Dutch   American
 1. i:     12 *      0       13
 2. I      44 *      0 *     38 *
 3. e:     12 *      11 *    19 *
 4. E      65 *      0 *     12 *
 5. A:     21        6       13
 6. œ      82 *      50 *    13 *
 7. u:     76 *      50 *    19 *
 8. U      56 *      6 *     56 *
 9. O:     71 *      83 *    75 *
10. O      21        0 *     50 *
11. o:     50 *      33 *    19 *
12. ø      76 *      28 *    63 *
13. ´:     24        11      13
14. ai     12        6       13
15. OI     29        6       13
16. au     35        0       12
17. I´     26        22      31
18. U´     24        17      6
19. E´     6         50      13
Total      41        20      26
Table 4.2. Percent error in consonant identification in the pilot experiment for Chinese, Dutch and American speakers of English. Listeners shared the language background of the speaker. Consonants marked with an asterisk were selected for the screening test.
No.  Consonant  Chinese  Dutch  American
01   p          3        0      0
02   b          3        0      6
03   t          6        0 *    0 *
04   d          15       17 *   6
05   k          6        0      6
06   g          18       0      0
07   s          41 *     17 *   56 *
08   z          47 *     22 *   19 *
09   S          6 *      0 *    31 *
10   Z          47 *     6 *    94 *⁷
11   T          44 *     33 *   94 *
12   D          76 *     39 *   75 *
13   h          12       0      6
14   r          15       0      6
15   f          0 *      6      0 *
16   v          74 *     17     12
17   tS         21 *     0 *    0
18   dZ         21 *     17 *   13
19   m          0        0      0
20   n          6        44     6
21   N          15       11     0
22   l          15       6      0 *
23   j          21       6      0
24   w          35       0      25 *
     Total      23       10     19
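The selection rule described in 4.5.2 can be sketched as follows: take the highest error rates, but force an easy sound in where it is needed as a distractor. The function name and its parameters are hypothetical. The example reuses the Chinese speaker/listener consonant errors from Table 4.2 and forces /f/ in as the distractor for /v/.

The result differs by one item from the asterisked Chinese set in Table 4.2: it contains /w/ instead of /S/. The published selections therefore involved further judgements that this simple rule does not capture.

```python
def select_items(errors, k=10, force=(), allowed=None):
    """Return forced sounds followed by the highest-error remaining sounds.

    errors:  dict mapping a sound label to its percent error
    force:   sounds included regardless of error rate (e.g. distractors);
             if more than k are forced, all of them are kept
    allowed: optional set restricting candidates (e.g. monophthongs only)
    Ties keep the order of the input dict, because sorted() is stable.
    """
    pool = [s for s in errors if allowed is None or s in allowed]
    chosen = [s for s in force if s in pool]
    ranked = sorted((s for s in pool if s not in chosen),
                    key=lambda s: errors[s], reverse=True)
    return chosen + ranked[:max(0, k - len(chosen))]

# Chinese speaker/listener consonant errors (%), as printed in Table 4.2.
chinese = {"p": 3, "b": 3, "t": 6, "d": 15, "k": 6, "g": 18, "s": 41, "z": 47,
           "S": 6, "Z": 47, "T": 44, "D": 76, "h": 12, "r": 15, "f": 0, "v": 74,
           "tS": 21, "dZ": 21, "m": 0, "n": 6, "N": 15, "l": 15, "j": 21, "w": 35}

print(select_items(chinese, force=["f"]))
# ['f', 'D', 'v', 'z', 'Z', 'T', 's', 'w', 'tS', 'dZ']
```

For the vowels, the same function would be called with `allowed` set to the monophthongs, which mirrors the exclusion of diphthongs and /r/-colored vowels.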
Summary statistics on the subsets of ten vowels and ten consonants are provided in Tables 4.3 and 4.4, respectively.
7 In the pilot experiment, the consonants /T/ produced by the American female speaker and /Z/ produced by the American male speaker were both strongly confused by the listeners with /t/ and /f/. This depressed the consonant identification scores for this group. We may have recorded very poor native speakers for these two consonants, but we interpret the confusion structures in the pilot experiment as indicating that these two consonants may be the most confusable consonants for the vast majority of American listeners. In order to allow these confusions to occur, we chose /t/ and /f/ as the contrast consonants for /T/ and /Z/.
Table 4.3. Percent identification error obtained in the preliminary experiment for the ten most problematic vowels (produced in /hVd/ frames) selected for the screening test. Mean, standard deviation, minimum, maximum and range of the error percentages are given. The mean error percentage for the full set of 19 vowels is given in parentheses.
Speaker/listener group   Mean          SD     Min.   Max.   Range
Chinese                  58.8 (41.3)   20.8   11.8   82.4   70.8
Dutch                    26.1 (19.9)   28.2   0      83.3   83.3
American                 36.3 (25.7)   23.2   12.5   75.0   62.5
Table 4.4. Percent identification error obtained in the preliminary experiment for the ten most problematic simplex consonants (produced in /A:CA:/ frames) selected for the screening test. Mean, standard deviation, minimum, maximum and range of the error percentages are given. The mean error percentage for the full set of 24 consonants is given in parentheses.
Speaker/listener group   Mean          SD     Min.   Max.   Range
Chinese                  25.0 (22.7)   25.9   20.6   76.5   55.9
Dutch                    13.7 (10.0)   13.9   5.6    44.4   38.8
American                 28.9 (19.0)   37.7   6.3    93.4   87.1
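The sketch below shows how summary statistics of this kind can be computed from the asterisked cells of Table 4.1. The text does not say which standard deviation was used, so the code assumes the sample SD. The input is the rounded Dutch vowel percentages. With these inputs the code gives the tabulated mean of 26.1, an SD of 28.1 and a maximum of 83. The last two fall slightly below the tabulated 28.2 and 83.3, presumably because the tables were computed from unrounded percentages. The function name is hypothetical.

```python
from statistics import mean, stdev

def summarize(errors):
    """Mean, sample SD (an assumption), minimum, maximum and range of percent errors."""
    lo, hi = min(errors), max(errors)
    return {"mean": round(mean(errors), 1), "sd": round(stdev(errors), 1),
            "min": lo, "max": hi, "range": hi - lo}

# Dutch speaker/listener errors for the ten asterisked vowels in Table 4.1
# (rounded percentages as printed).
dutch_selected = [0, 11, 0, 50, 50, 6, 83, 0, 33, 28]
print(summarize(dutch_selected))
```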
As can be seen in Tables 4.3 and 4.4, the mean difficulty (percent error obtained in the preliminary study) was greater for the vowel test than for the consonant test. The level of difficulty was also not uniform across the three speaker/hearer groups. These differences do not, of course, invalidate the screening test; they merely show that what is difficult for one group may not be difficult for another. What matters is that the overall level of difficulty in the selections was closer to 50% error than the means found in the pilot experiment. For this reason, the selections provide a more efficient and more discriminating testing instrument than the full set of 19 vowels and 24 consonants would have done.
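The 50% argument can be made explicit in a conventional way, offered here as an illustration rather than as the criterion actually used. Score each identification response as an error (X = 1) with probability p. Its variance is then p(1 − p), which peaks at p = 0.5. Items that nearly everyone identifies correctly, or nearly everyone misses, therefore carry little information about differences between speakers.

```latex
\operatorname{Var}(X) = p(1-p), \qquad
\frac{d}{dp}\,p(1-p) = 1 - 2p = 0 \;\Longrightarrow\; p = 0.5, \qquad
\operatorname{Var}_{\max} = 0.25
```

Take the Dutch vowels as an example. Moving from the full-set mean of 19.9% error to the subset mean of 26.1% raises p(1 − p) from about 0.159 to about 0.193.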
For each of the three listener groups, two separate tests were constructed from the selections. The first test comprised the ten /hVd/ tokens produced by each of the ten male and ten female speakers who shared the language background of the prospective listeners, presented in quasi-random order. Immediate successions of the same vowel type, or of tokens produced by the same speaker, were systematically excluded. This resulted in a vowel identification test for each listener group comprising 20 (speakers) × 10 (vowel types) = 200 stimuli. These were preceded by ten practice items, randomly chosen from the set of 200.
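The constrained ordering described above can be sketched as a greedy construction with restarts. Tokens are drawn at random, skipping any that would share a speaker or a vowel type with the preceding token, and the attempt restarts if it reaches a dead end. This is one possible procedure, not necessarily the one used to build the actual tests. The function name and the speaker and vowel labels are hypothetical.

```python
import random
from itertools import product

def quasi_random_order(speakers, types, seed=None, max_attempts=10_000):
    """Order every speaker x type token so that no two consecutive tokens
    share a speaker or a sound type (greedy construction with restarts;
    a hypothetical procedure, not necessarily the original one)."""
    rng = random.Random(seed)
    tokens = list(product(speakers, types))
    for _ in range(max_attempts):
        pool = tokens[:]
        order = []
        while pool:
            last = order[-1] if order else None
            # Indices of tokens differing from the last one in both speaker and type.
            ok = [i for i, (spk, typ) in enumerate(pool)
                  if last is None or (spk != last[0] and typ != last[1])]
            if not ok:
                break  # dead end: start a fresh attempt
            order.append(pool.pop(rng.choice(ok)))
        if not pool:
            return order
    raise RuntimeError("no valid order found; increase max_attempts")

# Hypothetical labels: 10 male and 10 female speakers, 10 /hVd/ vowel types.
speakers = [f"M{i:02d}" for i in range(1, 11)] + [f"F{i:02d}" for i in range(1, 11)]
vowels = [f"V{i:02d}" for i in range(1, 11)]

trials = quasi_random_order(speakers, vowels, seed=42)
practice = random.Random(7).sample(trials, 10)  # ten practice items drawn from the 200
print(len(trials), len(practice))  # 200 10
```

The same function would serve for the consonant tests, with consonant labels passed in place of the vowel labels.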
Three consonant identification tests, one for each listener group, were compiled in analogous fashion, yielding 20 (speakers) × 10 (consonant types) = 200 stimuli, again preceded by ten practice items.