Chapter 2: Methodologies
2.2. Data Elicitation
2.2.1. Corpus design
The preparation of a good data corpus is key to linguistic phonetic fieldwork (Ladefoged, 1999). A good corpus is a body of data in which patterns, systems, or generalisations can be discovered and hypotheses proposed by induction (Samarin, 1967). In this study, the data collection yielded two separate corpora: a controlled corpus and a general one.
(1) For the acoustic investigation, a controlled corpus was carefully designed by me, a native-speaking linguist and doctoral candidate, in collaboration with the local community. This controlled corpus incorporated two word lists: one for investigating Zhangzhou citation tones and the other for investigating disyllabic tones. A supplementary word list was also designed for the segmental investigation. The principles underlying the controlled corpus design are described in the rest of this sub-section.
(2) For vocabulary documentation, a word list of 1,366 words, categorised by semantic domain, was prepared; however, this corpus was not used for experimental phonetic purposes in this thesis. The list is provided in Appendix C (on the attached USB).
2.2.1.1. A word list for citation tone elicitation
The preliminary investigation identified seven monosyllabic citation tones in Zhangzhou (see Huang et al., 2016). In the word list for citation tone elicitation, there were usually 20 tokens for each tone, resulting in approximately 160 tokens to be recorded from 21 speakers in the field and processed acoustically and statistically in the lab. However, as the research progressed in the post-field stage, eight tones rather than seven were posited in this study (see Chapter 4). The word list of 160 tokens was thus effectively shared by eight tones, with some tones (e.g., tones 7 and 8) having fewer than 20 tokens to be processed. Although returning to the field site to collect an equal number of tokens for tones 7 and 8 from the same speakers was impractical, the existing corpus from 21 speakers compensates for this shortage to a large degree and helps ensure that the results represent the Zhangzhou variety as a whole.
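The effect of moving from a seven-tone to an eight-tone analysis on per-tone token counts can be checked mechanically once every word-list item carries a tone label in the final inventory. The Python sketch below is illustrative only: `tally_tokens` is a hypothetical helper, not part of the workflow described here, and the labels in the example are invented rather than taken from the Zhangzhou word list.

```python
from collections import Counter


# Hypothetical helper (illustrative only): counts how many word-list items
# fall into each tone of the final inventory and flags tones below target.
def tally_tokens(tone_labels, inventory, target=20):
    counts = Counter(tone_labels)
    # Report every tone in the inventory, including tones with zero items.
    per_tone = {tone: counts.get(tone, 0) for tone in inventory}
    below_target = {tone: n for tone, n in per_tone.items() if n < target}
    return per_tone, below_target


# Toy example with invented labels and a small target of 3 items per tone.
labels = [1, 1, 1, 2, 2, 2, 7, 8, 8]
per_tone, below_target = tally_tokens(labels, inventory=[1, 2, 7, 8], target=3)
print(per_tone)      # {1: 3, 2: 3, 7: 1, 8: 2}
print(below_target)  # {7: 1, 8: 2}
```

Listing every tone in the inventory, including those with zero items, makes gaps visible rather than silently omitting them.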
The word list included as many minimal or sub-minimal pairs as phonotactics allowed. Tokens were chosen across different syllable types and contained comparable numbers of onsets with different manners and places of articulation, as well as vowels of different height and backness, so as to balance, as far as possible, the intrinsic perturbation effects of tautosyllabic segments on the realisation of tonal F0. Table 2-4 shows examples from this word list; further examples for each tone appear in Table A1 in Appendix A.
2.2.1.2. A word list for disyllabic tone elicitation
Based on the preliminary assumption of seven tones in Zhangzhou, there would logically be 49 (= 7 × 7) disyllabic tonal combinations. Twelve examples were therefore selected for each combination, and 588 (= 12 × 49) tokens made up the word list elicited in the field. As stated above, eight tones rather than seven were ascertained in this thesis; these 588 tokens were therefore effectively spread across 64 (= 8 × 8) tonal combinations, leaving some combinations with no tokens or fewer than 12 tokens in this word list. Nevertheless, additional tokens were drawn from the supplementary word list (Section 2.2.1.3) to ensure that every combination had tokens to be processed auditorily, acoustically, and statistically.
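The combination arithmetic above, and the check for combinations left empty or under-filled after re-analysis, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `combination_coverage` is a hypothetical helper, tone combinations are treated as ordered pairs (so 3+1 and 1+3 are distinct, as in Table 2-5), and the token pairs in the usage example are invented.

```python
from collections import Counter
from itertools import product


# Hypothetical helper: counts tokens per ordered tone combination
# (tone of syllable 1, tone of syllable 2) and lists empty and
# under-target cells of the len(inventory) x len(inventory) grid.
def combination_coverage(tone_pairs, inventory, target=12):
    counts = Counter(tone_pairs)
    # Ordered pairs: (3, 1) and (1, 3) count as distinct combinations.
    cells = list(product(inventory, repeat=2))
    coverage = {cell: counts.get(cell, 0) for cell in cells}
    empty = [cell for cell, n in coverage.items() if n == 0]
    below_target = [cell for cell, n in coverage.items() if 0 < n < target]
    return coverage, empty, below_target


# Design arithmetic from the text.
print(len(list(product(range(1, 8), repeat=2))))       # 49 combinations (7 tones)
print(12 * len(list(product(range(1, 8), repeat=2))))  # 588 tokens
print(len(list(product(range(1, 9), repeat=2))))       # 64 combinations (8 tones)

# Toy usage with invented token pairs, a two-tone inventory, and target 2.
coverage, empty, below_target = combination_coverage([(1, 3), (1, 3), (3, 1)], [1, 3], target=2)
print(empty)         # [(1, 1), (3, 3)]
print(below_target)  # [(3, 1)]
```

Separating empty cells from under-target cells mirrors the distinction in the text between combinations with no tokens and those with fewer than 12.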
The tokens for disyllabic tone elicitation were also designed to balance, as far as possible, the intrinsic perturbation of tautosyllabic segments on tonal F0 realisation. A further consideration in this corpus design was that the selected constructions mostly contained one morpheme whose tone can both precede and follow any tone in the inventory. For example, the disyllabic phrases in Table 2-5 all contain the common morpheme /ɓɛ3/ ‘horse’, which precedes (Pattern A) and follows (Pattern B) each of the eight citation tones. This design makes it possible to investigate how the tone of a given morpheme is realised across different contexts and how the surrounding tones influence its realisation. Additionally, all tokens are disyllabic phrases, mainly noun, verb, adverbial, and adjective phrases, that are commonly used in the daily life of the local community rather than arbitrarily created for experimental purposes. All examples for disyllabic tone elicitation appear in Appendix A (Table A2), reorganised according to tonal pattern.
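The pivot-morpheme design can be expressed as a small generator of tonal frames: the pivot tone is paired with every tone in the inventory, once in first position and once in second position. The Python sketch below is illustrative; `pivot_frames` is a hypothetical helper, and tones are represented simply as the integers 1 to 8 used in this chapter.

```python
# Hypothetical helper: lists the tonal frames obtained by placing one pivot
# morpheme before (Pattern A) and after (Pattern B) each tone in the inventory.
def pivot_frames(pivot_tone, inventory):
    pattern_a = [(pivot_tone, t) for t in inventory]  # pivot + T, e.g. 3+1
    pattern_b = [(t, pivot_tone) for t in inventory]  # T + pivot, e.g. 1+3
    return pattern_a, pattern_b


# Pivot /ɓɛ3/ 'horse' against the eight-tone inventory, as in Table 2-5.
pattern_a, pattern_b = pivot_frames(3, range(1, 9))
print(len(pattern_a), len(pattern_b))        # 8 8
print(len(set(pattern_a) | set(pattern_b)))  # 15
```

Because the 3+3 frame occurs in both patterns, one pivot morpheme yields 16 frames but only 15 distinct combinations out of 64, which is consistent with Table 2-5 listing two different 3+3 phrases.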
Table 2-4. Examples of monosyllabic tokens for citation tone elicitation
Tone  Example 1           Example 2          Example 3                   Example 4
1     /si/ ‘poetry’       /kɔ/ ‘mushroom’    /tɐŋ/ ‘east’                /ʔi/ ‘he/she’
2     /si/ ‘time’         /kɔ/ ‘glue’        /tɐŋ/ ‘copper’              /ʔi/ ‘move’
3     /si/ ‘die’          /kɔ/ ‘drum’        /tɐŋ/ ‘wait’                /ʔi/ ‘chair’
4     /si/ ‘four’         /kɔ/ ‘look after’  /tɐŋ/ ‘chilly’              /ʔi/ ‘intention’
5     /si/ ‘affirmative’  /ħɔ/ ‘rain’        /tɐŋ/ ‘heavy’               /ʔi/ ‘play’
6     /sik/ ‘colour’      /kɔk/ ‘country’    /tɐk/ ‘kick’                /ʔit/ ‘one’
7     /sit/ ‘solid’       /tɔk/ ‘poison’     /tɐt/ ‘stimulating (food)’  /ʔik/ ‘bathe’
8     /tsi/ ‘tongue’      /kɔ̃/ ‘snore’       /tɐ/ ‘step on’              /zi/ ‘press’
Table 2-5. Examples of disyllabic tokens for disyllabic tone elicitation
Pattern A  Example                     Pattern B  Example
3+1        /ɓɛ.kʰɐ/ ‘horse feet’       1+3        /ʔɔ.ɓɛ/ ‘black horse’
3+2        /ɓɛ.te/ ‘horse hoof’        2+3        /tɐŋ.ɓɛ/ ‘copper horse’
3+3        /ɓɛ.sɐj/ ‘horse shit’       3+3        /pɵ.ɓɛ/ ‘precious horse, BMW’
3+4        /ɓɛ.tsʰwi/ ‘horse mouth’    4+3        /tsjɐn.ɓɛ/ ‘battle horse’
3+5        /ɓɛ.pɔ/ ‘horse stance’      5+3        /tsʰi.ɓɛ/ ‘to feed horse’
3+6        /ɓɛ.kut/ ‘horse bone’       6+3        /pɐt.ɓɛ/ ‘Eight Horse’
3+7        /ɓɛ.sut/ ‘horsemanship’     7+3        /pɐt.ɓɛ/ ‘to bind horse’
3+8        /ɓɛ.tsi/ ‘horse tongue’     8+3        /pɛ.ɓɛ/ ‘white horse’
2.2.1.3. A supplementary word list for multisyllabic tone elicitation
In addition to the two controlled word lists for citation and disyllabic tone elicitation, a supplementary word list was prepared to further illustrate segmental contrasts and their allophonic distributions, and to provide multisyllabic data showing specific sandhi patterns that depart from the mainstream ones. Together, these word lists ensured that sufficient data were obtained for the descriptive and explanatory study of the Zhangzhou sound system. Examples are given in the relevant sections of this thesis.