3.2 Methods and Procedures
3.2.3 Segmentation and Annotations
The methods and procedures for segmentation and acoustic measurement are discussed in Sections 2.3 and 2.4. Measurements were made using Praat (Boersma and Weenink, 2016).
For diphthongs in open syllables, the vowel offset is easier to segment visually in the spectrogram when the following word starts with the stop consonant /k/ than when it starts with a vowel. One case proved difficult: [ɡeɐ] “went” in the CP context, where the target diphthong is in an open syllable (CV) and the following word starts with the vowel /e/. In this case, the waveform was used alongside the clear formant structure of F1 and F2 to insert the boundaries, especially at the end of the vowel, where the waveform became less complex just before the onset of the following vowel.
In addition, to measure the formant transition duration, a separate Praat tier named “Transition” was used to manually insert interval boundaries around the diphthong transition period. The start of the interval was placed where the second formant began to move away from its steady state, and the end of the interval was placed where the second formant settled into a steady state again (cf. Lindau, Norlin, and Svantesson, 1990). These criteria are illustrated in Figure 3.1 and Figure 3.2.
The number of analysed tokens was 1307. As discussed in Section 2.4.2, data from two male and two female speakers were not included in the final analysis. The total number of tokens analysed per vowel, by male (M) and female (F) speakers and in the Carrier Phrase (CP) and Full Sentence (FS) contexts, is given in Table 3.3.
Figure 3.1: Word-level segmentation for the diphthong in [ɡeɐ] (+voiced V, open syllable; the word following the target does not begin with /k/) [left] and for the diphthong in [pɑe] (-voiced V, open syllable; the word following the target begins with /k/) [right]
Figure 3.2: Word-level segmentation of the transition interval for the diphthong in [pɑe]
Table 3.3: Total number of tokens analysed for each vowel, per gender (M and F) and per context (CP and FS).

Vowel    Tokens    Speakers        Context
                   F       M       CP      FS
ɑe       220       110     110     111     109
ɑʊ       218       109     109     108     110
oe       212       108     104     103     109
ɪɐ       220       110     110     109     111
eɐ       217       109     108     107     110
ʊɑ       220       110     110     110     110
Total    1307      656     651     648     659
3.2.4 Automatic Formant Extraction
Praat scripts (see Appendix 2E) were used to extract the frequencies of the first, second and third formants of monophthongs at two temporal positions, 20% and 80% (cf. Williams and Escudero, 2014; Hillenbrand, 2003), and the duration in milliseconds. Following measurements by Mayr and Davies (2011), Kirtley et al. (2016) and Williams and Escudero (2014) of diphthong trajectories, F1 and F2 movement, and vowel inherent spectral change (VISC), the formant frequencies were additionally measured at seven equidistant points for each formant, i.e. 20%, 30%, …, 80%.
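The seven measurement points can be derived from the segment boundaries as sketched below. This is a minimal Python illustration, not the Praat script from Appendix 2E; the function name `sample_times` and the example boundary times are hypothetical.

```python
def sample_times(onset, offset, first=0.20, last=0.80, n_points=7):
    """Return n_points equally spaced times between the `first` and
    `last` proportions of the interval [onset, offset]."""
    duration = offset - onset
    step = (last - first) / (n_points - 1)  # 0.10 for seven points from 20% to 80%
    return [onset + (first + i * step) * duration for i in range(n_points)]


# Hypothetical vowel interval from 0.100 s to 0.300 s (200 ms long).
times = sample_times(0.100, 0.300)
for proportion, t in zip(range(20, 90, 10), times):
    print(f"{proportion}% -> {t:.3f} s")
```

Formant values would then be read at each returned time, for example with a formant query in Praat.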
As discussed in Section 2.1.2, F1 and F2 measured at the vowel steady state (usually the midpoint) are not sufficient to characterise the acoustic properties of diphthongs, because the vowel quality changes, producing a decrease or increase in F1 depending on whether the first segment is open (e.g. /ɑ/) or closed (e.g. /ɪ/). Therefore, the rate of change (ROC) approach, as employed by Gay (1968), Deterding (2000) and Kent and Read (1992, cited in Deterding, 2000), was also used to measure the change in quality and the spectral change in diphthongs. Further, following Lindau et al. (1990), the transition duration was measured for each diphthong, as detailed above. In addition to formant frequencies, the total duration of each diphthong was measured.
To compare the first and second targets of the diphthongs with their monophthongal counterparts, the frequencies of the first two formants were extracted at the midpoint of each target and were analysed acoustically and statistically. In addition, to compare the durations of the first and second targets beyond a visual inspection of the spectrogram (cf. Mayr and Davies, 2009), the durations before and after the transition period were measured for acoustic and statistical analysis. These measurements helped to determine whether the second vowel in Urdu diphthongs is always long (Waqar and Waqar, 2002), whether both vowels are equally long (Bhatti and Mumtaz, 2016), or whether this depends on the individual vowel. These measurements also inform the IPA transcription of Urdu diphthongs.
Previous studies have reported diphthong duration to compare cross-dialect differences (cf. for Welsh: Mayr and Davies, 2011; for American English: Jacewicz and Fox, 2013; for Southern and Northern dialects of British English: Williams and Escudero, 2014). In the present study, diphthong duration was compared with monophthong duration in order to determine whether the two vowels in the target words have a combined duration comparable to a single vowel (i.e. the total duration will be equal to or less than that of the long monophthongs in Urdu, as reported by Khurshid et al., 2003) or to two separate vowels (i.e. the total duration of the two vowels in the diphthong will be less than the sum of the two corresponding monophthongs).
Following Mayr and Davies (2011) and Fox and Jacewicz (2009), the vowel section length (VSL) was calculated. In the present study, six sections were calculated, as opposed to the four sections calculated in previous studies, to provide sufficient resolution for subsequent visual comparison with manual segmentation (see Section 3.2.3). That is, VSL was calculated for sections 20%-30%, 30%-40%, 40%-50%, 50%-60%, 60%-70%, and 70%-80% across each diphthong duration with the following Euclidean distance formula:
$$ VSL_n = \sqrt{(F1_n - F1_{n+1})^2 + (F2_n - F2_{n+1})^2} \tag{1} $$
where $VSL_n$ is the length of section number n (i.e. n=1 for 20%-30%, n=2 for 30%-40%, …, n=6 for 70%-80%) and $F1_n$ and $F2_n$ are the formant values at sample number n (i.e. n=1 for 20%, n=2 for 30%, …, n=7 for 80%).
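As an illustration of Equation (1), the following Python sketch computes the six section lengths from seven F1/F2 samples. The function name `vowel_section_lengths` and the formant values are hypothetical and are not taken from the data set.

```python
import math


def vowel_section_lengths(f1, f2):
    """Euclidean distance in the F1/F2 plane between each pair of
    consecutive sample points (all values in Hz)."""
    if len(f1) != len(f2):
        raise ValueError("f1 and f2 must have the same number of samples")
    return [
        math.hypot(f1[n] - f1[n + 1], f2[n] - f2[n + 1])
        for n in range(len(f1) - 1)
    ]


# Hypothetical formant tracks sampled at 20%, 30%, ..., 80% of a diphthong.
f1 = [700, 680, 650, 610, 560, 520, 500]
f2 = [1200, 1250, 1330, 1450, 1600, 1720, 1800]

vsl = vowel_section_lengths(f1, f2)
for n, length in enumerate(vsl, start=1):
    print(f"VSL_{n} = {length:.1f} Hz")
```

Seven samples yield six sections, matching the indexing described above (n=1 to 6 for sections, n=1 to 7 for samples).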
The trajectory length (TL) was then calculated for each diphthong.
$$ TL = \sum_{n=1}^{6} VSL_n \tag{2} $$
Trajectory length (TL) can be defined as the length of the diphthong’s path through F1/F2 vowel space.
The overall rate of change of this trajectory is then the trajectory length divided by the portion of the overall duration that the trajectory covers (i.e. 60% of the duration):
$$ TL_{roc} = \frac{TL}{0.60 \times V_{dur}} \tag{3} $$
This gives the values of trajectory length rate of change in Hz per millisecond.
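The trajectory length and its rate of change can be sketched in Python as follows. This is an illustration under stated assumptions only: the function names, the formant values and the 200 ms vowel duration are hypothetical, and TL is taken to be the sum of the distances between consecutive sample points in the F1/F2 plane.

```python
import math


def trajectory_length(f1, f2):
    """Sum of the Euclidean distances between consecutive F1/F2
    sample points (Hz)."""
    return sum(
        math.hypot(a1 - b1, a2 - b2)
        for a1, b1, a2, b2 in zip(f1, f1[1:], f2, f2[1:])
    )


def trajectory_roc(tl_hz, vowel_duration_ms, covered_fraction=0.60):
    """Trajectory length divided by the part of the vowel it covers
    (20% to 80% of the duration), in Hz per millisecond."""
    return tl_hz / (covered_fraction * vowel_duration_ms)


# Hypothetical samples: each step moves 30 Hz in F1 and 40 Hz in F2,
# so every section is 50 Hz long.
f1 = [700, 670, 640, 610, 580, 550, 520]
f2 = [1200, 1240, 1280, 1320, 1360, 1400, 1440]

tl = trajectory_length(f1, f2)                       # six sections of 50 Hz
tl_roc = trajectory_roc(tl, vowel_duration_ms=200.0)
print(f"TL = {tl:.1f} Hz, TL_roc = {tl_roc:.2f} Hz/ms")
```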
Vowel section length (VSL) rate of change was calculated separately for each section of each diphthong with the following formula:
$$ VSL_{roc,n} = \frac{VSL_n}{0.15 \times V_{dur}} \tag{4} $$
That is, the VSL of each diphthong section (in Hz) was divided by the duration (in ms) of that section, giving the spectral rate of change in Hz per millisecond.
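A per-section rate of change along the lines of Equation (4) might be sketched in Python as below. The function name and input values are hypothetical. The section fraction is left as a parameter: its default of 0.15 follows Equation (4) as given, whereas six equal sections between 20% and 80% would each span 10% of the vowel, so a caller working from that division may prefer to pass 0.10.

```python
def vsl_rate_of_change(vsl_hz, vowel_duration_ms, section_fraction=0.15):
    """Divide each section length (Hz) by the duration of one section (ms).

    section_fraction is the share of the vowel duration assigned to one
    section; 0.15 is the constant in Equation (4), and it is an
    assumption of this sketch that the same share applies to every section.
    """
    section_duration_ms = section_fraction * vowel_duration_ms
    return [length / section_duration_ms for length in vsl_hz]


# Hypothetical section lengths (Hz) for a 200 ms diphthong.
rates = vsl_rate_of_change([30.0, 60.0, 45.0], vowel_duration_ms=200.0)
for n, rate in enumerate(rates, start=1):
    print(f"VSL_roc_{n} = {rate:.2f} Hz/ms")
```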
3.3 Statistical Analysis
The models for statistical analysis are identical to those discussed in Section 2.4.