Chapter 2: Methodologies
2.5. Statistical Testing
Despite the best efforts to select and measure a solid body of high-quality data, some of the remaining differences among groups may be attributable merely to the inherent variability of samples rather than to genuine differences in the data values (Butler, 1985; Johnson, 2008). Applying statistical techniques to test the hypotheses formulated from the observed patterns is therefore crucial for determining whether the observed differences between sets of data could reasonably have been expected to occur by chance, for example, as a result of sampling variation or other factors (Butler, 1985; Dörnyei, 2007; Johnson, 2008; Woodrow, 2014; Levshina, 2015).
This study takes advantage of pairwise t tests, a hierarchical clustering algorithm, and the SplitsTree software to reveal patterns and structures that remain hidden in the raw data, which are then presented in a logically structured, visually tangible, and scientifically reliable way.
2.5.1. Pairwise t-test comparison
While the t test is limited to comparing the means of two groups, pairwise t tests allow multiple t tests to be performed across a multitude of groups (Levshina, 2015). The ordinary t test alone is therefore not suitable for this study, which involves 8 citation tones and 64 disyllabic tones from 21 speakers. Accordingly, pairwise t tests are employed in this study to determine whether variables (e.g., F0 or duration) differ significantly among the members of a set (e.g., sets of tones), under the assumption that the paired differences are independent and identically normally distributed.
For example, examining whether the eight Zhangzhou citation tones differ significantly in duration requires all possible pairwise comparisons of the values derived from acoustic quantification and normalisation. Figure 2-9 shows the results of the 28 (= 8*7/2) pairwise comparisons.
Figure 2-9. Pairwise t-test comparison of normalised duration among Zhangzhou citation tones.
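The set-up of the comparisons can be sketched in a few lines of Python. The sketch is illustrative only and is not the code used in this study: the duration values are invented, the function name `paired_t` is hypothetical, and only the t statistic is computed (obtaining a p value additionally requires the t distribution with n - 1 degrees of freedom).

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev

def paired_t(xs, ys):
    """Paired t statistic: mean of the differences over its standard error."""
    diffs = [x - y for x, y in zip(xs, ys)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

tones = list(range(1, 9))               # the eight citation tones
pairs = list(combinations(tones, 2))    # every unordered pair of tones
print(len(pairs))                       # 8 * 7 / 2 = 28 comparisons

# Invented normalised durations for four hypothetical speakers (not real data).
tone_a = [1.10, 1.25, 1.05, 1.20]
tone_b = [0.70, 0.80, 0.72, 0.74]
t_value = paired_t(tone_a, tone_b)
print(round(t_value, 2))
```

Each of the 28 pairs would be tested in this way, with the two lists holding the values of the same speakers for the two tones being compared.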
During the application of pairwise t tests, the Bonferroni correction has to be applied to control the Type I error rate across the multiple comparisons (Levshina, 2015). The corrected alpha is calculated by dividing the significance level by the number of comparisons under consideration. For example, in this duration case, the corrected alpha is 0.00179 (= 0.05/28). If the calculated p value is less than the corrected alpha, the paired difference is considered statistically significant; otherwise, it is not. For example, the durational difference between tones 7 and 1 is statistically significant (2e-16 < 0.00179), while the difference between tone 8 and tone 2 is not (1 > 0.00179).
As discussed, pairwise t tests can help determine whether the observed variables between sets of data are significantly different from each other. The testing results can be visualised using hierarchical clustering algorithms, which help assess how many levels the sets of data can be clustered into. For example, Figure 2-10 presents the clustering result of normalised length levels of Zhangzhou citation tones by pairwise t tests.
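The Bonferroni decision rule can be checked with a short calculation. The sketch below is a minimal illustration, not the study's own code: the function name `bonferroni_alpha` is hypothetical, and the two p values are the ones quoted above for the duration comparison.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Corrected significance level: alpha divided by the number of tests."""
    return alpha / n_comparisons

n_tones = 8
n_comparisons = n_tones * (n_tones - 1) // 2        # 28 pairwise comparisons
corrected = bonferroni_alpha(0.05, n_comparisons)   # 0.05 / 28, about 0.00179

# p values reported in the text for two of the 28 comparisons.
p_values = {("tone 7", "tone 1"): 2e-16, ("tone 8", "tone 2"): 1.0}

for pair, p in p_values.items():
    verdict = "significant" if p < corrected else "not significant"
    print(pair, verdict)
```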
Figure 2-10. Clustering of normalised length levels of Zhangzhou citation tones by pairwise t tests.
The vertical lines in Figure 2-10 are branches representing the amount of significant difference in duration across different tones. The longer the branch in the vertical dimension, the larger the amount of significant difference in duration. The horizontal dimension indicates which tone connects to which in terms of length. The closer the tones, the higher the probability that they belong to the same length level. The scale at the far left of the figure is the threshold distance for significance. The higher the threshold, the fewer the length levels that can be generated.
This study primarily set the threshold at 1 to determine the number of significantly different levels for normalised F0 and duration. The clustering results at this threshold were largely consistent with the results determined on the basis of either auditory impressions or acoustic quantifications, which will be presented in Chapters 5, 7 and 8. Nevertheless, the threshold may be slightly modified for a data set with a relatively larger effect size. For example, the clustering of normalised F0 contours for citation tones involved 16 putative points to be pairwise tested and hierarchically clustered. Setting the threshold at 1.5 groups the putative points into higher-quality clusters (see Chapter 5). Once the number of length levels has been decided, the levels can be ranked for a clearer representation.
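How the choice of threshold determines the number of levels can be illustrated with a small agglomerative clustering routine. This is a sketch under stated assumptions rather than the procedure used in this study: the distance values are invented, average linkage is assumed because the linkage method is not specified here, and the names `cluster_at` and `toy_distance` are hypothetical.

```python
from itertools import combinations

def cluster_at(items, distance, threshold):
    """Merge the closest clusters (average linkage) until the smallest
    between-cluster distance exceeds the threshold; return the clusters."""
    clusters = [[item] for item in items]

    def linkage(a, b):
        # Average of all pairwise distances between members of a and b.
        return sum(distance(x, y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: linkage(*ab))
        if linkage(a, b) > threshold:
            break                      # equivalent to cutting the tree here
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters

# Invented positions on an arbitrary scale for six hypothetical tones.
positions = {"A": 0.0, "B": 0.3, "C": 1.4, "D": 1.8, "E": 4.5, "F": 5.1}

def toy_distance(x, y):
    return abs(positions[x] - positions[y])

for threshold in (1.0, 1.5):
    levels = cluster_at(list(positions), toy_distance, threshold)
    print(threshold, len(levels), sorted(sorted(c) for c in levels))
```

With these invented values, the cut at 1.0 yields three groups and the cut at 1.5 yields two, which mirrors the observation that a higher threshold produces fewer levels.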
The pairwise t tests and the hierarchical clustering algorithms were mainly employed to address a series of specific research questions in the thesis, including:
● How many normalised F0 and duration levels are contrastive among Zhangzhou citation tones? (Chapter 5)
● Are the F0 and duration realisations of Zhangzhou phrase-initial tones affected by their following tones? If so, to what extent are they affected, and what conditions the variations? (Chapter 7)
● How many normalised F0 and duration levels are contrastive among Zhangzhou phrase-initial tones? (Chapter 7)
● Are the F0 and duration realisations of Zhangzhou phrase-final tones affected by their preceding tones? If so, to what extent are they affected, and what conditions the variations? (Chapter 8)
● How many normalised F0 and duration levels are contrastive among Zhangzhou phrase-final tones? (Chapter 8)
2.5.2. SplitsTree
This study asserts that tonal realisations in Zhangzhou are multidimensional, involving a variety of co-varying phonetic parameters that include pitch/F0, duration, vowel quality, voice quality, and obstruent coda. Each tone thus consists of a bundle of phonetic outputs, forming a multidimensional framework for the whole tonal system (see later chapters). Revealing the patterns and structures that remain hidden within the geometry is imperative for understanding the relatedness between tones from the phonological point of view.
This study employs the SplitsTree software (Kloepper & Huson, 2008) to generate phylograms in order to hierarchically visualise the mapping for a set of tones under consideration. The phylogenetic network is applied under the assumption that each tone in a sequence can change its phonetic outputs independently from the other tones. Before a phylogram is created, the phonetic outputs for a set of tones under investigation have to be transformed into a multiple sequence alignment to be computed in the SplitsTree software. The sequencing of the multiple alignment varies along with the changes of tonal realisations across different linguistic contexts.
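The transformation of phonetic outputs into aligned sequences can be sketched as follows. Everything in this example is an assumption made for illustration: the feature values are invented, the single-letter coding scheme and the names `encode` and `p_distance` are hypothetical, and the FASTA-style layout is only one plausible plain-text representation of an alignment, not necessarily the one used in this study. The proportion of mismatching positions is computed simply to show how aligned sequences yield pairwise distances.

```python
from itertools import combinations

# Invented feature bundles for three hypothetical tones (not real data).
# Columns: F0 level, duration, voice quality, coda.
tones = {
    "C1": ("mid", "long", "modal", "open"),
    "C2": ("low", "long", "modal", "open"),
    "C7": ("mid", "short", "creaky", "stop"),
}

# Hypothetical one-letter codes so that every tone becomes an equal-length string.
codes = {"low": "A", "mid": "C", "long": "G", "short": "T",
         "modal": "A", "creaky": "C", "open": "G", "stop": "T"}

def encode(features):
    """Turn a tuple of feature values into one aligned character sequence."""
    return "".join(codes[value] for value in features)

def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ."""
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)

alignment = {name: encode(features) for name, features in tones.items()}

# FASTA-style text: a '>' header line followed by the sequence.
fasta = "\n".join(f">{name}\n{seq}" for name, seq in alignment.items())
print(fasta)

for a, b in combinations(alignment, 2):
    print(a, b, p_distance(alignment[a], alignment[b]))
```

Because every tone is coded over the same ordered set of parameters, the sequences are aligned by construction, and a change in one tone's realisation alters only that tone's row.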
Figure 2-11 demonstrates how the phylogenetic tree works for the mapping of phonological relatedness between Zhangzhou tones in this study. The root of the tree encodes what type of data set is under consideration (e.g., Zhangzhou citation tones). The horizontal lines are branches representing the amount of similarity in terms of multidimensional parameters across the tones being investigated. The shorter the branch in the horizontal dimension, the stronger the phonetic similarity that the tones share, and vice versa. The vertical dimension indicates which tone connects to which other tone. The more divergent the tones, the higher the probability that they are phonologically unrelated, and vice versa. The bar at the top of the phylogram shows the branch length that corresponds to a similarity change of 0.07.
Figure 2-11. Example of phylogenetic mapping for Zhangzhou tones (C=Citation tone).
The SplitsTree software was mainly employed to address the relatedness between a set of tones from the phonological perspective, the results of which can significantly help to develop a conception of the totality of tonal contrasts and tonal neutralisations. The software was applied only when the multidimensional properties for each set of tones had been established and transformed into a multiple sequence alignment. SplitsTree was mainly applied to address a series of specific research questions, including:
● How are Zhangzhou tones related to each other in citation? (Chapter 5)
● How are Zhangzhou tones related to each other phrase initially? (Chapter 7)
● How are Zhangzhou tones related to each other phrase finally? (Chapter 8)
● How are Zhangzhou phrase-initial tones related to citation tones? (Chapter 9)
● How are Zhangzhou phrase-final tones related to citation tones? (Chapter 9)
The techniques of pairwise t testing, hierarchical clustering, and SplitsTree enabled the relations among independent variables to be identified and presented in a logically structured, visually tangible, and objective way, which helped this study achieve a higher level of generalisation and interpretation. The procedure of pairwise t tests and hierarchical clustering used in this study was accomplished with the very kind assistance of my colleague, Dr. Siva Kalyan. The application of SplitsTree was conducted under the instruction of my supervisor, Dr. Paul Sidwell. Many concepts with respect to statistical testing were also informed by the precious feedback and advice from Prof. Phil Rose. I acknowledge their generous help. The code for the pairwise t-test comparison and clustering, as well as the multiple sequence alignments for the SplitsTree operation, are all provided on the attached USB.