Analysis - Acoustic models of consonant recognition in cochlear implant users


3.3 Methodology

3.3.6 Analysis

For each subject and listening condition, a test run comprised the randomised presentation of 3 instances of each of the 20 consonants (60 trials). As described below, the responses from each test run were coded as a confusion matrix. Subsequent data analysis was divided into two main stages: first, the derivation of consonant feature transmission values and, second, descriptive and inferential statistical analyses. These stages are described below.
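As a minimal sketch of this coding step, the Python below assumes that each test run is available as a list of (stimulus, response) label pairs. This format is hypothetical and is not that of the Praat output or of the SCORE programme. The sketch tabulates a confusion matrix with stimuli as rows and responses as columns, using the 20 consonant labels of Table 3.1 (with θ for the voiceless dental fricative), and takes percentage total correct from the diagonal.

```python
# Consonant labels in the order of Table 3.1 (θ = voiceless dental fricative).
CONSONANTS = ["b", "d", "g", "w", "j", "ɹ", "l", "v", "z", "ʤ",
              "m", "n", "p", "t", "k", "f", "θ", "s", "ʃ", "ʧ"]


def confusion_matrix(trials, labels=CONSONANTS):
    """Tabulate (stimulus, response) pairs into a square count matrix.

    Rows are stimuli and columns are responses, both in `labels` order.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for stimulus, response in trials:
        matrix[index[stimulus]][index[response]] += 1
    return matrix


def percent_correct(matrix):
    """Percentage of trials on the diagonal (response matches stimulus)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


# Invented run: 3 presentations of each of the 20 consonants, all
# identified correctly except one /p/ trial reported as /b/.
trials = [(c, c) for c in CONSONANTS for _ in range(3)]
trials[trials.index(("p", "p"))] = ("p", "b")
cm = confusion_matrix(trials)
print(sum(map(sum, cm)), percent_correct(cm))
```

In this invented example one /p/ trial is reported as /b/, so 59 of the 60 trials lie on the diagonal.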

Before undertaking the first stage of data analysis, it was necessary to decide on a set of features for information transfer analysis. As noted, consonant confusion data can be analysed at various levels of phonological detail. Simple total correct scores can be computed, as in the majority of studies of consonant recognition in CI users. Alternatively, a tripartite division into voicing, place and manner can be used; this is the approach taken in all of the more detailed studies, as discussed in section 2.1.2. At a greater level of detail, Chomsky and Halle (1968) described a large number of binary phonological features, and it would be possible to use all of these features to analyse CI confusion data. However, such a detailed phonological analysis would be cumbersome when exploring a large number of independent variables and, moreover, it is important that data analysis methods have a clear rationale.

The three categories of voicing, place and manner clearly needed to be included in the phonological analysis, both for comparison with the existing literature and because these categories are fairly distinct in perceptual terms. However, there was also some justification for expanding on these three categories. As in some other studies, the “envelope” feature was included because it is based on the perceptual abilities of CI users. It was hypothesised that this feature would be more robust than the other features, and that it was arguably more purely “temporal” (i.e. effectively reliant on within-channel information) than the other features, even voicing (see the discussion in section 2.6.4). It was also of interest to assess the perception of nasality, as this feature resembles voicing in its reliance on low-frequency structure. Finally, the fricative vs. non-fricative distinction was used to determine how well CI users can code noise information: given the long duration of the noise in fricatives, the ability to resolve the noise in the time domain should not be a confounding variable. Moreover, this feature relied more heavily on spectral processing than any of the other features apart from place.

Based on these choices, each confusion matrix yielded seven dependent variables: percentage total correct, and percentage information transmission for the six consonant features voicing, place, manner, nasality, fricative and envelope. The following steps were taken to derive feature-specific information transmission values for all four experiments. Responses for each subject and test run, generated by the Praat programme, were tabulated and then converted to an Excel file, and a macro transformed the data into a format usable for further analysis. Two further pieces of speech analysis software, FIX and SCORE, developed by the Department of Phonetics and Linguistics at University College London (www.phon.ucl.ac.uk/resource/software.html), were used for consonant confusion analysis. The SCORE programme combined the stimulus and response data for each subject in each listening condition and generated a confusion matrix. Table 3.1 shows a typical confusion matrix, in which stimuli are arranged along the y-axis and responses along the x-axis. For each confusion matrix, the FIX programme computed percentage information transmission for the six features voicing, place, manner, fricative, nasality and envelope according to the feature transmission matrix in Table 3.1 (although feature matrices are normally presented with features on the y-axis, the large number of stimuli necessitates the alternative presentation in this case). All percentage transmission values were computed from a single iteration of SINFA analysis (see section 2.1.1 for a discussion of this issue). The resulting total correct and information transmission values were entered into SPSS files, and all subsequent data analysis was undertaken on these values.
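The measure computed for each feature can be sketched as follows, assuming the usual definition of relative transmitted information: the consonant confusion matrix is pooled into the categories of one feature (for example voiced vs. voiceless), and the transmitted information T(x;y) between stimulus and response categories is expressed as a percentage of the stimulus entropy H(x). This corresponds to an unconditional, single-iteration analysis. It is an illustration only, not the FIX programme's code, and FIX may differ in details such as bias correction; the function names and the dictionary format are hypothetical.

```python
import math
from collections import defaultdict


def collapse(matrix, labels, feature):
    """Pool a consonant confusion matrix into one feature's categories.

    `feature` maps each consonant label to its category (e.g. voicing
    "yes"/"no"); returns {(stimulus_category, response_category): count}.
    """
    pooled = defaultdict(int)
    for i, stim in enumerate(labels):
        for j, resp in enumerate(labels):
            pooled[(feature[stim], feature[resp])] += matrix[i][j]
    return dict(pooled)


def relative_transmission(pooled):
    """Percent information transmitted: 100 * T(x;y) / H(x), in bits."""
    n = sum(pooled.values())
    p_stim, p_resp = defaultdict(float), defaultdict(float)
    for (x, y), count in pooled.items():
        p_stim[x] += count / n
        p_resp[y] += count / n
    # Transmitted information T(x;y), summed over non-empty cells.
    t = sum((c / n) * math.log2((c / n) / (p_stim[x] * p_resp[y]))
            for (x, y), c in pooled.items() if c > 0)
    # Stimulus (input) entropy H(x).
    h = -sum(p * math.log2(p) for p in p_stim.values() if p > 0)
    return 100.0 * t / h


# Toy example with four consonants and the voicing feature.
labels = ["b", "d", "p", "t"]
voicing = {"b": "yes", "d": "yes", "p": "no", "t": "no"}
perfect = [[3 if i == j else 0 for j in range(4)] for i in range(4)]
unrelated = [[1] * 4 for _ in range(4)]
print(relative_transmission(collapse(perfect, labels, voicing)))
print(relative_transmission(collapse(unrelated, labels, voicing)))
```

With perfect voicing identification the pooled matrix is diagonal and the score is 100%; when responses are unrelated to the voicing of the stimulus, T(x;y), and hence the score, falls to 0%.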

Consonant  Voicing  Fricative  Nasal  Place  Manner  Envelope
b          yes      no         no     bil    plo     vpf
d          yes      no         no     alv    plo     vpf
g          yes      no         no     vel    plo     vpf
w          yes      no         no     bil    con     ng
j          yes      no         no     alv    con     ng
ɹ          yes      no         no     ret    con     ng
l          yes      no         no     alv    con     ng
v          yes      no         no     lad    fri     vpf
z          yes      yes        no     alv    fri     vpf
ʤ          yes      yes        no     ret    aff     vpf
m          yes      no         yes    bil    nas     ng
n          yes      no         yes    alv    nas     ng
p          no       no         no     bil    plo     vlp
t          no       no         no     alv    plo     vlp
k          no       no         no     vel    plo     vlp
f          no       yes        no     lad    fri     vlf
θ          no       yes        no     den    fri     vlf
s          no       yes        no     alv    fri     vlf
ʃ          no       yes        no     ret    fri     vlf
ʧ          no       yes        no     ret    aff     vlp

Table 3.1. Feature transmission matrix used for phonological feature analysis
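Each column of Table 3.1 partitions the 20 consonants into categories, and it is this partition that the information transmission analysis pools over. A minimal sketch of the table as a Python lookup is given below. The names `TABLE_3_1`, `feature_map` and `classes` are hypothetical, θ is used for the voiceless dental fricative, and the abbreviations are copied from the table unchanged.

```python
# Table 3.1 as a lookup: consonant -> feature values, in the table's
# column order. Abbreviations are copied unchanged from the table.
FEATURE_NAMES = ("voicing", "fricative", "nasal", "place", "manner", "envelope")

TABLE_3_1 = {
    "b": ("yes", "no", "no", "bil", "plo", "vpf"),
    "d": ("yes", "no", "no", "alv", "plo", "vpf"),
    "g": ("yes", "no", "no", "vel", "plo", "vpf"),
    "w": ("yes", "no", "no", "bil", "con", "ng"),
    "j": ("yes", "no", "no", "alv", "con", "ng"),
    "ɹ": ("yes", "no", "no", "ret", "con", "ng"),
    "l": ("yes", "no", "no", "alv", "con", "ng"),
    "v": ("yes", "no", "no", "lad", "fri", "vpf"),
    "z": ("yes", "yes", "no", "alv", "fri", "vpf"),
    "ʤ": ("yes", "yes", "no", "ret", "aff", "vpf"),
    "m": ("yes", "no", "yes", "bil", "nas", "ng"),
    "n": ("yes", "no", "yes", "alv", "nas", "ng"),
    "p": ("no", "no", "no", "bil", "plo", "vlp"),
    "t": ("no", "no", "no", "alv", "plo", "vlp"),
    "k": ("no", "no", "no", "vel", "plo", "vlp"),
    "f": ("no", "yes", "no", "lad", "fri", "vlf"),
    "θ": ("no", "yes", "no", "den", "fri", "vlf"),
    "s": ("no", "yes", "no", "alv", "fri", "vlf"),
    "ʃ": ("no", "yes", "no", "ret", "fri", "vlf"),
    "ʧ": ("no", "yes", "no", "ret", "aff", "vlp"),
}


def feature_map(name):
    """Return {consonant: category} for one feature column of the table."""
    col = FEATURE_NAMES.index(name)
    return {c: values[col] for c, values in TABLE_3_1.items()}


def classes(name):
    """Group consonants by category for one feature."""
    groups = {}
    for consonant, category in feature_map(name).items():
        groups.setdefault(category, []).append(consonant)
    return groups


for name in FEATURE_NAMES:
    print(name, {cat: len(members) for cat, members in classes(name).items()})
```

Used as the category mapping when pooling a 20 × 20 confusion matrix, `feature_map("voicing")` would reduce it to the 2 × 2 voiced/voiceless matrix on which voicing transmission is computed.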

The aim of the inferential statistical analysis for each of the four experiments reported in subsequent chapters was to determine the effect of one or more independent variables, and their interactions, on the seven dependent measures derived from the confusion matrices. The independent variables were either categorical (as in experiments 2 and 3, and all variables apart from channel interaction in experiment 4) or had a small number of possible values (5 in experiment 1, and 3 for channel interaction in experiment 4). Given that each listening condition generated several dependent variables, the appropriate statistical technique was considered to be multivariate analysis of variance (MANOVA).
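To make concrete how MANOVA treats the dependent variables jointly, the sketch below computes Wilks' lambda for a one-way design in plain Python: the determinant of the within-groups sums-of-squares-and-cross-products (SSCP) matrix E divided by that of E + H, where H is the between-groups SSCP matrix. Values near 1 indicate little separation between the group mean vectors and values near 0 indicate strong separation. This is an illustration of one test statistic only: MANOVA software such as SPSS also reports Pillai's trace, Hotelling's trace and Roy's largest root with approximate F tests, which are omitted here, and the function names are hypothetical. E must be non-singular, which requires more observations than dependent variables.

```python
def mean_vector(rows):
    """Column means of a list of equal-length observation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sscp(rows, centre):
    """Sums of squares and cross-products of rows about `centre`."""
    p = len(centre)
    m = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [a - b for a, b in zip(r, centre)]
        for i in range(p):
            for j in range(p):
                m[i][j] += d[i] * d[j]
    return m


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n = len(a)
    result = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[pivot][k] == 0:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            result = -result
        result *= a[k][k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
    return result


def wilks_lambda(groups):
    """Wilks' lambda = det(E) / det(E + H) for a one-way MANOVA.

    `groups` is a list of groups; each group is a list of observation
    vectors with one value per dependent variable.
    """
    grand = mean_vector([r for g in groups for r in g])
    p = len(grand)
    e = [[0.0] * p for _ in range(p)]  # within-groups SSCP
    h = [[0.0] * p for _ in range(p)]  # between-groups SSCP
    for g in groups:
        gm = mean_vector(g)
        w = sscp(g, gm)
        d = [a - b for a, b in zip(gm, grand)]
        for i in range(p):
            for j in range(p):
                e[i][j] += w[i][j]
                h[i][j] += len(g) * d[i] * d[j]
    t = [[e[i][j] + h[i][j] for j in range(p)] for i in range(p)]
    return det(e) / det(t)


g = [[1.0, 2.0], [3.0, 3.0], [2.0, 6.0]]
shifted = [[x + 10.0, y + 10.0] for x, y in g]
print(wilks_lambda([g, g]))        # identical group means
print(wilks_lambda([g, shifted]))  # well-separated group means
```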

For experiment 1, it was also important to test the hypothesis that transmission of voicing and manner would exceed that of place. This hypothesis required a direct comparison between feature transmission values, and therefore a single-factor repeated measures ANOVA was used in this case, with consonant feature as the only factor. However, this was the only use of this approach, as the direct comparison between consonant features was considered less important than assessing the effect of the various independent variables on the transmission of specific features. As the different feature transmission values represented multiple dependent variables, it was deemed appropriate to use MANOVA rather than a series of separate ANOVAs on each feature (the latter approach would have increased the probability of type I errors). Because of the larger number of independent variables in experiments 2 and 4, the number of degrees of freedom for many of the MANOVAs was low (1 or 2), which increases the probability of type II errors. In general, the approach taken in the design of the experimental work was to err on the side of minimising type I errors, at the cost of an increased probability of type II errors. As a result, statistically significant results can be interpreted more conclusively than might otherwise be the case.
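Two pieces of arithmetic underlie this paragraph, sketched below in Python with hypothetical function names. The first is the F ratio of a single-factor repeated measures ANOVA, in which between-subject variance is removed before the feature effect is tested against the subject × feature residual, giving (k - 1, (k - 1)(n - 1)) degrees of freedom for k features and n subjects; no sphericity correction is applied here. The second is the familywise type I error rate for m separate tests at level α, 1 - (1 - α)^m. That formula holds only for independent tests, and correlated feature scores would give a somewhat different figure, so it serves as an illustration of the risk rather than an exact value.

```python
def rm_anova_f(scores):
    """Single-factor repeated measures ANOVA.

    `scores[s][k]` is subject s's value at level k of the within-subject
    factor (e.g. consonant feature). Returns (F, df_effect, df_error).
    """
    n = len(scores)     # subjects
    k = len(scores[0])  # levels of the within-subject factor
    grand = sum(map(sum, scores)) / (n * k)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((sum(row) / k - grand) ** 2 for row in scores)
    level_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_levels = n * sum((m - grand) ** 2 for m in level_means)
    ss_error = ss_total - ss_subjects - ss_levels  # subject x level residual
    df_effect, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_levels / df_effect) / (ss_error / df_error)
    return f, df_effect, df_error


def familywise_error(alpha, m):
    """P(at least one false positive) across m independent tests at alpha."""
    return 1 - (1 - alpha) ** m


scores = [[1, 2, 3], [2, 3, 5], [3, 5, 6]]  # invented: 3 subjects x 3 features
print(rm_anova_f(scores))
print(round(familywise_error(0.05, 6), 3))
```

For six separate feature ANOVAs at α = 0.05, `familywise_error(0.05, 6)` is about 0.26, compared with 0.05 for a single test.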

ANOVA and MANOVA, like other parametric statistical tests, assume that the data are normally distributed. For each experiment, this assumption was checked by applying the Kolmogorov-Smirnov (K-S) test to each variable. The K-S test assesses whether the distribution of a variable deviates significantly from the normal distribution. As noted in the relevant sections, the great majority of the variables in each of the four experiments did not give a significant result on this test, i.e. they were consistent with a normal distribution. In a few cases (noted in the relevant sections), distributions were skewed where the mean approached 100%, reflecting ceiling effects. However, the F test used in MANOVA is considered robust to skewed distributions (Howell, 2003), though not to outliers, which did not occur in these data; moreover, skew affected only a small number of variables. Consequently, MANOVA was assumed to be appropriate from this point of view.
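The K-S statistic itself is simple to compute, as the sketch below shows: D is the largest vertical distance between a variable's empirical cumulative distribution and a normal distribution with the same mean and standard deviation. This is an illustration rather than the SPSS procedure, and the function names are hypothetical. Because the mean and SD are estimated from the same sample, classical K-S p-values are too large (the test is conservative), so a Lilliefors correction is normally applied before judging significance; the sketch stops at D.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_statistic_normal(values):
    """One-sample K-S statistic D against a normal fitted to the sample.

    D is the largest gap between the empirical CDF and the fitted normal
    CDF, checked just above and just below each step of the empirical CDF.
    """
    xs = sorted(values)
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mean, sd)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


symmetric = [-2, -1, 0, 1, 2]
ceiling = [80, 95, 98, 100, 100, 100, 100]  # invented scores near 100%
print(ks_statistic_normal(symmetric), ks_statistic_normal(ceiling))
```

A ceiling-limited variable such as the invented scores above gives a noticeably larger D than a symmetric set of values.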
