Keating (2010) - New methods in the analysis of pitch range

CHAPTER 2: ACOUSTIC ANALYSES OF PITCH RANGE

2.4 New methods in the analysis of pitch range

2.4.3 Keating (2010)

The model proposed by Keating and her collaborators, Jason Bishop and Grace Kuo, is based on the assumption that voice quality is a cue to location in F0 range.

‘If voice quality is useful for recovering the location of an F0 in an individual speaker’s own range, it means there is a sufficiently salient relationship between a value on acoustic parameter X and a speaker’s location in her own individual F0 range, such that value Y on acoustic parameter X indicates location Z in range’

(Bishop and Keating, 2010: 115).

Voice quality can be measured acoustically from the spectrum by calculating the relative amplitudes of the harmonics of the voice source. To do this, one needs to estimate the formant frequencies and correct the harmonic amplitudes accordingly. Harmonics are numbered upwards from the first harmonic (H1), whose frequency equals F0, to the second harmonic (H2), the third harmonic (H3) and so on. The harmonics nearest the formants (F1, F2, F3 and so forth) are called A1, A2, A3 and so forth. Bishop and Keating (2010) make both direct and indirect use of voice quality to determine a speaker-specific F0 range.
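As an illustration of how a harmonic amplitude difference such as H1-H2 can be read off a spectrum, the following minimal sketch picks the spectral peak near F0 and near 2 x F0 and subtracts the two values in dB. It computes the uncorrected measure only: the formant-based correction marked by an asterisk (H1*-H2*) is not implemented. The function names, the search bandwidth and the synthetic test signal are assumptions made for the example, not part of Bishop and Keating's procedure.

```python
import numpy as np

def harmonic_amplitude_db(signal, sr, target_hz, search_frac=0.1):
    """Peak spectral magnitude in dB within +/- search_frac of target_hz."""
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    lo = target_hz * (1 - search_frac)
    hi = target_hz * (1 + search_frac)
    band = (freqs >= lo) & (freqs <= hi)
    return 20 * np.log10(spectrum[band].max())

def h1_minus_h2(signal, sr, f0):
    """Uncorrected H1-H2: amplitude of the first minus the second harmonic (dB)."""
    h1 = harmonic_amplitude_db(signal, sr, f0)
    h2 = harmonic_amplitude_db(signal, sr, 2 * f0)
    return h1 - h2

# Synthetic "voice": two harmonics, the second at half the amplitude of the first.
sr = 16000
t = np.arange(1600) / sr          # 100 ms of signal
f0 = 200.0
signal = 1.0 * np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)

h1h2 = h1_minus_h2(signal, sr, f0)
print(f"H1-H2 = {h1h2:.2f} dB")   # about 6 dB, since 20*log10(1.0/0.5) is roughly 6.02
```

On real speech, F0 would first have to be estimated by a pitch tracker, and the harmonic amplitudes corrected for the influence of the formants before comparison.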

The direct method consists in correlating F0 directly with H1*-H2*³. A study by Iseli et al. (2007), also cited by Bishop and Keating (2010), shows that a low value of H1*-H2* corresponds to a low F0 in the sample. Thus, a correlation can be established between F0 and H1*-H2* values. This also implies that any parameter of voice quality ‘varies along a speaker’s F0 range, and it does so more reliably than does with F0 across speakers’ (Bishop and Keating, 2010: 115).
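The relation between F0 and H1*-H2* within one speaker can be quantified with a correlation coefficient. The sketch below assumes Pearson's r as the measure of association and uses invented values for a single hypothetical speaker; neither the choice of coefficient nor the numbers come from Iseli et al. (2007) or Bishop and Keating (2010).

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

# Invented measurements for one hypothetical speaker (illustration only):
# F0 in Hz and the corresponding corrected H1*-H2* values in dB.
f0_hz = [110, 125, 140, 160, 180, 200]
h1h2_db = [-1.0, 0.5, 1.5, 3.0, 4.0, 6.5]

r = pearson_r(f0_hz, h1h2_db)
print(f"r = {r:.2f}")  # strongly positive: low H1*-H2* goes with low F0
```

Computing the coefficient separately for each speaker, rather than over pooled data, matches the idea that the relation holds along a speaker's own range.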

The indirect method is based on the idea that voice quality is a cue for listeners, who compare the F0 range of a speaker to the reference level they are accustomed to. It is probable that subjects make decisions about F0 range ‘not directly by way of indicating the location in a given speaker’s own range per se, but more indirectly by helping to identify the sex of the speakers’ (Bishop and Keating, 2010: 116). For example, H1-H2 values are reported to be higher for females than for males (Henton and Bladen, 1985; Klatt and Klatt, 1990). The identification of male vs. female voices is in turn based on a series of factors including F0 range, formant frequencies and other voice properties (Kreiman and Sidtis, 2011). Bishop and Keating (2012) aim to estimate listeners’ skill in locating an F0 within a speaker-specific F0 range and to examine which factors contribute to identifying speaker sex. To this end, they calculate F0; cepstral peak prominence (CPP); the amplitude difference between the first and second harmonics (H1*-H2*), a correlate of open quotient (OQ); the first, second and third formants (F1, F2 and F3); spectral tilt (H1*-A1* and H1*-A3*); and a measure of high-pitched voice quality, characteristic of falsetto (H2*-H4*). In the perceptual experiment, listeners were asked to locate a token in a speaker’s individual range. The results were analyzed with respect to the following factors: (a) listener’s language; (b) speaker’s sex; (c) F0 of the token; (d) measures of voice quality. It was found that ‘the greatest predictor of listeners’ judgment of F0 location was F0 itself’ and ‘listeners have separate expectations about F0 ranges for each of the sexes’ (Bishop and Keating, 2010: 137). Since F0 itself was the primary predictor, voice quality measures proved less significant than previously thought, with the exception of the H2*-H4* parameter, which was considered crucial for distinguishing male from female voices.

³ Formants boost harmonic amplitudes, so the harmonic amplitudes must be corrected to obtain reliable values. They are corrected for formant frequency and estimated bandwidth, as indicated by an asterisk, e.g. H1*, A1* etc. (Bishop and Keating, 2010).

The approach presented in Keating and Kuo (2010) is quite different from the method used in Bishop and Keating (2010). In their comparison of F0 in English and Mandarin, Keating and Kuo (2010) set aside the analysis of voice quality measures in favor of a very detailed study of F0 properties. They based their experiment on two different methods: (1) the cepstral plus manual method and (2) the semi-automated STRAIGHT method.

The cepstral plus manual method used the cepstral pitch tracker in the PCQuirer/Pitchworks program. Since this program occasionally makes errors in unvoiced sounds and creaky intervals, or returns missing values, a substantial correction of pitch-tracking errors was needed. Pitch setting parameters had to be adjusted, and some F0 values were calculated directly from the waveform by means of a formula. The method was found to be reliable but definitely time-consuming. For this reason, only a small selection of the corpus materials was analyzed with the cepstral plus manual method.
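A common way of obtaining F0 by hand from a waveform is to measure the duration of one or more glottal cycles and take the reciprocal of the period, F0 = 1/T. The sketch below assumes this standard relation; it is not confirmed to be the exact formula applied by Keating and Kuo (2010), and the function names and time stamps are invented for the example.

```python
def f0_from_period(period_s):
    """F0 in Hz from the duration of a single cycle in seconds (F0 = 1/T)."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s

def f0_from_cycle_marks(t_start_s, t_end_s, n_cycles=1):
    """Average F0 in Hz over n_cycles cycles marked between two time points."""
    duration = t_end_s - t_start_s
    if duration <= 0 or n_cycles < 1:
        raise ValueError("need a positive duration and at least one cycle")
    return n_cycles / duration

# One cycle lasting 5 ms corresponds to 200 Hz.
single = f0_from_period(0.005)

# Five cycles marked between 1.250 s and 1.300 s also average 100 Hz per 10 ms cycle.
averaged = f0_from_cycle_marks(1.250, 1.300, n_cycles=5)
print(single, averaged)
```

Averaging over several cycles reduces the effect of small errors in placing the cycle boundaries, which is useful in creaky stretches where periods are irregular.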

The semi-automated STRAIGHT method relies on the STRAIGHT algorithm as used in VoiceSauce, a recent application for voice analysis (Shue et al., 2009, 2011). VoiceSauce has been developed by a group of linguistics and electrical engineering researchers at the University of California, Los Angeles. This program gives automated voice measurements over time from audio recordings and computes a number of voice measures⁴, including automatic corrections for formant frequencies and bandwidths. In particular, VoiceSauce allows entirely automatic measurement of F0 values at 1 ms intervals. The algorithm and the specific corrections incorporated in VoiceSauce make it possible to minimize pitch tracking errors, since STRAIGHT finds ‘very low F0 values directly from the waveform’ (Keating and Kuo, 2010).

In the experiment, some of the measures reviewed by Baken and Orlikoff (2000) were adopted, together with the most extreme F0 values for each speaker. Table 6 lists all the measures used in the study by Keating and Kuo (2010).

⁴ Measures computed by VoiceSauce include: F0 from the STRAIGHT (Kawahara et al., 1999), Snack Sound Toolkit (Sjölander, 2004) and Praat (Boersma and Weenink, 2008) algorithms; harmonic measures, both corrected (*) and uncorrected: H1-H2 (*), H1-A1 (*), H1-A2 (*), H1-A3 (*), H2-H4 (*); formants and bandwidths (F1, F2, F3, F4, B1, B2, B3, B4); energy; subharmonic to harmonic ratio; cepstral peak prominence; harmonic to noise ratios.

Table 6. Inventory of F0 measures calculated in Keating and Kuo’s STRAIGHT method (2010: 172).
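To give a concrete idea of how summary measures of this kind can be derived from a frame-by-frame F0 track, the sketch below computes a few typical statistics, including the extreme values and the range in semitones. The selection of measures is an assumption made for illustration and does not reproduce the inventory of Table 6; the function name and the sample track are likewise invented.

```python
import numpy as np

def f0_summary(f0_hz):
    """Summary statistics for an F0 track; frames with 0 or NaN count as unvoiced."""
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0[np.isfinite(f0) & (f0 > 0)]
    if voiced.size == 0:
        raise ValueError("no voiced frames in the track")
    f0_min, f0_max = voiced.min(), voiced.max()
    return {
        "mean_hz": float(voiced.mean()),
        "median_hz": float(np.median(voiced)),
        "sd_hz": float(voiced.std(ddof=1)) if voiced.size > 1 else 0.0,
        "min_hz": float(f0_min),
        "max_hz": float(f0_max),
        "range_hz": float(f0_max - f0_min),
        # 12 semitones per octave, i.e. per doubling of frequency
        "range_semitones": float(12 * np.log2(f0_max / f0_min)),
    }

# Invented track: zeros and NaN stand for unvoiced or missing frames.
track = [0, 100, 120, 150, float("nan"), 180, 200, 0]
stats = f0_summary(track)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Expressing the range in semitones as well as in Hz makes values comparable across speakers with different overall pitch levels, such as male and female speakers.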

The results of the Keating and Kuo (2010) study, obtained with the semi-automated STRAIGHT method, showed a significant size effect for sex in English vs. Mandarin. By contrast, no significant effect was found with the cepstral plus manual method.

In document The Pitch Range of Italians and Americans. A Comparative Study (Page 83-86)