6.1 Spectral dynamic feature extraction

6.1.2 Trajectory modelling

Each parameter trajectory, computed through the voiced region in the vowel centre of the natural speech recordings, was modelled using a polynomial. The polynomial coefficients were chosen to minimise the mean squared error between the model and the parameter trajectory. The order of the polynomial was found to be important. As illustrated in Fig. 6.3, a low order polynomial does not capture enough of the variation in the data, whereas over-modelling with an excessively high order polynomial introduces variations in the trajectory that do not relate to the underlying data. Polynomials of order 4 were found to produce consistently good trajectory models.

The number of data points included in the model was also found to be important. Empirically, 9 data points gave an effective polynomial approximation, which corresponds to employing 9 pitch periods in the trajectory model. Fitting too many data points, covering a longer time span, had a negative impact on results due to over-modelling in regions distant from the join. Fitting too few points did not effectively represent the spectral changes about the join. The polynomial models computed for data points over 5, 9 and 15 pitch periods are illustrated in Fig. 6.4; each produces a different model and hence a different estimate of the spectral dynamic behaviour at the unit boundary. Note in particular the difference in slope at the centre data point, where the derivatives are to be computed.
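The fitting step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: it assumes NumPy's least-squares `polyfit`, the function names `fit_trajectory` and `slope_and_curvature` are hypothetical, and the trajectory values are synthetic. Time is measured from the centre data point, an assumption made here only to keep the fit numerically well conditioned.

```python
import numpy as np

def fit_trajectory(times, values, order=4):
    """Least-squares polynomial fit; returns the model and the time origin used."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    # Measure time from the centre point so the fit stays well conditioned.
    t0 = times[len(times) // 2]
    coeffs = np.polyfit(times - t0, values, order)  # minimises the squared error
    return np.poly1d(coeffs), t0

def slope_and_curvature(model, t0, t):
    """First and second time derivatives of the fitted polynomial at time t."""
    tau = t - t0
    return model.deriv(1)(tau), model.deriv(2)(tau)

# Synthetic trajectory (not data from this work): 9 points, one per 5 ms period.
times = 0.005 * np.arange(9)
tau = times - times[4]
values = 500.0 + 2000.0 * tau - 3.0e4 * tau**2  # frequency in Hz

model, t0 = fit_trajectory(times, values, order=4)
slope, curvature = slope_and_curvature(model, t0, times[4])
```

Changing `order` or the number of points passed to `fit_trajectory` changes the fitted model and therefore the slope estimated at the centre point, which is the effect illustrated in Figs. 6.3 and 6.4.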

The first and second derivatives with respect to time were computed at the unit boundaries from the estimated polynomials and were employed to represent the spectral dynamics of the corresponding units. As a baseline for comparison, delta coefficients were also computed. The delta coefficients were computed using pitch synchronous analysis. A window length of one pitch period was employed to extract the features in the two pitch periods preceding the unit edge. The delta coefficients were generated by computing the difference between the feature vectors. Pitch synchronous feature extraction with a window length of one pitch period was found to be the optimum strategy for the task of detecting discontinuities with static spectral measures on the test database. Note that no attempt was made to select parameters to improve the performance of the delta coefficients.

[Figure 6.3 plot. Axes: Time [s], Frequency [Hz]. Legend: actual trajectory points; polynomial order 2; polynomial order 4; polynomial order 10. Annotation: overfitting with high order polynomial.]

Figure 6.3: Plot of spectral feature (LSF) varying with respect to time, with the estimated polynomial models to fit the data. This illustrates the impact of the polynomial order in fitting a polynomial model to the data points.

[Figure 6.4 plot. Axes: Time [s], Frequency [Hz]. Legend: actual trajectory points; polynomial, 5 points; polynomial, 9 points; polynomial, 15 points. Annotation: data point at which slope is to be estimated.]

Figure 6.4: Plot of spectral feature (LSF) varying with respect to time, with the estimated polynomial models to fit the data. This illustrates the impact of the number of data points used to compute the polynomial model.

[Figure 6.5 plot. Panels: speech waveform; LSF parameters with polynomial estimate and actual trajectory; spectrogram. Axes: Time [s], Frequency [Hz].]

Figure 6.5: Illustrating the LSF trajectories in the vowel centre of the word ‘went’; the speech waveform, the computed LSF parameters, the polynomial fit and the spectrogram are shown.
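The baseline delta coefficients can be sketched as a simple difference between the feature vectors of the two pitch periods preceding the unit edge. The function name `delta_at_unit_edge` and the array layout are assumptions made for illustration. The text does not state whether the difference was normalised by the pitch period, so a plain difference is assumed here.

```python
import numpy as np

def delta_at_unit_edge(frames):
    """Delta coefficients from pitch synchronous feature vectors.

    frames: array of shape (n_periods, n_coeffs), one feature vector per
    pitch period, ordered in time, with the last row ending at the unit edge.
    """
    frames = np.asarray(frames, dtype=float)
    if frames.ndim != 2 or frames.shape[0] < 2:
        raise ValueError("need feature vectors from at least two pitch periods")
    # Difference between the two pitch periods preceding the unit edge
    # (assumption: no division by the pitch period duration).
    return frames[-1] - frames[-2]

# Synthetic feature vectors (not data from this work).
example_frames = [[1.0, 2.0], [1.5, 1.0], [2.0, 0.5]]
example_delta = delta_at_unit_edge(example_frames)
```

Because the difference spans only two adjacent pitch periods, it reflects change over a much shorter time span than a polynomial fitted over 9 pitch periods.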

6.2 Results

Each of the spectral dynamic measures was tested on the task of detecting discontinuities in the test database. In this section the results are presented for each candidate measure of spectral dynamics, represented by the AUC value computed for each measure. The results are presented in Table 6.1. The dynamic measures evaluated from the polynomial based derivatives are denoted by dx/dt and d²x/dt², the static measures by x and the delta features by ∆x. The Euclidean distance was used to quantify the degree of mismatch between feature vectors. The choice of metric did not significantly influence the results.

[Figure 6.6 plot. Axes: P_FA, P_H. Legend: MFCC; MFCC delta coefficients; MFCC 1st derivative; MFCC 2nd derivative.]

Figure 6.6: Comparing the ROC curves for MFCC measures from standard features, delta features and from the 1st and 2nd derivative trajectory based measures.
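The Euclidean mismatch between the feature vectors on either side of a join can be sketched as follows. The function name `join_mismatch` and its argument names are hypothetical; the vectors may be static features, delta features or polynomial based derivatives.

```python
import numpy as np

def join_mismatch(left_features, right_features):
    """Euclidean distance between the feature vectors either side of a join."""
    left = np.asarray(left_features, dtype=float)
    right = np.asarray(right_features, dtype=float)
    if left.shape != right.shape:
        raise ValueError("feature vectors must have the same shape")
    return float(np.linalg.norm(left - right))

# Synthetic vectors (not data from this work).
example_distance = join_mismatch([3.0, 0.0], [0.0, 4.0])
```

A larger distance indicates a larger mismatch at the join, and these distances serve as the scores from which the detection results are computed.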

Features    x         ∆x        dx/dt     d²x/dt²
MFCC        0.77458   0.52688   0.70860   0.65568
LSF         0.74238   0.50997   0.70801   0.67225

Table 6.1: Comparison of results for each candidate measure of spectral dynamics. The table entries indicate the AUC value for each of the measures.
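For reference, an AUC value of the kind reported in Table 6.1 can be computed from per-join distance scores and perceptual labels. The sketch below uses the rank based (Mann-Whitney) formulation, which is an assumption: the text does not state how the AUC values were calculated. The function name `auc_from_scores` and the example scores are hypothetical.

```python
import numpy as np

def auc_from_scores(scores, is_discontinuous):
    """Area under the ROC curve from distance scores and binary labels.

    Equals the probability that a randomly chosen discontinuous join has a
    larger score than a randomly chosen continuous join (ties count half).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_discontinuous, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one join of each class")
    # Compare every discontinuous join with every continuous join.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Synthetic scores (not data from this work).
perfect_auc = auc_from_scores([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
chance_auc = auc_from_scores([0.9, 0.2, 0.8, 0.3], [1, 1, 0, 0])
```

A measure that ranks every discontinuous join above every continuous one gives an AUC of 1, while a measure whose scores carry no information about the labels gives a value near 0.5.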

The highest AUC value among the spectral dynamic measures, 0.70860, was obtained from the first derivative of the MFCC trajectories. The AUC values obtained for the delta coefficients were just above pure chance, which corresponds to an AUC value of 0.5. The low detection rates for delta coefficients may be due to sensitivity to noise in numerical differentiation, or to the measure reflecting spectral change on a shorter time scale, due to shorter window lengths, which may not be relevant for the detection of discontinuities. The spectral dynamic measures based on the second derivatives were also found to correlate with human perception; these results are also contained in Table 6.1. The results suggest that dynamic measures correlate with human perceptual results provided the features are appropriately extracted. The choice of LSFs or MFCCs for the dynamic measures had little impact on the results: the two are statistically equivalent for each type of dynamic measure with respect to the standard error of the AUC values presented. The same patterns emerge for both parameter sets. LSFs performed better than MFCCs for the measures based on the second derivative, although the two are still statistically equivalent. ROC curves comparing the performance of the dynamic measures and standard MFCCs are illustrated in Fig. 6.6.

It was found that adopting a longer window yielded better results with the proposed spectral dynamic measures. This is counter to the static measures, for which a single pitch period was found to yield the best results. Longer window lengths gave rise to less variation between successive spectral estimates and hence to smoother trajectories, which were found to yield spectral dynamic measures with higher AUC values. It is significant that different feature extraction parameters are required for static and dynamic measures.