4.1 Summary
This chapter will begin by presenting the simulation results, and the data analysis will follow. Simulation or analysis checks will also be included to lend credence to the accuracy of the procedure and final results. Detailing this process is important because there is an unexpected result that initially appears to be incorrect. The chapter will conclude with a brief look at the templates created from the known data, and present the results of classifications performed with modified templates that increase the probability of correct classification. For all simulations, the poles of the AR(p) processes will be placed inside the unit circle for stability.
4.2 Simulation Results
The initial simulation was kept very simple for two reasons. The first reason was to be able to validate the algorithm by checking expected results It is expected that the the Itakura-Saito would outperform the RPLF in the absence of noise or channel distortions. This is because the simulation data is independent and identically distributed WSS Gaussian noise being filtered by a linear all-pole filter. Therefore, the output will be Gaussian. Since
the Itakura-Saito is derived from an asymptotic equivalent of the likelihood function for a Gaussian random process it should outperform the RPLF. It is also expected that when the poles of the two possible filters are right on top of each other, the probability of correct classification is one half, Pcc = 0.5.
This is expected because then both processes are generated from exactly the same filter, so classification is a “50-50 shot”. The second reason the initial simulation was kept simple is so there are not any assumptions made that may affect the results, whether known or unknown. Simple simulations utilizing WSS Gaussian noise and a simple filter ensures the exclusion of anything unknown in the data.
The first simulation was performed using the procedure presented in Figure 3.1. Figure 4.1a shows the PSDs used in the test statistics for this simulation with Figure 4.1b an example periodogram for the data to be classified. Figure 4.2 shows the output of the simulation with the number of samples equal to 250. The frequency f0 is the frequency of the poles for the second filter, the filter where the poles are fixed.
(a) f0 = 0.25, 0.3, radius = 0.9 (b) f0= 0.3,, radius = 0.9
Figure 4.1: Power spectral densities for initial simulations, (4.1a) for the all-pole filters, (4.1b) for one instance of the AR(p) process.
Figure 4.2: Simulation result AR(2), f0 = 0.3, radius = 0.9, N= 250
This result are exactly as expected, the Itakura-Saito outperformed the RPLF.
Also, the probability of correct classification is one half when the poles are
exactly the same. The RPLF did very well, giving hope that it will outperform the Itakura-Saito when noise or channel distortion is added. However, it is difficult to differentiate between the two classifiers because there is a 100%
probability of correct classification by the time the two peaks are separated by a frequency of 0.03. In order to get results that may better distinguish between thee classifiers the number of samples was decreased to 50 and the simulation was run again, results presented in Figure 4.3.
Figure 4.3: Simulation result AR(2), f0 = 0.3, radius = 0.9, N= 50
Restricting the number of samples to fifty yielded unexpected results: in the top right corner of the figure, when the pole crosses over around 0.34 the RPLF starts to outperform the Itakura-Saito. The next step was to determine if the algorithm or the data was responsible for the unexpected results. The simulation
was repeated and included the exact Gaussian classifier, equation (2.4), since the expected result of the exact Gaussian classifier is the true upper bound.
This simulation is presented in Figure 4.4. This result is as expected, the exact Gaussian outperforms both the RPLF and the Itakura-Saito, indicating the data is generated correctly and the algorithm is functioning correctly.
Figure 4.4: Simulation result AR(2), f0 = 0.3, radius = 0.9, N= 50
To further analyze the problem, if the number of samples was increased would the Itakura-Saito approach the exact Gaussian? For the next simulation the number of samples was set to 500 and results are presented in Figure 4.5.
Figure 4.5: Simulation result AR(2), f0 = 0.3, radius = 0.9, N= 500
As the number of samples increases the Itakura-Saito does approach the exact Gaussian. This result also indicates the algorithm and the data are correct, so why does the RPLF outperform the Itakura-Saito? To investigate this further, a simulation using an AR(1) process was completed since it is even simpler then the AR(2) process employed so far. An AR(1) process has a linear filter with a single pole where,
x[n] = u[n] − a[1]x[n − 1];
H(f ) = 1
1 + a[1]e−j2πf Px(f ) = σu2
|1 + a[1]e−j2πf|2
(4.1)
The AR(1) filter has a single pole and for a real process, that pole must be on the real axis. In this simulation, the first filter had a fixed pole location at a[1] = −0.8 while the pole from the second filter was swept from −1 < a[1] < 1.
The output of the simulation is shown in Figure 4.6.
Figure 4.6: Simulation result AR(1), a[1] = 0.8, radius = 0.9, N= 50
It is interesting to note how the Itakura-Saito drops off dramatically as the pole gets closer the unit circle. Due to the simplicity of an AR(1) process, it is possible to go back to first principles and derive the Itakura-Saito and the exact Gaussian based only on pole location, a[1], and the number of samples, N . The difference and a divergence are calculated using this basic informa-tion. The derivation for the divergence is in appendix (A), along with results
from [14], [12]. The derivation starts with the log-pdfs version of the exact
and the asymptotic Gaussian,
ln(pA(xxx)) = −N
The PDFs were found in the desired form and are presented below. First the exact Gaussian,
Then the asymptotic Gaussian
ln(pA(xxx)) = − N
And the difference,
The Kullback-Leibler divergence is the divergence between two PDFs. For this divergence we have used the exact Gaussian and the asymptotic Gaussian as PDFs. While the asymptotic Gaussian is not a true PDF, as it does not integrate to one, the Kullback-Leibler divergence may provide some useful insight. The derivation of the result is presented in appendix (A.2). The Kullback-Leibler divergence takes the form,
∞ asymptotic Gaussian. After taking the exponential of (4.3) and (4.2) and sub-stituting them into (4.4) the result is,
= N
2 ln(1 − a2[1]) + a2[1]
(1 − a2[1])
Examining this result as |a[1]| −→ 1 the second term in the equation takes over and the divergence goes to infinity. Figure 4.7 plots the divergence for different number of samples N as |a[1]| goes from 0 −→ 1.
Figure 4.7: Divergence for the difference between asymptotic Gaussian and exact Gaussian
As discussed previously the Kullback-Leibler divergence can not be negative, however, the functions in the above figure go negative. This is explained by the fact that of the asymptotic Gaussian is not really a PDF. In order to be a true PDF, the asymptotic Gaussian would need to be normalized. Despite this limitation, the above figure shows that as the pole radius approaches the unit circle, the distance between the exact Gaussian and asymptotic Gaussian increases, especially for smaller sample sizes. This effect holds true for an AR(2) process, as presented in the following figures. Figures 4.8 - 4.11 number of samples increases, while holding the radius constant. While in Figures 4.12 - 4.14 the radius of the zeros increases while holding the number of samples constant.
Figure 4.8: Simulation results, N = 50, f0 = 0.1, radius = 0.7.
Figure 4.9: Simulation results, N = 100, f0 = 0.1, radius = 0.7.
Figure 4.10: Simulation results, N = 250, f0 = 0.1, radius = 0.7.
Figure 4.11: Simulation results, N = 500, f0 = 0.1, radius = 0.7.
Figure 4.12: Simulation results, N = 100, f0 = 0.1, radius = 0.5.
Figure 4.13: Simulation results, N = 100, f0 = 0.1, radius = 0.7.
Figure 4.14: Simulation results, N = 100, f0 = 0.1, radius = 0.9.
In order to get reliable results from the simulations utilizing an AR(2) process, the poles were kept in a location where their position did not have an adverse effect on the result. In the initial simulations, placement had a significant effect on the Itakura-Saito but the RPLF was much less affected. This is the first indication of the RPLF being more robust.
Using the procedure in Figure 3.1, many different simulations were performed utilizing different frequencies, radii and number of samples. The Itakura-Saito outperformed the RPLF when the poles were placed away from the unit circle and away from the real axis, f0 = 0 and f0 = .5. Table 4.1 summarizes the results. In Table 4.1 the frequency column is the frequency of the the fixed
pole location filter associated with the second AR(p) processes. Both of the processes had the same fixed radii, presented in the radius column.
Simulation Frequency Radius Summary
1 0.1 0.9 The Itakura-Saito performed
much worse then expected, espe-cially at N= 50 and 100 and did not perform as well as the RPLF.
2 0.1 0.7 The Itakura-Saito unperformed
compared the RPLF at N = 50 and 100.
3 0.1 0.5 The Itakura-Saito outperformed the RPLF, and got very close to the Exact Gaussian at N = 250 and 500.
4 0.2 0.9 The Itakura-Saito outperformed the RPLF in all cases except a very little at N=50. Both the Itakura-Saito and the RPLF came close to the exact Gaussian
5 0.3 0.9 As originally expected the
Itakura-Saito outperformed the RPLF and came very close to the exact Gaussian.
6 0.3 0.5 All methods performed almost
exactly the same.
7 0.4 0.9 Almost identical to simulation 1 except for a mirror image.
Table 4.1: Summary of simulations, for a AR(2) processes with no added noise
The next simulations added noise to the AR(p) process using the procedure in Figure 3.3. Table 4.2 and Figure 4.15 summarize the results.
RPLF vs Itakura-Saito with additive noise
figure f0 f1 radius realizations ≈dB ≈ DPCC ≈ DPCC crossover low SNR high SNR
4.15a 0.15 0.20 0.45 20000 12 -.05 .02
4.15b 0.28 0.30 0.85 20000 12 -.15 .05
4.15c 0.25 0.30 0.75 20000 8 -.1 .025
4.15d 0.25 0.30 0.75 100000 8 -.1 .025
Table 4.2: Summary of results for simulations
In Table 4.2 the Difference in Probability of Correct Classification (DPCC) is the Itakura-Saito probability of correct classification minus the RPLF probability of correct classification, pccIS − pccRP LF. This was calculated for the beginning and end of each plot. The first column refers to the plots in Figure 4.15.
(a) (b)
(c) (d)
Figure 4.15: Probability of correct classification vs SNR
The Itakura-Saito performed slightly better in environments with a high SNR.
The RPLF was more robust in low SNR environments.
The last set of simulations followed the procedure in Figure 3.4, adding multi-path to the AR(2) processes. Pole-zero plots and plots of the PSD for one of the ARM A(2) processes was checked to validate the algorithms.
(a) PSD for zero radius = 0.0 (b) Pole zero plot for zero radius = 0.0
(c) PSD for zero radius = 0.4 (d) Pole zero plot for zero radius = 0.4
(e) PSD for zero radius = 0.75 (f) Pole zero plot for zero radius = 0.75 Figure 4.16: PSD and pole zero plots for AR(2) ‘True AR PSD’ and and the ARM A(2) ‘Averaged ARM A PSD’s’ processes, frequency = 0.3, 10000 realiza-tions
Upon analyzing the periodograms in Figure 4.16, and the effects of the added
zeros, the algorithm appeared to be correct. Once the algorithm was verified, the simulation was performed, results are presented in Figure 4.17.
Figure 4.17: Probability of Correct Classification for an AR(2) process with added multipath.
As expected, when the zeros are near the origin and have little effect on the periodogram the Itakura-Saito outperformed the RPLF. As the “multipath”
worsened the RPLF performed better. The simulation gave the expected result of pcc = 0.5 with a pole-zero cancellation.
4.3 Data Results
The results of the simulations point toward the RPLF having greater robustness to certain spectral deformations than the Itakura-Saito. The next step was to perform the analysis with the ISOLET database to see if real data yielded the
same results.
A baseline analysis was completed with the raw data, as described in chapter 3. Results for this baseline analysis are presented in Tables 4.3 and 4.4.
Classified RPLF, pcc(L) = .8
Table 4.3: Confusion matrix for the RPLF test statistic. AR(p) , p = 12
Classified Itakura-Saito, pcc(L) = .74
XXXX
Table 4.4: Confusion matrix for the Itakura-Saito test statistic. AR(p) , p = 12
The probability of correct classification in Tables 4.3 and 4.4 is,
pcc(L) = P number of results where i = j P number of results
There are the five vowels being classified in this analysis, resulting in the tem-plate letters (Li) and observed letters (Lj) where 0 ≤ i, j ≤ 5.
Even in this baseline analysis the RPLF has a better probability of classification
than the Itakura-Saito. It should be noted, however that several factors could be responsible for this difference. Without a test of significance, it is not clear whether the observed difference between classifiers is due to individual sample variation versus a true difference in performance between the RPLF and the Itakura-Saito. The letter names for “I” and “U” are diphthongs and therefore are non-stationary. The exact location of the transition is unknown, but it is possible it is in sampled window. The RPLF outperformed the Ikaura-Saito classifying both of these letters. This may be an indication of the RPLF being a robust classifier to non-stationary data, further research needs to be done to test this hypothesis.
Other reasons for superior classification performance are, pole location, zero placement or noise. The SNR is high, 31.5dB, which means it is unlikely that noise is causing the difference. The letter “A” performed the worst of all the letters. There may be some zeros that are not accounted for in the model: a pole-zero plot of the AR parameters was completed to investigate this further and is presented in Figure 4.18.
Figure 4.18: Pole Zero plot for the letter “A”.
There are two poles very close to the unit circle at around f ≈ 0.025; however, the next pole did not occur until f ≈ 0.15. Was the model trying to account for a zero in between these two zeros? Upon examination of the other pole-zero plots, the letter “E” appeared to display the same attribute but did not suffer the same decrease in performance. The reason for the observed decrease in per-formance of the Itakura-Saito is unknown. Future work, beyond the scope of this thesis, would be required to fully investigate this.
The next analysis repeated the procedure but added independent white Gaus-sian noise. With the results presented in Tables 4.5 and 4.5.
Classified RPLF, pcc(L) = .77
Table 4.5: Confusion matrix for the RPLF test statistic with added noise. AR(p) , p=12, SNR =9dB
Classified Itakura-Saito, pcc(L) = .5
XXXX
Table 4.6: Confusion matrix for the Itakura-Saito test statistic with added noise.
AR(p) , p=12, SNR =9dB
The RPLF does a much better job of classifying when the signal is embed-ded in white Gaussian noise. The difference between the probability of cor-rect classification for no noise vs noise is represented as, pcc(L) − pcc−n(n). In our data, the difference between the RPLF with noise and without noise was 0.03 (0.80 − 0.77). For the Itakura-Saito it was 0.24 (0.74 − 0.50). Performance with added noise decreased only slightly for the RPLF, approximately 3%, but diminished by approximately 30% for the Itakura-Saito. This analysis was for only a single SNR, a later analysis would include a range of SNR.
The next analysis simulated multipath by adding a set of zeros at a fixed fre-quency and radius. With the results presented in Tables 4.7 and 4.7.
Classified RPLF, pcc(L) = .76
Table 4.7: Confusion matrix for the RPLF test statistic. AR(p) , p=12, with zero.
Classified Itakura-Saito, pcc(L) = .57
XXXX
Table 4.8: Confusion matrix for the AG test statistic. AR(p) , p=12, with zero.
The Itakura-Saito performance decreased by approximately 23% in this analysis, slightly better than in the last analysis. However it was vastly outperformed by the RPLF, for which diminished performance was nearly the same, decreasing only about 5%. The difference (pcc(L) − pcc−z(n)) for the RPLF was 0.04 (0.8 − 0.76) and for the Itakura-Saito was, 0.17 (0.74 − 0.57)
Figure 4.19 shows the results of the next analysis, the cepstrum classifier was added and included a range of SNR levels.
Figure 4.19: Probability of correct classification for RPLF, Itakura-Saito and Cepstrum classifiers, with a range of added noise.
It was expected that the cepstrum classifier would have a better probability of correct classification then the Itakura-Saito with speech data. With other types of data the Itakura-Saito may perform better. The performance of the RPLF exceeded both the Itakura-Saito and the cepstrum. The simulation looks to have run correctly, since the lowest probability of correct classification is 0.2. The 0.2 is the lowest probability of correct classification because there are five different letters and they each have an equal probability of being selected,
therefore 0.2.
The next couple of analyses added zeros, and varied either the radius or the frequency. The results of the simulation with the change in radius are presented first in Figure 4.20.
Figure 4.20: Probability of correct classification for RPLF, Itakura-Saito and Cepstrum classifiers, versus the added zero radius.
Figure 4.21 presents the results for the analysis were the frequency of the zeros was changed.
Figure 4.21: Probability of correct classification for RPLF, Itakura-Saito and Cepstrum classifiers versus the added zero frequency.
In Figure 4.20 the RPLF and the cepstrum classifiers performed about equally, however, they both outperform the Itakura-Saito. In Figure 4.21 the RPLF performed as well or better then the cepstrum, except at frequencies around 0.25 and 0.5. Again, they both outperform the Itakura-Saito.
4.4 Template modifications
In order to increase the probability of correct classification additional templates were examined. The conditional PDF of x[n] for large data records is shown in
[14], for an AR(p) process
It was also shown in [14] the maximum likelihood estimate MLE for the a[k]
coefficients are found using the covariance method. If it is assumed that we have M independent AR(p) process data, each with the same number of AR(p) parameters, the likelihood function is the product of the individual likelihood functions.
where n is the number of samples for each utterance, m is the number of utter-ances, and p is the number of a[k] coefficients. The MLE for the a[k] coefficients of many AR(p) processes is found using the covariance method, which averages the M covariance matrices and covariance vectors. Then from the averaged co-variance matrices and coco-variance vectors the a[k] coefficients can be found. The result of this analysis utilizing these templates is presented is in Figure 4.22.
Figure 4.22: Probability of correct classification for RPLF, Itakura-Saito and Cepstrum classifiers with added noise, solving a[k] using covariance method, M
= 240.
There was a minimal increase in probability of correct classification when using the covriance method, Figures 4.19 and 4.22.
Next the template data was divided into logical subgroups. Since the gender data is available, this is a natural way to subdivide the groups. Sub-grouping this way resulted in ten calculations for each classifier. The classifier took the maximum value from the ten calculations. For example for the RPLF, when
PT male(f ) is the normalized PSD male template for the letters.
and when PT f emale(f ) is the normalized power spectral density female template for the letters and 1 ≤ j ≤ 10.
and ¯I(f ) is the normalized periodogram of the observed data. The goal is that, for the AR(p) coefficients, the means will converge to the true value for each gender and the variances will decrease. Tables 4.9 and 4.10 present the means and variances for the original estimates, and the tables with the gender specified AR(p) coefficients follow in Tables 4.11 - 4.14.
Estimated mean for letter I, p = 12
AR(p) − 1 AR(p) − 2 AR(p) − 3 AR(p) − 4 AR(p) − 5 AR(p) − 6
-1.3615 0.5923 0.0320 0.0105 0.00086 0.0508
AR(p) − 7 AR(p) − 8 AR(p) − 9 AR(p)−10 AR(p)−11 AR(p)−12
0.0134 0.0427 -0.0087 0.0378 -0.0756 0.1191
Table 4.9: Estimated means of AR(12) coefficients for the letter I
Estimated variance for letter I, p = 12
AR(p) − 1 AR(p) − 2 AR(p) − 3 AR(p) − 4 AR(p) − 5 AR(p) − 6
0.0763 0.2658 0.1417 0.0975 0.0541 0.0610
AR(p) − 7 AR(p) − 8 AR(p) − 9 AR(p)−10 AR(p)−11 AR(p)−12
0.0400 0.0282 0.0226 0.0257 0.0308 0.0124
Table 4.10: Estimated variances of AR(12) coefficients for the letter I
Estimated mean for letter I, female voice, p = 12
AR(12)−1 AR(p) − 2 AR(p) − 3 AR(p) − 4 AR(p) − 5 AR(p) − 6
-1.3100 0.5750 0.0072 0.0613 0.0208 0.00017
AR(p) − 7 AR(p) − 8 AR(p) − 9 AR(p)−10 AR(p)−11 AR(p)−12
0.0054 0.0820 -0.0223 0.0303 -0.0208 0.0962
Table 4.11: estimated means of AR(12) coefficients for the letter I, spoken by a female
Estimated variance for letter I, female voice, p = 12
AR(p) − 1 AR(p) − 2 AR(p) − 3 AR(p) − 4 AR(p) − 5 AR(p) − 6
0.0631 0.2282 0.1149 0.0530 0.0356 0.0391
AR(p) − 7 AR(p) − 8 AR(p) − 9 AR(p)−10 AR(p)−11 AR(p)−12
0.0284 0.0206 0.0176 0.0207 0.0218 0.0133
Table 4.12: Estimated variances of AR(12) coefficients for the letter I, spoken by a female
Estimated mean for letter I, male voice, p = 12
AR(p) − 1 AR(p) − 2 AR(p) − 3 AR(p) − 4 AR(p) − 5 AR(p) − 6
-1.4086 0.5972 0.0794 -0.0465 -0.0211 0.0934
AR(p) − 7 AR(p) − 8 AR(p) − 9 AR(p)−10 AR(p)−11 AR(p)−12
0.0296 0.0030 0.0059 0.0420 -0.1262 0.1407
Table 4.13: Estimated means of AR(12) coefficients for the letter I, spoken by a male
Estimated variance for letter I, male voice, p = 12
AR(p) − 1 AR(p) − 2 AR(p) − 3 AR(p) − 4 AR(p) − 5 AR(p) − 6
0.0710 0.2435 0.1065 0.0966 0.0471 0.0606
AR(p) − 7 AR(p) − 8 AR(p) − 9 AR(p)−10 AR(p)−11 AR(p)−12
0.0345 0.0253 0.0187 0.0220 0.0257 0.0078
Table 4.14: Estimated Variances of AR(12) coefficients for the letter I, spoken by a male
Figure 4.23 presents the results with Table 4.15 summarizing the results
for all classifiers. The probability of correct classification in Table 4.15 is the
for all classifiers. The probability of correct classification in Table 4.15 is the