6.6 Combined feature set
6.6.2 Feature correlation
Three methods of constructing features were implemented in this chapter, independently of each other. The final combined feature set therefore consists of the values derived through all three methods. This independence of approach does not, however, produce three sets of values that are statistically independent of each other. Some correlation between certain features is expected, since all three approaches share the goal of reducing the dimensionality of the acoustic waveforms. This subsection explores the correlation between the variables.
Ideally, a scatterplot matrix plotting every feature variable against every other would be constructed to show visually the degree of similarity between features. This is the typical approach for smaller feature sets, and it gives an intuitive understanding of potential correlations between variables. In this research, however, there are 33 feature variables, as listed in Table 6.2. A scatterplot matrix of this size would contain 33 rows and 33 columns of plots (1 089 panels), which is too unwieldy to illustrate clearly or to interpret intuitively.
An alternative approach is to compute the correlation coefficient between every pair of features. Each coefficient is a normalised covariance that indicates how two features vary together, and is intended to show whether a linear relationship exists between them. In statistics, a standard approach is to use a matrix of Pearson's correlation coefficients to determine this relationship (Lee Rodgers and Nicewander, 1988).
The equation for Pearson's correlation coefficient $r_{xy}$ between any two variable distributions, commonly just called the correlation coefficient, is as follows:

$$ r_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - n\,\mu_x \mu_y}{(n-1)\,\sigma_x \sigma_y} \tag{6.6.3} $$

where $\mu_x$ and $\mu_y$ are the means of features X and Y, $\sigma_x$ and $\sigma_y$ are the sample standard deviations of X and Y, and $n$ is the number of samples in each variable distribution.
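As a minimal sketch, Equation 6.6.3 can be checked numerically in Python against NumPy's `np.corrcoef`. The function name `pearson_r` and the sample values are illustrative assumptions, not data from this research. Note that the $(n-1)$ factor in the denominator is only correct when $\sigma_x$ and $\sigma_y$ are sample standard deviations (`ddof=1` in NumPy).

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r computed in the form of Equation 6.6.3.

    sigma_x and sigma_y are *sample* standard deviations (ddof=1),
    which is what makes the (n - 1) factor in the denominator correct.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(ddof=1), y.std(ddof=1)
    numerator = np.sum(x * y) - n * mu_x * mu_y
    return numerator / ((n - 1) * sigma_x * sigma_y)

# Hypothetical feature values, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.9]

print(pearson_r(x, y))           # close to +1: strong positive linear relationship
print(np.corrcoef(x, y)[0, 1])   # NumPy's reference value for comparison
```

Both lines should print the same value, since the numerator $\sum x_i y_i - n\mu_x\mu_y$ is algebraically equal to $\sum (x_i-\mu_x)(y_i-\mu_y)$.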
Computing the correlation coefficient for every pair of features yields a matrix of 33 rows by 33 columns. Each entry of the matrix is the correlation between the two features identifying its row and column. Colour-coding the values over the range [-1, 1] allows them to be identified quickly by visual inspection. The resulting plot is given in Figure 6.14.
The diagonal of Figure 6.14 shows the perfect correlation of each feature with itself, so it is disregarded in this analysis. The first strong correlations that stand out are those of Sc with Sw, and of Sa with Sf. A higher mean in the distribution corresponds to a higher variance, which suggests that the spectrum of the acoustic signal becomes less detailed in the higher frequencies. The peakedness also correlates strongly with the skewness of the signal; both properties are influenced by the finer definition of the spectrum in the lower frequencies. This observation relates to the discussion of pitch perception, where lower frequencies are more finely distinguishable than higher frequencies.
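Discarding the diagonal and searching for the strongest pairs can be sketched as follows, assuming the features are held in a NumPy array with one row per sample and one column per feature. The helper `strongest_pairs`, the feature names and the synthetic data are hypothetical; the sketch ranks pairs by absolute correlation rather than reproducing Figure 6.14.

```python
import numpy as np

def strongest_pairs(features, names, top_k=3):
    """Rank feature pairs by |r|, ignoring the trivial diagonal.

    features: array of shape (n_samples, n_features).
    """
    r = np.corrcoef(features, rowvar=False)   # (n_features, n_features)
    # Upper triangle without the diagonal: each pair is counted once and
    # the self-correlations (always 1) are discarded.
    i, j = np.triu_indices_from(r, k=1)
    vals = r[i, j]
    order = np.argsort(-np.abs(vals))[:top_k]
    return [(names[i[m]], names[j[m]], float(vals[m])) for m in order]

# Assumption: synthetic data with two deliberately correlated pairs
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = 2.0 * a + rng.normal(scale=0.5, size=n)    # strongly positive with a
c = rng.normal(size=n)
d = -c + rng.normal(scale=0.05, size=n)        # close to negative unity with c
X = np.column_stack([a, b, c, d])
names = ["f_a", "f_b", "f_c", "f_d"]           # hypothetical feature names

for p, q, val in strongest_pairs(X, names):
    print(f"{p} vs {q}: r = {val:+.3f}")

# A 33-feature set has 33 * 32 / 2 distinct off-diagonal pairs
print(len(np.triu_indices(33, k=1)[0]))  # 528
```

Using only the upper triangle halves the work, because the correlation matrix is symmetric.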
A strong correlation can be seen between neighbouring groups of frequency bins. The presence of energy in a given frequency range is therefore not sharply separated by individual bins. This indicates that there are transitional values between neighbouring bins, which will help to provide additional detail about the frequency distribution described by these bins.
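A sketch of how the neighbouring-bin correlations could be read off the correlation matrix, assuming the frequency-bin features are stored in adjacent columns ordered by frequency. The function `adjacent_bin_correlation` and the smoothed-noise data are hypothetical: the 3-tap moving average makes neighbouring bins share energy, so adjacent columns correlate at about 2/3 while bins five apart are roughly uncorrelated.

```python
import numpy as np

def adjacent_bin_correlation(bins):
    """Correlation between each pair of neighbouring frequency-bin features.

    bins: array of shape (n_samples, n_bins), columns ordered by frequency.
    Returns the first super-diagonal of the correlation matrix, i.e.
    r(bin_k, bin_k+1) for k = 0 .. n_bins - 2.
    """
    r = np.corrcoef(bins, rowvar=False)
    return np.diagonal(r, offset=1)

# Assumption: synthetic "spectra" made by smoothing white noise across bins
# with a 3-tap moving average, so energy leaks into neighbouring bins.
rng = np.random.default_rng(1)
noise = rng.normal(size=(2000, 12))
bins = (noise[:, :-2] + noise[:, 1:-1] + noise[:, 2:]) / 3.0   # shape (2000, 10)

print(adjacent_bin_correlation(bins).round(2))   # each value near 2/3
r_full = np.corrcoef(bins, rowvar=False)
print(np.diagonal(r_full, offset=5).round(2))    # near 0 for bins five apart
```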
The strongest correlation can be seen between the mean of the signal, Sc, and the first MFCC value, MFCC1. This correlation approaches negative unity, which indicates that the higher the mean frequency of the signal, the lower the value of the first MFCC. This is a surprising correlation, and it provides additional insight into the use of the MFCC approach to describe the higher-order characteristics of signals.
The correlation matrix shows that no individual feature is redundant within the feature set: each feature describes a different aspect of the acoustic signal. This justifies the use of this feature set for training models, and the additional information provided by each feature variable is expected to improve the accuracy of the models constructed in the next chapter.
6.7 Conclusion
In this chapter, the acoustic waveforms collected from the experimental rig and from operational environments were transformed into a descriptive feature set. This feature set captures important and distinguishable characteristics of the acoustic waveforms in a vector of 33 values. This compares very favourably with the original waveform length of 4 096 data points, a reduction of roughly 124 times: the feature set effectively reduces the dimensionality of the signal. This reduced dimensionality aids in the construction, training and testing of potential classification models.
In addition, the feature set values were standardised into consistent data ranges, which is essential for the correct operation of many models.
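The exact standardisation formula is not restated here; assuming a per-feature z-score (zero mean, unit variance), a minimal sketch looks like the following. The helper names and the synthetic 33-feature data are hypothetical, and the statistics are fitted on training data only so that the same transform can later be applied unchanged to test data.

```python
import numpy as np

def fit_standardiser(train):
    """Return per-feature mean and standard deviation from training data.

    Assumption: z-score standardisation; the thesis may use another scheme.
    """
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma[sigma == 0] = 1.0   # guard against constant features (avoid divide-by-zero)
    return mu, sigma

def standardise(x, mu, sigma):
    """Z-score each feature using the training statistics."""
    return (x - mu) / sigma

# Hypothetical 33-feature vectors on very different scales
rng = np.random.default_rng(2)
train = rng.normal(loc=50.0, scale=10.0, size=(200, 33)) * np.linspace(0.01, 100, 33)

mu, sigma = fit_standardiser(train)
z = standardise(train, mu, sigma)
print("max |mean|:", np.abs(z.mean(axis=0)).max())
print("std range:", z.std(axis=0).min(), z.std(axis=0).max())
```

Fitting the mean and standard deviation on the training portion only avoids leaking information from the test data into the model.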
The successful preparation of a feature set is the final step before the training and testing of models. The intended use of the feature set will be shown in the next chapter.
Chapter 7
Classification Models Evaluated
7.1 Introduction
This chapter describes the choice of an appropriate classification model for use on the ESD. To paraphrase the No Free Lunch Theorem, there is no single ‘best’ model; each problem needs a classification system tailored to it (Wolpert and Macready, 1997). An appropriate model for this problem must therefore be selected from the pool of classification models available in practice. This chapter aims to present a classification system whose use on the ESD can be justified.
The approach used to construct models and evaluate them against each other is described first. The discussion focuses on feature resampling, the training methodology, and the performance measure used to score models.
The second part of the chapter presents the results of the model evaluations. It describes the test results for classification models grouped by common modality, after which the best result from each group is compared with the others. The top-performing approaches are then identified and further optimised until a suitable classification system is found.
Finally, the chapter describes the implementation details of the chosen classification model on the ESD platform.