Chapter 4 Materials and methods
4.6 Modelling
4.6.1 K-nearest neighbours
In K-nearest neighbour (K-NN) classification, given a set of training data points with known labels, a new (test) data point is assigned the majority rule label of its $K_n$ closest neighbours in the feature space. In this work the numbers of nearest neighbours considered were $1 \le K_n \le 11$, $K_n \in \mathbb{N}$.
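The majority vote described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in this work: it assumes Euclidean distance in the feature space, and the function name `knn_predict`, the toy data and the tie-breaking behaviour are all choices made for the example.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k):
    """Return the majority label among the k training points nearest to query."""
    # Rank training points by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    # On a tied vote, the tied label that was seen first (the one with the
    # nearest member) wins; this tie-break is an assumption of the sketch.
    return Counter(nearest_labels).most_common(1)[0][0]


# Illustrative toy data (not from the case studies): two well-separated classes.
train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["a", "a", "a", "b", "b", "b"]

# Try a few neighbourhood sizes from the 1..11 range considered in the text.
predictions = {k: knn_predict(train_points, train_labels, (0.5, 0.5), k)
               for k in (1, 3, 5)}
```

Odd neighbourhood sizes avoid tied votes in a two-class problem; with more classes, ties remain possible and the tie-breaking rule matters.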
4.6.2 Discriminant analysis
Discriminant analysis (DA) finds a set of weights such that the linear combinations of the training data vectors and weights result in maximal separation between the classes. The data are then classified according to the maximum a posteriori rule. In this work both linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) were considered.
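For reference, the maximum a posteriori rule can be written out under the usual assumption of Gaussian class-conditional densities. That assumption and the notation ($\mu_c$, $\Sigma_c$, $\pi_c$ for the mean, covariance and prior probability of class $c$) are not stated in the text above and are introduced here for illustration only.

```latex
% Assumed Gaussian class-conditional model; \mu_c, \Sigma_c and \pi_c are the
% mean, covariance and prior probability of class c (notation chosen here).
\begin{align*}
\delta_c^{\mathrm{QDA}}(\mathbf{x}) &= -\tfrac{1}{2}\ln\lvert\Sigma_c\rvert
  - \tfrac{1}{2}(\mathbf{x}-\mu_c)^{\top}\Sigma_c^{-1}(\mathbf{x}-\mu_c) + \ln\pi_c \\
\delta_c^{\mathrm{LDA}}(\mathbf{x}) &= \mathbf{x}^{\top}\Sigma^{-1}\mu_c
  - \tfrac{1}{2}\mu_c^{\top}\Sigma^{-1}\mu_c + \ln\pi_c
  \qquad (\Sigma_c = \Sigma \text{ for all } c) \\
\hat{y}(\mathbf{x}) &= \arg\max_c \, \delta_c(\mathbf{x})
\end{align*}
```

With a shared covariance matrix the score is linear in $\mathbf{x}$, with weights $\Sigma^{-1}\mu_c$, which gives linear class boundaries; allowing a separate $\Sigma_c$ per class gives quadratic boundaries.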
4.7 Performance evaluation
As discussed in section 4.3.3, the primary performance measure used to evaluate each feature extraction method and classifier combination was the error rate obtained by using the trained model to classify unseen test images:

$$\mathcal{E}_{test} = 1 - \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} \left( \mathbf{y}_p(i) == \mathbf{y}_k(i) \right)$$

where $N_{test}$ is the number of test images, $\mathbf{y}_p$ and $\mathbf{y}_k$ are vectors of length $N_{test}$ containing the predicted and known class labels for the test images, respectively, and the comparison evaluates to 1 when the two labels agree and to 0 otherwise. This performance measure can be augmented by further analyses of the results, specifically by considering confusion matrices and by performing sensitivity analyses to determine the significance of differences between error rates and of the effect that hyperparameter choices have on the error rates.
4.7.1 Confusion matrices
A confusion matrix is a convenient way to visualise classification results: for each actual class, it shows the percentage of samples that were classified into each class. As an example, consider the sample confusion matrix for a 4-class classification problem, shown in figure 4-13.
Looking for example at the third row of the confusion matrix, showing classification results for all images that actually belong to class 3: 6.7% of the images were incorrectly classified into class 1, 24.8% were incorrectly classified into class 2, 32.2% were correctly classified into class 3 and 36.2% were incorrectly classified into class 4. A perfect classifier would result in 100% (coloured black) along the diagonal and 0% (coloured white) everywhere else.
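Both the test error rate and a row-normalised confusion matrix of the kind described above can be computed directly from the predicted and known labels. The sketch below is illustrative only: the function names and the small label vectors are invented for the example and are not taken from the case studies.

```python
def error_rate(y_pred, y_known):
    """Fraction of test samples whose predicted label differs from the known label."""
    n_test = len(y_known)
    n_correct = sum(p == k for p, k in zip(y_pred, y_known))
    return 1 - n_correct / n_test


def confusion_percentages(y_pred, y_known, classes):
    """Rows are actual classes, columns are predicted classes; each entry is
    the percentage of that row's samples assigned to the column's class."""
    index = {c: j for j, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for p, k in zip(y_pred, y_known):
        counts[index[k]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A class without test samples gives a row of zeros instead of a division by zero.
        matrix.append([100 * n / total if total else 0.0 for n in row])
    return matrix


# Invented labels for a 3-class toy problem.
y_known = [1, 1, 2, 2, 3, 3, 3, 3]
y_pred = [1, 2, 2, 2, 3, 3, 1, 2]

test_error = error_rate(y_pred, y_known)
cm = confusion_percentages(y_pred, y_known, classes=[1, 2, 3])
```

Each row of `cm` sums to 100%, so the diagonal entry of a row is the percentage of that class classified correctly, matching the row-wise reading of the confusion matrix given above.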
4.7.2 Sensitivity analysis
Analysis of variance
Analysis of variance (ANOVA) is a statistical test that can be used to determine whether there are significant differences between the means of several groups, and thus generalises the t-test to more than two groups. ANOVA was used in this work to test whether the error rates obtained with the different feature extraction methods and classifiers differed significantly from each other. A 95% confidence level ($\alpha = 0.05$) was selected, so that for each effect a p-value of $p \le \alpha = 0.05$ was considered significant.
ANOVA results can only show whether specific factors have a significant influence on the error rate, but do not provide any further information regarding which of the factor levels produce significantly different error rates. For example, if one factor is 'feature set', then its levels are the specific feature sets: GLCM, wavelet, steerable pyramid, texton and LBP. If the effect of the factor 'feature set' is found to be significant, this only means that at least one of the feature sets produced a significantly different error rate than at least one other feature set.
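To make the test statistic concrete, the sketch below computes the F statistic for a one-way ANOVA, i.e. a single factor such as 'feature set'. It is a simplified illustration rather than the analysis carried out in this work: the function name and the error rates are invented, and the p-value (the upper-tail probability of the F distribution with the returned degrees of freedom) is left to a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented error rates for three factor levels, three observations each.
error_rates = [
    [0.10, 0.20, 0.30],
    [0.20, 0.30, 0.40],
    [0.50, 0.60, 0.70],
]
f_stat, df_between, df_within = one_way_anova_f(error_rates)
```

A large F statistic indicates that the level means differ by more than the within-level scatter would explain, but, as noted above, it does not say which levels differ.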
Post-hoc testing
After an ANOVA test has been performed, post-hoc tests can be carried out to determine which levels of a factor differed significantly from one another. Post-hoc testing involves performing a t-test between each pair of treatments (or between a pre-specified set of treatments), which means that multiple t-tests are carried out. When performing multiple t-tests, the overall confidence level is no longer 95%, since the probability of a type I error (false positive) increases with each additional test performed. Post-hoc tests therefore incorporate corrections to the $\alpha$ of each individual t-test so that the overall confidence level remains at 95%.
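The growth of the overall type I error probability can be illustrated with a short calculation. It assumes that the individual tests are independent, which is a simplification made for this illustration, and it uses $m$ for the number of t-tests; the value $m = 10$ corresponds to comparing all pairs of five factor levels, such as the five feature sets.

```latex
% Assumes m independent tests, each carried out at level \alpha
% (independence is an assumption of this illustration).
\begin{align*}
P(\text{at least one type I error}) &= 1 - (1 - \alpha)^{m} \\
\alpha = 0.05,\; m = \binom{5}{2} = 10: \qquad 1 - 0.95^{10} &\approx 0.40
\end{align*}
```

Without a correction, ten tests at the 5% level would therefore carry roughly a 40% chance of at least one false positive under this assumption.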
Figure 4-13: Sample confusion matrix for a 4-class classification problem
The Bonferroni post-hoc test (Dunn, 1961) was used to determine the significance of differences between levels of the 'feature set' factor. This post-hoc test involves a simple correction for $\alpha$:

$$\alpha_{adj} = \frac{\alpha}{m}$$ (4-9)

According to equation (4-9) the original $\alpha$ is divided by the number of t-tests ($m$) to yield the adjusted $\alpha_{adj}$ for each individual test. Equivalently, the Bonferroni post-hoc test employed in this work multiplies the p-value obtained in each individual t-test by $m$, so that this new p-value can be compared to the original $\alpha$:

$$p_{adj} = m \, p$$

Therefore, the Bonferroni post-hoc test shows that there is a significant difference between the two factor levels tested when $p_{adj} \le \alpha$ for that t-test.
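The adjusted p-value form of the Bonferroni correction is simple to sketch. The function name and the p-values below are invented for illustration; capping the adjusted p-value at 1 is a common convention assumed here and is not stated in the text.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and their significance at level alpha."""
    m = len(p_values)  # number of t-tests carried out
    # Multiply each p-value by m; the cap at 1 is an assumed convention.
    p_adj = [min(1.0, m * p) for p in p_values]
    significant = [p <= alpha for p in p_adj]
    return p_adj, significant


# Invented p-values from three pairwise t-tests.
raw_p = [0.01, 0.02, 0.04]
p_adj, significant = bonferroni(raw_p)

# The same decisions follow from comparing the raw p-values to alpha / m.
same_decisions = [p <= 0.05 / len(raw_p) for p in raw_p]
```

In this example only the first comparison remains significant after correction, although all three raw p-values lie below 0.05.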
Regression for ANOVA
During the cross-validation phase, many feature extraction and classification hyperparameters were optimised. It is important to know which of these hyperparameters actually had a significant effect on the error rate, since insignificant hyperparameters could be left out of the optimisation in future work, leading to lower computational requirements.
In some cases, there were up to four hyperparameters optimised for a single feature extraction and classification combination. Since ANOVA with more than two factors is not a standard procedure, analysis of variance was instead carried out using regression models. The idea behind such a regression model is simple: if the hyperparameter settings instead of the features extracted are used as input variables to a regression model of the error rate, then the p-values of these input variables will show which of them had a significant influence on the error rate.
One regression model was set up for each of the ten feature set and classification combinations, for each of the three case studies. Since all combinations of hyperparameter settings were tested during cross-validation, the validation error rates for each fold were used as dependent variables in these regression models. Apart from the hyperparameter settings, all pairwise interaction terms between hyperparameter settings were also included in the regression models. It is important to determine whether interaction effects are significant, the rationale being that if the interaction between two hyperparameters is not significant, these hyperparameters could be optimised independently, thereby reducing the computational requirements of the optimisation procedure.
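The structure of such a regression model can be sketched by building its design matrix: one row per hyperparameter combination and fold, with columns for the intercept, the hyperparameter settings and all pairwise interaction terms. Everything named in the sketch is hypothetical: the hyperparameter names `h1` to `h3`, their values and the number of folds are placeholders, and categorical hyperparameters would need a suitable coding that is not shown. Fitting the model and obtaining the p-values of the coefficients is left to a statistics package.

```python
from itertools import combinations, product


def term_names(names):
    """Names of the intercept, main-effect and pairwise interaction terms."""
    return (["intercept"] + list(names)
            + [f"{a}:{b}" for a, b in combinations(names, 2)])


def design_row(settings):
    """One design-matrix row: intercept, main effects, pairwise products."""
    interactions = [a * b for a, b in combinations(settings, 2)]
    return [1.0] + list(settings) + interactions


# Hypothetical numeric hyperparameter grid (names and values are illustrative).
grid = {"h1": [1, 3, 5], "h2": [8, 12, 16], "h3": [20, 40]}
n_folds = 5  # assumed number of cross-validation folds

names = list(grid)
# Every combination of settings appears once per fold, so that each row
# pairs with one validation error rate (the dependent variable).
X = [design_row(combo)
     for combo in product(*grid.values())
     for _ in range(n_folds)]
columns = term_names(names)
```

With three hyperparameters the model has seven terms: the intercept, three main effects and three pairwise interactions. An interaction term whose coefficient is not significant suggests that the two hyperparameters involved could be tuned independently.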
4.8 Summary
In this chapter an overview of the three case studies used in this work was given. The texture classification framework was illustrated and explained by providing details on data partitioning, cross-validation and testing, as well as on each individual step in the framework: pre-processing, dimensionality reduction and modelling.
The dimensionality reduction and modelling hyperparameters considered for optimisation are summarised in table 4-13.
Table 4-13: Summary of hyperparameters considered for optimisation

| Method | Hyperparameter | Values considered |
|---|---|---|
| GLCM | Number of grey levels ($G$) | $G = 2^n$, $3 \le n \le 7$, $n \in \mathbb{N}$ |
| GLCM | Size of displacement between grey level pairs ($D$) | $1 \le D \le 5$, $D \in \mathbb{N}$ |
| GLCM | GLCM type | Symmetric (not varied) |
| GLCM | Number of orientations | 4 (not varied) |
| Wavelet | Wavelet type | 'haar', 'db3', 'sym4' |
| Wavelet | Decomposition level ($J$) | Maximum (not varied) |
| Steerable pyramid | Filter bank | 3rd order directional derivatives (not varied) |
| Steerable pyramid | Decomposition level ($J$) | Maximum (not varied) |
| Steerable pyramid | Number of orientations ($N_{ori}$) | 4, 6 |
| Steerable pyramid | Width of pixel neighbourhood | 7, 11 |
| Texton | Filter bank | Schmid (not varied) |
| Texton | Support width of largest filter | 25, 49 |
| Texton | Number of cluster centres ($K_c$) | 20, 40, 80 |
| LBP | Texture neighbourhood radius and sampling points pair ($R$, $P$) | (1, 8), (2.5, 12), (4, 16) |
| LBP | Mapping type | 'none', 'ri', 'u2', 'riu2' |
| PCA | Usage of PCA | None, $var = 99\%$, $var = 95\%$ |
| K-NN | Number of nearest neighbours ($K_n$) | $1 \le K_n \le 11$, $K_n \in \mathbb{N}$ |
| DA | Type of DA | Linear, Quadratic |