
Chapter 4 Materials and methods

4.6 Modelling

4.6.1 K-nearest neighbours

In K-nearest neighbour (K-NN) classification, given a set of training data points with known labels, a new (test) data point is assigned the label held by the majority of its K_N closest neighbours in the feature space. In this work the numbers of nearest neighbours considered were 1 ≤ K_N ≤ 11, K_N ∈ ℕ.
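
The majority-vote rule can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the Euclidean distance metric, the tie-breaking behaviour, the toy data and the names `knn_predict`, `train_x` and `train_y` are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k_n):
    """Label `query` by majority vote among its k_n nearest training points.

    Assumptions (not from the text): Euclidean distance, and ties between
    labels go to the tied label that occurs first among the nearest points.
    """
    # Sort training points by distance to the query and keep the k_n closest.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest_labels = [train_y[i] for i in order[:k_n]]
    # most_common orders equal counts by first occurrence, so a tie is
    # resolved in favour of the label encountered first (nearest the query).
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy two-feature data set with two classes (hypothetical values).
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = [1, 1, 1, 2, 2, 2]

# Evaluate a few values of K_N from the range considered in this work.
predictions = {k: knn_predict(train_x, train_y, (0.3, 0.3), k) for k in range(1, 6)}
```

In this toy example the query point lies close to the class 1 cluster, so every value of K_N tried assigns it to class 1.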

4.6.2 Discriminant analysis

Discriminant analysis (DA) finds a set of weights such that the linear combinations of the training data vectors and weights result in a maximal separation between the classes. The data are then classified according to the maximum a posteriori rule. In this work both linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) were considered.
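
As a rough illustration of the maximum a posteriori rule, the sketch below classifies a single scalar feature under Gaussian class models. It is a simplified, one-dimensional stand-in and not the implementation used in this work: the Gaussian class-conditional densities, the function names and the toy data are assumptions. The only difference between the two variants is that the linear one pools a single variance across classes, whereas the quadratic one estimates a separate variance for each class.

```python
import math
from statistics import fmean


def fit_gaussian_da(x, y, quadratic=False):
    """Estimate per-class prior, mean and variance for a 1-D feature."""
    classes = sorted(set(y))
    groups = {c: [xi for xi, yi in zip(x, y) if yi == c] for c in classes}
    means = {c: fmean(g) for c, g in groups.items()}
    # Within-class sums of squared deviations.
    ss = {c: sum((xi - means[c]) ** 2 for xi in g) for c, g in groups.items()}
    if quadratic:
        # QDA-style: each class keeps its own variance estimate.
        variances = {c: ss[c] / (len(groups[c]) - 1) for c in classes}
    else:
        # LDA-style: one pooled variance shared by all classes.
        pooled = sum(ss.values()) / (len(x) - len(classes))
        variances = {c: pooled for c in classes}
    priors = {c: len(groups[c]) / len(x) for c in classes}
    return {c: (priors[c], means[c], variances[c]) for c in classes}


def predict_map(model, value):
    """Return the class with the largest log-posterior (up to a constant)."""
    def log_posterior(params):
        prior, mean, var = params
        return math.log(prior) - 0.5 * math.log(var) - (value - mean) ** 2 / (2 * var)
    return max(model, key=lambda c: log_posterior(model[c]))


# Hypothetical data: class 1 is tightly clustered, class 2 is widely spread.
x = [-0.2, -0.1, 0.0, 0.1, 0.2, -2.0, 0.0, 2.0, 4.0, 6.0]
y = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

lda = fit_gaussian_da(x, y, quadratic=False)
qda = fit_gaussian_da(x, y, quadratic=True)
```

With these toy data the two variants disagree for a value such as -3.0: the pooled-variance model assigns it to the class with the nearer mean (class 1), while the per-class-variance model assigns it to the widely spread class 2.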

4.7 Performance evaluation

As discussed in section 4.3.3, the primary performance measure used to evaluate each combination of feature extraction method and classifier was the error rate obtained when the trained model was used to classify unseen test images:

\[
\mathcal{E}_{test} = 1 - \frac{1}{N_{test}} \sum_{n=1}^{N_{test}} \left( \boldsymbol{\mathcal{C}}_p(n) == \boldsymbol{\mathcal{C}}_k(n) \right)
\]

where N_test is the number of test images, and 𝓒_p and 𝓒_k are vectors of length N_test containing the predicted and known class labels for the test images, respectively. The comparison evaluates to 1 when the two labels agree and to 0 otherwise. This performance measure can be augmented by further analyses of the results, specifically by considering confusion matrices and by performing sensitivity analyses to determine the significance of differences between error rates and of the effect that hyperparameter choices have on the error rates.

4.7.1 Confusion matrices

A confusion matrix is a useful way to visualise classification results: each row corresponds to an actual class and shows the percentage of the samples in that class that were classified into each class. As an example, consider the sample confusion matrix for a 4-class classification problem, shown in figure 4-13.


Looking for example at the third row of the confusion matrix, showing classification results for all images that actually belong to class 3: 6.7% of the images were incorrectly classified into class 1, 24.8% were incorrectly classified into class 2, 32.2% were correctly classified into class 3 and 36.2% were incorrectly classified into class 4. A perfect classifier would result in 100% (coloured black) along the diagonal and 0% (coloured white) everywhere else.
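
The error rate and a row-normalised confusion matrix of this kind can be computed directly from the predicted and known label vectors. The following is a minimal sketch, not the code used in this work; the function names and the toy label vectors are hypothetical, and classes are assumed to be numbered from 1 up to the number of classes.

```python
def error_rate(predicted, known):
    """Fraction of test samples whose predicted label differs from the known one."""
    matches = sum(p == k for p, k in zip(predicted, known))
    return 1 - matches / len(known)


def confusion_matrix_percent(predicted, known, n_classes):
    """Row i holds the percentage of class-(i+1) samples assigned to each class."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for p, k in zip(predicted, known):
        counts[k - 1][p - 1] += 1  # row = actual class, column = predicted class
    return [
        [100 * c / sum(row) if sum(row) else 0.0 for c in row]
        for row in counts
    ]


# Hypothetical labels for eight test images from a 4-class problem.
known = [1, 1, 2, 2, 3, 3, 4, 4]
predicted = [1, 1, 2, 1, 3, 4, 4, 4]

e_test = error_rate(predicted, known)
cm = confusion_matrix_percent(predicted, known, n_classes=4)
```

For these labels six of the eight predictions are correct, so the error rate is 0.25, and the diagonal of `cm` holds the per-class percentages of correctly classified images.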

4.7.2 Sensitivity analysis

Analysis of variance

Analysis of variance (ANOVA) is a statistical test that can be used to determine whether there are significant differences between the means of several groups, and thus generalises the t-test to more than two groups. ANOVA was used in this work to test whether the error rates obtained with the different feature extraction methods and classifiers differed significantly from each other. A 95% confidence level (α = 0.05) was selected, so that for each effect a p-value of p ≤ α = 0.05 was considered significant.

ANOVA results can only show whether specific factors have a significant influence on the error rate; they do not provide any further information regarding which of the factor levels produce significantly different error rates. For example, if one factor is “feature set”, then its levels are the specific feature sets: GLCM, wavelet, steerable pyramid, texton and LBP. If the effect of the factor “feature set” is found to be significant, this only means that at least one of the feature sets produced a significantly different error rate from at least one other feature set.
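
To make the test concrete, the sketch below computes the one-way ANOVA F statistic for a single factor from scratch. It is an illustration under stated assumptions rather than the analysis performed in this work: the error rates are invented, only one factor with three levels is considered, and the function name `one_way_f` is hypothetical. Converting F to a p-value additionally requires the F distribution with the returned degrees of freedom, which is left to a statistics package.

```python
from statistics import fmean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of observations per factor level.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    # Between-group sum of squares: spread of the level means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their level mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented error rates for three levels of a hypothetical "feature set" factor.
error_rates = [
    [0.10, 0.12, 0.11, 0.13],
    [0.20, 0.22, 0.21, 0.19],
    [0.11, 0.10, 0.12, 0.13],
]
f_stat, df_b, df_w = one_way_f(error_rates)
```

A large F value, as obtained here because the second level has a clearly higher mean error rate, indicates that the variation between levels is large relative to the variation within levels. As noted above, it does not identify which levels differ.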

Post-hoc testing

After an ANOVA test has been performed, post-hoc tests can be carried out to determine which levels of a factor differ significantly from one another. Post-hoc testing involves performing a t-test between each pair of treatments (or between a pre-specified set of treatments), which means that multiple t-tests are carried out. When performing multiple t-tests, the overall confidence level is no longer 95%, since the probability of at least one type I error (false positive) increases with each additional test performed. Post-hoc tests therefore incorporate corrections for the α of each individual t-test so that the overall confidence level remains at 95%.
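
The size of this effect can be illustrated with a short worked example. Assuming, purely for illustration, that the N tests are independent and that each is carried out at α = 0.05, the probability of at least one type I error across all tests is given below. The value N = 10 corresponds to all pairwise comparisons between five factor levels, such as the five feature sets.

```latex
\[
P(\text{at least one type I error}) = 1 - (1 - \alpha)^{N}
  = 1 - 0.95^{10} \approx 0.40,
\qquad N = \binom{5}{2} = 10
\]
```

Under these assumptions the overall confidence level would drop from 95% to roughly 60% if no correction were applied.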

Figure 4-13: Sample confusion matrix for a 4-class classification problem


The Bonferroni post-hoc test (Dunn, 1961) was used to determine the significance of the differences between the levels of the “feature set” factor. This post-hoc test involves a simple correction for α:

\[
\alpha_{adj} = \frac{\alpha}{N} \tag{4-9}
\]

According to equation (4-9) the original α is divided by the number of t-tests (N) to yield the adjusted α_adj for each individual test. Equivalently, the Bonferroni post-hoc test employed in this work multiplies the p-value obtained in each individual t-test by N, so that this new p-value can be compared to the original α:

\[
p_{adj} = N \, p
\]

Therefore, the Bonferroni post-hoc test shows that there is a significant difference between the two factor levels tested when p_adj ≤ α for that t-test.
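
The equivalence of the two formulations can be checked numerically. The sketch below is illustrative only: the p-values are invented, the function name `bonferroni` is hypothetical, and N is taken to be the number of p-values supplied. Many statistics packages additionally cap the adjusted p-value at 1; that is not done here, so that the code mirrors the description above.

```python
def bonferroni(p_values, alpha=0.05):
    """Apply the Bonferroni correction to a list of raw t-test p-values.

    Returns (alpha_adj, adjusted p-values, significance flags).
    """
    n_tests = len(p_values)
    alpha_adj = alpha / n_tests                # adjusted threshold per test
    p_adj = [n_tests * p for p in p_values]    # adjusted p-value per test
    significant = [p <= alpha for p in p_adj]  # compare against the original alpha
    return alpha_adj, p_adj, significant


# Invented p-values for the 10 pairwise comparisons between five feature sets.
raw_p = [0.0004, 0.0030, 0.0060, 0.0200, 0.0450, 0.0010, 0.3000, 0.0049, 0.0510, 0.7500]
alpha_adj, p_adj, significant = bonferroni(raw_p)

# Comparing the raw p-values with alpha_adj gives the same decisions.
same_decisions = significant == [p <= alpha_adj for p in raw_p]
```

In this example several comparisons that would be significant at the uncorrected α = 0.05 (for instance p = 0.02) are no longer significant once the correction for ten tests is applied.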

Regression for ANOVA

During the cross-validation phase, many feature extraction and classification hyperparameters were optimised. It is important to know which of these hyperparameters actually had a significant effect on the error rate, since insignificant hyperparameters could be left out of the optimisation in future work, leading to lower computational requirements.

In some cases, there were up to four hyperparameters optimised for a single feature extraction and classification combination. Since ANOVA with more than two factors is not a standard procedure, analysis of variance was instead carried out using regression models. The idea behind such a regression model is simple: if the hyperparameter settings instead of the features extracted are used as input variables to a regression model of the error rate, then the p-values of these input variables will show which of them had a significant influence on the error rate.

One regression model was set up for each of the ten feature set and classification combinations, for each of the three case studies. Since all combinations of hyperparameter settings were tested during cross-validation, the validation error rates for each fold were used as dependent variables in these regression models. Apart from the hyperparameter settings, all pairwise interaction terms between hyperparameter settings were also included in the regression models. It is important to determine whether interaction effects are significant, the rationale being that if the interaction between two hyperparameters is not significant, these hyperparameters could be optimised independently, thereby reducing the computational requirements of the optimisation procedure.
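
For a combination with, say, three optimised hyperparameters, a regression model of this form could be written as follows. The notation is an assumption introduced here for illustration: ℰ_val denotes the validation error rate of one fold, x_i the setting of hyperparameter i, β the regression coefficients and ε the residual. Categorical hyperparameters, such as the wavelet type, would first need to be encoded (for example as indicator variables), which is not shown.

```latex
\[
\mathcal{E}_{val} = \beta_0 + \sum_{i=1}^{3} \beta_i x_i
  + \sum_{i=1}^{2} \sum_{j=i+1}^{3} \beta_{ij}\, x_i x_j + \varepsilon
\]
```

In such a model a significant p-value for β_i would indicate a significant main effect of hyperparameter i, and a significant p-value for β_ij would indicate a significant interaction between hyperparameters i and j.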

4.8 Summary

In this chapter an overview of the three case studies used in this work was given. The texture classification framework was illustrated and explained by providing details on data partitioning, cross-validation and testing, as well as on each individual step in the framework: pre-processing, dimensionality reduction and modelling.

The dimensionality reduction and modelling hyperparameters considered for optimisation are summarised in table 4-13.

Table 4-13: Summary of hyperparameters considered for optimisation

| Method | Hyperparameter | Values considered |
|---|---|---|
| GLCM | Number of grey levels (G) | G = 2^g, 3 ≤ g ≤ 7, g ∈ ℕ |
| | Size of displacement between grey level pairs (D) | 1 ≤ D ≤ 5, D ∈ ℕ |
| | GLCM type | Symmetric (not varied) |
| | Number of orientations | 4 (not varied) |
| Wavelet | Wavelet type | ‘haar’, ‘db3’, ‘sym4’ |
| | Decomposition level (J) | Maximum (not varied) |
| Steerable pyramid | Filter bank | 3rd order directional derivatives (not varied) |
| | Decomposition level (J) | Maximum (not varied) |
| | Number of orientations (S_inc) | 4, 6 |
| | Width of pixel neighbourhood (W) | 7, 11 |
| Texton | Filter bank | Schmid (not varied) |
| | Support width of largest filter (F_S) | 25, 49 |
| | Number of cluster centres (K_T) | 20, 40, 80 |
| LBP | Texture neighbourhood radius and sampling points pair (R, P) | (1, 8), (2.5, 12), (4, 16) |
| | Mapping type | ‘none’, ‘ri’, ‘u2’, ‘riu2’ |
| PCA | Usage of PCA | None, var = 99%, var = 95% |
| K-NN | Number of nearest neighbours (K_N) | 1 ≤ K_N ≤ 11, K_N ∈ ℕ |
| DA | Type of DA | Linear, quadratic |
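
Because all combinations of hyperparameter settings were tested during cross-validation, the size of each search grid follows directly from table 4-13. The sketch below enumerates the grid for one combination, GLCM features with a K-NN classifier. It is illustrative only: the dictionary keys are hypothetical names, the PCA settings are interpreted as fractions of variance, and every PCA option is assumed to be combined with every feature and classifier setting.

```python
from itertools import product

# Values taken from table 4-13; the key names are hypothetical.
grid = {
    "grey_levels": [2 ** g for g in range(3, 8)],  # G = 2^g, 3 <= g <= 7
    "displacement": list(range(1, 6)),             # 1 <= D <= 5
    "pca": [None, 0.99, 0.95],                     # None, var = 99%, var = 95%
    "k_n": list(range(1, 12)),                     # 1 <= K_N <= 11
}

# Cartesian product of all settings, one dict per hyperparameter combination.
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]
n_combinations = len(combinations)
```

Under these assumptions the grid contains 5 × 5 × 3 × 11 = 825 combinations, each of which would be evaluated on every cross-validation fold. This illustrates why dropping insignificant hyperparameters, or optimising non-interacting ones independently, reduces the computational requirements.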