
7.6 Evaluation

7.6.3 Evaluation of Classifier Generation (Stage Five)

In this section, the results obtained using the proposed process with respect to the three different classifier generation mechanisms (KNN, SVM and BN) are considered. Recall that the objective was to determine which classifier produced the best results. Recall also that in the context of the region-based method the BN and SVM classifier generators produced the best results. Two sets of experiments were again conducted: (i) classifier performance evaluation and (ii) statistical significance testing. For the evaluation, the following techniques were fixed with respect to the stages not considered in this section: (i) standard decomposition and the KLD critical function with a threshold of t = 0.5, and (ii) gSpan FSG using σ = 20%. A range of values for L was again considered (L = {3, 4, 5, 6}) because it was conjectured that the level of decomposition would have an effect on the nature of the adopted classifier generator. The results obtained are discussed in the following subsections.

7.6.3.1 Classifier Performance in the Context of Classifier Generation

In this subsection, we present the results obtained using the three classifiers considered: (i) SVM, (ii) KNN and (iii) BN. Table 7.9 presents the results obtained. From the table, it can be seen that the SVM and BN classifier generators tended to produce the best results with respect to the four different decomposition levels considered. The SVM produced the best average results with respect to all the metrics considered (for AUC, jointly with BN at 0.85). It can also be observed that as the level of decomposition increased the classification performance tended to decrease. All reported AUC results, as shown in the table, were between 0.82 and 0.92.

Table 7.9: Classifier performance results in the context of classifier generation (Stage 5) using: (i) standard decomposition, (ii) a range of decomposition levels, (iii) KLD critical function and (iv) Kurtosis node labelling.

CF   L     Acc     Sen     Spec    PPV     NPV     EER   AUC
SVM  3     89.29%  88.73%  89.86%  90.00%  88.57%  0.10  0.89
SVM  4     85.00%  82.67%  87.69%  88.57%  81.43%  0.12  0.85
SVM  5     83.57%  78.31%  91.23%  92.86%  74.29%  0.09  0.84
SVM  6     82.14%  80.00%  84.62%  85.71%  78.57%  0.15  0.82
SVM  Ave.  85.00%  82.42%  88.35%  89.28%  80.71%  0.15  0.85
KNN  3     88.93%  87.59%  90.37%  90.71%  87.14%  0.11  0.90
KNN  4     80.00%  77.27%  83.33%  85.00%  75.00%  0.22  0.83
KNN  5     81.07%  77.71%  85.37%  87.14%  75.00%  0.22  0.83
KNN  6     80.71%  76.54%  86.44%  88.57%  72.86%  0.20  0.82
KNN  Ave.  82.67%  79.77%  86.37%  87.85%  77.50%  0.18  0.84
BN   3     89.29%  87.67%  91.04%  91.43%  87.14%  0.11  0.92
BN   4     79.29%  77.33%  81.54%  82.86%  75.71%  0.22  0.83
BN   5     80.95%  78.26%  84.21%  85.71%  76.19%  0.21  0.84
BN   6     81.19%  77.18%  86.59%  88.57%  73.81%  0.21  0.83
BN   Ave.  82.68%  80.11%  85.84%  87.14%  78.21%  0.18  0.85
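The count-based metrics in Table 7.9 (Acc, Sen, Spec, PPV and NPV) all follow from a single confusion matrix. The following is a minimal sketch of that arithmetic. The function name and the four counts are assumptions made for illustration: the counts were chosen because they reproduce the percentages reported for the SVM at L = 3, and are not taken from the source.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return the five count-based metrics used in Table 7.9 as percentages."""
    total = tp + fn + tn + fp
    return {
        "Acc": 100.0 * (tp + tn) / total,   # accuracy
        "Sen": 100.0 * tp / (tp + fn),      # sensitivity (true positive rate)
        "Spec": 100.0 * tn / (tn + fp),     # specificity (true negative rate)
        "PPV": 100.0 * tp / (tp + fp),      # positive predictive value
        "NPV": 100.0 * tn / (tn + fn),      # negative predictive value
    }

# Hypothetical counts (an assumption, not reported in the text) that are
# consistent with the SVM, L = 3 row of Table 7.9.
metrics = classification_metrics(tp=63, fn=8, tn=62, fp=7)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

EER and AUC are not covered by this sketch because they depend on the classifier's ranking scores rather than on a single set of counts.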

7.6.3.2 Classifier Generation Significance Testing

The test conducted to measure the differences between the operation of the classifiers in the context of the whole image-based methods showed that there was no statistically significant difference between the classifiers. Table 7.10 presents the calculated ANOVA data. The reported p-value was 0.4920, indicating that there was no significant difference between the operation of the classifiers. Thus the fact that the BN classifier generator produced the best performance, as reported above, is not statistically significant. Because there was no detected significant difference in the operation of the three classifier generators considered, no further analysis was conducted. There was only a small difference between the groups, with a Between-Groups SS of 0.0085 (Table 7.10). Figure 7.6 shows the confidence intervals, illustrating the similarities between the results obtained using the different classifiers. From the figure, it can be seen that BN and SVM were slightly better than KNN.

Table 7.10: ANOVA data for classifier generation comparison.

Source          SS      df   MS      F       p-value
Between-Groups  0.0085  2    0.0042  0.7098  0.4920
Error           5.0104  840  0.0060
Total           5.0188  842
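The F statistic and p-value in Table 7.10 can be cross-checked from the printed sums of squares and degrees of freedom. The sketch below is illustrative only; the function names are assumptions, and the p-value uses a closed form of the F distribution's upper tail that holds only when the numerator has 2 degrees of freedom, as is the case here with three classifiers.

```python
def f_statistic(ss_between, df_between, ss_error, df_error):
    """One-way ANOVA F statistic: between-group mean square over error mean square."""
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error
    return ms_between / ms_error

def p_value_two_numerator_df(f_value, df_error):
    """Upper-tail probability P(F > f_value) for an F(2, df_error) variable.

    Closed form valid only for 2 numerator degrees of freedom:
    P(F > f) = (1 + 2 * f / df_error) ** (-df_error / 2).
    """
    return (1.0 + 2.0 * f_value / df_error) ** (-df_error / 2.0)

# Values as printed in Table 7.10 (already rounded to four decimal places).
f_from_ss = f_statistic(0.0085, 2, 5.0104, 840)
p_from_reported_f = p_value_two_numerator_df(0.7098, 840)
p_from_ss = p_value_two_numerator_df(f_from_ss, 840)

print(f"F recomputed from the rounded SS values: {f_from_ss:.4f}")
print(f"p-value from the reported F: {p_from_reported_f:.4f}")
print(f"p-value from the recomputed F: {p_from_ss:.4f}")
```

The recomputed F differs slightly from the reported 0.7098 because the tabulated sums of squares are rounded; the p-value obtained from the reported F agrees with the tabulated 0.4920 to within rounding.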

Figure 7.6: Confidence intervals for comparing classifiers.

7.7 Summary and Conclusions

In this chapter, the whole image-based representation mechanism, in the context of AMD classification, has been presented and evaluated. The process comprised five stages: (i) image decomposition, (ii) tree conceptualisation, (iii) frequent sub-graph mining, (iv) feature vector generation and (v) classifier generation. First the images of interest were decomposed into a tree structure. To this end, two different types of decomposition were suggested (standard and overlapping), together with six different critical functions (including no critical function). Nodes were labelled using a single statistical value computed by comparing parent-child regions. The gSpan graph mining algorithm was then applied to generate a set of frequent sub-graphs. These sub-graphs were then used to form a feature vector for each image. Three classifier generators were considered for the final stage (SVM, KNN and BN).
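The step from frequent sub-graphs to feature vectors can be illustrated with a toy sketch. It does not implement gSpan: it assumes that each image's tree has already been reduced to a set of sub-graph identifiers, and it assumes a binary presence/absence encoding, which the text above does not specify. All names and data are hypothetical.

```python
from collections import Counter

def frequent_patterns(pattern_sets, min_support):
    """Return, in sorted order, the patterns that occur in at least
    min_support (a fraction between 0 and 1) of the pattern sets."""
    counts = Counter(p for patterns in pattern_sets for p in set(patterns))
    threshold = min_support * len(pattern_sets)
    return sorted(p for p, count in counts.items() if count >= threshold)

def feature_vector(patterns, vocabulary):
    """Binary vector: 1 if the image contains the frequent pattern, else 0."""
    return [1 if p in patterns else 0 for p in vocabulary]

# Hypothetical toy data: six images, each described by the identifiers of the
# sub-graphs found in its tree (stand-ins for real mined sub-graphs).
images = [
    {"g1", "g2"},
    {"g1", "g3"},
    {"g1", "g2", "g4"},
    {"g2"},
    {"g1", "g3"},
    {"g2", "g5"},
]

# Support threshold of 20%, mirroring the sigma value used in the evaluation.
vocabulary = frequent_patterns(images, min_support=0.20)
vectors = [feature_vector(image, vocabulary) for image in images]

print(vocabulary)
for vector in vectors:
    print(vector)
```

In this toy data, g4 and g5 each occur in only one of the six images, which is below the 20% threshold, so they do not contribute a feature. The resulting vectors are what a classifier generator such as SVM, KNN or BN would take as input.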

Table 7.11: Best four performing combinations of techniques as identified in the foregoing evaluation.

Identifier  Image Decomposition                                            Node Labelling           Feature Selection        Classification
WIB1        overlapping decomposition, L=6, LCS critical function          Kurtosis node labelling  gSpan FSG using σ = 20%  SVM classifier
WIB2        overlapping decomposition, L=3, LCS critical function          Kurtosis node labelling  gSpan FSG using σ = 20%  BN classifier
WIB3        overlapping decomposition, L=6, KLD critical function          Kurtosis node labelling  gSpan FSG using σ = 20%  SVM classifier
WIB4        overlapping decomposition, L=6, KLD and LCS critical function  Kurtosis node labelling  gSpan FSG using σ = 20%  BN classifier

Table 7.12: The best classification results obtained using decomposition and whole image-based methods.

Label  Acc      Sen      Spec     PPV      NPV      EER   AUC
WIB1   100.00%  100.00%  100.00%  100.00%  100.00%  0.00  1.00
WIB2   99.29%   100.00%  98.59%   98.57%   100.00%  0.00  0.99
WIB3   97.86%   98.55%   97.18%   97.14%   98.57%   0.03  0.98
WIB4   98.10%   99.03%   97.20%   97.14%   99.05%   0.01  0.98

The evaluation results for the whole image-based representation established that the proposed method produced good classification performance. It was clearly demonstrated that overlapping decomposition produced a much better classification performance than standard decomposition. It was also established that the best performing critical function was LCS (which produced an accuracy of 100%, an error rate of 0.00 and an AUC of 1). It is noteworthy that the use of LCS outperformed the use of AIV, as proposed in [59], where work on retina classification was also presented (but directed at 2D images). In the context of the selected level of decomposition, the overall results demonstrated that six levels of decomposition produced high-performing classifiers. In addition, it was established that node labelling using the Kurtosis measure produced better results than Mean labelling. The ANOVA statistical significance testing and the post-hoc analysis (Tukey's test) revealed that the results obtained were statistically significant.

The best results from the reported experiments concerning the operation of the proposed whole image-based techniques are summarised in Tables 7.11 and 7.12, which give the best four performing combinations of techniques and the associated results obtained. For reference later in this thesis, the techniques have been labelled as follows: (i) WIB1, (ii) WIB2, (iii) WIB3 and (iv) WIB4. From the tables, it can be seen that the best results in terms of accuracy reached 100%. From the given results with respect to the proposed decomposition-based approaches, it can be concluded that: (i) overlapping decomposition is more effective than standard decomposition, and (ii) the LCS and KLD critical functions produced good performance.

Thus, in summary, the main findings from this chapter are as follows:

1. The LCS critical function contributed to a better whole image-based representation.

2. Performance was similar across the levels of decomposition, with slightly better performance at level 6.

3. Kurtosis node labelling produced better results than Mean node labelling.

4. All the classifiers produced good results although SVM and BN results were slightly better than KNN.

In the next chapter, we compare the results obtained in this chapter with the results from the previous chapter (Chapter 6).

Chapter 8

Discussion

8.1 Overview

This short chapter discusses and compares the operation of the proposed volumetric representation methods, both Region-Based (RB) and Whole Image-Based (WIB), for image classification described in the foregoing chapters. The comparison is conducted in the context of the four best region-based methods and the four best whole image-based representation methods identified in Chapters 6 and 7. The aim of this chapter is to provide a comparison between these two groups of techniques and to compare their operation with other techniques taken from the literature (identified in Chapters 2 and 3). For the evaluation, the AMD data set used previously was again used.

With respect to the RB category, the best identified performing techniques are summarised in Table 8.1. With respect to the WIB category the best identified performing techniques are summarised in Table 8.2. The alternative techniques with which the best four RB and best four WIB techniques were compared were as follows:

1. Voxel Co-occurrence Matrix (VCM) [53, 73].

2. Voxel Run-Length Matrix (VRLM) [19, 26, 43, 132].

3. Histograms of Oriented Gradients (HOG) [24, 90, 123].

4. Histograms of Local Binary Pattern (LBP) [57].

5. A combination of HOG and LBP (HOG-LBP) [143].

6. Local Phase Quantisation (LPQ) [101].

Details of these techniques were presented earlier in the previous work chapter (Section 2.4). These methods have been widely used for image feature extraction. Note that none of these featured any form of decomposition. They have thus been adapted for the purpose of 3D (volumetric) image classification in the research domain of interest with respect to this thesis. The SVM classifier was used with all the given techniques because it had been shown to produce good performance (see evaluation results presented in Chapters 6 and 7).

Table 8.1: Best four performing combinations of techniques for region-based methods.

Identifier  Image Decomposition                                    Region Representation  Feature Selection  Classification
RB1         overlapping decomposition, L=5, ED critical function   HOG representation     IFK with K = 128   SVM classifier
RB2         overlapping decomposition, L=5, ED critical function   HOG representation     IFK with K = 128   BN classifier
RB3         overlapping decomposition, L=5, LCS critical function  HOG representation     IFK with K = 128   SVM classifier
RB4         overlapping decomposition, L=5, LCS critical function  HOG representation     IFK with K = 128   BN classifier

Table 8.2: Best four performing combinations of techniques for whole image-based methods.

Identifier  Image Decomposition                                            Node Labelling           Feature Selection        Classification
WIB1        overlapping decomposition, L=6, LCS critical function          Kurtosis node labelling  gSpan FSG using σ = 20%  SVM classifier
WIB2        overlapping decomposition, L=3, LCS critical function          Kurtosis node labelling  gSpan FSG using σ = 20%  BN classifier
WIB3        overlapping decomposition, L=6, KLD critical function          Kurtosis node labelling  gSpan FSG using σ = 20%  SVM classifier
WIB4        overlapping decomposition, L=6, KLD and LCS critical function  Kurtosis node labelling  gSpan FSG using σ = 20%  BN classifier

With respect to the above, two sets of comparisons are presented. The first set of comparisons is concerned with classifier performance. The second set of comparisons is directed at investigating the run time complexity of the proposed methods (classification efficiency).

The remainder of this chapter first presents a comparison in performance between the selected techniques. The time complexity of the selected techniques is then considered in Section 8.3. Some conclusions concerning the results are presented in Section 8.4.

In document Three-dimensional image classification using hierarchical spatial decomposition: A study using retinal data (Page 131-138)