Single Feature Vector Generation (Stage Three)

In this section, the results obtained using the proposed process with respect to the single feature vector generation methods are considered. Recall that two methods were considered: (i) a dimensionality reduction based technique using Principal Component Analysis (PCA) and (ii) a feature selection based technique using the Improved Fisher Kernel (IFK). For the evaluation, four levels of decomposition were again considered, L = 3, L = 4, L = 5 and L = 6, together with the LCS critical function, overlapping decomposition and the HOG representation technique, because the experiments presented in the previous section (and others not specifically reported in this thesis) indicated that these produced the “best” results. With respect to the classification stage, SVM classification was again adopted. As before, two sets of experiments were conducted: (i) classifier performance (described in Subsection 6.4.1) and (ii) significance testing (described in Subsection 6.4.2). Recall (Subsection 5.4.2) that the IFK feature selection technique requires the dictionary size K to be pre-specified. The experiments conducted to evaluate the use of IFK therefore used a range of K values ({32, 64, 128, 256, 512}). The objectives of the comparison were as follows:

1. To determine which single feature vector generation technique, PCA or IFK, produced the best classification performance.

2. In the context of the IFK feature selection technique, to determine the effect of using different dictionary sizes (defined by the parameter K).

6.4.1 Classifier Performance in the Context of Single Feature Vector Generation

In this section, we compare the performance of PCA and IFK in terms of classification effectiveness. In addition, we compare the classification performance obtained with different dictionary sizes K when using IFK.

Table 6.9 presents the comparison between the results using PCA and IFK (K = 32). From the table, it can be seen that the best classification results when using IFK were an accuracy of 99.29% and an AUC of 0.99 (at L = 6), while the best accuracy for PCA was 97.14% and the best AUC value was 0.97 (both at L = 4). Overall, IFK produced the best average results with respect to all the metrics considered. It is conjectured that this was because the IFK-based technique selects the feature vectors to be included in the single feature vector by considering the entire collection of feature vectors for the entire collection of images, while PCA considers each image individually. The results of all experiments using IFK are presented in Appendix A.3.

Table 6.10 presents the results obtained with respect to the most appropriate dictionary size (K) for the IFK single feature vector generation method. For this evaluation the following techniques were used: (i) overlapping decomposition, (ii) a range of decomposition levels, (iii) the LCS critical function, (iv) the HOG region-based representation and (v) SVM classification. From the table, it can be seen that all the dictionary sizes produced good performance results. The average results for the dictionary size of K = 128 were the best, with an accuracy of 98.57% and an AUC of 0.98. K = 128 produced the best average results with respect to all the evaluation metrics considered, although its average AUC of 0.98 was matched by K = 64. From the table, it can also be seen that there was a relationship between the dictionary size and the level of decomposition. For example, for small dictionary sizes such as K = 32 and K = 64, the most effective results were generated with higher levels of decomposition such as L = 5 and L = 6. This is because when a larger K is used IFK will require more feature vectors; a higher level of L will ensure that there is a large selection to choose from.

Table 6.9: Classifier performance results using overlapping decomposition, the LCS critical function, the HOG region-based representation and SVM classification in the context of single feature vector generation (Stage 3) using: (i) a range of decomposition levels (L) and (ii) PCA and IFK (with K = 32) feature selection.

Method  L     Acc      Sen      Spec     PPV      NPV      EER    AUC
PCA     3     95.71%   95.59%   95.83%   95.59%   95.83%   0.04   0.96
        4     97.14%   98.48%   95.95%   95.59%   98.61%   0.04   0.97
        5     95.71%   98.44%   93.42%   92.65%   98.61%   0.07   0.96
        6     95.00%   98.41%   92.21%   91.18%   98.61%   0.08   0.95
        Ave.  95.89%   97.73%   94.35%   93.75%   97.91%   0.05   0.96
IFK     3     95.71%   98.44%   93.42%   92.65%   98.61%   0.07   0.96
        4     97.86%   98.51%   97.26%   97.06%   98.61%   0.03   0.98
        5     97.14%   98.48%   95.95%   95.59%   98.61%   0.04   0.97
        6     99.29%   98.55%   100.00%  100.00%  98.61%   0.00   0.99
        Ave.  97.50%   98.49%   96.65%   96.32%   98.61%   0.03   0.97
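As a reading aid for the metric columns in Table 6.9, the following minimal sketch derives accuracy, sensitivity, specificity, PPV and NPV from confusion-matrix counts using the standard definitions. The counts used (TP = 65, FN = 3, TN = 69, FP = 3) are an assumption: they are not reported in this section and were chosen only because they are consistent with the PCA, L = 3 row. The function name `confusion_metrics` is likewise illustrative.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return standard binary classification metrics as percentages."""
    total = tp + fn + tn + fp
    return {
        "Acc": 100.0 * (tp + tn) / total,   # correctly classified / all
        "Sen": 100.0 * tp / (tp + fn),      # true positive rate
        "Spec": 100.0 * tn / (tn + fp),     # true negative rate
        "PPV": 100.0 * tp / (tp + fp),      # positive predictive value
        "NPV": 100.0 * tn / (tn + fn),      # negative predictive value
    }


# Assumed counts (hypothetical, not taken from the thesis).
metrics = confusion_metrics(tp=65, fn=3, tn=69, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

Under these assumed counts the five values agree with the PCA, L = 3 row to two decimal places, which illustrates how the five columns are tied to a single confusion matrix.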

Table 6.10: Classifier performance results using overlapping decomposition, the LCS critical function, the HOG region-based representation and SVM classification in the context of the IFK single feature vector generation method with (i) a range of dictionary sizes (K) and (ii) a range of decomposition levels (L).

K     L     Acc      Sen      Spec     PPV      NPV      EER    AUC
32    3     95.71%   98.44%   93.42%   92.65%   98.61%   0.07   0.96
      4     97.86%   98.51%   97.26%   97.06%   98.61%   0.03   0.98
      5     97.14%   98.48%   95.95%   95.59%   98.61%   0.04   0.97
      6     99.29%   98.55%   100.00%  100.00%  98.61%   0.00   0.99
      Ave.  97.50%   98.49%   96.65%   96.41%   98.61%   0.035  0.97
64    3     97.86%   98.51%   97.26%   97.06%   98.61%   0.03   0.98
      4     97.86%   98.51%   97.26%   97.06%   98.61%   0.03   0.98
      5     99.29%   100.00%  98.63%   98.53%   100.00%  0.01   0.99
      6     97.86%   97.10%   98.59%   98.53%   97.22%   0.01   0.98
      Ave.  98.21%   98.53%   97.93%   97.79%   98.61%   0.02   0.98
128   3     99.29%   100.00%  98.63%   98.53%   100.00%  0.01   0.99
      4     99.29%   100.00%  98.63%   98.53%   100.00%  0.01   0.99
      5     97.86%   100.00%  96.00%   95.59%   100.00%  0.04   0.98
      6     97.86%   95.77%   100.00%  100.00%  95.83%   0.00   0.98
      Ave.  98.57%   98.94%   98.31%   98.16%   98.95%   0.01   0.98
256   3     95.71%   96.97%   94.59%   94.12%   97.22%   0.06   0.96
      4     97.86%   97.10%   98.59%   98.53%   97.22%   0.01   0.98
      5     95.71%   94.29%   97.14%   97.06%   94.44%   0.03   0.96
      6     99.29%   100.00%  98.63%   98.53%   100.00%  0.01   0.99
      Ave.  97.14%   97.09%   97.23%   97.06%   97.22%   0.02   0.97
512   3     95.00%   96.92%   93.33%   92.65%   97.22%   0.07   0.95
      4     100.00%  100.00%  100.00%  100.00%  100.00%  0.00   1.00
      5     98.57%   97.14%   100.00%  100.00%  97.22%   0.00   0.99
      6     97.14%   100.00%  94.74%   94.12%   100.00%  0.06   0.97
      Ave.  97.67%   98.51%   97.01%   96.69%   98.61%   0.03   0.97
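The "Ave." rows of Table 6.10 and the relationship between dictionary size and decomposition level can be checked with a short script. The sketch below uses only the accuracy column, copied from Table 6.10; the variable names are illustrative, and averaging the rounded table values may differ from the tabulated averages in the last digit.

```python
from statistics import mean

# Accuracy (%) per decomposition level L = 3, 4, 5, 6, copied from Table 6.10.
levels = [3, 4, 5, 6]
accuracy = {
    32:  [95.71, 97.86, 97.14, 99.29],
    64:  [97.86, 97.86, 99.29, 97.86],
    128: [99.29, 99.29, 97.86, 97.86],
    256: [95.71, 97.86, 95.71, 99.29],
    512: [95.00, 100.00, 98.57, 97.14],
}

# Mean accuracy per dictionary size K and the K with the highest mean.
average = {k: mean(values) for k, values in accuracy.items()}
best_k = max(average, key=average.get)

# Decomposition level(s) at which each K reaches its highest accuracy.
best_levels = {
    k: [lvl for lvl, acc in zip(levels, values) if acc == max(values)]
    for k, values in accuracy.items()
}

for k in accuracy:
    print(f"K={k}: mean accuracy {average[k]:.2f}%, best at L={best_levels[k]}")
print("Highest mean accuracy: K =", best_k)
```

The same pattern can be applied to any other column of the table by replacing the accuracy values.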

6.4.2 Single Feature Vector Generation Significance Testing

The results presented in Subsection 6.4.1 above using PCA and IFK single feature vector generation suggest that there is a significant difference, in terms of classification effectiveness, between the two techniques. In this section, the results from an ANOVA comparing the two techniques are presented to establish whether or not there was indeed a statistically significant difference between them. In addition, a second ANOVA is presented to compare the effect of using different IFK dictionary sizes with respect to classification performance in the context of AUC. With respect to the comparison of PCA and IFK (with K = 32), Table 6.11 shows the ANOVA result. From the table, it can be seen that the ANOVA confirms that there is indeed a statistically significant difference between the operation of the PCA and IFK single feature vector generation methods in terms of classifier performance (p-value = 2.106e-211). Figure 6.5 shows the confidence interval diagram comparing the results of PCA and IFK, from which it can be clearly seen that the operation of the IFK technique is statistically better than the operation of the PCA technique. As noted above, it is conjectured that this is because IFK generates the feature vector for each image with respect to the entire image collection, whilst the PCA method considers the set of regions for each image individually.

Table 6.11: Comparing IFK and PCA-based methods.

Source           SS       df     MS        F         p-value
Between-Groups   4.73     1      4.73      1255.37   2.106e-211
Error            7.14     1894   0.00377
Total            11.87    1895

With respect to the experiments conducted to identify the most appropriate dictionary size (K), the ANOVA result is presented in Table 6.12. From the table, it can be seen that the difference in operation resulting from using different dictionary sizes is not statistically significant (p-value = 0.7106 > 0.05). From the table, it can also be noted that the difference between the groups, where each group represents the results using a particular dictionary size K, is very small (Between-Groups SS = 0.0030), while the variation within the groups is considerably larger (Error SS = 1.2460). From Figure 6.6, it can be seen that there were no major differences between the confidence intervals associated with the different dictionary sizes; all the medians of the results are above an AUC of 0.9. K = 128 appears to be slightly better than the rest because it has a shorter interval and all of its results lie above an AUC of 0.86.

Figure 6.5: Confidence intervals for comparing single feature vector generation techniques.

Table 6.12: Comparing different dictionary sizes (K) in IFK.

Source           SS       df     MS           F        p-value
Between-Groups   0.0030   4      7.4372e-04   0.5342   0.7106
Error            1.2460   895    0.0014
Total            1.2490   899
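The MS and F columns of the ANOVA tables follow directly from the sums of squares and degrees of freedom: each mean square is SS divided by df, and F is the ratio of the between-groups mean square to the error mean square. The sketch below recomputes these from the tabulated values; the helper name `anova_row` is illustrative. Because the tabulated SS values are rounded, the recomputed F values agree with the tables only approximately, and the p-values are not recomputed here since that requires the F distribution.

```python
def anova_row(ss_between, df_between, ss_error, df_error):
    """Return (MS_between, MS_error, F) for a one-way ANOVA table."""
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error
    return ms_between, ms_error, ms_between / ms_error


# Table 6.11: PCA versus IFK (K = 32).
ms_b1, ms_e1, f1 = anova_row(4.73, 1, 7.14, 1894)
print(f"Table 6.11: MS_error = {ms_e1:.5f}, F = {f1:.2f}")

# Table 6.12: IFK dictionary sizes K = 32, 64, 128, 256, 512.
ms_b2, ms_e2, f2 = anova_row(0.0030, 4, 1.2460, 895)
print(f"Table 6.12: MS_error = {ms_e2:.4f}, F = {f2:.4f}")
```

A large F (Table 6.11) indicates that the variation between the groups dominates the variation within them, whereas an F below 1 (Table 6.12) indicates the opposite.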

Figure 6.6: Confidence intervals for comparing different dictionary sizes K when using IFK single feature vector generation.
