

6.6.2.2 Classification Results using the MD Dataset

Similar to the foregoing sub-section, the image classification performance obtained with k-NN for various values of k, with respect to the accuracy and AUC evaluation metrics on the MD dataset, is plotted in Figure 6.6. Tables 6.4, 6.5 and 6.6 show the results generated using k-NN, NB and SVM respectively; these were used to compare the relative performance of the selected classification techniques.

Table 6.4: Average results based on best accuracies for retinal image classification generated using S1 and S2, k-NN and the MD dataset

| Strategy | Sens-AMD (%) | Sens-other (%) | Spec (%) | Accuracy (%) | AUC (%) |
|----------|--------------|----------------|----------|--------------|---------|
| S1 | 71.6(1.2) | 43.3(1.5) | 50.8(1.2) | 57.0(0.9) | 72.3(0.3) |
| S2-d2 | 57.4(1.5) | 43.4(1.6) | 38.7(1.4) | 48.0(0.5) | 65.0(0.2) |
| S2-d3 | 76.1(4.4) | 45.3(2.3) | 24.9(8.4) | 53.1(0.8) | 71.8(1.3) |
| S2-d4 | 76.4(0.8) | 41.0(1.2) | 35.9(1.9) | 42.8(0.8) | 56.9(0.6) |

Table 6.5: Average results obtained using S1 and S2, NB and the MD dataset

| Strategy | Sens-AMD (%) | Sens-other (%) | Spec (%) | Accuracy (%) | AUC (%) |
|----------|--------------|----------------|----------|--------------|---------|
| S1 | 73.0(8.0) | 46.1(4.0) | 62.2(16.2) | 61.2(4.1) | 78.3(4.5) |
| S2-d2 | 52.3(2.8) | 18.6(3.3) | 73.6(0.9) | 46.7(1.5) | 69.0(0.6) |
| S2-d3 | 50.0(1.8) | 36.3(1.6) | 71.3(1.6) | 50.7(0.8) | 71.4(0.4) |
| S2-d4 | 56.4(8.5) | 54.5(6.4) | 62.9(3.3) | 57.3(5.2) | 75.6(4.3) |

Figure 6.6 shows the average accuracy and AUC results obtained when applying the proposed tabular representations to the MD dataset and building the desired classifier using k-NN. Figure 6.6(a) plots accuracy against k, and Figure 6.6(b) plots AUC against k. From the figures it can be seen that the best accuracy and AUC using S1 were 57.0% (k = 10) and 72.7% (k = 13), while S2 produced a highest accuracy of 53.1% (using d = 3 and k = 16) and a highest AUC value of 71.9% (using d = 3 and k = 15). Other evaluation metrics obtained from the experiments (not shown in the figures) with respect to both strategies were sensitivity and specificity. With respect to S1, the best sensitivity for AMD identification was 72.3% (using k = 6) and the best sensitivity for “other diseases” was 45.1% (using k = 16). The best recorded specificity for S1 was 53.9% (using k = 5). With respect to S2, the best sensitivity for AMD identification was 78.4% (using d = 4 and k = 20) and the best sensitivity for “other diseases” was 45.8% (using d = 3 and k = 17). The best recorded specificity using S2 was 51.2% (d = 4, k = 2). As with the BD dataset, S1 produced the best accuracy on the MD dataset for all k with 3 ≤ k ≤ 20. A similar outcome was observed with respect to AUC: S1 produced the best performance for all k with 2 ≤ k ≤ 20.
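The search over k described above can be sketched as follows. This is a minimal illustration only, not the implementation used in the thesis: the toy two-dimensional feature vectors, the helper names (`knn_predict`, `loo_accuracy`, `sweep_k`) and the use of leave-one-out evaluation in place of cross-validation on image features are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out accuracy of k-NN on a list of (features, label) pairs."""
    correct = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]          # hold out the i-th sample
        if knn_predict(train, features, k) == label:
            correct += 1
    return correct / len(data)

def sweep_k(data, k_values):
    """Return ({k: accuracy}, best k); the smallest k wins ties."""
    scores = {k: loo_accuracy(data, k) for k in k_values}
    best_k = max(sorted(scores), key=lambda k: scores[k])
    return scores, best_k

# Hypothetical three-class toy data standing in for image feature vectors.
toy = [
    ((0.0, 0.0), "normal"), ((0.1, 0.2), "normal"), ((0.2, 0.1), "normal"),
    ((1.0, 1.0), "AMD"),    ((1.1, 0.9), "AMD"),    ((0.9, 1.2), "AMD"),
    ((2.0, 0.0), "other"),  ((2.1, 0.2), "other"),  ((1.9, 0.1), "other"),
]
scores, best_k = sweep_k(toy, range(1, 6))
print(best_k, scores)
```

Reporting the whole `scores` dictionary rather than only `best_k` mirrors the plots in Figure 6.6, where the curve over k is shown for each strategy.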

The results produced using the MD dataset are given in Tables 6.4 (k-NN), 6.5 (NB) and 6.6 (SVM). The column labelled “Sens-AMD” gives the classifiers' sensitivity in identifying AMD images, while the column labelled “Sens-other” gives their sensitivity in identifying images of other diseases. Again, the best results for each evaluation metric are indicated in bold font, except for the results generated using k-NN (Table 6.4), where only the best performing results (with respect to accuracy) are shown.
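To make the column definitions concrete, the sketch below computes the per-class rates and the overall accuracy from a three-class confusion matrix. It assumes, since this section does not state it, that specificity is the proportion of normal images correctly labelled as normal. The class names, the `rates` helper and the counts are hypothetical and chosen only to illustrate the arithmetic.

```python
CLASSES = ("AMD", "other", "normal")

def rates(confusion):
    """confusion[true][predicted] holds counts; returns percentages per table column."""
    def recall(cls):
        row = confusion[cls]
        return 100.0 * row[cls] / sum(row.values())

    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[c][c] for c in CLASSES)
    return {
        "Sens-AMD": recall("AMD"),
        "Sens-other": recall("other"),
        "Spec": recall("normal"),  # assumption: specificity = recall on the normal class
        "Accuracy": 100.0 * correct / total,
    }

# Hypothetical counts (100 images per class), not taken from the thesis.
example = {
    "AMD":    {"AMD": 70, "other": 20, "normal": 10},
    "other":  {"AMD": 30, "other": 45, "normal": 25},
    "normal": {"AMD": 25, "other": 25, "normal": 50},
}
result = rates(example)
print(result)
```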

Figure 6.6: Comparison of average (a) accuracy and (b) AUC results for image classification using k-NN and the non-partitioning (S1) and partitioning (S2) strategies using the MD dataset

Table 6.6: Average results obtained using S1 and S2, SVM and the MD dataset

| Strategy | Sens-AMD (%) | Sens-other (%) | Spec (%) | Accuracy (%) | AUC (%) |
|----------|--------------|----------------|----------|--------------|---------|
| S1 | 75.9(1.5) | 46.7(2.7) | 59.8(1.2) | 62.1(1.5) | 79.6(1.4) |
| S2-d2 | 77.1(1.7) | 55.1(1.7) | 41.6(1.5) | 60.8(0.6) | 77.7(0.4) |
| S2-d3 | 72.8(1.1) | 66.5(2.0) | 54.9(2.0) | 66.3(0.8) | 83.4(0.5) |

Inspection of the results shown in Table 6.4 (and Figure 6.6) indicated that, similar to the results produced using the BD dataset, no single best value for k could be identified. The k-NN results displayed in Table 6.4 were generated using different k values, chosen according to the best accuracy achieved by each strategy and each value of d (for strategy S2 only): the results for S1 were obtained using k = 10, while the results for S2-d2, S2-d3 and S2-d4 were generated using k = 5, 16 and 6 respectively. As noted above, Figure 6.6(a) indicated that strategy S1 performed better than strategy S2 with respect to classification accuracy when using k-NN. This result is corroborated by the results presented in Table 6.4. A similar pattern was produced using NB, as shown in Table 6.5. However, when using an SVM, strategy S2 yielded the best accuracy, as indicated in Table 6.6. With respect to AUC, k-NN and NB again produced their best results using strategy S1: 72.7% (see Figure 6.6(b)) and 78.3% (see Table 6.5) respectively. SVM produced the best AUC value of 84.1% using strategy S2 and d = 4.

Overall, the results produced can be summarised as follows:

1. The best classification accuracy and AUC for the MD dataset were 67.4% and 84.1% (using S2 and d = 4).

2. The best results using both k-NN and NB were generated using S1, while SVM performed best using S2. However, the overall best results were produced using strategy S2.

3. No single value of k (for k-NN) was found to produce the best classification results with respect to both strategies S1 and S2.
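The first two points of this summary can be re-derived from the reported accuracies. The sketch below records the accuracy figures from Tables 6.4 to 6.6; the SVM value for S2-d4 (67.4%) is taken from item 1 of the summary and is attributed to SVM by inference from the surrounding text. The dictionary layout and the `best_config` helper are illustrative only.

```python
# Accuracy (%) per classifier and strategy, as reported in this section.
ACCURACY = {
    "k-NN": {"S1": 57.0, "S2-d2": 48.0, "S2-d3": 53.1, "S2-d4": 42.8},
    "NB":   {"S1": 61.2, "S2-d2": 46.7, "S2-d3": 50.7, "S2-d4": 57.3},
    # S2-d4 for SVM is inferred from summary item 1, not read from a table row.
    "SVM":  {"S1": 62.1, "S2-d2": 60.8, "S2-d3": 66.3, "S2-d4": 67.4},
}

def best_config(scores):
    """Return the (strategy, accuracy) pair with the highest accuracy."""
    return max(scores.items(), key=lambda kv: kv[1])

best_per_classifier = {clf: best_config(s) for clf, s in ACCURACY.items()}
overall = max(
    ((clf, cfg, acc) for clf, (cfg, acc) in best_per_classifier.items()),
    key=lambda t: t[2],
)
print(best_per_classifier)
print(overall)
```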

6.6.2.3 Discussion of Experiment 1 Results

All the classification algorithms produced good AUC results; the best AUC for each exceeded 70%. Accuracy was lower than AUC, particularly for the MD dataset. Such results (low accuracy and high AUC) are likely to be caused by the imbalanced image sets used for evaluation throughout the work described in this thesis. Overall, the proposed approaches performed significantly better when applied to the BD dataset than to the MD dataset (multiclass classification is generally more challenging than binary classification). Inspection of the standard deviations of the results indicates that similar accuracy and AUC were produced by all classification algorithms across different sets of TCV, with standard deviations of less than 6% (accuracy) and 5% (AUC) for the MD dataset. The BD dataset produced even more consistent results, with standard deviations of less than 2% for both accuracy and AUC.

Based on the classification results produced by all the classification techniques employed, dataset BD produced the best sensitivity and accuracy using S2, while the best specificity and AUC were produced using S1. On the other hand, S2 outperformed S1 with respect to all the evaluation metrics used when all the features were utilised for image classification using the MD dataset. It is conjectured that, through generalisation (which was achieved using the SVM classifier), the features extracted from smaller regions of an image are more informative than features generated from the whole image, as they represent information (colour or texture) of pixels that are spatially close to each other, which may in turn indicate some specific characteristic (such as dark, bright or smooth texture) of a particular area of an image. With respect to the three classification algorithms used for the experiments, k-NN produced most of its best results using strategy S1. One explanation of this performance is the possible redundancy (or insignificance) of the features generated by S2.
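The consistency check described above rests on the mean and standard deviation of each metric over repeated runs. The sketch below shows one way such a "mean(std)" entry could be produced. The per-run accuracies are hypothetical, the `summarise` helper is illustrative, and the use of the sample standard deviation is an assumption, since this section does not say which estimator was used or confirm that the bracketed table values are standard deviations.

```python
import statistics

def summarise(values):
    """Format per-run results as 'mean(std)' with one decimal place."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # assumption: sample (n - 1) standard deviation
    return f"{mean:.1f}({std:.1f})"

# Hypothetical accuracies (%) from five repeated cross-validation runs.
runs = [56.1, 57.9, 57.0, 56.4, 57.6]
summary = summarise(runs)
print(summary)
```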
