5.5 Evaluation and Discussion
5.5.2 Experiment 1: Comparison of Classification Performances using
The results of the experiments directed at comparing classification performance using colour histograms with different numbers of bins, W, are presented and discussed in this sub-section. Tables 5.1 and 5.2 show the results produced using the BD and MD datasets respectively. The column labels in Table 5.1, starting from the left-hand side, should be interpreted as follows: (i) W indicates the number of histogram bins, (ii) “Sens” is the sensitivity, (iii) “Spec” is the specificity, (iv) “Accuracy” is the classification accuracy and (v) “AUC” is the Area Under the receiver operating characteristic Curve (AUC) value. In Table 5.2, the columns labelled “Sens-AMD” and “Sens-other” represent the sensitivity with respect to the identification of AMD and other disease images respectively. In each column two values are recorded: the average and the standard deviation (in brackets) of the results generated using five sets of Ten-fold Cross Validation (TCV). The highest value obtained for each evaluation metric is indicated in bold font. All results were produced using colour histograms extracted from the RGB colour model for each image, with the number of different colours reduced to W colours, except for the results in the last row of each table (green-256), which were generated using only a green channel histogram of 256 bins.
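As an illustration of how the four metrics listed above are conventionally computed for a two-class problem, the following is a minimal sketch. It is not the code used in the reported experiments; the function names `binary_metrics` and `auc`, the encoding of the positive class as 1, and the example labels and scores are all assumptions made for the illustration.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy (in %) for labels in {0, 1}.

    Assumption: 1 marks the positive (disease) class, 0 the negative class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sens": 100.0 * tp / (tp + fn),          # true positive rate
        "spec": 100.0 * tn / (tn + fp),          # true negative rate
        "accuracy": 100.0 * (tp + tn) / len(pairs),
    }


def auc(y_true, scores):
    """AUC (in %) as the probability that a randomly chosen positive
    receives a higher score than a randomly chosen negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return 100.0 * wins / (len(pos) * len(neg))


# Hypothetical labels, hard predictions and classifier scores.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6]
print(binary_metrics(y_true, y_pred))
print(auc(y_true, scores))
```

Because AUC depends on how the classifier ranks the images rather than on a single decision threshold, a configuration can have a low accuracy and still a high AUC.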
Table 5.1: Average classification results obtained using CBH, different W values and the BD dataset

W          Sens (%)    Spec (%)    Accuracy (%)  AUC (%)
8          71.1(1.3)   58.5(0.5)   66.3(0.8)     75.7(0.6)
16         64.4(1.0)   60.1(0.7)   62.8(0.8)     77.7(0.7)
32         70.9(1.3)   56.2(1.3)   65.4(0.7)     76.5(0.4)
64         66.7(1.5)   63.6(1.5)   65.5(1.1)     77.7(0.4)
128        69.3(2.4)   55.0(1.5)   64.0(1.5)     74.7(0.8)
256        74.0(1.5)   57.4(1.5)   67.8(0.6)     73.9(0.8)
green-256  60.4(1.7)   55.6(1.4)   58.6(1.5)     82.3(0.7)
From Table 5.1, where the BD dataset was used, it can be seen that the best sensitivity (74.0%), specificity (63.6%) and accuracy (67.8%) values were generated using colour histograms extracted from all three colour channels combined (after colour quantisation). Both the best sensitivity and the best accuracy were obtained using W = 256, while the best specificity was produced when W = 64. The individual green channel histogram produced the best AUC of 82.3%. Considering the results produced using the MD dataset, presented in Table 5.2, the best sensitivity of 57.0% and the highest accuracy of 50.2% for AMD image identification were achieved using W = 256. The best recorded sensitivity for the identification of other diseases was 64.3%, using green channel histograms. The other best results (specificity and AUC) were 47.3% (W = 64) and 72.8% (W = 128) respectively. From the experiments, it can be summarised that most of the best results were achieved using 64 ≤ W ≤ 256.

Table 5.2: Average classification results obtained using CBH, different W values and the MD dataset

W          Sens-AMD (%)  Sens-other (%)  Spec (%)    Accuracy (%)  AUC (%)
8          45.1(1.2)     39.8(1.8)       43.5(1.0)   42.7(0.7)     65.4(0.3)
16         42.1(1.4)     48.2(1.7)       44.2(0.9)   44.7(0.8)     69.4(0.4)
32         51.9(0.9)     38.4(2.7)       46.7(0.4)   46.2(1.1)     70.6(0.6)
64         52.2(2.8)     45.6(3.0)       47.3(2.9)   48.8(1.4)     72.7(0.9)
128        49.5(1.9)     42.9(1.1)       37.9(1.4)   44.5(0.9)     72.8(0.4)
256        57.0(2.0)     46.2(1.1)       43.9(2.4)   50.2(0.6)     71.2(0.1)
green-256  44.0(1.3)     64.3(7.6)       29.5(3.2)   47.2(1.9)     70.7(1.2)
5.5.2.1 Discussion of Experiment 1 Results
Inspection of the results obtained using the two-class dataset, BD, presented in Table 5.1 indicates that comparable results (across the different values of W) with respect to accuracy and AUC were produced using histograms generated from all three colour channels (the accuracy and AUC values obtained were greater than 60% and 70% respectively). The green channel histogram produced the lowest accuracy, 58.6%, but performed best with respect to AUC (82.3%). The results were consistent across the different sets of TCV runs, with standard deviations of less than 2% for accuracy and less than 1% for AUC.
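The “average(standard deviation)” entries in the tables summarise one value per TCV set. A minimal sketch of that aggregation is given below. The function name `summarise_runs` and the five input values are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) rather than the population form is an assumption, since the text does not say which was used.

```python
from statistics import mean, stdev


def summarise_runs(run_values):
    """Format per-run results as 'mean(std)', to one decimal place.

    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    return f"{mean(run_values):.1f}({stdev(run_values):.1f})"


# Hypothetical accuracies (%) from five sets of ten-fold cross validation;
# these are not values from the reported experiments.
print(summarise_runs([60.0, 62.0, 61.0, 63.0, 64.0]))  # -> 62.0(1.6)
```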
The results generated from the multiclass dataset, MD, presented in Table 5.2 show that the overall accuracy is low (the best was just above 50%). The AUC values were, however, better (greater than 65% for all values of W), with a recorded best of 72.8%. Inspection of the standard deviations shows that similar accuracy and AUC results were produced across the different TCV runs, with a standard deviation of less than 2% for both metrics. Note that CBH was used in these experiments.
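For the multiclass case, per-class figures such as “Sens-AMD” and “Sens-other” can be read as the fraction of images of each class that were labelled correctly. The sketch below illustrates this under stated assumptions: the class names `"amd"`, `"other"` and `"normal"`, the function name `per_class_recall` and the example labels are hypothetical, and treating “Spec” as the recall of a normal class is an assumption that this sub-section does not state explicitly.

```python
def per_class_recall(y_true, y_pred, classes):
    """Percentage of the images of each class that were labelled correctly."""
    recalls = {}
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls[c] = 100.0 * correct / total
    return recalls


# Hypothetical ground truth and predictions for a three-class problem.
y_true = ["amd", "amd", "other", "other", "normal", "normal", "normal", "normal"]
y_pred = ["amd", "other", "other", "normal", "normal", "normal", "amd", "normal"]

recalls = per_class_recall(y_true, y_pred, ["amd", "other", "normal"])
accuracy = 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(recalls)   # recall for "amd" and "other" is 50.0, for "normal" 75.0
print(accuracy)  # 62.5
```

With more than two classes a single overall accuracy can hide large differences between the per-class figures, which is why Table 5.2 reports them separately.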
The results shown in both tables indicate that the histograms extracted from all three RGB channels, combined and quantised to W colours, produced better overall results than histograms generated from the green channel alone. The suggested explanation for this result is that histograms representing all channels are more informative, and therefore more discriminative (in the context of image classification), than the green channel histogram alone. With respect to the W parameter, the results indicate, most clearly for the MD dataset (Table 5.2), that the higher the W value, up to W = 256 in the reported experiments, the better the classification accuracy, with the exception of W = 128, where a slight drop in accuracy was observed compared to W = 64. A similar pattern can be observed in the MD AUC values, which improved as the length of the histograms was increased up to W = 128. This was to be expected, as low numbers of colour bins will tend to group differently coloured pixels into the same bin and consequently reduce the discriminatory power of the colour representation. However, as stated before, increasing the value of W resulted in a higher computational cost with minimal improvement in classification performance (less than 2% for accuracy and AUC). Thus, the maximum value of W was limited to 256 (note that the original number of different colours produced by the RGB colour model is 16,777,216, as stated in Sub-section 5.2.1). The presented results show that:
1. Using the BD dataset, the best classification accuracy and AUC were 67.8% (W = 256) and 82.3% (using the green channel histograms) respectively.
2. Using the MD dataset, the best classification accuracy and AUC were 50.2% (W = 256) and 72.8% (W = 128) respectively.
3. The selection of parameter W did affect the classification performance, whereby W ≥ 32 tended to produce better overall (accuracy and AUC) results.
4. Using all three RGB channels combined as features produced a better performance overall than when using the green channel alone.
5. Performance is relatively stable, as the recorded standard deviation is small (less than 2% for accuracy and AUC on both the BD and MD datasets).
6. Better results were produced using the two-class dataset, BD.
Based on the above findings, the histograms used in the following experiments were extracted using all three RGB channels with 32 ≤ W ≤ 256.
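To make the role of W concrete, the sketch below builds a normalised colour histogram in which the 16,777,216 possible 24-bit RGB colours are reduced to W bins. It uses simple uniform quantisation (keeping the most significant bits of each channel), which is an assumption chosen for brevity and is not necessarily the quantisation method described in Sub-section 5.2.1. The function name `quantised_histogram` and the `bits` parameter are hypothetical.

```python
def quantised_histogram(pixels, bits=(3, 3, 2)):
    """Normalised histogram over W = 2 ** sum(bits) quantised colours.

    pixels: iterable of (r, g, b) tuples with 8-bit channel values.
    bits:   number of most significant bits kept for (R, G, B);
            (3, 3, 2) gives W = 256 and (1, 1, 1) gives W = 8.
    Assumption: uniform quantisation, used here only for illustration.
    """
    rb, gb, bb = bits
    w = 1 << (rb + gb + bb)
    hist = [0] * w
    n = 0
    for r, g, b in pixels:
        # Keep the top bits of each channel and pack them into one bin index.
        idx = (((r >> (8 - rb)) << (gb + bb))
               | ((g >> (8 - gb)) << bb)
               | (b >> (8 - bb)))
        hist[idx] += 1
        n += 1
    return [count / n for count in hist]


# Four hypothetical pixels: two red, one green, one blue.
pixels = [(255, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
h8 = quantised_histogram(pixels, bits=(1, 1, 1))
print(len(h8), h8[4], h8[2], h8[1])  # 8 0.5 0.25 0.25
```

With few bins, visibly different colours fall into the same bin (in the W = 8 case every channel is reduced to “low” or “high”), which is the loss of discriminatory power discussed above. Larger W separates them at the cost of a longer feature vector.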
5.5.3 Experiment 2: Comparison of Classification Performances using