Chapter 8 Overall discussion and conclusions
8.2 Texture classification discussion and conclusions
The results obtained with all feature extraction algorithm and classifier combinations, for all three case studies, are reported in table 8-1.
Table 8-1: Comparison of average test error rates for all method combinations and all case studies
It should be noted that since the results obtained with textons and the platinum froth flotation results are based on only one run, these are not as reliable as the remainder of the results. It was found that for the coal and hydrocyclone data sets, results varied significantly between repetitions of the same experiment, and conclusions could not be drawn based on the results of one run only. However, even though only one run could be performed with the flotation data set, this data set contains many more images than the coal and hydrocyclone data sets (2600 versus 280 and 300, respectively). It is therefore expected that the results based on one run with the flotation data set would be more reliable than results based on one run with the other two data sets.
8.2.1 Feature extraction methods
From table 8-1 it can be seen that the average test error rates across all three case studies were the lowest for the steerable pyramid and LBP feature sets, followed by textons and then wavelets. The GLCM feature set resulted in the highest error rates. This is in line with what would be expected when considering the properties of the different methods: the steerable pyramid, LBP and texton methods combine texture analysis approaches in ways that should theoretically provide advantages over the baseline GLCM and wavelet methods.
The poorer performance of textons when compared to steerable pyramids and LBPs could be ascribed to two factors. First, only one texton run was performed for each case study, which means that the poorer performance could be coincidental. Second, it was not possible to optimise the filter set hyperparameter used in the texton algorithm, due to the long computer running times that would be required to do so. It is possible that other filter sets would yield better results.
It is important to consider which feature extraction methods outperformed which other methods to a degree that is statistically significant. This is depicted in figure 8-1.
Feature set        Classifier  Flotation*  Coal    Hydrocyclone  Avg.
GLCM               K-NN        35.2%       25.6%   14.2%         25.0%
                   DA          36.0%       17.0%   13.5%         22.2%
Wavelet            K-NN        36.3%       14.7%   13.0%         21.3%
                   DA          27.1%       13.3%   12.3%         17.6%
Steerable pyramid  K-NN        28.3%       13.0%   10.7%         17.3%
                   DA          11.1%       12.0%   7.8%          10.3%
Texton*            K-NN        33.3%       10.0%   16.2%         19.4%
                   DA          25.7%       8.6%    6.8%          14.2%
LBP                K-NN        22.8%       15.1%   10.4%         16.1%
                   DA          14.5%       10.3%   9.3%          11.4%
* Error rates based on only one run (all flotation results and all texton results).
In the platinum froth flotation case study the steerable pyramid and LBP feature sets outperformed the GLCM, wavelet and texton feature sets. Although only one run was performed, and thus a statistical significance test could not be carried out, the differences in error rates between the better methods and the worse methods were large. On average across both classifiers, the error rates obtained with steerable pyramid and LBP features were 19.7% and 18.7%, respectively. The error rates for the other three methods were much higher at 29.5% (textons), 31.7% (wavelets) and 35.6% (GLCMs). The conclusion that steerable pyramids and LBPs significantly outperformed the other three methods for this data set therefore seems reasonable.
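The per-method averages quoted above can be reproduced from the flotation column of table 8-1. The following is a minimal sketch; the helper name `mean_over_classifiers` and the round-half-up convention are assumptions made for illustration, chosen so that the output matches the figures quoted in the text.

```python
from decimal import Decimal, ROUND_HALF_UP

# Flotation test error rates (%) from table 8-1, as (K-NN, DA) pairs.
flotation = {
    "GLCM": ("35.2", "36.0"),
    "Wavelet": ("36.3", "27.1"),
    "Steerable pyramid": ("28.3", "11.1"),
    "Texton": ("33.3", "25.7"),
    "LBP": ("22.8", "14.5"),
}


def mean_over_classifiers(knn, da):
    """Average two error rates, rounding half up to one decimal place."""
    mean = (Decimal(knn) + Decimal(da)) / 2
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


flotation_means = {
    name: mean_over_classifiers(*pair) for name, pair in flotation.items()
}

# Print the feature sets from lowest to highest average error rate.
for name, value in sorted(flotation_means.items(), key=lambda item: item[1]):
    print(f"{name:18s} {value}%")
```

Decimal arithmetic is used instead of floats so that a value such as 18.65 rounds to 18.7 rather than being pulled down by binary representation error.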
With the data set of coal on a conveyor belt, the GLCM feature set was significantly outperformed by all four of the other methods, although the good result for textons is not as reliable, as only one run was performed.
In the case study where hydrocyclone underflow particle size was investigated, the steerable pyramid and LBP feature sets significantly outperformed the GLCM and wavelet feature sets. No conclusion could be drawn for textons, since the texton and K-NN combination had the highest error rate of all methods for this case study (16.2%), while the texton and DA combination had the lowest error rate (6.8%).
Based on these results, the overall conclusion can be drawn that the steerable pyramid and LBP methods showed the best performance and can thus extract the most descriptive feature sets. These two feature extraction methods were among the better methods for all three case studies, and are therefore also the most likely to give good results on other texture analysis case studies. Both steerable pyramid and LBP feature extraction would therefore be good methods to employ, should further experiments be carried out.
Case study    Better methods                           Worse methods
Flotation     Steerable pyramid, LBP                   Wavelet, GLCM, Texton
Coal          Wavelet, Steerable pyramid, Texton, LBP  GLCM
Hydrocyclone  Steerable pyramid, LBP                   Wavelet, GLCM
Figure 8-1: Statistically significant differences in performances of feature extraction methods. On the left are shown the better methods for each case study, and on the right are shown the methods that
The texton feature set appears once among the better methods and once among the worse methods (figure 8-1), and had one inconclusive result. While texton feature extraction was the second-best option after steerable pyramids and LBPs, this algorithm has a high computational complexity and requires extremely long computer running times for the training phase. This makes hyperparameter optimisation difficult, and if the algorithm were to be implemented in an industrial vision-based inferential sensor, the slow training would reduce the capacity for fast online recalibration. It is recommended that this method be tested further before a final conclusion regarding its performance is made. This further testing should include the optimisation of the choice of filter bank and the execution of ten repetitions of the experiment instead of only one. Additionally, the use of alternative clustering methods in the texton algorithm may reduce the computational workload required by this algorithm.
The wavelet and GLCM methods had the worst performance. Wavelets appear once among the better methods and twice among the worse methods, while GLCMs are always among the worse methods (figure 8-1). Therefore, even though these two methods have enjoyed widespread use in vision-based inferential sensing applications in the process industries, their status of being “state-of-the-art” (Duchesne et al., 2012) should be reconsidered. The results from this study show that alternative textural feature sets (steerable pyramids and LBPs) are likely to improve the performance when used as input to online prediction algorithms, compared to the GLCM and wavelet methods.
8.2.2 Classifiers
Considering the overall results reported in table 8-1, DA outperformed K-NN for almost every feature set and case study. On average, the error rate obtained with DA was 4.7 percentage points lower than that obtained with K-NN. In addition, the ANOVAs performed for the coal and hydrocyclone case studies showed that the choice of classifier is significant. The evidence is therefore overwhelming that DA significantly outperformed K-NN in the classification experiments performed in this study.
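The classifier comparison can be checked against table 8-1 with a short calculation. This is a sketch only: it assumes that the 4.7 figure is the mean gap between the K-NN and DA entries of the Avg. column as reported, and the variable names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Avg. column of table 8-1 (%), as (K-NN, DA) pairs per feature set.
avg_error = {
    "GLCM": ("25.0", "22.2"),
    "Wavelet": ("21.3", "17.6"),
    "Steerable pyramid": ("17.3", "10.3"),
    "Texton": ("19.4", "14.2"),
    "LBP": ("16.1", "11.4"),
}

# Individual cells of table 8-1 (%): (Flotation, Coal, Hydrocyclone).
cells = {
    "GLCM": {"K-NN": (35.2, 25.6, 14.2), "DA": (36.0, 17.0, 13.5)},
    "Wavelet": {"K-NN": (36.3, 14.7, 13.0), "DA": (27.1, 13.3, 12.3)},
    "Steerable pyramid": {"K-NN": (28.3, 13.0, 10.7), "DA": (11.1, 12.0, 7.8)},
    "Texton": {"K-NN": (33.3, 10.0, 16.2), "DA": (25.7, 8.6, 6.8)},
    "LBP": {"K-NN": (22.8, 15.1, 10.4), "DA": (14.5, 10.3, 9.3)},
}

# Mean gap in percentage points between K-NN and DA over the Avg. column.
gaps = [Decimal(knn) - Decimal(da) for knn, da in avg_error.values()]
mean_gap = (sum(gaps) / len(gaps)).quantize(
    Decimal("0.1"), rounding=ROUND_HALF_UP
)

# Count the feature set / case study combinations in which DA has the lower error.
pairs = [
    (knn, da)
    for rates in cells.values()
    for knn, da in zip(rates["K-NN"], rates["DA"])
]
total = len(pairs)
da_wins = sum(da < knn for knn, da in pairs)

print(f"Mean gap: {mean_gap} percentage points")
print(f"DA has the lower error rate in {da_wins} of {total} combinations")
```

The single combination in which K-NN has the lower error rate is GLCM features on the flotation data set (35.2% versus 36.0%).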
If a choice has to be made between K-NN and DA in future work, DA would be the better option. However, there could be many reasons for the difference in performance between these two classifiers, and the result may therefore not hold true in every case. For example, the unequal distribution of data points among classes impacted K-NN negatively in this work, and the choice of distance metric (Euclidean) may not have been optimal for these particular case studies.
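The effect of unequal class sizes on K-NN can be illustrated with a toy example. The sketch below is not based on the data sets of this study: the two-dimensional points, the class labels "fine" and "coarse" and the function `knn_predict` are all hypothetical, and the classifier is a plain majority-vote K-NN with Euclidean distance.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query` (Euclidean)."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, deliberately unbalanced data: 2 "fine" versus 8 "coarse" samples.
train = [((0.0, 0.0), "fine"), ((0.2, 0.0), "fine")]
train += [((1.0 + 0.1 * i, 0.0), "coarse") for i in range(8)]

# The query is closer to both "fine" points than to any "coarse" point.
query = (0.3, 0.0)

pred_k1 = knn_predict(train, query, k=1)
pred_k5 = knn_predict(train, query, k=5)
print(pred_k1, pred_k5)
```

With k = 5 the neighbourhood has to include three points from the larger class, so the vote goes against the two nearest neighbours. Distance-weighted voting or a different distance metric are possible mitigations, although neither was evaluated here.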