The concatenation of all methods with CFS: a case of study


5.1.5 The concatenation of all methods with CFS: a case of study

When using feature selection, features are selected according to specific criteria that depend on the method. Filters remove features based on redundancy and relevance, but they do not take into account the cost of obtaining them. Note that the cost of obtaining a feature depends on the procedures required to extract it. Therefore, each feature has an associated cost that can be related to financial cost, physical risk or computational demands. This is the case for co-occurrence features and, consequently, for the concatenation of all methods. In co-occurrence features, the cost of obtaining the 588 features is not homogeneous. Features are vectorized in groups of 28, related to distances and channels in the color space. Each group of 28 features corresponds to the mean and range of 14 statistics across the co-occurrence matrices.

If we focus on co-occurrence features, CFS reduced the number of features by 95.41% (from 588 to 27), but the processing time was not reduced in the same proportion: it fell from the initial 102.18 seconds to 27.01 seconds (a reduction of 73.57%). This clearly shows that some of the 588 features take longer to compute than others. Some experimentation was performed on the time the method takes to compute each of the 14 statistics. The results revealed that computing the 14th statistic,

Chapter 5. Feature selection in other real applications

Table 5.7: TOPSIS values obtained for every method when w = [1/2, 1/2, 0]

Texture analysis              Feature selection filter
                              None      CFS       Cons      INT
Butterworth filters           0.8983    0.9017    0.1694    0.4722
Discrete wavelet transform    0.6567    0.8986    0.9377    0.9629
Co-occurrence features        0.9923    0.9846    0.6233    0.9539
Markov random fields          0.4777    0.3361    0.1601    0.0009
Gabor filters                 0.9829    0.8526    0.2717    0.5526
Concatenation                 0.9991    0.9987    0.4769    0.9706

which corresponds to the maximal correlation coefficient (Haralick et al., 1973), takes around 96% of the total time. The time for obtaining a single matrix is therefore negligible compared to the time for computing the 14th statistic. Consequently, the key to reducing the feature extraction time is to reduce the number of 14th statistics in the selection.
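The reductions quoted above for co-occurrence features with CFS can be reproduced with a few lines of arithmetic. The sketch below is only an illustration; the helper name `percent_reduction` is hypothetical and the input values are the ones given in the text.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, expressed as a percentage."""
    return 100.0 * (before - after) / before


# Values quoted in the text for co-occurrence features with CFS.
feature_reduction = percent_reduction(588, 27)       # number of features
time_reduction = percent_reduction(102.18, 27.01)    # processing time (s)

print(f"features: {feature_reduction:.2f}%")
print(f"time:     {time_reduction:.2f}%")
```

The gap between the two percentages is what indicates that the discarded features were, on average, cheaper to compute than the ones that remain.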

Table 5.8: Co-occurrence features selected by CFS over the concatenation of all methods. Features corresponding to the 14th statistic are marked with an asterisk (*).

Distance    Component in the colour space
            L                      a          b
1           –                      29, 50     66
2           98*                    121, 133   –
3           193                    –          230
4           267, 268, 275, 276, 277   –       321
5           350*, 359              –          –
6           434*, 443, 446         –          492, 502
7           518*                   546*       576
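The placement of the feature indices in Table 5.8 is consistent with a simple layout: 21 consecutive groups of 28 features, ordered by distance first and by component (L, a, b) second. The following sketch decodes an index under that layout. The ordering is an assumption inferred from the table rather than something stated in the text, and the function name `decode_feature` is hypothetical. Under this layout, the five features identified in the text as 14th statistics all fall at position 14 of their group.

```python
COMPONENTS = ("L", "a", "b")
GROUP_SIZE = 28     # mean and range of 14 statistics per group
N_DISTANCES = 7
N_FEATURES = GROUP_SIZE * len(COMPONENTS) * N_DISTANCES  # 588


def decode_feature(index):
    """Map a 1-based feature index to (distance, component, position in group).

    Assumes groups are ordered by distance first, then by component.
    """
    if not 1 <= index <= N_FEATURES:
        raise ValueError(f"feature index must be in 1..{N_FEATURES}")
    group, offset = divmod(index - 1, GROUP_SIZE)
    distance, component = divmod(group, len(COMPONENTS))
    return distance + 1, COMPONENTS[component], offset + 1


# Co-occurrence features selected by CFS, as listed in Table 5.8.
selected = [29, 50, 66, 98, 121, 133, 193, 230, 267, 268, 275, 276, 277,
            321, 350, 359, 434, 443, 446, 492, 502, 518, 546, 576]

# Features sitting at position 14 of their group.
fourteenth = [f for f in selected if decode_feature(f)[2] == 14]
print(fourteenth)
```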

In the case of the concatenation of all methods with CFS, the filter selects 56 features (see Table 5.2), distributed as follows: 17 from Butterworth filters, 1 from the discrete wavelet transform, 24 from co-occurrence features, 1 from Markov random fields, and 13 from Gabor filters. Five of the selected co-occurrence features correspond to the 14th statistic (see Table 5.8). In co-occurrence features, the cost of obtaining the statistics also depends on the distance and the component in the color space. On the one hand, the longer the distance, the larger the number of matrices to compute

5.1 Tear film lipid layer classification

(and so, the higher the processing time). On the other hand, the color differences have little contrast, so the colorimetric components of the Lab color space vary minimally. As a consequence, the matrices for components a and b have smaller dimensions than the matrices for component L. As expected, the smaller the dimension, the shorter the time to compute a statistic.

Computing the five 14th statistics at the different distances and components takes 3.12 s (feature 98), 8.23 s (feature 350), 9.61 s (feature 434), 11.49 s (feature 518), and 4.81 s (feature 546). As can be seen, avoiding the computation of some of them saves a significant amount of time. The aim here is to explore the impact of removing some of the five 14th statistics selected by CFS in terms of accuracy, robustness and time. There are 5 features within the 14th statistic, so only 2^5 = 32 different configurations need to be explored, and an exhaustive (brute-force) empirical evaluation is feasible. Table 5.9 shows the performance of the different configurations in terms of accuracy, robustness and time. Each configuration corresponds to the features selected by CFS with some of the 14th statistics removed. For simplicity, only the acceptable results are shown. A solution is considered unacceptable if it obtains lower accuracy and robustness in a longer time than another solution.
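The exhaustive search described above can be sketched as follows. The code only enumerates the 32 removal configurations and attaches a rough estimate of the time saved; the additive estimate is an assumption made for illustration and need not match measured times exactly. Evaluating accuracy and robustness would require retraining the classifier for each configuration, which is not shown. The names `TIME_14TH` and `removal_configurations` are hypothetical.

```python
from itertools import combinations

# Extraction times (s) quoted in the text for the five 14th-statistic features.
TIME_14TH = {98: 3.12, 350: 8.23, 434: 9.61, 518: 11.49, 546: 4.81}


def removal_configurations(features):
    """Yield every subset of `features`, including the empty one, as a sorted tuple."""
    items = sorted(features)
    for size in range(len(items) + 1):
        yield from combinations(items, size)


configs = list(removal_configurations(TIME_14TH))

# Assumption: the time saved is the sum of the times of the removed features.
saved = {cfg: round(sum(TIME_14TH[f] for f in cfg), 2) for cfg in configs}

print(len(configs))          # number of configurations to evaluate
print(saved[(98, 434)])      # estimated saving when removing features 98 and 434
```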

Table 5.9: Performance measures for the concatenation of all methods with CFS when some of the five 14th statistics are not selected. The best results are marked in bold.

Features removed              Acc (%)    Rob (%)    Time (s)
{}, baseline performance      96.19      93.84      37.04
{98, 434}                     97.14      94.09      24.31
{98, 434, 546}                97.14      93.84      19.83
{98, 350, 518, 546}           97.14      93.60      9.72
{98, 434, 518, 546}           97.14      92.86      8.34
{98, 350, 434, 518, 546}      97.14      92.61      0.11
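The acceptability rule stated earlier (a solution is unacceptable if another one achieves higher accuracy and robustness in less time) can be applied mechanically to the rows of Table 5.9. The sketch below is a literal reading of that rule using strict inequalities; the names `Result` and `is_unacceptable` are hypothetical and the figures are copied from the table.

```python
from typing import NamedTuple


class Result(NamedTuple):
    removed: frozenset   # 14th-statistic features removed from the CFS selection
    acc: float           # accuracy (%)
    rob: float           # robustness (%)
    time: float          # feature extraction time (s)


# Rows of Table 5.9.
TABLE_5_9 = [
    Result(frozenset(), 96.19, 93.84, 37.04),
    Result(frozenset({98, 434}), 97.14, 94.09, 24.31),
    Result(frozenset({98, 434, 546}), 97.14, 93.84, 19.83),
    Result(frozenset({98, 350, 518, 546}), 97.14, 93.60, 9.72),
    Result(frozenset({98, 434, 518, 546}), 97.14, 92.86, 8.34),
    Result(frozenset({98, 350, 434, 518, 546}), 97.14, 92.61, 0.11),
]


def is_unacceptable(result, others):
    """True if some other result is strictly better on all three measures."""
    return any(
        o.acc > result.acc and o.rob > result.rob and o.time < result.time
        for o in others
    )


acceptable = [r for r in TABLE_5_9 if not is_unacceptable(r, TABLE_5_9)]
fastest = min(acceptable, key=lambda r: r.time)

for r in acceptable:
    print(sorted(r.removed), r.acc, r.rob, r.time)
```

Under this literal reading, the baseline row is itself outperformed on all three measures by the configuration that removes {98, 434}; it appears in the table as a reference point.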

In terms of accuracy and robustness to noisy data, the best result is obtained when removing the features {98, 434} (97.14% and 94.09%, respectively), but at the expense of a fairly long feature extraction time (24.31 s). Note that this result even improves on the baseline performance. In the remaining results, the classification accuracy is maintained whilst the feature extraction time is reduced, at the expense of only a slight deterioration in robustness to noise (less than 2%).


It is also important to note the effectiveness of the CFS filter in selecting the most appropriate features. If we do not apply feature selection and simply remove the 14th statistics from the 588 co-occurrence features in the concatenation of all methods, the accuracy and the robustness are both 92.86%. That is, the accuracy is worse than the results shown in Table 5.9 and the robustness is not significantly different. As expected, the time is also longer: 14.74 seconds.

To sum up, the manual process carried out by experts could be automated, with the benefits of being faster and unaffected by subjective factors, with a maximum accuracy over 97% and a processing time under 1 second. The clinical significance of these results should be highlighted, as the agreement between subjective observers is between 91% and 100%.

In document Novel feature selection methods for high dimensional data (Page 133-136)