6.5 Discussion of Results
6.5.4 Contribution of the Motion Features
The previous section shows how feature selection can increase the correct classification rates when using the combined set of appearance and motion features. Table 6.4 also shows the results obtained using the 49 appearance features only (that is, with the motion features removed from the subset of 70). Comparing the two sets of figures allows us to estimate the contribution of the motion features to the overall classification rate.
Inspection of the results shows an increase of 4% and 9% in the correct classification rates for the RF and RT classifiers respectively when the combined features are used rather than the appearance features alone. The primary interest is in the RF classifier, which is consistently the best performing: in this case the 4% increase in classification rate, from 85% to 89%, is significant. The statistical significance of the difference between the correct classification rate of the combined set with feature selection and that of the appearance set with feature selection was assessed using the Wilcoxon signed-rank test, giving a test statistic of W = 0 (the smaller of W+ = 665 and W− = 0). Since the computed test statistic is less than the critical value of 415, we accept the alternative hypothesis that the correct classification rate of the combined set with feature selection is significantly different from that of the appearance set with feature selection. This supports the assertion that the 4% increase in classification rate is significant.
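The rank sums W+ and W− used in this test can be computed as in the following minimal Python sketch. It is an illustration only and not the code used in this work: the function name and the example rates are hypothetical, and it assumes the common convention of dropping zero differences and giving tied absolute differences their average rank.

```python
def wilcoxon_signed_rank(x, y):
    """Return (w_plus, w_minus) for the paired samples x and y."""
    # Paired differences; zero differences carry no sign and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    # Indices ordered by absolute difference, so ranks can be assigned.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical paired classification rates (in %), for illustration only.
combined = [89, 90, 88, 91, 87]
appearance = [85, 87, 86, 86, 84]
w_plus, w_minus = wilcoxon_signed_rank(combined, appearance)
w = min(w_plus, w_minus)  # statistic compared with the tabulated critical value
```

The smaller of the two sums is the statistic W; the null hypothesis of no difference is rejected when W does not exceed the tabulated critical value for the number of non-zero pairs.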
Comparing the appearance feature subset in Table 6.4 with the complete appearance feature set in Table 4.8 further supports this conclusion. Only a small difference is observed when feature selection is used in conjunction with the appearance features alone; similarly, naively adding the motion features to produce a large set of 320 features makes little impact. However, when feature selection is used in conjunction with the full set of motion and appearance features, a significant increase in correct classification rates is evident.
6.6 Conclusion
In this chapter, the work on automated classification of bird species in flight using combined (motion and appearance) features has been taken further. The chapter has addressed the challenge as a robust fine-grained classification problem using the combined features and has shown experimentally that motion features are important for the classification of species, especially those whose appearances differ only at a fine-grained level.
Classification rates dropped significantly, by approximately 9% for some of the classifiers, when the combined features were used with the thirteen classes. Two feature selection techniques were then applied to select the feature set that produces the best classification results. The best classification rates for both methods occurred at 70 features: 75.40% for the correlation-based (CoBfs) method and 74.82% for the classifier-based method. The correlation-based method selected 49 appearance features and 21 motion features, while the classifier-based method selected 62 appearance features and 8 motion features. Finally, we noted that wing beat frequency and vicinity features were selected irrespective of the method used, which shows how important these groups of features are in classifying species by motion.
Experiments performed in Chapters 4 and 5 were revisited using the features selected by the correlation-based method, as they produced the best correct classification rates. The results using the selected features show that the performance of the RF and RT classifiers was superior to that of both NB and SVM. The classification results from the selected feature set were compared with those without feature selection.
This showed an increase in the correct classification rates of between 0% and 7% when the RF and RT classifiers are used with the motion features, between 4% and 9% with the combined features, and between 0% and 5% with the appearance features. Surprisingly, misclassification of species with closely related appearances decreased with the selected appearance, motion and combined features. Specifically, the reduction in misclassification was between 1.0% and 2.3% with the selected combined features, between 0.1% and 2.5% with the appearance features, and between 0% and 0.8% with the motion features.
The contribution of the selected motion features to the overall performance of the classifiers was also evaluated. There was an increase of 4% and 9% in correct classification rates for the RF and RT classifiers respectively. The best-performing classifier (RF) improved its classification rate by approximately 4%, which may be directly attributed to the use of motion features. Further analysis also revealed specific improvements for species with similar visual appearance.
The work in Chapters 4 to 6 presents results based on classification using single frames and subsets of videos. In the following chapter, this is extended by combining the results of several frames from a sequence using majority voting with the four standard classifiers, in an attempt to improve classification rates.
Chapter 7
Improving the Performance of Our Bird Species Classifiers
The work presented in the previous chapters was based on classification using single frames and subsets of videos. In Chapter 5, species were classified using a combination of appearance and motion features in order to improve classification rates. However, the naive addition of motion features to the appearance features led to little or no improvement in the classification rates. Irrelevant or redundant features were then eliminated in Chapter 6 by performing feature selection in an attempt to further improve classification rates. Two feature selection methods were used, namely correlation-based and classifier-based techniques, which improved the correct classification rates by approximately 4%.
Many recent classifiers, such as the random forest classifier, achieve strong classification results because they apply some form of voting scheme (majority voting), which has greatly motivated the work presented in this chapter. In addition, the majority voting technique has been used successfully by Bhattacharya and Chaudhuri (2003) to improve overall classification rates by combining the outputs of several classifiers. Since this research tracked and classified flying species in a video sequence frame by frame, it is believed that aggregating these results will further improve the classification rate, as previously done by Marini et al. (2013) for still images.
The aim of the work presented in this chapter is to attempt a further increase in the correct classification rates. This is achieved by using majority voting to aggregate the classification results presented in Chapter 6 across a set of video sub-sequences. The technique is applied to both the seven-species and thirteen-class datasets using all four standard classifiers, and the results are presented. The rest of this chapter is structured as follows:
• Section 7.1 presents the datasets used and describes the processing methods applied to them in order to extract features and process sequences.
• Section 7.2 describes the experimental work on classification using video sequences and majority voting, and the results are presented in Section ??.
• Finally, Section 7.4 draws conclusions on the results of the majority voting techniques.
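As an illustration of the aggregation step that this chapter is built on, the following minimal Python sketch assigns one label to a video sub-sequence from the labels predicted for its individual frames. It is a sketch under stated assumptions rather than the implementation used in the experiments: the function name and the example labels are hypothetical, and ties are resolved in favour of the label that occurs first in the sequence, which is only one of several possible tie-breaking rules.

```python
from collections import Counter


def majority_vote(frame_labels):
    """Return the label predicted most often across the frames of a sub-sequence."""
    if not frame_labels:
        raise ValueError("need at least one frame prediction")
    counts = Counter(frame_labels)
    # most_common lists equal counts in order of first occurrence, so a tie
    # goes to the label that appeared first in the sequence (an assumption).
    return counts.most_common(1)[0][0]


# Hypothetical per-frame predictions from one classifier for one sub-sequence.
frames = ["budgerigar", "budgerigar", "pigeon", "budgerigar", "gull"]
sequence_label = majority_vote(frames)
```

A single misclassified frame, such as the third and fifth frames in this example, no longer changes the label assigned to the sub-sequence as long as most frames are classified correctly.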
7.1 Dataset and Features Extraction
Dataset #2, which was described in detail in Chapter 4, was used for the experiments presented in this chapter. As a reminder, this is the extended set of videos covering thirteen classes, made up of eleven bird species with the Budgerigar (Melopsittacus undulatus) having three colour forms. Specifically, it has been used in this chapter to classify bird species by aggregating the results of several frames from a video sequence using majority voting.
The background Gaussian mixture model proposed by Zivkovic and van der Heijden (2006) was used to extract birds’ silhouettes from each video. Contours were then obtained using the algorithm proposed by Suzuki et al. (1985) and were used to form connected components. Details of these have been presented in Chapter 3. The same techniques as in Chapter 4 were used to extract appearance features. This was done by fitting an oriented bounding box to each silhouette and extracting metrics such as the height, width and hypotenuse, centroid, contour points and the silhouette itself. Appearance features made up of colour moments, shape moments, greyscale histogram, Gabor filter and log-polar features were then extracted from these metrics. Finally, centroids of the oriented bounding boxes were used to form trajectories (details in Chapter 5) from which the motion features were extracted. Motion features extracted from these trajectories include curvature scale space (CSS), turn-based, wing-beat frequency, centroid distance function (CDF), vicinity and curvature based on sine and cosine.
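To make the idea of a trajectory-derived motion feature concrete, the sketch below computes the signed turning angle at each interior point of a centroid trajectory. This is one plausible reading of a turn-based descriptor and is an assumption, not the definition given in Chapter 5; the function name and the sample trajectory are hypothetical.

```python
import math


def turning_angles(centroids):
    """Signed heading change (in radians) at each interior trajectory point."""
    angles = []
    triples = zip(centroids, centroids[1:], centroids[2:])
    for (x0, y0), (x1, y1), (x2, y2) in triples:
        ax, ay = x1 - x0, y1 - y0  # incoming displacement
        bx, by = x2 - x1, y2 - y1  # outgoing displacement
        cross = ax * by - ay * bx  # proportional to the sine of the turn
        dot = ax * bx + ay * by    # proportional to the cosine of the turn
        angles.append(math.atan2(cross, dot))
    return angles


# Hypothetical centroid positions (in pixels) from consecutive frames.
trajectory = [(0, 0), (1, 0), (2, 0), (2, 1)]
angles = turning_angles(trajectory)
```

In image coordinates the y axis points downwards, so the sign of each angle is mirrored relative to the usual mathematical convention; only a consistent convention matters when the angles are later summarised as statistics.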
The appearance and motion features were merged to form the combined feature set (see Chapter 5), which was optimised using the correlation-based and classifier-based feature selection techniques described in Chapter 6. The features selected by the classifier-based technique were used to perform the majority voting experiments presented in this chapter, as they yield the best results when compared with the correlation-based method. Again as a reminder, the features used in this chapter are represented as statistics in order to reduce the feature dimension and enable real-time classification. The statistical features computed include the mean, standard deviation, skewness, kurtosis, energy, entropy, maximum, minimum, local maxima, local minima and number of zero crossings.
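The statistical representation listed above can be sketched in Python as follows. The exact definitions used in this work are not restated here, so several choices in the sketch are assumptions: energy is taken as the sum of squared values, entropy as the Shannon entropy of an eight-bin histogram, and local maxima and minima are reported as counts. The function name and the parameter bins are hypothetical.

```python
import math


def summary_statistics(signal, bins=8):
    """Reduce a 1-D feature signal to a fixed-length set of statistics."""
    n = len(signal)
    if n == 0:
        raise ValueError("signal must not be empty")
    mean = sum(signal) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in signal) / n)
    # Standardised third and fourth moments; set to 0 for a constant signal.
    skew = sum((v - mean) ** 3 for v in signal) / (n * std ** 3) if std else 0.0
    kurt = sum((v - mean) ** 4 for v in signal) / (n * std ** 4) if std else 0.0
    # Assumption: histogram-based Shannon entropy in bits.
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in signal:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts if c)
    interior = list(zip(signal, signal[1:], signal[2:]))
    return {
        "mean": mean,
        "std": std,
        "skewness": skew,
        "kurtosis": kurt,
        "energy": sum(v * v for v in signal),  # assumption: sum of squares
        "entropy": entropy,
        "maximum": hi,
        "minimum": lo,
        "local_maxima": sum(1 for a, b, c in interior if b > a and b > c),
        "local_minima": sum(1 for a, b, c in interior if b < a and b < c),
        "zero_crossings": sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0),
    }


# Hypothetical signal, for example a mean-centred feature over four frames.
stats = summary_statistics([1.0, -1.0, 1.0, -1.0])
```

Whatever the length of the input sequence, the output has the same eleven entries, which is what keeps the feature dimension fixed and small.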