3.3 Data set and Hierarchy
3.4.2 Hierarchical vs flat multi-class classification
In order to compare the hierarchical binary network with traditional flat multi-class classification, it was necessary to restructure the problem. For the hierarchical approach, MPS was used to force predictions to leaf-node level. The flat classifier's classes were the 13 leaf nodes of the tree (selected as per 3.2a). The test set was reduced to contain only data that had been labelled to leaf-node level, as higher-level labels have no defined class in a flat leaf-node classifier. Rather than the tree f1-score (which also makes little sense for a flat classifier), results were evaluated using the mean f1-score across the leaf-node classifiers (Table 3.1). Because the mean f1-score weights all nodes equally, a poorly performing node counts as much as a well-performing one regardless of how many instances each contains; the overall results were therefore substantially lower than the tree f1-score. On this metric, the hierarchical classifier performed marginally better. This was largely due to superior performance on the most poorly performing nodes (e.g. the f1-score on Cnidaria dropped from 0.09 with the hierarchical classifier to 0.00 with the flat multi-class classifier). Note that this comparison was performed using the siblings policy and the best-performing descriptors (Fourier LBP).

Chapter 3 Hierarchical Classification in AUV Imagery
[Figure 3.7: tree diagram of the class hierarchy showing the local f1-score and confusion matrix at each node. RootNode tree f1-score: 80.2%. Local f1-scores: PHYSICAL 84.5%, HARD 0.0%, SOFT 84.5%, BIOTA 86.3%, SPONGES 22.9%, ECHINODERMS 0.0%, CNIDARIA 6.3%, BRYOZOA 0.0%, OTHER 0.0%, ALGAE 68.2%, CANOPY 75.0%, ECK 74.7%, OTHER 0.0%, ERECTBRANCHING 0.0%, BROWN 0.0%, RED 0.0%, CRUSTOSE 7.3%, ECOR 5.1%, SOND 3.8%.]
Figure 3.7: Performance results on the best classifier (HLNP with thresholding, LBP features, inclusive training set). The grid of four numbers at each node is the confusion matrix for instances in the test set, which was used to compute both the local f1-score for each node (red bars) and the tree f1-score (given in the RootNode box, and calculated as the f1-score of the sum over the local node confusion matrices). The thickness of the grey edges between nodes is proportional to the number of test-set instances that belong to a given node.
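The difference between the tree f1-score and the mean f1-score can be sketched directly from the counts in Figure 3.7. This is a minimal illustration, not the thesis code: it assumes each node's grid is ordered (TN, FP, FN, TP), and since the f1-score is symmetric in FP and FN only the positions of TN and TP matter. Under that reading the counts reproduce every printed node score and the 80.2% tree f1-score. The function names are hypothetical. The tree f1-score sums the local confusion matrices before computing one score (a micro average), while the mean f1-score averages per-node scores (a macro average). Note that the mean below is over all 19 nodes of the figure, not the leaf-node mean on the reduced data set reported in Table 3.1.

```python
from typing import List, Tuple

Counts = Tuple[int, int, int, int]  # assumed order: (TN, FP, FN, TP)

# Per-node confusion counts transcribed from Figure 3.7, in display order.
# A list of pairs is used because two distinct nodes are both named "OTHER".
FIG_3_7: List[Tuple[str, Counts]] = [
    ("PHYSICAL", (6354, 901, 996, 5177)),
    ("HARD", (13394, 0, 34, 0)),
    ("SOFT", (6395, 894, 991, 5148)),
    ("BIOTA", (5227, 1064, 907, 6230)),
    ("SPONGES", (12532, 321, 459, 116)),
    ("ECHINODERMS", (13370, 0, 58, 0)),
    ("CNIDARIA", (13029, 78, 308, 13)),
    ("BRYOZOA", (13324, 0, 104, 0)),
    ("OTHER", (13421, 0, 7, 0)),
    ("ALGAE", (11727, 406, 415, 880)),
    ("CANOPY", (12535, 232, 125, 536)),
    ("ECK", (12585, 210, 130, 503)),
    ("OTHER", (13400, 0, 28, 0)),
    ("ERECTBRANCHING", (13399, 0, 29, 0)),
    ("BROWN", (13418, 0, 10, 0)),
    ("RED", (13409, 0, 19, 0)),
    ("CRUSTOSE", (13190, 53, 176, 9)),
    ("ECOR", (13313, 0, 112, 3)),
    ("SOND", (13325, 33, 68, 2)),
]


def f1_from_counts(tn: int, fp: int, fn: int, tp: int) -> float:
    """f1 = 2TP / (2TP + FP + FN); returns 0 when TP, FP and FN are all zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tree_f1(nodes: List[Tuple[str, Counts]]) -> float:
    """Micro average: sum the local confusion matrices, then compute one f1-score."""
    summed = [sum(counts[i] for _, counts in nodes) for i in range(4)]
    return f1_from_counts(*summed)


def mean_f1(nodes: List[Tuple[str, Counts]]) -> float:
    """Macro average: compute each node's f1-score, then weight every node equally."""
    return sum(f1_from_counts(*counts) for _, counts in nodes) / len(nodes)


if __name__ == "__main__":
    for name, counts in FIG_3_7:
        print(f"{name:15s} {f1_from_counts(*counts):.3f}")
    print(f"tree f1: {tree_f1(FIG_3_7):.3f}")  # 0.802
    print(f"mean f1: {mean_f1(FIG_3_7):.3f}")  # 0.273
```

On these counts the equally weighted mean is roughly a third of the tree f1-score, because the many rare nodes scoring near zero count as much as PHYSICAL or BIOTA, which illustrates the gap described above.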
Mean f1-score
Hierarchical (trained on leaf-node training data) 0.197
Hierarchical (trained on all training data) 0.182
Flat multi-class (trained on leaf-node training data) 0.178
Table 3.1: Flat multi-class and hierarchical classification comparison, tested on the reduced leaf-node only data set
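The leaf-node filtering used to build this reduced test set can be illustrated with a short sketch: a node is a leaf if no other node names it as a parent, and only test points whose label is a leaf are kept. The child-to-parent map, the label names and the function names are hypothetical stand-ins, not the thesis code.

```python
from typing import Dict, List, Optional, Set


def leaf_nodes(parent: Dict[str, Optional[str]]) -> Set[str]:
    """Nodes that never appear as another node's parent."""
    parents = {p for p in parent.values() if p is not None}
    return set(parent) - parents


def leaf_only(labels: List[str], parent: Dict[str, Optional[str]]) -> List[int]:
    """Indices of test points labelled down to leaf-node level."""
    leaves = leaf_nodes(parent)
    return [i for i, lab in enumerate(labels) if lab in leaves]


# Hypothetical fragment of a class hierarchy (child -> parent; None marks the root).
parent = {"root": None, "biota": "root", "algae": "biota", "canopy": "algae", "sponges": "biota"}
test_labels = ["canopy", "algae", "sponges", "biota"]
print(leaf_only(test_labels, parent))  # [0, 2]
```

Points labelled only at "algae" or "biota" are dropped, since a flat classifier over leaf classes has no output that corresponds to them.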
3.5 Conclusion
We have investigated various aspects of supervised hierarchical classification of sparsely labelled benthic imagery for the purpose of species recognition. The aim was to apply techniques from the literature of other fields to construct an initial solution for the automated interpretation of AUV images.
Across a range of feature descriptors and patch sizes, PGM prediction gave better results than simple MPS. This is promising for future directions, as PGM prediction is a more principled approach. However, the best results overall were achieved using thresholded MPS.
We also compared two classifier training approaches: the local siblings policy and the global inclusive policy. The two policies exhibited similar performance, with the siblings policy holding the advantage of reduced training time.
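To make the training-time difference concrete, the sketch below builds the positive and negative training sets for one node's binary classifier under each policy. It assumes the definitions commonly used in the hierarchical classification literature, which the chapter's own definitions may refine: positives are examples labelled at the node or any descendant; the siblings policy draws negatives only from the node's siblings and their descendants, while the inclusive policy uses every other example except those labelled at an ancestor (whose finer label is unknown under sparse labelling). The toy hierarchy and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy hierarchy as a child -> parent map (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "physical": "root", "biota": "root",
    "hard": "physical", "soft": "physical",
    "sponges": "biota", "algae": "biota",
    "canopy": "algae", "crustose": "algae",
}


def ancestors(node: str) -> Set[str]:
    """All strict ancestors of a node, walking the parent map up to the root."""
    out: Set[str] = set()
    parent = PARENT[node]
    while parent is not None:
        out.add(parent)
        parent = PARENT[parent]
    return out


def subtree(node: str) -> Set[str]:
    """The node itself plus all of its descendants."""
    return {n for n in PARENT if n == node or node in ancestors(n)}


def siblings(node: str) -> Set[str]:
    """Other children of the same parent."""
    return {n for n, p in PARENT.items() if p == PARENT[node] and n != node}


def training_sets(node: str, labels: List[str], policy: str) -> Tuple[List[int], List[int]]:
    """Return (positive, negative) example indices for the binary classifier at `node`.

    Each example's label is a node name and may sit at any depth of the hierarchy.
    """
    pos_nodes = subtree(node)
    if policy == "siblings":
        neg_nodes = set().union(*(subtree(s) for s in siblings(node)))
    elif policy == "inclusive":
        neg_nodes = set(PARENT) - pos_nodes - ancestors(node)
    else:
        raise ValueError(f"unknown policy: {policy}")
    pos = [i for i, lab in enumerate(labels) if lab in pos_nodes]
    neg = [i for i, lab in enumerate(labels) if lab in neg_nodes]
    return pos, neg


labels = ["canopy", "crustose", "algae", "sponges", "hard", "biota", "soft"]
print(training_sets("canopy", labels, "siblings"))   # ([0], [1])
print(training_sets("canopy", labels, "inclusive"))  # ([0], [1, 3, 4, 6])
```

With these toy labels, the siblings negatives for "canopy" are only the "crustose" example, whereas the inclusive policy also pulls in "sponges", "hard" and "soft". Over a large data set, the smaller negative sets under the siblings policy are consistent with its shorter training time.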
In addition, the comparison with a flat multi-class approach on the reduced data set shows that, at leaf-node level, our hierarchical classification scheme performs at least as well as the traditional approach.
Future work will cover several areas. For hierarchical classification, we will investigate potential improvements such as the use of PGMs with variable
depth prediction. Because the classification scheme permits different features to be used at different nodes, another challenging area of research will be to find ways of incorporating other sensor modalities from the AUV (such as dense stereo information) to enhance the classifier's ability to distinguish between various species and objects. Although the automated semantic labelling described in this paper has been applied as a post-processing step, eventually running it in real time on board the robot would bring further benefits. Communication with the AUV through water is typically performed using an acoustic modem, which has far lower bandwidth than is needed to transmit raw image and sensor data. If the robot can “understand” what it sees by assigning automated labels, it could use that information to adapt its behaviour, or relay the labels, which are far more compact than raw imagery, to human operators for monitoring and intervention.
4 BENTHOZ-2015: An Australian benthic data set
The automated semantic interpretation of AUV images relies on research from various fields (e.g. machine learning and computer vision) being applied to work done in marine robotics and marine science. A fundamental limitation to this is the lack of open data. Without open data sets, work seeking to make progress in the field can only be done by researchers with close links to the robotics and marine science groups that captured and annotated the data. Those able to access relevant data typically need to manually define and process a usable data set. As well as requiring a substantial investment of time, this makes it difficult to compare results with work done by other researchers, or for others to replicate or build on those results. The computer vision and machine learning communities have addressed this problem by creating open data sets, on which researchers publish the performance of their algorithms. In this chapter, we have assembled and published a comprehensive data set containing AUV-captured benthic images, sensor data, and expert-annotated labels from around Australia. This work required collaboration between five research institutions and a large number of contributors (as evidenced by the author list). We use it as a consistent data set for work in the remainder of this thesis, and invite other researchers to use the public data set to reproduce our results,
Chapter 4 BENTHOZ-2015: An Australian benthic data set
improve on our algorithms, or conduct their own research.
The text below was published in Nature’s Scientific Data journal as a data set descriptor in October 2015 [13]. Both the paper and data set were released under the Creative Commons 4.0 license, in accordance with the Nature Publishing Group’s policies.
Contribution: I organised the collaboration, prepared the data set from the files provided to me by the marine scientists, and wrote the majority of the manuscript. Marine scientists wrote and submitted their own sections describing the methodology and validation methods for their geographic region (organised by Renata Ferrari). Authors listed on the publication were: Michael Bewley, Dr. Ariell Friedman, Dr. Renata Ferrari, Dr. Nicole Hill, Dr. Renae Hovey, Dr. Neville Barrett, Dr. Oscar Pizarro, Dr. Ezequiel Marzinelli, Dr. Will Figueira, Ms Lisa Meyer, Russell Babcock, Dr. Lynda Bellchambers, Prof. Maria Byrne, Prof. Stefan Williams. NB: Dr. Marzinelli was added to the author list after the original publication, via an erratum.
Abstract: This Australian benthic data set (BENTHOZ-2015) consists of an expert-annotated set of georeferenced benthic images and associated sensor data, captured by an autonomous underwater vehicle (AUV) around Australia. This type of data is of interest to marine scientists studying benthic habitats and organisms. AUVs collect georeferenced images over an area with consistent illumination and altitude, and make it possible to generate broad-scale, photo-realistic 3D maps. Marine scientists then typically spend several minutes on each of thousands of images, labeling substratum type and biota at a subset of points. Labels from four Australian research groups were combined using the CATAMI classification scheme, a hierarchical classification scheme based on taxonomy and morphology for scoring marine imagery. This data set consists of 401,850 expert-labeled points from around the Australian
coast, with associated images, geolocation and other sensor data. The robotic surveys that collected this data form part of Australia’s Integrated Marine Observing System (IMOS) ongoing benthic monitoring program. There is reuse potential in marine science, robotics, and computer vision research.
4.1 Background & Summary
Less than 0.05% of the global sea floor has been mapped with sonar swath mapping [28] at high resolution (tens of meters). Coverage at visual resolution (millimeters) using a camera is substantially lower. Visual resolution permits detailed analysis of benthic taxonomy; however, this requires images to be captured at an altitude of several meters above the sea floor, typically while traveling slower than walking pace. The growing maturity of AUVs has permitted broader and more systematic visual surveys than traditional diver-held cameras or towed video sleds. (A towed video sled is a camera mounted on an underwater sled that is attached to a ship by a cable and towed; its images are of lower quality than those from an AUV because its position, particularly its altitude, is difficult to control precisely.) AUVs can operate continuously and precisely at greater depths, capturing geolocation, sensor data and stereo images several times a second. A 3D visual map of the survey area can then be produced from the data. This abundance of data has introduced a new problem for scientists: efficiently extracting and distilling useful information from the raw data.
The data set presented in this paper contains 401,850 expert annotations of 9,791 georeferenced images with associated sensor data (latitude, longitude, depth, altitude, salinity and temperature) from around the Australian coast (see Figure 4.1). The annotations conform to a hierarchy of 148 substratum and biological classes (Figure 4.2), and specify the content at specific points within each image. All image
and sensor data were captured by the Sirius AUV. Sirius is the primary platform
responsible for collecting seafloor images as part of the AUV facility of the Integrated Marine Observing System (IMOS) in Australia [37]. Table 4.1 summarizes the number of expert labels applied to each campaign, and Figure 4.1 shows the geographic location of each deployment. The annotation process poses a significant bottleneck: it takes a trained marine scientist 5 minutes or more to assign semantic labels to dozens of individual points on a single image, using the context provided by the image neighborhood around each point. After a survey is conducted, there is typically a time lag of several years before labeling is complete and scientific inferences can begin to be drawn. Even with this delay, it is only practical to label a very small fraction of the data collected by the AUV. For the deployments in this data set, the 9,791 labeled images represent around 2% of the total number of images captured during those deployments.
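A back-of-envelope calculation from the figures quoted in this section gives a sense of scale. It uses only the stated totals and treats “5 minutes or more” and “around 2%” as exact, so the results are a rough lower bound and rough approximations rather than reported values:

```latex
\begin{align*}
\text{points per labeled image} &\approx \frac{401{,}850}{9{,}791} \approx 41 \\
\text{expert labeling time} &\gtrsim 9{,}791 \times 5\ \text{min} = 48{,}955\ \text{min} \approx 816\ \text{h} \\
\text{images captured} &\approx \frac{9{,}791}{0.02} \approx 4.9 \times 10^{5}
\end{align*}
```

The average of about 41 points per image is consistent with the “dozens of individual points” per image described above.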
Machine learning and computer vision techniques have the potential to increase the amount of labeled data and reduce the time it takes to produce it. The availability of a set of high-quality expert labels with geographic and temporal diversity will permit researchers in these fields to investigate ways to reduce or eliminate the manual labeling effort, and to gain new scientific insights from working with a combined data set. Another significant hurdle to the integrated analysis of benthic imagery data is the lack of standardization between research groups. Until recently, individual research groups had labeled images using a variety of custom labeling systems and standards suited to their particular geographic region and research interests, which limits the ability to perform scientific analysis, or train machine learning algorithms, on large, varied data sets. In this data set, however, we combine data from four leading research groups, using the recently established Collaborative and Automated Tools for Analysis of Marine Imagery (CATAMI) class hierarchy [2] as