5.5 Conclusion
6.1.1 Evaluation of the Long-term Robustness
The performance of interest point detectors and descriptors has been evaluated in the context of visual tracking [48] and SLAM [49], but not with a focus on long-term robustness. In [158, 159] the authors investigated the feasibility of feature-based topological localization in long-term outdoor experiments. They found that SIFT and SURF features enable a localization rate of 80 to 95% when high-resolution images are used and the epipolar constraint is applied. However, the data set used in these experiments was recorded on a campus and thus contains many static scene elements such as buildings and other man-made objects. To investigate the performance of local visual features on image data that match our scenario of long-term localization in garden-like environments, we performed an evaluation using a time-lapse recording of a garden taken with a fixed camera over the course of a year (see Fig. 6.1 for example images).
Figure 6.1: Garden time-lapse. Images taken at a regular interval with a fixed camera capture the natural variation in appearance over the seasons. Source: http://www.youtube.com/watch?v=7dhT-IJmqcg&hd=1.
6.1. Robustness of Local Visual Features 77
For the evaluation we considered SIFT [86], SURF [10] and its upright version (U-SURF), as well as the FAST detector [130] combined with the BRIEF descriptor [17]. U-SURF and the FAST/BRIEF combination do not assign an orientation to detected interest points and are therefore not invariant to rotations of the camera around its optical axis. For a moving ground vehicle, however, changes in the roll angle should be negligible. To evaluate the methods, 1000 features were extracted from the first frame with each of the detector/descriptor combinations and stored as reference. We then extracted features from every image of the following sequence of 100 frames and matched them against the respective reference features. Since the camera is fixed, matches were validated using a distance threshold of 10 pixels on the keypoint positions, and a mutual consistency check was applied to prevent multiple features from a query image from being matched to the same feature in the reference image.
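The matching and validation step described above can be sketched as follows. This is a minimal sketch, not the implementation used in the evaluation: it assumes binary descriptors (as produced by BRIEF) stored as Python integers and compared by Hamming distance, and keypoint positions given as (x, y) pixel tuples; SIFT and SURF would instead use a Euclidean descriptor distance. All function names are hypothetical. The mutual consistency check keeps a pair only if the reference and query features are each other's nearest neighbour, so no two query features can be assigned to the same reference feature.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical helpers: a sketch of the protocol, not the original evaluation code.


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def nearest(query: int, candidates: Sequence[int]) -> int:
    """Index of the candidate closest to `query` (ties go to the lowest index)."""
    return min(range(len(candidates)), key=lambda j: hamming(query, candidates[j]))


def mutual_matches(ref_desc: Sequence[int], qry_desc: Sequence[int]) -> List[Tuple[int, int]]:
    """(reference index, query index) pairs that are mutual nearest neighbours."""
    if not ref_desc or not qry_desc:
        return []
    matches = []
    for qi, d in enumerate(qry_desc):
        ri = nearest(d, ref_desc)
        # Mutual consistency check: the reference feature must pick this query feature back,
        # so at most one query feature can be matched to any reference feature.
        if nearest(ref_desc[ri], qry_desc) == qi:
            matches.append((ri, qi))
    return matches


def is_correct(ref_pt: Point, qry_pt: Point, max_px: float = 10.0) -> bool:
    """Fixed camera: a correct match lies within `max_px` pixels of the reference keypoint."""
    dx, dy = ref_pt[0] - qry_pt[0], ref_pt[1] - qry_pt[1]
    return (dx * dx + dy * dy) ** 0.5 <= max_px


if __name__ == "__main__":
    ref = [0b0000, 0b1111]
    qry = [0b0001, 0b0011]
    # Query 1 also picks reference 0, but reference 0 prefers query 0, so query 1 is rejected.
    print(mutual_matches(ref, qry))  # [(0, 0)]
```

The brute-force nearest-neighbour search is quadratic in the number of features; it is kept here for clarity rather than speed.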
The performance of the evaluated features is measured in terms of precision and recall. In addition, we recorded how often each feature from the reference frame was matched over the sequence. The measured precision and recall values of the individual features are shown in Fig. 6.2a. SIFT and the orientation-invariant version of SURF performed worst. Since the image set contains no rotational camera motion, and the orientation assignment slightly reduces the distinctiveness of the descriptor, the upright version of SURF yields better performance. The best result was obtained with the combination of FAST and BRIEF.
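The text does not spell out how the precision-recall curves in Fig. 6.2a are computed. One common approach, assumed here, is to sweep a threshold on the descriptor distance of the accepted matches and, for each threshold, compute precision as the fraction of accepted matches that are correct and recall as the fraction of ground-truth correspondences that are recovered. The match record format and `pr_curve` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical record for one accepted match: (descriptor distance, is correct).
Match = Tuple[float, bool]


def pr_curve(matches: Sequence[Match], n_correspondences: int,
             thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """(recall, precision) for each descriptor-distance threshold.

    Assumed definitions: precision = correct / accepted matches,
    recall = correct / ground-truth correspondences.
    """
    curve = []
    for t in thresholds:
        accepted = [ok for dist, ok in matches if dist <= t]
        correct = sum(accepted)  # True counts as 1
        # Convention chosen here: precision is 1.0 when no match is accepted.
        precision = correct / len(accepted) if accepted else 1.0
        recall = correct / n_correspondences if n_correspondences > 0 else 0.0
        curve.append((recall, precision))
    return curve
```

Raising the threshold accepts more matches, which can only increase recall while precision typically drops, giving one curve per detector/descriptor combination.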
The number of true positive matches over time for the best-performing feature, FAST/BRIEF, is shown in Fig. 6.2b. While about 580 valid matches were found in the first frame, the number of true positives steadily decreases to about 50 in the last frame, where the appearance has changed drastically due to variations in the vegetation. The lowest number of matches was obtained for a frame in winter, when large parts of the scene were covered by snow; in this case only 10 features could be matched successfully. Other negative fluctuations were mainly caused by changes in lighting conditions, such as cast shadows or overexposed image regions.
The 50 most and least stable features are illustrated in Fig. 6.3. Stable features are mainly located at image patches that correspond to man-made objects such as fences and walls, but some stable features are also located at the tops of conifers, since their foliage is not affected by the seasons and the strong contrast against the sky results in high responses of the interest point detectors. Features from image regions corresponding to lawn and other kinds of vegetation, which vary strongly with the seasons, are the least stable ones.
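The ranking into most and least stable features can be sketched by counting, for each reference feature, in how many query frames it was matched correctly, and sorting by that count. This is a minimal sketch, assuming the correct matches of each frame are available as a list of reference-feature indices; the function names are hypothetical, and the default k = 50 mirrors Fig. 6.3.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical helpers for ranking reference features by how often they were matched.


def match_counts(per_frame_correct: Iterable[Sequence[int]], n_ref: int) -> List[int]:
    """Number of frames in which each reference feature was matched correctly."""
    counts = [0] * n_ref
    for ref_ids in per_frame_correct:
        # set(): count each reference feature at most once per frame
        for i in set(ref_ids):
            counts[i] += 1
    return counts


def most_and_least_stable(counts: Sequence[int], k: int = 50) -> Tuple[List[int], List[int]]:
    """Indices of the k most and k least frequently matched reference features."""
    idx = range(len(counts))
    most = sorted(idx, key=lambda i: (-counts[i], i))[:k]   # ties broken by index
    least = sorted(idx, key=lambda i: (counts[i], i))[:k]
    return most, least
```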
Although a number of features persist over a whole year, the overall performance in the examined natural outdoor environment is not sufficient for the purpose of long-term localization. The steady decrease in the ratio of true positive matches with respect to the reference set would pose a severe problem for a feature-based localization system, as it increases the probability of false data associations.
78 6. Robust Environmental Representations
[Plot: (a) precision vs. recall for FAST + BRIEF, U-SURF, SURF and SIFT; (b) number of matched features vs. image index (time).]
Figure 6.2: Results of the feature evaluation. (a) Precision and recall of the evaluated features. Orientation-invariant features perform worse than those that do not account for orientation. The combination of the FAST detector and the binary BRIEF descriptor performs best, although the overall performance is still rather low. (b) The number of feature matches using FAST and BRIEF steadily decreases, with negative fluctuations that are mainly caused by changes in global lighting.
Figure 6.3: Most stable and unstable features. (a) The 50 most stable features are mainly located in image regions that correspond to man-made objects. The high contrast between the tops of conifers and the sky results in high responses of the feature detectors; since their foliage does not change with the seasons, some stable features can be found at their crowns. (b) The 50 least stable features correspond to image regions that are heavily affected by seasonal changes, such as lawn and other kinds of vegetation.