The majority of previous work on feature description and matching has focused on matching visual appearance between aerial captures and reference imagery, such as the work by Conte. This type of matching restricts the performance of the system, as it requires recent reference imagery that matches the current weather and lighting conditions. To overcome the limitations brought on by temporal and/or seasonal changes, the proposed system uses widely available but visually similar features, such as buildings, to help locate the vehicle.
Further, landmarks cannot be matched by their individual appearance: a residential home seen from the air in London can appear virtually identical to a home in Aberdeen. This rules out a significant number of the current state-of-the-art feature descriptors in computer vision, such as SIFT and SURF, which match features based on their appearance. Instead, the system matches the features collectively; thus an alternative descriptor was developed that encodes the geographic relationship between features.
The descriptor is based on the concept that landmarks can be recognised not by their individual appearance but by how they are related to other surrounding features. For example, most people can identify their home among a number of houses along a road in a satellite image, even if the house is identical to the others.
To replicate this interpretation, the new descriptor encodes the relationship between a feature and its immediate neighbours. The relationship is encoded in a binary feature vector, known as a fingerprint, that is scale, rotation and translation invariant. For performance reasons, the descriptor is aggressive and discards most of the data associated with the feature that is irrelevant to interpreting location. The shape of the region is encoded in the feature irreversibly, making it a strictly one-way process. However, the descriptor is fully repeatable for varying scales and rotations, making it suitable for the detection of geographic landmarks from various altitudes and rotations. This ensures that the features identified by the detector produce the same fingerprints as those within the reference database, whether the latter was extracted from aerial or satellite imagery.
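The idea of a neighbour-based, similarity-invariant binary fingerprint can be illustrated with a minimal sketch. This is not the descriptor developed in this work: the function names, the choice of the nearest neighbour as the reference for unit length and zero angle, and the parameters `k`, `r_bins` and `a_bins` are all assumptions made purely for illustration. The sketch also ignores ties between equally distant neighbours and does not handle mirrored imagery.

```python
import math

def fingerprint(points, index, k=4, r_bins=4, a_bins=8):
    """Hypothetical binary fingerprint of points[index] from its k nearest neighbours."""
    cx, cy = points[index]
    # Working with offsets from the centre feature gives translation invariance.
    offsets = [(x - cx, y - cy) for i, (x, y) in enumerate(points) if i != index]
    offsets.sort(key=lambda o: math.hypot(o[0], o[1]))
    neighbours = offsets[:k]
    # Assumption: the nearest neighbour defines unit length and the zero-angle direction.
    ref_dist = math.hypot(neighbours[0][0], neighbours[0][1])
    ref_angle = math.atan2(neighbours[0][1], neighbours[0][0])
    bits = 0
    for dx, dy in neighbours[1:]:
        # Distance ratios are unchanged by scaling the whole scene.
        ratio = math.hypot(dx, dy) / ref_dist
        # Angles relative to the reference direction are unchanged by rotation.
        angle = (math.atan2(dy, dx) - ref_angle) % (2 * math.pi)
        # Quantise into a coarse log-polar grid; only cell occupancy is kept,
        # so the exact geometry cannot be recovered from the fingerprint.
        r = min(max(int(math.log2(ratio) * 2), 0), r_bins - 1)
        a = int(angle / (2 * math.pi) * a_bins) % a_bins
        bits |= 1 << (r * a_bins + a)
    return bits

def similarity_transform(points, scale, angle_deg, tx, ty):
    """Scale, rotate and translate a list of (x, y) points."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [(scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)
            for x, y in points]

# A feature at the origin with four close neighbours and one distant feature.
scene = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.5), (-2.0, 1.0), (-1.0, -3.0), (10.0, 10.0)]
# The same scene as it might appear from a different altitude and heading.
moved = similarity_transform(scene, scale=2.5, angle_deg=30.0, tx=40.0, ty=-7.0)
same = fingerprint(scene, 0) == fingerprint(moved, 0)
```

In this toy scene the neighbours sit well inside their grid cells, so the two fingerprints agree. A neighbour lying close to a cell boundary could flip a bit under detection noise, which is one reason a practical matcher would compare fingerprints with some tolerance rather than requiring exact equality.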
As a result, the operational envelope of the system is limited not by the visual positioning system itself, but by the landmark detection algorithm. In some scenarios this is not an issue; for example, craters can be detected at a variety of sizes. In building detection, amongst others, the envelope is much tighter.
Since landmarks, in particular man-made landmarks, tend to be semi-structured, the descriptor on its own is not strong enough to match landmarks on a large scale. In semi-structured regions, the fingerprints for two features can be very similar even if the features are geographically distant, which becomes a significant issue in the pose estimation algorithm.
This led to the development of a number of strategies to reduce the risk of incorrect matches. These include the elimination of poorly conditioned features, high-level region matching and methods to reduce the potential target set extracted from the database. This improved matching performance, giving up to 95% matching accuracy in both test scenarios: urban, semi-structured environments and unstructured crater matching.
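To make the ambiguity problem concrete, the sketch below rejects a candidate match when the best database fingerprint is not clearly better than the runner-up. This margin test on Hamming distance is a generic technique substituted here for illustration; it is not one of the strategies developed in this work, and the function names and the thresholds `max_distance` and `min_margin` are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")

def match_unambiguous(query, database, max_distance=2, min_margin=2):
    """Return the key of the best match, or None if it is weak or ambiguous.

    database maps a landmark key to its integer fingerprint.
    """
    ranked = sorted(database.items(), key=lambda kv: hamming(query, kv[1]))
    if not ranked:
        return None
    best_key, best_fp = ranked[0]
    best = hamming(query, best_fp)
    # Reject matches that are too far from every reference fingerprint.
    if best > max_distance:
        return None
    # Reject matches where a second landmark is almost as close as the first.
    if len(ranked) > 1 and hamming(query, ranked[1][1]) - best < min_margin:
        return None
    return best_key

# Two near-identical fingerprints ("a", "b") and one distinctive one ("c").
reference = {"a": 0b101100, "b": 0b101101, "c": 0b010011}
ambiguous = match_unambiguous(0b101100, reference)    # "a" and "b" are too alike
distinctive = match_unambiguous(0b010111, reference)  # one bit away from "c"
```

Here the query that exactly equals the fingerprint of `"a"` is still rejected because `"b"` differs from it by a single bit, whereas the noisy query near `"c"` is accepted. Discarding such matches trades coverage for a lower risk of feeding incorrect correspondences into pose estimation.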
The results of the project also demonstrated the effect of sub-optimal matching situations, caused by a poorly designed and configured system. In these situations, the matching performance was greatly reduced, with certain parameters rapidly reducing the matching accuracy to less than 30%. It is critically important that any real-world implementation of a vision-based positioning system considers the precise requirements in order to optimise the performance of the landmark sensor and associated algorithms. For example, a fundamental assumption in the proposed system is that landmarks can be detected with a low false positive rate and a specific estimated error. If the detection rate of the landmark algorithm in flight is even slightly lower than expected, it will have a dramatic impact on the performance of the overall system.
The results of the project have shown that vision-based positioning systems are highly sensitive; changes in parameters and inputs can have both positive and negative effects on the overall performance. Consequently, any proposed system will require extensive tuning tailored for the prescribed task. This is not unique to this project. For example, Visual-SLAM systems, which are among the most robust visual positioning systems currently available, have a multitude of parameters that need to be adjusted depending on the vehicle, trajectory type and likelihood of loop closures.
In addition, the processing platform on which the proposed vision-based positioning system is implemented can limit the overall functionality of the system. For example, some of the tasks executed by the system, such as landmark detection and the constraint stages of the matcher, can be very computationally expensive and therefore restrict the use of the system in real vehicles. Fortunately, processing hardware is becoming faster and more energy efficient every year; at the moment the mobile devices industry is aggressively advancing technology in this field. It is thus likely that by the time the VPS is ready for flight testing, the technology will have developed sufficiently to meet the computational demands.
Ultimately, the performance results for the VPS have merit and show the potential for the descriptor and matcher when used in a complete positioning system. However, current work is still in the proof-of-concept stage and a significant amount of work is required to move towards flight testing.