2.4 UAS Sensors
2.4.3 Computer Vision-Based Sensors
An area of active research in UAS navigation is the use of computer vision-based sensors to provide navigation information for position, airspeed, and attitude. These sensors use
a variety of techniques including optical flow [33], [64], [36], feature detection [50], and vanishing points [65] to provide measurements to the filter. One of the largest advantages of using this type of sensor is that it does not depend on any type of electromagnetic signal
to work properly, making it a complementary sensor to the GPS.
2.4.3.1 Optical Flow
Optical flow is defined in [66] as the distribution of apparent velocities of brightness pattern movements in an image. It is generally calculated by comparing pixels in sequential images
to determine the local velocity of the camera that is capturing the images. This concept can be applied by
calculating the apparent local velocities of adjacent buildings or the street below. Accuracy
is typically measured in pixels per frame, with a scaling process necessary to convert to
meters per second.
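The scaling step can be sketched as follows. This is a minimal illustration under a pinhole camera model, assuming the camera translates parallel to the observed surface and that the distance to that surface is known. The function name, parameters, and numbers are hypothetical and are not taken from [66].

```python
def flow_to_metric_velocity(flow_px_per_frame, frame_rate_hz, depth_m, focal_length_px):
    """Convert apparent image motion (pixels/frame) to meters per second.

    Assumptions (not from the source): pinhole camera, pure translation
    parallel to the image plane, and a known distance to the surface.
    """
    if focal_length_px <= 0:
        raise ValueError("focal_length_px must be positive")
    # Pixels per frame -> pixels per second.
    pixels_per_second = flow_px_per_frame * frame_rate_hz
    # Similar triangles: metric displacement = pixel displacement * depth / focal length.
    return pixels_per_second * depth_m / focal_length_px


# Hypothetical numbers: 4 px/frame at 30 Hz, street 20 m below, 600 px focal length.
speed = flow_to_metric_velocity(4.0, 30.0, 20.0, 600.0)
print(f"{speed:.2f} m/s")
```

The dependence on depth_m is why a scaling process is needed: the same ground speed produces less apparent motion when the observed surface is farther away.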
An ideal optical flow application to the urban environment is the ‘centering response’,
with biological inspiration from bees. Reference [67] explains that bees are able to hold a centerline trajectory by equalizing the apparent motion of the images on their retinas. This
phenomenon has been demonstrated in biological tests and on UAS operating in urban canyons, in simulation [33], [68] and in experiments [69], [70]. In addition to maintaining a centerline trajectory, further experiments have shown that vehicles equipped with side-facing
optical flow sensors and a pair of front-facing stereo sensors for obstacle detection
can also navigate 90° turns in a simulated urban canyon [36].
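The balancing idea behind the centering response can be illustrated with a short sketch. It is not the controller used in [33], [36], or [67]-[70]. The function name, the sign convention, and the normalized proportional law are assumptions made only for illustration.

```python
def centering_steer_command(left_flow, right_flow, gain=1.0):
    """Return a lateral command in [-gain, gain]; positive means 'move right'.

    Assumption: the nearer wall produces the larger apparent motion, so the
    vehicle steers away from the side with the larger optical flow magnitude.
    """
    left = abs(left_flow)
    right = abs(right_flow)
    total = left + right
    if total == 0.0:
        # No measurable flow on either side: no correction.
        return 0.0
    # Normalized imbalance: zero when the two flows are equal (centered).
    return gain * (left - right) / total


# Hypothetical measurements in pixels per frame.
print(centering_steer_command(3.0, 1.0))  # left wall appears closer
print(centering_steer_command(2.0, 2.0))  # balanced flow
```

Normalizing by the total flow keeps the command independent of forward speed, since both side flows scale together as the vehicle speeds up.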
Standardized accuracy metrics have been established to compare performance of the
different optical flow calculation methods. The two main metrics throughout the literature
are 1) average angular error and 2) endpoint error. Average angular error is described
in [71], [72], [73], [74] and the Middlebury Dataset [75]. It is measured as the angle between the true velocity vector $\vec{v}_c$ and the estimated velocity vector $\vec{v}_e$, both normalized to unit length, in the image coordinate
plane using
$$\psi_E = \cos^{-1}\left(\vec{v}_c \cdot \vec{v}_e\right). \tag{2.27}$$
Endpoint error [73], defined in image plane coordinates as
$$E_x = |u_c - u_e|, \qquad E_y = |v_c - v_e| \tag{2.28}$$
can also be used to quantify the absolute magnitude of the component differences.
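Both metrics can be sketched in a few lines. Equation (2.27) takes the dot product of unit vectors, so the vectors must be normalized first. The sketch below assumes the common convention of extending each flow vector to (u, v, 1) before normalizing, which keeps the angle defined for zero flow. That convention and the function names are assumptions and may differ from the exact formulation in [71]-[75].

```python
import math


def angular_error(true_flow, est_flow):
    """Angle in radians between two flow vectors, as in equation (2.27).

    Assumption: each (u, v) vector is extended to (u, v, 1) and normalized.
    """
    uc, vc = true_flow
    ue, ve = est_flow
    dot = uc * ue + vc * ve + 1.0
    norm = math.sqrt(uc * uc + vc * vc + 1.0) * math.sqrt(ue * ue + ve * ve + 1.0)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.acos(cos_angle)


def endpoint_error(true_flow, est_flow):
    """Per-component absolute differences (E_x, E_y), as in equation (2.28)."""
    uc, vc = true_flow
    ue, ve = est_flow
    return abs(uc - ue), abs(vc - ve)


# Hypothetical flow vectors in pixels per frame.
print(math.degrees(angular_error((1.0, 2.0), (0.5, 2.5))))
print(endpoint_error((1.0, 2.0), (0.5, 2.5)))
```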
The Middlebury Dataset is a vast resource providing optical flow accuracy characterization
information for both metrics over 91 different methods using well-known standardized
image sequences. Further optical flow information is found in Appendix A.
2.4.3.2 Line Detection and Vanishing Points
Another computer vision-based technique for UAS navigation uses line detection and vanishing points to generate roll and pitch measurements. A brief overview of the process to
measure attitude angles from vanishing points is discussed here with further details available
in [65]. The first step in the algorithm is to detect parallel lines in a two-dimensional image. These lines may represent vertical edges of buildings parallel to the direction of
gravity or horizontal edges of buildings at the street level, orthogonal to the direction of
gravity. Once the lines have been detected, the second step in the algorithm is to follow
the lines to points of intersection as shown in Figure 2.1. These points of intersection are known as vanishing points, categorized as either vertical or horizontal based on the direction
of the parallel lines.
Figure 2.1: Example of parallel lines and vanishing points using an urban scene courtesy of Hwangbo and Kanade [65].
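The second step, following detected lines to their point of intersection, can be sketched with homogeneous coordinates. This is a generic geometric illustration and not the procedure of [65], which must also handle noisy lines and more than two lines per vanishing point. The function names and pixel coordinates are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def intersection(line_a, line_b, eps=1e-12):
    """Intersection of two homogeneous lines, or None if they are parallel in the image."""
    x, y, w = cross(line_a, line_b)
    if abs(w) < eps:
        return None  # the vanishing point lies at infinity
    return (x / w, y / w)


# Two hypothetical building edges that converge toward the top of the image.
left_edge = line_through((100.0, 400.0), (150.0, 100.0))
right_edge = line_through((500.0, 400.0), (450.0, 100.0))
print(intersection(left_edge, right_edge))
```

With real detections, lines that share a vanishing point rarely meet at exactly one pixel, so a least-squares or voting scheme would normally replace the pairwise intersection.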
The third step in the algorithm is to use the vanishing points to calculate roll and pitch.
A single vertical vanishing point $v^*_v$ with coordinates $(v_x, v_y)$ can be used to calculate both
$$\phi = \operatorname{atan2}(v_x, v_y) \tag{2.29}$$
and
$$\theta = \operatorname{atan}\left(\frac{1}{\sqrt{v_x^2 + v_y^2}}\right) \tag{2.30}$$
where the coordinates are generated from the projection of the world z-axis onto the two-dimensional
camera plane.
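Equations (2.29) and (2.30) translate directly into code. The sketch below assumes the vanishing point coordinates are already expressed in normalized image coordinates (pixel offsets from the principal point divided by the focal length). That assumption, the function name, and the handling of the degenerate case are not taken from [65].

```python
import math


def attitude_from_vertical_vp(vx, vy):
    """Roll and pitch in radians from a vertical vanishing point (vx, vy).

    Assumption: (vx, vy) are normalized image coordinates.
    """
    roll = math.atan2(vx, vy)          # equation (2.29)
    radius = math.hypot(vx, vy)        # sqrt(vx^2 + vy^2)
    if radius == 0.0:
        # Limit of equation (2.30) as the vanishing point reaches the image center.
        pitch = math.pi / 2.0
    else:
        pitch = math.atan(1.0 / radius)  # equation (2.30)
    return roll, pitch


# Hypothetical vanishing point.
roll, pitch = attitude_from_vertical_vp(0.5, 2.0)
print(math.degrees(roll), math.degrees(pitch))
```

As the vanishing point moves farther from the image center, the computed pitch approaches zero, which matches the intuition that vertical building edges appear nearly parallel in a level camera.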
Horizontal plane vanishing point coordinates, $v^*_h$, are calculated as
$$v^*_h = \begin{bmatrix} \dfrac{\cos\phi\sin\psi - \sin\phi\sin\theta\cos\psi}{\cos\theta\cos\psi} & \dfrac{-\sin\phi\sin\psi - \cos\phi\sin\theta\cos\psi}{\cos\theta\cos\psi} \end{bmatrix}^T \tag{2.31}$$
using the projection of the direction of travel axis onto the unit vector in the direction of
each horizontal vanishing point. These points can also be used for the calculation of roll
and pitch if at least two horizontal vanishing points are present in an image, because Equation (2.31) is
not decoupled in the roll and pitch directions. Once all vanishing points are calculated from
an image, they can be used in a Kalman Filter to reset the error in the IMU-based attitude angle estimates.
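The horizontal vanishing point expression in equation (2.31) can be evaluated numerically to see the roll and pitch coupling. The sketch below is a direct transcription of the equation. The function name is hypothetical, and treating ψ as the heading angle relative to the direction of the horizontal lines is an assumption about the notation in [65].

```python
import math


def horizontal_vanishing_point(roll, pitch, heading):
    """Evaluate equation (2.31) for roll (phi), pitch (theta), heading (psi) in radians."""
    denom = math.cos(pitch) * math.cos(heading)
    if abs(denom) < 1e-12:
        raise ValueError("vanishing point is at infinity for this attitude")
    x = (math.cos(roll) * math.sin(heading)
         - math.sin(roll) * math.sin(pitch) * math.cos(heading)) / denom
    y = (-math.sin(roll) * math.sin(heading)
         - math.cos(roll) * math.sin(pitch) * math.cos(heading)) / denom
    return x, y


# With zero roll and pitch the point reduces to (tan(psi), 0).
print(horizontal_vanishing_point(0.0, 0.0, math.radians(30.0)))
# A nonzero roll changes both coordinates, which is the coupling noted in the text.
print(horizontal_vanishing_point(math.radians(10.0), 0.0, math.radians(30.0)))
```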
2.4.3.3 Feature Detection
In [50], a scale-invariant feature transform technique is used to match features from one image to the next in order to accurately correct position, airspeed, and attitude angles. Use
of feature matching for state estimation has several steps. The first step is to analyze the
initial image, capturing each scale-invariant descriptor (or feature). When the next image
is received, the Euclidean distance is calculated between a descriptor in the first image and
its nearest neighbors in the next image. Once the images are matched, the homography (or
of the camera coordinate out of the bottom of the aircraft before and after transformation.
Reference [50] gives details on how to use the homography matrix to solve for the rotation matrix and the translation vector. This rotation matrix and translation vector can then be
used to correct IMU and altimeter measurements.
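The descriptor matching step described above, comparing Euclidean distances to the nearest neighbors in the next image, can be sketched as follows. The nearest-to-second-nearest ratio test and its 0.8 threshold are assumptions borrowed from common practice with scale-invariant features and are not confirmed as the rule used in [50]. The descriptors shown are toy two-dimensional vectors, and the function name is hypothetical.

```python
import math


def match_descriptors(desc_first, desc_next, ratio=0.8):
    """Match each descriptor in the first image to its nearest neighbor in the next.

    Returns (index_in_first, index_in_next) pairs. A match is kept only when
    the nearest distance is clearly smaller than the second nearest
    (assumed ratio test).
    """
    matches = []
    for i, d_first in enumerate(desc_first):
        # Euclidean distance to every descriptor in the next image, sorted ascending.
        dists = sorted((math.dist(d_first, d_next), j) for j, d_next in enumerate(desc_next))
        if not dists:
            continue
        if len(dists) == 1 or dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches


# Toy descriptors; real scale-invariant descriptors have many more dimensions.
first = [(0.0, 0.0), (10.0, 10.0)]
nxt = [(10.0, 11.0), (0.0, 1.0), (50.0, 50.0)]
print(match_descriptors(first, nxt))
```

The matched pairs are the correspondences from which a homography would then be estimated. The brute-force search here is quadratic in the number of features, so practical systems typically use an approximate nearest-neighbor index instead.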
2.4.3.4 Urban Cues
In the urban environment, images from computer vision systems can be used to detect
cues such as roads, lane markings, crosswalks, and stop lines. While primarily used for
ground vehicle applications, UAS applications are possible when operating at sufficiently low altitudes to detect and isolate these cues in successive images. Many works have studied
this problem including [76], [77], [78]. Paetzhold and Franke [76] focused on both unknown and known road situations to locate these cues. In the unknown situations they
made assumptions about the characteristics of the cues in order to extract them from images
as polygons. These assumptions include orientation of cues with respect to each other
and the vehicle trajectory as well as constant brightness and linear shape among others.
With a map of cues, frame to frame feature matching was used to generate cue-based state
measurements. Stereo vision was then used to separate vertical and horizontal shapes and
motion as well as identify image clutter. He et al. [77] used an intensity (grayscale) image to detect the road boundaries, including curvature, and then used the full color image to find
the road area within the boundaries. Newman et al. [78] created a navigation system to generate both three-dimensional maps and pose estimates using vision and lasers. Their system
analyzed the generated maps to provide labels within the maps to classify the different parts
of the image as walls, foliage, grass, etc.