2.4 UAS Sensors

2.4.3 Computer Vision-Based Sensors

An area of active research in UAS navigation is the use of computer vision-based sensors to provide navigation information for position, airspeed, and attitude. These sensors use a variety of techniques including optical flow [33], [64], [36], feature detection [50], and vanishing points [65] to provide measurements to the filter. One of the largest advantages of using this type of sensor is that it does not depend on any type of electromagnetic signal to work properly, making it a complementary sensor to the GPS.

2.4.3.1 Optical Flow

Optical flow is defined in [66] as the distribution of apparent velocities of brightness pattern movements in an image. It is generally calculated by comparing pixels in sequential images to determine the local velocity of the camera that is capturing the images. This concept can be applied in an urban environment by calculating the apparent local velocities of adjacent buildings or the street below. Accuracy is typically measured in pixels per frame, with a scaling process necessary to convert to meters per second.
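
The scaling process is not detailed in this section. As a minimal sketch, one common way to perform the conversion assumes a pinhole camera translating parallel to the imaged surface with no rotation between frames. The function name, the parameter names, and the known-depth assumption are illustrative and do not come from the cited works.

```python
def flow_to_metric_speed(flow_px_per_frame, frame_rate_hz, depth_m, focal_length_px):
    """Convert an optical flow magnitude (pixels/frame) to a speed (m/s).

    Assumptions (hypothetical simplification): pinhole camera, pure
    translation parallel to the imaged surface, and a known distance
    depth_m between the camera and that surface.
    """
    if focal_length_px <= 0:
        raise ValueError("focal length must be positive")
    # Pixels per frame times frames per second gives pixels per second.
    pixels_per_second = flow_px_per_frame * frame_rate_hz
    # Similar triangles: metric displacement = pixel displacement * depth / focal length.
    return pixels_per_second * depth_m / focal_length_px


# Example: 4 px/frame at 30 Hz, surface 10 m away, 600 px focal length.
speed_mps = flow_to_metric_speed(4.0, 30.0, 10.0, 600.0)
print(speed_mps)
```

The sketch makes explicit that the same pixel flow corresponds to a higher metric speed when the observed surface is farther away, which is why range information is needed for the conversion.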

An ideal optical flow application in the urban environment is the 'centering response', which takes its biological inspiration from bees. Reference [67] explains that bees are able to hold a centerline trajectory by equalizing the apparent motion of the images on their retinas. This phenomenon has been demonstrated in biological tests and on UAS operating in urban canyons, both in simulation [33], [68] and in experiments [69], [70]. In addition to maintaining a centerline trajectory, further experiments have shown that vehicles equipped with side-facing optical flow sensors and a pair of front-facing stereo sensors for obstacle detection can also navigate 90° turns in a simulated urban canyon [36].
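
The balancing idea behind the centering response can be illustrated with a simple proportional rule. This is a sketch under stated assumptions, not the controller used in [33], [36], or [67]-[70]: the function name, the gain, and the sign convention (positive means steer right) are hypothetical.

```python
def centering_steer_command(flow_left, flow_right, gain=1.0, eps=1e-9):
    """Return a steering command in [-gain, gain] that balances lateral flow.

    A nearer wall produces a larger apparent image motion, so a larger
    left-side flow means the left wall is closer. Positive output means
    "steer right" (assumed convention).
    """
    left = abs(flow_left)
    right = abs(flow_right)
    total = left + right
    if total < eps:
        # No measurable flow on either side: hold the current heading.
        return 0.0
    # Normalized flow imbalance, zero when the two sides are equal.
    return gain * (left - right) / total


print(centering_steer_command(3.0, 1.0))  # left wall closer: positive command
print(centering_steer_command(2.0, 2.0))  # balanced: zero command
```

Normalizing by the total flow keeps the command bounded regardless of vehicle speed, which is one reasonable design choice among several.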

Standardized accuracy metrics have been established to compare performance of the different optical flow calculation methods. The two main metrics throughout the literature are 1) average angular error and 2) endpoint error. Average angular error is described in [71], [72], [73], [74] and the Middlebury Dataset [75]. It is measured as the angle between the true velocity vector $\vec{v}_c$ and the estimated velocity vector $\vec{v}_e$ in the image coordinate plane using

\[
\psi_E = \cos^{-1}\left(\vec{v}_c \cdot \vec{v}_e\right). \tag{2.27}
\]

Endpoint error [73], defined in image plane coordinates as

\[
\begin{aligned}
E_x &= |u_c - u_e| \\
E_y &= |v_c - v_e|
\end{aligned}
\tag{2.28}
\]

can also be used to quantify the absolute magnitude of the differences between components.
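
As a brief illustration, both metrics can be computed for a single pair of flow vectors as sketched below. The function names are hypothetical. The sketch normalizes both vectors before taking the dot product, which is an assumption needed for the inverse cosine in Equation (2.27) to return the angle between them; it also works with two-dimensional image plane vectors, whereas some of the cited works may define the angular error on extended vectors.

```python
import math


def angular_error(true_flow, est_flow):
    """Angle in radians between the true and estimated 2-D flow vectors."""
    (uc, vc), (ue, ve) = true_flow, est_flow
    norm_c = math.hypot(uc, vc)
    norm_e = math.hypot(ue, ve)
    if norm_c == 0.0 or norm_e == 0.0:
        raise ValueError("angular error is undefined for a zero vector")
    # Dot product of the unit vectors (normalization is an assumption here).
    cos_angle = (uc * ue + vc * ve) / (norm_c * norm_e)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle)


def endpoint_error(true_flow, est_flow):
    """Per-axis absolute differences (E_x, E_y) as in Equation (2.28)."""
    (uc, vc), (ue, ve) = true_flow, est_flow
    return abs(uc - ue), abs(vc - ve)


true_vec = (1.0, 2.0)
est_vec = (0.5, 3.0)
print(math.degrees(angular_error(true_vec, est_vec)))
print(endpoint_error(true_vec, est_vec))
```

An average angular error over an image would then be the mean of `angular_error` over all pixels with a valid ground-truth flow.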

The Middlebury Dataset is a vast resource providing optical flow accuracy characterization information for both metrics over 91 different methods using well-known standardized

flow information is found in Appendix A.

2.4.3.2 Line Detection and Vanishing Points

Another computer vision-based technique for UAS navigation uses line detection and vanishing points to generate roll and pitch measurements. A brief overview of the process to measure attitude angles from vanishing points is discussed here with further details available in [65]. The first step in the algorithm is to detect parallel lines in a two-dimensional image. These lines may represent vertical edges of buildings parallel to the direction of gravity or horizontal edges of buildings at the street level, orthogonal to the direction of gravity. Once the lines have been detected, the second step in the algorithm is to follow the lines to points of intersection as shown in Figure 2.1. These points of intersection are known as vanishing points, categorized as either vertical or horizontal based on the direction of the parallel lines.

Figure 2.1: Example of parallel lines and vanishing points using an urban scene courtesy of Hwangbo and Kanade [65].

The third step in the algorithm is to use the vanishing points to calculate roll and pitch. A single vertical vanishing point $\mathbf{v}_v^*$ with coordinates $(v_x, v_y)$ can be used to calculate both

\[
\phi = \operatorname{atan2}(v_x, v_y) \tag{2.29}
\]

and

\[
\theta = \operatorname{atan}\sqrt{\frac{1}{v_x^2 + v_y^2}} \tag{2.30}
\]

where the coordinates are generated from the projection of the world z-axis onto the two-dimensional camera plane.
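
Equations (2.29) and (2.30) translate directly into code. The sketch below uses a hypothetical function name and checks the result against a synthetic vanishing point of the form $(\cot\theta \sin\phi, \cot\theta \cos\phi)$. That forward model is an assumption chosen to be consistent with the two equations, not a formula taken from [65].

```python
import math


def attitude_from_vertical_vp(vx, vy):
    """Roll and pitch (radians) from a vertical vanishing point (vx, vy)."""
    radius_sq = vx * vx + vy * vy
    if radius_sq == 0.0:
        raise ValueError("vanishing point at the image origin: roll is undefined")
    roll = math.atan2(vx, vy)                      # Equation (2.29)
    pitch = math.atan(math.sqrt(1.0 / radius_sq))  # Equation (2.30)
    return roll, pitch


# Synthetic check with an assumed forward model for the vanishing point.
true_roll, true_pitch = 0.1, 0.2
cot_pitch = 1.0 / math.tan(true_pitch)
vp = (cot_pitch * math.sin(true_roll), cot_pitch * math.cos(true_roll))
print(attitude_from_vertical_vp(*vp))
```

As written, Equation (2.30) always returns a non-negative angle, so the sign of the pitch angle would have to come from another source such as the current filter estimate.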

Horizontal plane vanishing point coordinates, $\mathbf{v}_h^*$, are calculated as

\[
\mathbf{v}_h^* = \begin{bmatrix}
\dfrac{\cos\phi \sin\psi - \sin\phi \sin\theta \cos\psi}{\cos\theta \cos\psi} &
\dfrac{-\sin\phi \sin\psi - \cos\phi \sin\theta \cos\psi}{\cos\theta \cos\psi}
\end{bmatrix}^T
\tag{2.31}
\]

using the projection of the direction of travel axis onto the unit vector in the direction of each horizontal vanishing point. These points can also be used for the calculation of roll and pitch, but only if at least two horizontal vanishing points are present in an image, because Equation (2.31) is not decoupled in the roll and pitch directions. Once all vanishing points are calculated from an image, they can be used in a Kalman Filter to reset the error in the IMU-based attitude angle estimates.
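
For illustration, Equation (2.31) can be evaluated as a forward model that predicts where a horizontal vanishing point should appear for a given attitude. In this sketch the function name is hypothetical, and $\psi$ is assumed to be the heading of the horizontal line direction relative to the camera, since the section does not define it explicitly.

```python
import math


def horizontal_vanishing_point(roll, pitch, heading):
    """Predicted horizontal vanishing point (x, y) from Equation (2.31).

    Angles are in radians. heading is the assumed relative angle psi.
    """
    denom = math.cos(pitch) * math.cos(heading)
    if abs(denom) < 1e-12:
        # The line direction is perpendicular to the viewing axis,
        # so the vanishing point lies at infinity.
        raise ValueError("vanishing point at infinity")
    x = (math.cos(roll) * math.sin(heading)
         - math.sin(roll) * math.sin(pitch) * math.cos(heading)) / denom
    y = (-math.sin(roll) * math.sin(heading)
         - math.cos(roll) * math.sin(pitch) * math.cos(heading)) / denom
    return x, y


# Level flight with a 0.3 rad relative heading: the point lies on the
# horizontal image axis at tan(0.3).
print(horizontal_vanishing_point(0.0, 0.0, 0.3))
```

Because roll and pitch both appear in each coordinate, a single horizontal vanishing point does not determine them uniquely. In a filter, a function like this could serve as the measurement model that is compared against the detected vanishing points.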

2.4.3.3 Feature Detection

In [50], a scale-invariant feature transform technique is used to match features from one image to the next in order to accurately correct position, airspeed, and attitude angles. Use

of feature matching for state estimation has several steps. The first step is to analyze the

initial image, capturing each scale-invariant descriptor (or feature). When the next image

is received, the Euclidean distance is calculated between a descriptor in the first image and

its nearest neighbors in the next image. Once the images are matched, the homography (or

of the camera coordinate out of the bottom of the aircraft before and after transformation.

Reference [50] gives details on how to use the homography matrix to solve for the rotation matrix and the translation vector. This rotation matrix and translation vector can then be used to correct IMU and altimeter measurements.
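
The descriptor-matching step described above can be sketched with a brute-force nearest-neighbor search on Euclidean distance. The acceptance rule used here, which compares the best distance with the second-best distance, is a commonly used ratio test and is an assumption; the exact rule and threshold in [50] may differ. The function name and the `max_ratio` value are hypothetical, and the toy two-dimensional descriptors stand in for real high-dimensional ones.

```python
import math


def match_descriptors(desc_a, desc_b, max_ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbor in desc_b.

    Returns a list of (index_in_a, index_in_b) pairs. A match is kept only
    if the best distance is clearly smaller than the second best.
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Euclidean distance from this descriptor to every candidate.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if not dists:
            continue
        best_dist, best_j = dists[0]
        # Accept if unambiguous (assumed ratio test).
        if len(dists) == 1 or best_dist < max_ratio * dists[1][0]:
            matches.append((i, best_j))
    return matches


first_image = [(0.0, 0.0), (10.0, 10.0)]
next_image = [(10.1, 9.9), (0.1, 0.0), (5.0, 5.0)]
print(match_descriptors(first_image, next_image))
```

The resulting index pairs identify corresponding image points, which are the input a homography estimation step would need.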

2.4.3.4 Urban Cues

In the urban environment, images from computer vision systems can be used to detect cues such as roads, lane markings, crosswalks, and stop lines. While primarily used for ground vehicle applications, UAS applications are possible when operating at sufficiently low altitudes to detect and isolate these cues in successive images. Many works have studied this problem including [76], [77], [78]. Paetzhold and Franke [76] focused on both unknown and known road situations to locate these cues. In the unknown situations they made assumptions about the characteristics of the cues in order to extract them from images as polygons. These assumptions include orientation of cues with respect to each other and the vehicle trajectory as well as constant brightness and linear shape among others.

With a map of cues, frame to frame feature matching was used to generate cue-based state

measurements. Stereo vision was then used to separate vertical and horizontal shapes and

motion as well as identify image clutter. He et al. [77] used an intensity (grayscale) image to detect the road boundaries, including curvature, and then used the full color image to find

the road area within the boundaries. Newman et al. [78] created a navigation system to generate both three-dimensional maps and pose estimates using vision and lasers. Their system

analyzed the generated maps to provide labels within the maps to classify the different parts

of the image as walls, foliage, grass, etc.