Implementation - Visual SLAM for Autonomous Navigation of MAVs

camera. We divide the map by giving a label to each keyframe and map point, indicating from which camera they have been obtained.

The above organization makes map operations in both pose tracking and mapping more efficient. For example, every pose tracking step has to search for all potentially visible map points. When this search is performed in an image taken by camera C1, only the map points measured by C1 need to be checked for potential visibility. In the mapping process, neighbouring keyframes can be searched for in a sub-map only, instead of in the whole global map. This search is done whenever we need to decide whether a new keyframe should be added, or to choose an existing keyframe for triangulating new map points.
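The labelling scheme described above can be sketched as follows. This is a minimal illustration rather than the actual implementation: the names `MapPoint`, `GlobalMap` and `visibility_candidates` are hypothetical, and keyframes could be indexed in the same way.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MapPoint:
    point_id: int
    camera_id: int  # label: the camera from which this point was obtained


@dataclass
class GlobalMap:
    # A single global map; each sub-map is just a per-camera index into it.
    points_by_camera: dict = field(default_factory=lambda: defaultdict(list))

    def add_point(self, point: MapPoint) -> None:
        self.points_by_camera[point.camera_id].append(point)

    def visibility_candidates(self, camera_id: int) -> list:
        """Return only the points labelled with the given camera."""
        return list(self.points_by_camera.get(camera_id, []))


global_map = GlobalMap()
for pid, cam in [(0, 1), (1, 1), (2, 2)]:
    global_map.add_point(MapPoint(point_id=pid, camera_id=cam))

# Tracking an image from camera C1 only touches the C1 sub-map.
candidates_c1 = global_map.visibility_candidates(1)
```

Keeping one container with per-camera indices, instead of two separate maps, leaves room for the shared-perspective case, where one sub-map would hold data from several cameras.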

Despite our choice of a sub-map organization method, we still maintain a single global map. In PTAM it is straightforward to match features among multiple cameras, since both map point triangulation and bundle adjustment are designed to handle multiple observations of a feature point. Thus, if multiple cameras share common perspectives of the environment, e.g. they are mounted looking in the forward direction or to the sides of the MAV, we can easily adjust the organization of the map to allow feature matching among multiple cameras. In this case, each sub-map should include all the keyframes and map points generated by all the cameras that share common perspectives.

5.4.2 Pose tracking

Map points in the two sub-maps are reprojected into their corresponding source cameras to decide whether they are potentially visible. Successful matches between image features and those potentially visible points serve as image measurements, which are used in iterative optimizations for the pose update of the camera C1. The optimal pose of the camera C2 is then computed by using Eq. (5.2).

In the original PTAM, if map point j is measured in image pyramid level s (s ∈ {0, 1, 2, 3}), the measurement noise is estimated to be $\sigma_j = 2^s$. In our dual-camera pose tracking optimizations, we assume that the measurement noises of the two cameras follow the same distribution, i.e. for any camera i ∈ {1, 2} measuring point j, we have $\sigma_{ji} = 2^s$. This also applies to the optimizations in bundle adjustment. In a multi-camera system using significantly different cameras or lenses, the measurement noise of each camera should not be assumed to follow the same distribution. Instead, the distributions can be estimated according to the actual performance of each sensor, as proposed in Scherer et al. (2012).
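As a small illustration of this noise model, the sketch below computes the noise estimate for a pyramid level and uses it to normalize a reprojection error. The function names and the `camera_scale` parameter are hypothetical. `camera_scale` only marks where a per-sensor noise estimate could enter for dissimilar cameras; with the default of 1.0 both cameras share the same distribution, as assumed in the text.

```python
def measurement_sigma(pyramid_level: int, camera_scale: float = 1.0) -> float:
    """Noise estimate 2**s for pyramid level s, optionally scaled per camera."""
    if pyramid_level not in (0, 1, 2, 3):
        raise ValueError("pyramid level must be in {0, 1, 2, 3}")
    # camera_scale is an assumption of this sketch, not part of the text.
    return camera_scale * (2.0 ** pyramid_level)


def normalized_error(error_px: float, pyramid_level: int,
                     camera_scale: float = 1.0) -> float:
    """Reprojection error magnitude divided by its noise estimate."""
    return abs(error_px) / measurement_sigma(pyramid_level, camera_scale)
```

For example, a 4-pixel error measured at level 2 has the same normalized magnitude as a 1-pixel error measured at level 0.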

5.4.3 Mapping

New keyframes K1n and K2n from the two cameras are added to the global map when the geometric distance of K1n (or K2n) to its closest neighbor obtained by the same camera is larger than a threshold. This means that if a keyframe from either camera qualifies for insertion into the map, the keyframe obtained by the other camera at the same time is added as well. Thus, in the global map, each keyframe obtained by the camera C1 is always associated with another keyframe obtained by C2, and vice versa. Here, the geometric distance is a weighted sum of translation distance and angular difference, as in the original PTAM.
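The insertion rule can be sketched as below, under several assumptions that are not taken from the text: a pose is a pair of a position tuple and a unit quaternion (w, x, y, z), the relative weight `angle_weight` is a free parameter, and an empty sub-map always accepts a keyframe. All names are hypothetical.

```python
import math


def geometric_distance(pose_a, pose_b, angle_weight=1.0):
    """Translation distance plus weighted rotation angle between two poses."""
    (pos_a, quat_a), (pos_b, quat_b) = pose_a, pose_b
    translation = math.dist(pos_a, pos_b)
    # Angle between two unit quaternions; abs() handles the q / -q ambiguity.
    dot = abs(sum(a * b for a, b in zip(quat_a, quat_b)))
    angle = 2.0 * math.acos(min(1.0, dot))
    return translation + angle_weight * angle


def should_add_keyframe_pair(pose_c1, pose_c2, keyframes_c1, keyframes_c2,
                             threshold, angle_weight=1.0):
    """Add both keyframes if either one is far from its own camera's sub-map."""
    def far_from_all(pose, keyframes):
        if not keyframes:  # assumption: an empty sub-map accepts any keyframe
            return True
        closest = min(geometric_distance(pose, kf, angle_weight)
                      for kf in keyframes)
        return closest > threshold

    return (far_from_all(pose_c1, keyframes_c1)
            or far_from_all(pose_c2, keyframes_c2))
```

Because the decision is made jointly, every accepted C1 keyframe always has a C2 partner taken at the same time, which is what keeps the two keyframe sets associated.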

Additionally, we attempted to implement a scheme that allows individual keyframes from only one of the two cameras to be added to the global map: a keyframe from a camera is added to the map only if its geometric distance to its closest neighbor exceeds a threshold. However, this scheme does not noticeably reduce the total number of keyframes in the map. On the contrary, it requires complex logic to maintain correct associations between the two keyframe sets obtained by the two cameras.

To achieve real-time performance of the SLAM system during exploration, we retain only the local bundle adjustment and abandon global bundle adjustment in the mapping thread. The local bundle adjustment involves a subset of the map points and keyframes in the global map, generated by both cameras. It computes the pose updates for the keyframes and the 3D position updates for the map points that are added to the keyframe set $K_a$ and the point set $p_a$, respectively. As explained in Sec. 5.3.4, only the keyframes from the camera C1 are added to $K_a$. Keyframes which are obtained by the camera C2 and associated (rigidly connected) to $K_a$ form the set $K_c$, whose poses can be computed from the optimized poses of $K_a$. $p_a$ consists of all the points which are measured in $K_a$ or $K_c$. A further, fixed keyframe set $K_f$ contains any keyframe in which a measurement of any point in $p_a$ has been made. Then the minimization of the local bundle adjustment becomes

$$
\left\{\{\mu_{i \in K_a}\}, \{p_{j \in p_a}\}\right\} = \operatorname*{argmin}_{\{\{\mu\},\{p\}\}} \sum_{i \in K_a \cup K_c \cup K_f} \; \sum_{j \in p_a \cap S_i} \mathrm{Obj}\!\left(\frac{|e_{ji}|}{\sigma_{ji}}, \sigma_T\right), \tag{5.13}
$$

which is solved by using the Levenberg-Marquardt method (Hartley and Zisserman, 2004) as in the original PTAM. The Jacobians of $e_{ji}$ are computed as described in Sec. 5.3.4.
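To make the structure of the cost in Eq. (5.13) concrete, the sketch below only evaluates the double sum for given residuals. It does not perform the Levenberg-Marquardt minimization. It assumes that Obj is the Tukey biweight objective, as in the original PTAM, and that the cut-off equals σT. The exact scaling of the cut-off is an assumption of this sketch, and the function names are hypothetical.

```python
def tukey_objective(normalized_error: float, sigma_t: float) -> float:
    """Tukey biweight cost; constant for |error| >= sigma_t (assumed cut-off)."""
    c2 = sigma_t * sigma_t
    if abs(normalized_error) >= sigma_t:
        return c2 / 6.0  # outliers contribute a fixed, bounded cost
    ratio = 1.0 - (normalized_error * normalized_error) / c2
    return (c2 / 6.0) * (1.0 - ratio ** 3)


def local_ba_cost(measurements, sigma_t: float) -> float:
    """Sum the robust cost over (error_norm, sigma_ji) pairs.

    Each pair stands for one measurement of a point in the adjusted point set
    made in a keyframe of the adjusted, connected or fixed keyframe sets.
    """
    return sum(tukey_objective(error / sigma, sigma_t)
               for error, sigma in measurements)
```

The bounded cost for large residuals is what limits the influence of mismatched features on the optimized poses and points.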

We use a strategy similar to that of PTAM to define the keyframe set $K_a$: it consists of $n_k$ keyframes obtained by the camera C1, namely the newest keyframe and the $n_k - 1$ other keyframes nearest to it. Normally, we set $n_k = 5$.
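The construction of the four sets used in the local bundle adjustment can be sketched as follows. The dictionary layout, the `partner` field linking each keyframe to the one taken at the same time by the other camera, and the function name are hypothetical. Treating the fixed set as disjoint from the other two is also an assumption of this sketch. The `distance` callback stands in for the geometric distance between keyframes.

```python
def select_local_ba_sets(keyframes, newest_id, distance, n_k=5):
    """Return (k_a, k_c, p_a, k_f) as sets of keyframe ids and point ids.

    keyframes: dict id -> {"camera": 1 or 2, "partner": id, "points": set}
    """
    c1_ids = [k for k, kf in keyframes.items() if kf["camera"] == 1]
    others = sorted((k for k in c1_ids if k != newest_id),
                    key=lambda k: distance(newest_id, k))
    # Adjusted set: newest C1 keyframe plus its n_k - 1 nearest C1 neighbours.
    k_a = {newest_id, *others[: n_k - 1]}
    # Connected set: the C2 keyframes rigidly associated with k_a.
    k_c = {keyframes[k]["partner"] for k in k_a}
    # Adjusted points: everything measured in k_a or k_c.
    p_a = set().union(*(keyframes[k]["points"] for k in k_a | k_c))
    # Fixed set: any remaining keyframe that measures a point in p_a.
    k_f = {k for k, kf in keyframes.items()
           if k not in k_a | k_c and kf["points"] & p_a}
    return k_a, k_c, p_a, k_f


# Toy map: ids 0, 2, 4 come from C1; ids 1, 3, 5 are their C2 partners.
keyframes = {
    0: {"camera": 1, "partner": 1, "points": {1, 2}},
    1: {"camera": 2, "partner": 0, "points": {10}},
    2: {"camera": 1, "partner": 3, "points": {2, 3}},
    3: {"camera": 2, "partner": 2, "points": {11}},
    4: {"camera": 1, "partner": 5, "points": {3, 4}},
    5: {"camera": 2, "partner": 4, "points": {10, 12}},
}
x_position = {0: 0.0, 2: 1.0, 4: 2.0}  # 1D stand-in for keyframe poses


def toy_distance(a, b):
    return abs(x_position[a] - x_position[b])


k_a, k_c, p_a, k_f = select_local_ba_sets(
    keyframes, newest_id=4, distance=toy_distance, n_k=2)
```

In this toy map, keyframes 0 and 1 are not adjusted but still constrain the solution, because they measure points 2 and 10, which belong to the adjusted point set.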

5.4.4 Automatic Initialization

Metric scale ambiguity generally exists in monocular camera systems. Our dual-camera system has the same issue, since the fields of view of the two cameras do not overlap, and thus no stereo triangulation can be used to recover the metric scale factor of the map built by the system. We solve this issue by initializing the metric map of our SLAM system in a way similar to that of Chapter 4. We use the initialization module presented in Chapter 3 to robustly estimate the pose of the downward-looking camera C1 during the
