Dense Visual Odometry - Visual Perception For Robotic Spatial Understanding

While ICP aligns point clouds by minimizing the point-to-plane distance between all point correspondences, Steinbrücker’s dense visual odometry algorithm minimizes the photometric error between corresponding points [107], [192]. The algorithm is predicated on the photometric constancy assumption, i.e., that given two nearby but different viewpoints of the same scene, a single point on a surface reflects the same amount of light and therefore produces a similar intensity for the corresponding point in each view⁴. Whether this is a realistic assumption depends on the nature of the scene, including the lighting, surface materials, and the nature of the transformation between the two views. Practically, and ignoring prevalent but minimal specular effects, smaller camera motions support the assumption better. The camera itself is also a complicating factor: most RGB-D cameras provide automatic exposure and automatic white balance functions that can significantly change the overall brightness of the scene and otherwise confuse the photometric function, since corresponding points no longer have the same intensity value between frames. More recent⁵ direct VO methods take great pains to account for this effect by calibrating and estimating exposure parameters online; see, for example, [12], [50], [234].

Why is this approach interesting when the assumption is so tenuous? Ultimately, an ICP approach will only function on point clouds with enough geometry to constrain the least squares optimization. DVO, on the other hand, is capable of using both geometry and appearance to compute the information needed to align frames. While this may not be as effective at aligning individual pixels as using point features derived from appearance (i.e., indirect methods with keypoints and descriptors), it is the dense information over large regions of the image that allows the algorithm to compute a transform even when there is no significant geometry, e.g., when looking at a poster on a wall.

Dense visual odometry (DVO) exploits the depth / intensity pair produced by RGB-D cameras, and operates using the following equation, where a point on a surface is assumed to have the same appearance (i.e., intensity) when seen from two different viewpoints:

$$I_t(x) = I_{t+1}(\tau(x, \theta)), \tag{5.15}$$

where $I_t$ and $I_{t+1}$ are consecutive images, $\theta \in SE(3)$, and $\tau(x, \theta)$ is a warping function that maps pixels from frame $t$ to $t+1$ such that the error derived from Eq. 5.15 is minimized:

$$E_{dvo}(\theta) = \int_{x \in \Omega} \left[ I_{t+1}(\tau(x, \theta)) - I_t(x) \right]^2 \, dx. \tag{5.16}$$

⁴ I would be inclined to call this the photometric constancy simplification, since there is rarely a case where this would actually hold.
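As a rough illustration of Eqs. 5.15 and 5.16, the following Python sketch evaluates a discrete version of the photometric energy for a candidate rigid motion. It is a minimal sketch under stated assumptions, not the implementation used here: it assumes a pinhole camera with intrinsics matrix `K`, a depth image registered to the intensity image of frame $t$, bilinear interpolation, and a sum over valid pixels in place of the integral. The function names (`warp_pixels`, `bilinear_sample`, `photometric_error`) are hypothetical.

```python
import numpy as np

def warp_pixels(depth, K, R, t):
    """Assumed form of tau(x, theta): back-project each pixel of frame t using
    its depth, apply the rigid motion (R, t), and project into frame t+1."""
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    # Pixel + depth -> 3D point in the camera frame at time t.
    P = np.stack([(u - cx) / fx * depth, (v - cy) / fy * depth, depth], axis=-1)
    # Rigid-body motion into the camera frame at time t+1.
    Q = P @ R.T + t
    z = Q[..., 2]
    valid = (depth > 0) & (z > 1e-6)
    z_safe = np.where(valid, z, 1.0)  # avoid division by zero on invalid pixels
    u2 = fx * Q[..., 0] / z_safe + cx
    v2 = fy * Q[..., 1] / z_safe + cy
    return u2, v2, valid

def bilinear_sample(img, u, v):
    """Sample img at sub-pixel locations; also report which samples fall inside."""
    h, w = img.shape
    inside = (u >= 0) & (u <= w - 1) & (v >= 0) & (v <= h - 1)
    u = np.clip(u, 0, w - 1)
    v = np.clip(v, 0, h - 1)
    u0 = np.minimum(np.floor(u).astype(int), w - 2)
    v0 = np.minimum(np.floor(v).astype(int), h - 2)
    a, b = u - u0, v - v0
    val = ((1 - a) * (1 - b) * img[v0, u0] + a * (1 - b) * img[v0, u0 + 1]
           + (1 - a) * b * img[v0 + 1, u0] + a * b * img[v0 + 1, u0 + 1])
    return val, inside

def photometric_error(I_t, I_t1, depth_t, K, R, t):
    """Discrete version of Eq. 5.16: sum of squared intensity differences
    over pixels that have depth and land inside frame t+1."""
    u2, v2, valid = warp_pixels(depth_t, K, R, t)
    warped, inside = bilinear_sample(I_t1, u2, v2)
    mask = valid & inside
    r = (warped - I_t)[mask]
    return float(np.sum(r ** 2)), int(mask.sum())
```

Evaluating `photometric_error` for several candidate motions on a pair of frames gives a feel for the shape of the energy; the optimizer's job is to find the motion at which this value is smallest.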

The warping function, in particular, enables us to compare frames by transforming the points in frame $t$ to frame $t+1$ using the currently estimated transformation parameters, and subsequently projecting them onto the image plane to determine their intensity values. The DVO algorithm proceeds by iteratively computing the transformation that minimizes the $E_{dvo}$ energy. Each iteration computes the Jacobians based on mapping the current frame intensity image onto the transformed point cloud from the previous frame, then computes the difference between the expected intensity values and the mapped intensity values. A set of partial derivatives is computed at each point, along with the residual, and each is accumulated into the normal equations for the least squares optimization.
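The accumulation into the normal equations can be sketched as follows. This is an illustrative Gauss-Newton step under several assumptions that are not taken from the text: the motion increment is a six-vector twist (translation first, then rotation) applied on the left as Q' = Q + v + ω × Q, the camera is a pinhole with focal lengths `fx` and `fy`, and the image gradients `gx`, `gy` of the current frame have already been sampled at the projected points. The names `jacobian_rows` and `gauss_newton_step` are hypothetical, and the optional weights stand in for whatever robust weighting is used.

```python
import numpy as np

def jacobian_rows(Q, gx, gy, fx, fy):
    """Per-point 1x6 Jacobian of the intensity residual w.r.t. a small twist.

    Q      : (N, 3) points already transformed into the current camera frame
    gx, gy : (N,) image gradients of the current frame at the projections
    """
    X, Y, Z = Q[:, 0], Q[:, 1], Q[:, 2]
    # Chain rule, part 1: image gradient times the pinhole projection Jacobian.
    a = gx * fx / Z
    b = gy * fy / Z
    c = -(a * X + b * Y) / Z
    g = np.stack([a, b, c], axis=1)          # d(residual)/dQ, shape (N, 3)
    # Chain rule, part 2: dQ/d(twist) = [I | -[Q]x], so the rotational part
    # of the row is Q x g (scalar triple product identity).
    return np.hstack([g, np.cross(Q, g)])    # shape (N, 6)

def gauss_newton_step(J, r, w=None):
    """Accumulate the normal equations and solve for the twist increment."""
    if w is None:
        w = np.ones_like(r)                  # unweighted least squares
    A = J.T @ (w[:, None] * J)               # 6x6 approximate Hessian
    b = -J.T @ (w * r)                       # right-hand side
    return np.linalg.solve(A, b)
```

In a full system the returned increment would be converted to a rigid transformation and composed with the current estimate before the next iteration; that step is omitted here.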

The DVO energy is optimized using an iteratively reweighted least squares approach in an inverse compositional formulation. The original formulation is the standard forward additive procedure proposed by Lucas and Kanade in their optical flow paper [133]: the objective function is optimized with respect to the change in the parameters of the warping function ($\theta \in SE(3)$), and the parameters are updated by simply adding the delta values. In the inverse compositional model [5], the objective is modified to estimate the incremental warp (the composition of warps, $\theta = \Delta\theta + \theta$) as opposed to just the change in the parameters. While this is provably equivalent to the forward additive method if the set of warp functions forms a group, there is a computational benefit: the Hessian may be computed once, as it becomes independent of the parameters. For more details, we refer the reader to the overview paper by Baker and Matthews [5].
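For reference, the two formulations can be written side by side. The following is a sketch that adapts the general form given by Baker and Matthews [5] to the symbols of Eq. 5.16; the discrete sum over $\Omega$, the use of $\theta = 0$ to denote the identity warp, and the suppression of the depth dependence of $\tau$ are notational assumptions made here.

```latex
% Forward additive: linearize the current image about the current parameters,
% then add the increment to the parameters.
\Delta\theta^{*} = \arg\min_{\Delta\theta} \sum_{x \in \Omega}
  \left[ I_{t+1}(\tau(x, \theta + \Delta\theta)) - I_t(x) \right]^2,
\qquad \theta \leftarrow \theta + \Delta\theta

% Inverse compositional: solve for an incremental warp of the reference image,
% then compose its inverse with the current warp.
\Delta\theta^{*} = \arg\min_{\Delta\theta} \sum_{x \in \Omega}
  \left[ I_t(\tau(x, \Delta\theta)) - I_{t+1}(\tau(x, \theta)) \right]^2,
\qquad \tau(x, \theta) \leftarrow \tau(\tau^{-1}(x, \Delta\theta), \theta)

% Gauss-Newton Hessian for the inverse compositional case: it depends only on
% the reference image I_t and the warp Jacobian at the identity.
H = \sum_{x \in \Omega} J(x)^{\top} J(x), \qquad
J(x) = \nabla I_t(x) \,
  \left. \frac{\partial \tau(x, \theta)}{\partial \theta} \right|_{\theta = 0}
```

Because $J(x)$ is evaluated on the reference image at the identity warp, neither $J$ nor $H$ changes as $\theta$ is updated, which is the source of the computational saving.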

To handle larger transformations between frames, which strain the photometric constancy assumption in particular, DVO uses a coarse-to-fine approach. An image pyramid is generated using the pyramid down operator (a Gaussian filter followed by subsampling). Then, starting from the lowest resolution of the pyramid, the algorithm is run to determine the estimated transformation parameters for that level, followed by the optimization at the next level using an initialization from the previous level's parameters; this continues until the highest-resolution pyramid level has been used.
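The coarse-to-fine schedule can be sketched in a few lines of Python. This is a minimal illustration under assumptions not stated in the text: a 5-tap binomial kernel approximates the Gaussian filter, borders are handled by edge replication, and the per-level optimizer is passed in as a hypothetical callback `align_level(I_t, I_t1, level, theta)` that returns refined parameters.

```python
import numpy as np

# 5-tap binomial kernel, a common approximation of a small Gaussian (assumed).
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def pyr_down(img):
    """Separable blur followed by dropping every other row and column."""
    h, w = img.shape
    padded = np.pad(img, 2, mode="edge")
    # Horizontal pass over the padded image, then vertical pass.
    rows = sum(k * padded[:, i:i + w] for i, k in enumerate(KERNEL))
    blurred = sum(k * rows[i:i + h, :] for i, k in enumerate(KERNEL))
    return blurred[::2, ::2]

def build_pyramid(img, levels):
    """Level 0 is full resolution; each further level halves the resolution."""
    pyramid = [np.asarray(img, dtype=np.float64)]
    for _ in range(levels - 1):
        pyramid.append(pyr_down(pyramid[-1]))
    return pyramid

def coarse_to_fine(pyr_t, pyr_t1, align_level, theta0):
    """Run the per-level optimizer from the coarsest level to the finest,
    seeding each level with the estimate from the level before it."""
    theta = theta0
    for level in reversed(range(len(pyr_t))):
        theta = align_level(pyr_t[level], pyr_t1[level], level, theta)
    return theta
```

The rigid motion parameters carry over between levels unchanged, but the camera intrinsics generally need to be rescaled to match each level's resolution; that bookkeeping would live inside the callback.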

In practice, we found that the algorithm diverged when the photometric constancy assumption was violated or the image difference was too large. Since DVO, like ICP, is based on a non-linear least squares optimization approach, each step uses a linearization about the current estimate. Therefore, DVO is a local optimization and may not converge to a globally optimal solution.
