Related Work - Robust and Efficient Camera-based Scene Reconstruction

To overcome these issues, we propose a 3D orientation descriptor that can be inferred from its 2D counterpart and the intrinsic camera calibration (the per-pixel view direction in view space). In principle, our orientation descriptor defines a triplet of orthonormal vectors for each feature whose base axes are given by (1) the 2D feature orientation transformed to view space, (2) the view direction towards the feature in view space and (3) the cross product of (1) and (2) (see Figure 3.4).

Since this 3D orientation descriptor is constructed in view space, it is independent of the camera type and parameters and therefore allows for significantly more robust matching of features between omnidirectional and wide-angle camera images. Using our 3D orientation, the number of spurious matches is significantly reduced while correct matches are essentially unaffected. Subsequent processing tasks such as sparse geometry reconstruction (see Section 2.4) become significantly more robust and more efficient after this simple outlier removal based on 3D orientation.

In the remainder of this chapter, we review related work on handling wide FoVs and omnidirectional cameras with local image features. We then present our 3D orientation descriptor in detail, describe our match refinement strategies, and evaluate our work against the common use of 2D feature orientations.

3.2 Related Work

Finding point correspondences in image pairs is usually done in four steps. First, distinctive keypoints are found with position, orientation and scale (Section 2.1.3). Second, their local image appearance is stored in a feature descriptor (Section 2.1.1). Third, the descriptors are matched based on some distance norm (Section 2.1.4). While the best match is found for each feature individually, the set of all found matches can be refined jointly, as a final step, based on the geometrical consistency of the keypoints: the SIFT framework [Low04] introduced analyzing the keypoint orientations of all matches for consistency. By dropping inconsistent matches, the number of outliers is reduced. This match refinement stage is what we analyze and improve in this chapter.

Match refinement can be applied to any feature matching framework that extracts feature orientations, such as the already presented SIFT [Low04], SURF [BTVG06], GLOH [MS05], Oriented FAST [RD06, RRKB11], BRISK [LCS11] and FREAK [AOV12] (summary in Section 2.1.5). While SIFT, SURF and GLOH assign orientations based on the local gradient, newer descriptors such as Oriented FAST use intensity centroids [Ros99]: Oriented FAST calculates the moments

$$m_{pq} = \sum_{u,v} u^p v^q I(u, v), \qquad p, q \in \{0, 1\} \tag{3.1}$$

for all pixels of a patch and obtains the orientation based on their gradients. BRISK [LCS11] and FREAK [AOV12] use the sum of gradients obtained from special sample pairs of patches around a keypoint. Our method is not only applicable to point features: Scaramuzza et al. [SCMS08] extract vertical line features, which are oriented by definition. Lu and Wu [LW08] do quasi-dense matching of omnidirectional and perspective images using an affine model for local transformations, which can be reduced to a rotation.
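The moments of Equation (3.1) can be illustrated with a short sketch. It assumes that the patch orientation is taken as atan2(m01, m10), as in the ORB formulation [RRKB11], and that pixel coordinates are measured relative to the patch centre. The function name and the list-of-rows patch layout are hypothetical choices made for this illustration only.

```python
import math


def intensity_centroid_orientation(patch):
    """Orientation (radians) of a patch from its first-order moments.

    `patch` is a list of rows of intensities I(u, v). Coordinates are taken
    relative to the patch centre (an assumption of this sketch).
    """
    height = len(patch)
    width = len(patch[0])
    centre_u = (width - 1) / 2.0
    centre_v = (height - 1) / 2.0

    m10 = 0.0  # moment with p = 1, q = 0
    m01 = 0.0  # moment with p = 0, q = 1
    for row_index, row in enumerate(patch):
        for col_index, intensity in enumerate(row):
            u = col_index - centre_u
            v = row_index - centre_v
            m10 += u * intensity
            m01 += v * intensity

    # The vector from the patch centre to the intensity centroid has the
    # direction (m10, m01); its angle is used as the orientation.
    return math.atan2(m01, m10)
```

A patch that is brighter on its right-hand side yields an angle near zero, while a patch that is brighter towards larger v yields an angle near π/2.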

There are several SIFT extensions which improve the matching of image features from wide-FoV and omnidirectional cameras by better feature extraction instead of refining the matches in the end: Arican and Frossard [AF10] focus on calculating the scale space pyramid used for keypoint detection correctly for parabolic mirrors by solving the heat equation using Riemannian geometry. Hansen et al. [HCB10] and Cruz-Mota et al. [CMBP+12] make features invariant to distortions by projecting the images to a sphere and then back to planes which are tangential to the sphere at the feature’s location. Lourenço et al. [LBV12] implement scale space construction and feature extraction without image resampling by using adaptive filtering that compensates for image distortion. Our approach of match refinement is orthogonal to all the publications focusing on better feature extraction and can be applied on top of those for further improvement.

3.2.1 Applications

Our match refinement provides advantages on all cameras with a large field of view or distorted images, e.g., currently popular action cams like the GoPro Hero Series.

The greatest improvements can, however, be observed on omnidirectional camera systems, e.g., custom camera rigs, professional devices such as the various Point Grey Ladybug models, and commercial omnidirectional camera systems like the 360° camera in the new Mercedes E-Class or the Panono Panoramic Ball Camera. Besides one-shot solutions, many smartphones are able to capture panoramas, e.g., by using Photo Sphere (Google) or Photosynth (Microsoft).

3.3 3D Orientation Descriptor

Previously, orientations of local 2D image features were simply represented by an orientation angle $o$.

We represent orientations with our 3D orientation descriptor $O$, which can be calculated for each feature and is made up of three orthonormal vectors that define a Euclidean 3D space. For each match of two features with their 3D orientation descriptors $O_1$, $O_2$, one can find the rotation of the match $R_{3D} = O_2 \cdot O_1^{-1}$, which transforms from one feature space to the other. The orientation descriptors are designed in a way that the match rotation $R_{3D}$ is roughly the inverse of the camera rotation and therefore similar for all correct matches of an omnidirectional image pair.
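As a minimal sketch of the match rotation, the following code computes $R_{3D} = O_2 \cdot O_1^{-1}$ for two descriptor frames stored as 3×3 matrices whose columns are the three axes. It assumes that the frames are exactly orthonormal, so that the inverse can be replaced by the transpose. All function names are hypothetical and serve only this illustration.

```python
import math


def transpose(m):
    """Transpose of a 3x3 matrix given as a list of rows."""
    return [list(column) for column in zip(*m)]


def matmul(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def match_rotation(frame_1, frame_2):
    """R_3D = O_2 * O_1^{-1}; for orthonormal O_1 the inverse is the transpose."""
    return matmul(frame_2, transpose(frame_1))


def rotation_angle(r):
    """Rotation angle (radians) of a rotation matrix, from its trace."""
    trace = r[0][0] + r[1][1] + r[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)


def rot_z(angle):
    """Rotation about the z axis, used here only to build example frames."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

If the second frame is obtained by rotating the first one, e.g. `frame_2 = matmul(rot_z(a), frame_1)`, then `match_rotation(frame_1, frame_2)` recovers `rot_z(a)` regardless of how `frame_1` itself is oriented. This is the property that makes the match rotations of correct matches comparable.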

Figure 3.4: 3D orientation descriptor. The red, green and blue vectors represent the axes of the orthonormal descriptor frame defined by the orientation and position of each feature.

Figure 3.5: Descriptor behavior under camera rotation. If the camera is rotated, all 3D feature orientations perform an inverse rotation.

The 3D orientation descriptor is calculated based on the 2D orientation $o$ of a feature in the image plane, its position $(u\ v) \in \mathbb{R}^2$ in the image and the intrinsic calibration of the camera (or camera rig)

$$C(u, v) = \frac{\pi^{-1}((u\ v\ 1);\, p_c)}{\lVert \pi^{-1}((u\ v\ 1);\, p_c) \rVert} \in \{\mathbb{R}^2 \to \mathbb{R}^3\} \tag{3.2}$$

that maps image coordinates to directions in camera space (compare to Section 2.2.3). Note that calculating the 3D descriptor works the same way for single images, spherical or cylindrical mappings or sets of images that represent an omnidirectional view.

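The 3D orientation descriptor is defined by $O = (\vec{o}_1\ \vec{o}_2\ \vec{o}_3)$ (see also Figure 3.4):

$$\vec{o}_1 = \frac{C(u + \cos(o), v + \sin(o)) - C(u, v)}{\lVert C(u + \cos(o), v + \sin(o)) - C(u, v) \rVert} \tag{3.3}$$

$$\vec{o}_2 = C(u, v) \tag{3.4}$$

$$\vec{o}_3 = \vec{o}_1 \times \vec{o}_2 \tag{3.5}$$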
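Equations (3.3) to (3.5) can be sketched in a few lines. The back-projection $C(u, v)$ depends on the camera model. As a stand-in, the sketch below assumes an ideal pinhole camera with a hypothetical focal length and principal point. Any other function that maps pixel coordinates to unit view directions could be passed in instead. All names and default values are assumptions made for illustration.

```python
import math


def normalize(vec):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def pinhole_direction(u, v, focal=500.0, cx=320.0, cy=240.0):
    """Hypothetical C(u, v): unit view direction of an ideal pinhole camera."""
    return normalize([(u - cx) / focal, (v - cy) / focal, 1.0])


def orientation_descriptor(u, v, o, direction=pinhole_direction):
    """Axes (o1, o2, o3) of the 3D orientation descriptor for one feature."""
    c = direction(u, v)
    c_shifted = direction(u + math.cos(o), v + math.sin(o))
    o1 = normalize([a - b for a, b in zip(c_shifted, c)])  # Eq. (3.3)
    o2 = c                                                 # Eq. (3.4)
    o3 = cross(o1, o2)                                     # Eq. (3.5)
    return o1, o2, o3
```

For a feature at the assumed principal point with 2D orientation 0, `orientation_descriptor(320.0, 240.0, 0.0)` returns a frame whose first axis points approximately along the image u axis and whose second axis is the optical axis. Because Equation (3.3) uses a finite one-pixel step, the first two axes are only approximately orthogonal in this sketch, and no re-orthogonalization is attempted.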

3.3.1 Properties

By design, the rotation $R_{3D}$ obtained from two matched features is the inverse of the camera (or camera rig) rotation (see Figure 3.5). In contrast to 2D orientations, all our 3D orientation descriptors rotate the same way under arbitrary camera transformations.

This allows for analyzing a set of feature matches for consistent rotations and discarding inconsistent matches.
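One simple way to exploit this consistency is a consensus vote over the match rotations. The sketch below is not the refinement strategy of this chapter. It is a hypothetical illustration that keeps the matches whose rotation lies within an assumed angular threshold of the best-supported rotation. The function names and the threshold of 10 degrees are assumptions.

```python
import math


def transpose(m):
    """Transpose of a 3x3 matrix given as a list of rows."""
    return [list(column) for column in zip(*m)]


def matmul(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def angle_between(rot_a, rot_b):
    """Angle (radians) of the relative rotation between two rotation matrices."""
    relative = matmul(rot_a, transpose(rot_b))
    trace = relative[0][0] + relative[1][1] + relative[2][2]
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))


def consistent_matches(rotations, max_angle=math.radians(10.0)):
    """Indices of the matches that agree with the best-supported rotation."""
    best_support = []
    for reference in rotations:
        support = [index for index, rotation in enumerate(rotations)
                   if angle_between(reference, rotation) <= max_angle]
        if len(support) > len(best_support):
            best_support = support
    return best_support


def rot_z(angle):
    """Rotation about the z axis, used here only to build example input."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

Given three match rotations of roughly 30 degrees about the same axis and one of 120 degrees, the vote keeps the first three and discards the last. The quadratic loop is chosen for clarity, not efficiency.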

Another benefit is that we incorporate the features’ viewing directions $d_c$ into the orientation descriptors. This enables us to distinguish between features with similar appearance and 2D orientation but different viewing direction.
