
3.2 Viewpoint-Invariant Patch

3.2.4 Evaluation of Repeatability

To measure the repeatability of VIP features, I conduct an experiment similar to the evaluation done by Mikolajczyk et al. (2005). This experiment is done jointly with Xiaowei Li and Brian Clipp.

We use the wall dataset from Mikolajczyk et al. (2005) for the evaluation experiment. As shown in Figure 3.5, this dataset is a sequence of images of a brick wall taken with increasing angles between the optical axis and the wall’s normal. Each of the images of the wall has a known homography to the first image, which was taken with the image plane fronto-parallel to the wall. Using this homography we extract a region of overlap between the first image and each other image. We extract features in this area of overlap and evaluate two performance measures for a number of feature detectors: the number of inlier correspondences and the re-detection rate. The number of inliers is the number of feature correspondences that fit the known homography. The re-detection rate is the ratio of inlier correspondences found in these overlapping regions to the number of features found in the fronto-parallel view. The number of inliers is shown in Figure 3.6 and the re-detection rate is shown in Figure 3.7.
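The two measures defined above can be sketched in a few lines. This is a minimal illustration, not the code used in the experiment: the function names, the pixel tolerance `tol`, and the toy data are assumptions, and the text does not state which threshold decides whether a correspondence fits the homography.

```python
import numpy as np

def project(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of pixel coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def repeatability(H, pts_ref, pts_other, n_ref_features, tol=2.0):
    """Return (number of inliers, re-detection rate).

    pts_ref[i] and pts_other[i] are the two ends of putative match i, and
    H maps the fronto-parallel reference image into the other image.
    tol is a hypothetical pixel threshold, not a value from the text.
    """
    err = np.linalg.norm(project(H, pts_ref) - pts_other, axis=1)
    n_inliers = int(np.sum(err < tol))
    return n_inliers, n_inliers / n_ref_features

# Toy data: a pure-translation homography and three putative matches.
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, -5.0],
              [0.0, 0.0, 1.0]])
pts_ref = np.array([[0.0, 0.0], [100.0, 50.0], [20.0, 30.0]])
pts_other = np.array([[10.0, -5.0], [110.5, 45.0], [80.0, 80.0]])  # last match is wrong
n_inliers, rate = repeatability(H, pts_ref, pts_other, n_ref_features=4)
```

In this toy case two of the three matches agree with the homography, and the rate is taken relative to four features assumed to be detected in the reference view.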

Figure 3.6 demonstrates that VIP matching generates a significantly larger number of inliers over the wide range of angles than the other detectors. The other detectors we compare to are SIFT [Lowe (2004)], Harris-Affine [Mikolajczyk and Schmid (2004)], Hessian-Affine [Mikolajczyk and Schmid (2004)], Intensity Based Region (IBR) [Tuytelaars and Van Gool (2004)], Edge Based Region (EBR) [Tuytelaars and Van Gool (2004)], and Maximally Stable Extremal Region (MSER) [Matas et al. (2002)]. The proposed VIP feature also has a significantly higher re-detection rate than the other detectors as seen in Figure 3.7. This high re-detection rate demonstrates the power of geometry-guided feature detection on ortho-rectified views. Even under large viewpoint changes which often result in projective transformations between images that affine transformations cannot accurately approximate, VIP performs well.

Figure 3.6: Number of inliers under projective transformations. [Plot: number of inliers (0 to 7000) against angle (20 to 50 deg) for VIP, SIFT, Harris-Affine, Hessian-Affine, IBR, EBR and MSER.]

Figure 3.7: Re-detection rate of features under projective transformations. [Plot: detection rate (0.1 to 0.9) against angle (20 to 50 deg) for the same detectors.]

3.3 3D Model Matching with VIPs

The previous section addressed the VIP detection problem, and any 3D model can now be represented as a set of VIPs. This section discusses how to apply VIPs to match and align two 3D models. In fact, VIP-based matching is not limited to 3D model alignment: Section 3.6 will show how to use 3D model matching as a basic function in loop detection for large-scale structure from motion and in 3D model-based location recognition. As with other feature-based matching, putative VIP matches can be obtained by matching the feature descriptors, for example by nearest-neighbor matching. After obtaining all the putative matches between two 3D scenes, it is then possible to find an optimized scene transformation from the 3D transformation hypotheses given by VIP correspondences. Since VIPs are viewpoint invariant, given a correct camera matrix and 3D structure, we can expect the similarity transformation derived from correct matches to be more accurate than a transformation derived from viewpoint-dependent matching techniques.
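As an illustration of the putative matching step, the sketch below pairs each descriptor of one scene with its nearest neighbor in the other. It assumes descriptors are plain fixed-length vectors compared by Euclidean distance; the function name and the toy descriptors are hypothetical, and a practical system would likely add a ratio test or an approximate search structure.

```python
import numpy as np

def nearest_neighbor_matches(desc_a, desc_b):
    """Return (i, j) pairs where desc_b[j] is the descriptor closest to desc_a[i]."""
    # Pairwise squared Euclidean distances, shape (len(desc_a), len(desc_b)).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist2 = np.sum(diff * diff, axis=2)
    nearest = np.argmin(dist2, axis=1)
    return [(i, int(j)) for i, j in enumerate(nearest)]

# Toy 2D descriptors; real descriptors have many more dimensions.
desc_a = np.array([[1.0, 0.0], [0.0, 1.0]])
desc_b = np.array([[0.1, 0.9], [0.9, 0.1], [5.0, 5.0]])
matches = nearest_neighbor_matches(desc_a, desc_b)
```

Each resulting pair is only a putative match; the geometric verification described in the rest of this section decides which pairs are kept.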

The rich geometric information carried by a VIP feature allows recovery of the 3D similarity transformation between two scenes from a single match. The ratio of the scales of two VIPs expresses the relative scale between the 3D scenes; the normals and orientations of the VIP pair determine the relative rotation. The translation between the scenes can be obtained from the relative feature positions after compensating for the scaling and rotation. The scaling and rotation needed to bring corresponding VIPs into alignment are the same for all correct correspondences.

A hierarchical method is proposed in this section to estimate the 3D similarity transformation between two scenes from putative VIP matches. Each single VIP correspondence delivers a unique 3D similarity transformation, and so hypothesized matches can be tested efficiently. Furthermore, the rotation and scaling components of the similarity transformation are the same for all inlier VIP matches, so they can be tested separately and efficiently with a voting scheme.
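To make the voting idea concrete for the scale component, the sketch below histograms the logarithm of the per-match scale ratio and keeps the matches that fall in the most populated bin. The text above does not specify how votes are binned, so the log-scale histogram, the `bin_width` value, and the function name are all assumptions made for illustration.

```python
import numpy as np

def vote_scale(scales_1, scales_2, bin_width=0.1):
    """Vote on log scale ratios; return inlier indices and a scale estimate."""
    log_ratio = np.log(np.asarray(scales_1) / np.asarray(scales_2))
    bins = np.floor(log_ratio / bin_width).astype(int)
    values, counts = np.unique(bins, return_counts=True)
    best = values[np.argmax(counts)]          # most populated bin wins
    keep = np.flatnonzero(bins == best)       # matches that voted for it
    return keep, float(np.exp(np.mean(log_ratio[keep])))

# Toy scales: matches 0, 1 and 3 agree on a ratio near 2; the others do not.
scales_1 = [2.0, 3.8, 1.0, 6.0, 3.0]
scales_2 = [1.0, 2.0, 3.0, 3.0, 0.5]
keep, sigma = vote_scale(scales_1, scales_2)
```

A fixed-bin histogram can split a true cluster across a bin boundary, so overlapping bins or a mode-seeking step may be preferable in practice. The rotation component could be voted on in a similar way once a suitable parameterization is chosen.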

3.3.1 3D Similarity from a Single VIP Match

The scaling, rotation and translation of a 3D similarity transformation can be fully determined by six constraints from 3 pairs of 3D point matches. As illustrated in Figure 3.8, a single VIP correspondence gives exactly 3 pairs of point matches, which determines a unique 3D similarity transformation. Specifically, given a VIP correspondence of $(x_1, \sigma_1, n_1, d_1, s_1)$ and $(x_2, \sigma_2, n_2, d_2, s_2)$, the scaling between them is given by $\sigma_s = \sigma_1 / \sigma_2$. The rotation between them satisfies $(n_1, d_1, d_1 \times n_1) R_s = (n_2, d_2, d_2 \times n_2)$. The translation between them is $T_s = x_1 - \sigma_s R_s x_2$. A 3D similarity transformation can be formed from the three components as $(\sigma_s R_s, T_s)$.
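The three formulas can be checked numerically with a short sketch. It rests on two assumptions that the text does not state explicitly: the triples $(n, d, d \times n)$ are stacked as the rows of a 3x3 matrix, and $n$ and $d$ are unit length and mutually orthogonal. Under that reading the rotation equation gives $n_1 = R_s n_2$, which agrees with a translation formula that maps scene 2 into scene 1. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def frame(n, d):
    """Stack n, d and d x n as the rows of a 3x3 matrix (assumed convention)."""
    return np.vstack([n, d, np.cross(d, n)])

def similarity_from_vip_match(x1, sigma1, n1, d1, x2, sigma2, n2, d2):
    """Return (sigma_s, R_s, T_s) with p1 = sigma_s * R_s @ p2 + T_s."""
    sigma_s = sigma1 / sigma2
    # frame(n1, d1) @ R_s = frame(n2, d2); the rows are orthonormal,
    # so the inverse of frame(n1, d1) is its transpose.
    R_s = frame(n1, d1).T @ frame(n2, d2)
    T_s = x1 - sigma_s * R_s @ x2
    return sigma_s, R_s, T_s

# Synthetic check: build a VIP in scene 2, map it into scene 1 with a
# known similarity, then recover that similarity from the single match.
R_true = np.array([[0.0, -1.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])   # 90 degree rotation about the z axis
T_true = np.array([1.0, 2.0, 3.0])
s_true = 2.0

x2, sigma2 = np.array([1.0, 0.0, 0.0]), 0.5
n2, d2 = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])
x1, sigma1 = s_true * R_true @ x2 + T_true, s_true * sigma2
n1, d1 = R_true @ n2, R_true @ d2

sigma_s, R_s, T_s = similarity_from_vip_match(x1, sigma1, n1, d1,
                                              x2, sigma2, n2, d2)
```

With noisy normals or orientations the product of the two frames is only approximately a rotation, so a real implementation would likely re-orthonormalize it, for example through an SVD.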

Figure 3.8: Three pairs of point matches given by one VIP correspondence can uniquely determine a 3D similarity transformation.
