2.3 Experimental Results
2.3.1 Estimating Multiple Affine Models
In this section we apply PEaRL to estimate affine transformations in the context of
rectified narrow baseline stereo. We use SIFT [20] features as points of interest, since they are scale and rotation invariant. They are also partially invariant to illumination and 3D camera view point changes. Matches between pairs of points in two images are found using exhaustive search along the corresponding scan line. In principle, it is possible to replace exhaustive search with ”smarter” methods as in [1, 22].
We will use the notation (xl, yl), (xr, yr) to describe the coordinates of the image
feature on the left image pl and right image pr.The symbolp denotes a pair of matching
points (xl, yl, xr, yr) andP is the set of matched pairs. Restricting the search for match-
ing pairs to the corresponding epipolar lines8 corresponds to imposing an additional constraint |yl−yr|< for some small threshold .
A planar homography has only three degrees of freedom for rectified stereo images. In this case the epipole e = [1 0 0]T is at infinity and the fundamental matrix can be written as F = [e]x = 0 0 0 0 0 −1 0 1 0 .
where [e]x describes the skew-symmetric matrix of the vectore. Following [12], a planar
homographyH satisfies the following constraintHTF+FTH = 0. Then, it can be shown that any planar homography for a rectified stereo pair is an affine transformation
A= a b c 0 1 0 0 0 1 .
with 3 degrees of freedom corresponding to parameters a,b, and c. This transformation can be uniquely identified from three matching pairs. We first generate a finite set of initial model proposals by randomly sampling three pairs of matching points and by computing parameters a, b, cfor the corresponding models.
One simplistic way to measure geometric error between matching pairp= (xl, yl, xr, yr)
and model A is a non-symmetric (left) transfer errorfrom pl= (xl, yl) to pr = (xr, yr)
||p−A||=|Apl−pr|2 = ∆2x (2.9)
where
∆x = (axl+byl+c−xr)
is a horizontal shift between Apl and pr along the epipolar line, see Fig. 2.12(a). This
basic approach assumes that thevertical shift between Apl and pr
∆y = (yl−yr)
is zero. This could be justified because our matched pairs p={pl, pr} are points on the
same scan lines (|yl−yr|< ) and affine transformationA respects such (epipolar) lines.
2.3. Experimental Results 41
(a) non-symmetric (left) transfer error
(b) reprojection error
Figure 2.12: Geometric fitting errors ||p−A|| for rectified stereo. Assuming matched points p = {pl, pr} are on the corresponding epipolar (scan) lines ∆y = yl −yr = 0,
the left transfer error is a horizontal shift ∆x between pr and Apl as shown in (a).
The standard re-projection error (b) is obtained by treating points pl and pr as noisy
observations of some hidden “true” pair of matched points ¯p= {p¯l,p¯r} where ¯pr =Ap¯l
Alternatively, one can use the re-projection error [16] illustrated in Fig.2.12(b). This approach treats pl and pr as noisy observations of some unknown “true” points ¯p =
{p¯l,p¯r} estimated by minimizing the observation noise, as follows
||p−A||= min
¯
p |p¯l−pl|
2
+|p¯r−pr|2 s.t. p¯r =Ap¯l.
For ¯p={pl, Apl}the objective function above equals|Apl−pr|2 and, therefore, optimiza-
tion over all ¯p should give an error smaller than (2.9). That is, our transfer error (2.9) can over-estimate the observation noise, as defined by the constrained optimization prob- lem above. The difference could be particularly significant if plane A is near-horizontal. Note that the optimal “true” pair ¯p corresponding to the minimum observation error is typically9 located on a scan line different from those containing data points p
l and pr.
If pl and pr are treated as noisy observations, these points do not need to respect the
epipolar geometry. Thus, we can drop the constraint ∆y = 0 when using the re-projection
error. Following the definition in the previous paragraph, one can derive the following closed formula for the re-projection error specific to our affine transformations
||p−A||= 2∆
2
x + (1 +a2+b2)∆2y − 2b∆x∆y
2a2+b2+ 2 . (2.10)
Figures 2.13(a-c) compare affine model fitting results regenerated by BT [2] and results generated by PEaRL for two different geometric error measures ||p−A|| in (2.9) and
(2.10). We applied PEaRL to energy
E(A, f) = X p∈P ||p−Afp||+λ X (p,q)∈N [fp 6=fq) +β· X h∈L δh(f)
where A = {Ah | h ∈ L} is set of affine models. Our neighborhood graph N is a
triangulation of points pr in the right image. BT [2] uses dense segmentation of pixels
based on photo-consistency. This measure does not work well in texture-less regions and they have to rely on intensity edges (static cues) to detect the boundaries between regions supporting different models. In contrast, PEaRL labels a sparse set of distinct
features based on geometric errors and spatial regularization. Also, the label cost allows our method to connect spatially disconnected parts of the same model as in Fig. 2.10.
Our results in Figures 2.13(b-c) demonstrate that specific choice of geometric measure ||p−A|| can significantly change the results. For example, minimizing the horizontal transfer error in equation (2.9) worked well for all vertical planes in the scene, but it split the ground plane into two, see Fig. 2.13(b). The re-projection errors (2.10) worked significantly better than the transfer errors. This is particularly obvious for the ground plane. As mentioned earlier, the transfer error (2.9) significantly over-estimates the observation noise for near-horizontal planes. This is fairly analogous to the consequences of using “vertical” point-to-line distance ||p−L||=|y−ax−b|2 for fitting near-vertical
lines L= (a, b) to 2D points p= (x, y).
In order to provide some quantitative comparison between the affine models generated by BT [2] and PEaRL, we used a “ground truth” image, shown in Fig. 2.14(a), where
2.3. Experimental Results 43
(a) BT [2] (b) PEaRL(2.9) (c) PEaRL(2.10)
segmentation of pixels classification of inliers classification of inliers
Figure 2.13: Results for ”Clorox” stereo pair [2]. (a) Dense pixel segmentation by BT [2] uses photo-consistency. (b-c) Sparse inlier classification by PEaRL using geometric fit measures (2.9) and (2.10), respectively.
Line BT [2] PEaRL(2.9) PEaRL(2.10)
L1 34.55 8.80 4.37
L2 16.83 4.82 7.5
L3 5.56 13.27 9.53
L4 5.99 4.46 8.47
Total 62.93 31.35 29.87
Table 2.1: Geometric errors for lines in Fig. 2.14 where affine models intersect. Errors are computed as the sum of distances between the ground truth line segment corners and the computed lines. The errors for sequentialRANSAC and J-linkage, see Fig. 2.15,
were significantly larger than those above.
we manually extracted the lines corresponding to intersecting planes. Assuming that two intersecting planes π1 and π2 are represented by the affine models Aπ1 and Aπ2, the
homogeneous vector representing the line of intersection is defined as the first row of the matrix (Aπ1 −Aπ2). Therefore, such lines can be computed from the models estimated
by either BT or PEaRL. Table 2.1 compares the accuracy of these lines with respect to
our ground truth.