Uncalibrated Image Rectification using structural congruency

4. Rectification of Wide Baseline Images

4.3 Uncalibrated Image Rectification using structural congruency

This section explains the basic idea of the proposed rectification method and then states it formally.

4.3.1 Basic Ideas

The basic idea is illustrated in Figure 4-4 and Figure 4-5. Suppose we take two images of a triangle. Because of the wide baseline of the imaging system, we may obtain two images like those shown in Figure 4-4(a) and (b). Notice that the bottom side of the triangle may look very different in the two views because of perspective distortion.

Figure 4-4. Image rectification with distortion minimized. Panels (a) and (b) are the input views. By applying homographies H and H' to (a) and (b) respectively, we obtain images (a') and (b'). The corresponding pixels are on the same scanline after the operations but it is still very difficult to establish pixel to

The traditional method of rectification minimizes the local distortion introduced by the rectifying homographies, so a possible result may look like Figure 4-4(a') and (b'). The two images are undoubtedly rectified, but it is impossible to obtain a fine dense matching between them because the sizes of the triangles in the two images are quite different. We can therefore try applying an affine transformation after the two images are rectified, as shown in Figure 4-5.

Figure 4-5. Panels (a) and (b) are the input views, and (a') and (b') are the results of applying H and H'. Using the proposed method, we apply one more affine transformation A to the intermediate result (a'), producing (a''), to maximize the shape congruency of (a'') and (b'), that is, of the scenes and objects in the

We use an affine transformation here because it changes only the x-coordinates of the rectified image and therefore does not introduce unwanted severe distortion. Using an optimization method, we can find the affine transformation that makes the transformed triangle look as much like the other one as possible.
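The idea of an x-only affine correction can be sketched in a few lines. The sketch below assumes A has the form x' = a·x + b·y + c, y' = y, and it fits (a, b, c) by ordinary least squares on the x-coordinates of matched points. This least-squares objective is only a stand-in: the congruency measure actually optimized by the proposed method is not defined in this section. The names `fit_x_affine` and `apply_x_affine` are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_x_affine(src, dst):
    """Least-squares (a, b, c) such that a*x + b*y + c approximates x'.

    src, dst: equal-length lists of matched (x, y) points taken from the
    two rectified images. Only the x-coordinates of dst are used, because
    the assumed transformation leaves y (the scanline) unchanged.
    """
    rows = [(x, y, 1.0) for x, y in src]
    targets = [xp for xp, _ in dst]
    # Normal equations M p = rhs for p = (a, b, c).
    M = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    d = det3(M)
    if abs(d) < 1e-12:
        raise ValueError("degenerate point configuration")
    # Solve the 3x3 system with Cramer's rule.
    params = []
    for k in range(3):
        Mk = [[rhs[i] if j == k else M[i][j] for j in range(3)]
              for i in range(3)]
        params.append(det3(Mk) / d)
    return tuple(params)


def apply_x_affine(params, pt):
    """Apply x' = a*x + b*y + c while keeping y fixed."""
    a, b, c = params
    x, y = pt
    return (a * x + b * y + c, y)
```

Because the y-coordinate is never touched, matched points that share a scanline before the correction still share it afterwards.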

4.3.2 The Delaunay Triangulation

Based on the previous analysis, it is natural to use the feature points detected in the input images to generate a network of control triangles, and then to consider the net shape difference over the triangle network. Fortunately, there is a ready-made tool for generating a triangle network: the Delaunay triangulation. This is a well-developed research topic in computational geometry; for details of how to construct a triangle network from a point set, the reader is referred to [Ber-08]. A sample Delaunay triangle network is shown in Figure 4-6. Each triangle is identified by the list of its vertices. For example, in Figure 4-6 the two triangles marked with yellow deltas are (62,8,21) and (87,20,62).
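To make the "list of vertex indices" representation concrete, the following is a minimal brute-force sketch of the Delaunay criterion: a triple of points forms a triangle when no other point lies strictly inside its circumcircle. It is not the construction described in [Ber-08] and it is far too slow for real feature sets; in practice a library routine such as `scipy.spatial.Delaunay` would normally be used. The function names are hypothetical, and the sketch assumes points in general position (no four points on one circle).

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return (centre_x, centre_y, radius_squared), or None if collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_triangles(points):
    """Return triangles as tuples of vertex indices into `points`.

    Brute force: test every triple against every other point.
    """
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue  # collinear triple, not a triangle
        ux, uy, r2 = circle
        # Keep the triple only if no other point is inside the circle.
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
               for n, (px, py) in enumerate(points) if n not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles
```

Each returned tuple plays the same role as the triples (62,8,21) and (87,20,62) quoted above: it names a triangle by the indices of its three feature points.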

Figure 4-6. The Delaunay triangle network generated using the feature points detected on an image of a silo.

4.3.3 The Proposed Algorithm for Wide Baseline Stereo Rectification

Based on the discussion in the previous section, it is clear that epipolar rectification and distortion compensation are needed before dense matching can be applied to wide baseline stereo image pairs. Epipolar rectification simplifies the dense matching task by reducing a 2D search to a 1D search, while distortion compensation offsets the perspective distortion and makes the two views look similar in shape and pose. We define wide baseline image rectification as “epipolar rectification with shape difference minimization”. We now propose a new algorithm that combines epipolar rectification and shape distortion compensation. The epipolar geometry of a wide baseline stereo system is illustrated in Figure 4-7.

Figure 4-7. The epipolar geometry

To rectify the epipolar lines, two homographies H and H' can be applied to $I_1$ and $I_2$, respectively. To make the conjugate epipolar lines (the lines me and m′e′ in Figure 4-7) parallel to the x-axis and collinear, the equalities $\mathbf{H}\mathbf{e} = (1,0,0)^T$ and $\mathbf{H}'\mathbf{e}' = (1,0,0)^T$ are necessary. The fundamental matrix for the rectified stereo pair is given by $\bar{\mathbf{F}} = [\mathbf{H}'\mathbf{e}']_\times = [(1,0,0)^T]_\times$,

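which is a 3 × 3 skew-symmetric matrix. A major constraint on the two rectifying homographies is given by

$$\mathbf{F} = \mathbf{H}'^{T}\,\bar{\mathbf{F}}\,\mathbf{H}, \quad \text{where } \bar{\mathbf{F}} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}.$$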

This equation imposes no constraint on the first rows of H and H', which leaves some degrees of freedom in choosing H and H' to meet further objectives.
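The freedom in the first rows can be checked numerically. The sketch below assumes the constraint has the standard form F = H'ᵀ F̄ H (an assumption, since the equation is not legible in this copy of the text) and uses the F̄ given above. It builds F from a pair of arbitrary example homographies, replaces the first row of each with unrelated numbers, and confirms that F does not change. All matrix values and the helper names are illustrative only.

```python
def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m):
    """Transpose of a 3x3 matrix."""
    return [list(row) for row in zip(*m)]


# Fundamental matrix of a rectified pair, as given in the text.
F_BAR = [[0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0],
         [0.0, -1.0, 0.0]]


def fundamental_from_rectifiers(H, H_prime):
    """Assumed constraint: F = H'^T * F_bar * H."""
    return matmul(transpose(H_prime), matmul(F_BAR, H))


# Arbitrary example homographies (illustrative values only).
H = [[1.0, 0.2, 3.0], [0.1, 1.0, -2.0], [0.001, 0.002, 1.0]]
H_prime = [[0.9, -0.1, 1.0], [0.05, 1.1, 4.0], [0.002, -0.001, 1.0]]

# Same homographies with completely different first rows.
H_alt = [[5.0, -1.0, 7.0]] + H[1:]
H_prime_alt = [[-3.0, 2.0, 0.5]] + H_prime[1:]

F_a = fundamental_from_rectifiers(H, H_prime)
F_b = fundamental_from_rectifiers(H_alt, H_prime_alt)

# The first row and first column of F_bar are zero, so the first rows of
# H and H' never enter the product.
same = all(abs(F_a[i][j] - F_b[i][j]) < 1e-12
           for i in range(3) for j in range(3))
```

Here `same` evaluates to `True`, which is why the first rows remain available for other goals such as reducing distortion or improving congruency.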

The outline of the proposed rectification method with maximized structural congruency is as follows:

Proposed Algorithm for Wide Baseline Stereo Rectification

1. Detect and establish the matches between two n-point feature sets, $\{\mathbf{m}_i\} \leftrightarrow \{\mathbf{m}_i'\}$, $i = 1, 2, \ldots, n$. Here $\{\mathbf{m}_i\} \subseteq I_1$ and $\{\mathbf{m}_i'\} \subseteq I_2$.

2. Estimate the fundamental matrix F using the matching point sets. Solve for the epipoles e and e'.

3. Apply a homography $\mathbf{H}'$ to $I_2$ to obtain the image $\bar{I}_2$, such that the epipole e' is sent to infinity.

4. Generate the 2D triangulation net N2 for $\bar{I}_2$ using the vertices $\{\mathbf{H}'\mathbf{m}_i'\}$, $i = 1, 2, \ldots, n$, and construct a lookup table of the triangles of the net.

5. Apply a compatible homography $\mathbf{H}_0$ to $I_1$ such that $\mathbf{H}_0\mathbf{e} = (1,0,0)^T$.

6. Apply an optimized affine transformation A to $\mathbf{H}_0 I_1$ to maximize the structural congruency between the triangulation nets.
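Step 2 of the outline asks for the epipoles once F is known. As a minimal sketch, assuming the common convention m'ᵀ F m = 0 (so that F e = 0 and Fᵀ e' = 0) and assuming F has already been forced to rank 2, each epipole can be obtained as a cross product of two rows, since the null vector of a rank-2 3 × 3 matrix is orthogonal to all of its rows. With a noisy estimate of F, an SVD-based null space would be the more usual choice. The helper names are hypothetical.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def transpose(m):
    """Transpose of a 3x3 matrix given as nested lists."""
    return [list(row) for row in zip(*m)]


def null_vector(M):
    """Right null vector of a rank-2 3x3 matrix.

    The cross product of any two independent rows is orthogonal to every
    row. The largest of the three candidates is returned for stability.
    """
    candidates = [cross(M[0], M[1]), cross(M[0], M[2]), cross(M[1], M[2])]
    return max(candidates, key=lambda v: sum(c * c for c in v))


def epipoles(F):
    """Return (e, e_prime) in homogeneous coordinates, up to scale."""
    e = null_vector(F)                   # F e = 0
    e_prime = null_vector(transpose(F))  # F^T e' = 0
    return e, e_prime
```

For the rectified fundamental matrix F̄ given earlier, both epipoles come out proportional to (1, 0, 0), that is, at infinity along the x-axis, as the rectification requires.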

Essentially, image $I_2$ is transformed by a quasi-rigid transformation to $\bar{I}_2$. The inherent structure of $\bar{I}_2$ is represented by the triangulation net N2, with the feature points as its vertices. N2 is then used as a reference for the rectified structure. Finally, the homography applied to $I_1$ is optimized to maximize the structural congruency between the rectified images $\bar{I}_1$ and $\bar{I}_2$.

Typically, we can choose $\mathbf{H}'$ and $\mathbf{H}_0$ such that Equation (4-3) is satisfied. As the first step of our exploration, we adopt the choices described in [Har-99] and [Har-03] in our experiments.

$\mathbf{H}'$ is the resultant of three sequential operations. Initially, the origin of the image coordinate system is sent to the center of $I_2$ by a translation matrix T. After that, a rotation $\mathbf{R}_\varphi$ can be applied to relocate the epipole $\mathbf{e}' = (e_x', e_y', 1)^T$ onto the x-axis with a rotation angle $\varphi = -\tan^{-1}(e_y'/e_x')$. Finally, a quasi-rigid projective transformation G is applied to send $\tilde{\mathbf{e}}' = (\tilde{e}_x', 0, 1)^T$ to $(1,0,0)^T$. Or equivalently we can state that $\mathbf{H}' = \mathbf{G}\mathbf{R}_\varphi\mathbf{T}$, where

$$\mathbf{G} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1/\tilde{e}_x' & 0 & 1 \end{pmatrix}.$$
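The composition H' = G R T can be sketched directly from the three steps described above. The sketch assumes the epipole is given in pixel coordinates as (x, y, 1) and that the image center is (width/2, height/2). It uses `math.atan2` in place of the plain arctangent so that an epipole with zero or negative x-offset is also handled, which is a small deviation from the formula in the text. The function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(H, p):
    """Apply a 3x3 matrix to a homogeneous 3-vector."""
    return [sum(H[i][k] * p[k] for k in range(3)) for i in range(3)]


def rectifying_homography(e_prime, width, height):
    """Build H' = G R T for an epipole e' = (x, y, 1)."""
    # T: move the coordinate origin to the image center.
    cx, cy = width / 2.0, height / 2.0
    T = [[1.0, 0.0, -cx], [0.0, 1.0, -cy], [0.0, 0.0, 1.0]]
    ex, ey, _ = apply(T, e_prime)

    # R: rotate the translated epipole onto the positive x-axis.
    phi = -math.atan2(ey, ex)
    c, s = math.cos(phi), math.sin(phi)
    R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

    # After the rotation the epipole is (ex_tilde, 0, 1).
    ex_tilde = math.hypot(ex, ey)
    if ex_tilde < 1e-12:
        raise ValueError("epipole coincides with the image center")

    # G: send (ex_tilde, 0, 1) to a point at infinity on the x-axis.
    G = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0 / ex_tilde, 0.0, 1.0]]
    return matmul(G, matmul(R, T))
```

Applying the result to the epipole, for example `apply(rectifying_homography([1000.0, 300.0, 1.0], 640, 480), [1000.0, 300.0, 1.0])`, gives a vector whose second and third components vanish up to rounding, that is, a point proportional to (1, 0, 0).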

In document Wide Baseline Stereo Image Rectification and Matching (Page 106-111)