The Geometry of Multiple Views - 2461059 Computer Vision Solution Manual

PROBLEMS

10.1. Show that one of the singular values of an essential matrix is 0 and the other two are equal. (Huang and Faugeras [1989] have shown that the converse is also true; that is, any 3 × 3 matrix with one singular value equal to 0 and the other two equal to each other is an essential matrix.)

Hint: The singular values of E are the square roots of the eigenvalues of EE^T.

Solution We have E = [t×]R, thus EE^T = [t×][t×]^T = [t×]^T[t×]. If a is an eigenvector of EE^T associated with the eigenvalue λ then, for any vector b,

λb· a = b^T([t×]^T[t×]a) = (t× b) · (t × a).

Choosing a = b = t shows that λ = 0 is an eigenvalue of EE^T. Choosing b = t shows that if λ ≠ 0 then a is orthogonal to t. But then choosing a = b shows that

λ|a|²=|t × a|²=|t|²|a|².

It follows that all non-zero eigenvalues of EE^T are equal to |t|², so all non-zero singular values of E must be equal (to |t|). Note that the singular values of E cannot all be zero since this matrix has rank 2.
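As a numerical sanity check of this argument, the following Python sketch builds E = [t×]R for an arbitrary translation and rotation (illustrative values, not taken from the text) and verifies that t is an eigenvector of EE^T with eigenvalue 0, while a vector orthogonal to t is an eigenvector with eigenvalue |t|². The helper names are hypothetical.

```python
import math

def skew(t):
    """Cross-product matrix [t_x], so that skew(t) applied to a gives t x a."""
    x, y, z = t
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Illustrative motion: translation t with |t|^2 = 9, rotation about the z axis.
t = [1.0, 2.0, 2.0]
c, s = math.cos(0.7), math.sin(0.7)
R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

E = matmul(skew(t), R)
EEt = matmul(E, transpose(E))

a = [2.0, -1.0, 0.0]            # orthogonal to t, since 2 - 2 + 0 = 0
zero = matvec(EEt, t)           # should vanish: eigenvalue 0
scaled = matvec(EEt, a)         # should equal |t|^2 * a = 9 * a
```

The singular values of E are the square roots of these eigenvalues, that is, |t|, |t|, and 0.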

10.2. Exponential representation of rotation matrices. The matrix associated with the rotation whose axis is the unit vector a and whose angle is θ can be shown to be equal to

$$e^{\theta[a_\times]} \stackrel{\text{def}}{=} \sum_{i=0}^{+\infty} \frac{1}{i!}(\theta[a_\times])^i.$$

Use this representation to derive Eq. (10.3).

Solution Let us consider a small motion with translational velocity v and rotational velocity ω. If the two camera frames are separated by the small time interval δt, the translation separating them is obviously (to first order) t = δt v.

The corresponding rotation is a rotation of angle δt|ω| about the axis (1/|ω|)ω, i.e.,

$$R = e^{\delta t[\omega_\times]} = \sum_{i=0}^{+\infty} \frac{1}{i!}(\delta t[\omega_\times])^i = \mathrm{Id} + \delta t\,[\omega_\times] + \text{higher-order terms}.$$

Neglecting all terms of order two or higher yields Eq. (10.3).
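The size of the neglected terms can be illustrated numerically. The sketch below sums a truncated version of the exponential series for an arbitrary rotational velocity about the z axis (an illustrative choice) and compares the result with the first-order approximation Id + δt[ω×]. The function name expm_series and the truncation at 25 terms are assumptions made for this example.

```python
import math

def skew(w):
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def expm_series(A, terms=25):
    """Truncated power series sum_{i=0}^{terms-1} A^i / i! of a 3x3 matrix."""
    result = [[float(i == j) for j in range(3)] for i in range(3)]  # A^0 = Id
    term = [row[:] for row in result]
    for i in range(1, terms):
        # term becomes A^i / i! from A^(i-1) / (i-1)!
        term = [[x / i for x in row] for row in matmul(term, A)]
        for r in range(3):
            for c in range(3):
                result[r][c] += term[r][c]
    return result

omega = [0.0, 0.0, 2.0]      # 2 rad/s about the z axis (illustrative)
dt = 1e-3
angle = dt * 2.0             # dt * |omega|
A = [[dt * x for x in row] for row in skew(omega)]   # dt [omega_x]

R_exact = expm_series(A)
R_first = [[float(i == j) + A[i][j] for j in range(3)] for i in range(3)]
max_gap = max(abs(R_exact[i][j] - R_first[i][j])
              for i in range(3) for j in range(3))
```

For this axis the series reproduces the familiar planar rotation by the angle δt|ω|, and max_gap is of second order in that angle.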

10.3. The infinitesimal epipolar constraint of Eq. (10.4) was derived by assuming that the observed scene was static and the camera was moving. Show that when the camera is fixed and the scene is moving with translational velocity v and rotational velocity ω, the epipolar constraint can be rewritten as

$$p^T([v_\times][\omega_\times])p + (p \times \dot{p}) \cdot v = 0.$$

Note that this equation is now the sum of the two terms appearing in Eq. (10.4) instead of their difference.

Hint: If R and t denote the rotation matrix and translation vector appearing in the definition of the essential matrix for a moving camera, show that the object displacement that yields the same motion field for a static camera is given by the rotation matrix R^T and the translation vector −R^T t.


Solution Let us consider first a moving camera and a static scene, use the coordinate system attached to the camera in its initial position as the world coordinate system, and identify scene points with their positions in this coordinate system and image points with their position in the corresponding camera coordinate system.

We have seen that the projection matrix associated with this camera can be taken equal to $M = (\mathrm{Id}\;\; 0)$ before the motion and to $M = (R_c^T\;\; -R_c^T t_c)$ after the camera has undergone a rotation $R_c$ and a translation $t_c$. Using non-homogeneous coordinates for scene points and homogeneous ones for image points, the two images of a point $P$ are thus $p = P$ and $p' = R_c^T P - R_c^T t_c$.

Let us now consider a static camera and a moving object. Suppose this object undergoes the (finite) motion defined by $P' = R_o P + t_o$ in the coordinate system attached to this camera. Since the projection matrix is $(\mathrm{Id}\;\; 0)$ in this coordinate system, the image of $P$ before the object displacement is $p = P$. The image after the displacement is $p' = R_o P + t_o$, and it follows immediately that taking $R_o = R_c^T$ and $t_o = -R_c^T t_c$ yields the same motion field as before.

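For small motions, we have

$$R_o = \mathrm{Id} + \delta t\,[\omega_{o\times}] = R_c^T = \mathrm{Id} - \delta t\,[\omega_{c\times}],$$

and it follows that $\omega_o = -\omega_c$. Likewise,

$$t_o = \delta t\, v_o = -R_c^T t_c = -(\mathrm{Id} - \delta t\,[\omega_{c\times}])(\delta t\, v_c) = -\delta t\, v_c$$

when second-order terms are neglected. Thus $v_c = -v_o$. Recall that Eq. (10.4) can be written as

$$p^T([v_{c\times}][\omega_{c\times}])p - (p \times \dot{p}) \cdot v_c = 0.$$

Substituting $v_c = -v_o$ and $\omega_c = -\omega_o$ in this equation finally yields

$$p^T([v_{o\times}][\omega_{o\times}])p + (p \times \dot{p}) \cdot v_o = 0.$$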
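The finite-motion equivalence used in this solution can be checked on a concrete example. The sketch below picks an arbitrary camera rotation and translation (illustrative values only) and compares the image of a scene point after the camera motion with its image after the object motion given by the transposed rotation and the translation −R^T t.

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

# Illustrative camera motion: rotation about the y axis, small translation.
c, s = math.cos(0.3), math.sin(0.3)
R_c = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
t_c = [0.2, -0.1, 0.05]
P = [0.5, 0.4, 3.0]          # scene point in the initial camera frame

Rt = transpose(R_c)

# Moving camera, static scene: p' = R_c^T P - R_c^T t_c
p_cam = [a - b for a, b in zip(matvec(Rt, P), matvec(Rt, t_c))]

# Static camera, moving object: P' = R_o P + t_o with
# R_o = R_c^T and t_o = -R_c^T t_c
R_o = Rt
t_o = [-x for x in matvec(Rt, t_c)]
p_obj = [a + b for a, b in zip(matvec(R_o, P), t_o)]
```

Both constructions give the same homogeneous image point, hence the same motion field.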

10.4. Show that when the 8 × 8 matrix associated with the eight-point algorithm is singular, the eight points and the two optical centers lie on a quadric surface (Faugeras, 1993).

Hint: Use the fact that when a matrix is singular, there exists some nontrivial linear combination of its columns that is equal to zero. Also take advantage of the fact that the matrices representing the two projections in the coordinate system of the first camera are in this case (Id 0) and (R^T − R^Tt).

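Solution We follow the proof in Faugeras (1993): Each row of the 8 × 8 matrix associated with the eight-point algorithm can be written as

$$(uu', uv', u, vu', vv', v, u', v') = \frac{1}{zz'}(xx', xy', xz', yx', yy', yz', zx', zy'),$$

where $P = (x, y, z)^T$ and $P' = (x', y', z')^T$ denote the positions of the scene point projecting onto $(u, v)^T$ and $(u', v')^T$ in the corresponding camera coordinate systems $(C)$ and $(C')$. For the matrix to be singular, there must exist some nontrivial linear combination of its columns that is equal to zero; that is, there must exist eight scalars $\lambda_i$ ($i = 1, \ldots, 8$), not all zero, such that

$$\lambda_1 xx' + \lambda_2 xy' + \lambda_3 xz' + \lambda_4 yx' + \lambda_5 yy' + \lambda_6 yz' + \lambda_7 zx' + \lambda_8 zy' = 0$$

for each of the eight points, or, in matrix form, $P^T Q P' = 0$, where $Q$ is the 3 × 3 matrix with rows $(\lambda_1, \lambda_2, \lambda_3)$, $(\lambda_4, \lambda_5, \lambda_6)$, and $(\lambda_7, \lambda_8, 0)$.

Now let $R = {}^{C}_{C'}R$ and $t = {}^{C}\overrightarrow{OO'}$, where $O$ and $O'$ denote the optical centers of the two cameras. It follows that

$$P' = {}^{C'}P = {}^{C'}_{C}R\,{}^{C}P + {}^{C'}\overrightarrow{O'O} = {}^{C'}_{C}R\,{}^{C}P - {}^{C'}\overrightarrow{OO'} = {}^{C'}_{C}R\,{}^{C}P - {}^{C'}_{C}R\,{}^{C}\overrightarrow{OO'} = R^T(P - t).$$

The degeneracy condition derived earlier can therefore be written as $P^T Q R^T (P - t) = 0$.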

This equation is quadratic in the coordinates of P and defines a quadric surface in the coordinate system attached to the first camera. This quadric passes through the two optical centers since its equation is obviously satisfied by P = 0 and P = t.

When the 8× 8 matrix involved in the eight-point algorithm is singular, the eight points must therefore lie on a quadric passing through these two points.
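The two facts used in this proof can be checked numerically: the column combination of a row of the eight-point matrix equals the quadratic form up to the factor zz′, and the quadric vanishes at both optical centers. In the sketch below the coefficients, the motion, and the scene point are arbitrary illustrative values; Q is assumed to be the 3 × 3 matrix holding the eight coefficients row by row with a zero in its last entry.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(A, v):
    return [dot(row, v) for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

def eight_point_row(p, p_prime):
    """Row (uu', uv', u, vu', vv', v, u', v') for one correspondence."""
    (u, v), (u2, v2) = p, p_prime
    return [u * u2, u * v2, u, v * u2, v * v2, v, u2, v2]

def quadric(P, Q, R, t):
    """Value of P^T Q R^T (P - t)."""
    diff = [a - b for a, b in zip(P, t)]
    return dot(P, matvec(Q, matvec(transpose(R), diff)))

lam = [0.3, -1.2, 0.7, 2.0, 0.1, -0.4, 1.5, -0.9]   # arbitrary coefficients
Q = [lam[0:3], lam[3:6], [lam[6], lam[7], 0.0]]
c, s = math.cos(0.2), math.sin(0.2)
R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
t = [1.0, 0.2, -0.1]

P = [0.4, -0.3, 5.0]
P2 = matvec(transpose(R), [a - b for a, b in zip(P, t)])   # P' = R^T (P - t)
row = eight_point_row((P[0] / P[2], P[1] / P[2]),
                      (P2[0] / P2[2], P2[1] / P2[2]))
lhs = P[2] * P2[2] * dot(lam, row)   # z z' times the column combination
rhs = quadric(P, Q, R, t)
at_first_center = quadric([0.0, 0.0, 0.0], Q, R, t)
at_second_center = quadric(t, Q, R, t)
```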

10.5. Show that three of the determinants of the 3× 3 minors of

L =

Show that the fourth determinant can be written as a linear combination of these.

Solution Let us first note that

 now rewrite the trifocal constraints as

a× [dc − eb] = d(a × c) − e(a × b) = 0.

With the same notation, we have

$$L^T = \begin{pmatrix} a & b & c \\ 0 & d & e \end{pmatrix}.$$

Let us now compute the determinants of the 3 × 3 minors of L^T (instead of those of L; this is just to avoid transposing too many vectors and does not change the result of our derivation).

Three of these determinants can be written as

D12=


which, except for a change in sign, is identical to the expression derived earlier.

Thus the fact that three of the 3× 3 minors of L are zero can indeed be expressed by the trifocal tensor.

Let us conclude by showing that the fourth determinant is a linear combination of the other three. This determinant is

D = |a b c| = (a × b) · c.

which shows that D can indeed be written as a linear combination of D12, D23, and D₃₁.

10.6. Show that Eq. (10.18) reduces to Eq. (10.2) when M₁ = (Id 0) and M₂ = (R^T −R^T t).

Solution Recall that Eq. (10.18) expresses the bilinear constraints associated with two cameras as D = 0, where D is the determinant

D =

and M^j_i denotes row number j of camera number i. When

M1=¡

where c₁, c₂, and c₃ denote the three columns of R, we can rewrite the determinant as

where p₁= (u1, v1, 1)^T and p₂= (u2, v2, 1)^T.

Now, if (t₁, t₂, t₃) denote the coordinates of the vector t in the right-handed orthonormal basis formed by the columns of R, we have

D = p^T₁¡ which is indeed an instance of Eq. (10.2).

10.7. Show that Eq. (10.19) reduces to Eq. (10.15) when M₁ = (Id 0).

Solution Recall that Eq. (10.19) expresses the trilinear constraints associated with three cameras as D = 0, where D is the determinant

D =

or, since the determinant of a matrix is equal to the determinant of its transpose,

D =

where we use the same notation as in Ex. 10.5. According to that exercise, we thus have

which is the result we aimed to prove.


10.8. Develop Eq. (10.20) with respect to the image coordinates, and verify that the coefficients can indeed be written in the form of Eq. (10.21).

Solution This follows directly from the multilinear nature of determinants. Indeed, Eq. (10.20) can be written as

0 =

It is thus clear that all the coefficients of the quadrilinear tensor can be written in the form of Eq. (10.21).

10.9. Use Eq. (10.23) to calculate the unknowns $z_i$, $\lambda_i$, and $z_1^i$ in terms of $p_1$, $p_i$, $R_i$, and $t_i$ ($i = 2, 3$). Show that the value of $\lambda_i$ is directly related to the epipolar constraint, and characterize the degree of the dependency of $z_1^2 - z_1^3$ on the data points.

Solution We rewrite Eq. (10.23) as

$$z_1^i p_1 = t_i + z_i R_i p_i + \lambda_i (p_1 \times R_i p_i) \iff z_1^i p_1 = t_i + z_i q_i + \lambda_i r_i,$$

where $q_i = R_i p_i$, $r_i = p_1 \times q_i$, and $i = 2, 3$.

Forming the dot product of this equation with $r_i$ (which is orthogonal to both $p_1$ and $q_i$) yields

$$t_i \cdot [p_1 \times R_i p_i] + \lambda_i |r_i|^2 = 0,$$

or, rearranging the terms in the triple product,

$$\lambda_i |r_i|^2 = p_1 \cdot [t_i \times R_i p_i],$$

which can be rewritten as

$$\lambda_i = \frac{p_1^T E_i p_i}{|r_i|^2},$$

where $E_i$ is the essential matrix associated with views number 1 and $i$. So the value of $\lambda_i$ is indeed directly related to the epipolar constraint. In particular, if this constraint is exactly satisfied, $\lambda_i = 0$.

Now forming the dot product of Eq. (10.23) with $q_i \times r_i$ yields

$$z_1^i\, p_1 \cdot (q_i \times r_i) = t_i \cdot (q_i \times r_i).$$

Let us denote by $[a, b, c] = a \cdot (b \times c)$ the triple product of vectors in $\mathbb{R}^3$. Noting that the value of the triple product is invariant under circular permutations of the vectors reduces the above equation to

$$z_1^i\, r_i \cdot (p_1 \times q_i) = r_i \cdot (t_i \times q_i), \quad\text{or}\quad z_1^i = \frac{r_i^T E_i p_i}{|r_i|^2}.$$

Likewise, forming the dot product of Eq. (10.23) with $p_1 \times r_i$ yields

$$[t_i, p_1, r_i] + z_i [q_i, p_1, r_i] = 0, \quad\text{or}\quad z_i = \frac{[t_i, p_1, r_i]}{|r_i|^2},$$

since $[q_i, p_1, r_i] = -|r_i|^2$.

The scale-restraint condition can be written as $z_1^2 - z_1^3 = 0$, or

$$|r_2|^2 (r_3^T E_3 p_3) = |r_3|^2 (r_2^T E_2 p_2),$$

which is an algebraic condition of degree 7 in $p_1$, $p_2$, and $p_3$.

Programming Assignments

10.10. Implement the eight-point algorithm for weak calibration from binocular point correspondences.

10.11. Implement the linear least-squares version of that algorithm with and without Hartley’s preconditioning step.

10.12. Implement an algorithm for estimating the trifocal tensor from point correspondences.

10.13. Implement an algorithm for estimating the trifocal tensor from line correspondences.

CHAPTER 11

Stereopsis

PROBLEMS

11.1. Show that, in the case of a rectified pair of images, the depth of a point P in the normalized coordinate system attached to the first camera is z =−B/d, where B is the baseline and d is the disparity.

Solution Note that for rectified cameras, the $v$ and $v'$ axes of the two image coordinate systems are parallel to each other and to the $y$ axis of the coordinate system attached to the first camera. In addition, the images $q$ and $q'$ of any point $Q$ in the plane $y = 0$ verify $v = v' = 0$. As shown by the diagram below, if $H$, $C_0$, and $C_0'$ denote respectively the orthogonal projection of $Q$ onto the baseline and the principal points of the two cameras, the triangles $OHQ$ and $qC_0O$ are similar, thus $-b/z = -u/1$. Likewise, the triangles $HO'Q$ and $C_0'q'O'$ are similar, thus $-b'/z = u'/1$, where $b$ and $b'$ denote respectively the lengths of the line segments $OH$ and $HO'$. It follows that $u' - u = -B/z$ or $z = -B/d$.

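[Figure: rectified stereo pair, showing the optical center O′, the principal points C₀ and C₀′, the image points p, q, p′, and q′ at unit distance from the optical centers, the image axes u′ and v′, the length b′, and the depth −z.]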

Let us now consider a point $P$ with nonzero $y$ coordinate and its orthogonal projection $Q$ onto the plane $y = 0$. The points $P$ and $Q$ have the same depth since the line $PQ$ joining them is parallel to the $y$ axis. The lines $pq$ and $p'q'$ joining the projections of the two points in the two images are also parallel to $PQ$ and to the $v$ and $v'$ axes. It follows that the $u$ coordinates of $p$ and $q$ are the same, and that the $u'$ coordinates of $p'$ and $q'$ are also the same. In other words, the disparity and depths for the points $P$ and $Q$ are the same, and the formula $z = -B/d$ holds in general.
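The formula can be exercised with a small round trip. The sketch below assumes a particular layout that is consistent with the derivation but not spelled out in the text: the first optical center at the origin, the second at x = B on the baseline, normalized image coordinates u = x/z and u′ = (x − B)/z, and negative depths for points in front of the cameras. The function names are hypothetical.

```python
def project_rectified(x, z, baseline):
    """Normalized u coordinates of a point (x, z) in the two rectified images.

    Assumes the first optical center is at x = 0 and the second at x = baseline.
    """
    return x / z, (x - baseline) / z

def depth_from_disparity(baseline, disparity):
    """Depth z = -B / d for a rectified pair."""
    if disparity == 0:
        raise ValueError("zero disparity corresponds to a point at infinity")
    return -baseline / disparity

B = 0.5
u, u_prime = project_rectified(1.0, -4.0, B)   # point at depth z = -4
d = u_prime - u                                # disparity
z = depth_from_disparity(B, d)
```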

11.2. Use the definition of disparity to characterize the accuracy of stereo reconstruction as a function of baseline and depth.

Solution Let us assume that the cameras have been rectified. In this case, as in Ex. 11.1, we have $z = -B/d$. Let us assume the disparity has been measured with some error $\varepsilon$. Using a first-order Taylor expansion of the depth shows that

$$z(d + \varepsilon) - z(d) \approx \varepsilon z'(d) = \varepsilon \frac{B}{d^2} = \frac{\varepsilon}{B} z^2.$$

In other words, the error is proportional to the squared depth and inversely proportional to the baseline.
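A quick numerical comparison shows how close the first-order estimate is to the exact depth change. The baseline, disparity, and disparity error below are arbitrary illustrative values, and the function names are hypothetical.

```python
def depth(baseline, disparity):
    """Depth of a point in a rectified pair, z = -B / d."""
    return -baseline / disparity

def depth_error_first_order(baseline, z, eps):
    """First-order depth error eps * z^2 / B for a disparity error eps."""
    return eps * z * z / baseline

B, d, eps = 0.5, 0.125, 1e-3
z = depth(B, d)                            # equals -4
exact = depth(B, d + eps) - z              # true change in depth
approx = depth_error_first_order(B, z, eps)
approx_twice_as_far = depth_error_first_order(B, 2 * z, eps)
```

Doubling the depth multiplies the estimated error by four, while doubling the baseline would halve it.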

11.3. Give reconstruction formulas for verging eyes in the plane.

Solution Let us define a Cyclopean coordinate system with origin at the midpoint between the two eyes, x axis in the direction of the baseline, and z axis oriented so that z > 0 for points in front of the two eyes (note that this contradicts our usual conventions, but allows us to use a right-handed (x, z) coordinate system).

Now consider a point P with coordinates (x, z). As shown by the diagram below, if the corresponding projection rays make angles $\theta_l$ and $\theta_r$ with the x axis (the direction of the baseline), we must have

$$\frac{x + B/2}{z} = \cot\theta_l, \quad \frac{x - B/2}{z} = \cot\theta_r \iff x = \frac{B}{2}\,\frac{\cot\theta_l + \cot\theta_r}{\cot\theta_l - \cot\theta_r}, \quad z = \frac{B}{\cot\theta_l - \cot\theta_r}.$$

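[Figure: verging eyes in the plane, showing the fixated point F, the point P, the x and z axes, and the angles θ_l, θ_r, φ_l, and φ_r.]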

Now, if the fixated point has angular coordinates $(\phi_l, \phi_r)$ and some other point P has absolute angular coordinates $(\theta_l, \theta_r)$, Cartesian coordinates $(x, z)$, and retinal angular coordinates $(\psi_l, \psi_r)$, we must have $\theta_l = \phi_l + \psi_l$ and $\theta_r = \phi_r + \psi_r$, which gives reconstruction formulas for given values of $(\phi_l, \phi_r)$ and $(\psi_l, \psi_r)$.
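These formulas translate directly into code. The sketch below assumes that the ray angles are measured from the baseline direction, so that the cotangent of each angle is the horizontal offset from the corresponding eye divided by z, with the left eye at x = −B/2 and the right eye at x = +B/2. The baseline and test point are illustrative values and the function name is hypothetical.

```python
import math

def reconstruct(theta_l, theta_r, baseline):
    """Cartesian coordinates (x, z) of the point seen along the two rays."""
    cot_l = 1.0 / math.tan(theta_l)
    cot_r = 1.0 / math.tan(theta_r)
    denom = cot_l - cot_r
    if denom == 0.0:
        raise ValueError("parallel rays: the point is at infinity")
    x = 0.5 * baseline * (cot_l + cot_r) / denom
    z = baseline / denom
    return x, z

B = 0.065
x_true, z_true = 0.10, 0.80

# Absolute angles of the two rays through the test point.
theta_l = math.atan2(z_true, x_true + B / 2)
theta_r = math.atan2(z_true, x_true - B / 2)

# The same angles expressed as fixation angle plus retinal angle.
phi_l, phi_r = 1.2, 1.3
psi_l, psi_r = theta_l - phi_l, theta_r - phi_r
x, z = reconstruct(phi_l + psi_l, phi_r + psi_r, B)
```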

11.4. Give an algorithm for generating an ambiguous random dot stereogram that can depict two different planes hovering over a third one.

Solution We display two squares hovering at different heights over a larger background square. The background images can be synthesized by spraying random black dots on a white background plate after (virtually) covering the area corresponding to the hovering squares. For the other two squares, the dots are generated as follows: On a given scanline, intersect a ray issued from the left eye with the first plane and the second one, and paint black the resulting dots P₁ and P₂. Then paint a black dot on the first plane at the point P₃ where the ray joining the right eye to P₂ intersects the first plane. Now intersect the ray joining the left eye to P₃ with the second plane. Continue this process as long as desired. It is clear that this will generate a deterministic, but completely ambiguous pattern. Limiting this process to a few iterations and repeating it at many random locations will achieve the desired random effect.
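The chain of dots on one scanline can be sketched as follows. The geometry is an assumption made for this example: the eyes sit at x = ±B/2 on the line z = 0 and the two hovering planes are the lines z = z1 and z = z2 in the (x, z) plane of the scanline. The function names and numerical values are hypothetical.

```python
def intersect(eye_x, px, pz, z_plane):
    """x coordinate where the ray from (eye_x, 0) through (px, pz) meets z = z_plane."""
    return eye_x + (px - eye_x) * z_plane / pz

def ambiguous_chain(x_start, z1, z2, baseline, steps):
    """Alternate left-eye and right-eye rays between the two planes.

    Returns the x positions of the dots painted on plane 1 and on plane 2.
    """
    left, right = -baseline / 2, baseline / 2
    dots1, dots2 = [x_start], []
    x1 = x_start
    for _ in range(steps):
        x2 = intersect(left, x1, z1, z2)    # left-eye ray through the plane-1 dot
        dots2.append(x2)
        x1 = intersect(right, x2, z2, z1)   # right-eye ray through the plane-2 dot
        dots1.append(x1)
    return dots1, dots2

B, z1, z2 = 0.06, 1.0, 1.5
dots1, dots2 = ambiguous_chain(0.0, z1, z2, B, steps=3)

# Viewing directions (x offset per unit depth) of each dot from each eye.
left_dir1 = [(x + B / 2) / z1 for x in dots1]
left_dir2 = [(x + B / 2) / z2 for x in dots2]
right_dir1 = [(x - B / 2) / z1 for x in dots1]
right_dir2 = [(x - B / 2) / z2 for x in dots2]
```

Each dot on the second plane lies in the same left-eye direction as one dot of the first plane and in the same right-eye direction as the next one, which is what makes the matching ambiguous. Running the chain for a few steps from many random starting positions gives the random appearance described above.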

11.5. Show that the correlation function reaches its maximum value of 1 when the image brightnesses of the two windows are related by the affine transform $I' = \lambda I + \mu$ for some constants $\lambda$ and $\mu$ with $\lambda > 0$.

Solution Let us consider two images represented by the vectors $w = (w_1, \ldots, w_p)^T$ and $w' = (w'_1, \ldots, w'_p)^T$ of $\mathbb{R}^p$ (typically, $p = (2m + 1) \times (2n + 1)$ for some positive values of $m$ and $n$). As noted earlier, the corresponding normalized correlation value is the cosine of the angle $\theta$ between the vectors $w - \bar{w}$ and $w' - \bar{w}'$, where $\bar{a}$ denotes the vector whose coordinates are all equal to the mean $\bar{a}$ of the coordinates of $a$.

The correlation function reaches its maximum value of 1 when the angle $\theta$ is zero.

In this case, we must have $w' - \bar{w}' = \lambda(w - \bar{w})$ for some $\lambda > 0$, or for $i = 1, \ldots, p$,

$$w'_i = \lambda w_i + (\bar{w}' - \lambda\bar{w}) = \lambda w_i + \mu,$$

where $\mu = \bar{w}' - \lambda\bar{w}$.

Conversely, suppose that $w'_i = \lambda w_i + \mu$ for some $\lambda$, $\mu$ with $\lambda > 0$. Clearly, $w' = \lambda w + \bar{\mu}$ and $\bar{w}' = \lambda\bar{w} + \bar{\mu}$, where $\bar{\mu}$ denotes this time the vector with all coordinates equal to $\mu$. Thus $w' - \bar{w}' = \lambda(w - \bar{w})$, and the angle $\theta$ is equal to zero, yielding the maximum possible value of the correlation function.
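The invariance can be observed directly on a small example. The sketch below computes the normalized correlation as the cosine of the angle between the mean-subtracted windows; the window values and the affine coefficients are arbitrary, and the function name ncc is hypothetical.

```python
import math

def ncc(w, w_prime):
    """Normalized correlation of two windows given as flat lists of intensities."""
    p = len(w)
    mean_w = sum(w) / p
    mean_wp = sum(w_prime) / p
    a = [x - mean_w for x in w]
    b = [x - mean_wp for x in w_prime]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

w = [3.0, 7.0, 1.0, 8.0, 4.0, 9.0]
brighter = [2.5 * x + 10.0 for x in w]     # lambda = 2.5 > 0, mu = 10
inverted = [-2.5 * x + 10.0 for x in w]    # lambda < 0 for comparison

c_affine = ncc(w, brighter)
c_inverted = ncc(w, inverted)
```

With a positive gain the correlation reaches 1, whereas a negative gain gives −1, which is why the condition λ > 0 is needed.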

11.6. Prove the equivalence of correlation and sum of squared differences for images with zero mean and unit Frobenius norm.

Solution Let $w$ and $w'$ denote the vectors associated with two image windows. If these windows have zero mean and unit Frobenius norm, we have by definition $|w|^2 = |w'|^2 = 1$ and $\bar{w} = \bar{w}' = 0$. In this case, the sum of squared differences is

$$|w' - w|^2 = |w|^2 - 2 w \cdot w' + |w'|^2 = 2 - 2 w \cdot w' = 2 - 2C,$$

where $C$ is the normalized correlation of the two windows. Thus minimizing the sum of squared differences is equivalent to maximizing the normalized correlation.
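The identity is easy to confirm numerically. The sketch below normalizes two arbitrary windows to zero mean and unit norm (the values are illustrative and the function name is hypothetical) and compares the sum of squared differences with 2 − 2C.

```python
import math

def normalize(w):
    """Subtract the mean and scale to unit Euclidean (Frobenius) norm."""
    mean = sum(w) / len(w)
    centered = [x - mean for x in w]
    norm = math.sqrt(sum(x * x for x in centered))
    return [x / norm for x in centered]

w = normalize([3.0, 7.0, 1.0, 8.0, 4.0, 9.0])
w_prime = normalize([2.0, 9.0, 2.0, 6.0, 5.0, 7.0])

C = sum(a * b for a, b in zip(w, w_prime))            # normalized correlation
ssd = sum((a - b) ** 2 for a, b in zip(w, w_prime))   # sum of squared differences
```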

11.7. Recursive computation of the correlation function.

(a) Show that $(w - \bar{w}) \cdot (w' - \bar{w}') = w \cdot w' - (2m + 1)(2n + 1)\bar{I}\bar{I}'$.

(b) Show that the average intensity ¯I can be computed recursively, and estimate the cost of the incremental computation.

(c) Generalize the prior calculations to all elements involved in the construction of the correlation function, and estimate the overall cost of correlation over a pair of images.

Solution

(a) First note that for any two vectors of size $p$, we have $\bar{a} \cdot b = p\,\bar{a}\bar{b}$, where $\bar{a}$ and $\bar{b}$ denote respectively the average values of the coordinates of $a$ and $b$. For vectors $w$ and $w'$ representing images of size $(2m + 1) \times (2n + 1)$ with average intensities $\bar{I}$ and $\bar{I}'$, we have therefore

$$(w - \bar{w}) \cdot (w' - \bar{w}') = w \cdot w' - \bar{w} \cdot w' - w \cdot \bar{w}' + \bar{w} \cdot \bar{w}' = w \cdot w' - (2m + 1)(2n + 1)\bar{I}\bar{I}'.$$

(b) Let ¯I(i, j) and ¯I(i + 1, j) denote the average intensities computed for windows respectively centered in (i, j) and (i + 1, j). If p = (2m + 1)× (2n + 1), we

Thus the average intensity can be updated in 4(n+1) operations when moving from one pixel to the one below it. The update for moving one column to the right costs 4(m + 1) operations. This is to compare to the (2m + 1)(2n + 1) operations necessary to compute the average from scratch.

(c) It is not possible to compute the dot product incrementally during column shifts associated with successive disparities. However, it is possible to compute the dot product associated with elementary row shifts since 2m of the rows are shared by consecutive windows. Indeed, let $w(i, j)$ and $w'(i, j)$ denote the vectors $w$ and $w'$ associated with windows of size $(2m + 1) \times (2n + 1)$ centered in $(i, j)$. We have

w(i + 1, j)· w⁰(i + 1, j) =

and the exact same line of reasoning as in (b) can be used to show that w(i + 1, j)· w⁰(i + 1, j) = w(i, j)· w⁰(i, j)

Thus the dot product can be updated in 4(2n + 1) operations when moving from one pixel to the one below it. This is to compare to the 2(2m + 1)(2n + 1)− 1 operations necessary to compute the dot product from scratch.

To complete the computation of the correlation function, one must also compute the norms $|w - \bar{w}|$ and $|w' - \bar{w}'|$. This computation also reduces to the evaluation of a dot product and an average, but it can be done recursively for both row and column shifts.

Suppose that images are matched by searching for each pixel in the left image its match in the same scanline of the right image, within some disparity range [−D, D]. Suppose also that the two images have size M × N and that the windows being compared have, as before, size (2m+1)×(2n+1). By assuming if necessary that the two images have been obtained by removing the outer layer of a (M + 2m + 2D)× (N + 2n + 2D) image, we can ignore boundary effects.


Processing the first scan line requires computing and storing (a) 2N dot products of the form $w \cdot w$ or $w' \cdot w'$, (b) 2N averages of the form $\bar{I}$ or $\bar{I}'$, and (c) (2D + 1)N dot products of the form $w \cdot w'$. The total storage required is (2D + 5)N, which is certainly reasonable for, say, 1000 × 1000 images, and disparity ranges of [−100, 100]. The computation is dominated by the $w \cdot w'$ dot products, and its cost is on the order of 2(2m + 1)(2n + 1)(2D + 1)N. The incremental computations for the next scan line amount to updating all averages and dot products, with a total cost of 4(2n + 1)(2D + 1)N. Assuming $M \gg m$, the overall cost of the correlation is therefore, after M updates, 4(2n + 1)(2D + 1)MN operations. Note that a naive implementation would require instead 2(2m + 1)(2n + 1)(2D + 1)MN operations.
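The row-shift recursion behind these counts can be sketched in a few lines. The example below updates a window sum when the window center moves down by one row, by adding the entering row and subtracting the leaving one; it assumes windows of 2m + 1 rows and 2n + 1 columns stored as nested lists, and the image contents and function names are made up for the illustration.

```python
def window_sum(img, i, j, m, n):
    """Sum over the (2m+1) x (2n+1) window centered at row i, column j."""
    return sum(img[r][c]
               for r in range(i - m, i + m + 1)
               for c in range(j - n, j + n + 1))

def shift_down(prev_sum, img, i, j, m, n):
    """Window sum centered at (i + 1, j), given the sum centered at (i, j)."""
    cols = range(j - n, j + n + 1)
    entering = sum(img[i + m + 1][c] for c in cols)   # new bottom row
    leaving = sum(img[i - m][c] for c in cols)        # old top row
    return prev_sum + entering - leaving

# Small synthetic images (arbitrary integer patterns).
left = [[(7 * r + 3 * c) % 11 for c in range(8)] for r in range(9)]
right = [[(5 * r + 2 * c) % 13 for c in range(8)] for r in range(9)]
m, n = 1, 2
p = (2 * m + 1) * (2 * n + 1)

# Average intensity: update the window sum, then divide by p.
s = window_sum(left, 2, 3, m, n)
s_next = shift_down(s, left, 2, 3, m, n)
mean_next = s_next / p

# Dot product w . w': the same recursion applied to the pixelwise product.
prod = [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(left, right)]
dot_next = shift_down(window_sum(prod, 2, 3, m, n), prod, 2, 3, m, n)
```

Each update touches only two rows of 2n + 1 pixels, independently of m, which is the source of the savings over recomputing every window from scratch.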

11.8. Show how a first-order expansion of the disparity function for rectified images can be used to warp the window of the right image corresponding to a rectangular region of the left one. Show how to compute correlation in this case using interpolation to estimate right-image values at the locations corresponding to the centers of the left window’s pixels.

Solution Let us set up local coordinate systems whose origins are at the two points of interest; that is, the two matched points have coordinates (0, 0) in these coordinate systems. If $d(u, v)$ denotes the disparity function in the neighborhood of the first point, and $\alpha$ and $\beta$ denote its partial derivatives with respect to $u$ and $v$ in (0, 0), we can write the coordinates of a match $(u', v')$ for the point $(u, v)$ in the first image as

$$\begin{pmatrix} u' \\ v' \end{pmatrix} = \begin{pmatrix} u + d(u, v) \\ v \end{pmatrix},$$

or approximating $d$ by its first-order Taylor expansion,

$$\begin{pmatrix} u' \\ v' \end{pmatrix} = \begin{pmatrix} 1 + \alpha & \beta \\ 0 & 1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}.$$

It follows that, to first order, a small rectangular region in the first image maps onto a parallelogram in the second image, and that the corresponding affine transformation is completely determined by the derivatives of the disparity function.

To exploit this property in stereo matching, one can map the centers of the pixels contained in the left window onto their right images, calculate the corresponding intensity values via bilinear interpolation of neighboring pixels in the right image, and finally compute the correlation function from these values. This is essentially the method described in Devernay and Faugeras (1994).
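The sampling step can be sketched as follows. This is a minimal illustration, not the implementation of Devernay and Faugeras: images are nested lists indexed as img[row][column], the window is assumed to stay strictly inside the right image, and all names and numerical values are hypothetical. The resulting list of samples can then be correlated with the left window like any other window vector.

```python
import math

def bilinear(img, x, y):
    """Bilinearly interpolated value of img at column x and row y."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x0 + 1]
    bottom = (1 - fx) * img[y0 + 1][x0] + fx * img[y0 + 1][x0 + 1]
    return (1 - fy) * top + fy * bottom

def warped_window(right, u0, v0, alpha, beta, half_rows, half_cols):
    """Right-image samples matching the pixel centers of a left window.

    A left offset (du, dv) from the matched point maps to the right offset
    ((1 + alpha) * du + beta * dv, dv) from (u0, v0).
    """
    values = []
    for dv in range(-half_rows, half_rows + 1):
        for du in range(-half_cols, half_cols + 1):
            x = u0 + (1 + alpha) * du + beta * dv
            y = v0 + dv
            values.append(bilinear(right, x, y))
    return values

# Synthetic right image with affine intensities, for which bilinear
# interpolation is exact: I(x, y) = 2x + 3y + 1.
right = [[2.0 * c + 3.0 * r + 1.0 for c in range(12)] for r in range(12)]

sample = bilinear(right, 2.25, 3.5)
values = warped_window(right, 5.0, 5.0, alpha=0.1, beta=-0.05,
                       half_rows=1, half_cols=1)
expected = [2.0 * (5.0 + 1.1 * du - 0.05 * dv) + 3.0 * (5.0 + dv) + 1.0
            for dv in (-1, 0, 1) for du in (-1, 0, 1)]
```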

11.9. Show how to use the trifocal tensor to predict the tangent line along an image curve from tangent line measurements in two other pictures.

Solution Let us assume we have estimated the trifocal tensor associated with three images of a curve Γ. Let us denote by $p_i$ the projection of a point P of Γ onto image number i (i = 1, 2, 3). The tangent line T to Γ in P projects onto the tangent line $t_i$ to $\gamma_i$, the image of Γ in view number i. Given the coordinate vectors $t_2$ and $t_3$ of the lines $t_2$ and $t_3$, we can predict the coordinate vector $t_1$ of $t_1$ as $t_1 = (t_2^T G_1^1 t_3,\; t_2^T G_1^2 t_3,\; t_2^T G_1^3 t_3)^T$. This is another method for transfer, this time applied to lines instead of points.

Programming Assignments

11.10. Implement the rectification process.

11.11. Implement a correlation-based approach to stereopsis.

11.12. Implement a multiscale approach to stereopsis.

11.13. Implement a dynamic-programming approach to stereopsis.

11.14. Implement a trinocular approach to stereopsis.

