Computer Vision cmput 428/615
3D Modeling from images 3D Modeling from images
Martin Jagersand Martin Jagersand
3D from video example
•InexpensiveInexpensive
•Quick and convenient Quick and convenient for the user
for the user
•Integrates with existing Integrates with existing SW e.g. Blender, Maya SW e.g. Blender, Maya
Capture objects
3D from video: Low budget
•InexpensiveInexpensive
•Quick and convenient for the user
•Integrates with existing SW e.g. Blender, Maya
$100: Webcams, Digital Cams $100,000 Laser scanners etc.
3D from video Low budget
•InexpensiveInexpensive
•Quick and convenient Quick and convenient for the user
for the user
•Integrates with existing Integrates with existing SW e.g. Blender, Maya SW e.g. Blender, Maya
Modeling geom primitives into scenes: >>Hours
Capturing 3D from 2D video:
minutes
Low budget 3D from video
•InexpensiveInexpensive
•Quick and convenient Quick and convenient for the user
for the user
•Integrates with existing Integrates with existing SW e.g. Blender, Maya SW e.g. Blender, Maya
Application Case Study Modeling Inuit Artifacts
•New acquisition at the UofA: A group of 8 sculptures depicting Inuit seal hunt
•Acquired from sculptor by Hudson Bay Company
Application Case Study Modeling Inuit Artifacts
Results:
Results:
1.1. A collection of 3D models of each componentA collection of 3D models of each component
2.2. Assembly of the individual models into Assembly of the individual models into animations
animations and and Internet web study materialInternet web study material..
Preliminaries:
Capturing Macro geometry:
• Shape From SilhouetteShape From Silhouette
– Works for objectsWorks for objects – RobustRobust
– Visual hull not true object surfaceVisual hull not true object surface
• Structure From MotionStructure From Motion
– Works for ScenesWorks for Scenes – Typically sparseTypically sparse
– Sometimes fragile (no salient points in scene)Sometimes fragile (no salient points in scene)
• Space carvingSpace carving
– Use free space constraintsUse free space constraints
• (Dense “Stereo” -- later)(Dense “Stereo” -- later)
– Use as second refinement stepUse as second refinement step
•Projection equationProjection equation xxii=P=PiiXX
•Resection:Resection:
– xxii,X P,X Pii
Multi-view geometry - resection
Given image points and 3D points calculate camera projection matrix.
• Projection equationProjection equation xxii=P=PiiXX
• Intersection:Intersection:
– xxii,P,Pi i XX
Multi-view geometry - intersection
Given image points and camera projections in at least 2 views calculate the 3D points (structure)
• Projection Projection equation equation xxii=P=PiiXX
• Structure from Structure from motion (SFM) motion (SFM)
– xxii P Pii,,XX
Multi-view geometry - SFM
Given image points in at least 2 views calculate the 3D points (structure) and camera projection matrices (motion)
•Estimate projective structure
•Rectify the reconstruction to metric (autocalibration)
Examples: geometric modeling
Debevec and Taylor:
Debevec and Taylor: Façade Façade
Examples: geometric modeling
Pollefeys:
Pollefeys: Arenberg CastleArenberg Castle
Examples – modeling with dynamic texture
Cobzas,Yerex,Jagersand Cobzas,Yerex,Jagersand
Ways to 3D pointcloud from images
•Aligned Cameras: Stereo camera (Lab 3)Aligned Cameras: Stereo camera (Lab 3)
•Know cameras beforehand (Photogrammetry)Know cameras beforehand (Photogrammetry)
•First compatible cameras, then 3D GeomFirst compatible cameras, then 3D Geom
– Fundametal matrix F -> Pcompat -> 3D triang Fundametal matrix F -> Pcompat -> 3D triang – All of this in projective geom.All of this in projective geom.
•Simultaneous computation of cameras and 3DSimultaneous computation of cameras and 3D
– Compact math formulation. Easy to understand?Compact math formulation. Easy to understand?
Pinhole camera
• Central projectionCentral projection
• Principal point & aspectPrincipal point & aspect
0 1 1 0 0
0 0 0
0 0 0
~ 1
) / , / ( )
, , (
Z Y X f
f
Z fY fX y
x
Z fY Z fX Z
Y
X T T
1 1 0
0 0 1 1 0
1 1
1
~ 1
y x p c
p c c
p y
c p x
v u
y y
x x
y y
x
x c
px
py
The projection matrix:
cam
cam y
y
x x
I K
p c f p c
f
X 0 x
X x
]
| [
0 1 0
0
0 0
0 0
Projective camera
• Camera rotation and translationCamera rotation and translation
• The projection matrixThe projection matrix
tX X tX
X R cam cam RT RT
tX
x KRT I
P In general:
•P is a 3x4 matrix with 11 DOF
•Finite: left 3x3 matrix non-singular
•Infinite: left 3x3 matrix singular
Properties: P=[M p4]
•Center:
•Principal ray (projection direction)
0 0 ,
1 0
4
1
d d p C
C C
M M P
) 3
det( m v M
•Infinite cameras where the last row of Infinite cameras where the last row of P is P is (0,0,0,1)(0,0,0,1)
•Points at infinity are mapped to points at infinityPoints at infinity are mapped to points at infinity
Affine cameras
z z
y x
T
y x
t d
t t t R
d t t K
P
0
0
t k
j i 0
j i
d0 t
j
k i
{camera}
Z
X
{world} Y
x0
•ErrorError
)
( 0
0
x x
x
x
persp proj
aff d
Good approximation:
• small compared to d0
• point close to principal ray
X
X’
xaff xpersp
Camera calibration
•11 DOF => at least 6 points11 DOF => at least 6 points
•Linear solutionLinear solution
– Normalization requiredNormalization required – Minimizes algebraic errorMinimizes algebraic error
•Nonlinear solutionNonlinear solution
– Minimize geometric error (pixel re-projection)Minimize geometric error (pixel re-projection)
•Radial distortionRadial distortion
– Small near the center, increase towards peripherySmall near the center, increase towards periphery
1
0 min
p
p A
i
i P X
x
known known ?
...
1 1 2 2
K r K r
r
Application: raysets
Gortler and al.; Microsoft Lumigraph
t
v
s
(s,t) u
(u,v)
H-Y Shum, L-W He; Microsoft Concentric mosaics
Ck Cl
C0
Li
Lj vi
vj
i j
•Projection equationProjection equation xxii=P=PiiXX
•Resection:Resection:
– xxii,X P,X Pii
Multi-view geometry - resection
Given image points and 3D points calculate camera projection matrix.
• Projection equationProjection equation xxii=P=PiiXX
• Intersection:Intersection:
– xxii,P,Pi i XX
Multi-view geometry - intersection
Given image points and camera projections in at least 2 views calculate the 3D points (structure)
• Projection Projection equation equation xxii=P=PiiXX
• Structure from Structure from motion (SFM) motion (SFM)
– xxii P Pii,,XX
Multi-view geometry - SFM
Given image points in at least 2 views calculate the 3D points (structure) and camera projection matrices (motion)
•Estimate projective structure
•Rectify the reconstruction to metric (autocalibration)
Depth from stereo
•Calibrated aligned camerasCalibrated aligned cameras
Z
(0,0) (d,0) X Z=f
xl xr
r l
r l
x x
Z df
d x X
X f x Z f
( )
Disparity d
Trinocular Vision System
(Point Grey Research)
Application: depth based reprojection
3D warping,
3D warping, McMillanMcMillan
Plenoptic modeling,
Plenoptic modeling, McMillan & BishopMcMillan & Bishop
Application: depth based reprojection
Layer depth images,
Layer depth images, Shade et al. Shade et al.
Image based objects,
Image based objects, Oliveira & Bishop Oliveira & Bishop
Affine camera factorization
3D structure from many images
The affine projection equations are The affine projection equations are
1
j j
j
y i
x i ij
ij
Z Y X
P P y
x
0001 1
1
j j
j y
i x i ij
ij
Z Y X P
P y
x
~
~
4 4
j j
j y
i x i ij
ij y
i ij
x i ij
Z Y X P
P y
x P
y
P x
Orthographic factorization
The ortographic projection equations are The ortographic projection equations are where
where m ij Pi M j , i 1, . . . ,m , j 1, . . . ,n
All equations can be collected for all
All equations can be collected for all i and i and jj where
where
n
mn m m
m
n n
M ,..., M
, M ,
, m
m m
m m
m
m m
m
2 1
2 1
2 1
2 22
21
1 12
11
M
P P P P
m
M P m
~ M
~ m
j j
j y j
i x i i
ij ij ij
Z Y
X P ,
, P y
x P
Note that P and M are resp. 2mx3 and 3xn matrices and therefore the rank of m is at most 3
(Tomasi Kanade’92)
Orthographic factorization
Factorize
Factorize mm through singular value decomposition through singular value decomposition An affine reconstruction is obtained as follows
An affine reconstruction is obtained as follows
V T
U m
V T
M U
P ~
~ ,
(Tomasi Kanade’92)
n
mn m m
m
n n
M ,..., M
, M m
m m
m m
m
m m
m
min 2 1 2
1
2 1
2 22
21
1 12
11
P P P
Closest rank-3 approximation yields MLE!
~ 0
~
~ 1
~
~ 1
~
T T 1
T T 1
T T 1
y i x
i
y i y
i
x i x
i
P P
P P
P P
Q Q
Q Q
Q Q
~ 0
~
~ 1
~
~ 1
~
T T T
y i x
i
y i y
i
x i x
i
P P
P P
P P
C C C
A metric reconstruction is obtained as follows A metric reconstruction is obtained as follows
Where A is computed from Where A is computed from
Orthographic factorization
Factorize
Factorize mm through singular value decomposition through singular value decomposition An affine reconstruction is obtained as follows
An affine reconstruction is obtained as follows
V T
U m
V T
M U
P ~
~ ,
M Q M
Q P
P ~
~ 1 ,
0 1 1
T T T
y i x i
y i y i
x i x i
P P
P P
P
P 3 linear equations per view on symmetric matrix C (6DOF)
Q can be obtained from C
through Cholesky factorisation and inversion
(Tomasi Kanade’92)
Weak perspective factorization
[D. Weinshall]
[D. Weinshall]
•Weak perspective cameraWeak perspective camera
•Affine ambiguityAffine ambiguity
•Metric constraintsMetric constraints
j i s M s
ˆ ) ˆ )(
ˆ ( ˆ
ˆ MQQ 1X MQ Q 1X
W
ˆ 0 ˆ
ˆ ˆ
ˆ
ˆ 2
j i
j j
i i
s QQ s
s s
QQ s
s QQ s
T T
T T
T T
Extract motion parameters
Eliminate scale
Compute direction of camera axis k = i x j
parameterize rotation with Euler angles
Full perspective factorization
The camera equations The camera equations
for a fixed image
for a fixed image ii can be written in matrix form can be written in matrix form asas
where where
n j
m
j i
i i j
i jm M , 1,..., , 1,...,
λ P
M P m i i i
i i im
i
m im
i i
i m m m
λ ,..., λ
, λ diag
M ,..., M
, M ,
,..., ,
2 1
2 1
2 1
M
m
Perspective factorization
All equations can be collected for all All equations can be collected for all ii as as
where
where m P M
m n
n P
P P P
m m m
m , ...
...
2 1 2
2 1 1
In these formulas
In these formulas mm are known, but are known, but ii,,PP and and MM are are unknown
unknown
Observe that
Observe that PMPM is a product of a 3 is a product of a 3mmx4 matrix and a x4 matrix and a 4x4xnn matrix, i.e. it is a rank 4 matrix matrix, i.e. it is a rank 4 matrix
Perspective factorization algorithm
Assume that i are known, then PM is known.
Use the singular value decomposition PM=U VT
In the noise-free case
S=diag(1,2,3,4,0, … ,0)
and a reconstruction can be obtained by setting:
P=the first four columns of U.
M=the first four rows of V.
Iterative perspective factorization
When i are unknown the following algorithm can be used:
1. Set ij=1 (affine approximation).
2. Factorize PM and obtain an estimate of P and M.
If 5 is sufficiently small then STOP.
3. Use m, P and M to estimate i from the camera equations (linearly) mi i=PiM
4. Goto 2.
In general the algorithm minimizes the proximity measureNote that structure and motion recovered P(,P,M)=5
up to an arbitrary projective transformation
N-view geometry Affine factorization
(HZ Ch 17, 18)
[Tomasi &Kanade ’92]
[Tomasi &Kanade ’92]
•Affine camera Affine camera
•ProjectionProjection
•n points, n points, m views: measurement matrixm views: measurement matrix
]
| [M t
P M 2x3 matrix; t 2D vector
t
Z Y X y M
x
n
m m
n m
n
M M
W X X
x x
x x
1 1
1
1 1
1
~
~
~
~
W: Rank 3
X M V
D U
W
UDV W
T n m
T
ˆ ˆ ˆ 2 3 3 3 3
Assuming isotropic zero-mean Gaussian noise, factorization achieves ML affine reconstruction.
t x x
~
Projective factorization
Homogeneous coord &scale factors
[Sturm & Triggs’96][ Heyden ‘97 ] [Sturm & Triggs’96][ Heyden ‘97 ]
•Measurement matrixMeasurement matrix
•Known projective depthKnown projective depth
– Projective ambiguity Projective ambiguity
•Iterative algorithmIterative algorithm
– Reconstruct withReconstruct with
– Reestimate depth and iterate Reestimate depth and iterate
n
m m
n m n m
m
n n
P P
W X X
x x
x x
1 1
1 1
1 1 1
1 1 1
3mxn matrix
Rank 4
i
j
X P V
D U
W
UDV W
T n m
T
ˆ ˆ ˆ 2 4 4 4 4
1
i
j i
j
Further Factorization work
Factorization with uncertainty Factorization with uncertainty
Factorization for dynamic scenes Factorization for dynamic scenes
(Irani & Anandan, IJCV’02)
(Costeira and Kanade ‘94) (Bregler et al. 2000,
Brand 2001)
Sequential 3D structure from motion using 2 and 3 view geom
• Initialize structure and motion from two viewsInitialize structure and motion from two views
• For each additional viewFor each additional view
– Determine poseDetermine pose
– Refine and extend structureRefine and extend structure
• Determine correspondences robustly by jointly Determine correspondences robustly by jointly estimating matches and epipolar geometry
estimating matches and epipolar geometry
Images
2 view geometry
Epipolar geometry and
Fundamental matrix F
The epipolar geometry
C,C’,x,x’ and X are coplanar
The epipolar plane
All points on project on l and l’
The epipolar planes
Family of planes and lines l and l’
Intersection in e and e’
The epipoles
epipoles e,e’
= intersection of baseline with image plane
= projection of projection center in other image
= vanishing point of camera motion direction
an epipolar plane = plane containing baseline (1-D family) an epipolar line = intersection of epipolar plane with image
(always come in corresponding pairs)
Example: converging cameras
Example: motion parallel with image plane
Example: forward motion
e e’
The fundamental matrix F
algebraic representation of epipolar geometry
x l'
we will see that this mapping is (singular)
correlation (i.e. projective mapping from points to lines) represented by the fundamental matrix F
The fundamental matrix F
algebraic derivation (of existence)
λ P x λC
X
PP I
e' P'P F
x P P' C
P'
l
(note: doesn’t work for C=C’ F=0)
x P
λ
X
e' H
F H K1RK
Alternatively can write:
The fundamental matrix F
geometric derivation
x H x' π
x' e'
l' e' Hπx Fx
mapping from 2-D to 1-D family (rank 2) Step 1: X on a plane
Step 2: epipolar line l’
The fundamental matrix F
correspondence condition
0 Fx
x'T
The fundamental matrix satisfies the condition that for any pair of corresponding points x↔x’ in the two images
x'T l'0
The fundamental matrix F
F is the unique 3x3 rank 2 matrix that satisfies x’TFx=0 for all x↔x’
(i) Transpose: if F is fundamental matrix for (P,P’), then FT is fundamental matrix for (P’,P)
(ii) Epipolar lines: l’=Fx & l=FTx’
(iii) Epipoles: on all epipolar lines, thus e’TFx=0, x
e’TF=0, similarly Fe=0
(iv) F has 7 d.o.f. , i.e. 3x3-1(homogeneous)-1(rank2)
(v) F is a correlation, projective mapping from a point x to a line l’=Fx (not a proper correlation, i.e. not invertible)
Fundamental matrix, summary
•Algebraic representation of epipolar geometryAlgebraic representation of epipolar geometry
Step 1: X on a plane Step 2: epipolar line l’
x x
e
x e
x e l
x x
F H
H
] ' [
' ] ' [ ' ' '
'
0 ' x x FT
F
•3x3, Rank 2, det(F)=0
•Linear sol. – 8 corr. Points (unique)
•Nonlinear sol. – 7 corr. points (3sol.)
•Very sensitive to noise & outliers
[Faugeras ’92, Hartley ’92 ]
Epipolar lines:
Epipoles:
Projection matrices:
[ '] ' | '
'
]
| [
0 ' 0
' '
e v
e e
0 e e
x l
x l
T T
T
F P
I P
F F
F F
(i) Correspondence geometry: Given an image point x in
the first view, how does this constrain the position of the corresponding point x’ in the second image?
(ii) Camera geometry (motion): Given a set of corresponding image points {xi ↔x’i}, i=1,…,n, what are the cameras P and P’ for the two views?
(iii) Scene geometry (structure): Given corresponding image points xi ↔x’i and cameras P, P’, what is the position of (their pre-image) X in space?
F Relates to three questions: