A.3 Camera projection models
A.3.4 Rotation parameterization
Each camera view may have a different local coordinate system from the world coordinates. Changing a coordinate system requires a transformation comprising rotation and translation. This section will briefly summarize some of the widely used parameterizations for rotation.
Euler’s rotation theorem states that any rotation of a rigid object about a fixed point is equivalent to a single rotation about some axis. The minimal representation involves 3 parameters, but it is possible to use an overparameterization such as quaternions (4 parameters). Each of the parameterizations reviewed below has a closed-form function for mapping the parameters to the corresponding rotation matrix $R \in SO(3)$.
Euler angles
The method of Euler angles, first introduced by Leonhard Euler, is a minimal parameterization that represents a rotation $R$ as a sequence of three elementary rotations (yaw, pitch and roll) about some predefined axes in the rotating object’s frame. In terms of equations,
\[ R = R_x(\alpha)\, R_y(\beta)\, R_z(\gamma) \tag{A.135} \]
where $R_x$, $R_y$ and $R_z$ are the elementary rotation matrices about the respective axes and $\alpha$, $\beta$ and $\gamma$ are the rotation angles. The order in which these subrotations are applied can vary between conventions.
Although this method is intuitively simple to understand, the product form of $R_x$, $R_y$ and $R_z$ makes the problem highly nonlinear in $\alpha$, $\beta$ and $\gamma$, making it difficult to optimize. Also, there is a singularity condition called gimbal lock, in which two of the rotation axes coincide so that the parameterization becomes degenerate and one degree of freedom is lost. This singularity can occur quite frequently since it only requires a $90^\circ$ rotation about the middle axis (here $\beta = \pm 90^\circ$). Consequently, the use of Euler angles in computer vision is somewhat limited.
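The gimbal lock condition can be checked numerically. The following is a minimal sketch, assuming the ordering of (A.135) and the usual right-handed elementary rotation matrices; the helper names (`rot_x`, `euler_to_matrix`, `max_abs_diff`) are illustrative and not taken from the text. At $\beta = 90^\circ$, two different pairs $(\alpha, \gamma)$ with the same sum $\alpha + \gamma$ should give the same rotation matrix, whereas away from the singularity they should not.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(b):
    c, s = math.cos(b), math.sin(b)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(g):
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_to_matrix(alpha, beta, gamma):
    """R = Rx(alpha) Ry(beta) Rz(gamma), the ordering used in (A.135)."""
    return matmul(rot_x(alpha), matmul(rot_y(beta), rot_z(gamma)))

def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

# At beta = 90 degrees only the sum alpha + gamma matters (gimbal lock):
# both parameter sets below have alpha + gamma = 0.8.
locked_diff = max_abs_diff(euler_to_matrix(0.3, math.pi / 2, 0.5),
                           euler_to_matrix(0.7, math.pi / 2, 0.1))

# Away from the singularity the same two (alpha, gamma) pairs differ.
regular_diff = max_abs_diff(euler_to_matrix(0.3, 0.4, 0.5),
                            euler_to_matrix(0.7, 0.4, 0.1))

print(locked_diff, regular_diff)
```

The first printed value should be at the level of floating-point round-off, while the second should be clearly nonzero.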
Axis-angle
Axis-angle is a widely used minimal parameterization technique (also implemented in Chapter 6), which uses a 3D real vector to represent both the angle and the axis. Given an input vector $\omega \in \mathbb{R}^3$, the rotation angle $\theta$ and the axis $\hat{n}$ are given by
\[ \theta := \|\omega\|_2 \tag{A.136} \]
\[ \hat{n} := \frac{\omega}{\|\omega\|_2} =: \hat{\omega}. \tag{A.137} \]
The rotation matrix $R$ is given by the formula of Rodrigues [1816]
\[ R = I + \frac{\sin\theta}{\theta}\,[\omega]_\times + \frac{1 - \cos\theta}{\theta^2}\,[\omega]_\times^2. \tag{A.138} \]
The above can be derived geometrically by expressing a point rotating about the axis $\omega / \|\omega\|_2$ by angle $\theta$. Bear in mind that when $\theta \approx 0$, one should use the first-order Taylor approximation of (A.138), which is $I + [\omega]_\times$, to avoid dividing by a vanishing $\theta$. Note that there is a singularity at $\theta = \pi$. The axis-angle parameterization is implemented in the Ceres solver [Agarwal et al., 2014].
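As an illustration of (A.138) together with the small-angle fallback, here is a minimal sketch in plain Python. The function names and the threshold `eps` are assumptions made for this example; this is not the Ceres implementation.

```python
import math

def hat(w):
    """Skew-symmetric matrix [w]x, so that hat(w) applied to x equals w cross x."""
    return [[0.0, -w[2], w[1]],
            [w[2], 0.0, -w[0]],
            [-w[1], w[0], 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rodrigues(w, eps=1e-8):
    """Rotation matrix for the axis-angle vector w, following (A.138).

    eps is a hand-picked threshold (an assumption) below which the
    first-order approximation I + [w]x is used instead.
    """
    theta = math.sqrt(sum(c * c for c in w))
    K = hat(w)
    if theta < eps:
        a, b = 1.0, 0.0                      # first-order Taylor approximation
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    K2 = matmul(K, K)
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]

# A quarter turn about the z axis should send the x axis to the y axis.
R = rodrigues([0.0, 0.0, math.pi / 2])
rotated_x = [R[i][0] for i in range(3)]      # first column of R
print(rotated_x)

# A tiny rotation falls back to I + [w]x.
S = rodrigues([1e-10, 0.0, 0.0])
```

The printed vector should be approximately $(0, 1, 0)$ up to round-off.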
Lie group manifold optimization. The axis-angle representation can also be derived algebraically in the context of manifold optimization. As illustrated by Eade [2017], first note that rotation matrices form a compact matrix Lie group. This means $SO(3)$ is also a differentiable manifold and has the nice property that its associated Lie algebra $\mathfrak{so}(3)$ describes the tangent space (from Section A.2.5) at the identity. There are 3 tangent space directions, termed
generators, which are defined as
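\[ G_1 := \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}, \quad G_2 := \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix}, \quad G_3 := \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \tag{A.139} \]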
Hence, any small movement from the identity along the tangent plane can be represented by $\sum_{k=1}^{3} \omega_k G_k$, where $\omega := [\omega_1, \omega_2, \omega_3]$ is the vector of scales. But since $\{G_k\}$ forms a basis for the skew-symmetric matrices defined in (A.4), the movement (an element of the Lie algebra) can also be written as $[\omega]_\times$. This means that for a point $x \in \mathbb{R}^3$, the infinitesimal rotation at the identity is given by $[\omega]_\times x = \|\omega\|_2\, \hat{\omega} \times x$, where $\hat{\omega}$ is a unit vector. This result intuitively shows that the direction of $\omega$ corresponds to the axis of rotation.
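The relation between the generators and the cross product can be verified directly. The sketch below assumes the generators of (A.139); the helper names and the test vectors are illustrative only.

```python
# Generators of so(3) as given in (A.139).
G1 = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
G2 = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
G3 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

def hat_from_generators(w):
    """Linear combination w1*G1 + w2*G2 + w3*G3, i.e. the matrix [w]x."""
    return [[w[0] * G1[i][j] + w[1] * G2[i][j] + w[2] * G3[i][j]
             for j in range(3)] for i in range(3)]

def matvec(A, x):
    return [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

w = [0.2, -0.5, 0.9]   # arbitrary tangent vector
x = [1.0, 2.0, 3.0]    # arbitrary point

via_generators = matvec(hat_from_generators(w), x)
via_cross = cross(w, x)
print(via_generators, via_cross)
```

Both printed vectors should agree up to round-off, confirming that $[\omega]_\times x = \omega \times x$.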
Each Lie group has its own exponential map that maps the Lie algebra to the corresponding group. From the viewpoint of manifold optimization, this can be interpreted as moving along the tangent space via the Lie algebra, followed by retracting back to the manifold using the matrix exponential map (A.5). Just to state the result, computing this yields
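\[ \exp([\omega]_\times) = I + [\omega]_\times + \frac{1}{2}[\omega]_\times^2 + \cdots = I + \frac{\sin\theta}{\theta}\,[\omega]_\times + \frac{1 - \cos\theta}{\theta^2}\,[\omega]_\times^2 \tag{A.140} \]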
where $\theta := \|\omega\|_2$. (A.140) is identical to (A.138). As expected, the derivative of (A.138) with respect to $\omega$ at the identity yields a $3 \times 3 \times 3$ tensor comprising $G_1$, $G_2$ and $G_3$.
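Both claims can be checked numerically. The following sketch compares a truncated power series of the matrix exponential against the closed form of (A.140), and then approximates the derivative at the identity by central finite differences. The number of series terms and the step `h` are hand-picked assumptions, and all function names are illustrative.

```python
import math

def hat(w):
    return [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]

def identity():
    return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def exp_series(w, terms=25):
    """Truncated power series I + K + K^2/2! + ... with K = [w]x."""
    K = hat(w)
    result, term = identity(), identity()
    for n in range(1, terms):
        term = [[v / n for v in row] for row in matmul(term, K)]
        result = [[result[i][j] + term[i][j] for j in range(3)]
                  for i in range(3)]
    return result

def exp_closed_form(w):
    """Closed form of (A.140); assumes theta is not close to zero."""
    theta = math.sqrt(sum(c * c for c in w))
    K = hat(w)
    K2 = matmul(K, K)
    a = math.sin(theta) / theta
    b = (1.0 - math.cos(theta)) / theta ** 2
    I = identity()
    return [[I[i][j] + a * K[i][j] + b * K2[i][j] for j in range(3)]
            for i in range(3)]

def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

w = [0.3, -0.2, 0.5]
series_vs_closed = max_abs_diff(exp_series(w), exp_closed_form(w))

def derivative_at_identity(k, h=1e-5):
    """Central finite difference of exp([w]x) along the k-th coordinate at w = 0."""
    e = [0.0, 0.0, 0.0]
    e[k] = h
    plus = exp_series(e)
    minus = exp_series([-c for c in e])
    return [[(plus[i][j] - minus[i][j]) / (2.0 * h) for j in range(3)]
            for i in range(3)]

# Each slice of the 3 x 3 x 3 derivative tensor should match a generator G_k.
slices = [derivative_at_identity(k) for k in range(3)]
unit = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
generator_errors = [max_abs_diff(slices[k], hat(unit[k])) for k in range(3)]
print(series_vs_closed, generator_errors)
```

All printed values should be small: the first at round-off level, the others limited by the finite-difference step.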
The differences between applying Rodrigues’ formula directly by differentiating it and using manifold optimization are as follows. With the latter method, the derivatives are projected onto the tangent space (Lie algebra) at the identity. Hence, the position of the identity has to be adjusted at each iteration to the new estimate so that the tangent space around the new solution can be computed correctly. This requires storing some information about the aggregated rotations. Unfortunately, there is a lack of performance comparisons between the two strategies.
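To make the manifold strategy concrete, here is a toy sketch that fits a rotation to noiseless point correspondences by plain gradient descent in the tangent space. The problem setup, the function `fit_rotation`, the fixed step size and the iteration count are all assumptions chosen for illustration; a practical solver would more likely use a Gauss-Newton or Levenberg-Marquardt step. The aggregated rotation `R` is the stored state, and each update $\delta$ is computed in the tangent space at the current estimate and then retracted through the exponential map.

```python
import math

def hat(w):
    return [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, x):
    return [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def exp_so3(w, eps=1e-8):
    """Exponential map (A.140), with a first-order fallback for tiny angles."""
    theta = math.sqrt(sum(c * c for c in w))
    K = hat(w)
    if theta < eps:
        a, b = 1.0, 0.0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    K2 = matmul(K, K)
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]

def fit_rotation(points_a, points_b, iters=20, step=0.5):
    """Minimize 0.5 * sum ||R a - b||^2 over rotations R (toy example)."""
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for _ in range(iters):
        grad = [0.0, 0.0, 0.0]
        for a, b in zip(points_a, points_b):
            y = matvec(R, a)
            # Gradient of 0.5 * ||exp([d]x) y - b||^2 w.r.t. d at d = 0 is b x y.
            g = cross(b, y)
            grad = [grad[i] + g[i] for i in range(3)]
        delta = [-step * gi for gi in grad]
        # Retract and re-center: the new estimate becomes the new "identity".
        R = matmul(exp_so3(delta), R)
    return R

R_true = exp_so3([0.4, -0.3, 0.8])
points_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points_b = [matvec(R_true, a) for a in points_a]

R_est = fit_rotation(points_a, points_b)
fit_error = max(abs(R_est[i][j] - R_true[i][j])
                for i in range(3) for j in range(3))
print(fit_error)
```

In this sketch the optimizer never differentiates (A.138) at a nonzero $\omega$; it only needs the derivative at the identity, at the cost of carrying `R` between iterations.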
Unit quaternions
Quaternions are a number system that can also be used to represent rotations. A quaternion has 4 dimensions, with one scalar term and three quaternion units ($i$, $j$, $k$) which obey the multiplication rules $i^2 = j^2 = k^2 = ijk = -1$.
Using notation similar to that of Section A.3.4, a rotation about an axis $\hat{n}$ through an angle $\theta$ can be represented by a unit quaternion
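\[ q = \exp\!\left(\frac{\theta}{2}\,(\hat{n}_1 i + \hat{n}_2 j + \hat{n}_3 k)\right) = \cos\frac{\theta}{2} + (\hat{n}_1 i + \hat{n}_2 j + \hat{n}_3 k)\,\sin\frac{\theta}{2}. \tag{A.141} \]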
Its inverse is well-defined as
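\[ q^{-1} = \exp\!\left(-\frac{\theta}{2}\,(\hat{n}_1 i + \hat{n}_2 j + \hat{n}_3 k)\right) = \cos\frac{\theta}{2} - (\hat{n}_1 i + \hat{n}_2 j + \hat{n}_3 k)\,\sin\frac{\theta}{2}. \tag{A.142} \]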
Now, given a point $x \in \mathbb{R}^3$, one can rotate this point by using the formula $q\,(x_1 i + x_2 j + x_3 k)\,q^{-1}$.
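The rotation formula can be illustrated with a short sketch. Quaternions are stored here as `(w, x, y, z)` tuples with the scalar part first, which is a storage convention assumed for this example; the function names are likewise illustrative.

```python
import math

def quat_mul(p, q):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)

def quat_from_axis_angle(n, theta):
    """Unit quaternion of (A.141) for a unit axis n and an angle theta."""
    half = 0.5 * theta
    s = math.sin(half)
    return (math.cos(half), n[0] * s, n[1] * s, n[2] * s)

def quat_inverse_unit(q):
    """Inverse of a unit quaternion, as in (A.142): negate the vector part."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotate(q, x):
    """Rotate the point x by the unit quaternion q via q (0, x) q^{-1}."""
    pure = (0.0, x[0], x[1], x[2])
    _, rx, ry, rz = quat_mul(quat_mul(q, pure), quat_inverse_unit(q))
    return [rx, ry, rz]

# A quarter turn about the z axis should send the x axis to the y axis.
q = quat_from_axis_angle([0.0, 0.0, 1.0], math.pi / 2)
rotated = rotate(q, [1.0, 0.0, 0.0])
print(rotated)
```

The printed point should be approximately $(0, 1, 0)$, which agrees with applying (A.138) to $\omega = (0, 0, \pi/2)$.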
One problem with the quaternion representation is that it is essentially an overparameterization unless the unit norm constraint is applied. One way to apply this is to replace $q$ by $q / \|q\|_2$ in the cost function. Although this does fix the rank deficiency, it also makes the problem highly nonlinear. Another way is to assume $q$ lies on a hypersphere in $\mathbb{R}^4$, which is a Riemannian manifold, and apply the manifold optimization scheme from Section A.2.5. Although this does seem to work in practice, there are always two quaternions ($q$ and $-q$) that yield the same rotation, and therefore viewing the manifold in this way may not be mathematically pleasing.
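Both the normalization trick and the two-to-one ambiguity can be seen in a small sketch. The conversion below uses the standard quaternion-to-rotation-matrix formula with quaternions stored as `(w, x, y, z)`; the function name and the sample values are assumptions for illustration.

```python
import math

def quat_to_matrix(q):
    """Rotation matrix of q / ||q||_2; q is (w, x, y, z) and need not be unit."""
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)      # replace q by q / ||q||_2
    return [[1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
            [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]]

def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

q = (0.9, 0.1, -0.3, 0.2)                   # arbitrary, not unit norm
negated = tuple(-c for c in q)
scaled = tuple(3.0 * c for c in q)

R = quat_to_matrix(q)
sign_diff = max_abs_diff(R, quat_to_matrix(negated))   # q and -q
scale_diff = max_abs_diff(R, quat_to_matrix(scaled))   # q and 3q
print(sign_diff, scale_diff)
```

Both printed differences should be at round-off level: scaling $q$ is removed by the normalization, and negating $q$ leaves every product of two components unchanged.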
Zheng et al. [2012] noted that the weak perspective camera models illustrated in Section A.3.2 can be represented by raw unnormalized quaternions. (A.128) shows that the norm of the unnormalized quaternion gives the inverse of the average scene depth ($\bar{z}$).