2.3 Summary
3.2.1 Shape Modeling
In three-dimensional space, the shape of an object can be represented by a set of 3D vertices and a closed surface mesh¹ connecting these vertices. A shape is invariant to rotation, scaling and translation of the object. Ideally, the vertices of each shape serve as landmark points. Landmarks are generally placed at anatomically meaningful positions, at junctions between distinct boundaries, at geometrically defined point sets, or at evenly spaced points on the object's surface between existing landmarks. Moreover, the landmark points must correspond well across the shape set. For a 3D shape, however, labelling the landmarks manually is impractical. It is therefore essential to use an automatic approach that extracts the landmark points of each shape with good correspondence between different shapes. We will discuss how to establish the correspondence in the next chapter. For now, we assume that all the shapes in the training set have good correspondence, which means that all the shapes are represented by well-labelled landmarks with the same number of points and the same connectivity (i.e. mesh topology). The landmark points of each shape are represented by vertices with three-dimensional Cartesian coordinates.
3.2.1.1 Shape Alignment
Initially, the shapes can differ in size, orientation and position in 3D space. These differences do not contribute to the shape variation. To obtain the shape variance statistics, we therefore have to scale, rotate and translate all the shapes into a common coordinate frame. This is the task of shape alignment, which can be carried out with a Procrustes analysis.
We define a single shape $S$ as a collection of $n_p$ points², which can be written as an $n_p \times 3$ matrix $\mathbf{X}$, where each row vector $\mathbf{x}_i = (x_i^{(x)}, y_i^{(x)}, z_i^{(x)})$ holds the Cartesian coordinates of the $i$th point of this shape and $i \in \{1, 2, \cdots, n_p\}$. Let us consider two shapes represented by the matrices $\mathbf{X}$ and $\mathbf{Y}$, where shape $\mathbf{Y}$ is fixed and shape $\mathbf{X}$ is moving. We wish to find a transformation $T$ that aligns $\mathbf{X}$ with respect to $\mathbf{Y}$, which can be written as

$$T(\mathbf{X}) = s\,\mathbf{X}\mathbf{R} + \mathbf{j}\mathbf{t}^T \quad \text{w.r.t. } \mathbf{Y} \tag{3.1}$$

where $s$ is a scaling factor, $\mathbf{R}$ is a $3 \times 3$ rotation matrix in 3D space, $\mathbf{t}$ is a column translation vector $(t_x, t_y, t_z)^T$, and $\mathbf{j}$ is the all-ones vector $(1, 1, \cdots, 1)^T$ of length $n_p$.

¹ Here the mesh in our work is a triangular mesh.
² In the following context of this chapter, shape points, vertices, and landmark points are the
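The transformation in equation 3.1 can be illustrated with a minimal NumPy sketch. The function name `similarity_transform` and the demo values are assumptions made for illustration only; the row-vector convention (points as rows, rotation applied on the right) follows the definition above.

```python
import numpy as np

def similarity_transform(X, s, R, t):
    """Apply T(X) = s X R + j t^T to an (n_p, 3) point matrix X."""
    X = np.asarray(X, dtype=float)
    j = np.ones((X.shape[0], 1))            # all-ones column vector of length n_p
    t_row = np.reshape(t, (1, 3))           # t^T as a row vector
    return s * X @ R + j @ t_row

# Illustrative values: in the row-vector convention this R maps (1, 0, 0) to (0, 1, 0).
R_z = np.array([[0.0, 1.0, 0.0],
                [-1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0]])
X_demo = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
Y_demo = similarity_transform(X_demo, 2.0, R_z, [1.0, 1.0, 1.0])
```

Because the points are stored as rows, the rotation multiplies from the right; a column-vector implementation would use the transpose of `R`.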
We consider two common methods for finding an appropriate pose parameter set $\{s, \mathbf{R}, \mathbf{t}\}$, namely orthogonal Procrustes analysis and unit quaternion based Procrustes analysis.
1. Orthogonal Procrustes Analysis
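The basic orthogonal Procrustes problem was discussed and solved in [64]. Our setting is an extended orthogonal Procrustes problem with the residual matrix $\mathbf{L} = s\mathbf{X}\mathbf{R} + \mathbf{j}\mathbf{t}^T - \mathbf{Y}$. The objective is to find the least squares solution $\{s, \mathbf{R}, \mathbf{t}\}$ that minimizes the sum of squares of the residual matrix, which can be written as

$$\{s, \mathbf{R}, \mathbf{t}\} = \arg\min \operatorname{trace}\{\mathbf{L}^T\mathbf{L}\} \tag{3.2}$$

$$= \arg\min \operatorname{trace}\{(s\mathbf{X}\mathbf{R} + \mathbf{j}\mathbf{t}^T - \mathbf{Y})^T (s\mathbf{X}\mathbf{R} + \mathbf{j}\mathbf{t}^T - \mathbf{Y})\} \tag{3.3}$$

where trace denotes the sum of the diagonal elements of a square matrix. The problem also has another implicit constraint, namely that the rotation matrix is orthogonal, $\mathbf{R}^T\mathbf{R} = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix. We can then write the Lagrangian function and set its derivatives with respect to $\{s, \mathbf{R}, \mathbf{t}\}$ to zero to obtain a least squares estimate (LSE). The solution is

$$\mathbf{R} = \mathbf{U}\mathbf{V}^T \tag{3.4}$$

$$s = \frac{\operatorname{trace}\left\{\mathbf{R}^T\mathbf{X}^T\left(\mathbf{I} - \frac{\mathbf{j}\mathbf{j}^T}{n_p}\right)\mathbf{Y}\right\}}{\operatorname{trace}\left\{\mathbf{X}^T\left(\mathbf{I} - \frac{\mathbf{j}\mathbf{j}^T}{n_p}\right)\mathbf{X}\right\}} \tag{3.5}$$

$$\mathbf{t} = (\mathbf{Y} - s\mathbf{X}\mathbf{R})^T \frac{\mathbf{j}}{n_p} \tag{3.6}$$

where $\mathbf{U}$ and $\mathbf{V}$ are the two orthogonal matrices obtained from the Singular Value Decomposition (SVD) of the matrix $\mathbf{S}$, that is,

$$\mathbf{S} = \mathbf{X}^T\left(\mathbf{I} - \frac{\mathbf{j}\mathbf{j}^T}{n_p}\right)\mathbf{Y} \tag{3.7}$$

whose SVD has the form $\mathbf{S} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$, where $\boldsymbol{\Sigma}$ is a diagonal matrix. In effect, $(\mathbf{I} - \mathbf{j}\mathbf{j}^T/n_p)\mathbf{Y}$ translates $\mathbf{Y}$ so that its centroid lies at the coordinate origin. For more details, see Appendix B.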
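As a hedged illustration of equations 3.4 to 3.7, the following NumPy sketch computes the pose parameters from the SVD. The function name `orthogonal_procrustes_align` is an assumption for illustration. Note that $\mathbf{U}\mathbf{V}^T$ is only guaranteed to be orthogonal; for degenerate or very noisy data it can be a reflection (determinant $-1$), which this sketch does not correct.

```python
import numpy as np

def orthogonal_procrustes_align(X, Y):
    """Return (s, R, t) minimizing ||s X R + j t^T - Y||^2 for (n_p, 3) inputs."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n_p = X.shape[0]
    j = np.ones((n_p, 1))
    J = np.eye(n_p) - j @ j.T / n_p          # centering matrix I - j j^T / n_p

    S = X.T @ J @ Y                          # equation 3.7
    U, _, Vt = np.linalg.svd(S)              # S = U Sigma V^T
    R = U @ Vt                               # equation 3.4

    # Equation 3.5: ratio of two traces of 3x3 matrices.
    s = np.trace(R.T @ X.T @ J @ Y) / np.trace(X.T @ J @ X)

    # Equation 3.6: t is a column vector; flatten it for convenience.
    t = ((Y - s * X @ R).T @ j / n_p).ravel()
    return s, R, t
```

The aligned shape is then `s * X @ R + t`, where NumPy broadcasting adds the translation to every row, which is equivalent to the term $\mathbf{j}\mathbf{t}^T$.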
When ASM was proposed by Cootes et al. [16], a 2D weighted orthogonal Procrustes analysis was performed by solving a set of linear equations. The weights give more significance to the points that are more stable across the set. However, this approach is hard to extend to three-dimensional space. The solution of a weighted orthogonal Procrustes analysis is also given in [65]; it can be derived from the unweighted one with minor changes. The details can also be found in Appendix B.
2. Unit Quaternions based Procrustes Analysis
Alternatively, the unit quaternion based approach can give the absolute orientation [66]. Specifically, the objective is again to minimize the sum of squares of the residual errors, as in equation 3.3. We again write $\mathbf{X}$ and $\mathbf{Y}$ as $\mathbf{X} = \{\mathbf{x}_i = (x_i^{(x)}, y_i^{(x)}, z_i^{(x)}) \mid i = 1, 2, \cdots, n_p\}$ and $\mathbf{Y} = \{\mathbf{y}_i = (x_i^{(y)}, y_i^{(y)}, z_i^{(y)}) \mid i = 1, 2, \cdots, n_p\}$, where each row holds the coordinates of one point of the shape. The absolute translations are the centroid coordinates

$$\bar{\mathbf{x}} = \frac{1}{n_p}\sum_{i=1}^{n_p}\mathbf{x}_i, \qquad \bar{\mathbf{y}} = \frac{1}{n_p}\sum_{i=1}^{n_p}\mathbf{y}_i \tag{3.8}$$
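The centered shapes are then (this is the same as left-multiplying $\mathbf{X}$ and $\mathbf{Y}$ by $(\mathbf{I} - \mathbf{j}\mathbf{j}^T/n_p)$, as stated in the previous section)

$$\mathbf{x}'_i = \mathbf{x}_i - \bar{\mathbf{x}}, \qquad \mathbf{y}'_i = \mathbf{y}_i - \bar{\mathbf{y}}, \qquad i = 1, 2, \cdots, n_p. \tag{3.9}$$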
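The relative translation $\mathbf{t}$ is the difference between the centroid of $\mathbf{Y}$ and the scaled and rotated centroid of $\mathbf{X}$, namely

$$\mathbf{t} = \bar{\mathbf{y}} - s\,\bar{\mathbf{x}}\mathbf{R} \tag{3.10}$$

Once we have the scale $s$ and the rotation matrix $\mathbf{R}$, we can calculate the translation $\mathbf{t}$. The objective function can now be written as

$$\sum_{i=1}^{n_p} \left\| \mathbf{y}'_i - s\,\mathbf{x}'_i\mathbf{R} \right\|^2 \tag{3.11}$$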
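According to the derivation in [66], the scale is obtained as

$$s = \frac{\sum_{i=1}^{n_p} \mathbf{y}'_i \cdot (\mathbf{x}'_i\mathbf{R})}{\sum_{i=1}^{n_p} \|\mathbf{x}'_i\|^2} \tag{3.12}$$

and the rotation sought is the $\mathbf{R}$ that maximizes $\sum_{i=1}^{n_p} [\mathbf{y}'_i \cdot (\mathbf{x}'_i\mathbf{R})]$. We now use the quaternion approach to solve this problem, which amounts to maximizing $\dot{\mathbf{q}}^T\mathbf{M}\dot{\mathbf{q}}$ by finding the optimal quaternion $\dot{\mathbf{q}}$. See [66] for further explanation. Technically, we first introduce a $3 \times 3$ matrix $\mathbf{S}$ that contains all the information required to solve the least squares problem for the rotation. $\mathbf{S}$ has the form

$$\mathbf{S} = \sum_{i=1}^{n_p} \mathbf{x}'^T_i \mathbf{y}'_i \quad \text{or} \quad \mathbf{S} = \mathbf{X}'^T\mathbf{Y}', \qquad \mathbf{S} = \begin{pmatrix} S_{xx} & S_{xy} & S_{xz} \\ S_{yx} & S_{yy} & S_{yz} \\ S_{zx} & S_{zy} & S_{zz} \end{pmatrix} \tag{3.13}$$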
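where $\mathbf{X}'$ is the matrix whose rows are the row vectors $\mathbf{x}'_i$, and likewise for $\mathbf{Y}'$ and $\mathbf{y}'_i$. We then form the matrix $\mathbf{M}$ as

$$\mathbf{M} = \begin{pmatrix} S_{xx}+S_{yy}+S_{zz} & S_{yz}-S_{zy} & S_{zx}-S_{xz} & S_{xy}-S_{yx} \\ S_{yz}-S_{zy} & S_{xx}-S_{yy}-S_{zz} & S_{xy}+S_{yx} & S_{zx}+S_{xz} \\ S_{zx}-S_{xz} & S_{xy}+S_{yx} & -S_{xx}+S_{yy}-S_{zz} & S_{yz}+S_{zy} \\ S_{xy}-S_{yx} & S_{zx}+S_{xz} & S_{yz}+S_{zy} & -S_{xx}-S_{yy}+S_{zz} \end{pmatrix} \tag{3.14}$$

Next, we calculate the normalized eigenvectors and eigenvalues $\{\mathbf{n}_j, \lambda_j \mid j = 1, 2, 3, 4\}$ of $\mathbf{M}$, which satisfy $\mathbf{M}\mathbf{n}_j = \lambda_j\mathbf{n}_j$ and $\|\mathbf{n}_j\|_2 = 1$. Let $\lambda_{max} = \max\{\lambda_j \mid j = 1, 2, 3, 4\}$ be the largest (most positive) eigenvalue and $\mathbf{n}_{max}$ the corresponding eigenvector, so that $\mathbf{M}\mathbf{n}_{max} = \lambda_{max}\mathbf{n}_{max}$. It can be proved that the quadratic form $\dot{\mathbf{q}}^T\mathbf{M}\dot{\mathbf{q}}$ is maximized when $\dot{\mathbf{q}} = \mathbf{n}_{max}$, where $\dot{\mathbf{q}}$ is the unit quaternion representing the rotation.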
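We now take the four elements of $\mathbf{n}_{max}$ as the components $\{q_0, q_x, q_y, q_z\}$ of the unit quaternion $\dot{\mathbf{q}}$. The transpose of the rotation matrix can then be written as

$$\mathbf{R}^T = \begin{pmatrix} q_0^2+q_x^2-q_y^2-q_z^2 & 2(q_xq_y-q_0q_z) & 2(q_xq_z+q_0q_y) \\ 2(q_yq_x+q_0q_z) & q_0^2-q_x^2+q_y^2-q_z^2 & 2(q_yq_z-q_0q_x) \\ 2(q_zq_x-q_0q_y) & 2(q_zq_y+q_0q_x) & q_0^2-q_x^2-q_y^2+q_z^2 \end{pmatrix} \tag{3.15}$$

or, in matrix notation, as

$$\mathbf{R}^T = \begin{pmatrix} q_x \\ q_y \\ q_z \end{pmatrix} \begin{pmatrix} q_x \\ q_y \\ q_z \end{pmatrix}^T + \begin{pmatrix} q_0 & -q_z & q_y \\ q_z & q_0 & -q_x \\ -q_y & q_x & q_0 \end{pmatrix}^2 \tag{3.16}$$

With the rotation matrix $\mathbf{R}$ we can calculate the scale $s$ and the translation vector $\mathbf{t}$, and then obtain the aligned shape according to equation 3.1.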
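The quaternion based procedure of equations 3.8 to 3.15 can be sketched in NumPy as follows. The function name `quaternion_align` is an assumption for illustration. The sketch relies on `np.linalg.eigh`, which returns the eigenvalues of a symmetric matrix in ascending order, so the last eigenvector corresponds to $\lambda_{max}$.

```python
import numpy as np

def quaternion_align(X, Y):
    """Return (s, R, t) such that s X R + t approximates Y, for (n_p, 3) inputs."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)     # equation 3.8
    Xc, Yc = X - x_bar, Y - y_bar                     # equation 3.9

    S = Xc.T @ Yc                                     # equation 3.13
    (Sxx, Sxy, Sxz), (Syx, Syy, Syz), (Szx, Szy, Szz) = S
    M = np.array([                                    # equation 3.14
        [Sxx + Syy + Szz, Syz - Szy, Szx - Sxz, Sxy - Syx],
        [Syz - Szy, Sxx - Syy - Szz, Sxy + Syx, Szx + Sxz],
        [Szx - Sxz, Sxy + Syx, -Sxx + Syy - Szz, Syz + Szy],
        [Sxy - Syx, Szx + Sxz, Syz + Szy, -Sxx - Syy + Szz],
    ])

    # Eigenvector of the largest eigenvalue is the unit quaternion.
    _, eigvecs = np.linalg.eigh(M)
    q0, qx, qy, qz = eigvecs[:, -1]

    R_T = np.array([                                  # equation 3.15
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ])
    R = R_T.T

    s = np.sum(Yc * (Xc @ R)) / np.sum(Xc ** 2)       # equation 3.12
    t = y_bar - s * x_bar @ R                         # equation 3.10
    return s, R, t
```

The sign of the eigenvector is arbitrary, but this does not affect the result because every entry of the rotation matrix is quadratic in the quaternion components.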
Later, Horn et al. [67] proposed another closed-form solution for absolute orientation using orthonormal matrices, which is an advance on the quaternion based approach.
We define $n_s$ shapes $\{S_i \mid i = 1, 2, \cdots, n_s\}$ in the training set, and denote the shape points of shape $S_i$ by $\mathbf{X}_i$. We then align all the shapes iteratively as follows:
(1) Set the first shape $\mathbf{X}_1$ as the mean shape $\bar{\mathbf{X}}$.
(2) Align all the shapes $\mathbf{X}_i$ with respect to $\bar{\mathbf{X}}$, which gives the aligned shape points $\mathbf{X}_i^{(a)} = T_i(\mathbf{X}_i)$ as well as the pose parameter sets $\{s_i, \mathbf{R}_i, \mathbf{t}_i\}$.
(3) Calculate the mean shape $\bar{\mathbf{X}}$ from the aligned shapes $\mathbf{X}_i^{(a)}$ and the mean pose parameter set $\{\bar{s}, \bar{\mathbf{R}}, \bar{\mathbf{t}}\}$ over the training set, and save them.
(4) Transform the mean shape $\bar{\mathbf{X}}$ inversely with the inverse pose parameter set $\{1/\bar{s}, \bar{\mathbf{R}}^{-1}, -\bar{\mathbf{t}}\}$ to update the mean shape, namely $T^{-1}(\bar{\mathbf{X}})$.
(5) Return to step (2) with the updated mean shape $\bar{\mathbf{X}}$ until it is stable.
(6) Align all the shapes $\mathbf{X}_i$ with respect to the stable mean shape $\bar{\mathbf{X}}$, and save the aligned shapes $\mathbf{X}_i^{(a)}$ and the pose parameter sets $\{s_i, \mathbf{R}_i, \mathbf{t}_i\}$.
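The iterative alignment can be sketched as below. This is a simplified illustration rather than the exact procedure of steps (1) to (6): instead of the inverse mean pose transform of step (4), the sketch fixes the pose of the mean shape by re-aligning it to the first shape, which is a commonly used substitute. The function names `align` and `align_training_set`, the iteration limit and the tolerance are all assumptions.

```python
import numpy as np

def align(X, Y):
    """Least squares (s, R, t) such that s X R + t approximates Y (SVD solution)."""
    x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_bar, Y - y_bar
    U, _, Vt = np.linalg.svd(Xc.T @ Yc)
    R = U @ Vt
    s = np.sum(Yc * (Xc @ R)) / np.sum(Xc ** 2)
    t = y_bar - s * x_bar @ R
    return s, R, t

def align_training_set(shapes, max_iter=50, tol=1e-10):
    """Align a list of (n_p, 3) shapes; returns (mean, aligned shapes, poses)."""
    shapes = [np.asarray(X, dtype=float) for X in shapes]
    reference = shapes[0]
    mean = reference.copy()                            # step (1)
    for _ in range(max_iter):
        aligned = []
        for X in shapes:                               # step (2)
            s, R, t = align(X, mean)
            aligned.append(s * X @ R + t)
        new_mean = np.mean(aligned, axis=0)            # step (3), mean shape only
        # Simplified stand-in for step (4): fix the pose of the mean by
        # aligning it to the first shape instead of inverting the mean pose.
        s, R, t = align(new_mean, reference)
        new_mean = s * new_mean @ R + t
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:                                  # step (5)
            break
    poses = [align(X, mean) for X in shapes]           # step (6)
    aligned = [s * X @ R + t for X, (s, R, t) in zip(shapes, poses)]
    return mean, aligned, poses
```

Fixing the pose of the mean in every iteration keeps the mean shape from drifting or shrinking, which is the purpose that step (4) serves in the procedure above.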
Both of the approaches discussed above are applicable, but the quaternion based approach has the advantage of using fewer parameters to control the transformation (only 4 elements are needed to represent the rotation), which is preferable in AAM.
3.2.1.2 Shape Variance Modeling
Now suppose that all the shapes have been aligned. As introduced in [20], we define a $3n_p$-dimensional shape vector $\mathbf{x}$ in 3D space to represent the aligned shape points $\mathbf{X}^{(a)}$ of a single shape $S$ as

$$\mathbf{x} = (x_1, x_2, \cdots, x_{n_p}, y_1, y_2, \cdots, y_{n_p}, z_1, z_2, \cdots, z_{n_p})^T \tag{3.17}$$

where $(x_j, y_j, z_j)$ are the Cartesian coordinates of the $j$th point and $j \in \{1, 2, \cdots, n_p\}$.
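The ordering in equation 3.17 places all $x$ coordinates first, then all $y$, then all $z$. A short NumPy sketch of the conversion in both directions follows; the helper names `to_shape_vector` and `to_point_matrix` are illustrative assumptions.

```python
import numpy as np

def to_shape_vector(X):
    """Stack an (n_p, 3) point matrix as (x_1..x_np, y_1..y_np, z_1..z_np)."""
    return np.asarray(X, dtype=float).T.reshape(-1)

def to_point_matrix(x):
    """Inverse of to_shape_vector: recover the (n_p, 3) point matrix."""
    return np.asarray(x, dtype=float).reshape(3, -1).T

X_points = np.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])
x_vec = to_shape_vector(X_points)   # array([1., 4., 2., 5., 3., 6.])
```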
Therefore, the shape points of the $n_s$ shapes can be represented as a set of column vectors $\{\mathbf{x}_i \mid i = 1, 2, \cdots, n_s\}$. We can then apply PCA to construct the point distribution model that captures the shape statistics. It is worth noting that if the shape points follow a Gaussian distribution, PCA finds the optimal principal axes. If the data are not Gaussian, the variance cannot be used as the criterion for evaluating the significance of a component, and the PCA decomposition is no longer optimal, though it may still work well. In this case, independent component analysis (ICA) could be applied instead.
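We consider a data set with a Gaussian distribution. We first calculate the centroid of all shape vectors; the maximum likelihood estimate (MLE) of the covariance is then given by

$$\bar{\mathbf{x}} = \frac{1}{n_s}\sum_{i=1}^{n_s}\mathbf{x}_i, \qquad \mathbf{S} = \frac{1}{n_s}\sum_{i=1}^{n_s}(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T = \frac{1}{n_s}\mathbf{D}\mathbf{D}^T \tag{3.18}$$

where $\mathbf{D}$ is the $3n_p \times n_s$ matrix $\mathbf{D} = (\mathbf{x}_1 - \bar{\mathbf{x}}, \mathbf{x}_2 - \bar{\mathbf{x}}, \cdots, \mathbf{x}_{n_s} - \bar{\mathbf{x}})$.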
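By PCA, we obtain the $3n_p \times 3n_p$ diagonal matrix $\boldsymbol{\Lambda}_s$ whose diagonal elements are the eigenvalues, and the $3n_p \times 3n_p$ orthogonal matrix $\mathbf{P}_s$ whose columns are the corresponding eigenvectors. This can be written as

$$\mathbf{S}\mathbf{P}_s = \mathbf{P}_s\boldsymbol{\Lambda}_s, \qquad \mathbf{P}_s = (\mathbf{p}_1^{(s)}, \mathbf{p}_2^{(s)}, \cdots, \mathbf{p}_{3n_p}^{(s)}), \qquad \boldsymbol{\Lambda}_s = \begin{pmatrix} \lambda_1^{(s)} & & & \\ & \lambda_2^{(s)} & & \\ & & \ddots & \\ & & & \lambda_{3n_p}^{(s)} \end{pmatrix} \tag{3.19}$$

We arrange the eigenvalues in descending order, together with their corresponding eigenvectors, so that $\lambda_1^{(s)} \geq \lambda_2^{(s)} \geq \cdots \geq \lambda_{3n_p}^{(s)}$. Based on these quantities, a shape instance has the linear representation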
$$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}_s\mathbf{b}_s \tag{3.20}$$

where $\mathbf{b}_s = (b_1^{(s)}, b_2^{(s)}, \cdots, b_{3n_p}^{(s)})^T$ is the shape model parameter, or mode weight vector. There are $3n_p$ elements in $\mathbf{b}_s$, each of which weights one mode. Because the shape data are assumed to follow a Gaussian distribution, each weight should be constrained by $-3\sqrt{\lambda_m^{(s)}} \leq b_m^{(s)} \leq +3\sqrt{\lambda_m^{(s)}}$, where $\sqrt{\lambda_m^{(s)}}$ is the standard deviation of the parameter and $m \in \{1, 2, \cdots, 3n_p\}$ is the mode index. For any shape from the training set, the mode weight vector is obtained by

$$\mathbf{b}_i^{(s)} = \mathbf{P}_s^T(\mathbf{x}_i - \bar{\mathbf{x}}), \qquad i = 1, 2, \cdots, n_s \tag{3.21}$$
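Equations 3.18 to 3.21 and the $\pm 3\sqrt{\lambda}$ constraint can be sketched in NumPy as follows. All function names are illustrative assumptions, and the shape vectors are assumed to be stored one per row. Tiny negative eigenvalues caused by rounding are clipped to zero so that the square root is defined.

```python
import numpy as np

def build_shape_model(shape_vectors):
    """shape_vectors: (n_s, 3 n_p) array, one aligned shape vector per row."""
    data = np.asarray(shape_vectors, dtype=float)
    x_bar = data.mean(axis=0)
    D = (data - x_bar).T                      # (3 n_p, n_s), columns x_i - x_bar
    S = D @ D.T / data.shape[0]               # MLE covariance, equation 3.18
    eigvals, P = np.linalg.eigh(S)            # ascending order for symmetric S
    order = np.argsort(eigvals)[::-1]         # re-sort into descending order
    eigvals = np.clip(eigvals[order], 0.0, None)
    return x_bar, P[:, order], eigvals

def shape_parameters(x, x_bar, P):
    """Mode weights b = P^T (x - x_bar), equation 3.21."""
    return P.T @ (x - x_bar)

def constrain(b, eigvals, k=3.0):
    """Limit each weight to [-k sqrt(lambda_m), +k sqrt(lambda_m)]."""
    limit = k * np.sqrt(eigvals)
    return np.clip(b, -limit, limit)

def shape_instance(b, x_bar, P):
    """Shape instance x = x_bar + P b, equation 3.20."""
    return x_bar + P @ b
```

With the full set of eigenvectors, `shape_instance(shape_parameters(x, x_bar, P), x_bar, P)` reproduces `x` up to rounding, because $\mathbf{P}_s$ is orthogonal.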
When the shape contains many points, we would like to represent it with the most significant modes only and treat the remaining ones as noise. Since the variance explained by each eigenvector equals the corresponding eigenvalue, and the eigenvalues are arranged in descending order, it is practical to keep the first $t_s$ modes. We introduce $p$ to denote the percentage of retained variation, and $t_s$ can be chosen such that

$$\sum_{m=1}^{t_s}\lambda_m^{(s)} \geq \frac{p}{100\%}\sum_{m=1}^{3n_p}\lambda_m^{(s)} \tag{3.22}$$

When the number of shapes in the training set $n_s$ is much smaller than the dimension $3n_p$ of the shape vectors, computing the eigenvalues and eigenvectors directly is expensive, since the covariance matrix is huge. Moreover, if there are $n_s$ shapes, there are at most $n_s - 1$ non-zero eigenvalues (not counting numerically zero values). As a result, in the case of $n_s - 1 < 3n_p$, we can calculate the eigenvalues and eigenvectors faster in the following way. We define an $n_s \times n_s$ matrix $\mathbf{S}'$ as

$$\mathbf{S}' = \frac{1}{n_s}\mathbf{D}^T\mathbf{D} \tag{3.23}$$

We then find the eigenvectors and corresponding eigenvalues of $\mathbf{S}'$ as $\mathbf{P}'_s = (\mathbf{p}'^{(s)}_1, \mathbf{p}'^{(s)}_2, \cdots, \mathbf{p}'^{(s)}_{n_s})$ and $\boldsymbol{\Lambda}'_s = \operatorname{diag}\{\lambda'^{(s)}_1, \lambda'^{(s)}_2, \cdots, \lambda'^{(s)}_{n_s}\}$, respectively. According to [16], the first $n_s$ eigenvalues of $\mathbf{S}$ (already in descending order) are the same as the eigenvalues of $\mathbf{S}'$, and the corresponding eigenvectors of $\mathbf{S}$ are linear combinations of the eigenvectors of $\mathbf{S}'$. This can be written as
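$$\lambda_i^{(s)} = \lambda'^{(s)}_i, \qquad i = 1, 2, \cdots, n_s \tag{3.24}$$

$$\mathbf{p}_i^{(s)} = \frac{1}{\sqrt{\lambda'^{(s)}_i n_s}}\,\mathbf{D}\mathbf{p}'^{(s)}_i \tag{3.25}$$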
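The reduced eigenproblem of equations 3.23 to 3.25 can be sketched as follows. The function name `small_sample_pca` and the relative tolerance `rel_tol` are assumptions. Since equation 3.25 divides by $\sqrt{\lambda'^{(s)}_i n_s}$, the sketch drops eigenvalues that are numerically zero before mapping the eigenvectors back.

```python
import numpy as np

def small_sample_pca(D, rel_tol=1e-10):
    """D: (3 n_p, n_s) matrix whose columns are mean-subtracted shape vectors.

    Returns the non-zero eigenvalues of S = D D^T / n_s in descending order
    and the corresponding unit eigenvectors as columns.
    """
    D = np.asarray(D, dtype=float)
    n_s = D.shape[1]
    S_small = D.T @ D / n_s                          # equation 3.23, (n_s, n_s)
    eigvals, P_small = np.linalg.eigh(S_small)       # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, P_small = eigvals[order], P_small[:, order]

    # Keep only eigenvalues clearly above rounding noise (at most n_s - 1).
    keep = eigvals > rel_tol * eigvals[0]
    eigvals, P_small = eigvals[keep], P_small[:, keep]

    # Equation 3.25: map back and normalize each column to unit length.
    P = D @ P_small / np.sqrt(eigvals * n_s)
    return eigvals, P
```

The eigendecomposition is performed on an $n_s \times n_s$ matrix instead of a $3n_p \times 3n_p$ one, which is where the saving comes from when $n_s \ll 3n_p$.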
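Finally, with the first $t_s$ modes, the shape instance is

$$\mathbf{x} \approx \bar{\mathbf{x}} + \mathbf{P}_{s,t_s}\mathbf{b}_{s,t_s} = \bar{\mathbf{x}} + \sum_{m=1}^{t_s}\mathbf{p}_m^{(s)} b_m^{(s)} \tag{3.26}$$

For brevity, $\mathbf{P}_s$ is used from here on to denote $\mathbf{P}_{s,t_s}$, which contains the first $t_s$ eigenvectors as its columns.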
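Choosing $t_s$ by the retained variance criterion of equation 3.22 and forming the truncated approximation of equation 3.26 can be sketched as below. The function names and the default value $p = 95$ are assumptions for illustration; the eigenvalues are assumed to be sorted in descending order already.

```python
import numpy as np

def choose_num_modes(eigvals, p=95.0):
    """Smallest t_s whose leading eigenvalues hold at least p percent of the variance."""
    eigvals = np.asarray(eigvals, dtype=float)
    cumulative = np.cumsum(eigvals)
    target = p / 100.0 * cumulative[-1]
    # Index of the first cumulative sum that reaches the target, plus one.
    return int(np.searchsorted(cumulative, target) + 1)

def approximate_shape(x, x_bar, P, t_s):
    """Project x onto the first t_s modes and reconstruct it (equation 3.26)."""
    P_t = P[:, :t_s]                     # first t_s eigenvectors as columns
    b_t = P_t.T @ (x - x_bar)            # truncated mode weights
    return x_bar + P_t @ b_t
```

For example, with eigenvalues $(6, 3, 1)$ the cumulative sums are $(6, 9, 10)$, so retaining 80% of the variance requires two modes, while retaining 95% requires all three.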