Face Reconstruction using Texture Mapped Shape Models

The various stages of the proposed approach, which were illustrated in figure 1.4, are now utilized for finding an approximation to a 3D shape. Some of these stages such as 3D correspondence, alignment, and texture mapping were described in the previous chapters. The shape estimation approach being discussed here depends to a great extent on these individual processes and can be affected by their accuracy.

5.2.1 Encoding 3D Face Shape Variation

Given a set of 3D face surfaces, once a 3D-3D correspondence is established, every vertex is effectively a landmarked point. This allows the variation in the set of face shapes to be encoded statistically. Each face surface (S) can be represented in vector form as follows:

$$S = [x_1, y_1, z_1, \ldots, x_M, y_M, z_M] \tag{5.1}$$

A statistical shape model is then obtained by performing principal component analysis on the normalized face space. The first step in principal component analysis is to average the face space:

$$\bar{S} = \frac{1}{M} \sum_{i=1}^{M} S_i \tag{5.2}$$

Next, each vector is centered about the mean obtained above and the resultant vectors are used for calculating the covariance matrix (C):

$$S_i = S_i - \bar{S} \tag{5.3}$$

$$C = \frac{1}{M} \sum_{i=1}^{M} S_i S_i^T \tag{5.4}$$

The eigenvectors (φ) and eigenvalues (λ) are then obtained by eigendecomposition of the covariance matrix (equation 5.4). The shape parameters ($\tilde{\alpha}$) for a shape surface can be obtained as:

$$\tilde{\alpha}_i = \phi_i^T (S - \bar{S}) \tag{5.5}$$

where i = 1...M − 1. The normalized shape parameters are obtained as follows:

$$\alpha_i = \frac{\tilde{\alpha}_i}{\sqrt{\lambda_i}} \tag{5.6}$$

Finally, the statistical shape model is constructed using principal component analysis of shape space:

$$S = \bar{S} + \sum_{i=1}^{t} \alpha_i \sqrt{\lambda_i}\, \phi_i \tag{5.7}$$
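As a rough illustration of equations 5.2 to 5.7, the sketch below builds a shape model with NumPy. It is not the implementation used in this work: the function names (`build_shape_model`, `encode`, `decode`) are hypothetical, the faces are assumed to be in correspondence already and flattened one per row, and the eigenvectors are taken from a singular value decomposition of the centered data instead of forming the covariance matrix explicitly, which should yield the same modes.

```python
import numpy as np

def build_shape_model(shapes):
    """shapes: (M, 3N) array, one flattened face [x1, y1, z1, ...] per row."""
    shapes = np.asarray(shapes, dtype=float)
    num_faces = shapes.shape[0]
    mean = shapes.mean(axis=0)                      # eq. 5.2
    centered = shapes - mean                        # eq. 5.3
    # Right singular vectors of the centered data are the eigenvectors of
    # C = (1/M) * centered.T @ centered (eq. 5.4); eigenvalues are s**2 / M.
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = sing ** 2 / num_faces
    keep = num_faces - 1                            # at most M - 1 non-zero modes
    return mean, vt[:keep].T, eigvals[:keep]

def encode(shape, mean, modes, eigvals):
    """Normalized shape parameters of one face (eqs. 5.5 and 5.6)."""
    raw = modes.T @ (np.asarray(shape, dtype=float) - mean)
    return raw / np.sqrt(eigvals)

def decode(alpha, mean, modes, eigvals):
    """Rebuild a face from its first t normalized parameters (eq. 5.7)."""
    alpha = np.asarray(alpha, dtype=float)
    t = len(alpha)
    return mean + modes[:, :t] @ (alpha * np.sqrt(eigvals[:t]))
```

Encoding a training face and decoding it with all M − 1 modes should return the original vector up to rounding; truncating `alpha` to fewer entries gives the t-mode approximation.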

The value of α for 3D face models normally varies within ±3 standard deviations. The proportion of the total shape variation represented by the t modes in equation 5.7 is given as:

$$\frac{\sum_{i=1}^{t} \lambda_i}{\sum_{i=1}^{M-1} \lambda_i} \tag{5.8}$$
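The ratio in equation 5.8 is commonly used to decide how many modes to retain. The following is a minimal sketch under that assumption; `modes_for_coverage` and its `target` threshold are hypothetical helpers, not something specified in this work.

```python
def variance_fraction(eigvals, t):
    """Fraction of the total variance captured by the t largest modes (eq. 5.8)."""
    ordered = sorted(eigvals, reverse=True)
    return sum(ordered[:t]) / sum(ordered)

def modes_for_coverage(eigvals, target=0.95):
    """Smallest t whose variance fraction reaches `target` (assumed threshold)."""
    for t in range(1, len(eigvals) + 1):
        if variance_fraction(eigvals, t) >= target:
            return t
    return len(eigvals)
```

For eigenvalues 4, 3, 2 and 1, the first two modes account for 7/10 of the variation.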

To synthesize new examples from the model, the parameter vector α is varied over the t largest modes using a suitable optimization strategy.

5.2.2 Selection of the Optimization Method

Shape reconstruction is a geometric problem, and geometry is manipulated by varying the parameters of the statistical shape model. In this case, the reconstruction has already been limited to the estimation of shape parameters only, by separating the estimation of alignment, illumination and texture from shape estimation. This strategy has important implications for the choice of the optimization method. There is no need for a complex optimization scheme such as stochastic Newton optimization, since the optimization space is well behaved, being affected only by shape variations.

Downhill simplex is a multidimensional optimization algorithm described in [PTVF92]. In addition to being simple and robust, this optimization method is geometric in nature and hence is more suitable for shape analysis. During each optimization step, downhill simplex can try a number of moves, such as reflection, expansion and contraction, in one or more dimensions. These steps exploit the ability of the given statistical shape space to model the 3D shape from the target image by changing the 3D shape in different ways.
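The moves named above can be sketched in a few lines of plain Python. This is a simplified illustration of the general downhill simplex idea, not the routine from [PTVF92] nor the code used in this work; the coefficients (1, 2, 0.5), the fixed iteration count and the initial `step` are conventional choices assumed here, and `cost` stands for any function of the shape parameter vector.

```python
def downhill_simplex(cost, start, step=0.5, iterations=200):
    """Minimise cost(list_of_floats) using reflection, expansion and contraction."""
    n = len(start)
    # Initial simplex: the start point plus one vertex offset along each axis.
    simplex = [list(start)]
    for k in range(n):
        vertex = list(start)
        vertex[k] += step
        simplex.append(vertex)

    def blend(a, b, w):
        # Returns a + w * (a - b): away from b for w > 0, towards b for w < 0.
        return [ai + w * (ai - bi) for ai, bi in zip(a, b)]

    for _ in range(iterations):
        simplex.sort(key=cost)
        best, worst = simplex[0], simplex[-1]
        # Centroid of every vertex except the worst one.
        centroid = [sum(v[k] for v in simplex[:-1]) / n for k in range(n)]
        reflected = blend(centroid, worst, 1.0)            # reflection
        if cost(reflected) < cost(best):
            expanded = blend(centroid, worst, 2.0)         # expansion
            simplex[-1] = expanded if cost(expanded) < cost(reflected) else reflected
        elif cost(reflected) < cost(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = blend(centroid, worst, -0.5)      # contraction
            if cost(contracted) < cost(worst):
                simplex[-1] = contracted
            else:
                # Shrink every other vertex halfway towards the best one.
                simplex = [best] + [blend(best, v, -0.5) for v in simplex[1:]]
    return min(simplex, key=cost)
```

The sketch re-evaluates `cost` more often than necessary to keep the logic readable; a practical version would cache the values, since each evaluation here would involve rendering and comparing a face image.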

Earlier in the thesis, it was noted that shape reconstruction is a non-convex optimization problem involving many local minima. One way to reach the true minimum is to initialize the parameters close to the true solution. In addition, suitable constraints on the shape parameters can be applied to limit the shape space. However, in many cases it is not possible to initialize the shape parameters close to the target shape; therefore, the best guess is to start with the mean shape. In this research, each of the shape parameters is randomly initialized within ±0.5 standard deviations, since most of the shape variation is concentrated within this range.
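A minimal sketch of this initialization and of a simple box constraint on the parameters is given below. The text does not state which distribution is drawn from, so a uniform draw is assumed here; the ±3 limit on normalized parameters follows the range quoted in section 5.2.1, and both function names are hypothetical.

```python
import random

def init_shape_params(t, spread=0.5, seed=None):
    """Draw t normalized shape parameters within +/- spread (uniform draw assumed)."""
    rng = random.Random(seed)
    return [rng.uniform(-spread, spread) for _ in range(t)]

def clamp_shape_params(alpha, limit=3.0):
    """Keep every parameter within +/- limit standard deviations."""
    return [max(-limit, min(limit, a)) for a in alpha]
```

Clamping could be applied to each candidate parameter vector before the cost is evaluated, so that the optimizer does not wander outside the plausible shape space.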

5.2.3 Shape Estimation

After performing spatial and intensity alignment, the major difference left between the projected image and the target image is the shape difference. This shape error is minimized by optimizing over the parameters of the statistical shape model. Each of the iterations of the optimizer involves updating shape parameters using the distance between the projected face image and the target image measured by some suitable cost function.

The similarity measure or cost function is only as good as the information it is fed. In other words, the characteristics of the feature space on which the cost function is based need to be carefully considered while devising a similarity measure. Traditionally, the most important features used for shape reconstruction have been anatomical landmarks, pixel intensities and edge information. Moreover, it has been shown that a multi-feature approach considerably simplifies the optimization problem for shape reconstruction. Therefore, a cost function based on landmarks and intensity is proposed. The shape can then be estimated using such a cost function as follows:

$$T(\alpha \mid \beta, \eta) = T(\alpha_{\min} \mid \beta, \eta), \quad \text{where } \alpha_{\min} = \arg\min_{\alpha} (D) \tag{5.9}$$

Here D represents suitable distance measures between the pixel intensities and between the corresponding landmark points of the two images.

Since pixel intensities and landmarks have different units, combining distance measures based on them becomes an issue. To handle this issue, Cootes et al. [CET98] assigned appropriate weights to the individual features. The resulting cost function F(x) is obtained by summing the contributions of the individual features f1(x), f2(x), ..., fk(x) and can be written as:

$$F(x) = w_1 f_1(x) + w_2 f_2(x) + \ldots + w_k f_k(x) \tag{5.10}$$

The distance measure (D) in equation 5.9 can be simplified by summing the individual cost functions for landmarks and intensity while using an appropriate weight, and is of the following general form:

$$D = \log(D_{\text{Intensity}}(I_p, I_{inp})) + (wt)\,\log(D_{\text{Landmark}}(L_p, L_{inp})) \tag{5.11}$$

Here Ip, Iinp, Lp and Linp are the projected and input images and landmarks respectively, and DIntensity and DLandmark represent the Euclidean distances between the corresponding pixel intensities and landmarks.

image intensity to optimize over the shape parameters (α) for the given camera and illumination parameters (β, η). The logarithm in equation 5.11 is used to normalize the error so that it varies in a suitable fashion.
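Equation 5.11 can be sketched in plain Python as follows. Several details are assumptions made only for illustration: images are passed as flat lists of pixel intensities, landmarks as lists of coordinate tuples, the landmark distance is taken over all stacked coordinates, and a small `eps` guards against taking the logarithm of zero when the two inputs coincide. The names `shape_cost` and `weight` are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def shape_cost(proj_pixels, input_pixels, proj_landmarks, input_landmarks,
               weight=1.0, eps=1e-12):
    """Log-scaled intensity distance plus weighted log landmark distance (eq. 5.11)."""
    d_intensity = euclidean(proj_pixels, input_pixels)
    # Stack landmark coordinates so one distance covers all points (assumption).
    flat_proj = [c for point in proj_landmarks for c in point]
    flat_input = [c for point in input_landmarks for c in point]
    d_landmark = euclidean(flat_proj, flat_input)
    # eps is an added safeguard, not part of equation 5.11.
    return math.log(d_intensity + eps) + weight * math.log(d_landmark + eps)
```

Because each term enters through a logarithm, a tenfold change in either distance shifts the cost by the same fixed amount regardless of its units, which is one way to read the normalizing role of the logarithm described above.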
