6.2 Building a Model

In general terms, the modelling problem is one of representing a particular class of object or data parametrically. A model is defined by a basis, which spans a model space that can be indexed by parameters. The model space represents the possible variation in the data, and a set of parameters can be used to synthesise any variation of the object or data being modelled.

To construct a sparse shape model for face landmarks, the FRGC dataset is used. Building a model does not depend on a particular number of landmarks, so throughout the discussion of building and fitting a model the full set of hand-labelled landmarks shown in figure 6.1 is used.

Chapter 4 showed that some landmarks, such as the pronasale, are more easily detected than others, such as the exocanthions. This is ignored when discussing the building and fitting of the model, because neither step depends on candidates. When fitting against candidates from the detector of the previous chapter, it may be necessary to remove the landmarks that are more difficult to detect from the model.

To construct a sparse shape model of the face landmarks, each face must first be aligned to a frame for the model. A frame is the fixed alignment in which the model operates.

Principal component analysis (PCA) is used to construct the model space. Section 5.3.2 describes the use of PCA to reduce the dimensionality of data. When defining a model with PCA, the eigenvectors from the principal component analysis provide the basis for the model space and the eigenvalues represent the amount of variation in the data captured by each eigenvector. In the model building phase, correspondence between the training landmarks is known, so the shape model is constructed to capture the variation in the position of each landmark in relation to the others.
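The PCA construction described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: it assumes the training landmarks are already aligned and stored as an array of shape (N, L, 3), and the function names `build_shape_model`, `project` and `synthesise` are hypothetical.

```python
import numpy as np

def build_shape_model(shapes):
    """Build a PCA shape model from aligned landmark sets.

    shapes: array of shape (N, L, 3), N faces with L corresponding landmarks.
    Returns the mean shape vector (3L,), the basis (one eigenvector per row)
    and the eigenvalues (variance captured by each eigenvector), both ordered
    from largest to smallest eigenvalue.
    """
    n = shapes.shape[0]
    X = shapes.reshape(n, -1)              # each face becomes one 3L-vector
    mean = X.mean(axis=0)
    # SVD of the centred data: the rows of Vt are the eigenvectors of the
    # sample covariance matrix, and s**2 / (n - 1) are its eigenvalues.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    eigenvalues = s ** 2 / (n - 1)
    return mean, Vt, eigenvalues

def project(mean, basis, shape, k):
    """Model parameters of a landmark set, using the first k eigenvectors."""
    return basis[:k] @ (shape.ravel() - mean)

def synthesise(mean, basis, params):
    """Landmark set (L, 3) generated by the given model parameters."""
    k = len(params)
    return (mean + params @ basis[:k]).reshape(-1, 3)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    training = rng.normal(size=(20, 5, 3))     # synthetic stand-in data
    mean, basis, eigenvalues = build_shape_model(training)
    b = project(mean, basis, training[0], k=3)
    approx = synthesise(mean, basis, b)        # 3-parameter approximation
    print(approx.shape, eigenvalues[:3])
```

Setting all parameters to zero returns the mean shape, and truncating to the first k eigenvectors keeps the directions of largest variation.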

6.2.1 Alignment

The faces in the FRGC dataset are not aligned to any canonical form or frame, so they must be brought into alignment with each other before the model can be constructed. When using PCA to construct the sparse shape model, the variation in the coordinate positions (x, y, z) of each corresponding landmark is captured. The aim of modelling is to capture the variation in the shape of the face and the relative positions of the landmarks. Since the model encapsulates the positional variation of each landmark in the dataset, the faces must be aligned beforehand; otherwise the variation in the positions of the faces will overwhelm the much smaller variation in relative landmark positions. The specific alignment frame for the model is not important in itself, only that each of the training faces is aligned to it. The alignment is performed using generalised Procrustes analysis (GPA)[60, 131, 132].

When aligning each set of landmark points to the model frame, the scaling portion of the alignment is ignored. Dryden and Mardia[60] define shape as all geometrical information that remains when location, scale and rotational effects are removed. Ignoring scale in the alignment process means that the scale component is included in the sparse shape model along with the other positional variations. Strictly, this makes our sparse shape model a shape and size model. Scale was included in the model because, when fitting the model to a set of input points, scale must be accounted for whether it is handled in the model or in a separate alignment step. By including scale in the model, it is calculated along with the other parameters and an alignment step is removed.

Also, as the landmark locations and distances between them are an absolute measurement from each face, the scale property is inherent to the facial geometry rather than a factor introduced by imaging.

Algorithm 6.1 shows the generalised Procrustes analysis (GPA) procedure used to align the input faces and find a model frame. The algorithm repeatedly calculates a mean set of landmarks and then aligns each face to this mean. The final orientation of the mean landmark set represents the selected model frame. Multiple iterations are used to ensure the best fit to the mean for each face.

Algorithm 6.1 Generalised Procrustes Analysis Alignment

Input: F (set of corresponding landmarks for each face), ε (threshold), imax (maximum iterations)
Output: F0 (all landmark sets aligned)

γ ← ∞

The alignment method (align) minimises the sum of the squared distances between corresponding points of the input landmark sets and the calculated mean. Therefore, in the first iteration we are essentially finding an approximate frame for the model. The initial calculated mean does not represent the true mean location of the landmarks, as it is contaminated by the different frames each face is in. Subsequent iterations, where the mean is recalculated and the input landmarks are realigned, allow the selected frame, mean landmark positions and input alignment to be refined.
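For reference, the least-squares criterion minimised by align can be written out as below. The notation is an assumption introduced here, not taken from the algorithm listing: p_l is the l-th landmark of the input face, m_l the corresponding landmark of the current mean, R a rotation and t a translation. No scale factor appears, since scale is deliberately left out of the alignment.

```latex
(\hat{R}, \hat{t}) \;=\; \arg\min_{R \in SO(3),\; t \in \mathbb{R}^{3}}
\sum_{l=1}^{L} \left\lVert R\,p_l + t - m_l \right\rVert^{2}
```

One common closed-form solution to this problem is the SVD-based (Kabsch) method, although the text does not state which solver is used.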

Algorithm 6.1 has three inputs: a set F of faces F, where each face is the set of its landmark coordinates,

F = {(x_l, y_l, z_l) : l = 1, …, L}

for L landmarks; a distance error threshold ε; and a maximum number of allowed iterations imax. The output is the modified set F0, in which every face F ∈ F0 is aligned. Figure 6.2 shows the effect of algorithm 6.1 on a collection of face landmarks from the FRGC dataset.

There are two stopping criteria in this algorithm: a threshold on the change in distance error and a maximum number of iterations, imax. The first stopping criterion refers to the distance error, which is the summed distance between each landmark and its equivalent in the mean landmark set. A summed distance error is calculated for each input set of landmarks, and the algorithm stops when the maximum change in distance error over all landmark sets in the current iteration falls below a threshold, usually set to 0.05mm. This stops the alignment once all the landmark sets have settled into a frame. The second stopping criterion limits the number of iterations, ensuring termination in the case that the landmark sets oscillate between alignments and never satisfy the first criterion.
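The iteration and its two stopping criteria can be sketched as follows. This is a minimal illustration under stated assumptions, not a transcription of algorithm 6.1: the rigid alignment uses the SVD-based (Kabsch) solution, the distance error is measured against the mean used for that iteration's alignment, and the names `align`, `distance_error` and `gpa` are chosen here for illustration.

```python
import numpy as np

def align(points, target):
    """Rigidly align points (L, 3) to target (L, 3) by rotation and
    translation only (no scaling), minimising the sum of squared distances
    between corresponding points (Kabsch/SVD solution, an assumption)."""
    pc, tc = points.mean(axis=0), target.mean(axis=0)
    H = (points - pc).T @ (target - tc)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that the result is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (points - pc) @ R.T + tc

def distance_error(points, mean):
    """Summed distance between each landmark and its mean equivalent."""
    return np.linalg.norm(points - mean, axis=1).sum()

def gpa(faces, eps=0.05, i_max=100):
    """Align all landmark sets to a common frame.

    faces: sequence of (L, 3) arrays with corresponding landmarks.
    eps:   threshold on the maximum change in distance error (e.g. in mm).
    i_max: maximum number of iterations.
    Returns the aligned faces (N, L, 3), their mean and the iteration count.
    """
    aligned = np.array(faces, dtype=float)
    prev_err = np.full(len(aligned), np.inf)   # so that gamma starts at infinity
    iterations = 0
    for iterations in range(1, i_max + 1):
        mean = aligned.mean(axis=0)
        aligned = np.stack([align(f, mean) for f in aligned])
        err = np.array([distance_error(f, mean) for f in aligned])
        gamma = np.max(np.abs(prev_err - err))  # largest change over all faces
        prev_err = err
        if gamma < eps:                         # first stopping criterion
            break
    # Falling out of the loop without a break is the second criterion (i_max).
    return aligned, aligned.mean(axis=0), iterations
```

Because `align` applies no scaling, distances between landmarks within each face are preserved, so size differences between faces remain in the aligned data for the model to capture.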