
3.3 The Era of eigenfaces

3.3.5 Probabilistic Principal Component Analysis (PPCA)

The so-called probabilistic principal component analysis (PPCA) model, which is based on the factor analysis model, was developed by Tipping and Bishop [212, 211] and independently by Roweis [188, 189] (under the name of sensible PCA). To our knowledge, this model has not been used for face modeling, but it has been applied to build class hierarchies of objects [57] and to analyze and visualize multispectral data [11]. The PPCA model has the advantage that it leaves more room for interpretation and extension than the model proposed by Moghaddam and Pentland. We first describe the PPCA model and then discuss its relation to the model of Moghaddam and Pentland.

[Figure 3.5 (image not reproduced); labels: y − µ, w1, w2, w3, DFFS, DIFS, p_F̄(y|Ω)]

Figure 3.5: An illustrative example of the two distances DFFS and DIFS (see the text). The subspace is spanned by the two principal components w1 and w2; the complementary space is spanned by w3. The in-subspace distribution of the object is drawn as a dotted ellipse. For the observed vector y, both DFFS and DIFS are evaluated to determine whether the object is close to its class distribution.

The model is a linear latent variable model of the form

$$y = Wx + \mu + \epsilon, \tag{3.10}$$

where a new random variable has been introduced: the Q-dimensional latent (subspace) variable x. Its components are independent and identically distributed (i.i.d.) Gaussians with unit variance, $x \sim \mathcal{N}(0, I_Q)$. The noise is also Gaussian and isotropic, $\epsilon \sim \mathcal{N}(0, \sigma^2 I_D)$. Here, the D × Q factor loading (or generation) matrix W does not in general consist of the eigenvectors of the covariance matrix, and its columns are mutually orthogonal only for particular choices of the rotation R in Eq. 3.15 (e.g. R = I). Under the model assumptions we can derive the following distributions ([212] and our calculations):

• The distribution of the observation is Gaussian, $y \sim \mathcal{N}(\mu, \Sigma_y)$, with covariance (see for example [4, p. 553]):

$$\Sigma_y = \sigma^2 I_D + WW^T. \tag{3.11}$$

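If the columns of W are mutually orthogonal, let $W_Q$ contain these columns normalized to unit length, and denote by $W_{D-Q}$ a matrix whose orthonormal columns span the orthogonal complement of the space spanned by $W_Q$. The covariance can then be written as (cf. Eq. 3.8):

$$\Sigma_y = \begin{bmatrix} W_Q & W_{D-Q} \end{bmatrix} \begin{bmatrix} \sigma_1^2+\sigma^2 & & & & & 0 \\ & \ddots & & & & \\ & & \sigma_Q^2+\sigma^2 & & & \\ & & & \sigma^2 & & \\ & & & & \ddots & \\ 0 & & & & & \sigma^2 \end{bmatrix} \begin{bmatrix} W_Q^T \\ W_{D-Q}^T \end{bmatrix}, \tag{3.12}$$

where $\sigma_q^2$, $q = 1, \dots, Q$, are the variances induced by the systematic part of the model, $Wx$ (the squared lengths of the columns of W).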

• The conditional distribution of the observation given the latent variable is $p(y \mid x) = \mathcal{N}(Wx + \mu, \sigma^2 I_D)$. This is also the key motivation for the model: the components of the observed image y are conditionally independent given the latent subspace variable x.

• The posterior distribution of the latent variable given the observation is Gaussian, $p(x \mid y) = \mathcal{N}(\mu_x, \Sigma_{x|y})$, where

$$\mu_x = (\sigma^2 I_Q + W^T W)^{-1} W^T (y - \mu), \tag{3.13}$$

and

$$\Sigma_{x|y} = \sigma^2 (\sigma^2 I_Q + W^T W)^{-1}.$$
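The three distributions above can be checked numerically. The following is a minimal NumPy sketch, assuming a small random toy generation matrix rather than a learned one; the helper names `sample_ppca`, `marginal_cov` and `posterior` are hypothetical. It draws samples from Eq. 3.10, compares their sample covariance with Eq. 3.11, and verifies that the posterior of Eq. 3.13 agrees with generic Gaussian conditioning of the joint distribution of (x, y).

```python
import numpy as np


def sample_ppca(W, mu, sigma2, n, rng):
    """Draw n observations from y = W x + mu + eps (Eq. 3.10), one per row."""
    D, Q = W.shape
    x = rng.standard_normal((n, Q))                      # x ~ N(0, I_Q)
    eps = np.sqrt(sigma2) * rng.standard_normal((n, D))  # eps ~ N(0, sigma^2 I_D)
    return x @ W.T + mu + eps


def marginal_cov(W, sigma2):
    """Model covariance of y, Eq. 3.11: sigma^2 I_D + W W^T."""
    return sigma2 * np.eye(W.shape[0]) + W @ W.T


def posterior(y, W, mu, sigma2):
    """Posterior mean and covariance of x given y, Eq. 3.13 (Q x Q inverse only)."""
    M_inv = np.linalg.inv(sigma2 * np.eye(W.shape[1]) + W.T @ W)
    return M_inv @ W.T @ (y - mu), sigma2 * M_inv


# Toy parameters (hypothetical, not from the text).
rng = np.random.default_rng(0)
D, Q, sigma2 = 5, 2, 0.1
W = rng.standard_normal((D, Q))
mu = rng.standard_normal(D)

# Eq. 3.11: the sample covariance of many draws approaches sigma^2 I + W W^T.
Y = sample_ppca(W, mu, sigma2, 200_000, rng)
emp_cov = np.cov(Y, rowvar=False)
S = marginal_cov(W, sigma2)
print(np.abs(emp_cov - S).max())  # small; only Monte Carlo error remains

# Eq. 3.13 versus conditioning the joint Gaussian of (x, y), where Cov(x, y) = W^T:
# E[x|y] = W^T S^{-1} (y - mu),  Cov[x|y] = I_Q - W^T S^{-1} W.
y = Y[0]
m, C = posterior(y, W, mu, sigma2)
m_ref = W.T @ np.linalg.solve(S, y - mu)
C_ref = np.eye(Q) - W.T @ np.linalg.solve(S, W)
print(np.allclose(m, m_ref), np.allclose(C, C_ref))  # True True
```

The two forms of the posterior mean are related by the identity $(\sigma^2 I_Q + W^TW)^{-1}W^T = W^T(\sigma^2 I_D + WW^T)^{-1}$, which is why Eq. 3.13 only needs a Q × Q inverse instead of a D × D one.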

Note that the last term in Eq. 3.13, $W^T(y - \mu)$, is the same as the PCA projection, Eq. 3.2. In PPCA, the expectation of the posterior, $\mu_x$, is indeed the most appropriate choice for projecting an observation into the subspace. We discuss this kind of posterior distribution in depth, under more general model assumptions, in the next chapter. In [212], Tipping and Bishop derive the exact maximum-likelihood estimates of this model, expressed in terms of the sample covariance matrix $\hat{\Sigma}_y$. The resulting estimators are:

• The mean vector

$$\hat{\mu} = \frac{1}{J} \sum_{j=1}^{J} y_j. \tag{3.14}$$

• The generation matrix

$$\hat{W} = U_Q (\Lambda_Q - \hat{\sigma}^2 I_Q)^{1/2} R, \tag{3.15}$$

where the Q columns of the D × Q matrix $U_Q$ are the eigenvectors of the sample covariance matrix $\hat{\Sigma}_y$ corresponding to its Q largest eigenvalues, which form the Q × Q diagonal matrix $\Lambda_Q$. The matrix R is an arbitrary Q × Q orthogonal (rotation) matrix. This arbitrariness reflects an ambiguity in the model of Eq. 3.10: the distribution p(y) is invariant to a rotation of the generation matrix, since $(WR)(WR)^T = WW^T$ leaves the model covariance of Eq. 3.11 unchanged.

• The noise variance

$$\hat{\sigma}^2 = \frac{1}{D - Q} \sum_{q=Q+1}^{D} \lambda_q, \tag{3.16}$$

where $\lambda_{Q+1}, \dots, \lambda_D$ are the D − Q smallest eigenvalues of $\hat{\Sigma}_y$. This maximum-likelihood estimate is exactly the same as the estimate found by Moghaddam and Pentland by minimizing the Kullback-Leibler divergence, Eq. 3.9.
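A minimal NumPy sketch of the closed-form estimators in Eqs. 3.14 to 3.16 follows. The function name `ppca_ml`, the toy data, and the default choice R = I are assumptions for illustration; the sample covariance is normalized by J, as in the maximum-likelihood setting. The checks confirm that the fitted model covariance keeps the Q leading eigenvalues of $\hat{\Sigma}_y$ and replaces the remaining ones by $\hat{\sigma}^2$, and that rotating $\hat{W}$ by an orthogonal R does not change $\hat{W}\hat{W}^T$.

```python
import numpy as np


def ppca_ml(Y, Q, R=None):
    """Hypothetical helper: closed-form ML estimates (Eqs. 3.14-3.16) from a J x D matrix Y."""
    J, D = Y.shape
    mu = Y.mean(axis=0)                        # Eq. 3.14
    S = (Y - mu).T @ (Y - mu) / J              # ML sample covariance (normalized by J)
    lam, U = np.linalg.eigh(S)                 # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]             # reorder to descending
    sigma2 = lam[Q:].mean()                    # Eq. 3.16: mean of the D - Q smallest
    if R is None:
        R = np.eye(Q)                          # assumed default; any orthogonal R is valid
    W = U[:, :Q] @ np.diag(np.sqrt(lam[:Q] - sigma2)) @ R  # Eq. 3.15
    return mu, W, sigma2, lam


# Toy data (hypothetical): a strong 2-D signal plus isotropic noise in 6 dimensions.
rng = np.random.default_rng(1)
J, D, Q = 500, 6, 2
Y = rng.standard_normal((J, Q)) @ rng.standard_normal((Q, D)) * 3 + 0.5 * rng.standard_normal((J, D))

mu, W, sigma2, lam = ppca_ml(Y, Q)
C = W @ W.T + sigma2 * np.eye(D)  # model covariance, Eq. 3.11

# The Q leading eigenvalues are kept; the rest are replaced by their mean sigma2.
print(np.allclose(np.sort(np.linalg.eigvalsh(C))[::-1],
                  np.r_[lam[:Q], np.full(D - Q, sigma2)]))  # True

# Rotating W by an orthogonal R leaves W W^T, and hence p(y), unchanged.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
_, W_rot, _, _ = ppca_ml(Y, Q, R)
print(np.allclose(W_rot @ W_rot.T, W @ W.T))  # True
```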


With these estimates, the distribution of the observations, p(y), becomes exactly the same as the distribution found by Moghaddam and Pentland (i.e. with the estimated generation matrix of Eq. 3.15, the global covariance matrix of Eq. 3.12 becomes equal to the global covariance matrix of Eq. 3.8). This explains the relation between the two models: since Moghaddam and Pentland do not model the subspace variable as an independent random variable, their model is obtained from the PPCA model as the marginal of the joint distribution p(y, x), evaluated at the maximum-likelihood estimate $\hat{\Theta} = (\hat{W}, \hat{\mu}, \hat{\sigma})$:

$$p(y)\big|_{\hat{\Theta}} = \int p(y, x)\big|_{\hat{\Theta}} \, dx.$$
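The integral need not be evaluated explicitly: since y is a linear function of the independent Gaussian variables x and ε (Eq. 3.10), it is Gaussian with mean μ and covariance $W\,\mathrm{Cov}[x]\,W^T + \mathrm{Cov}[\epsilon] = WW^T + \sigma^2 I_D$. Substituting Eqs. 3.15 and 3.16 then gives the eigen-decomposed form directly. Here $U_{D-Q}$ is notation introduced only for this derivation, denoting the remaining D − Q eigenvectors of $\hat{\Sigma}_y$, so that $U_Q U_Q^T + U_{D-Q} U_{D-Q}^T = I_D$; the rotation R drops out because $RR^T = I_Q$.

$$
\begin{aligned}
p(y)\big|_{\hat{\Theta}} &= \int \mathcal{N}(y \mid \hat{W}x + \hat{\mu},\, \hat{\sigma}^2 I_D)\, \mathcal{N}(x \mid 0, I_Q)\, dx = \mathcal{N}(y \mid \hat{\mu},\, \hat{W}\hat{W}^T + \hat{\sigma}^2 I_D), \\
\hat{W}\hat{W}^T + \hat{\sigma}^2 I_D &= U_Q (\Lambda_Q - \hat{\sigma}^2 I_Q) U_Q^T + \hat{\sigma}^2 \left(U_Q U_Q^T + U_{D-Q} U_{D-Q}^T\right) \\
&= U_Q \Lambda_Q U_Q^T + \hat{\sigma}^2 U_{D-Q} U_{D-Q}^T.
\end{aligned}
$$

The leading Q eigenvalues of $\hat{\Sigma}_y$ are thus kept in the principal subspace, while the complementary space receives their average residual value $\hat{\sigma}^2$.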

As an illustrative example, consider an object whose appearance changes only in the lower region of the image. Suppose further that both models identify this region as the principal subspace, with, say, as many principal axes as there are pixels in the region. Whereas the model of Moghaddam and Pentland considers noise to be present only in the upper region of the image (the orthogonal, complementary space), the PPCA model assumes additive noise everywhere in the image. This can also be seen by comparing Eqs. 3.12 and 3.8. To our knowledge, the relation between these two models has not been clearly stated in the literature; however, Moghaddam points out in [161] that their model is a special case of the PPCA model.
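The example can be made concrete with a toy six-pixel "image". The sketch below is a hypothetical illustration, not taken from the source: the covariance values are invented, and the decomposition of Moghaddam and Pentland is assumed, following the description above, to place the full eigenvalues in the principal subspace and the noise variance $\hat{\sigma}^2$ only in the complementary space. Both decompositions give the same total covariance but attribute the noise to different pixels.

```python
import numpy as np

# Hypothetical toy "image" with D = 6 pixels: indices 3..5 form the lower region,
# whose appearance varies strongly; indices 0..2 form the upper region.
# The covariance is diagonal, so its eigenvectors are the pixel axes.
Sigma = np.diag([0.1, 0.2, 0.3, 5.0, 4.0, 3.0])
D, Q = Sigma.shape[0], 3

lam, U = np.linalg.eigh(Sigma)
lam, U = lam[::-1], U[:, ::-1]          # descending eigenvalues
U_Q, U_rest = U[:, :Q], U[:, Q:]        # principal subspace = lower-region pixels
sigma2 = lam[Q:].mean()                 # Eq. 3.16, about 0.2 here

# PPCA: signal W W^T plus isotropic noise sigma^2 on every pixel.
W = U_Q @ np.diag(np.sqrt(lam[:Q] - sigma2))
ppca_noise = sigma2 * np.eye(D)

# Assumed Moghaddam-Pentland split: full eigenvalues inside the subspace,
# "noise" only in the complementary (upper-region) space.
mp_noise = sigma2 * U_rest @ U_rest.T

print(np.diag(ppca_noise))  # about 0.2 on all six pixels
print(np.diag(mp_noise))    # about 0.2 on the upper pixels, 0 on the lower ones

# Both decompositions give the same total covariance:
print(np.allclose(W @ W.T + ppca_noise,
                  U_Q @ np.diag(lam[:Q]) @ U_Q.T + mp_noise))  # True
```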