2.3 Population Likelihood Estimation

2.3.1 Modeling a Population’s Variability

Let {Qi} be a population of n objects described by one of the quantile function based representations presented in Section 2.2. This section assumes that the {Qi} are samples from an underlying probability distribution P that must be estimated. The resulting estimate, P̂, is used to define the likelihood that a new object is from P. I estimate P parametrically by assuming it is Gaussian, N(µ, Σ); i.e., I estimate its first- and second-order statistics. The details of this model are discussed below, along with the factors that determine its appropriateness and how accurately it can be estimated using principal component analysis (PCA).

First, µ can be computed simply as the linear average of {Qi}. Recall that µ will always be a valid quantile function since the space of the representation is convex. Further, µ will be representative of the population when the population lies in a linear subspace. Section 2.1 described in detail the convexity and linear subspaces of QF functions; Section 2.2 discussed how these properties are preserved for the generalized QF representations. Chapters 3 and 4 discuss the linearity of their particular populations.
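The convexity argument can be checked numerically. The following is a minimal sketch, assuming each Qi is a one-dimensional quantile function sampled at d evenly spaced probabilities, so that validity reduces to the vector being non-decreasing; the population, the sizes n and d, and the variable names are illustrative and not taken from Chapters 3 and 4.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: n = 5 objects, each a quantile function sampled at
# d = 8 evenly spaced probabilities. Sorting a random sample along each row
# yields a non-decreasing vector, i.e. a valid discretized quantile function.
n, d = 5, 8
Q = np.sort(rng.normal(size=(n, d)), axis=1)

# mu is the linear average of {Qi}.
mu = Q.mean(axis=0)

# The average of non-decreasing vectors is itself non-decreasing, so mu is
# again a valid discretized quantile function.
mu_is_valid = bool(np.all(np.diff(mu) >= 0))
print(mu_is_valid)
```

The generalized representations of Section 2.2 may carry additional constraints; the sketch only covers the monotonicity constraint of a single quantile function.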

Σ can be estimated using PCA. To understand whether PCA is appropriate in this situation, it is important to remember that PCA is typically used for two different tasks with different requirements. One task is to generate points of interest, which requires, in increasing order of stringency, convexity, linearity, and a vector space. The other task is to estimate the likelihood of points in a space, which requires, in increasing order of stringency, convexity, linearity, and Gaussianity. The first task is generative while the second is discriminative. The generative task requires a vector space so that only valid points are generated. The discriminative task, however, is typically not concerned with invalid points. Both tasks considered in Chapters 3 and 4 are discriminative, and only the probability of valid points is of interest. The likelihood of invalid points is never asked for, so the fact that they are assigned a nonzero probability is of little concern. Σ is being estimated in this section for such a discriminative task. The convexity of the space and the linearity of the population’s variation have already been discussed. Approximate Gaussianity is assumed in this section and for the populations considered in Chapters 3 and 4.

Now, I consider how well Σ can be estimated using PCA. Assuming that the population {Qi} is appropriately Gaussian, there are three main factors that determine how well Σ can be estimated: the number of points in the population, n; the dimension of the space, d; and the inherent dimensionality of the population, D. The inherent dimensionality of a population is the dimensionality of the subspace (R^D) that the population is restricted to in the full space (R^d), disregarding any noise present in the population samples.

PCA is typically considered only in terms of n and d. The populations in Chapters 3 and 4 typically have d > n, with n in the tens and d in the hundreds. This is known as a high dimension low sample size (HDLSS) situation [MCAM]. In HDLSS situations, a direct application of PCA can only estimate a singular covariance matrix, Σ0. Σ0 only estimates the likelihood of points in a subspace of R^d. The likelihood of a point x in R^d is computed by first projecting x into the computed subspace as x0 and then computing the likelihood of x0. Σ0, however, is inappropriate for the tasks considered in Chapters 3 and 4. Both tasks need to estimate how likely it is that x is from P when points not from P are expected. Thus, the likelihood of points far from the estimated subspace needs to be computed. Σ0 discards the difference between x and x0, which in some situations is the most informative quantity for determining whether x is from P.
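The HDLSS limitation can be illustrated with a small numerical sketch. The sizes n = 10 and d = 100 and the random data below are assumptions chosen only to match the "tens versus hundreds" regime described above; `basis`, `x`, and `x0` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(1)

n, d = 10, 100                      # HDLSS: d > n (illustrative sizes)
X = rng.normal(size=(n, d))         # stand-in population, one sample per row
mu = X.mean(axis=0)
Xc = X - mu                         # centered samples

# The sample covariance is d x d, but its rank is at most n - 1,
# so it is singular whenever d > n.
cov = Xc.T @ Xc / (n - 1)
rank = np.linalg.matrix_rank(cov)

# Orthonormal basis of the estimated subspace from the thin SVD.
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
basis = Vt[: n - 1]                 # rows span the subspace

# A new point that does not come from the population.
x = mu + 5.0 * rng.normal(size=d)

# x0 is the projection of x into the estimated subspace.
x0 = mu + (x - mu) @ basis.T @ basis

# The singular model scores only x0; this residual is ignored.
residual = np.linalg.norm(x - x0)
print(rank, residual)
```

Any likelihood computed from the singular covariance alone depends only on x0, so two points with very different residuals but the same projection would receive the same score.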

In order to estimate a non-singular covariance matrix, I consider Σ in terms of the population’s variation in R^D and an isotropic variation, or noise, in R^d. The covariance matrix can then be thought of as Σ = Σ0 + σ0I, the sum of a singular covariance matrix and an isotropic variance. The populations considered in Chapters 3 and 4 are shown to exhibit a low inherent dimensionality, allowing this formulation to be effective and efficient despite the high dimensionality of the space (large d) and the limited sample sizes (small n). Therefore, for the remainder of this section I will assume that D < n < d.
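As a minimal sketch of why the isotropic term removes the singularity, the following builds a rank-D matrix Σ0 in a small space and adds σ0I. The dimensions, the eigenvalues in `lam`, and the value of `sigma0` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

d, D = 6, 2                                   # illustrative sizes with D < d

# Orthonormal basis (d x D) of a D-dimensional subspace of R^d.
B, _ = np.linalg.qr(rng.normal(size=(d, D)))

lam = np.array([3.0, 1.5])                    # variances inside the subspace
Sigma0 = B @ np.diag(lam) @ B.T               # singular: rank D

sigma0 = 0.1                                  # isotropic variance
Sigma = Sigma0 + sigma0 * np.eye(d)           # non-singular sum

# Eigenvalues in ascending order: d - D copies of sigma0, then lam + sigma0.
eig = np.linalg.eigvalsh(Sigma)
print(np.linalg.matrix_rank(Sigma0), eig)
```

In this sketch every direction orthogonal to the subspace receives variance sigma0, so a point far from the subspace gets a small but well-defined likelihood instead of being projected away.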

Before discussing how to estimate Σ using PCA, I first express Σ in a different form. Any non-singular covariance matrix in R^d can be written as Σ = UΛU⁻¹, where U is a rotation matrix composed of orthogonal unit vectors and Λ is a diagonal matrix. PCA expresses Σ in such a form, where the columns of U are eigenvectors of Σ and the diagonal entries of Λ are eigenvalues of Σ. The above Σ can be expressed in this form using eigenvalues [λ1, . . . , λD, σ, . . . , σ], where there are d − D σ’s. The maximum likelihood estimate (MLE) of a covariance matrix of this form can be computed as follows. Use PCA to compute, in decreasing order by eigenvalue, the n non-zero eigenvalues, λi, with corresponding eigenvectors, Ui, in the d-dimensional space. The columns of U are U1, . . . , UD and an arbitrary orthogonal basis of the remaining (d − D)-dimensional subspace. Λ is composed of λ1, . . . , λD and

$\sigma = \frac{1}{n-D} \sum_{i=D+1}^{n} \lambda_i$,

the sum of the remaining eigenvalues normalized appropriately for the HDLSS situation.
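The estimation steps above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in Chapters 3 and 4: the function name `fit_pca_gaussian` is hypothetical, the eigenvalues are taken from a thin SVD of the centered data, and the covariance is scaled by 1/n on the assumption that the MLE convention is intended. Note that centering leaves at most n − 1 numerically non-zero eigenvalues, so the last of the n values summed here is close to zero.

```python
import numpy as np

def fit_pca_gaussian(Q, D):
    """Estimate mu, the D kept eigenvectors/eigenvalues, and sigma.

    Q : (n, d) array, one population sample per row.
    D : number of kept eigenvalues (assumed D < n < d).
    """
    n, d = Q.shape
    mu = Q.mean(axis=0)

    # Thin SVD of the centered data avoids forming the d x d covariance.
    # Singular values are returned in decreasing order.
    _, s, Vt = np.linalg.svd(Q - mu, full_matrices=False)

    # Eigenvalues of the covariance (1/n scaling is an assumption).
    eigvals = s**2 / n

    lam = eigvals[:D]            # lambda_1 .. lambda_D
    U_D = Vt[:D].T               # (d, D): columns U_1 .. U_D

    # Remaining eigenvalues, normalized by n - D as in the formula above.
    sigma = eigvals[D:].sum() / (n - D)
    return mu, U_D, lam, sigma

# Illustrative usage on random data with n = 12, d = 50, D = 3.
rng = np.random.default_rng(3)
Q = rng.normal(size=(12, 50))
mu, U_D, lam, sigma = fit_pca_gaussian(Q, D=3)
print(lam, sigma)
```

The arbitrary orthogonal basis for the remaining d − D dimensions never needs to be constructed explicitly, since all of those directions share the same eigenvalue σ.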

In my experiments in Chapters 3 and 4, I have found the above formulation to be overly sensitive to σ. This is because often d − D >> D, making σ much more important in the likelihood estimate than the D eigenvalues. Therefore, in my model I do not normalize σ by dividing by n − D; instead, I set it to the simple sum of the remaining eigenvalues. This formulation can be viewed as measuring the expected projection error onto the measured R^D subspace. The resulting Gaussian likelihood estimate contains D + 1 Mahalanobis distances rather than the d in the original formulation, which seems sensible since it is based on the inherent variability of the population instead of the arbitrary dimension of the space.
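One reading of the D + 1 Mahalanobis distances is sketched below: D terms for the coordinates of a point inside the kept subspace, plus a single term for the squared projection error scaled by σ. The function name `mahalanobis_sq` and the toy parameters are assumptions for illustration; σ is taken to be the unnormalized sum of the discarded eigenvalues, and the normalizing constant of the Gaussian is omitted.

```python
import numpy as np

def mahalanobis_sq(x, mu, U_D, lam, sigma):
    """Sum of D in-subspace terms and one projection-error term."""
    r = x - mu
    c = U_D.T @ r                      # coordinates in the kept subspace
    resid = r - U_D @ c                # component orthogonal to the subspace
    in_subspace = np.sum(c**2 / lam)   # D Mahalanobis terms
    off_subspace = resid @ resid / sigma   # 1 term: projection error / sigma
    return in_subspace + off_subspace

# Toy model in R^3 with D = 1: the kept direction is the first axis.
mu = np.zeros(3)
U_D = np.array([[1.0], [0.0], [0.0]])
lam = np.array([4.0])
sigma = 2.0                            # stands in for the summed tail eigenvalues

x = np.array([2.0, 1.0, 1.0])
dist = mahalanobis_sq(x, mu, U_D, lam, sigma)
print(dist)   # 2**2 / 4 + (1**2 + 1**2) / 2 = 2.0
```

Under this reading, a point's distance from the subspace contributes one term regardless of d, which is what keeps the estimate from being dominated by the dimension of the space.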

This section described an approach to estimating the likelihood of a population of objects described by a QF based distribution representation. Next, Sections 2.3.2 and 2.3.3 discuss other interpretations of µ and Σ. Section 2.3.4 then considers how to select the new parameter this approach introduces, the number of kept eigenvalues.