2.2 Quantile Function Generalizations
2.2.3 Quantile Function Mixtures
Section 2.1 showed that the quantile function is appropriate for modeling distributions with variation similar to location and scale change. Mixture variation, however, was shown to form nonlinear paths in the space of QFs. One approach to modeling mixture variation is to explicitly compute the underlying distributions and their mixture weights. This is the approach taken in this section, where quantile function mixtures are computed by estimating each underlying distribution by a quantile function. Let $Y$ and $X_1, \ldots, X_n$ be univariate random variables, and let $w_1, \ldots, w_n$ be mixture weights such that $Y \sim w_1 X_1 + \ldots + w_n X_n$, i.e., the distribution of $Y$ is the mixture of the distributions of the $X_i$ with weights $w_i$. A QF mixture $Q = [w_1, Q_1, w_2, Q_2, \ldots, w_n, Q_n]$ is defined, where the QF $Q$ defines the distribution followed by $Y$ and the QF $Q_i$ defines the distribution followed by $X_i$, for $i = 1, \ldots, n$.
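As an illustration of this definition, the following minimal sketch recovers a discretized QF of Y from the components of a QF mixture. It assumes each Qi is stored as an array of bi equal-mass quantiles, so that every quantile in Qi carries mass wi/bi; the function name `mixture_quantile_function` and the midpoint probability grid are hypothetical choices, not taken from the text.

```python
import numpy as np

def mixture_quantile_function(weights, qfs, bins):
    """Discretized QF of Y for a QF mixture [w1, Q1, ..., wn, Qn].

    Assumption: each Q_i is an array of b_i equal-mass quantiles, so each
    of its entries carries probability mass w_i / b_i.
    """
    values = np.concatenate([np.asarray(q, dtype=float) for q in qfs])
    masses = np.concatenate(
        [np.full(len(q), w / len(q)) for w, q in zip(weights, qfs)]
    )
    # Sort the pooled quantiles and accumulate their mass into a CDF.
    order = np.argsort(values)
    values, masses = values[order], masses[order]
    cdf = np.cumsum(masses)
    # Read the pooled distribution at the midpoints of `bins` equal-mass bins.
    probs = (np.arange(bins) + 0.5) / bins
    idx = np.searchsorted(cdf, probs)
    # Clamp in case rounding leaves the last CDF value slightly below 1.
    return values[np.minimum(idx, len(values) - 1)]
```

Pooling the weighted quantiles treats the mixture as a single discrete distribution, which is why the result depends on both the weights and the component QFs.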
This section focuses on understanding the linear subspaces of QF mixtures and on how to relate their Euclidean distance to the EMD. The estimation of the QF mixture parameters for a given distribution or for a given set of distribution samples is only briefly discussed, although it can be a complicated task that is the focus of a large body of literature in mixture modeling [Has66, TSM85, MP00]. Parametric mixture models, such as a mixture of Gaussian distributions, are constrained, i.e., they cannot exactly represent all distributions. This leads to a parameter estimation task that must trade off between the accuracy of the mixture model and the likelihood of the model parameters as given by a prior. QF mixtures are able to exactly represent all distributions, so they do not face this tradeoff.
This flexibility of QF mixtures, however, leads to the disadvantage that there is ambiguity in the representation: a given distribution can be exactly represented by a variety of QF mixtures. A prior on the model parameters resolves this ambiguity by allowing the most likely QF mixture to be selected. In the medical imaging applications discussed in Chapter 4, a QF mixture is estimated using thresholding, where n−1 threshold values separate n underlying distributions that correspond to different tissue types. When the underlying distributions are widely separated, this approach is accurate as well as computationally simple, and it resolves any ambiguities in the representation. This is further discussed in Section 4.2.
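The thresholding estimate described above might be sketched as follows. This is only an illustration under stated assumptions: the samples are a flat array of intensities, the thresholds are given in advance, every component uses the same number of bins, and the name `estimate_qf_mixture` is hypothetical. The procedure actually used in Chapter 4 may differ.

```python
import numpy as np

def estimate_qf_mixture(samples, thresholds, bins):
    """Estimate weights and component QFs by thresholding (hypothetical helper).

    n - 1 thresholds split the samples into n groups; each group's weight is
    its fraction of the samples and its QF is read off at bin midpoints.
    """
    samples = np.asarray(samples, dtype=float)
    edges = np.sort(np.asarray(thresholds, dtype=float))
    # np.digitize returns, for each sample, the index of its interval.
    labels = np.digitize(samples, edges)
    probs = (np.arange(bins) + 0.5) / bins
    weights, qfs = [], []
    for k in range(len(edges) + 1):
        part = samples[labels == k]
        if part.size == 0:
            raise ValueError(f"no samples fall in interval {k}")
        weights.append(part.size / samples.size)
        qfs.append(np.quantile(part, probs))
    return np.array(weights), qfs
```

Because every sample is assigned to exactly one interval, the estimated weights sum to one by construction.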
The linear subspaces of QF mixtures are easily understood. By construction, QF mixtures have the linear subspaces of QFs plus the linearity given by the mixture parameters. Therefore, mixture changes between the $Q_i$'s are linear, but mixture variation within each $Q_i$ remains nonlinear. The additional mixture linearity, however, comes at a cost. As discussed above, estimating a QF mixture can be more difficult than estimating the distribution's QF. Also, QF mixtures are less general than QFs; QFs are a purely nonparametric representation while QF mixtures introduce specific model parameters that must be chosen in advance for an application.
While linear interpolation of QF mixtures is appropriate, unfortunately Euclidean distance is not. One simple requirement of a sensible Euclidean distance is for the dimensions to be in commensurate units; the mixture weights and quantiles in a QF mixture are incommensurate. I will now linearly scale the space of QF mixtures to make Euclidean distance appropriate while leaving linear interpolation unchanged. Depending on the number of underlying distributions in the QF mixture and the assumptions made, Euclidean distance can be made to be locally equivalent to the EMD or to an upper or lower bound of the EMD.
First, consider mixtures of 2 quantile functions. Let $Q = [w_1, Q_1, w_2, Q_2]$, where $Q_i$ has $b_i$ bins. The EMD is measured in units of work, mass $\times$ distance. For a QF, each dimension has a fixed mass and a variable location. A change in a variable is a change in location, which is a distance, and it can be converted to work by multiplying by its mass, $1/b$. In a QF mixture, the weight of each quantile in $Q_i$ is $w_i/b_i$. The weights, $w_1$ and $w_2$, can also be put into units of work. A change to $w_i$ corresponds to moving mass from one of the underlying distributions to the other. A change in mass can be converted to work by multiplying by the fixed distance the mass must travel. The distance between the underlying distributions is their EMD, $\mathrm{EMD}(Q_1, Q_2)$. I include both $w_1$ and $w_2$ in this representation, which counts each movement of mass twice, once as a decrease in one weight and once as an increase in the other. Therefore, I instead multiply $w_i$ by half the distribution distance, $\frac{1}{2}\mathrm{EMD}(Q_1, Q_2)$. Alternatively, only $w_1$ could be included in the representation, but this approach does not generalize well for $n > 2$. Let $Q_{ave} = \frac{1}{2}(Q_1 + Q_2)$. To summarize, $Q$ can be scaled to $Q'$ by setting $Q'_i = w_i Q_i / b_i$ and $w'_i = \frac{1}{2}\mathrm{EMD}(Q_1, Q_2)\, w_i = \mathrm{EMD}(Q_i, Q_{ave})\, w_i$. In practice, a lower bound of the EMD between $Q_1$ and $Q_2$ can often be used. If the two distributions are well separated, the difference of their means is an accurate lower bound on their EMD. In this case, $w'_i = |\mu_i - \mu_{ave}|\, w_i$.
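The two-component scaling summarized above can be sketched in a few lines. The sketch assumes both QFs have the same number of bins, so that their average is defined bin by bin, and that the EMD between two such QFs is the mean absolute difference of their quantiles. The names `emd_qf` and `scale_two_component` are hypothetical.

```python
import numpy as np

def emd_qf(qa, qb):
    """EMD between two equal-length QFs, assumed here to be the mean
    absolute difference of corresponding quantiles."""
    return float(np.mean(np.abs(np.asarray(qa, float) - np.asarray(qb, float))))

def scale_two_component(w, q1, q2):
    """Return the scaled vector [w1', Q1', w2', Q2'] for a 2-component mixture."""
    q1, q2 = np.asarray(q1, dtype=float), np.asarray(q2, dtype=float)
    q_ave = 0.5 * (q1 + q2)  # assumes b1 == b2
    parts = []
    for wi, qi in zip(w, (q1, q2)):
        parts.append([emd_qf(qi, q_ave) * wi])  # w_i' = EMD(Q_i, Q_ave) w_i
        parts.append(wi * qi / qi.size)         # Q_i' = w_i Q_i / b_i
    return np.concatenate(parts)
```

For example, with w = (0.5, 0.5), Q1 = [0, 2] and Q2 = [10, 12], the average QF is [5, 7], so EMD(Qi, Qave) = 5, which is half of EMD(Q1, Q2) = 10, as the derivation requires.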
The scaling computed above is specific to the QF mixture $Q$. Euclidean distance is equal to the EMD only when comparing QF mixtures close to $Q$. Otherwise the assumption that the weights and quantiles can be independently scaled is false. It is also inappropriate to compare QF mixtures that have been scaled with respect to different distributions. Therefore, I define a metric between two QF mixtures using their average to determine the scaling. Similarly, for a population of QF mixtures, distances can be computed with respect to the average QF mixture of the population. For a population, this results in a Euclidean distance near the population's mean that is approximately equal to the EMD.
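One possible reading of this metric is sketched below: the scale factors are computed once from the coordinate-wise average of the two mixtures and then applied to both before taking the Euclidean norm of their difference. The sketch assumes all components share the same number of bins and takes the average distribution to be the mean of the component QFs of the reference mixture. The function name `qf_mixture_distance` is hypothetical, and the code makes no claim about how closely the result tracks the EMD.

```python
import numpy as np

def qf_mixture_distance(w_a, qfs_a, w_b, qfs_b):
    """Euclidean distance between two QF mixtures under a shared scaling.

    Assumptions: qfs_a and qfs_b have shape (n, b); the reference mixture is
    the coordinate-wise average of the two inputs.
    """
    w_a, w_b = np.asarray(w_a, dtype=float), np.asarray(w_b, dtype=float)
    qfs_a, qfs_b = np.asarray(qfs_a, dtype=float), np.asarray(qfs_b, dtype=float)
    # Reference mixture used to determine the scaling for both inputs.
    w_ref = 0.5 * (w_a + w_b)
    q_ref = 0.5 * (qfs_a + qfs_b)
    q_ave = q_ref.mean(axis=0)
    # Weight coordinates are scaled by EMD(Q_i, Q_ave) of the reference.
    weight_scale = np.mean(np.abs(q_ref - q_ave), axis=1)
    # Quantile coordinates are scaled by w_i / b_i of the reference.
    quantile_scale = w_ref / q_ref.shape[1]
    d_w = weight_scale * (w_a - w_b)
    d_q = quantile_scale[:, None] * (qfs_a - qfs_b)
    return float(np.sqrt(np.sum(d_w ** 2) + np.sum(d_q ** 2)))
```

Using one shared reference keeps the function symmetric in its two arguments, which would not hold if each mixture were scaled with respect to itself.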
So far, a distance metric has been defined for QF mixtures consisting of two underlying distributions. When $n > 2$, a similar metric can also be constructed. However, the exact EMD is difficult to express in terms of the parameters of a QF mixture. Therefore, an upper bound of the EMD is used instead. Let $Q = [w_1, Q_1, w_2, Q_2, \ldots, w_n, Q_n]$ be a QF mixture with $n$ underlying distributions. The scaling computed for the quantiles in the $n = 2$ case is still appropriate, where $Q_i$ is scaled to $w_i Q_i / b_i$. The scaling on the weights, however, does need to be reconsidered. For $n > 2$, when mass moves from an underlying distribution, it is not obvious which underlying distribution it moves to. This problem is equivalent to the underlying optimal matching done in the EMD itself. Therefore, I use an upper bound on this distance that leverages the triangle inequality. For $n = 2$, $w'_i = \mathrm{EMD}(Q_i, Q_{ave})\, w_i$. This distance is exactly the EMD because as the mass moves from $Q_1$ to $Q_2$, or vice versa, it must pass through $Q_{ave}$. For $n > 2$, the mass does not need to pass through $Q_{ave}$. However, due to the triangle inequality, forcing the mass to go through $Q_{ave}$ makes the distance an upper bound of the EMD. I use this scaling for the $n > 2$ case. Intuitively, the $w_i$'s that increased move extra mass to the mean distribution, and the $w_i$'s that decreased grab needed mass from the mean distribution. Therefore, knowing the matching between the $w_i$'s is not needed. Thus, the same scaling of $Q$ is computed for all values of $n$; only its interpretation changes.
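The upper-bound argument can be written out explicitly. In the sketch below, the amount of mass δ and the component indices i and j are hypothetical, and the average distribution for n > 2 is assumed to be the mean of the component QFs, which the text does not state explicitly.

```latex
% Assumptions: Q_ave = (1/n) \sum_i Q_i, and \delta > 0 is a hypothetical
% amount of mass moved from component i to component j.
\[
\delta\,\mathrm{EMD}(Q_i, Q_j)
\;\le\;
\delta\,\mathrm{EMD}(Q_i, Q_{ave}) + \delta\,\mathrm{EMD}(Q_{ave}, Q_j)
\]
% For n = 2 the bound is tight, since Q_ave lies halfway between Q_1 and Q_2:
\[
\mathrm{EMD}(Q_1, Q_{ave}) + \mathrm{EMD}(Q_{ave}, Q_2)
= \tfrac{1}{2}\mathrm{EMD}(Q_1, Q_2) + \tfrac{1}{2}\mathrm{EMD}(Q_1, Q_2)
= \mathrm{EMD}(Q_1, Q_2)
\]
```

The right-hand side of the first inequality depends only on each component's own distance to the average, which is why no matching between components has to be computed.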
The space of QF mixtures forms a convex space. The constraint $\sum_{i=1}^{n} w_i = 1$ is linear, which implies that averaging and interpolation will be valid. Also, since the scalings computed above are linear, these properties also hold for the scaled QF mixtures.
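The convexity claim can be checked numerically with a short sketch. It assumes two QF mixtures with the same number of components and bins; the function name `interpolate_qf_mixture` is hypothetical.

```python
import numpy as np

def interpolate_qf_mixture(w_a, qfs_a, w_b, qfs_b, t):
    """Linearly interpolate two QF mixtures for t in [0, 1].

    Assumption: both mixtures have n components with b bins each, so the
    weights have shape (n,) and the QFs have shape (n, b).
    """
    w = (1.0 - t) * np.asarray(w_a, dtype=float) + t * np.asarray(w_b, dtype=float)
    q = (1.0 - t) * np.asarray(qfs_a, dtype=float) + t * np.asarray(qfs_b, dtype=float)
    return w, q
```

A convex combination of two weight vectors that each sum to one also sums to one and stays non-negative, and a convex combination of two non-decreasing QFs is again non-decreasing, so the interpolated result is itself a valid QF mixture.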
2.2.4 Summary
Section 2.2 presented three representations of probability distributions that generalize the quantile function. The generalizations for multivariate distributions presented in Sections 2.2.1 and 2.2.2 are used in Chapter 3 to represent textured materials. The generalization for univariate distributions containing mixture variation presented in Section 2.2.3 is used in Chapter 4 to represent the appearance of organs in CT images. Next, Section 2.3 discusses how to estimate likelihoods in all of these spaces from a population.