Learning Mixture Model Parameters - Learning, Large Scale Inference, and Temporal Modeling of D


5.3.4 Learning Mixture Model Parameters

In Secs. 4.4.1 and 4.4.2, we introduced repulsive mixture modeling and repulsive latent social clustering by placing a kDPP prior on the location parameters $\{\mu_k\}$ and performing Gibbs sampling to learn the mixture model. In many cases, however, we have no a priori information about the cluster variances or the amount of repulsion present in the data. Instead of setting these hyperparameters to arbitrary values, it is desirable to handle them in a more robust way.

We do this by learning the parameters in each iteration of the Gibbs sampler using slice sampling, as proposed in Sec. 5.1.3. In the experiments of Secs. 4.4.1 and 4.4.2, we find that slice sampling the parameters in each Gibbs iteration leads to results that are at least as good as setting $\sigma^2 = 1$ and $\rho^2 = 1$. Furthermore, this method is now robust to the scaling of the data as […] and repulsion among clusters.

[Figure 5.9 panels: Cars, Dogs and Cities (rows) by Color, SIFT and GIST features (columns); σ axis; boxes labeled Google Top 6 and DPP/Turk]

Figure 5.9: For the image diversity experiment, boxplots of posterior samples of (from left to right) $\sigma^{\mathrm{cat}}_{\mathrm{color}}$, $\sigma^{\mathrm{cat}}_{\mathrm{SIFT}}$ and $\sigma^{\mathrm{cat}}_{\mathrm{GIST}}$. Each plot shows results for human-annotated sets (left) versus Google Top 6 (right). Categories from top to bottom: (a) cars, (b) dogs and (c) cities.
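The per-iteration slice-sampling update described at the start of this section can be illustrated with a minimal sketch of a standard univariate slice sampler (stepping out and shrinkage, following Neal, 2003), applied to the log-variance $s = \log \sigma^2$ of one-dimensional Gaussian clusters given the current assignments and means. The sampler of Sec. 5.1.3 may differ in its details; the function names, the $N(0, 3^2)$ prior on $s$, the interval width and the synthetic residuals are all illustrative assumptions. In a full Gibbs sweep this update would be interleaved with the updates of the assignments and of $\{\mu_k\}$.

```python
import math
import random


def slice_sample(log_density, x0, width=1.0, max_steps=50, rng=random):
    """One univariate slice-sampling update (stepping out + shrinkage, Neal 2003)."""
    # Auxiliary slice height, in log space: log y = log f(x0) - Exponential(1).
    log_y = log_density(x0) - rng.expovariate(1.0)

    # Stepping out: randomly position an interval of size `width` around x0,
    # then extend each end until it leaves the slice (at most `max_steps` total).
    left = x0 - width * rng.random()
    right = left + width
    j = int(max_steps * rng.random())
    k = max_steps - 1 - j
    while j > 0 and log_density(left) > log_y:
        left -= width
        j -= 1
    while k > 0 and log_density(right) > log_y:
        right += width
        k -= 1

    # Shrinkage: propose uniformly in [left, right]; on rejection, shrink toward x0.
    while True:
        x1 = left + (right - left) * rng.random()
        if log_density(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1


def log_cond_log_var(s, residuals, prior_sd=3.0):
    """Hypothetical conditional log density of s = log(sigma^2), up to a constant.

    `residuals` are x_i - mu_{z_i} under the current assignments and means;
    the N(0, prior_sd^2) prior on s is an illustrative assumption.
    """
    var = math.exp(s)
    log_lik = sum(-0.5 * s - r * r / (2.0 * var) for r in residuals)
    return log_lik - 0.5 * (s / prior_sd) ** 2


# Toy run on synthetic residuals with true sigma^2 = 4.
rng = random.Random(0)
residuals = [rng.gauss(0.0, 2.0) for _ in range(200)]
s, draws = 0.0, []
for _ in range(1500):
    # In a full Gibbs sweep, assignments and cluster means would be resampled here too.
    s = slice_sample(lambda v: log_cond_log_var(v, residuals), s, rng=rng)
    draws.append(math.exp(s))
post_mean = sum(draws[300:]) / len(draws[300:])
```

Sampling on the log scale keeps the variance positive without any boundary handling; a repulsion parameter such as $\rho^2$ could be handled the same way, given its own conditional log density.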

5.4 Conclusion

Determinantal point processes have become increasingly popular in machine learning and statistics. While many important DPP computations are efficient, learning the parameters of a DPP kernel is difficult because the likelihood is non-convex and, in many scenarios, the likelihood and its gradient are either unknown or computationally infeasible to evaluate. We proposed Bayesian approaches, in particular MCMC methods, for inferring these parameters. In addition to being more robust and providing a characterization of posterior uncertainty, these algorithms can be modified to handle large-scale and continuous DPPs. We also showed how our posterior samples can be evaluated using moment matching as a model-checking method. Finally, we demonstrated the utility of learning DPP parameters in studying diabetic neuropathy and in evaluating human perception of diversity in images. We also illustrated that we can perform full Bayesian inference in mixture models by combining the continuous DPP sampling algorithm with the learning of the kernel parameters.

Chapter 6 Markov DPPs

While a discrete DPP is useful for selecting diverse subcollections, many applications require these subsets to be diverse not just individually but also through time. For example, we might use a DPP to display a set of news headlines that are relevant to a user’s interests while covering a variety of topics. Suppose further that we are asked to sequentially select multiple diverse sets of items, for example, displaying new headlines day by day. In this case we want the subsets to be diverse across time, offering headlines today that are unlike the ones shown yesterday. In this chapter, we construct a Markov DPP (M-DPP) that models a sequence of random sets $\{Y_t\}$. The proposed M-DPP defines a stationary process that maintains DPP margins. Crucially, the induced union process $Z_t \equiv Y_t \cup Y_{t-1} \cup \cdots \cup Y_{t-p}$ is also marginally DPP-distributed. Jointly, these properties imply that the random sets in the sequence are encouraged to be diverse both within a given time step and across consecutive time steps. Figs. 6.1 and 6.2 compare a sequence of independent samples from a DPP with a sequence sampled from a first-order M-DPP, which introduces diversity across two consecutive time steps. The process can also be extended to higher orders so that more than two consecutive subsets are jointly diverse.
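To make joint diversity concrete, a minimal sketch, assuming unit quality scores and a hypothetical one-dimensional Gaussian similarity kernel, scores a collection of sets by $\log\det(L_Z)$, where $Z$ is the union of the sets; under an L-ensemble this is the unnormalized log-probability of observing exactly $Z$. Two hypothetical sets that are each well spread can still yield a union with a much smaller determinant when one nearly repeats the other, which is the situation that independent DPP draws do not guard against.

```python
import math


def gaussian_kernel(points, bandwidth=1.0):
    """Hypothetical unit-quality kernel: L_ij = exp(-(x_i - x_j)^2 / (2 h^2))."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in points] for a in points]


def log_det(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return float("-inf")  # numerically singular: some item is redundant
                chol[i][i] = math.sqrt(s)
                total += 2.0 * math.log(chol[i][i])
            else:
                chol[i][j] = s / chol[j][j]
    return total


def union_diversity(sets, bandwidth=1.0):
    """log det(L_Z) for Z = union of the given sets (larger means more jointly diverse)."""
    union = sorted(set().union(*sets))
    return log_det(gaussian_kernel(union, bandwidth))


previous = [0.0, 3.0, 6.0]     # hypothetical items shown at time t-1
repeat_like = [0.2, 3.1, 6.2]  # spread out, but each item nearly repeats one from t-1
fresh = [1.5, 4.5, 7.5]        # equally spread, and far from the previous items
```

Within a single step both candidate sets score close to 0 (nearly maximal diversity for three items), but only `fresh` keeps the score of its union with `previous` high.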

Figure 6.1: Sequence of samples drawn independently from a DPP. While the DPP points are diverse within each time step, the unions of subsets across consecutive time steps are not.

Fig. 6.3 illustrates the case for order 2. Actual samples from a 1-dimensional DPP and M-DPP with Gaussian kernels are shown in Fig. 6.4.

Our specific construction of the M-DPP yields an exact sampling procedure that runs in polynomial time. Additionally, we explore a method for incrementally learning the quality of each item in the base set $\mathcal{Y}$ based on externally provided preferences. In particular, a decomposition of the DPP kernel matrix can be interpreted as defining the quality of each item and the pairwise similarities between items. Our incremental learning procedure assumes a well-defined similarity metric and aims to learn the features of items that a user deems preferable. These features are used to define the quality score of each item. The M-DPP aids in exploring items of interest to the user by providing sequentially diverse results.
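The quality and similarity decomposition mentioned above can be written as $L_{ij} = q_i S_{ij} q_j$, so that $\det(L_Y) = \left(\prod_{i \in Y} q_i^2\right) \det(S_Y)$ and $P(Y) = \det(L_Y)/\det(L + I)$. The sketch below checks this on a toy three-item example by brute-force enumeration. The log-linear quality $q_i = \exp(\tfrac{1}{2}\theta^\top f_i)$ is an assumed parameterization chosen for illustration (not necessarily the one used in this chapter), and the features, similarities and $\theta$ are hypothetical.

```python
import itertools
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting (small matrices only)."""
    a = [row[:] for row in matrix]
    n, result = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result  # det of an empty matrix is 1, matching det(L_{empty set}) = 1


def submatrix(M, idx):
    """Principal submatrix M_Y for the items in idx."""
    return [[M[i][j] for j in idx] for i in idx]


def quality(features, theta):
    """Assumed log-linear quality scores q_i = exp(0.5 * theta . f_i)."""
    return [math.exp(0.5 * sum(t * f for t, f in zip(theta, fi))) for fi in features]


def build_kernel(q, S):
    """Quality/similarity decomposition L_ij = q_i * S_ij * q_j."""
    n = len(q)
    return [[q[i] * S[i][j] * q[j] for j in range(n)] for i in range(n)]


def dpp_prob(L, subset):
    """Exact L-ensemble probability P(Y = subset) = det(L_Y) / det(L + I)."""
    n = len(L)
    L_plus_I = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return det(submatrix(L, subset)) / det(L_plus_I)


# Hypothetical toy base set: items 0 and 1 are near-duplicates, item 2 is distinct.
S = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
q = quality(features, theta=[0.5, 0.0])  # the user is assumed to prefer feature 0
L = build_kernel(q, S)
probs = {Y: dpp_prob(L, Y) for r in range(4) for Y in itertools.combinations(range(3), r)}
```

Enumeration is used here only to check the formulas on a tiny base set; it is exponential in the number of items, unlike the polynomial-time exact sampling procedure referred to in the text.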

Figure 6.2: Sequence of samples drawn from a first-order Markov DPP. Not only are the M-DPP samples diverse within each time step, but the union of any two consecutive subsets is jointly diverse as well.

Figure 6.3: Sequence of samples drawn from a second-order Markov DPP. Not only are the second-order M-DPP samples diverse within each time step, but the union of any three consecutive subsets is jointly diverse as well.

[Figure 6.4 panels: DPP (left) and M-DPP (right); horizontal axis: Time]

Figure 6.4: A set of points on a line (y axis) drawn independently over time from a DPP (left) and from an M-DPP (right). While the DPP points are diverse only within time steps (columns), the M-DPP points are also diverse across time steps.

6.1 Markov DPPs (M-DPPs)

In certain applications, such as displaying news headlines, our goal is not only to generate a diverse collection of items at one time point, but also to generate collections at subsequent time points that are both highly relevant and dissimilar to the previous collection. To address these goals, we introduce the Markov determinantal point process (M-DPP), which emphasizes both the marginal and the conditional diversity of selected items. Harnessing the quality and similarity interpretation of the DPP in Eq. (2.11), the M-DPP provides a dynamic way of selecting high-quality, diverse collections of items as a temporal process. In this section, we first explore the simpler construction of a first-order M-DPP/M-kDPP, including their sampling algorithms. We then extend this construction to M-DPPs/M-kDPPs of higher order, so that longer sequences of consecutive subsets are encouraged to be jointly diverse.
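One ingredient that a first-order construction can use is conditioning an L-ensemble on containing the previously selected set $A = y_{t-1}$. For $B$ disjoint from $A$, $\det(L_{A \cup B}) = \det(L_A)\det(L^A_B)$ with the Schur complement $L^A = L_{\bar{A}} - L_{\bar{A}A} L_A^{-1} L_{A\bar{A}}$, so the newly added items follow a DPP with kernel $L^A$ over the remaining items. This excerpt does not spell out the M-DPP transition, so the sketch below illustrates the conditioning step only and is not the construction of Sec. 6.1; the helper names, the point locations and the previous set are hypothetical.

```python
import math


def det(M):
    """Determinant by Gaussian elimination with partial pivoting (small matrices only)."""
    a = [row[:] for row in M]
    n, result = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def principal(M, idx):
    """Principal submatrix M_idx."""
    return [[M[i][j] for j in idx] for i in idx]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (small, well-conditioned matrices)."""
    n = len(M)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def condition_on_inclusion(L, A):
    """Schur-complement kernel L^A over the items not in A (hypothetical helper)."""
    rest = [i for i in range(len(L)) if i not in A]
    if not A:
        return rest, principal(L, rest)
    inv_A = inverse(principal(L, A))
    cond = [[L[i][j] - sum(L[i][a] * inv_A[ia][ib] * L[b][j]
                           for ia, a in enumerate(A) for ib, b in enumerate(A))
             for j in rest] for i in rest]
    return rest, cond


points = [0.0, 0.5, 1.0, 2.0, 3.0]  # hypothetical 1-D item locations
L = [[math.exp(-((a - b) ** 2) / 2.0) for b in points] for a in points]
prev = [0, 2]                        # hypothetical items shown at time t-1
rest, L_cond = condition_on_inclusion(L, prev)
```

Items close to those shown at time $t-1$ (here the item at 0.5) receive small diagonal entries in $L^A$ and therefore a low chance of being selected next, which mirrors the across-time repulsion described in the text.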

In document Learning, Large Scale Inference, and Temporal Modeling of Determinantal Point Processes (Page 138-145)