The Gamma-Poisson model - Contributions to probabilistic non-negative matrix factorization

2.2.2 Composite structure of the model . . . 58 2.3 New formulations of GaP . . . 58 2.3.1 GaP as a composite negative multinomial model . . . 58

2.3.2 GaP as a composite multinomial model . . . 59 2.4 Closed-form marginal likelihood. . . 60 2.4.1 Analytical expression . . . 60

2.4.2 Self-regularization . . . 62 2.5 Optimization algorithms . . . 63 2.5.1 Expectation-Maximization . . . 63

2.5.2 Monte Carlo E-step . . . 64

2.5.3 M-step . . . 65 2.6 Experimental work . . . 67 2.6.1 Comparison of the algorithms . . . 67

2.6.2 Examples of the self-regularization phenomenon . . . 74 2.7 Discussion . . . 76

2.1 Introduction

A wide variety of probabilistic models are based on the following observation model

vf n∼Poisson([WH]f n), (2.1)

with W ≥ 0, H ≥ 0. Such models are called Poisson factorization models, or sometimes Poisson factor analysis. As discussed in Section 1.3, we may classify these models in two categories:

• Semi-Bayesian models. When independent Gamma priors are assumed on H, and W is treated as a deterministic variable, we retrieve the so-called Gamma-Poisson model ofCanny (2004) and Buntine and Jakulin (2006);

• Bayesian models, which assume prior distributions on both W and H. Typical prior distributions include independent Gamma distributions, or column-wise Dirich- let distributions. These Bayesian models have found applications in image processing (Cemgil, 2009), as well as in recommender systems (Ma et al.,2011; Gopalan et al.,

2015), where they have received a particular attention. Several works have also been devoted to non-parametric approaches to alleviate the problem of setting the factor- ization rank K (Zhou et al.,2012;Gopalan et al.,2014;Zhou and Carin,2015). In this chapter, we focus on maximum marginal likelihood estimation in the Gamma- Poisson model. This problem has been first been addressed inDikmen and Févotte(2012). In their experiments, they found this estimation process to be robust to over-specified values of K; an intriguing behavior that was left unexplained. We provide the following contributions:

• We provide a closed-form expression of the marginal likelihood. This expression is tedious to compute for large F and K, as it involves combinatorial operations, but is workable for problems in small dimension.

• We show that the proposed closed-form expression reveals a penalization term on the columns of W that explains the “self-regularization” effect observed in Dikmen and Févotte(2012).

• We compare three variants of the EM algorithm, and experimentally show that the one based on the marginalization of H has favorable properties.

The rest of the chapter is organized as follows. Section2.2introduces the Gamma-Poisson model. In Section 2.3, we propose two novel formulations of the model in which H has been marginalized out. This leads to a closed-form expression of the marginal likelihood, discussed in Section 2.4. Three optimization algorithms are presented in Section 2.5, and experimental work is conducted in Section 2.6. We conclude by a general discussion in Section2.7.

N • • α β

h

W

•

v

n N K • • αk βk

h

w

k •

c

v

Figure 2.1: Graphical representations of the Gamma-Poisson model. Observed variables are in blue, while latent variables are in white. Deterministic parameters are represented as black dots. The observation vn is a vector of size F . From left

to right:(a) The standard model. The latent variable hn is of size K; (b) The

augmented model with variables C. The latent variable ckn is of size F , and

hknis scalar.

2.2 The Gamma-Poisson model

2.2.1 First formulation

The Gamma-Poisson (GaP) model is a probabilistic matrix factorization model which was introduced in the field of text information retrieval (Canny,2004;Buntine and Jakulin,

2006). In this field, a corpus of text documents is typically represented by an integer-valued matrix V of size F × N, where each column vn represents a document as a so-called “bag

of words”. Given a vocabulary of F words (or in practice semantic stems), the matrix entry

vf n is the number of occurrences of word f in the document n. GaP is a generative model

described by a dictionary of “topics” or “patterns” W (a non-negative matrix of size F ×K) and a non-negative “activation” or “score” matrix H (of size K × N), as follows:

hkn∼Gamma(αk, βk), (2.2)

vf n|hn∼Poisson ([WH]f n) , (2.3)

where we use the shape and rate parametrization of the Gamma distribution, see its p.d.f. in Equation (0.8). The dictionary W is treated as a free deterministic variable.

We consider maximum marginal likelihood estimation (MMLE), in which H is treated as a latent variable over which the joint likelihood is integrated. In other words, MMLE relies on the minimization of

L(W, α, β)def= − log p(V; W, α, β) (2.4)

= − logZ H

p(V|H; W)p(H; α, β)dH, (2.5)

where α = {α1, . . . , αK} and β = {β1, . . . , βK}. We emphasize that the dictionary W is

treated as a free deterministic variable.

2.2.2 Composite structure of the model

GaP can be augmented with auxiliary variables C, yielding a composite model, thanks to the superposition property of the Poisson distribution (Févotte et al.,2009):

hkn∼Gamma(αk, βk), (2.6) cf kn|hkn∼Poisson(wf khkn), (2.7) vf n= X k cf kn. (2.8)

In the remainder, the vectors ckn = [c1kn, . . . , cF kn]T of size F and which sum up to vn

will be referred to as components. C will denote the F × K × N tensor with coefficients

cf kn. Figure 2.1 displays graphical representations of the Gamma-Poisson model, both in

its initial form (on the left) and augmented form (on the right).

In document Contributions to probabilistic non-negative matrix factorization - Maximum marginal likelihood estimation and Markovian temporal models (Page 55-58)