Indian Buffet Process - Latent Feature Models

1.4 Latent Feature Models

1.4.2 Indian Buffet Process

The Indian buffet process (IBP) (Griffiths and Ghahramani, 2006, 2011) is a popular example of an exchangeable random feature allocation. Using the matrix representation of a feature allocation, the IBP defines a distribution on binary matrices with an unbounded, random number of columns.

A Finite Feature Model. The IBP can be defined as the limit of a finite feature model. Suppose we have N objects and K features. We use a binary variable $z_{nk}$ to indicate whether object n has feature k, so the $z_{nk}$ form a binary N × K matrix Z. We assume that each object possesses feature k with probability $\pi_k$, independently across features, and we place beta priors on the $\pi_k$'s. That is,

$$z_{nk} \mid \pi_k \sim \mathrm{Bernoulli}(\pi_k), \qquad \pi_k \sim \mathrm{Beta}(\alpha/K, 1).$$
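The following is a minimal sketch of drawing from this finite model, assuming only Python's standard library. The function name `sample_finite_feature_model` is hypothetical and does not come from the source. The sketch draws each $\pi_k$ from Beta(α/K, 1) with `random.betavariate` and then flips an independent coin with bias $\pi_k$ for every entry $z_{nk}$.

```python
import random


def sample_finite_feature_model(N, K, alpha, rng=None):
    """Draw (pi, Z) from the finite beta-Bernoulli feature model.

    pi_k ~ Beta(alpha / K, 1) and z_nk | pi_k ~ Bernoulli(pi_k),
    independently for every object n and feature k.
    Z is returned as a list of N rows, each a list of K 0/1 entries.
    """
    rng = rng or random.Random()
    pi = [rng.betavariate(alpha / K, 1.0) for _ in range(K)]
    Z = [[1 if rng.random() < pi[k] else 0 for k in range(K)] for _ in range(N)]
    return pi, Z


if __name__ == "__main__":
    pi, Z = sample_finite_feature_model(N=5, K=10, alpha=2.0, rng=random.Random(0))
    for row in Z:
        print("".join(str(z) for z in row))
```

Since $E[\pi_k] = \alpha/(\alpha + K)$, the expected number of ones in Z is $NK\alpha/(\alpha + K)$. This stays below $N\alpha$ however large K becomes, which is why the $K \to \infty$ limit is well behaved.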

Integrating out the $\pi_k$'s, the marginal distribution of Z is

$$p(Z) = \prod_{k=1}^{K} \frac{\frac{\alpha}{K}\,\Gamma\!\left(m_k + \frac{\alpha}{K}\right)\Gamma(N - m_k + 1)}{\Gamma\!\left(N + 1 + \frac{\alpha}{K}\right)},$$

where $m_k = \sum_{n=1}^{N} z_{nk}$ is the number of objects possessing feature k. This distribution is exchangeable: it depends only on the counts $m_k$, not on the ordering of the objects.
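To make the formula concrete, the sketch below evaluates log p(Z) with `math.lgamma`. Working on the log scale avoids overflow in the gamma functions. It then checks two properties stated above on tiny examples: the probabilities of all $2^{NK}$ binary matrices sum to one, and reordering the rows (objects) leaves p(Z) unchanged. The function names are illustrative and not from the source.

```python
import itertools
import math


def log_p_finite(Z, alpha):
    """log p(Z) for the finite model with the pi_k integrated out.

    Z is a list of N rows of 0/1 entries with K columns.
    """
    N, K = len(Z), len(Z[0])
    a = alpha / K
    total = 0.0
    for k in range(K):
        m = sum(row[k] for row in Z)  # m_k: number of objects with feature k
        total += (math.log(a) + math.lgamma(m + a) + math.lgamma(N - m + 1)
                  - math.lgamma(N + 1 + a))
    return total


def all_binary_matrices(N, K):
    """Yield every N x K binary matrix as a list of rows."""
    for bits in itertools.product((0, 1), repeat=N * K):
        yield [list(bits[n * K:(n + 1) * K]) for n in range(N)]


if __name__ == "__main__":
    alpha = 1.5
    # The marginal is a proper distribution over all 2^(N*K) matrices.
    print(sum(math.exp(log_p_finite(Z, alpha)) for Z in all_binary_matrices(2, 3)))
    # Exchangeability: reversing the order of the objects does not change p(Z).
    Z = [[1, 0, 1], [0, 0, 1], [1, 1, 0]]
    print(log_p_finite(Z, alpha), log_p_finite(Z[::-1], alpha))
```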

Left-ordered Constraint and Equivalence Classes. Representing a feature allocation as a binary matrix imposes an ordering on the K features, but in many applications this ordering is not identifiable. When the labels of the features are arbitrary, it is helpful to define an equivalence class of binary matrices, denoted by [Z]. We first introduce an order constraint on binary matrices called the left-ordered constraint. For a binary matrix Z, its corresponding left-ordered binary matrix, denoted by lof(Z), is obtained by ordering the columns of Z from left to right by the magnitude of the binary number expressed by that column, taking the first row as the most significant bit. For example, Figure 1.1(b) shows the left-ordered form of the matrix in Figure 1.1(a). In the first row of the left-ordered matrix, the columns for which $z_{1k} = 1$ are grouped at the left. In the second row, within each group of columns sharing the same value of $z_{1k}$, the columns for which $z_{2k} = 1$ are grouped at the left. This grouping structure persists throughout the matrix.
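The left-ordering rule translates directly into a sort. The sketch below assumes Z is stored as a list of rows, and its function name `lof` simply mirrors the text's notation. It reads each column as a tuple from the first row down. Comparing equal-length 0/1 tuples lexicographically is the same as comparing the binary numbers they encode with the first row as the most significant bit, so sorting the columns in reverse order puts the largest binary numbers on the left.

```python
def lof(Z):
    """Left-ordered form of a binary matrix Z given as a list of rows."""
    N, K = len(Z), len(Z[0])
    # Column k as a tuple (z_1k, ..., z_Nk); first row = most significant bit.
    cols = [tuple(Z[n][k] for n in range(N)) for k in range(K)]
    # Largest binary number first, i.e. leftmost.
    cols.sort(reverse=True)
    return [[col[n] for col in cols] for n in range(N)]


if __name__ == "__main__":
    Z = [[0, 1, 0],
         [1, 0, 1],
         [1, 1, 0]]
    for row in lof(Z):
        print(row)
    # [1, 0, 0]
    # [0, 1, 1]
    # [1, 1, 0]
```

Applying lof twice returns the same matrix, as expected of a canonical form. Any column permutation of Z has the same left-ordered form.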

We can then define equivalence classes with respect to the function lof(·), which maps binary matrices to left-ordered binary matrices as described above. The function lof(·) is many-to-one: many binary matrices reduce to the same left-ordered form, and there is a unique left-ordered form for every binary matrix. Two binary matrices Y and Z are lof-equivalent if lof(Y) = lof(Z). In models where feature order is not identifiable, performing inference at the level of lof-equivalence classes is appropriate. The probability of a particular lof-equivalence class of binary matrices [Z] is

$$p([Z]) = \sum_{Y \in [Z]} p(Y).$$
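For small matrices, the equivalence class can be enumerated by brute force. For a fixed number of columns K, lof(Y) = lof(Z) holds exactly when Y is a column permutation of Z. The sketch below therefore collects the distinct column permutations of Z and sums p(Y) over them, using the finite-model marginal p(Z) given earlier. The helper names are hypothetical. The factorial cost of `itertools.permutations` restricts this to toy sizes.

```python
import itertools
import math


def lof(Z):
    """Left-ordered form: columns sorted by binary value, first row = MSB."""
    cols = sorted(zip(*Z), reverse=True)
    return [list(row) for row in zip(*cols)]


def log_p_finite(Z, alpha):
    """log p(Z) for the finite beta-Bernoulli model (pi_k integrated out)."""
    N, K = len(Z), len(Z[0])
    a = alpha / K
    total = 0.0
    for col in zip(*Z):
        m = sum(col)  # m_k
        total += (math.log(a) + math.lgamma(m + a) + math.lgamma(N - m + 1)
                  - math.lgamma(N + 1 + a))
    return total


def equivalence_class(Z):
    """All distinct matrices Y (as tuples of rows) with lof(Y) == lof(Z)."""
    cols = list(zip(*Z))
    return {tuple(zip(*perm)) for perm in itertools.permutations(cols)}


def p_class(Z, alpha):
    """p([Z]) as an explicit sum of p(Y) over the members of [Z]."""
    return sum(math.exp(log_p_finite(Y, alpha)) for Y in equivalence_class(Z))


if __name__ == "__main__":
    Z = [[1, 1, 0], [0, 1, 1]]
    members = equivalence_class(Z)
    print(len(members), p_class(Z, alpha=1.0))
```

Every member of [Z] has the same column counts $m_k$, so each term in the sum is identical.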

The left-ordered form motivates the following definition. The history of feature k at object n is defined to be $(z_{1k}, \ldots, z_{(n-1)k})$. When n is not specified, history refers to the full history of feature k, $(z_{1k}, \ldots, z_{Nk})$.

Histories are indexed by the decimal equivalent of the binary number formed by the column entries, again with the first row as the most significant bit. For example, at object 3, a feature can have one of four histories:

- 0: a feature possessed by neither previous object.
- 1: a feature for which $z_{2k} = 1$ but $z_{1k} = 0$.
- 2: a feature for which $z_{1k} = 1$ but $z_{2k} = 0$.
- 3: a feature possessed by both previous objects.

The number of features with full history h is denoted by $K_h$. $K_0$ is the number of features for which $m_k = 0$, and $K_+ = \sum_{h=1}^{2^N - 1} K_h$ is the number of features for which $m_k > 0$, so $K = K_0 + K_+$. Because lof(·) places columns with larger binary numbers further to the left, it arranges the columns of a matrix in decreasing order of their full histories, read from left to right.
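The histories and the counts $K_h$ are simple to compute. In the sketch below, whose function names are illustrative, `history(Z, k, n)` folds $z_{1k}, \ldots, z_{(n-1)k}$ into an integer with $z_{1k}$ as the most significant bit. This matches the numbering in the object-3 example, and omitting n gives the full history.

```python
from collections import Counter


def history(Z, k, n=None):
    """History of feature k at object n (1-indexed) as an integer.

    Uses z_1k, ..., z_(n-1)k with z_1k as the most significant bit;
    n=None gives the full history z_1k, ..., z_Nk.
    """
    upto = len(Z) if n is None else n - 1
    h = 0
    for i in range(upto):
        h = 2 * h + Z[i][k]
    return h


def history_counts(Z):
    """Return (Counter of K_h over full histories h, K_0, K_plus)."""
    K = len(Z[0])
    counts = Counter(history(Z, k) for k in range(K))
    K0 = counts[0]  # a Counter returns 0 for absent keys
    return counts, K0, K - K0


if __name__ == "__main__":
    Z = [[1, 0, 0, 1],
         [0, 1, 0, 1],
         [1, 1, 0, 0]]
    print([history(Z, k, n=3) for k in range(4)])  # [2, 1, 0, 3]
    print(history_counts(Z))
```

The full histories of the four columns here are 5, 3, 0 and 6. Sorting the columns by these values in decreasing order (6, 5, 3, 0) reproduces lof(Z).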

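Using this notation, the cardinality of [Z] is $K! / \prod_{h=0}^{2^N - 1} K_h!$. The matrices in [Z] are the distinct column permutations of Z, and permuting columns that share the same history does not produce a new matrix. Since p(Z) depends only on the counts $m_k$, every matrix in [Z] has the same probability, and thus

$$p([Z]) = \frac{K!}{\prod_{h=0}^{2^N - 1} K_h!} \prod_{k=1}^{K} \frac{\frac{\alpha}{K}\,\Gamma\!\left(m_k + \frac{\alpha}{K}\right)\Gamma(N - m_k + 1)}{\Gamma\!\left(N + 1 + \frac{\alpha}{K}\right)}. \qquad (1.15)$$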
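The following sketch uses illustrative names and only the standard library. It checks the cardinality formula against brute-force enumeration and then evaluates Equation (1.15) on the log scale. Columns are grouped by history by using each column tuple itself as the key, which is equivalent to grouping by its integer value.

```python
import itertools
import math
from collections import Counter


def class_size(Z):
    """|[Z]| = K! / prod_h K_h!, where K_h counts columns with history h."""
    cols = list(zip(*Z))
    size = math.factorial(len(cols))
    for K_h in Counter(cols).values():
        size //= math.factorial(K_h)  # exact: every partial quotient is an integer
    return size


def class_size_brute_force(Z):
    """Count distinct column permutations of Z directly (tiny Z only)."""
    return len(set(itertools.permutations(zip(*Z))))


def log_p_class(Z, alpha):
    """log p([Z]) from Equation (1.15) for the finite model with K = #columns."""
    N, K = len(Z), len(Z[0])
    a = alpha / K
    log_p = math.log(class_size(Z))
    for col in zip(*Z):
        m = sum(col)
        log_p += (math.log(a) + math.lgamma(m + a) + math.lgamma(N - m + 1)
                  - math.lgamma(N + 1 + a))
    return log_p


if __name__ == "__main__":
    Z = [[1, 1, 0, 0],
         [0, 0, 1, 0]]
    print(class_size(Z), class_size_brute_force(Z))  # 12 12
    print(log_p_class(Z, alpha=1.0))
```

Because the classes partition the set of all N × K binary matrices, the values of p([Z]) over all classes sum to one. This makes a useful sanity check on small N and K.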

Taking the Infinite Limit. Taking the limit $K \to \infty$ in Equation (1.15) gives

$$\lim_{K \to \infty} p([Z]) = \frac{\alpha^{K_+}}{\prod_{h=1}^{2^N - 1} K_h!} \exp\{-\alpha H_N\} \prod_{k=1}^{K_+} \frac{(N - m_k)!\,(m_k - 1)!}{N!}, \qquad (1.16)$$

where $H_N$ is the N-th harmonic number, $H_N = \sum_{j=1}^{N} 1/j$, and the product runs over the $K_+$ features with $m_k > 0$. See Griffiths and Ghahramani (2011) for details. This distribution is still exchangeable. In practice, we usually drop all columns consisting entirely of zeros. Such columns correspond to features that no object possesses, and they should not be included in the feature allocation because features are non-empty sets. After these columns are deleted, the resulting matrix has finitely many columns with probability 1; indeed, $K_+$ follows a Poisson($\alpha H_N$) distribution.
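The limit can be checked numerically. The sketch below uses illustrative names and only the standard library. It evaluates Equation (1.16) on the log scale and compares it with Equation (1.15) for increasingly large K, after padding Z with $K - K_+$ all-zero columns. It uses $(m_k - 1)! = \Gamma(m_k)$ and, as the text prescribes, ignores any all-zero columns of the input.

```python
import math
from collections import Counter


def nonempty_columns(Z):
    """Columns of Z (as tuples) with at least one 1."""
    return [col for col in zip(*Z) if any(col)]


def log_p_ibp(Z, alpha):
    """log of the K -> infinity limit of p([Z]), Equation (1.16)."""
    N = len(Z)
    cols = nonempty_columns(Z)
    H_N = sum(1.0 / j for j in range(1, N + 1))
    log_p = len(cols) * math.log(alpha) - alpha * H_N
    log_p -= sum(math.lgamma(K_h + 1) for K_h in Counter(cols).values())
    for col in cols:
        m = sum(col)
        log_p += math.lgamma(N - m + 1) + math.lgamma(m) - math.lgamma(N + 1)
    return log_p


def log_p_finite_class(Z, alpha, K):
    """log p([Z]) from Equation (1.15) with Z padded to K columns by zeros."""
    N = len(Z)
    cols = nonempty_columns(Z)
    K0 = K - len(cols)
    a = alpha / K

    def column_term(m):
        return (math.log(a) + math.lgamma(m + a) + math.lgamma(N - m + 1)
                - math.lgamma(N + 1 + a))

    log_p = math.lgamma(K + 1) - math.lgamma(K0 + 1)  # log(K! / K_0!)
    log_p -= sum(math.lgamma(K_h + 1) for K_h in Counter(cols).values())
    log_p += K0 * column_term(0) + sum(column_term(sum(col)) for col in cols)
    return log_p


if __name__ == "__main__":
    Z = [[1, 0, 1], [1, 1, 0], [0, 1, 0]]
    print("limit (1.16):", log_p_ibp(Z, alpha=2.0))
    for K in (10, 100, 10_000, 1_000_000):
        print(K, log_p_finite_class(Z, alpha=2.0, K=K))
```

For N = 1, every lof-equivalence class is determined by $K_+$ alone, and Equation (1.16) reduces to the Poisson(α) probability of $K_+$. This gives a simple normalization check.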

Indian Buffet Analogy. The probability distribution defined in Equation (1.16) can be derived from a simple stochastic process, which is referred to as the IBP. Think of an Indian buffet where customers (objects) choose dishes (features). N customers enter one after another, and each customer encounters infinitely many dishes arranged in a line. The first customer starts at the left of the buffet and takes a serving from each dish, stopping after a Poisson(α) number of dishes. The n-th customer moves along the buffet, sampling dishes in proportion to their popularity: he takes a serving of dish k with probability $m_k/n$, where $m_k$ is the number of previous customers who have sampled dish k. Having reached the end of all previously sampled dishes, the n-th customer then tries a Poisson(α/n) number of new dishes. We use a binary matrix Z with N rows and infinitely many columns to indicate which customers chose which dishes, where $z_{nk} = 1$ if the n-th customer sampled the k-th dish. The matrices produced by this process are generally not in left-ordered form, and customers are not exchangeable under this distribution. However, if we record only the lof-equivalence classes of the matrices generated by this process, we obtain the exchangeable distribution p([Z]) given by Equation (1.16).
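The following is a minimal simulation of the buffet, assuming only Python's standard library. The `random` module has no Poisson sampler, so the sketch includes Knuth's multiplication method, which is adequate for the small rates α/n used here. The names `sample_ibp` and `poisson` are illustrative. The returned matrix has one column per dish ever sampled, in order of first appearance, so it is generally not left-ordered.

```python
import math
import random


def poisson(lam, rng):
    """Draw from Poisson(lam) by Knuth's multiplication method (small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_ibp(N, alpha, rng=None):
    """Simulate the Indian buffet process for N customers.

    Returns Z as a list of N rows; column k is dish k in order of first
    appearance, so every column contains at least one 1.
    """
    rng = rng or random.Random()
    m = []        # m[k]: number of customers so far who took dish k
    choices = []  # choices[n]: set of dishes taken by customer n + 1
    for n in range(1, N + 1):
        # Previously sampled dishes: take dish k with probability m_k / n.
        taken = {k for k in range(len(m)) if rng.random() < m[k] / n}
        for k in taken:
            m[k] += 1
        # Then try a Poisson(alpha / n) number of new dishes.
        for _ in range(poisson(alpha / n, rng)):
            taken.add(len(m))
            m.append(1)
        choices.append(taken)
    return [[1 if k in dishes else 0 for k in range(len(m))] for dishes in choices]


if __name__ == "__main__":
    for row in sample_ibp(N=6, alpha=2.0, rng=random.Random(3)):
        print("".join(str(z) for z in row))
```

Summing the Poisson(α/n) numbers of new dishes over customers shows that the number of columns has mean $\alpha H_N$. Each customer also takes α dishes in expectation. Averages over many simulated matrices should be close to both values.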

Beta Process. Just as the DP is the de Finetti measure underlying the exchangeable distribution produced by the CRP, the de Finetti measure underlying the exchangeable distribution produced by the IBP is the beta process (BP) (Hjort, 1990). See Thibaux and Jordan (2007) for full details.

Other discussions of the IBP and the BP include Teh et al. (2007), Teh and Gorur (2009), Doshi et al. (2009), Paisley et al. (2010), Williamson et al. (2010), Knowles and Ghahramani (2011), and Miller et al. (2012).

In document Bayesian nonparametric models for biomedical data analysis (Page 39-44)