Bayesian Analysis of Mixtures of Mixtures

7.3 Computational Methods

Using the above model, we have to find a way to calculate the posterior distributions of the parameters of interest and the predictive density functions of the noise and signal data. Analytical results are not available, so a Monte Carlo approximation is employed. We use the notation

$$\mu^0 = (\mu_1, \ldots, \mu_{n_0}), \qquad v^0 = (v_1, \ldots, v_{n_0}),$$

$$\mu^1 = (\mu_{n_0+1}, \ldots, \mu_n), \qquad v^1 = (v_{n_0+1}, \ldots, v_n),$$

$$\mu = (\mu^0, \mu^1), \qquad v = (v^0, v^1),$$

$$\theta = (\theta_1, \ldots, \theta_{n_1}).$$

As in Chapter 3, a Gibbs sampler can be implemented once the differences between the two models are taken into account. The current model provides the following information:

(a) known data: noise data X and signal data Y.

(b) unknown data: noise data $x_{n_0+j}$ for $j = 1, \ldots, n_1$.

(c) conditional distributions for the data: $(x_i \mid \mu_i, v_i) \sim N(x_i \mid \mu_i, v_i)$ for $i = 1, \ldots, n$ and $(y_j \mid \theta_j, \mu_{n_0+j}, v_{n_0+j}) \sim N(y_j \mid \theta_j + \mu_{n_0+j}, v_{n_0+j})$ for $j = 1, \ldots, n_1$.

(d) Dirichlet processes: $(\mu_1, v_1), \ldots, (\mu_n, v_n)$ is a sample of size $n$ from $D(\alpha_0 G_0(\cdot))$ and $\theta_1, \ldots, \theta_{n_1}$ is a sample of size $n_1$ from $D(\alpha_1 G_1(\cdot))$.

Our main goals here are to obtain the predictive density functions for noise data and signal data, the distributions of the number of components for noise data and signal data, and a mixture deconvolution of the predictive density function for the signal data.

Firstly, let us assume that $\tau_0, \tau_1, m_0, m_1, \alpha_0, \alpha_1$ are all known. In order to make inferences about the noise data, posterior samples of $(\mu, v)$ are necessary. Conditional on $\theta = (\theta_1, \ldots, \theta_{n_1})$, $x_{n_0+j} = y_j - \theta_j$ is known for $j = 1, \ldots, n_1$. We then have data X and data $(x_{n_0+1}, \ldots, x_n)$, a combined sample of size $n$, such that the $x_i$ are conditionally independent and normally distributed with mean $\mu_i$ and variance $v_i$, $i = 1, \ldots, n$. Using the ideas of Chapter 3, we can obtain posterior samples of $(\mu, v)$.

To complete the Gibbs sampling, we must simulate $(\theta \mid \mu, v, X, Y)$. Given $(\mu, v)$, $\theta$ is independent of X, so $(\theta \mid \mu, v, X, Y) = (\theta \mid \mu^1, v^1, Y)$. Then, given $\theta_j$ and $v_{n_0+j}$, the differences $y_j - \mu_{n_0+j}$ are conditionally independent: $y_j - \mu_{n_0+j}$ is from the normal distribution with mean $\theta_j$ and variance $v_{n_0+j}$, where $\theta_j$ is from $D(\alpha_1 G_1(\cdot))$.

Again, this is the framework discussed in Chapter 3. The Gibbs sampler is therefore completed by drawing from $(\theta \mid \mu, v, X, Y)$.
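The two conditioning steps above amount to simple data augmentation, which the following minimal Python sketch illustrates. The function names `augment_noise` and `signal_residuals` are hypothetical and are not taken from the thesis; the sketch only forms the pseudo-data that would be passed to a Chapter 3 style sampler, and it does not implement that sampler.

```python
def augment_noise(x_obs, y_obs, theta):
    """Pool the observed noise with the noise implied by the signal data.

    Conditional on theta, x_{n0+j} = y_j - theta_j is treated as known,
    so the noise part of the model sees a sample of size n = n0 + n1.
    """
    if len(y_obs) != len(theta):
        raise ValueError("theta needs one entry per signal observation")
    implied = [y - t for y, t in zip(y_obs, theta)]
    return list(x_obs) + implied


def signal_residuals(y_obs, mu_signal):
    """Return y_j - mu_{n0+j}, which is N(theta_j, v_{n0+j}) given theta_j.

    mu_signal holds the noise means mu_{n0+1}, ..., mu_n attached to the
    signal observations (the vector mu^1 in the text).
    """
    if len(y_obs) != len(mu_signal):
        raise ValueError("need one noise mean per signal observation")
    return [y - m for y, m in zip(y_obs, mu_signal)]
```

One Gibbs cycle would then alternate between updating $(\mu, v)$ from the output of `augment_noise` and updating $\theta$ from the output of `signal_residuals`.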

However, some modifications are needed in order to obtain the predictive density functions for the noise data and the signal data. Suppose $k_0$ and $k_1$ are the numbers of components for the noise and signal data, respectively. There are $k_0$ distinct values $(\mu_j^*, v_j^*)$ among $(\mu, v)$. Following Chapter 3, we obtain the predictive distribution of a new noise observation $x$ by a Monte Carlo average of $p(x \mid \mu, v, m_0, \tau_0, X, Y)$, namely

$$f(x \mid \mu, v, m_0, \tau_0, X, Y) = \frac{\alpha_0}{\alpha_0 + n} \int N(x \mid m_0, \tau_0 + z)\, IG(z \mid s_0/2, V_0/2)\, dz + \frac{1}{\alpha_0 + n} \sum_{j=1}^{k_0} n_{0j}\, N(x \mid \mu_j^*, v_j^*),$$

where $n_{0j}$ is the number of elements of $(\mu, v)$ with value $(\mu_j^*, v_j^*)$. As we said in Chapter 3, there is no conjugate posterior distribution for $z$, so the integral is replaced by a Monte Carlo estimate: in every cycle of the Gibbs sampler we sample $z$ from the prior distribution $IG(z \mid s_0/2, V_0/2)$ and evaluate the normal density function at the sampled value of $z$. This requires a proper prior for $v$, since otherwise prior samples of $v$ cannot be drawn.
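As a rough illustration of the Monte Carlo replacement of the integral, the sketch below evaluates the noise predictive density for one Gibbs cycle. It assumes that $IG(z \mid a, b)$ has shape $a$ and scale $b$, so that $1/z$ is gamma distributed with shape $a$ and rate $b$; the thesis may use a different convention. The function and argument names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x; var is a variance, not a std dev."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def draw_inverse_gamma(shape, scale, rng):
    """Draw from IG(shape, scale) as the reciprocal of a gamma draw.

    Assumption: 1/z ~ Gamma(shape, rate=scale). random.gammavariate takes
    a scale parameter, hence the 1/scale.
    """
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


def noise_predictive(x, alpha0, m0, tau0, s0, V0, clusters, rng, n_z=1):
    """One-cycle estimate of f(x | mu, v, m0, tau0, X, Y).

    clusters is a list of (mu_star, v_star, count) triples for the k0
    distinct values among (mu, v); the counts sum to n.
    """
    n = sum(count for _, _, count in clusters)
    # Monte Carlo stand-in for the integral over z ~ IG(s0/2, V0/2).
    z_draws = [draw_inverse_gamma(s0 / 2.0, V0 / 2.0, rng) for _ in range(n_z)]
    new_part = sum(normal_pdf(x, m0, tau0 + z) for z in z_draws) / n_z
    # Contribution of the existing components, weighted by multiplicity.
    old_part = sum(count * normal_pdf(x, mu, v) for mu, v, count in clusters)
    return (alpha0 * new_part + old_part) / (alpha0 + n)
```

Averaging the returned value over the retained Gibbs cycles gives the Monte Carlo estimate of the predictive density at `x`. The default `n_z=1` mirrors the one draw per cycle described in the text; a larger value only reduces the within-cycle noise.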

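By assuming the independence of $D(\alpha_0 G_0(\cdot))$ and $D(\alpha_1 G_1(\cdot))$, we have

$$f(\theta_{n_1+1}, \mu_{n+1}, v_{n+1} \mid \theta, \mu, v) = f(\theta_{n_1+1} \mid \theta)\, f(\mu_{n+1}, v_{n+1} \mid \mu, v). \tag{7.1}$$

Once we have the posterior samples of $\theta$ and the other parameters, the predictive density function of $y_{n_1+1}$ can be obtained by a Monte Carlo average of

$$f(y_{n_1+1} \mid \theta, \mu, v, X, Y) = \frac{1}{(\alpha_0 + n)(\alpha_1 + n_1)} \Big[ \alpha_0 \alpha_1 d_0 + \alpha_0 \sum_{j=1}^{k_1} n_{1j}\, d_j + \alpha_1 \sum_{i=1}^{k_0} n_{0i}\, N(y_{n_1+1} \mid m_1 + \mu_i^*, v_i^* + \tau_1) + \sum_{i=1}^{k_0} \sum_{j=1}^{k_1} n_{0i}\, n_{1j}\, N(y_{n_1+1} \mid \mu_i^* + \theta_j^*, v_i^*) \Big],$$

where

$$d_0 = \int N(y_{n_1+1} \mid m_0 + m_1, z + \tau_0 + \tau_1)\, IG(z \mid s_0/2, V_0/2)\, dz,$$

$$d_j = \int N(y_{n_1+1} \mid \theta_j^* + m_0, \tau_0 + z)\, IG(z \mid s_0/2, V_0/2)\, dz.$$

Here $(\mu_i^*, v_i^*)$, $i = 1, \ldots, k_0$, are the distinct values among $(\mu, v)$, $n_{0i}$ is the number of elements of $(\mu, v)$ with value $(\mu_i^*, v_i^*)$, $n_{1j}$ is the number of elements of $\theta$ with value $\theta_j^*$, and $k_1$ is the number of distinct values $\theta_j^*$ among $\theta$. The above equation is easily verified by using equation (7.1). The first two terms of the expression can be obtained by sampling $z$ from $IG(z \mid s_0/2, V_0/2)$ and calculating $N(y_{n_1+1} \mid m_0 + m_1, z + \tau_0 + \tau_1)$ and $N(y_{n_1+1} \mid \theta_j^* + m_0, \tau_0 + z)$.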
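The four terms correspond to the four ways a new signal observation can arise: a new or an existing signal level, combined with a new or an existing noise component. A minimal Python sketch of the one-cycle evaluation follows. It assumes the shape and scale convention for $IG$, so that $1/z$ is gamma distributed with shape $s_0/2$ and rate $V_0/2$, and all function and argument names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x; var is a variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def signal_predictive(y, alpha0, alpha1, m0, m1, tau0, tau1, s0, V0,
                      noise_clusters, signal_clusters, rng, n_z=1):
    """One-cycle estimate of f(y_{n1+1} | theta, mu, v, X, Y).

    noise_clusters: list of (mu_star, v_star, n0i), counts summing to n.
    signal_clusters: list of (theta_star, n1j), counts summing to n1.
    """
    n = sum(c for _, _, c in noise_clusters)
    n1 = sum(c for _, c in signal_clusters)
    # Assumption: IG(z | a, b) means 1/z ~ Gamma(shape a, rate b).
    z_draws = [1.0 / rng.gammavariate(s0 / 2.0, 2.0 / V0) for _ in range(n_z)]

    # d0: new signal level and new noise component.
    d0 = sum(normal_pdf(y, m0 + m1, z + tau0 + tau1) for z in z_draws) / n_z
    total = alpha0 * alpha1 * d0

    # d_j: existing signal level theta_j*, new noise component.
    for theta, c1 in signal_clusters:
        dj = sum(normal_pdf(y, theta + m0, tau0 + z) for z in z_draws) / n_z
        total += alpha0 * c1 * dj

    for mu, v, c0 in noise_clusters:
        # New signal level, existing noise component (mu_i*, v_i*).
        total += alpha1 * c0 * normal_pdf(y, m1 + mu, v + tau1)
        # Existing signal level and existing noise component.
        for theta, c1 in signal_clusters:
            total += c0 * c1 * normal_pdf(y, mu + theta, v)

    return total / ((alpha0 + n) * (alpha1 + n1))
```

The same draws of `z` are reused for $d_0$ and every $d_j$ within a cycle, which is one reasonable reading of the text rather than a detail it states.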

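Secondly, we can sample the hyperparameters $\tau_0, \tau_1, m_0, m_1, \alpha_0, \alpha_1$ if they are unknown. With the prior distributions of these hyperparameters given in the previous section, the following conditional distributions are relevant:

$$\tau_0 \sim IG(\tau_0 \mid (t_0 + k_0 - 1)/2, (R_0 + S_0)/2),$$

$$(m_0 \mid \tau_0) \sim N(m_0 \mid \bar{\mu}, \tau_0/k_0),$$

$$\tau_1 \sim IG(\tau_1 \mid (t_1 + k_1 - 1)/2, (R_1 + S_1)/2),$$

$$(m_1 \mid \tau_1) \sim N(m_1 \mid \bar{\theta}, \tau_1/k_1),$$

where $S_0 = \sum_{j=1}^{k_0} (\mu_j^* - \bar{\mu})^2$, $\bar{\mu} = \sum_{j=1}^{k_0} \mu_j^*/k_0$, $S_1 = \sum_{j=1}^{k_1} (\theta_j^* - \bar{\theta})^2$ and $\bar{\theta} = \sum_{j=1}^{k_1} \theta_j^*/k_1$, with $\mu_j^*$ and $\theta_j^*$ denoting the distinct values among $\mu$ and $\theta$.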
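These two draws can be sketched in a few lines of Python. The helper name `sample_tau_and_m` is hypothetical, and the sketch assumes the shape and scale convention for $IG$ (so that $1/\tau$ is gamma distributed with the stated shape and with rate equal to the stated scale). The same function serves for $(\tau_0, m_0)$ with the distinct noise means and for $(\tau_1, m_1)$ with the distinct signal levels.

```python
import random


def sample_tau_and_m(distinct_values, t, R, rng):
    """Draw (tau, m) given the k distinct location values.

    tau ~ IG((t + k - 1)/2, (R + S)/2), then m | tau ~ N(mean, tau/k),
    where mean and S are the average and the sum of squared deviations
    of the distinct values.
    """
    k = len(distinct_values)
    mean = sum(distinct_values) / k
    S = sum((value - mean) ** 2 for value in distinct_values)
    shape = (t + k - 1) / 2.0
    scale = (R + S) / 2.0
    if shape <= 0 or scale <= 0:
        # For example t = R = 0 with a single distinct value.
        raise ValueError("need t + k > 1 and R + S > 0 for a proper draw")
    # Assumption: 1/tau ~ Gamma(shape, rate=scale).
    tau = 1.0 / rng.gammavariate(shape, 1.0 / scale)
    m = rng.gauss(mean, (tau / k) ** 0.5)
    return tau, m
```

With $t = R = 0$ the draw is only proper when at least two distinct values are present, which is why the sketch raises an error in the degenerate case instead of returning a value.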

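We can also sample $\alpha_0$ and $\alpha_1$ from equation (3.8) by using

$$(z_0 \mid \alpha_0) \sim Beta(\alpha_0 + 1, n),$$

$$(\alpha_0 \mid z_0, k_0) \sim w_{z_0}\, G(\alpha_0 \mid a_0 + k_0, b_0 - \log(z_0)) + (1 - w_{z_0})\, G(\alpha_0 \mid a_0 + k_0 - 1, b_0 - \log(z_0)),$$

$$(z_1 \mid \alpha_1) \sim Beta(\alpha_1 + 1, n_1),$$

$$(\alpha_1 \mid z_1, k_1) \sim w_{z_1}\, G(\alpha_1 \mid a_1 + k_1, b_1 - \log(z_1)) + (1 - w_{z_1})\, G(\alpha_1 \mid a_1 + k_1 - 1, b_1 - \log(z_1)),$$

with weights $w_{z_0}$ and $w_{z_1}$ defined by

$$\frac{w_{z_0}}{1 - w_{z_0}} = \frac{a_0 + k_0 - 1}{n\,(b_0 - \log(z_0))}, \qquad \frac{w_{z_1}}{1 - w_{z_1}} = \frac{a_1 + k_1 - 1}{n_1\,(b_1 - \log(z_1))}.$$

A complete Gibbs sampler is thus available when the hyperparameters $(\tau, m, \alpha)$ are unknown. A simulation analysis is given in the next section.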
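The auxiliary-variable update above can be sketched as follows. The sketch assumes that $G(\alpha \mid a, b)$ denotes a gamma distribution with shape $a$ and rate $b$; equation (3.8) is not reproduced in this section, so that convention is an assumption, and the function name `update_alpha` is hypothetical.

```python
import math
import random


def update_alpha(alpha, k, n, a, b, rng):
    """One update of a Dirichlet process precision parameter.

    alpha: current value; k: current number of components;
    n: number of observations governed by this process;
    a, b: parameters of the G(a, b) prior (assumed shape and rate).
    """
    # Auxiliary variable: z | alpha ~ Beta(alpha + 1, n).
    z = rng.betavariate(alpha + 1.0, n)
    rate = b - math.log(z)
    # Mixture weight from w / (1 - w) = (a + k - 1) / (n * (b - log z)).
    odds = (a + k - 1.0) / (n * rate)
    w = odds / (1.0 + odds)
    shape = a + k if rng.random() < w else a + k - 1.0
    # random.gammavariate takes a scale parameter, hence 1/rate.
    return rng.gammavariate(shape, 1.0 / rate)
```

Calling the function once with $(k_0, n, a_0, b_0)$ and once with $(k_1, n_1, a_1, b_1)$ in each cycle would update $\alpha_0$ and $\alpha_1$ in turn.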

7.4 Simulations

We discuss some numerical aspects and present the results of a small simulation study to show the effects of mixtures of mixtures modeling. Since two different models, for example the standard normal distribution $N(y \mid 0, 1)$ and a mixture of two normal distributions $0.5N(y \mid -0.5, 1) + 0.5N(y \mid 0.5, 1)$, may produce the same random samples, we do not use random samples. Instead, we analyze the empirical quantiles, which represent the distribution quite well. The empirical quantiles $z_1, \ldots, z_n$ can be defined by $F(z_i) = i/(n + 1)$ for $i = 1, \ldots, n$, where $F(\cdot)$ is the cumulative distribution function associated with the quantiles. Even though the empirical quantiles have the same drawback, they are better than random samples from the distribution. It is insightful to consider first the empirical quantiles of each component of the mixture model. For example, quantiles of a normal distribution are available from many sources. We can use the quantiles of each normal distribution to obtain the quantiles of a mixture of normal distributions. Generally, we can verify the following property of a mixture of two distributions.

Suppose the mixture consists of two components, and the p-quantile of the i-th component is $z_i$ for $i = 1, 2$. Then the p-quantile of the mixture lies between $z_1$ and $z_2$. Using $z_1$ and $z_2$ as initial search points, we can use a standard numerical method, say the bisection method, to obtain the p-quantile of $F(z)$.

150 noise data points were computed from the mixture of two normal distributions $0.5N(x \mid -1.5, 1) + 0.5N(x \mid 1.5, 1)$. The histogram of the simulated noise data is shown in Figure 7.1(a).
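The quantile construction can be sketched in Python with the standard library. The bracket for the bisection is the range of the component p-quantiles, as described above, and the 150 noise points are the quantiles at levels $i/151$. The function names are hypothetical, and the second argument of `NormalDist` is a standard deviation, so a variance of 1 corresponds to `1.0`.

```python
from statistics import NormalDist


def mixture_cdf(z, weights, components):
    """Cumulative distribution function of a finite normal mixture at z."""
    return sum(w * c.cdf(z) for w, c in zip(weights, components))


def mixture_quantile(p, weights, components, tol=1e-10):
    """p-quantile of a normal mixture by bisection.

    The smallest and largest component p-quantiles bracket the mixture
    p-quantile, so they serve as the initial search points.
    """
    lo = min(c.inv_cdf(p) for c in components)
    hi = max(c.inv_cdf(p) for c in components)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, components) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def empirical_quantiles(n, weights, components):
    """Solve F(z_i) = i / (n + 1) for i = 1, ..., n."""
    return [mixture_quantile(i / (n + 1), weights, components)
            for i in range(1, n + 1)]


# Noise data: 150 quantiles of 0.5 N(-1.5, 1) + 0.5 N(1.5, 1).
noise = empirical_quantiles(
    150, [0.5, 0.5], [NormalDist(-1.5, 1.0), NormalDist(1.5, 1.0)]
)
```

Because this mixture is symmetric about zero, the resulting quantiles are symmetric as well, which gives a quick sanity check on the bisection.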

At this stage, we treat the noise data as `Y' in the model of Chapter 5, using the following prior distributions: $v \sim IG(v \mid 10/2, 10/2)$, $\tau \sim IG(\tau \mid 0.0, 0.0)$ and $\alpha \sim G(\alpha \mid 4, 8)$, where the notations $v$, $\tau$ and $\alpha$ are defined in Chapter 5. The posterior distribution of the number of components is then given by column 2 in Table 7.1. The predictive density function of the data, displayed by the dotted line, is presented in Figure 7.1(c). The solid line plots the true density function of the above mixture.

Now we sample the signal from two discrete states, 0 and 10, with probabilities 0.5 and 0.5, respectively. The histogram of the signal is plotted in Figure 7.1(b).

If we treat the signal alone as data `Y' in the model of Chapter 5, given the prior information $v \sim IG(v \mid 10/2, 10/2)$, $\tau \sim IG(\tau \mid 0.0, 0.0)$ and $\alpha \sim G(\alpha \mid 4, 8)$, the posterior distribution of the number of components is shown in column 4 in Table 7.1. The predictive density function of the signal is plotted by the dotted line in Figure 7.1(d). The solid line plots the true density function of the signal mixture.

Table 7.1: Posterior Distributions of the Number of Components

          noise                            signal
k    Mixture   Mixture of Mixture    Mixture   Mixture of Mixture
1    0.0000    0.00000               0.00000   0.00000
2    0.9889    0.99010               0.00000   0.62177
3    0.0092    0.00777               0.00000   0.27919
4    0.0016    0.00191               0.92732   0.07756
5    0.0003    0.00016               0.05876   0.01771
6                                    0.01025   0.00324
7                                    0.00296   0.00049
8                                    0.00065   0.00004
9                                    0.00005

In order to follow the general approach of this chapter, the prior information for all parameters is given by $v \sim IG(v \mid 10/2, 10/2)$, $\tau_i \sim IG(\tau_i \mid 0, 0)$ for $i = 0, 1$, $\alpha_0 \sim G(\alpha_0 \mid 4, 8)$ and $\alpha_1 \sim G(\alpha_1 \mid 4, 12)$. We performed simulation studies comparing the results of mixtures of mixtures modeling with the true distribution. Although we use different prior information for mixture modeling and for mixtures of mixtures modeling, comparisons to the true distribution are also provided where available. We chose the initial value of $\mu_i$ as the i-th noise data point, and the initial value of $\theta_j$ as the difference between the j-th signal data point and the j-th noise data point. For each analysis, the number of burn-in cycles was 2,000 and the Monte Carlo sample size was 8,000. The convergence of the Gibbs sampler was monitored using the method of Chapter 3.

[Figure 7.1: Simulation Analysis. Panels: (a) p(x|noise, signal), horizontal axis from -4 to 4; (b) p(y|noise, signal), horizontal axis from 0 to 10; (c) horizontal axis from -4 to 4; (d) pdfs of signal, horizontal axis from 0 to 10. Vertical axes run from 0.0 to 0.25. Legend: solid line, true density functions for plots (c) and (d); dotted line, predictive density functions of the mixture of mixtures method; dashed line, predictive density functions of the method of Chapter 5.]

The posterior distributions for $k_0$ and $k_1$ are given by columns 3 and 5 in Table 7.1. It is clear that there are two components for the noise data. The posterior mode for $k_1$ is 2, and the probability of more than two components for the signal data is about 0.378. However, if we use the mixture analysis presented in Chapter 5, column 4 in Table 7.1 shows that the posterior mode for $k_1$ is 4. Figure 7.1 shows the predictive density functions obtained from the different models. As can be seen from the results, the model with a mixture for the noise is better than the model with a single normal noise. The predictive density functions of noise and signal based on mixtures of mixtures modeling are much closer to their respective true density functions. Figures 7.1(a) and 7.1(b) show how well the predictive density functions from mixtures of mixtures modeling match the histograms. The simulated data are very clearly bimodal in both noise and signal. The goal of this exercise was simply to test the program and the theory.

In document BAYESIAN NONPARAMETRIC. Guoliang Cao. Duke University. Mike West, Advisor. Donald Burdick. Michael Lavine. Peter Muller. Dennis A. (Page 108-116)