Bayesian Analysis of Mixtures of Mixtures
7.3 Computational Methods
Using the above model, we need a way to calculate the posterior distributions of the parameters of interest and the predictive density functions of the noise and signal data. Analytical results are not available, so a Monte Carlo approximation is employed. We use the notation
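\[
\begin{aligned}
\boldsymbol{\mu}_0 &= (\mu_1, \ldots, \mu_{n_0}), & \mathbf{v}_0 &= (v_1, \ldots, v_{n_0}), \\
\boldsymbol{\mu}_1 &= (\mu_{n_0+1}, \ldots, \mu_n), & \mathbf{v}_1 &= (v_{n_0+1}, \ldots, v_n), \\
\boldsymbol{\mu} &= (\boldsymbol{\mu}_0, \boldsymbol{\mu}_1), & \mathbf{v} &= (\mathbf{v}_0, \mathbf{v}_1), \\
\boldsymbol{\theta} &= (\theta_1, \ldots, \theta_{n_1}).
\end{aligned}
\]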
As in Chapter 3, a Gibbs sampler can be implemented once some differences between the two models are taken into account. The current model provides the following information:
(a) known data: noise data $X$ and signal data $Y$.
(b) unknown data: noise data $x_{j+n_0}$ for $j = 1, \ldots, n_1$.
(c) conditional distributions for the data: $(x_i \mid \mu_i, v_i) \sim N(x_i \mid \mu_i, v_i)$ for $i = 1, \ldots, n$ and $(y_j \mid \theta_j, \mu_{n_0+j}, v_{n_0+j}) \sim N(y_j \mid \theta_j + \mu_{n_0+j}, v_{n_0+j})$ for $j = 1, \ldots, n_1$.
(d) Dirichlet processes: $(\mu_1, v_1), \ldots, (\mu_n, v_n)$ is a sample of size $n$ from $D(\alpha_0 G_0(\cdot))$ and $\theta_1, \ldots, \theta_{n_1}$ is a sample of size $n_1$ from $D(\alpha_1 G_1(\cdot))$.
Our main goals here are to obtain the predictive density functions for noise data and signal data, the distributions of the number of components for noise data and signal data, and a mixture deconvolution of the predictive density function for the signal data.
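Firstly, let us assume that $\tau_0, \tau_1, m_0, m_1, \alpha_0, \alpha_1$ are all known. In order to make inferences about the noise data, the posterior samples of $(\boldsymbol{\mu}, \mathbf{v})$ are necessary. Conditional on $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_{n_1})$, the value $x_{j+n_0} = y_j - \theta_j$ is known for $j = 1, \ldots, n_1$. We then have the data $X$ and the data $(x_{n_0+1}, \ldots, x_n)$, a sample of total size $n$ in which the $x_i$ are conditionally independent and normally distributed with mean $\mu_i$ and variance $v_i$, $i = 1, \ldots, n$. Using the ideas of Chapter 3, we can obtain the posterior samples of $(\boldsymbol{\mu}, \mathbf{v})$.
To complete the Gibbs sampling, we must simulate $(\boldsymbol{\theta} \mid \boldsymbol{\mu}, \mathbf{v}, X, Y)$. Given $(\boldsymbol{\mu}, \mathbf{v})$, $\boldsymbol{\theta}$ is independent of $X$, so $(\boldsymbol{\theta} \mid \boldsymbol{\mu}, \mathbf{v}, X, Y) = (\boldsymbol{\theta} \mid \boldsymbol{\mu}_1, \mathbf{v}_1, Y)$. Given $\theta_j$ and $v_{n_0+j}$, the differences $y_j - \mu_{n_0+j}$ are conditionally independent: $y_j - \mu_{n_0+j}$ is normally distributed with mean $\theta_j$ and variance $v_{n_0+j}$, where $\theta_j$ is from $D(\alpha_1 G_1(\cdot))$.
Again, this is the framework discussed in Chapter 3. The Gibbs sampler is completed by drawing from $(\boldsymbol{\theta} \mid \boldsymbol{\mu}, \mathbf{v}, X, Y)$.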
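However, some modifications are needed in order to obtain the predictive density functions for the noise data and the signal data. Suppose $k_0$ and $k_1$ are the numbers of components for the noise and signal data, respectively. There are $k_0$ distinct values $(\mu_j, v_j)$ among $(\boldsymbol{\mu}, \mathbf{v})$. Following Chapter 3, the predictive distribution of a noise observation $x$ is obtained as the Monte Carlo average of $p(x \mid \boldsymbol{\mu}, \mathbf{v}, m_0, \tau_0, X, Y)$, namely
\[
f(x \mid \boldsymbol{\mu}, \mathbf{v}, m_0, \tau_0, X, Y) = \frac{\alpha_0}{\alpha_0 + n} \int N(x \mid m_0, \tau_0 + z)\, IG(z \mid s_0/2, V_0/2)\, dz + \frac{1}{\alpha_0 + n} \sum_{j=1}^{k_0} n_{0j}\, N(x \mid \mu_j, v_j)
\]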
where $n_{0j}$ is the number of elements of $(\boldsymbol{\mu}, \mathbf{v})$ with value $(\mu_j, v_j)$. As noted in Chapter 3, there is no conjugate posterior distribution for $z$, so the integral is approximated by the Monte Carlo method: in every Gibbs sampler cycle, we sample $z$ from the prior distribution $IG(z \mid s_0/2, V_0/2)$ and evaluate the normal density function at the sampled value of $z$. A proper prior for $v$ is therefore required, so that prior samples of $v$ can be drawn.
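The Monte Carlo treatment of the integral can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, `IG(z | s0/2, V0/2)` is taken to be an inverse gamma with shape `s0/2` and scale `V0/2`, and `mu`, `v`, `counts` hold the distinct values and their multiplicities from one Gibbs cycle.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x; var is a variance, not a standard deviation."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def sample_inverse_gamma(shape, scale, size, rng):
    """Assumed parameterization: if g ~ Gamma(shape, rate=scale) then 1/g ~ IG(shape, scale)."""
    return 1.0 / rng.gamma(shape, 1.0 / scale, size=size)

def noise_predictive(x, m0, tau0, alpha0, s0, V0, mu, v, counts,
                     n_draws=5000, rng=None):
    """Evaluate the noise predictive density at x for one Gibbs cycle.

    mu, v   : arrays with the k0 distinct values (mu_j, v_j)
    counts  : array with the multiplicities n_0j (summing to n)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = counts.sum()
    # Replace the integral over z by an average over prior draws of z.
    z = sample_inverse_gamma(s0 / 2.0, V0 / 2.0, n_draws, rng)
    base_term = normal_pdf(x, m0, tau0 + z).mean()
    # Weighted sum over the distinct components currently in use.
    cluster_term = np.sum(counts * normal_pdf(x, mu, v))
    return (alpha0 * base_term + cluster_term) / (alpha0 + n)
```

Averaging the returned value over the retained Gibbs cycles would give the Monte Carlo estimate of the predictive density at `x`.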
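By assuming the independence of $D(\alpha_0 G_0(\cdot))$ and $D(\alpha_1 G_1(\cdot))$, we have
\[
f(\theta_{n_1+1}, \mu_{n+1}, v_{n+1} \mid \boldsymbol{\theta}, \boldsymbol{\mu}, \mathbf{v}) = f(\theta_{n_1+1} \mid \boldsymbol{\theta})\, f(\mu_{n+1}, v_{n+1} \mid \boldsymbol{\mu}, \mathbf{v}). \tag{7.1}
\]
Once we have the posterior samples of $\boldsymbol{\theta}$ and the other parameters, the predictive density function of $y_{n_1+1}$ can be obtained by Monte Carlo average:
\[
f(y_{n_1+1} \mid \boldsymbol{\mu}, \mathbf{v}, \boldsymbol{\theta}, X, Y) = \frac{1}{(\alpha_0 + n)(\alpha_1 + n_1)} \Big[ \alpha_0 \alpha_1 d_0 + \alpha_0 \sum_{j=1}^{k_1} n_{1j} d_j + \alpha_1 \sum_{i=1}^{k_0} n_{0i}\, N(y_{n_1+1} \mid m_1 + \mu_i, v_i + \tau_1) + \sum_{i=1}^{k_0} \sum_{j=1}^{k_1} n_{0i} n_{1j}\, N(y_{n_1+1} \mid \mu_i + \theta_j, v_i) \Big]
\]
where
\[
d_0 = \int N(y_{n_1+1} \mid m_0 + m_1, z + \tau_0 + \tau_1)\, IG(z \mid s_0/2, V_0/2)\, dz, \qquad
d_j = \int N(y_{n_1+1} \mid \theta_j + m_0, \tau_0 + z)\, IG(z \mid s_0/2, V_0/2)\, dz,
\]
$n_{1j}$ is the number of elements of $\boldsymbol{\theta}$ with value $\theta_j$, and $k_1$ is the number of distinct values $\theta_j$ among $\boldsymbol{\theta}$. The expression follows directly from equation (7.1). Its first two terms can be obtained by sampling $z$ from $IG(z \mid s_0/2, V_0/2)$ and calculating $N(y_{n_1+1} \mid m_0 + m_1, z + \tau_0 + \tau_1)$ and $N(y_{n_1+1} \mid \theta_j + m_0, \tau_0 + z)$.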
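The four-term predictive density for the signal can be evaluated along the following lines. This is a hedged sketch, not the author's program: the function names are hypothetical, the inverse gamma is assumed to use a shape and scale parameterization, and the arrays `mu`, `v`, `n0_counts`, `theta`, `n1_counts` are assumed to hold the distinct values and multiplicities from one Gibbs cycle.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x; var is a variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def signal_predictive(y, m0, tau0, m1, tau1, alpha0, alpha1, s0, V0,
                      mu, v, n0_counts, theta, n1_counts,
                      n_draws=5000, rng=None):
    """Evaluate the signal predictive density at y for one Gibbs cycle."""
    rng = np.random.default_rng() if rng is None else rng
    n = n0_counts.sum()    # noise sample size n
    n1 = n1_counts.sum()   # signal sample size n_1
    # Prior draws of z from IG(s0/2, V0/2), assumed shape/scale form.
    z = 1.0 / rng.gamma(s0 / 2.0, 2.0 / V0, size=n_draws)

    # d_0: new noise component and new signal component.
    d0 = normal_pdf(y, m0 + m1, z + tau0 + tau1).mean()
    # d_j: new noise component, existing signal value theta_j.
    dj = normal_pdf(y, (theta + m0)[:, None], tau0 + z[None, :]).mean(axis=1)
    # Existing noise component (mu_i, v_i), new signal component.
    term3 = np.sum(n0_counts * normal_pdf(y, m1 + mu, v + tau1))
    # Existing noise component and existing signal value.
    pair_pdf = normal_pdf(y, mu[:, None] + theta[None, :], v[:, None])
    term4 = np.sum(n0_counts[:, None] * n1_counts[None, :] * pair_pdf)

    total = (alpha0 * alpha1 * d0 + alpha0 * np.sum(n1_counts * dj)
             + alpha1 * term3 + term4)
    return total / ((alpha0 + n) * (alpha1 + n1))
```

The last double sum, evaluated term by term, is also what a mixture deconvolution of the signal density would use, since each pair `(i, j)` corresponds to one noise component shifted by one signal value.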
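Secondly, we can sample the hyperparameters $\tau_0, \tau_1, m_0, m_1, \alpha_0, \alpha_1$ if they are unknown. With the prior distributions of these hyperparameters given in the previous section, the relevant conditional distributions are
\[
\begin{aligned}
\tau_0 &\sim IG\big(\tau_0 \mid (t_0 + k_0 - 1)/2,\ (R_0 + S_0)/2\big), \\
(m_0 \mid \tau_0) &\sim N(m_0 \mid \bar{\mu},\ \tau_0 / k_0), \\
\tau_1 &\sim IG\big(\tau_1 \mid (t_1 + k_1 - 1)/2,\ (R_1 + S_1)/2\big), \\
(m_1 \mid \tau_1) &\sim N(m_1 \mid \bar{\theta},\ \tau_1 / k_1),
\end{aligned}
\]
where $S_0 = \sum_{j=1}^{k_0} (\mu_j - \bar{\mu})^2$, $\bar{\mu} = \sum_{j=1}^{k_0} \mu_j / k_0$, $S_1 = \sum_{j=1}^{k_1} (\theta_j - \bar{\theta})^2$ and $\bar{\theta} = \sum_{j=1}^{k_1} \theta_j / k_1$.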
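One draw of a variance and mean pair from these conditionals can be sketched as below. The helper name `sample_tau_and_m` is hypothetical, and the inverse gamma is assumed to be parameterized by shape and scale; the same function serves the noise pair and the signal pair by passing the distinct noise means or the distinct signal values.

```python
import numpy as np

def sample_tau_and_m(distinct_values, t, R, rng):
    """Draw (tau, m) given the k distinct component locations.

    distinct_values : the k distinct values (mu_j for noise, theta_j for signal)
    t, R            : prior constants of the inverse gamma prior on tau
    """
    k = len(distinct_values)
    center = distinct_values.mean()
    S = np.sum((distinct_values - center) ** 2)
    shape = (t + k - 1) / 2.0
    scale = (R + S) / 2.0
    # Inverse gamma draw via the reciprocal of a gamma draw (assumed shape/scale form).
    tau = 1.0 / rng.gamma(shape, 1.0 / scale)
    # m | tau is normal with mean equal to the average location and variance tau / k.
    m = rng.normal(center, np.sqrt(tau / k))
    return tau, m
```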
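We can also sample $\alpha_0$ and $\alpha_1$ from equation (3.8) by using
\[
\begin{aligned}
(z_0 \mid \alpha_0) &\sim \mathrm{Beta}(\alpha_0 + 1,\ n), \\
(\alpha_0 \mid z_0, k_0) &\sim w_{z_0}\, G(\alpha_0 \mid a_0 + k_0,\ b_0 - \log z_0) + (1 - w_{z_0})\, G(\alpha_0 \mid a_0 + k_0 - 1,\ b_0 - \log z_0), \\
(z_1 \mid \alpha_1) &\sim \mathrm{Beta}(\alpha_1 + 1,\ n_1), \\
(\alpha_1 \mid z_1, k_1) &\sim w_{z_1}\, G(\alpha_1 \mid a_1 + k_1,\ b_1 - \log z_1) + (1 - w_{z_1})\, G(\alpha_1 \mid a_1 + k_1 - 1,\ b_1 - \log z_1),
\end{aligned}
\]
with weights $w_{z_0}$ and $w_{z_1}$ defined by
\[
\frac{w_{z_0}}{1 - w_{z_0}} = \frac{a_0 + k_0 - 1}{n\,(b_0 - \log z_0)}, \qquad
\frac{w_{z_1}}{1 - w_{z_1}} = \frac{a_1 + k_1 - 1}{n_1\,(b_1 - \log z_1)}.
\]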
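The two-step update of a precision parameter can be sketched as follows. The function name `sample_alpha` is hypothetical, and $G(\cdot \mid a, b)$ is assumed to denote a gamma distribution with shape $a$ and rate $b$; if the thesis uses a different convention, the `rng.gamma` call would need to change accordingly.

```python
import numpy as np

def sample_alpha(alpha, k, n, a, b, rng):
    """One auxiliary-variable update of a Dirichlet process precision.

    alpha : current value of the precision
    k     : current number of distinct components
    n     : sample size tied to this process (n for noise, n_1 for signal)
    a, b  : constants of the gamma prior on alpha (assumed shape and rate)
    """
    # Step 1: latent variable z | alpha ~ Beta(alpha + 1, n).
    z = rng.beta(alpha + 1.0, n)
    rate = b - np.log(z)
    # Step 2: mixture weight from the odds (a + k - 1) / (n * (b - log z)).
    odds = (a + k - 1.0) / (n * rate)
    w = odds / (1.0 + odds)
    # Step 3: pick the gamma component, then draw the new alpha.
    shape = a + k if rng.random() < w else a + k - 1.0
    new_alpha = rng.gamma(shape, 1.0 / rate)  # NumPy takes scale = 1 / rate
    return new_alpha, z
```

Calling it once with `(alpha_0, k_0, n, a_0, b_0)` and once with `(alpha_1, k_1, n_1, a_1, b_1)` in each cycle would update both precisions.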
With these steps, the Gibbs sampler is complete for the case of unknown hyperparameters $\tau$, $m$ and $\alpha$. A simulation analysis is given in the next section.
7.4 Simulations
We discuss some numerical aspects and present the results of a small simulation study to show the effects of mixtures of mixtures modeling. Since two different models, for example the standard normal distribution $N(y \mid 0, 1)$ and a mixture of two normal distributions $0.5\,N(y \mid -0.5, 1) + 0.5\,N(y \mid 0.5, 1)$, may produce the same random samples, we do not use random samples. Instead, we analyze the empirical quantiles, which represent the distribution quite well. The empirical quantiles $z_1, \ldots, z_n$ are defined by $F(z_i) = i/(n+1)$ for $i = 1, \ldots, n$, where $F(\cdot)$ is the cumulative distribution function associated with the quantiles. Even though the empirical quantiles have the same drawback, they are better than random samples from the distribution. It is helpful to consider first the quantiles of each component of the mixture model. For example, quantiles of a normal distribution are available from many sources, and we can use the quantiles of each normal component to obtain the quantiles of a mixture of normal distributions. In general, the following property of a mixture of two distributions can be verified.
Suppose the mixture consists of two components and the $p$-quantile of the $i$-th component is $z_i$ for $i = 1, 2$. Then the $p$-quantile of the mixture lies between $z_1$ and $z_2$. Using $z_1$ and $z_2$ as initial search points, we can apply a standard numerical method, say the bisection method, to obtain the $p$-quantile of $F(z)$.
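The bracketing property and the bisection step can be illustrated with a short Python sketch. The function names and the tolerance are assumptions made for illustration; the component quantiles come from the standard library class `statistics.NormalDist`.

```python
from statistics import NormalDist

def mixture_cdf(x, weights, components):
    """CDF of a finite mixture of normal distributions."""
    return sum(w * c.cdf(x) for w, c in zip(weights, components))

def mixture_quantile(p, weights, components, tol=1e-10):
    """p-quantile of the mixture, found by bisection.

    The search interval is bracketed by the smallest and largest
    p-quantiles of the individual components.
    """
    component_q = [c.inv_cdf(p) for c in components]
    lo, hi = min(component_q), max(component_q)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, components) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def empirical_quantiles(n, weights, components):
    """Quantiles z_1..z_n solving F(z_i) = i / (n + 1)."""
    return [mixture_quantile(i / (n + 1), weights, components)
            for i in range(1, n + 1)]

# Example: 150 quantiles of 0.5 N(-1.5, 1) + 0.5 N(1.5, 1).
noise_components = [NormalDist(-1.5, 1.0), NormalDist(1.5, 1.0)]
noise_quantiles = empirical_quantiles(150, [0.5, 0.5], noise_components)
```

Because the mixture CDF is a weighted average of the component CDFs, it is at most `p` at the smaller component quantile and at least `p` at the larger one, which is what makes the bracket valid.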
The 150 noise data points were computed from the mixture of two normal distributions $0.5\,N(x \mid -1.5, 1) + 0.5\,N(x \mid 1.5, 1)$. The histogram of the simulated noise data is shown in Figure 7.1(a).
At this stage, we treat the noise data as `Y' in the model of Chapter 5, using the following prior distributions: $v \sim IG(v \mid 10/2, 10/2)$, $\tau \sim IG(\tau \mid 0.0, 0.0)$ and $\alpha \sim G(\alpha \mid 4, 8)$, where the notations $v$, $\tau$ and $\alpha$ are defined in Chapter 5. The resulting posterior distribution of the number of components is given in column 2 of Table 7.1. The predictive density function of the data, displayed by the dotted line, is presented in Figure 7.1(c). The solid line plots the true density function of the above mixture.
The signal component is then sampled from two discrete states, 0 and 10, with probabilities 0.5 and 0.5, respectively. The histogram of the signal is plotted in Figure 7.1(b).
If we treat the signal alone as data `Y' in the model of Chapter 5, given the prior information $v \sim IG(v \mid 10/2, 10/2)$, $\tau \sim IG(\tau \mid 0.0, 0.0)$ and $\alpha \sim G(\alpha \mid 4, 8)$, the posterior distribution of the number of components is shown in column 4 of Table 7.1. The predictive density function of the signal is plotted by the dotted line in Figure 7.1(d). The solid line plots the true density function of the signal mixture.

Table 7.1: Posterior Distributions of the Number of Components

          noise                            signal
k    Mixture   Mixture of Mixture    Mixture   Mixture of Mixture
1    0.0000    0.00000               0.00000   0.00000
2    0.9889    0.99010               0.00000   0.62177
3    0.0092    0.00777               0.00000   0.27919
4    0.0016    0.00191               0.92732   0.07756
5    0.0003    0.00016               0.05876   0.01771
6                                    0.01025   0.00324
7                                    0.00296   0.00049
8                                    0.00065   0.00004
9                                    0.00005
In order to follow the general approach of this chapter, the prior information for all parameters is given by $v \sim G(v \mid 10/2, 10/2)$, $\tau_i \sim G(\tau_i \mid 0, 0)$ for $i = 0, 1$, $\alpha_0 \sim G(\alpha_0 \mid 4, 8)$ and $\alpha_1 \sim G(\alpha_1 \mid 4, 12)$. We performed simulation studies comparing the results of mixtures of mixtures modeling with the true distribution. Although we use different prior information for mixture modeling and for mixtures of mixtures modeling, comparisons to the true distribution are also provided where available. We chose the $i$th noise data point as the initial value for $\mu_i$, and the difference between the $j$th signal data point and the noise data as the initial value for $\theta_j$. For each analysis, the number of burn-in cycles was 2,000 and the Monte Carlo sample size was 8,000. The convergence of the Gibbs sampler was monitored by using the method of Chapter 3.
[Figure 7.1: Simulation Analysis. Panels: (a) p(x|noise, signal); (b) p(y|noise, signal); (c) pdfs of noise; (d) pdfs of signal. Legend: solid line, true density functions for plots (c) and (d); dotted line, predictive density functions of the mixture of mixture method; dashed line, predictive density functions of the method of Chapter 5.]
The posterior distributions for $k_0$ and $k_1$ are given in columns 3 and 5 of Table 7.1. It is clear that there are two components for the noise data. The posterior mode for $k_1$ is 2, and the posterior probability of more than two components for the signal data is about 0.378. However, if we use the mixture analysis presented in Chapter 5, column 4 of Table 7.1 shows that the posterior mode for $k_1$ is 4. Figure 7.1 shows the predictive density functions obtained from the different models. As can be seen from the results, modeling the noise as a mixture is better than modeling it as a single normal distribution. The predictive density functions of the noise and the signal based on mixtures of mixtures modeling are much closer to their respective true density functions. Figures 7.1(a) and 7.1(b) show how the predictive density functions from mixtures of mixtures modeling match their histograms. The simulated data are very clearly bimodal in both noise and signal; the goal of this test was simply to check the program and the theory.
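The figure of about 0.378 can be checked directly from the signal "Mixture of Mixture" column of Table 7.1. The snippet below is a simple arithmetic check; it assumes that the entries for k = 6, 7, 8 in the table belong to the signal columns, which is how the column sums come closest to one.

```python
# Posterior probabilities of k_1 under mixtures of mixtures modeling (Table 7.1).
signal_mixture_of_mixture = {
    1: 0.00000, 2: 0.62177, 3: 0.27919, 4: 0.07756,
    5: 0.01771, 6: 0.00324, 7: 0.00049, 8: 0.00004,
}

# Posterior mode of k_1.
posterior_mode = max(signal_mixture_of_mixture, key=signal_mixture_of_mixture.get)

# Posterior probability of more than two signal components.
prob_more_than_two = sum(p for k, p in signal_mixture_of_mixture.items() if k > 2)
```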