• No results found

B.2 EM algorithm of chapter 8

B.2.1 E-step

We now derive the posterior densities f (w|γ, y, θ(t)) and f (γ|y, θ(t)) from section 8.2.

Let us first verify the density f (y, γ) by integrating out w from

f (y, w, γ)

where we recall A = {wj|wcj = wc0j, all c, c0 ∈ C}. Fix j and sum over all possible val-ues of (wcj)c∈C ∈ {0, 1}K, where K = #C. Let’s for ease of notation in the derivation assume that the names of the clusters are c = {1}, . . . , {K}, with methylation indicators w1j, . . . , wKj respectively. Start by summing over w1j = 0, 1:

wKj = 0, 1. It’s easy to see that we arrive at

Theorem B.2.1. Conditioning on the gene importance indicator, γj, the posterior den-sity ofwj is

Proof. From (B.1) and (B.2) we get

Lemma B.2.2. Conditioning on the gene importance indicator, γj, the posterior expec-tation ofwcj, with respect to the densityf (wjj, yj, θ(t)), is

E[wcjj, yj, θ(t)] = γjτcj(t)+ (1 − γj1j(t).

Proof. Using the definitions in (B.5) and (B.6) and the posterior density in (B.4) we get

E[wcjj = 1, yj, θ(t)] =

Theorem B.2.3. The posterior density of γj is

f (γj|yj, θ(t)) = ηj(t)γj

1 − η(t)j 1−γj

, (B.7)

where we recall the definition of the posterior expectation ofγj

ηj(t) = Eγj|yj, θ(n) = p(t)f1(t)(yj)

p(t)f1(t)(yj) + (1 − p(t))f2(t)(yj), with

f1(t)(yj) =Y

c∈C



π(t)1c Y

i∈c

φ(yij(t)1i, σ2(t)1i ) + π0c(t)Y

i∈c

φ(yij(t)2i, σ2i2(t)) , and

f2(t)(yj) = π1(t)

n

Y

i=1

φ(yij(t)1i, ς1i2(t)) + π0(t)

n

Y

i=1

φ(yij(t)2i, ς2i2(t)).

Proof. From (B.2) and (B.3) we get

f (γj|yj, θ(t))

= f (yj, γj(t)) f (yj(t))

= n

p(t)f1(t)(yj) oγjn

1 − p(t)f2(t)(yj) o1−γj

p(t)f1(t)(yj) + (1 − p(t))f2(t)(yj)

=

( p(t)f1(t)(yj)

p(t)f1(t)(yj) + (1 − p(t))f2(t)(yj) )γj(

1 − p(t)f2(t)(yj) p(t)f1(t)(yj) + (1 − p(t))f2(t)(yj)

)1−γj

,

which is that of (B.7).

We now have everything we need to formally derive the Q-function. Recall the complete data loglikelihood given in (8.8). We need to calculate Ewjj|yj[wcjγj], for all c, Ewjj|yj[(1 − γj)w1j], and Ewjj|yj[(1 − γj) log IA(wj)]. We get

1.

Eγjwcj

yj, θ(t)

= EγjE[wcjj, yj, θ(t)]

yj, θ(t)

= Eγj γjτcj(t) + (1 − γj1j(t)|yj, θ(t)

= τcj(t)Eγj|yj, θ(t)

= τcj(t)ηj(t),

since γj2 = γj and γj(1 − γj) = 0.

2.

E(1 − γj)wcj

yj, θ(t)

= E(1 − γj)E[wcjj, yj, θ(t)]

yj, θ(t)

= E(1 − γj) γjτcj(t)+ (1 − γj1j(t)|yj, θ(t)

= ν1j(t)E(1 − γj)|yj, θ(t)

= ν1j(t)(1 − ηj(t))

3.

E(1 − γj) log IA(wj)

yj, θ(t)

= Pγj = 0|yj, θ(t) · E(1 − γj) log IA(wj)

γj = 0, yj, θ(t) +Pγj = 1|yj, θ(t) · E(1 − γj) log IA(wj)

γj = 1, yj, θ(t)

= Pγj = 0|yj, θ(t) · E log IA(wj)

γj = 0, yj, θ(t)

= Pγj = 0|yj, θ(t) Z

IA(wj) ν1j(t)w1j

1 − ν1j(t)1−w1j

log IA(wj)dwj

= 0,

since IA(wj) log IA(wj) = 0 · (−∞) = 0 or IA(wj) log IA(wj) = 1 · 0 = 0.

Now plug into (8.8) and we arrive at the Q-function in (8.11)

In maximizing the Q-function in (B.8) we differentiate with respect to (p, π1, (π1c)c∈C) and ((µ1i, σ21i, µ2i, σ22i)i, (α1i, ς1i2, α2i, ς2i2)i), where

Differentiating with respect to p and setting to zero we get

G

which leads to

Differentiating with respect to π1c, for all c ∈ C, and setting to zero we get

G

Differentiating with respect to π1 and setting to zero we get

G

Differentiating with respect to µ1i, for i ∈ c, and setting to zero we get

G

Similarly if we differentiate with respect to µ2i, for i ∈ c, and set to zero we get

µ(t+1)2i =

Now differentiating with respect to σ1i2, for i ∈ c, and setting to zero we get

Similarly, if we differentiate with respect to σ2i2, for i ∈ c, and set to zero we get

σ2i2(t+1)= all i, we get the updating formulas

α(t+1)1i =

BIBLIOGRAPHY

Aalen, O. O. (1980), “A model for non-parametric regression analysis of counting pro-cesses,” Lecture Notes in Statistics-2: Mathematical Statistics and Probability The-ory, (W. Klonecki, A. Kozek, and J. Rosinski, eds.), Springer Verlag: New York, pp.

1–25.

Banfield, J. D. and Raftery, A. E. (1993), “Model-Based Gaussian and Non-Gaussian Clustering,” Biometrics, 49, 803–821.

Baxter, J., Jones, R., Lin, M., and Olsen, J. (2004), “SLLN for Weighted Indepen-dent IIndepen-dentically Distributed Random Variables,” Journal of theoretical probability, 17, 165–181.

Benjamini, Y. and Hochberg, Y. (1995), “Controlling the false discovery rate: a practical and powerful approach to multiple testing.” Journal of the Royal Statistical Society, Series B, 57, 289–300.

Besag, J. G. (1974), “Spatial interaction and the statistical analysis of lattice systems (with discussion),” Journal of the Royal Statistical Society. B, 34, 192–236.

Booth, J. G., Casella, G., and Hobert, J. P. (2008), “Clustering using objective functions and stochastic search,” Journal of the Royal Statistical Society. Series B (Methodolog-ical), 70, 119–139.

Chandler, R. E. and Bate, S. (2007), “Inference for clustered data using the indepen-dence log-likelihood,” Biometrika, 94, 167–183.

Cheeseman, P. and Stutz, J. (1995), “Bayesian Classification (AutoClass): Theory and Results,” in Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G.

Piatesky-Shapiro, P. Smyth, and R. Uthurusamy, AAAI Press, 49, 153–180.

Demidenko, E. (2004), Mixed Models, Theory and Applications, Wiley Series in Proba-bility and Statistics.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum likelihood from incomplete data via the EM algorithm (with discussion),” Journal of the Royal Statis-tical Society, 39, 1–38.

Edwards, D. (2000), Introduction to Graphical Modelling, 2nd edition, Springer texts in Statistics.

Efron, B. (1988), “Logistic Regression, Survival Analysis, and the Kaplan-Meier Curve,” Journal of the American Statistical Association, 83, 414–425.

Figueroa, M. E., Reimers, M., Thompson, R. F., Ye, K., Li, Y., Selzer, R. R., Fridriksson, J., Paietta, E., Wiernik, P., Green, R. D., Greally, J. M., and Melnick, A. (2008),

“An Integrative Genomic and Epigenomic Approach for the Study of Transcriptional Regulation,” PLoS One, 3, e1882.

Figueroa, M. E., Wouters, B. J., and Skrabanek, L. (2009), “Genome-wide epigenetic analysis delineates a biologically distinct immature acute leukemia with myeloid/T-lymphoid features.” Journal of the American Statistical Association, 113, 2795–2804.

Figueroa, M. E., Lugthart, S., Li, Y., Erpelinck-Verschueren, C., Deng, X., Christos, P. J., Schifano, E., Booth, J., van Putten, W., Skrabanek, L., Campagne, F., Mazum-dar, M., Greally, J. M., Valk, P. J., Lowenberg, B., Delwelsend, R., and Melnick, A.

(2010), “Epigenetic signatures identify biologically distinct subtypes in acute myeloid leukemia,” Cancer Cell, 17.

Fosen, J., Ferkingstad, E., Borgan, O., and Aalen, O. O. (2006), “Dynamic path analysis - a new approach to analyzing time-dependent covariates.” Lifetime Data Analysis, 12, 143–167.

Fraley, C. and Raftery, A. E. (1998), “How Many clusters? Which Clustering Method?

Answers via Model-Based Cluster Analysis,” The Computer Journal, 41, 578–588.

Fraley, C. and Raftery, A. E. (2002), “Model-Based Clustering, Discriminant Analysis and Density Estimation,” Journal of the American Statistical Association, 97, 611–

631.

Friedman, J. H. and Meulman, J. J. (2003), “Clustering Objects on Subsets of At-tributes.” technical report. Stanford University. Dept. of Statistics and Stanford Linear Accelerator Center.

Ghosh, D. and Chinnaiyan, A. M. (2002), “Mixture Modelling of Gene Expression Data From Microarray Experiments.” Bioinformatics, 18, 275–286.

Gueorguieva, R. V. and Agresti, A. (2001), “A correlated probit model for multivariate repeated measures of mixtures of binary and con- tinuous responses,” Journal of the American Statistical Association, 96, 1102–1112.

Heard, N. A., Holmes, C. C., and Stephens, D. A. (2006), “A quantitative study of gene regulation involved in the immune response of Anopheline mosquitoes: an applica-tion of Bayesian hierarchical clustering of curves.” Journal of the American Statistical Association, 101, 18–29.

Kalbfleisch, J. D. and Prentice, R. L. (2002), The Statistical Analysis of Failure Time Data, Wiley Series in Probability and Statistics.

Kim, S., Tadesse, M. G., and Vannucci, M. (2006), “Variable selection in clustering via Dirichlet process mixture models,” Biometrika, 93, 877–893.

Lauritzen, S. L. (1996), Graphical Models, Clarendon, Oxford.

Lindsay, B. G. (1988), “Composite likelihood methods,” Contemporary Mathematics, 80, 221–240.

McLachlan, G. and Peel, D. (2000), Finite Mixture Models, Wiley.

McLachlan, G. J. and Basford, K. E. (1988), Mixture Models: Inference and Applica-tions to Clustering, New York: Marcel Dekker.

McLachlan, G. J., Bean, R. W., and Peel, D. (2002), “A Mixture Model-Based Approach to the Clustering of Microarray Expression Data.” Bioinformatics, 18, 413–422.

Ruppert, D., Wand, M., and Carroll, R. (2003), Semiparametric Regression, Cambridge University Press.

Tadesse, M. G., Sha, N., and Vannucci, M. (2005), “Bayesian variable selection in clus-tering high-dimensional data,” Journal of the American Statistical Association, 100, 602.

Related documents