EM algorithm of chapter 8 - Dynamic Path Analysis And Model Based Clustering Of Microarray Data

B.2 EM algorithm of chapter 8

B.2.1 E-step

We now derive the posterior densities f (w|γ, y, θ^(t)) and f (γ|y, θ^(t)) from section 8.2.

Let us first verify the density f (y, γ) by integrating out w from

f (y, w, γ)

where we recall A = {wj|w_cj = w_c⁰_j, all c, c⁰ ∈ C}. Fix j and sum over all possible val-ues of (w_cj)_c∈C ∈ {0, 1}^K, where K = #C. Let’s for ease of notation in the derivation assume that the names of the clusters are c = {1}, . . . , {K}, with methylation indicators w_1j, . . . , w_Kj respectively. Start by summing over w_1j = 0, 1:

w_Kj = 0, 1. It’s easy to see that we arrive at

Theorem B.2.1. Conditioning on the gene importance indicator, γ_j, the posterior den-sity ofwj is

Proof. From (B.1) and (B.2) we get

Lemma B.2.2. Conditioning on the gene importance indicator, γ_j, the posterior expec-tation ofw_cj, with respect to the densityf (w_j|γ_j, y_j, θ^(t)), is

E[wcj|γj, yj, θ^(t)] = γjτ_cj^(t)+ (1 − γj)ν_1j^(t).

Proof. Using the definitions in (B.5) and (B.6) and the posterior density in (B.4) we get

E[wcj|γj = 1, yj, θ^(t)] =

Theorem B.2.3. The posterior density of γ_j is

f (γ_j|y_j, θ^(t)) = η_j^(t)γj

1 − η^(t)_j 1−γj

, (B.7)

where we recall the definition of the posterior expectation ofγ_j

η_j^(t) = Eγ_j|y_j, θ⁽ⁿ⁾ = p^(t)f₁^(t)(yj)

p^(t)f₁^(t)(y_j) + (1 − p^(t))f₂^(t)(y_j), with

f₁^(t)(y_j) =Y

c∈C

π^(t)_1c Y

i∈c

φ(y_ij|µ^(t)_1i, σ^2(t)_1i ) + π_0c^(t)Y

i∈c

φ(y_ij|µ^(t)_2i, σ_2i^2(t)) , and

f₂^(t)(y_j) = π₁^(t)

i=1

φ(y_ij|α^(t)_1i, ς_1i^2(t)) + π₀^(t)

i=1

φ(y_ij|α^(t)_2i, ς_2i^2(t)).

Proof. From (B.2) and (B.3) we get

f (γ_j|y_j, θ^(t))

= f (yj, γ_j|θ^(t)) f (y_j|θ^(t))

= n

p^(t)f₁^(t)(yj) oγjn

1 − p^(t)f₂^(t)(yj) o1−γj

p^(t)f₁^(t)(y_j) + (1 − p^(t))f₂^(t)(y_j)

( p^(t)f₁^(t)(yj)

p^(t)f₁^(t)(yj) + (1 − p^(t))f₂^(t)(yj) )γj(

1 − p^(t)f₂^(t)(yj) p^(t)f₁^(t)(yj) + (1 − p^(t))f₂^(t)(yj)

)1−γj

which is that of (B.7).

We now have everything we need to formally derive the Q-function. Recall the complete data loglikelihood given in (8.8). We need to calculate E_w_j_,γ_j|yj[wcjγj], for all c, Ewj,γj|yj[(1 − γj)w1j], and Ewj,γj|yj[(1 − γj) log IA(wj)]. We get

Eγ_jw_cj

y_j, θ^(t)

= Eγ_jE[w_cj|γ_j, y_j, θ^(t)]

y_j, θ^(t)

= Eγj γ_jτ_cj^(t) + (1 − γ_j)ν_1j^(t)|yj, θ^(t)

= τ_cj^(t)Eγ_j|y_j, θ^(t)

= τ_cj^(t)η_j^(t),

since γ_j² = γ_j and γ_j(1 − γ_j) = 0.

E(1 − γ_j)w_cj

y_j, θ^(t)

= E(1 − γ_j)E[w_cj|γ_j, y_j, θ^(t)]

y_j, θ^(t)

= E(1 − γj) γ_jτ_cj^(t)+ (1 − γ_j)ν_1j^(t)|yj, θ^(t)

= ν_1j^(t)E(1 − γ_j)|y_j, θ^(t)

= ν_1j^(t)(1 − η_j^(t))

E(1 − γ_j) log I_A(w_j)

y_j, θ^(t)

= Pγj = 0|yj, θ^(t) · E(1 − γj) log IA(wj)

γj = 0, yj, θ^(t) +Pγ_j = 1|y_j, θ^(t) · E(1 − γ_j) log I_A(w_j)

γ_j = 1, y_j, θ^(t)

= Pγ_j = 0|y_j, θ^(t) · E log I_A(w_j)

γ_j = 0, y_j, θ^(t)

= Pγj = 0|yj, θ^(t) Z

IA(wj) ν_1j^(t)w1j

1 − ν_1j^(t)1−w1j

log IA(wj)dwj

= 0,

since I_A(w_j) log I_A(w_j) = 0 · (−∞) = 0 or I_A(w_j) log I_A(w_j) = 1 · 0 = 0.

Now plug into (8.8) and we arrive at the Q-function in (8.11)

In maximizing the Q-function in (B.8) we differentiate with respect to (p, π₁, (π_1c)_c∈C) and ((µ_1i, σ²_1i, µ_2i, σ²_2i)_i, (α_1i, ς_1i², α_2i, ς_2i²)_i), where

Differentiating with respect to p and setting to zero we get

which leads to

Differentiating with respect to π1c, for all c ∈ C, and setting to zero we get

Differentiating with respect to π₁ and setting to zero we get

Differentiating with respect to µ_1i, for i ∈ c, and setting to zero we get

Similarly if we differentiate with respect to µ_2i, for i ∈ c, and set to zero we get

µ^(t+1)_2i =

Now differentiating with respect to σ_1i², for i ∈ c, and setting to zero we get

Similarly, if we differentiate with respect to σ_2i², for i ∈ c, and set to zero we get

σ_2i^2(t+1)= all i, we get the updating formulas

α^(t+1)_1i =

BIBLIOGRAPHY

Aalen, O. O. (1980), “A model for non-parametric regression analysis of counting pro-cesses,” Lecture Notes in Statistics-2: Mathematical Statistics and Probability The-ory, (W. Klonecki, A. Kozek, and J. Rosinski, eds.), Springer Verlag: New York, pp.

1–25.

Banfield, J. D. and Raftery, A. E. (1993), “Model-Based Gaussian and Non-Gaussian Clustering,” Biometrics, 49, 803–821.

Baxter, J., Jones, R., Lin, M., and Olsen, J. (2004), “SLLN for Weighted Indepen-dent IIndepen-dentically Distributed Random Variables,” Journal of theoretical probability, 17, 165–181.

Benjamini, Y. and Hochberg, Y. (1995), “Controlling the false discovery rate: a practical and powerful approach to multiple testing.” Journal of the Royal Statistical Society, Series B, 57, 289–300.

Besag, J. G. (1974), “Spatial interaction and the statistical analysis of lattice systems (with discussion),” Journal of the Royal Statistical Society. B, 34, 192–236.

Booth, J. G., Casella, G., and Hobert, J. P. (2008), “Clustering using objective functions and stochastic search,” Journal of the Royal Statistical Society. Series B (Methodolog-ical), 70, 119–139.

Chandler, R. E. and Bate, S. (2007), “Inference for clustered data using the indepen-dence log-likelihood,” Biometrika, 94, 167–183.

Cheeseman, P. and Stutz, J. (1995), “Bayesian Classification (AutoClass): Theory and Results,” in Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G.

Piatesky-Shapiro, P. Smyth, and R. Uthurusamy, AAAI Press, 49, 153–180.

Demidenko, E. (2004), Mixed Models, Theory and Applications, Wiley Series in Proba-bility and Statistics.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum likelihood from incomplete data via the EM algorithm (with discussion),” Journal of the Royal Statis-tical Society, 39, 1–38.

Edwards, D. (2000), Introduction to Graphical Modelling, 2nd edition, Springer texts in Statistics.

Efron, B. (1988), “Logistic Regression, Survival Analysis, and the Kaplan-Meier Curve,” Journal of the American Statistical Association, 83, 414–425.

Figueroa, M. E., Reimers, M., Thompson, R. F., Ye, K., Li, Y., Selzer, R. R., Fridriksson, J., Paietta, E., Wiernik, P., Green, R. D., Greally, J. M., and Melnick, A. (2008),

“An Integrative Genomic and Epigenomic Approach for the Study of Transcriptional Regulation,” PLoS One, 3, e1882.

Figueroa, M. E., Wouters, B. J., and Skrabanek, L. (2009), “Genome-wide epigenetic analysis delineates a biologically distinct immature acute leukemia with myeloid/T-lymphoid features.” Journal of the American Statistical Association, 113, 2795–2804.

Figueroa, M. E., Lugthart, S., Li, Y., Erpelinck-Verschueren, C., Deng, X., Christos, P. J., Schifano, E., Booth, J., van Putten, W., Skrabanek, L., Campagne, F., Mazum-dar, M., Greally, J. M., Valk, P. J., Lowenberg, B., Delwelsend, R., and Melnick, A.

(2010), “Epigenetic signatures identify biologically distinct subtypes in acute myeloid leukemia,” Cancer Cell, 17.

Fosen, J., Ferkingstad, E., Borgan, O., and Aalen, O. O. (2006), “Dynamic path analysis - a new approach to analyzing time-dependent covariates.” Lifetime Data Analysis, 12, 143–167.

Fraley, C. and Raftery, A. E. (1998), “How Many clusters? Which Clustering Method?

Answers via Model-Based Cluster Analysis,” The Computer Journal, 41, 578–588.

Fraley, C. and Raftery, A. E. (2002), “Model-Based Clustering, Discriminant Analysis and Density Estimation,” Journal of the American Statistical Association, 97, 611–

631.

Friedman, J. H. and Meulman, J. J. (2003), “Clustering Objects on Subsets of At-tributes.” technical report. Stanford University. Dept. of Statistics and Stanford Linear Accelerator Center.

Ghosh, D. and Chinnaiyan, A. M. (2002), “Mixture Modelling of Gene Expression Data From Microarray Experiments.” Bioinformatics, 18, 275–286.

Gueorguieva, R. V. and Agresti, A. (2001), “A correlated probit model for multivariate repeated measures of mixtures of binary and con- tinuous responses,” Journal of the American Statistical Association, 96, 1102–1112.

Heard, N. A., Holmes, C. C., and Stephens, D. A. (2006), “A quantitative study of gene regulation involved in the immune response of Anopheline mosquitoes: an applica-tion of Bayesian hierarchical clustering of curves.” Journal of the American Statistical Association, 101, 18–29.

Kalbfleisch, J. D. and Prentice, R. L. (2002), The Statistical Analysis of Failure Time Data, Wiley Series in Probability and Statistics.

Kim, S., Tadesse, M. G., and Vannucci, M. (2006), “Variable selection in clustering via Dirichlet process mixture models,” Biometrika, 93, 877–893.

Lauritzen, S. L. (1996), Graphical Models, Clarendon, Oxford.

Lindsay, B. G. (1988), “Composite likelihood methods,” Contemporary Mathematics, 80, 221–240.

McLachlan, G. and Peel, D. (2000), Finite Mixture Models, Wiley.

McLachlan, G. J. and Basford, K. E. (1988), Mixture Models: Inference and Applica-tions to Clustering, New York: Marcel Dekker.

McLachlan, G. J., Bean, R. W., and Peel, D. (2002), “A Mixture Model-Based Approach to the Clustering of Microarray Expression Data.” Bioinformatics, 18, 413–422.

Ruppert, D., Wand, M., and Carroll, R. (2003), Semiparametric Regression, Cambridge University Press.

Tadesse, M. G., Sha, N., and Vannucci, M. (2005), “Bayesian variable selection in clus-tering high-dimensional data,” Journal of the American Statistical Association, 100, 602.

In document Dynamic Path Analysis And Model Based Clustering Of Microarray Data (Page 169-181)