Prior specification - Statistical Method - Bayesian Modeling of Latent Heterogeneity in Complex

3.2 Statistical Method

3.2.4 Prior specification

To complete the Bayesian model specification, I assign prior distributions to all of the parameters. For each parameter, I use the same prior distribution across mixture components.

In the latent class membership model, I follow previous research [Garrett and Zeger, 2000; Elliott et al., 2005] by assigning the probit regression coefficients δkin (3.2) a prior distribu-

tion M V Ns(0, I). On the probability scale, this prior distribution yields a non-informative

prior on the probability of latent class membership, with its mode at approximately _K1. In the longitudinal health outcomes model (3.4), I assign the latent-class specific regression coefficients βrk a diffuse prior distribution of the form M V Np(0, Σβ), where Σβ

is a diagonal variance-covariance with some large variance. I assign the observation-level variance-covariance Σkan inverse-Wishart prior distribution IW (νΣ, S−1_Σ ), where νΣ is the

degrees of freedom and S−1_Σ is a positive definite matrix. In (3.5), for the hierarchical variance-covariance of the patient-specific random effects Ψk, I assign each of the con-

stituent variance-covariances Ψkr an inverse-Wishart prior distribution.

Like the model of longitudinal health outcomes, for the visit process model in (3.6) and (3.7) and the response process model in (3.8) and (3.9), I use diffuse normal prior distributions on the latent class-specific regression coefficients φk and λrk, and inverse-

3.2.5 Posterior computation

Let y_iA(l)= (y_1iA(l), . . . , y_RiA(l))T, and βk= (β1kT , . . . , βRkT )T. Assuming prior independence,

I specify the joint posterior distribution as

where for notational simplicity, the design matrices in the conditional densities for dij,

yiA(l), and mriA(l)are suppressed.

For posterior computation, I propose an MCMC algorithm that uses easily sampled closed-form full conditionals. After assigning initial values to model parameters, the algorithm iterates among the following steps:

1. For k = 1, . . . , K − 1, update δk for the latent class membership model in (3.2).

Compute πik for k = 1, . . . , K in (3.1).

2. For k = 1, . . . , K, update parameters for the longitudinal health outcomes model in (3.4) and (3.5), including βrk, bri, Σk, and Ψk.

3. For k = 1, . . . , K, update parameters for the visit process model in (3.6) and (3.7), including φk, τi, and Ωk.

4. For k = 1, . . . , K, update parameters for the model of the response process given a clinic visit in (3.8) and (3.9), including λrk, κri, and Θrk.

5. Sample latent class indicators ci for i = 1, . . . , n from M ultinomial(1; pi1, . . . , piK),

where pi1, . . . , piK are the posterior probabilities of latent class assignment given by

where y_i∗= (y_iA(1)T , . . . , y_iA(nT

i)), and Σ

∗

k is an niR × niR block diagonal matrix with

elements Σk (R × R) for each yiA(l)(l = 1, . . . , ni).

The full MCMC algorithm is detailed in Appendix D.

3.2.6 Model selection

I use model selection as a tool to guide sensitivity analysis about missing data assumptions. First, I select the optimal number of latent classes among models with the same assumed missing data mechanism. Then, assuming each of the selected number of latent classes, I fit models varying the missing data assumptions. This approach enables investigating the sensitivity of statistical inferences to missing data assumptions given the selected number of latent classes.

To conduct model selection, I use two model information criteria and a graphical tech- nique known as latent class identifiability displays (LCIDs) [Garrett and Zeger, 2000]. The model information criteria include the Bayesian Information Criterion (BIC) [Schwarz, 1978], and a modified version of the Deviance Information Criterion (DIC) [Spiegelhal- ter et al., 2002] known as the DIC3 [Celeux et al., 2006]. I calculate the BIC using the

marginal density of y∗_i, di, m1i, . . . , mRi after integrating out latent class membership ci

and the random effects bi, τi, κ1i, . . . , κRi for each of the outcomes, given by

where I can analytically compute only the integral for the longitudinal health outcomes y∗_i. I estimate the integrals for the visit process di and response process given a clinic visit

m1i, . . . , mRi using numerical integration. I then define the BIC as

BIC =

i=1

logf (y∗_i, di, m1i, . . . , mRi| ˆπ; ˆβ, ˆΣ, ˆΨ; ˆφ, ˆΩ; ˆλ, ˆΘ) + d logNeff,

where ˆπ, ˆβ, ˆΣ, ˆΨ, ˆφ, ˆΩ, ˆλ, ˆΘ are the Bayesian estimators of the unknown parameters; d is the number of free parameters in the mixture model; and Neff is the effective sample

size from the model of longitudinal health outcomes y∗_i estimated by accounting for the correlations among the longitudinal measurements belonging to same patient [Jones, 2011]. The first term is a measure of goodness of fit, and the second term provides a penalty for model complexity.

In Bayesian hierarchical models, the number of free parameters may be unclear. As the Bayesian analogue to the BIC, [Spiegelhalter et al., 2002] proposed the DIC in which the number of effective parameters is estimated. For some unknown parameter α, the DIC is computed as DIC = ¯D(α) + pD, where ¯D(α) is the posterior mean deviance estimated from

MCMC samples, and pD is the effective number of parameters taken as pD = ¯D(α) − D( ˆα).

at the posterior mean estimator of α. However, according to [Celeux et al., 2006], in finite mixture modeling, the posterior mean estimator often leads to a negative effective number of parameters. The authors recommend the DIC3, in which the posterior mean estimator is replaced by the estimator of the marginal density (3.11) obtained from MCMC samples. Analogous to the BIC, ¯D(α) is a measure of goodness of model fit, while pD is a penalty

for model complexity. Smaller values of BIC and DIC3 indicate a preferred model.

[Garrett and Zeger, 2000] propose using LCIDs to examine the extent to which the data are able to distinguish among the assumed number of latent classes. In LCIDs, plots of the prior versus posterior distributions for regression coefficients in the latent class membership model are examined. Largely overlapping prior and posterior distributions may suggest that the number of latent classes is too large given the data.

3.2.7 Model checking

For model checking under MAR or MNAR missing data mechanisms, previous research [Gelman et al., 2005] has recommended conducting Bayesian posterior predictive checking [Gelman et al., 1996] with completed datasets that include observed and imputed data, and replicates of the completed datasets drawn from the complete-data model in (3.1) - (3.5). At each MCMC iteration, a discrepancy measure is computed using the completed and replicated completed datasets. The Bayesian predictive p-value denotes the probability that the discrepancy measure under the replicated completed data is greater than that under the completed data, with p-values outside the range of 0.05 and 0.95 suggesting a lack of model fit. To examine overall model adequacy, I use the multivariate mean square error for my discrepancy measure [Daniels and Hogan, 2008], which for the complete-data

model in (3.1) - (3.5), I compute as T = K X k=1 n X i=1 ni X l=1

(y_iA(l)− µ_iA(l))Σ−1_k (y_iA(l)− µ_iA(l))T × 1_c_i_=k,

where µ_iA(l) = x_iA(l)βT

k + ziA(l)bTi . I generate the replicated completed dataset by first

sampling replicate latent class indicators crep_i and then drawing yrep_iA(l) conditional on crep_i [Fruhwirth-Schnatter, 2006].

In addition, for randomly selected datasets, I compare plots of the completed data with the replicated completed data to evaluate model fit and the reasonableness of the imputations [Gelman et al., 2005].

In document Bayesian Modeling of Latent Heterogeneity in Complex Survey Data and Electronic Health Records (Page 57-63)