Comparison between Gaussian Joint Model and Log-Gamma Joint Model

5.3 Model Comparison

5.3.1 Comparison between Gaussian Joint Model and Log-Gamma Joint Model

Sections 5.1 and 5.2 contain simulation studies for the Gaussian joint model and the log-gamma joint model. In those illustrations, we fit the Gaussian joint model to Gaussian distributed simulation data and the log-gamma joint model to log-gamma distributed simulation data. In this section, we fit both joint models to both the Gaussian distributed data generated according to Section 5.1 (Cases 1, 2, and 3) and the log-gamma distributed data generated according to Section 5.2 (Cases 1, 2, and 3). We compare the estimates at the individual and cluster levels.

(1) MSE at individual level

To quantify the accuracy of the Gaussian joint model and the log-gamma joint model at the individual level, we computed the mean squared error (MSE) of the estimated longitudinal measurements and the MSE of the estimated hazard rate over all subjects and all time points. We simulated 50 replicates for each data distribution (50 according to Section 5.1.1 and 50 according to Section 5.2.1) and compare the MSEs in Table 5.3. The log-gamma joint model has a slightly larger longitudinal MSE than the Gaussian joint models but a much smaller MSE for the hazard rate. Thus, the Gaussian joint models are only slightly better at estimating longitudinal trajectories, whereas the log-gamma joint model is much better at estimating the hazard. These conclusions hold for both Gaussian and log-gamma distributed data.
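The pooled MSE described above can be sketched as follows. This is a minimal illustration, not the code used for Table 5.3: the function name `pooled_mse` and the one-array-per-subject layout are assumptions, and the reference values are taken to be the simulated truth.

```python
import numpy as np


def pooled_mse(estimates, references):
    """MSE pooled over all subjects and all time points.

    Both arguments hold one array per subject (hypothetical layout);
    subjects may have different numbers of time points.
    """
    if len(estimates) != len(references):
        raise ValueError("estimates and references must cover the same subjects")
    squared_error_sum = 0.0
    n_values = 0
    for est, ref in zip(estimates, references):
        est = np.asarray(est, dtype=float)
        ref = np.asarray(ref, dtype=float)
        if est.shape != ref.shape:
            raise ValueError("each subject needs matching time points")
        squared_error_sum += float(np.sum((est - ref) ** 2))
        n_values += est.size
    if n_values == 0:
        raise ValueError("no values to compare")
    return squared_error_sum / n_values
```

The same function applies to the hazard rate by passing estimated and reference hazards instead of longitudinal measurements.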

(2) Clustering evaluation

We also compare the estimation accuracy of the Gaussian joint model and the log-gamma joint model at the cluster level. We start by evaluating the clustering performance of both joint models. In every MCMC iteration, we update the cluster indicator for each subject, and as such, we obtain a total number of clusters at each iteration. The posterior mode of the number of clusters is computed using the last 3,000 iterations and can be interpreted as an estimate of the number of clusters. Table 5.5 shows the mean and standard error of those modes over 50 replicate data sets. In most cases, both the Gaussian and log-gamma joint models overestimate the number of clusters. However, the estimates from the log-gamma joint model are closer to the truth than the estimates from the Gaussian joint models. The standard errors of the estimated number of clusters from the log-gamma joint model are also consistently smaller than those from the Gaussian joint models.
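As a rough sketch of how the posterior mode of the number of clusters can be read off the saved cluster indicators, assuming the indicators are stored as one row of integer labels per iteration (the function name, the storage layout, and the tie-breaking rule are illustrative assumptions):

```python
from collections import Counter


def posterior_mode_n_clusters(cluster_draws, keep_last=3000):
    """Most frequent number of distinct clusters among the last draws.

    cluster_draws: sequence of per-iteration label sequences (assumed layout).
    keep_last: number of final iterations to use (3,000 in the text).
    Ties are broken in favour of the smaller number of clusters, which is
    a choice made for this sketch only.
    """
    kept = list(cluster_draws)[-keep_last:]
    if not kept:
        raise ValueError("no MCMC draws supplied")
    counts = Counter(len(set(draw)) for draw in kept)
    top = max(counts.values())
    return min(k for k, c in counts.items() if c == top)
```

Averaging this mode over the replicate data sets gives a summary of the kind reported in Table 5.5.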

We also evaluate the clustering performance using the adjusted Rand index. The Rand index is a measure of the similarity between two different clusterings of the same set of data. It considers how each pair of objects is assigned in each clustering. Two cases count as agreement between the clusterings. The first case occurs when the two elements of a pair are assigned to the same cluster in both clusterings, and the second case occurs when they are placed in different clusters in both clusterings. A disagreement occurs when the two elements of a pair are assigned to the same cluster in one clustering but to different clusters in the other. When the true cluster labels are known, the similarity between the estimated clustering and the truth can serve as a measure of the accuracy of the estimation. Based on this idea, Rand [31] proposed a measure of the similarity between two clusterings of the same data, defined as the number of agreeing assignments of pairs normalized by the total number of pairs. Given a set of N objects S = {o₁, o₂, . . . , o_N}, define the two clusterings as C¹ = {C₁¹, C₂¹, . . . , C_{k₁}¹} and C² = {C₁², C₂², . . . , C_{k₂}²}, where C¹ partitions S into k₁ subsets and C² partitions S into k₂ subsets.

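The Rand index is formally defined as

$$
\mathrm{RI} = \frac{\binom{N}{2} + 2\sum_{i,j}\binom{r_{ij}}{2} - \left[\sum_{i}\binom{r_{i\cdot}}{2} + \sum_{j}\binom{r_{\cdot j}}{2}\right]}{\binom{N}{2}}, \tag{5.1}
$$

where r_ij is the number of objects that belong to both C_i¹ and C_j², and r_i· and r_·j are the row and column sums of the contingency table (Table 5.6). The numerator can be considered as the number of agreements between clusterings C¹ and C², and the Rand index represents the frequency of agreements over the total number of pairs.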

The Rand index ranges between 0 and 1, with 0 indicating that the two clusterings have no similarity (i.e., when one consists of a single cluster and the other has N clusters with a single point in each), and 1 indicating that the clusterings are identical.
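As a small worked illustration of the pair counting, consider four objects and two partitions; both partitions are hypothetical and chosen only to make the arithmetic explicit.

```latex
\begin{align*}
S &= \{o_1, o_2, o_3, o_4\}, \qquad
C^1 = \{\{o_1, o_2\}, \{o_3, o_4\}\}, \qquad
C^2 = \{\{o_1, o_2\}, \{o_3\}, \{o_4\}\}, \\
\text{number of pairs} &= \binom{4}{2} = 6, \\
\text{agreements} &= \underbrace{1}_{(o_1,o_2)\ \text{together in both}}
  + \underbrace{4}_{(o_1,o_3),\,(o_1,o_4),\,(o_2,o_3),\,(o_2,o_4)\ \text{apart in both}} = 5, \\
\text{disagreements} &= 1 \quad \text{(the pair } (o_3, o_4)\text{: together in } C^1,\ \text{apart in } C^2\text{)}, \\
\mathrm{RI} &= \frac{5}{6} \approx 0.83 .
\end{align*}
```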

However, the Rand index makes no correction for chance. That is, we cannot tell whether a specific value of RI is large or small, because the value of RI is not zero when cluster assignment is random. The unadjusted Rand index also depends on the number of clusters relative to the number of objects. Specifically, Morey and Agresti [27] stated that RI approaches 1.0 as the numbers of clusters k₁ and k₂ increase. Thus, corrections have been proposed to overcome this disadvantage. Hubert and Arabie [19] proposed an adjusted Rand index based on the assumption that the counts r_ij have a generalized hypergeometric distribution. They provided the expectation of the Rand index as

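$$
E(\mathrm{RI}) = 1 + \frac{2\sum_{i}\binom{r_{i\cdot}}{2}\sum_{j}\binom{r_{\cdot j}}{2}}{\binom{N}{2}^{2}} - \frac{\sum_{i}\binom{r_{i\cdot}}{2} + \sum_{j}\binom{r_{\cdot j}}{2}}{\binom{N}{2}}.
$$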

Then the adjusted Rand index (ARI) is given by,

$$
\mathrm{ARI} = \frac{\mathrm{RI} - E(\mathrm{RI})}{\max(\mathrm{RI}) - E(\mathrm{RI})}, \tag{5.2}
$$

assuming a maximum Rand index of 1, i.e. max(RI) = 1 in (5.1).

In this dissertation, we use the ARI in (5.2) to evaluate the clustering of the joint models.
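The Rand index and its chance-corrected version can be computed directly from the contingency counts of two label vectors. The sketch below is illustrative rather than the code used in this dissertation: the function names are hypothetical, and returning 1.0 when both clusterings are trivial (where the correction is 0/0) is a convention assumed here.

```python
from collections import Counter
from math import comb


def _pair_sums(labels_a, labels_b):
    """Binomial sums over the cells, rows and columns of the contingency table."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("label vectors must have the same length")
    if n < 2:
        raise ValueError("at least two objects are needed")
    cells = Counter(zip(labels_a, labels_b))   # r_ij
    rows = Counter(labels_a)                   # r_i.
    cols = Counter(labels_b)                   # r_.j
    s_cells = sum(comb(r, 2) for r in cells.values())
    s_rows = sum(comb(r, 2) for r in rows.values())
    s_cols = sum(comb(r, 2) for r in cols.values())
    return s_cells, s_rows, s_cols, comb(n, 2)


def rand_index(labels_a, labels_b):
    """Proportion of object pairs on which the two clusterings agree."""
    s_cells, s_rows, s_cols, total = _pair_sums(labels_a, labels_b)
    return (total + 2 * s_cells - s_rows - s_cols) / total


def adjusted_rand_index(labels_a, labels_b):
    """(RI - E(RI)) / (1 - E(RI)), written in terms of the pair sums."""
    s_cells, s_rows, s_cols, total = _pair_sums(labels_a, labels_b)
    expected = s_rows * s_cols / total
    max_index = 0.5 * (s_rows + s_cols)
    if max_index == expected:
        # Both clusterings are a single cluster, or both are all singletons.
        return 1.0  # assumed convention: identical trivial clusterings
    return (s_cells - expected) / (max_index - expected)
```

When the true clustering is a single cluster, this quantity is 1 if the estimate is also a single cluster and 0 otherwise, so an average over replicates then reflects how often the single cluster is recovered.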

First we need to obtain a point estimate of the clustering, and then we can use the estimated cluster assignment to compute the ARI. Several methods have been proposed to obtain a point estimate of the clustering using draws from the posterior clustering distribution, for example, the maximum a posteriori (MAP) clustering and the least-squares model-based clustering (Dahl [9]). In our simulation study we used Dahl's method, which selects one of the observed clusterings in the Markov chain as the point estimate. Specifically, the least-squares clustering is the observed clustering among the last 3,000 iterations that minimizes the sum of squared deviations of its association matrix from the pairwise probability matrix (Dahl [9]). The mean ARI and its standard error over 50 replicates are shown in Table 5.5. The log-gamma joint model has the highest mean ARI in five of the six cases (in Case 2 the three models are essentially tied), which means that its clustering is generally closer to the true clustering than the clustering from the Gaussian joint models. Therefore, we conclude that the log-gamma joint model appears to perform better than the Gaussian joint model in terms of clustering and detecting subgroups among subjects.
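The least-squares selection step can be sketched as below. This is a minimal illustration of the criterion described above, not the implementation used in the simulation study: the function names and the storage of the draws as an iterations-by-subjects integer array are assumptions.

```python
import numpy as np


def association_matrix(labels):
    """N x N matrix with entry 1 when two subjects share a cluster, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def least_squares_clustering(draws):
    """Pick the draw whose association matrix is closest to the pairwise
    probability matrix in squared-error terms.

    draws: array of shape (n_iterations, n_subjects) of cluster labels
           (assumed layout), e.g. the last 3,000 MCMC iterations.
    Returns (index of the selected draw, its labels, pairwise probabilities).
    """
    draws = np.asarray(draws)
    if draws.ndim != 2 or draws.shape[0] == 0:
        raise ValueError("draws must be a non-empty 2-D array")
    n_draws, n_subjects = draws.shape
    # Pairwise probability matrix: average association matrix over draws.
    pairwise_prob = np.zeros((n_subjects, n_subjects))
    for labels in draws:
        pairwise_prob += association_matrix(labels)
    pairwise_prob /= n_draws
    # Squared deviation of each draw's association matrix from the average.
    losses = np.array(
        [np.sum((association_matrix(labels) - pairwise_prob) ** 2) for labels in draws]
    )
    best = int(np.argmin(losses))
    return best, draws[best].copy(), pairwise_prob
```

Because the criterion depends only on which subjects are grouped together, it is unaffected by relabeling of clusters across iterations. The two loops keep memory at one N x N matrix at a time, at the cost of building each association matrix twice.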

Based on the cluster assignments, we can evaluate the estimation of the longitudinal trajectory and hazard rate at the cluster level. We selected the least-squares clustering and used the parameter values from that iteration as the cluster-level estimates of β, γ, α, and λ. Then we computed the estimated trajectories in each cluster using the estimated values of β. For the hazard rate, we computed the mean covariates in each cluster and the estimated hazard rate at those mean covariates. Figures 5.11 and 5.12 show the results in Case 3 (three-cluster simulation study with Gaussian distributed data) and Case 6 (three-cluster simulation study with log-gamma distributed data). The red solid lines are the true trajectories and hazard rates in each cluster based on the true cluster assignments. The blue dashed lines are the estimated trajectories and hazard rates from models GaussianMH, GaussianSS, and LG. Both the Gaussian and the log-gamma joint models estimate the cluster-specific trajectories and cluster-specific hazards well. The Gaussian joint models overestimate the number of clusters. The log-gamma joint model is more accurate at the cluster level in terms of the estimated number of clusters and the cluster-specific hazard rate.

(3) Effective sample size

MCMC provides a way to sample from the full-conditional distributions of the parameters. However, these samples are not independent, which can be an issue when making inferences about the parameters. Suppose there are N samples x₁, x₂, . . . , x_N drawn from a distribution with mean µ and standard deviation σ. Then the population mean of this distribution can be estimated by the sample mean,

$$
\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i .
$$

If the samples are independent, the variance of µ̂ is given by

$$
\mathrm{Var}(\hat{\mu}) = \frac{\sigma^{2}}{N}.
$$

In practice, the MCMC samples can be correlated, which implies that the variance of µ̂ is not equal to σ²/N. Thiébaux and Zwiers [36] define the effective sample size (ESS) by equating the ensemble mean square of the time-averaged mean, say σ²_x̄, to the standard formula for the variance of the mean of N_ess independent samples, that is

$$
\sigma^{2}_{\bar{x}} = \frac{\sigma^{2}}{N_{ess}}. \tag{5.3}
$$

Table 5.5: Clustering evaluation for the Gaussian joint model and the log-gamma joint model

| Case | Data distribution | Joint model | Number of clusters: truth | Number of clusters: estimate (SE) | ARI (SE) |
|---|---|---|---|---|---|
| Case 1^a | Gaussian | GaussianMH | 1 | 1.3 (0.45) | 0.86 (0.35) |
| | | GaussianSS | 1 | 1.9 (0.78) | 0.78 (0.42) |
| | | LG | 1 | 1.1 (0.32) | 0.9 (0.32) |
| Case 2^b | Gaussian | GaussianMH | 2 | 2.3 (0.58) | 0.89 (0.05) |
| | | GaussianSS | 2 | 2.4 (0.61) | 0.90 (0.03) |
| | | LG | 2 | 2.1 (0.32) | 0.89 (0.04) |
| Case 3^c | Gaussian | GaussianMH | 3 | 4.2 (1.05) | 0.82 (0.06) |
| | | GaussianSS | 3 | 4.6 (0.90) | 0.83 (0.04) |
| | | LG | 3 | 3 (0) | 0.85 (0.03) |
| Case 4^d | LG | GaussianMH | 1 | 2.1 (0.99) | 0.4 (0.52) |
| | | GaussianSS | 1 | 2.2 (1.03) | 0.7 (0.48) |
| | | LG | 1 | 1.0 (0.14) | 0.98 (0.14) |
| Case 5^e | LG | GaussianMH | 2 | 2.7 (0.88) | 0.88 (0.06) |
| | | GaussianSS | 2 | 2.9 (0.90) | 0.89 (0.03) |
| | | LG | 2 | 2 (0) | 0.91 (0.03) |
| Case 6^f | LG | GaussianMH | 3 | 4.4 (0.80) | 0.83 (0.05) |
| | | GaussianSS | 3 | 4.8 (1.11) | 0.82 (0.05) |
| | | LG | 3 | 3 (0) | 0.85 (0.06) |

NOTE: Standard errors are in parentheses.

^a Case 1: one-cluster Gaussian simulation data with unique values of β (3, −1).

^b Case 2: two-cluster Gaussian simulation data with unique values of β (3, −1) and (2, −0.5).

^c Case 3: three-cluster Gaussian simulation data with unique values of β (3, −1), (3, −0.5) and (2, −0.5).

^d Case 4: one-cluster log-gamma simulation data with unique values of β (3, −1).

^e Case 5: two-cluster log-gamma simulation data with unique values of β (3, −1) and (2, −0.5).

^f Case 6: three-cluster log-gamma simulation data with unique values of β (3, −1), (3, −0.5) and (2, −0.5).

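Table 5.6: Contingency table for the Rand index.

| C¹ \ C² | C₁² | C₂² | . . . | C_{k₂}² | Sums |
|---|---|---|---|---|---|
| C₁¹ | r_{11} | r_{12} | . . . | r_{1k₂} | r_{1·} |
| ... | ... | ... | | ... | ... |
| C_{k₁}¹ | r_{k₁1} | r_{k₁2} | . . . | r_{k₁k₂} | r_{k₁·} |
| Sums | r_{·1} | r_{·2} | . . . | r_{·k₂} | N |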

Figure 5.11: True vs. estimated longitudinal trajectories and hazard rates at cluster level in Case 3. The red solid lines are the true trajectories and hazard rates in each cluster based on true cluster assignments. The blue dashed lines are the estimated trajectories and hazard rates. (a) and (b) are the results from model GaussianMH; (c) and (d) are the results from model GaussianSS; (e) and (f) are the results from model LG.

Figure 5.12: True vs. estimated longitudinal trajectories and hazard rates at cluster level in Case 6. The red solid lines are the true trajectories and hazard rates in each cluster based on true cluster assignments. The blue dashed lines are the estimated trajectories and hazard rates. (a) and (b) are the results from model GaussianMH; (c) and (d) are the results from model GaussianSS; (e) and (f) are the results from model LG.

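By Thiébaux and Zwiers [36], Laurmann and Gates [22], and following Anderson [2], we have

$$
\sigma^{2}_{\bar{x}} = \frac{\sigma^{2}}{N}\sum_{v=-(N-1)}^{N-1}\left(1 - \frac{|v|}{N}\right)\rho_v(x), \tag{5.4}
$$

where ρ_v(x) is the lag-correlation function. Here

$$
\rho_v(x) = \frac{1}{\sigma^{2}}\,E\left[(x_i - \mu)(x_{i+v} - \mu)\right]. \tag{5.5}
$$

Hence, from (5.3), by equating σ²_x̄ and σ²/N_ess, we obtain a measure of the effective sample size,

$$
N_{ess} = \frac{\sigma^{2}}{\sigma^{2}_{\bar{x}}} = N\left[\sum_{v=-(N-1)}^{N-1}\left(1 - \frac{|v|}{N}\right)\rho_v(x)\right]^{-1}. \tag{5.6}
$$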

There are different ways to define a measure of the effective sample size (Thiébaux and Zwiers [36]). However, in this dissertation, we consider only the quantity defined by (5.6). Note that if the correlation is negative, the effective sample size can be larger than the actual sample size.
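A direct, time-domain way to approximate the effective sample size is to plug sample lag correlations into the ratio of σ² to the variance of the sample mean. The sketch below is illustrative only. The function name is hypothetical, and truncating the sum at the first non-positive sample correlation is a heuristic assumed here to keep noisy high-lag terms out of the sum; it is not part of the definition. One consequence of that truncation is that this sketch never returns a value above N, so it does not reproduce the negative-correlation case.

```python
import numpy as np


def ess_from_autocorrelation(x, max_lag=None):
    """Approximate N_ess = N / sum_v (1 - |v|/N) rho_v from one chain.

    x: 1-D sequence of MCMC draws for a single parameter.
    max_lag: optional cap on the lags used (hypothetical parameter).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two draws")
    dev = x - x.mean()
    var = np.dot(dev, dev) / n
    if var == 0.0:
        raise ValueError("chain is constant")
    if max_lag is None:
        max_lag = n - 1
    total = 1.0  # lag-0 term, rho_0 = 1
    for v in range(1, max_lag + 1):
        rho = np.dot(dev[:-v], dev[v:]) / (n * var)  # sample lag-v correlation
        if rho <= 0.0:
            break  # assumed truncation rule, see the note above
        total += 2.0 * (1.0 - v / n) * rho  # lags +v and -v contribute equally
    return n / total
```

For nearly independent draws the result is close to N, while strongly positively correlated draws give a much smaller value.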

We should note that (5.4)-(5.6) are all estimates of the properties of the complete MCMC series, of which the x_i, i = 1, . . . , N, are a part. Various approaches have been proposed to estimate N_ess (Thiébaux and Zwiers [36]). Thiébaux and Zwiers [36] discussed a method that estimates the ESS from the power spectrum of the observed sequence. For large sample sizes, they used an approximation for the variance of the sample mean,

$$
\sigma^{2}_{\bar{x}} \approx \frac{2\pi f_{xx}(0)}{N},
$$

where f_xx(τ) is the spectral density function of the observed time series at frequency τ. An estimate of the ESS is then given by

$$
N_{ess} \approx \frac{N\sigma^{2}}{2\pi f_{xx}(0)}.
$$

In this dissertation we used the function effectiveSize in R package coda for implementation, which uses the estimate of the spectral density at frequency zero.
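To give a feel for the size of this correction, consider a hypothetical chain whose draws behave like a stationary first-order autoregressive process with lag-one correlation φ. The AR(1) form is an illustrative assumption, not a property of the samplers studied here, and the spectral density is taken under the convention that 2πf_xx(0) equals the sum of the autocovariances over all lags.

```latex
\begin{align*}
\rho_v &= \phi^{|v|}, \qquad |\phi| < 1, \\
2\pi f_{xx}(0) &= \sigma^{2}\sum_{v=-\infty}^{\infty}\phi^{|v|}
               = \sigma^{2}\,\frac{1+\phi}{1-\phi}, \\
N_{ess} &\approx \frac{N\sigma^{2}}{2\pi f_{xx}(0)} = N\,\frac{1-\phi}{1+\phi}, \\
\phi = 0.9,\ N = 3000 &\;\Longrightarrow\; N_{ess} \approx 3000 \times \frac{0.1}{1.9} \approx 158, \\
\phi = -0.5,\ N = 3000 &\;\Longrightarrow\; N_{ess} \approx 3000 \times \frac{1.5}{0.5} = 9000 .
\end{align*}
```

The second case illustrates how negative correlation can push the effective sample size above the actual number of draws.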

Table 5.7 shows the effective sample sizes over the last 3,000 iterations from model GaussianMH, model GaussianSS, and the log-gamma joint model for one replicate in each of Cases 1-6. Model GaussianMH has a low effective sample size because of rejections in the Metropolis-Hastings method, which imply positive temporal correlations. The slice sampler increased the effective sample size for β and α in the Gaussian joint models. The log-gamma joint model has a much higher effective sample size than the Gaussian joint models for β and γ. Model GaussianSS and the log-gamma joint model have relatively higher effective sample sizes for α than model GaussianMH. All programs ran on the High Performance Computing Cluster at Florida State University. For Case 3 with three cluster log-gamma distributed data of 300 subjects, it took about 3.9 hours for the log-gamma joint model and about 3.5 hours for the Gaussian joint model to run 5,000 MCMC iterations. For Case 6 with three cluster Gaussian distributed data of 300 subjects, it took about 3.9 hours for the log-gamma joint model and about 3.0 hours for the Gaussian joint model to run 5,000 MCMC iterations.
