Results - Simulation Study - Bayesian Modeling of Complex High-Dimensional Data

2.3 Simulation Study

2.3.3 Results

We applied VFMM and Bayesian FMM to the 1-D and 3-D simulated data and compared their performance in estimation and region detection. When fitting both VFMM and FMM, we applied a wavelet transformation using Daubechies wavelets with 6 resolution levels. In the 3-D case, wavelet compression was performed when running VFMM, with the truncation parameter chosen to retain at least 95% of the total variation. For Bayesian FMM, we ran 5000 MCMC iterations, discarding the first 3000 as the burn-in period.
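The truncation rule described above can be sketched as follows. This is a minimal illustration, assuming that "total variation" is measured as the summed squared wavelet coefficients and that coefficients are ranked by that energy; the function name `select_coefficients` and the toy data are hypothetical, and in practice the coefficients would come from a wavelet library such as PyWavelets rather than from random numbers.

```python
import numpy as np

def select_coefficients(coefs, retain=0.95):
    """Return indices of the smallest coefficient set whose energy share is >= retain.

    coefs: array of shape (n_curves, n_coefficients) in the wavelet domain.
    Assumption: "variation" is the sum of squared coefficients across curves.
    """
    energy = np.sum(coefs ** 2, axis=0)              # energy carried by each coefficient
    order = np.argsort(energy)[::-1]                 # largest energy first
    share = np.cumsum(energy[order]) / energy.sum()  # cumulative share of total energy
    k = int(np.searchsorted(share, retain)) + 1      # first position reaching the target
    k = min(k, coefs.shape[1])
    return np.sort(order[:k])

rng = np.random.default_rng(0)
# Toy coefficients: five high-variance columns and many near-zero ones.
scales = np.concatenate([np.full(5, 10.0), np.full(95, 0.1)])
coefs = rng.normal(size=(50, 100)) * scales

keep = select_coefficients(coefs, retain=0.95)
compressed = coefs[:, keep]                          # reduced basis-space data
retained_share = np.sum(compressed ** 2) / np.sum(coefs ** 2)
```

Only the retained columns would then enter the model fit, which is where the reduction in dimension comes from.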

Posterior samples of B^∗ were also obtained for the VFMM model in order to calculate APVar. Based on the posterior estimation results, we calculated summary statistics for B^∗ and list them in Table 2.1.

For region detection in the 1-D case, we focused on detecting regions of cell line effect, organ effect, organ-cell-line interaction, and mean effect at a threshold of log₂(1.5) difference. In the 3-D scenario, we focused on detecting regions with |Bᵢ(t) − Bⱼ(t)| > 5, where (i, j) is any of the 6 pairwise effects we simulated. For VFMM, we performed the basis-space testing proposed in Section 2.2.2 with ϵ = 0.07 and evaluated region detection performance in the data domain. For Bayesian FMM in the 1-D scenario, region detection was performed by controlling the FDR on a grid of T in the data domain, following the Bayesian FDR approach of Meyer et al. [39]. In the 3-D case, we performed a similar test in the basis domain because of the ultra-high dimensionality. We set the significance level α = 0.05 for both models. Performance in region detection is measured using sensitivity (SEN), false negative rate (FNR), and specificity (SPEC); results for all contrast effects are averaged and compared in Table 2.1. In addition to estimation and region detection, Table 2.1 also compares computation time in hours.
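To make the data-domain procedure concrete, the following is a minimal sketch of Bayesian FDR flagging from posterior draws, together with the three detection metrics. It assumes the common form of the rule (sort pointwise posterior probabilities and flag the largest set whose average of 1 − p stays at or below α); the exact procedure of Meyer et al. [39] may differ in detail, and the function names and toy data are hypothetical.

```python
import numpy as np

def bfdr_flags(samples, delta, alpha=0.05):
    """Flag grid points where |C(t)| > delta while controlling the Bayesian FDR.

    samples: array (n_draws, n_grid) of posterior draws of a contrast C(t).
    """
    # Pointwise posterior probability that the effect exceeds delta in magnitude.
    p = np.mean(np.abs(samples) > delta, axis=0)
    order = np.argsort(p)[::-1]                      # most probable locations first
    # Running mean of (1 - p) over the top-l locations: the estimated FDR of that set.
    running_fdr = np.cumsum(1.0 - p[order]) / np.arange(1, p.size + 1)
    ok = np.nonzero(running_fdr <= alpha)[0]
    flags = np.zeros(p.size, dtype=bool)
    if ok.size:
        flags[order[: ok[-1] + 1]] = True            # largest set meeting the bound
    return flags, p

def detection_metrics(flags, truth):
    """Sensitivity, false negative rate and specificity of boolean flags."""
    tp = np.sum(flags & truth)
    fn = np.sum(~flags & truth)
    tn = np.sum(~flags & ~truth)
    fp = np.sum(flags & ~truth)
    return {"SEN": tp / (tp + fn), "FNR": fn / (tp + fn), "SPEC": tn / (tn + fp)}

# Toy example: one elevated region on a grid of 200 points (not the thesis data).
rng = np.random.default_rng(0)
grid = np.arange(200)
truth_curve = np.where((grid >= 80) & (grid < 120), 1.5, 0.0)
delta = np.log2(1.5)
samples = truth_curve + 0.2 * rng.normal(size=(2000, grid.size))

flags, p = bfdr_flags(samples, delta, alpha=0.05)
metrics = detection_metrics(flags, np.abs(truth_curve) > delta)
```

Because the rule flags as many locations as the FDR bound allows, a few locations just outside the true region can be flagged even when their individual posterior probabilities are small.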

| Data | Model | AMSE | APVar | SEN | FNR | SPEC | Time (hrs) |
|------|-------|-------|--------|-------|-------|-------|------|
| 1-D | FMM | 0.051 | 0.080 | 0.717 | 0.284 | 0.840 | 1.45 |
| 1-D | VFMM | 0.052 | 0.041 | 0.722 | 0.278 | 0.835 | 0.2 |
| 3-D | FMM | 0.099 | 0.0006 | 0.956 | 0.044 | 0.997 | 21.6 |
| 3-D | VFMM | 0.124 | 0.0001 | 0.956 | 0.044 | 0.997 | 1.27 |

Table 2.1: Simulation results of FMM and VFMM for both 1-D and 3-D cases. AMSE and APVar summarize the estimation of B^∗; SEN, FNR and SPEC summarize region detection.

From Table 2.1, we see that for the 1-D case, VFMM resulted in higher AMSE (0.052 vs. 0.051) and lower APVar (0.041 vs. 0.081) than FMM. These patterns become more evident for the 3-D case (AMSE 0.124 vs. 0.099; APVar 0.0001 vs. 0.0006). These statistics reflect the effect of the mean-field assumption on the posterior distribution: assuming independence in the posterior distribution may result in elevated estimation error and narrower credible intervals. For region detection in the 1-D case, VFMM resulted in higher sensitivity (0.722 vs. 0.717), lower specificity (0.835 vs. 0.840), and a lower false negative rate (0.278 vs. 0.284). This indicates that, with narrower credible intervals, VFMM tends to flag more locations as significant and thus may be more powerful than FMM. For the 3-D case, the region detection results are comparable for VFMM and FMM. This may be because the significant regions are easier to identify in this setting (perhaps due to a high signal-to-noise ratio).

Regarding computation time, Table 2.1 demonstrates a clear advantage of VFMM relative to FMM. All computations were performed on a Linux server equipped with an Intel(R) Xeon(R) CPU E5-4627 v2 @ 3.30GHz and 252G of RAM. For the MCMC-based FMM, the computation time required for 5000 MCMC iterations increases from 1.45 hours in the 1-D case to 21.6 hours in the 3-D case. VFMM, on the other hand, requires only 0.2 hours for the 1-D case and 1.27 hours for the 3-D case. In addition to the shorter computation time for posterior estimation, inference such as region detection is performed in the basis space for VFMM, so there is no need to inverse-transform a large number of posterior samples.

Furthermore, the storage space required for VFMM is also substantially reduced, as there is no need to save posterior samples. These computational benefits make VFMM attractive for large-scale data.
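The remark that a mean-field assumption can produce narrower credible intervals can be illustrated with a toy calculation. The sketch below is not the VFMM posterior; it uses a bivariate Gaussian with an assumed correlation `rho`, for which the optimal fully factorized Gaussian approximation is known to have variances equal to the inverse of the diagonal of the precision matrix.

```python
import numpy as np

rho = 0.8                                   # assumed posterior correlation (toy value)
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])              # true posterior covariance
Lambda = np.linalg.inv(Sigma)               # precision matrix

marginal_sd = np.sqrt(np.diag(Sigma))            # true marginal standard deviations
meanfield_sd = np.sqrt(1.0 / np.diag(Lambda))    # mean-field standard deviations

# Ratio of interval widths; for this example it equals sqrt(1 - rho**2).
width_ratio = meanfield_sd / marginal_sd
```

In this toy case the factorized approximation understates each marginal standard deviation by a factor of sqrt(1 − rho²), so the stronger the posterior correlation, the narrower the resulting intervals relative to the true marginals.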

[Figure panel: Cell Line Effect. Legend: vfmm mean, fmm mean, truth, vfmm bfdr, fmm bfdr, fmm ci]

Figure 2.2: The 1-D simulation case: estimation and region detection results for the simulated cell line effect C(t) = (B₁(t) − B₂(t) + B₃(t) − B₄(t))/2.

[Figure panel. Legend: vfmm mean, fmm mean, truth, vfmm bfdr, fmm bfdr, fmm ci]

Figure 2.3: The 1-D simulation case: estimation and region detection results for the simulated organ effect C(t) = (B₁(t) + B₂(t) − B₃(t) − B₄(t))/2.

[Figure panel. Legend: vfmm mean, fmm mean, truth, vfmm bfdr, fmm bfdr, fmm ci]

Figure 2.4: The 1-D simulation case: estimation and region detection results for the simulated organ-cell-line interaction C(t) = (B₁(t) − B₂(t) − B₃(t) + B₄(t))/2.

In addition to the summary statistics shown in Table 2.1, we also plot the estimation and region detection results for selected contrast effects. Figure 2.2 shows the results for the cell line effect C(t) = (B₁(t) − B₂(t) + B₃(t) − B₄(t))/2. Green and blue lines mark the posterior means of VFMM and FMM, respectively. The yellow line shows the true value of C(t). The 95% credible bands are shown by the shaded gray area for VFMM and by red dashed lines for FMM. For FMM, the 95% credible bands were calculated by finding the pointwise (0.025, 0.975) quantiles on a grid of T based on posterior samples of C(t) (obtained after the inverse wavelet transform). For VFMM, we generated 1000 samples of B^∗ from the approximate posterior distribution and applied the inverse wavelet transformation to map these samples to the data domain. The credible bands were then calculated as in the FMM case. Detected regions are flagged by magenta and cyan dots at the bottom of the plot. From Figure 2.2, we see that while VFMM and FMM produced very similar mean estimates, VFMM produced narrower credible bands and flagged more locations than FMM. Figures 2.3 and 2.4 show the corresponding results for the organ effect and the organ-cell-line interaction.
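The band construction described above (draw samples in the basis space, map them to the data domain, take pointwise quantiles) can be sketched as follows. This is an illustration under stated assumptions: the approximate posterior is taken to be independent Gaussians in the basis space, and a random orthonormal matrix `W_inv` stands in for the inverse wavelet transform; both are hypothetical placeholders rather than the fitted VFMM quantities.

```python
import numpy as np

rng = np.random.default_rng(1)
n_grid, n_draws = 64, 1000

# Hypothetical stand-in for the inverse wavelet transform: an orthonormal matrix.
W_inv, _ = np.linalg.qr(rng.normal(size=(n_grid, n_grid)))

# Assumed approximate posterior in the basis space: independent Gaussians.
post_mean = rng.normal(size=n_grid)
post_sd = np.full(n_grid, 0.1)

# Draw basis-space samples, then map each draw (row) to the data domain.
draws_basis = post_mean + post_sd * rng.normal(size=(n_draws, n_grid))
draws_data = draws_basis @ W_inv.T

# Pointwise 95% credible band: 2.5% and 97.5% quantiles at every grid point.
lower, upper = np.percentile(draws_data, [2.5, 97.5], axis=0)
center = draws_data.mean(axis=0)

# Fraction of draws falling inside the band, averaged over the grid.
inside = np.mean((draws_data >= lower) & (draws_data <= upper))
```

The same quantile step applies unchanged to MCMC output: only the source of `draws_data` differs between the two models.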

Figure 2.5: The 3-D simulation case: region detection results for the contrast effect B₁(t) − B₂(t), along with the truth. Only one 2-D slice of the 3-D image is plotted. White areas are regions that are not flagged. Colors represent estimated values. Left, middle and right figures correspond to results of VFMM, FMM and the truth, respectively.

Figure 2.6: The 3-D simulation case: region detection results for the contrast effect B₁(t) − B₃(t), along with the truth. Only one 2-D slice of the 3-D image is plotted. White areas are regions that are not flagged. Colors represent estimated values. Left, middle and right figures correspond to results of VFMM, FMM and the truth, respectively.

Figure 2.5 shows the regions flagged by VFMM and FMM for the contrast effect B₁(t) − B₂(t) in the 3-D simulated case, along with the truth. For ease of presentation, only one 2-D slice of the 3-D image is shown. White areas are regions that are not flagged. Colors represent estimated values. For this contrast effect, we expect two significant local regions to be flagged: one corresponds to a cube with staircase intensity, the other to a ball with the highest intensity at its center. From Figure 2.5, we see that VFMM and FMM perform similarly in identifying the two regions. Both VFMM and FMM estimated the ball shape very well but failed to recover the staircase pattern in the cubic region. All estimated effects of FMM and VFMM are illustrated in Figures 2.5 through 2.10, which reveal similar inference results in ultra-high dimensional data.

Figure 2.7: The 3-D simulation case: region detection results for the contrast effect B₁(t) − B₄(t), along with the truth. Only one 2-D slice of the 3-D image is plotted. White areas are regions that are not flagged. Colors represent estimated values. Left, middle and right figures correspond to results of VFMM, FMM and the truth, respectively.

Figure 2.8: The 3-D simulation case: region detection results for the contrast effect B₂(t) − B₃(t), along with the truth. Only one 2-D slice of the 3-D image is plotted. White areas are regions that are not flagged. Colors represent estimated values. Left, middle and right figures correspond to results of VFMM, FMM and the truth, respectively.

Figure 2.9: The 3-D simulation case: region detection results for the contrast effect B₂(t) − B₄(t), along with the truth. Only one 2-D slice of the 3-D image is plotted. White areas are regions that are not flagged. Colors represent estimated values. Left, middle and right figures correspond to results of VFMM, FMM and the truth, respectively.

Figure 2.10: The 3-D simulation case: region detection results for the contrast effect B₃(t) − B₄(t), along with the truth. Only one 2-D slice of the 3-D image is plotted. White areas are regions that are not flagged. Colors represent estimated values. Left, middle and right figures correspond to results of VFMM, FMM and the truth, respectively.
