Simulation Studies - Bayesian modeling and inference for asymmetric responses with applications

To demonstrate how fitting various subclasses of the SNI-CAR formulation, with and without non-random missingness, affects the estimation of subject-level fixed effects, we conduct a simulation study. We use the full-mouth MRF graph, which gives S = 168 sites, set ρ = 0.99, and include no spatial covariates (such as site in gap). Data are generated from the model

$$P\{y_i(s)\ \text{observed}\} = 1 - \Phi\big(a_0 + b_0\,\theta_i(s)\big),$$

$$y_i(s) \mid y_i(s)\ \text{observed} \sim \mathrm{N}\big(a_1 + b_1\,\theta_i(s),\ \sigma_i^2\big),$$

where $\theta_i \sim \mathrm{ST}\big(x_i^\top\beta\,\mathbf{1}_S,\ \tau_i^2 Q^{-1}(\rho),\ 3\,\mathbf{1}_S,\ 4\big)$ and ST denotes the skew-t density with location vector $x_i^\top\beta\,\mathbf{1}_S$, scale matrix $\tau_i^2 Q^{-1}(\rho)$, skewness parameter $3\,\mathbf{1}_S$, and shape parameter (degrees of freedom) 4. Each simulated data set contains data generated from this model for N = 50 patients. The p = 3 subject-level covariates $x_i$ are generated independently from the N(0, 1) density, and the regression coefficients are $\beta = (0, 1, 2)/3$. Finally, $\tau_i^2 = a_1 = b_1 = 1$ and $a_0 = -1$. Under this setup, M = 200 data sets are generated from each of two designs that differ in the missing-data mechanism $b_0$:

Design 1: $b_0 = 0$ and $\sigma_i^2 = 1$,

Design 2: $b_0 = 1$ and $\sigma_i^2 = 1$.
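A minimal Python sketch of generating one data set under each design may help make the data-generating mechanism concrete. Several pieces are assumptions not specified in the text: the full-mouth MRF graph is replaced by a hypothetical path (chain) adjacency over the S sites, the CAR precision is taken to be the proper form $Q(\rho) = D_w - \rho W$, and the skew-t draw uses the Azzalini and Capitanio stochastic representation with shape vector $3\,\mathbf{1}_S$, which may not match the exact ST parameterization of the SNI-CAR model. The helper names (`chain_adjacency`, `car_precision`, `rmst`, `simulate_dataset`) are hypothetical. Under Design 1 the observation probability is the constant $1 - \Phi(-1) = \Phi(1) \approx 0.841$; under Design 2 it decreases as $\theta_i(s)$ increases.

```python
import numpy as np
from scipy.stats import norm


def chain_adjacency(S):
    """Hypothetical stand-in for the full-mouth MRF graph: S sites on a path."""
    W = np.zeros((S, S))
    idx = np.arange(S - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return W


def car_precision(W, rho):
    """Assumed proper CAR precision Q(rho) = D_w - rho * W."""
    return np.diag(W.sum(axis=1)) - rho * W


def rmst(rng, n, cov, alpha, nu):
    """n zero-location multivariate skew-t draws (rows), Azzalini-Capitanio form."""
    omega = np.sqrt(np.diag(cov))                      # marginal scales
    cor = cov / np.outer(omega, omega)                 # correlation matrix
    delta = cor @ alpha / np.sqrt(1.0 + alpha @ cor @ alpha)
    # Skew-normal SN(0, cor, alpha) via delta*|U0| + U1, U1 ~ N(0, cor - delta delta^T)
    u0 = np.abs(rng.standard_normal(n))[:, None]
    u1 = rng.multivariate_normal(np.zeros(len(alpha)), cor - np.outer(delta, delta), size=n)
    z = delta * u0 + u1
    # Divide by sqrt(chi^2_nu / nu) to obtain heavy (skew-t) tails
    w = rng.chisquare(nu, size=n)[:, None] / nu
    return omega * z / np.sqrt(w)


def simulate_dataset(rng, b0, N=50, S=168, rho=0.99, skew=3.0, nu=4,
                     a0=-1.0, a1=1.0, b1=1.0, sigma2=1.0, tau2=1.0):
    beta = np.array([0.0, 1.0, 2.0]) / 3.0             # true coefficients
    x = rng.standard_normal((N, beta.size))            # p = 3 covariates ~ N(0, 1)
    cov = tau2 * np.linalg.inv(car_precision(chain_adjacency(S), rho))
    cov = (cov + cov.T) / 2.0                          # remove round-off asymmetry
    # theta_i = x_i' beta 1_S + skew-t spatial deviation
    theta = (x @ beta)[:, None] + rmst(rng, N, cov, skew * np.ones(S), nu)
    y = a1 + b1 * theta + np.sqrt(sigma2) * rng.standard_normal((N, S))
    p_obs = 1.0 - norm.cdf(a0 + b0 * theta)            # observation probability
    observed = rng.uniform(size=(N, S)) < p_obs
    return x, np.where(observed, y, np.nan), observed


rng = np.random.default_rng(1)
x1, y1, obs1 = simulate_dataset(rng, b0=0.0)           # Design 1
x2, y2, obs2 = simulate_dataset(rng, b0=1.0)           # Design 2
print(obs1.mean(), obs2.mean())                        # observed fractions
```

Missing responses are stored as NaN, so each returned array can be passed directly to a fitting routine that treats NaN as unobserved; repeating the call M = 200 times per design reproduces the replication structure described above.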

For all designs, the observations within patients are spatially correlated, and the subject-specific variances are all fixed at 1. We analyze each simulated data set using six models:

Model 1: Normal (N) model without non-random missingness, that is, $b_0 = 0$,

Model 2: Skew-normal (SN) model and $b_0 = 0$,

Model 3: Skew-t (ST) model and $b_0 = 0$,

Model 4: Normal (N) model with non-random missingness,

Model 5: Skew-normal (SN) model with non-random missingness, and

Model 6: Skew-t (ST) model with non-random missingness,

where all models account for the spatial association via the CAR structure. Models 2 and 5 accommodate only asymmetry, whereas Models 3 and 6 accommodate both asymmetry and heavy-tailed behavior.

Design  Model  b0     β0     β1     β2     RB1     RB2     MSE
1       1      -      0.380  0.630  0.900  -0.318  -0.329  0.059
1       2      -      0.630  0.855  0.995  -0.008  -0.045  0.044
1       3      -      0.190  0.995  1.000  -0.008  -0.006  0.006
1       4      0.055  0.520  0.740  0.975  -0.089  -0.162  0.051
1       5      0.040  0.555  0.875  1.000   0.009  -0.025  0.042
1       6      0.050  0.160  0.995  1.000  -0.011  -0.005  0.005
2       1      -      0.440  0.705  0.905  -0.341  -0.334  0.048
2       2      -      0.530  0.885  0.995  -0.148  -0.176  0.027
2       3      -      0.215  0.865  0.985  -0.128  -0.079  0.015
2       4      0.780  0.480  0.715  0.930  -0.310  -0.308  0.048
2       5      1.000  0.615  0.910  1.000  -0.170  -0.215  0.036
2       6      0.970  0.210  0.920  0.990  -0.027  -0.018  0.011

Table 3.1: Simulation study results. Columns b0 through β2 give the proportion of 95% intervals that exclude zero (a dash means that b0 is fixed at zero in that model). Columns RB1 and RB2 give the relative bias for β1 and β2, and column MSE gives the overall mean squared error across the regression coefficients.

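The results are presented in Table 3.1. For each model and each design, we calculate the proportion of 95% posterior intervals for $b_0$ and the regression coefficients that exclude zero. We also compute the overall mean squared error (MSE) and the relative bias (RB) of the regression coefficients, which were also used in the simulation studies of the occupational hygiene project (see Section 2.5.3):

$$\mathrm{MSE} = \frac{1}{p \times M}\sum_{i=1}^{M}\sum_{j=1}^{p}\big(\hat\beta_j^{(i)} - \beta_j\big)^2, \qquad \mathrm{RelBias}_j = \frac{1}{M}\sum_{i=1}^{M}\frac{\hat\beta_j^{(i)} - \beta_j}{\beta_j},$$

where $\hat\beta_j^{(i)}$ is the estimate of $\beta_j$ from the $i$-th simulated data set and $\beta_j$ is the true value. Because the true $\beta_0$ is zero, its relative bias is undefined, so RB is reported only for $\beta_1$ and $\beta_2$.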
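The three summaries in Table 3.1 can be computed from the M fitted data sets roughly as follows. This Python sketch assumes that posterior draws of the coefficients are available for each data set as one array of shape (M, draws, p); the function name `summarize_fits` and this input layout are hypothetical. It also assumes the posterior mean as the point estimate $\hat\beta_j^{(i)}$ and equal-tailed 95% intervals, neither of which the text specifies. RB is left as NaN for any coefficient whose true value is zero.

```python
import numpy as np


def summarize_fits(draws, beta_true):
    """Simulation summaries over M data sets.

    draws: array of shape (M, n_draws, p) with posterior draws of beta per data set.
    beta_true: length-p array of true coefficients.
    """
    beta_true = np.asarray(beta_true, dtype=float)
    lo, hi = np.percentile(draws, [2.5, 97.5], axis=1)      # (M, p) interval bounds
    excl_zero = np.mean((lo > 0) | (hi < 0), axis=0)         # proportion excluding zero
    est = draws.mean(axis=1)                                 # posterior means, (M, p)
    M, p = est.shape
    mse = np.sum((est - beta_true) ** 2) / (p * M)           # overall MSE
    nonzero = beta_true != 0
    relbias = np.full(p, np.nan)                             # undefined when beta_j = 0
    relbias[nonzero] = np.mean(
        (est[:, nonzero] - beta_true[nonzero]) / beta_true[nonzero], axis=0)
    return excl_zero, relbias, mse


# Illustrative call on synthetic draws only (not the thesis results)
rng = np.random.default_rng(0)
beta_true = np.array([0.0, 1.0, 2.0]) / 3.0
fake_draws = beta_true + 0.2 * rng.standard_normal((200, 500, 3))
print(summarize_fits(fake_draws, beta_true))
```

For the null coefficient the first summary is a false-positive rate rather than power, which is worth keeping in mind when reading the β0 column.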

For Design 1 (which generates ignorable missing data), fitting the N and SN models with non-ignorable missingness (Models 4 and 5) increases the power for β1 relative to their ignorable counterparts (Models 1 and 2). However, the power is unchanged for the ST models (Models 3 and 6). For the null coefficient β0, whose true value is zero (so the proportion of intervals excluding zero is a false-positive rate), the proportion interestingly increases for Model 4 compared with Model 1, but decreases for Models 5 and 6 compared with Models 2 and 3, respectively. The magnitude of the RB for both β1 and β2 is reduced for Model 4 compared with Model 1 (the N models). For the SN and ST models, however, the RBs of β1 and β2 are largely comparable between the non-ignorable and ignorable versions, except for β2 in the SN models, where the RB is smaller for Model 5 than for Model 2.

For Design 2 (which generates non-randomly missing data), the models that handle non-ignorable missingness (Models 4-6) clearly perform better overall than those that do not (Models 1-3). Specifically, each non-ignorable model has higher power for β1 (the β1 column of Table 3.1) than its ignorable counterpart. For β2, however, the power is comparable across both designs and all six models. In addition, the magnitude of the RB for both β1 and β2 is smaller for the non-ignorable N and ST models (Models 4 and 6) than for their counterparts (Models 1 and 3). This pattern is reversed for the SN models: the non-ignorable SN model (Model 5) shows slightly larger bias than Model 2 for both β1 and β2. The overall MSE is lower for Model 6 than for Model 3, equal for Models 4 and 1, but, unexpectedly, higher for Model 5 than for Model 2.
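The paired comparisons in the two preceding paragraphs can be checked by encoding the relevant columns of Table 3.1 and differencing each non-ignorable model (4, 5, 6) against its ignorable counterpart (1, 2, 3). This short Python sketch only re-tabulates the published numbers; the dictionary layout and the names `TABLE`, `PAIRS`, and `compare` are illustrative. A negative change in |RB| or MSE means the non-ignorable model does better.

```python
# Table 3.1 columns used here: (beta1 power, RB1, RB2, MSE), keyed by design then model
TABLE = {
    1: {1: (0.630, -0.318, -0.329, 0.059), 2: (0.855, -0.008, -0.045, 0.044),
        3: (0.995, -0.008, -0.006, 0.006), 4: (0.740, -0.089, -0.162, 0.051),
        5: (0.875, 0.009, -0.025, 0.042), 6: (0.995, -0.011, -0.005, 0.005)},
    2: {1: (0.705, -0.341, -0.334, 0.048), 2: (0.885, -0.148, -0.176, 0.027),
        3: (0.865, -0.128, -0.079, 0.015), 4: (0.715, -0.310, -0.308, 0.048),
        5: (0.910, -0.170, -0.215, 0.036), 6: (0.920, -0.027, -0.018, 0.011)},
}
PAIRS = {"N": (1, 4), "SN": (2, 5), "ST": (3, 6)}   # (ignorable, non-ignorable)


def compare(design):
    """Non-ignorable minus ignorable: change in beta1 power, |RB1|, |RB2| and MSE."""
    out = {}
    for name, (ign, non) in PAIRS.items():
        p0, r10, r20, m0 = TABLE[design][ign]
        p1, r11, r21, m1 = TABLE[design][non]
        out[name] = (round(p1 - p0, 3), round(abs(r11) - abs(r10), 3),
                     round(abs(r21) - abs(r20), 3), round(m1 - m0, 3))
    return out


for design in (1, 2):
    print(design, compare(design))
```

For Design 2, for example, the ST pair shows a gain in β1 power together with reductions in |RB| and MSE, while the SN pair shows a positive MSE change.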

Overall, we conclude that when the underlying data exhibit skewness, heavy tails, and non-ignorable missingness (Design 2), the skew-t model is more flexible and efficient for parameter estimation than the skew-normal and the usual normality-based CAR models. Interestingly, even when the data are generated under an ignorable missingness pattern, a non-ignorable missingness model (such as the normal model) can yield substantially better parameter estimates than its ignorable counterpart. For the SN and ST models, however, the estimates from the non-ignorable and ignorable versions differ little. Note that introducing several sources of random heterogeneity (skewness, thick tails, spatial referencing, and non-ignorable missingness) complicates our framework. Quite often these sources are not individually identifiable, which prevents us from understanding and estimating the individual influence of each one on fixed-effects estimation.
