5 MEASUREMENT ERROR CORRECTION FOR SELECTION OF A QUADRATIC TERM

5.6 SIMULATION STUDY EXTENSION TO MODEL SELECTION FOR A BINARY OUTCOME

Simulation study design and evaluation criteria

The simulation study for logistic regression in Chapter 4 (Section 4.6) was extended to include a model with a linear functional form of the error-prone exposure. The two quadratic associations from Table 4.5 and this “linear” association were simulated again in this section to assess model selection between the linear (Equation 1.1) and quadratic (Equation 4.1) substantive models when the outcome was a binary variable. The binary outcome, 𝑌_𝑖, was generated as a Bernoulli random variable with probability equal to the inverse logit transformation of the linear predictor. The value of each 𝛽_0 was chosen to result in an event rate for the outcome 𝑌 of 20%. As in the previous chapter, only one setting was considered.

Method implementation

All methods were implemented exactly as they were in Section 5.5.3 except for the following changes.

For the implementation of the fully Bayesian MCMC, there was no scaling of the binary outcome. Three chains were run for each simulation with a burn-in period of 300,000 samples with the last 20,000 burn-in samples evaluated for convergence using 𝑅̂. Another 20,000 samples were collected from each chain for inference. ESS was estimated from these samples.
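The convergence check described above can be illustrated with a minimal sketch of the classic (non-split) Gelman-Rubin 𝑅̂ for a single parameter. The function name `gelman_rubin_rhat` is hypothetical, and the source does not state which variant of 𝑅̂ was used, so the MCMC software in the study may report a different (for example split or rank-normalised) version.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic Gelman-Rubin R-hat for one parameter.

    `chains` is an (m, n) array: m chains with n retained samples each.
    Assumption: this is the textbook non-split form, not necessarily the
    variant computed by the software used in the study.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()      # W: mean within-chain variance
    between = n * chain_means.var(ddof=1)           # B: between-chain variance
    pooled = (n - 1) / n * within + between / n     # pooled variance estimate
    return float(np.sqrt(pooled / within))
```

Values close to 1 indicate that the chains agree with one another; chains that have settled in different regions give values well above 1.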

The BTS method was implemented in MCMC as above, but with six chains for each simulation and 10,000 samples collected for inference from each chain.

Model selection using the latent 𝑿

The mean p-value associated with the likelihood ratio test comparing the linear and quadratic models for the J-shaped association was 2 × 10⁻⁴. While this was a much higher p-value than that observed for the J-shaped association with a continuous outcome, it was still a strong indicator that the quadratic model was the best fit. For all simulations with the J-shaped association, when the latent 𝑋 was fit to each model, the quadratic model was selected using either the 95% CI for 𝛽_{𝑋²} or AIC.

The mean p-value associated with the likelihood ratio test comparing models for the weak quadratic association was 0.28. Hypothesis testing, i.e. observing whether the 95% CI for 𝛽_{𝑋²} excludes zero, resulted in the selection of the quadratic model in only 29% of simulations, and use of AIC after fitting both the linear and quadratic models resulted in the selection of the quadratic model in only 49% of simulations (Table 5.4).

When simulated data generated from the linear association was fit to both the quadratic and linear models, the p-value associated with the likelihood ratio test was 0.48.
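The three selection rules used in this subsection (likelihood ratio test, exclusion of zero from the 95% CI for 𝛽_{𝑋²}, and AIC) can be sketched for a single data set as follows. This is an illustrative sketch rather than the implementation used in the study: `fit_logistic` and `compare_linear_quadratic` are hypothetical names, the logistic models are fitted by a hand-written Newton-Raphson routine, the CI is a Wald interval, and the exposure is centred for numerical stability (centring changes neither the likelihoods nor the coefficient of the squared term).

```python
import math
import numpy as np

def fit_logistic(design, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Returns (coefficients, standard errors, maximised log-likelihood).
    """
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        grad = design.T @ (y - p)
        hess = design.T @ (design * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-design @ beta))
    hess = design.T @ (design * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(hess)))
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1.0 - p)))
    return beta, se, loglik

def compare_linear_quadratic(x, y):
    """Compare linear and quadratic logistic models for exposure x."""
    xc = x - x.mean()                      # centring is an illustrative choice
    ones = np.ones_like(xc)
    _, _, ll_lin = fit_logistic(np.column_stack([ones, xc]), y)
    beta, se, ll_quad = fit_logistic(np.column_stack([ones, xc, xc ** 2]), y)

    # Likelihood ratio test on 1 degree of freedom:
    # P(chi2_1 > s) = erfc(sqrt(s / 2)).
    lr = 2.0 * (ll_quad - ll_lin)
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))

    # Wald 95% CI for the coefficient of the squared term.
    ci = (beta[2] - 1.96 * se[2], beta[2] + 1.96 * se[2])

    # AIC = 2k - 2 * log-likelihood, with k = 2 (linear) or 3 (quadratic).
    aic_lin = 2 * 2 - 2.0 * ll_lin
    aic_quad = 2 * 3 - 2.0 * ll_quad
    return {
        "lr_statistic": lr,
        "lr_p_value": p_value,
        "beta_x2": float(beta[2]),
        "ci_95": ci,
        "ci_excludes_zero": bool(ci[0] > 0.0 or ci[1] < 0.0),
        "aic_linear": aic_lin,
        "aic_quadratic": aic_quad,
        "aic_selects_quadratic": bool(aic_quad < aic_lin),
    }
```

Repeating this over many simulated data sets and averaging `ci_excludes_zero` or `aic_selects_quadratic` gives the kind of selection percentages reported for the latent 𝑋.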

Box 5.3 Simulation study design

Simulations per scenario: 100

True exposure: 𝑋_𝑖 ~ 𝑁(10, 1)

Classical error model:
𝑊_𝑖𝑗 = 𝑋_𝑖 + 𝑈_𝑖𝑗, 𝑈_𝑖𝑗 ~ 𝑁(0, 𝜎_𝑈²), 𝑗 = 1, 2

Outcome models, with 𝑌_𝑖 | 𝑋_𝑖 ~ Bern(𝑝_𝑖) and the linear predictor given on the logit scale:
J-shaped association, 𝑁 = 2000: logit(𝑝_𝑖) = −1.8 + 0.2(𝑋_𝑖 − 9)²
Weak quadratic association, 𝑁 = 10000: logit(𝑝_𝑖) = −4.5 + 0.03𝑋_𝑖²
Linear association, 𝑁 = 10000: logit(𝑝_𝑖) = −7.66 + 0.62𝑋_𝑖
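The data-generating mechanism in Box 5.3 can be sketched as below. The names `expit`, `LINEAR_PREDICTORS`, and `simulate` are hypothetical, the default 𝜎_𝑈² = 0.25 is taken from the naïve analysis described in this section, and the sketch generates two error-prone measurements for every participant without reproducing the 30% validation subsample.

```python
import numpy as np

def expit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Linear predictors on the logit scale, as listed in Box 5.3.
LINEAR_PREDICTORS = {
    "j_shaped": lambda x: -1.8 + 0.2 * (x - 9.0) ** 2,
    "weak_quadratic": lambda x: -4.5 + 0.03 * x ** 2,
    "linear": lambda x: -7.66 + 0.62 * x,
}

def simulate(shape, n, sigma_u2=0.25, n_reps=2, seed=0):
    """Simulate one data set: true exposure x, error-prone w, binary y."""
    rng = np.random.default_rng(seed)
    x = rng.normal(10.0, 1.0, size=n)                        # X_i ~ N(10, 1)
    u = rng.normal(0.0, np.sqrt(sigma_u2), size=(n, n_reps))
    w = x[:, None] + u                                       # W_ij = X_i + U_ij
    p = expit(LINEAR_PREDICTORS[shape](x))
    y = rng.binomial(1, p)                                   # Y_i ~ Bern(p_i)
    return x, w, y
```

For example, `simulate("j_shaped", 2000)` returns an exposure vector, a 2000 × 2 matrix of error-prone measurements, and a binary outcome whose event rate should be close to the 20% targeted in the study design.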

Table 5.4 Selection of the quadratic model with a weak quadratic or linear association with a binary outcome: 200 simulations performed to find the power and type I error rate in selection of the quadratic model when the shape was either a weak quadratic or a linear association, as determined by the likelihood ratio test (at α-level 5%, 10%, or 20%), the exclusion of zero from the 95% confidence interval or credible interval (CI), or the Akaike information criterion (AIC) or alternatively the deviance information criterion (DIC; for the standard MCMC method only). Applied for a validation study performed on 30% of 2000 study participants with measurement error variance ¼ the variance of 𝑋.

Power to select quadratic fit for a weak quadratic association

             Hypothesis testing 𝐻0: 𝛽_{𝑋²} = 0
Method       α=5%     α=10%    α=20%    AIC/DIC
Latent X     29%      41%      52%      49%
Naïve        12%      20%      42%      39%
RC           15%      23%      43%      39%
MCMC         20%      35%      46%      47%
MCMC-RC      16%      34%      44%      41%
INLA-RC      16%      34%      45%      41%

Type I error rate in selecting the quadratic fit given a linear association

             Hypothesis testing 𝐻0: 𝛽_{𝑋²} = 0
Method       α=5%     α=10%    α=20%    AIC/DIC
Latent X     5%       9%       21%      17%
Naïve        7%       12%      20%      17%
RC           6%       12%      20%      17%
MCMC         6%       9%       23%      36%
MCMC-RC      6%       10%      22%      12%
INLA-RC      5%       10%      21%      11%

Power to select quadratic fit for a J-shaped association

             Hypothesis testing 𝐻0: 𝛽_{𝑋²} = 0
Method       α=5%     α=10%    α=20%    AIC/DIC
Latent X     100%     100%     100%     100%
Naïve        98%      98%      98%      98%
RC           97%      98%      98%      98%
MCMC         98%      98%      99%      96%
MCMC-RC      98%      98%      99%      99%
INLA-RC      98%      98%      99%      99%

Model selection using the naïve analysis

After the addition of measurement error with variance 𝜎_𝑈² = 0.25 to the exposure, in the naïve analysis the 95% CI for 𝛽_{𝑋²} excluded zero for the J-shaped association in 98% of simulations (Table 5.4). The naïve AIC similarly selected the quadratic model over the linear model in 98% of simulations for the J-shaped association.

For the weak quadratic association, in the naïve analysis the 95% CI for 𝛽_{𝑋²} excluded zero in only 12% of simulations (Table 5.4). Use of AIC resulted in selection of the quadratic model in 39% of simulations.

Empirically, slightly elevated type I error rates were observed for the linear association, but this was likely due to random variability given the relatively small number of simulations.
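The size of this random variability can be gauged with a short calculation of the Monte Carlo error of a proportion. The sketch below assumes 200 simulations, as stated in the Table 5.4 caption, and a normal approximation; `monte_carlo_interval` is a hypothetical helper name.

```python
import math

def monte_carlo_interval(rate, n_sims, z=1.96):
    """Approximate 95% Monte Carlo interval for a simulated proportion.

    Uses the binomial standard error sqrt(rate * (1 - rate) / n_sims)
    with a normal approximation.
    """
    se = math.sqrt(rate * (1.0 - rate) / n_sims)
    return rate - z * se, rate + z * se

# Interval around the nominal 5% level with 200 simulations (assumed count).
low, high = monte_carlo_interval(0.05, 200)
```

Under these assumptions the interval runs from roughly 2% to 8%, so an empirical type I error rate of 6% or 7% at the 5% level is within the range expected from simulation noise alone.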

Results: Model selection using hypothesis testing

For the J-shaped association, all methods selected the quadratic model in 97–98% of simulations on the basis of the 95% CI or CrI (Table 5.4).

For the weak quadratic association, assuming an α-level of 5%, RC selected the quadratic model in 3% more simulations than the naïve analysis on the basis of hypothesis testing (Table 5.4). MCMC-RC and INLA-RC selected the quadratic model in 4% more simulations than the naïve analysis. MCMC selected the quadratic model on the basis of the 95% CrI exclusion of zero in 8% more simulations than the naïve analysis.

Table 5.5 Quadratic model fit with a binary outcome from Bayesian transformation selection (BTS) samples including mean bias in the quadratic model regression estimates, coverage of the 95% confidence intervals or credible intervals and the associated Monte Carlo error 95% confidence interval (MCE 95% CI), average mean integrated square error (MISE) over the exposure range 7-13, and bias in the turning point of the mean curve (where applicable) estimated from 200 simulations for each setting and association (J-shaped, asymptotic, and weak quadratic). Assumes a validation study performed on 30% of 2000 study participants with a measurement error variance ¼ the variance of 𝑋. The prior probability (𝜋) for the Bernoulli distribution for 𝐼_{𝑋²} is 0.5. * indicates an MCE 95% CI for bias that excludes zero. TP indicates turning point. † indicates that 1 simulation failed to run for that method.

                          Bias                               Coverage (MCE 95% CI)              Average
Association shape         𝛽_0       𝛽_𝑋       𝛽_{𝑋²}        𝛽_𝑋              𝛽_{𝑋²}           MISE      TP bias
J-shaped, N=2000
  True values             14.4      -3.6      0.2                                                        9.0
  BTS†                    -0.411    0.078     -0.0038       92% (87 – 97)    92% (87 – 97)    0.044     0
Weak quadratic, N=10000
  True values             -4.5      0         0.03
  BTS                     -1.452*   0.278*    -0.0133*      27% (18 – 36)    28% (19 – 37)    0.015

Type I error rates for selection of the quadratic model when the underlying association was linear were similar to the naïve analysis for all methods.

Results: Model selection using AIC/DIC

The use of the AIC to select the quadratic model over the linear model when the association was J-shaped resulted in the selection of the quadratic model in 98% of simulations for RC (same as the naïve analysis) and 99% for MCMC-RC and INLA-RC (Table 5.4). Use of DIC to select the best MCMC model resulted in selection of the quadratic model in only 96% of simulations when the association was J-shaped (3% less than the naïve analysis).

For the weak quadratic association, use of the DIC with MCMC resulted in selection of the quadratic model in 47% of simulations (8% more than the naïve analysis). Use of the AIC after correction with RC selected the quadratic model in only 39% (same as the naïve analysis). MCMC-RC and INLA-RC selected the quadratic model using AIC in 41% of simulations. RC, MCMC-RC, and INLA-RC had a type I error rate when using AIC between that observed for the naïve analysis and the latent 𝑋. MCMC, using the DIC, selected the quadratic model for the linear association at more than twice the rate of the naïve analysis, i.e. 36% vs 17%. This further exemplifies the phenomenon of Bayesian feedback (Section 5.5.7) when the likelihood contributes too little information to the posterior [129].

Results: Model selection using BTS

Using only the samples from the quadratic model, for the J-shaped association, the curve fits resulting from the implementation of BTS were very similar to those from the Bayesian analysis with MCMC wherein a quadratic substantive model was specified (Table 5.5). For the weak quadratic association, because there were far fewer samples where 𝐼_{𝑋²} = 1, the ESS for 𝛽̂_{𝑋²} was very small in most simulations (median 17.7; IQR 9.6 – 37.0). Consequently, the quadratic curve fit from these limited samples was very poor.

Table 5.6 Bayes Factor (BF) evidence for the quadratic model with a binary outcome using the Bayesian transformation selection (BTS) approach given the mean posterior probability of the inclusion of a squared term in addition to a linear term (𝐼_{𝑋²}) in ten settings with a continuous outcome. Either a validation study was performed on 30% of 2000 study participants or a replicate study was performed on all 2000 participants. Measurement error variance (𝜎_𝑈²) was either ¼ the variance of 𝑋 or equal to the variance of 𝑋. The prior probability (𝜋) for the Bernoulli distribution for 𝐼_{𝑋²} was either 0.5 or 0.05. IQR was the interquartile range. In the setting with no measurement error (*), the Bayesian analysis included only the substantive model described for this method; all other settings incorporated the exposure and measurement error models. 200 simulations performed for each setting. † indicates that 1 simulation failed to run for that method. ‡ indicates a value greater than 100 or less than 0.01.

                     Prior     Posterior mean of       Median BF               % of simulations for which BF ≥ 𝑥
Association shape    for 𝜋     𝐼_{𝑋²} (IQR)            (IQR)                   𝑥 = 1     𝑥 = 3     𝑥 = 10
J-shaped†            0.5       0.866 (0.83 – 1.00)     61.8 (5.0 – 100‡)       94        81        51
Weak quadratic       0.5       0.127 (0.01 – 0.17)     0.07 (0.01‡ – 0.21)     5         0         0
Linear               0.5       0.061 (0.01 – 0.08)     0.01 (0.01‡ – 0.09)     0         0         0

From Table 5.6, it can be seen that the J-shaped association had the highest posterior probability that the quadratic model was the most appropriate model, i.e. the mean 𝐸[𝐼̃_{𝑋²} | 𝑾, 𝑋, 𝑌, 𝜽] = 0.65. By comparison, the weak quadratic association had a mean posterior probability for the quadratic model of only 0.29. The linear association had a much higher average posterior mean of 𝐼_{𝑋²} for logistic regression than that observed for linear regression; this is likely due to the fact that the likelihood had less information in this setting and the prior information held more weight. Using the BF, there was at least weak evidence for the quadratic model in 35% of simulations with the J-shaped association and 8% with the weak quadratic association. For both association shapes, this was a dramatic reduction from the naïve analysis.
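The link between the posterior mean of 𝐼_{𝑋²} and the BF can be sketched as follows, assuming the BF for the quadratic model is computed as the posterior odds of inclusion divided by the prior odds. That definition is an assumption made for illustration (the source does not restate the formula here), and `bayes_factor` is a hypothetical helper name.

```python
import math

def bayes_factor(posterior_prob, prior_prob=0.5):
    """Bayes factor for including the squared term.

    Assumed definition: posterior odds of inclusion divided by prior odds,
    where posterior_prob is the posterior mean of the inclusion indicator
    and prior_prob is the Bernoulli prior probability (pi).
    """
    if posterior_prob >= 1.0:
        return math.inf                    # every sample included the term
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds
```

With a prior probability of 0.5 the prior odds are 1, so under this definition the thresholds BF ≥ 1, 3, and 10 correspond to posterior inclusion probabilities of at least 0.5, 0.75, and about 0.91 in a given simulation.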

5.7 SUMMARY

In this chapter, I introduced hybrid Bayesian/RC methods using either MCMC or INLA for the Bayesian posterior estimates. These methods reliably improved the power to select the quadratic model over the naïve analysis or standard RC while maintaining type I error rates at or below the nominal level. The hybrid methods were also much faster than fully Bayesian methods and, as will be seen in the next chapter, could easily accommodate more complex model selection procedures.

I also introduced the use of a method of BTS in order to determine the probability that a squared term should be included in the model. This model requires a Bayesian framework to interpret, but can inform the user as to the specific probability of each model.

All methods were observed to have type I error rates at or below the nominal level when making use of hypothesis testing for model selection (within the bounds of empirical variability). For the Bayesian methods (MCMC, MCMC-RC, and INLA-RC), whether the type I error is at or below the nominal level will depend on appropriate specification of the prior distributions. Misspecification of the prior or the model may lead to increased type I error and loss of power. AIC is likewise affected for MCMC-RC and INLA-RC but likely maintains the relative type I error rates as would be observed for hypothesis testing at a nominal α-level of 16%.
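The 16% figure can be recovered from the definition of AIC. The short derivation below assumes that the linear and quadratic models are nested and differ by exactly one parameter, and that the likelihood ratio statistic is referred to a chi-squared distribution with one degree of freedom; here k is the number of parameters and ℓ the maximised log-likelihood.

```latex
\begin{align*}
\mathrm{AIC} &= 2k - 2\ell, \\
\mathrm{AIC}_{\text{quad}} < \mathrm{AIC}_{\text{lin}}
  &\iff 2\,(\ell_{\text{quad}} - \ell_{\text{lin}})
        > 2\,(k_{\text{quad}} - k_{\text{lin}}) = 2, \\
\Pr\!\left(\chi^2_1 > 2\right) &= \operatorname{erfc}(1) \approx 0.157 .
\end{align*}
```

Selecting the quadratic model by AIC is therefore equivalent, under these assumptions, to a likelihood ratio test at an α-level of approximately 16%.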

RC recovered no power lost to the effects of classical measurement error for the model demonstrated in this section. In the logistic regression simulation study, RC appeared to recover a small proportion of the power (Table 5.4), but that was due to bias remaining in the regression coefficients (Table 4.6). If additional fully measured covariates related to the error-prone exposure were available or other sources of information were integrated into the measurement error model, power gains would more likely be observed [130,131]. However, the reduction in bias using correction by RC comes at a cost of greater variability (referred to as the “bias versus variance” tradeoff in Carroll et al [1]) and thus no power is gained.

As implemented in this chapter, the hybrid methods and MCMC had improved power to select the quadratic model for a weak quadratic association when hypothesis testing or AIC was used. DIC was unreliable, sometimes resulting in power gains and at other times in power substantially below that of the naïve analysis. Use of DIC also resulted in greatly inflated type I error when used in combination with a replicate study as a result of Bayesian feedback, wherein the choice of parameters or model specification for the substantive model impacts the choice of parameters for the measurement error or exposure models [129]. This feedback resulted in overselection of the more complex model (i.e. high type I error).

When looking exclusively at samples for which 𝐼_{𝑋²} = 1, BTS was observed to result in the same reduction of bias and good coverage properties observed in Chapter 4 for a Bayesian model with the quadratic model specified. The use of BTS resulted in improved power to select the quadratic model for the higher measurement error variance (i.e. when the model was correctly specified). Bayesian feedback was not observed in this model for linear regression with a replicate study or for logistic regression.

The performance of the measurement error correction methods when the substantive model was logistic regression was similar to when the substantive model was linear regression. The most important caveat is that the likelihood will have less information for a binary outcome with the same effect size (or strength of association) and sample size.

Lessons learned about model selection in the context of these measurement error correction methods will be applied in the next chapter to fractional polynomial model selection.


In document Use of the Bayesian family of methods to correct for effects of exposure measurement error in polynomial regression models (Page 116-123)