Appendix 3. Strength of association in the Cox proportional hazards model To assess the performance of the standard two-part calibration for varying strength of
4 A multivariate method to correct for Measurement Error in Exposure variables using
4.2.4 A description of the proposed multivariate measurement error adjustment method
To adjust for the bias in the association parameters, we proposed a method that combines the observed self-report data in the DQ with the external validity data for the DQ derived from the literature. The method uses a Bayesian MCMC approach that accounts for the uncertainty in the literature reports, uncertainty that is both due to heterogeneity in the study populations in the literature reports and in the parameter estimation. Here, we described the bias-adjustment steps for the proposed method.
First, we obtained the posterior distributions of the unadjusted logHRs estimates (π½π½Μππ1, π½π½Μππ2)T that ignore the measurement error in the DQ. This was done by fitting a Bayesian Cox proportional hazards model (4.1) to the observed self-report data in the DQ for FV intake and cigarette smoking. In the Bayesian Cox model, we assumed weakly informative independent normal priors (πππ½π½
πΉπΉππ) for the unadjusted logHRs by choosing a large variance as πππ½π½πΉπΉππ~N(0, 106).
77
Second, we estimated the posterior distribution of the covariance matrix for the observed self-report DQ data ( Ξ£ππ ). Based on data exploration of the DQ data, a multi- normal distribution was assumed for the self-report intake data as ππ~N(Β΅ππ, Ξ£ππ). To ensure minimal influence of the prior information on the estimate of Ξ£ππ, a weakly informative inverse Wishart prior (ππ Ξ£ππ) was assumed as ππ Ξ£ππ~IW(Ξ0, Ο 0), where Ξ0= ππ2 (identity matrix) is the scale parameter and ππ0= 2 is the degrees of freedom. Note, this parameterization ensures a weak informative inverse Wishart prior for Ξ£ππ (Lesaffre and Lawson, 2012). Noteworthy, varying the magnitude of ππ0 did not alter the results much, because the likelihood dominated the prior given the large size of the EPIC data set.
Third, we generated the validity coefficients for the FV intake and cigarette smoking using prior information from the literature on external validation studies. We interpreted the lower and upper limits for the literature-reported validity coefficients as 0.05 and 0.95 quantiles of the distribution of plausible values, respectively. Using the limits of the literature-reported validity coefficients and the 90% quantile and because the distribution of correlation coefficients are usually skewed (Gorsuch, 2010; Lu, 2006), the validity coefficients were generated in a Fisher-z transformed scale as explained in Appendix 1. The generated validity coefficients were transformed back to the original scale using the inverse of Fisher-z transformation.
Fourth, using the validity coefficients generated from the literature data (ππππππππππ) and the posterior distribution for the variances of self-report intakes (ππππ
ππ
2) estimated from the observed DQ data for FV intake and cigarette smoking, the corresponding distribution for the variance of true intakes (ππππ2ππ) was estimated as ππππ2ππ= οΏ½πποΏ½ππππππππΓ πποΏ½πππποΏ½2 using expression (4.6), but with πΌπΌ1ππ set to one.
Lastly, in order to estimate all the elements of Ξ, we needed to estimate the covariance between true intakes πποΏ½ππ1ππ2. This could be done by decomposing the covariance in the observed DQ data πποΏ½ππ1ππ2 into the unknown covariance between true intakes πποΏ½ππ1ππ2 and
78 the unknown covariance between the errors πποΏ½ππ
πΉπΉ1πππΉπΉ2 as shown in expression (4.5). This covariance decomposition is only possible by making plausible prior assumption on either of the two covariances. Here, we made assumption on the plausible range of the covariance between the errors, because making this assumption is more intuitive for the two study variables in this work. To estimate the covariance between the errors πποΏ½πππΉπΉ1πππΉπΉ2, the error variance πποΏ½ππ2πΉπΉππ was calculated as the difference between the estimated variance in the observed DQ data πποΏ½ππ2ππ and the estimated variance in true intake data πποΏ½ππ2ππ as πποΏ½ππ
πΉπΉππ 2 = πποΏ½
ππ2ππ (1 β ππππ2ππππππ). The remaining task is to estimate the unknown correlation between the errors (πποΏ½ππ
πΉπΉ1πππΉπΉ2) required to obtain πποΏ½πππΉπΉ1πππΉπΉ2. To our knowledge, there were no previous studies at the time of this work with information on the error correlation between FV intake and the number of cigarettes smoked in a lifetime. Due to lack of literature data on this error correlation, we generated the correlation between the errors ππππ
πΉπΉ1πππΉπΉ2 from a plausible range, guided by the correlation in the observed DQ data and the prior information on the most probable sign of the correlation between the errors in the FV intake and cigarette smoking (explained in the next section). With the generated πππππΉπΉ1πππΉπΉ2, we could therefore obtain πποΏ½ππ1ππ2 as the difference between πποΏ½ππ1ππ2 and πποΏ½ππ
πΉπΉ1πππΉπΉ2 parametrized as πποΏ½ππ1ππ2 = πποΏ½ππ1ππ2β πππππΉπΉ1πππΉπΉ2πποΏ½ππ1πποΏ½ππ2 οΏ½οΏ½1 β ππππ21ππ1οΏ½οΏ½1 β ππππ22ππ2οΏ½. Thus, the distribution of the adjusted logHR for FV intake (π½π½Μππ1) could be estimated from the distribution of οΏ½ΞοΏ½TοΏ½β1π½π½Μ
πΈπΈ as shown in expression (4.3) and by following the above steps.
4.2.5 A comparison of the proposed method with other measurement error