
4 MEASUREMENT ERROR CORRECTION FOR A QUADRATIC TRANSFORMATION OF THE ERROR

4.4 CORRECTION METHODS FOR A QUADRATIC MODEL

In this section, I will describe the modifications required to adapt the correction methods presented in Chapter 2, where a linear form of the error-prone variable appeared in the substantive model, to the quadratic substantive model presented in Equation 4.1. As in previous chapters, it is assumed that either a validation study has been performed on some fraction of the study population, or a replicate study has been performed that provides at least two measurements of the error-prone exposure on all or part of the study population. In all cases, a classical error model is assumed unless specifically stated otherwise.

Regression calibration (RC)

RC operates by replacing $X$ in the substantive model with the expectation of $X_i$ given the error-prone measure(s), $\boldsymbol{W}_i$, and any adjustment variables, $\boldsymbol{Z}_i$. When the substantive model includes transformations of $X$, e.g. $X^p$, RC is extended by replacing $X^p$ with $E[X^p \mid \boldsymbol{W}, \boldsymbol{Z}]$. For a quadratic model with a continuous outcome, RC therefore works by regressing $Y_i$ on $E[X_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]$ and $E[X_i^2 \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]$:

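$$E[Y_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i] = \beta_0 + \beta_{X1} E[X_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i] + \beta_{X2} E[X_i^2 \mid \boldsymbol{W}_i, \boldsymbol{Z}_i] + \boldsymbol{\beta}_Z^T \boldsymbol{Z}_i \qquad (4.6)$$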

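The expectation $E[X_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]$ may be estimated as described in Chapter 2, by regressing $X$ on $W_1$ (and $\boldsymbol{Z}$) if a validation study is present, or by regressing $W_2$ on $W_1$ (and $\boldsymbol{Z}$) if a replicate study is present (Equations 2.2 and 2.3). If a validation study is present, the variance of $X$ given the error-prone measure and any accurately measured covariates, $\sigma^2_{X|WZ}$, may be estimated directly as the residual variance of that regression. Alternatively, in a replicate study, $\sigma^2_{X|WZ}$ may be derived from the covariance of the error-prone measures conditional on $\boldsymbol{Z}_i$, which under the classical error model estimates $\mathrm{var}(X_i \mid \boldsymbol{Z}_i)$; multiplying this covariance by $(1 - \lambda)$, where $\lambda$ is the coefficient of $W_1$ in the regression of $W_2$ on $W_1$ and $\boldsymbol{Z}$, gives $\sigma^2_{X|WZ}$.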

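The definition of variance, $\mathrm{var}(X) = E[X^2] - E[X]^2$, applied conditionally on $\boldsymbol{W}_i$ and $\boldsymbol{Z}_i$, can be rearranged to express $E[X_i^2 \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]$ in terms of the conditional variance and the conditional mean [1,49]. Therefore, we replace $E[X_i^2 \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]$ in Equation 4.6 with $\sigma^2_{X|WZ} + E[X_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]^2$: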

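$$E[Y_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i] = \beta_0 + \beta_{X1} E[X_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i] + \beta_{X2} E[X_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]^2 + \beta_{X2}\sigma^2_{X|WZ} + \boldsymbol{\beta}_Z^T \boldsymbol{Z}_i \qquad (4.7)$$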

Equation 4.7 can be rearranged to put all constant terms together:

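$$E[Y_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i] = \left(\beta_0 + \beta_{X2}\sigma^2_{X|WZ}\right) + \beta_{X1} E[X_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i] + \beta_{X2} E[X_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]^2 + \boldsymbol{\beta}_Z^T \boldsymbol{Z}_i \qquad (4.8)$$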

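It follows from Equation 4.8 that if $E[X_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]$ is used in place of $X$ and $E[X_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]^2$ in place of $X^2$ in the quadratic substantive model, the fitted linear and quadratic coefficients estimate the desired $\beta_{X1}$ and $\beta_{X2}$ directly, while the desired intercept is obtained as $\hat{\beta}_0 = \hat{\beta}_0^* - \hat{\beta}_{X2}\hat{\sigma}^2_{X|WZ}$, where $\hat{\beta}_0^*$ is the fitted intercept.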
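The replicate-study version of this procedure can be sketched in Python. This is a minimal illustration, not the R code of Appendix B, and it assumes: no covariates $\boldsymbol{Z}$; a normal exposure with two replicates $W_1, W_2$ per subject under the classical error model; $E[X \mid W_1]$ taken from the regression of $W_2$ on $W_1$ with slope $\lambda$; and $\sigma^2_{X|W} = (1 - \lambda)\,\mathrm{cov}(W_1, W_2)$. All function names and simulation settings are hypothetical.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic(x, y):
    """OLS of y on (1, x, x^2) via the normal equations; returns [b0, b1, b2]."""
    rows = [(1.0, v, v * v) for v in x]
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(3)] for j in range(3)]
    xty = [sum(r[j] * t for r, t in zip(rows, y)) for j in range(3)]
    return solve(xtx, xty)


def rc_quadratic(w1, w2, y):
    """Regression calibration for a quadratic model from two replicates (no Z)."""
    lam = cov(w1, w2) / cov(w1, w1)        # slope of W2 on W1
    a = mean(w2) - lam * mean(w1)          # intercept of W2 on W1
    m = [a + lam * w for w in w1]          # E[X | W1]
    s2 = (1.0 - lam) * cov(w1, w2)         # var(X | W1) = (1 - lam) * var(X)
    b0_star, b1, b2 = fit_quadratic(m, y)  # Y on E[X|W1] and E[X|W1]^2 (Equation 4.8)
    return b0_star - b2 * s2, b1, b2, b0_star


rng = random.Random(2024)
n, sigma_u = 20000, 0.7
x = [rng.gauss(1.0, 1.0) for _ in range(n)]        # true exposure (never observed)
w1 = [xi + rng.gauss(0.0, sigma_u) for xi in x]    # replicate 1
w2 = [xi + rng.gauss(0.0, sigma_u) for xi in x]    # replicate 2
y = [1.0 + 0.5 * xi + 0.3 * xi ** 2 + rng.gauss(0.0, 0.5) for xi in x]

naive = fit_quadratic(w1, y)                       # ignores measurement error
b0, b1, b2, b0_star = rc_quadratic(w1, w2, y)
print("naive:", [round(v, 3) for v in naive])
print("RC:   ", [round(v, 3) for v in (b0, b1, b2)], "uncorrected intercept:", round(b0_star, 3))
```

In this normal setting the quadratic coefficient of the naive fit on $W_1$ is attenuated by roughly $\lambda^2$, so comparing the two printed rows shows why the calibration step matters; the intercept correction is the only step specific to the quadratic model.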

This approach may be extended to logistic regression, with the same caveats regarding approximation as for the untransformed model (Section 2.2). The Cox proportional hazards model is a special case: no intercept correction is needed, because the constant term $\beta_{X2}\mathrm{var}(X_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i)$ is absorbed into the baseline hazard [15].

Bootstrapping [1] or an extension of the delta method can be used to obtain SEs [6].

Bayesian analysis using MCMC

A three-part conditional independence structure was introduced in Section 2.3 and applied in the context of an untransformed predictor in Chapter 3. This approach extends easily to a substantive model that also includes an $X^2$ term, i.e. the quadratic model in Equation 4.1.

In this chapter, the substantive model, $f(Y_i \mid X_i, \boldsymbol{Z}_i; \boldsymbol{\beta})$, becomes the quadratic model given in Equation 4.1. The measurement error model, $f(\boldsymbol{W}_i \mid X_i; \boldsymbol{\pi})$, remains the classical error model (Equation 1.7), and the exposure model, $f(X_i \mid \boldsymbol{Z}_i; \boldsymbol{\alpha})$, remains the distribution of $X$ conditional on any accurately measured covariates, $\boldsymbol{Z}$, here assumed to be normal.
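Written out, and assuming independent priors and, for the continuous outcome, a normal residual with variance $\sigma_Y^2$ (treated here as part of $\boldsymbol{\beta}$), the three-part structure implies a joint posterior of the following form. Relative to the untransformed case, the only change is the $X_i^2$ term in the mean of the substantive model; the replicate index $j = 1, \dots, k_i$ is introduced here for illustration.

$$
\begin{aligned}
p(\boldsymbol{X}, \boldsymbol{\beta}, \boldsymbol{\pi}, \boldsymbol{\alpha} \mid \boldsymbol{Y}, \boldsymbol{W}, \boldsymbol{Z})
&\propto \prod_{i=1}^{n} f(Y_i \mid X_i, \boldsymbol{Z}_i; \boldsymbol{\beta})\, f(\boldsymbol{W}_i \mid X_i; \boldsymbol{\pi})\, f(X_i \mid \boldsymbol{Z}_i; \boldsymbol{\alpha}) \; p(\boldsymbol{\beta})\, p(\boldsymbol{\pi})\, p(\boldsymbol{\alpha}),\\
Y_i \mid X_i, \boldsymbol{Z}_i &\sim N\!\left(\beta_0 + \beta_{X1} X_i + \beta_{X2} X_i^2 + \boldsymbol{\beta}_Z^T \boldsymbol{Z}_i,\ \sigma_Y^2\right),\\
W_{ij} \mid X_i &\sim N\!\left(X_i,\ \sigma_U^2\right), \quad j = 1, \dots, k_i,\\
X_i \mid \boldsymbol{Z}_i &\sim N\!\left(\alpha_0 + \boldsymbol{\alpha}_Z^T \boldsymbol{Z}_i,\ \sigma_X^2\right).
\end{aligned}
$$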

Scaling of the variables may be necessary to specify plausible prior distributions [28,85], and centering the exposure and its squared term reduces the correlation between the two terms and improves MCMC convergence [28]. Scaling and centering must be performed after the squared term has been formed. The mean and standard deviation of the latent $X$ and $X^2$ can be estimated from $\boldsymbol{W}$ when no validation study is available.

INLA

As discussed in Section 2.5, the joint distribution of the latent Gaussian parameters, including the latent $X$, is assumed to have the properties of a GMRF. Whether the simplified Laplace approximation used for the latent Gaussian parameters is reliable depends on the accuracy of this assumption. In Chapter 2, the latent Gaussian parameters, $\boldsymbol{\nu}$, included the regression parameters from the substantive model and the exposure model, and the latent $X$, i.e. $\beta_0$, $\boldsymbol{\beta}_Z$, $\alpha_0$, $\boldsymbol{\alpha}_Z$, and $X_i$. When the substantive model is the quadratic model, $\boldsymbol{\nu}$ includes $\beta_0$, $\boldsymbol{\beta}_Z$, $\alpha_0$, $\boldsymbol{\alpha}_Z$, $X_i$, and $X_i^2$. The method cannot treat both $X$ and $X^2$ as approximately normally distributed latent variables, because the square of a normally distributed variable is not itself normally distributed. Therefore, a significant extension to the INLA method would be required to accommodate transformations of a latent variable.

Furthermore, the software used to apply INLA cannot accommodate a transformation of a latent Gaussian variable within the substantive model; therefore, the impact of this violation of the GMRF assumption cannot easily be assessed.

Given these limitations, in this work I will no longer pursue INLA, in the form previously described, as a method of measurement error correction when the error-prone measure has been transformed within the substantive model. However, the next section describes a hybrid method that combines MCMC or INLA with attributes of RC.

Bayesian regression calibration

In this chapter and the previous one, correction methods are applied in the context of relatively simple models (i.e. the classical error model and a specified functional form of the error-prone exposure). In this relatively straightforward setting, I propose a novel correction method that is a hybrid of Bayesian methods and RC. This method is expected to adapt easily to settings with a complex error model and an unknown functional form of the error-prone predictor, i.e. model selection. MCMC solutions are powerful and flexible but slow to converge, particularly when model selection is incorporated (Chapters 5 and 6). While the time required to run standard RC is negligible, obtaining maximum likelihood estimates of $E[X_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]$ for more complex error models can be cumbersome, or even prohibitive where the likelihood cannot be expressed in closed form [110]. Under the classical error model, this is not a limiting problem for linear and quadratic substantive models, but it becomes one for more complex substantive models, such as the full set of models required for the fractional polynomial method, the topic of Chapter 6.

Standard RC relies on estimating $E[X_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]$ and $E[X_i^2 \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]$ by maximum likelihood. An alternative means of obtaining $E[X_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]$ and $E[X_i^2 \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]$ is via the posterior mean of $f(X_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i; \boldsymbol{\alpha}, \boldsymbol{\pi})$, where $\boldsymbol{\alpha}$ and $\boldsymbol{\pi}$ represent the parameters of the exposure model and the error model, respectively (Section 2.3). Posterior samples of the latent $X$, denoted $\tilde{X}_i$, are drawn using MCMC after the chains have reached convergence. By squaring every sample of $\tilde{X}_i$, one can estimate $E[\tilde{X}_i^2 \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]$ directly. $E[\tilde{X}_i^2 \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]$ will be a good estimate of $E[X_i^2 \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]$ as long as the error and exposure models are correctly specified and enough samples have been drawn to approximate the distribution. Each estimated expectation, $E[\tilde{X}_i \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]$ and $E[\tilde{X}_i^2 \mid \boldsymbol{W}_i, \boldsymbol{Z}_i]$, can then be inserted directly into Equation 4.6 to fit the quadratic model. Inference can then be made according to frequentist principles.
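A minimal sketch of MCMC-RC follows, assuming no covariates $\boldsymbol{Z}$, a normal exposure model, the classical error model with two replicates per subject, and hypothetical vague priors (flat on the exposure mean, inverse-gamma on both variances). A hand-written Gibbs sampler stands in for a general-purpose MCMC engine; it samples only the exposure and error models, squares every post-burn-in draw to estimate $E[\tilde{X}_i^2 \mid \boldsymbol{W}_i]$, and the per-subject means are then inserted into Equation 4.6. Function names and settings are illustrative only.

```python
import random


def mean(v):
    return sum(v) / len(v)


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_eq_4_6(ex, ex2, y):
    """OLS of y on (1, E[X|W], E[X^2|W]) as in Equation 4.6; returns [b0, b1, b2]."""
    rows = [(1.0, p, q) for p, q in zip(ex, ex2)]
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(3)] for j in range(3)]
    xty = [sum(r[j] * t for r, t in zip(rows, y)) for j in range(3)]
    return solve(xtx, xty)


def gibbs_latent_x(w, n_iter=1200, burn=400, seed=1, prior_shape=0.01, prior_rate=0.01):
    """Gibbs sampler for X_i ~ N(mu, tau2), W_ij = X_i + U_ij, U_ij ~ N(0, s2u).

    Y is deliberately not used: only the exposure and error models are sampled.
    Assumes every subject has the same number of replicates k.
    Returns per-subject posterior means of X and of X^2 after burn-in.
    """
    rng = random.Random(seed)
    n, k = len(w), len(w[0])
    wbar = [mean(wi) for wi in w]
    x = wbar[:]                                     # start latent X at replicate means
    mu, tau2, s2u = mean(wbar), 1.0, 1.0
    sum_x, sum_x2, kept = [0.0] * n, [0.0] * n, 0
    for it in range(n_iter):
        # 1. X_i | W_i, mu, tau2, s2u: normal-normal conjugate update
        prec = 1.0 / tau2 + k / s2u
        sd = (1.0 / prec) ** 0.5
        for i in range(n):
            x[i] = rng.gauss((mu / tau2 + k * wbar[i] / s2u) / prec, sd)
        # 2. mu | X under a flat prior
        mu = rng.gauss(mean(x), (tau2 / n) ** 0.5)
        # 3. tau2 | X, mu ~ InvGamma(shape + n/2, rate + SS/2); gammavariate takes a scale
        ss = sum((xi - mu) ** 2 for xi in x)
        tau2 = 1.0 / rng.gammavariate(prior_shape + n / 2, 1.0 / (prior_rate + ss / 2))
        # 4. s2u | W, X ~ InvGamma(shape + n*k/2, rate + SSU/2)
        ssu = sum((wij - xi) ** 2 for wi, xi in zip(w, x) for wij in wi)
        s2u = 1.0 / rng.gammavariate(prior_shape + n * k / 2, 1.0 / (prior_rate + ssu / 2))
        if it >= burn:                              # keep post-burn-in draws only
            kept += 1
            for i in range(n):
                sum_x[i] += x[i]
                sum_x2[i] += x[i] ** 2              # squaring each draw targets E[X^2 | W]
    return [s / kept for s in sum_x], [s / kept for s in sum_x2]


rng = random.Random(7)
n = 500
x_true = [rng.gauss(1.0, 1.0) for _ in range(n)]
w = [[xt + rng.gauss(0.0, 0.7) for _ in range(2)] for xt in x_true]   # two replicates each
y = [1.0 + 0.5 * xt + 0.3 * xt ** 2 + rng.gauss(0.0, 0.5) for xt in x_true]

ex, ex2 = gibbs_latent_x(w)
b0, b1, b2 = fit_eq_4_6(ex, ex2, y)
print([round(v, 3) for v in (b0, b1, b2)])
```

Because $Y$ is not used when sampling $\tilde{X}_i$, the outcome model is fitted once, after sampling; the intercept needs no separate correction here because $E[\tilde{X}_i^2 \mid \boldsymbol{W}_i]$ already includes the posterior variance.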

The MCMC chains would be expected to converge more quickly for this simpler model than for the fully Bayesian model, which incorporates the substantive model. This is particularly true for non-linear outcome models such as logistic regression.

Either MCMC or INLA may be used to generate the posterior samples $\tilde{X}_i$. While sampling is not inherent to the INLA method, samples may be drawn from the estimated posterior distribution. This operation is still much faster than MCMC because there is no need to wait for convergence and no concern about autocorrelation, since the samples are drawn independently; i.e. the ESS equals the number of samples drawn. In this work, the two approaches used in this way are referred to as MCMC-RC and INLA-RC, respectively. R code demonstrating the implementation of each is provided in Appendix B. Any other Bayesian method of analysis, such as Hamiltonian Monte Carlo algorithms, may be used similarly.
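The point about effective sample size can be illustrated with a small sketch. The estimator below, $n / (1 + 2\sum_k \rho_k)$ truncated at the first non-positive autocorrelation, is a deliberately simple rule chosen for illustration (MCMC software typically uses more robust estimators), and the AR(1) chain is a hypothetical stand-in for autocorrelated MCMC output. Independent draws, such as those sampled from an approximate posterior, give an ESS close to the number of draws, whereas the autocorrelated chain gives far fewer.

```python
import random


def ess(chain, max_lag=500):
    """ESS = n / (1 + 2 * sum of lag-k autocorrelations), summing until the
    first non-positive autocorrelation (a deliberately simple truncation rule)."""
    n = len(chain)
    m = sum(chain) / n
    dev = [c - m for c in chain]
    var = sum(d * d for d in dev) / n
    rho_sum = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = sum(dev[t] * dev[t + lag] for t in range(n - lag)) / (n * var)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


rng = random.Random(3)
n, phi = 5000, 0.9
iid = [rng.gauss(0.0, 1.0) for _ in range(n)]     # independent draws
chain = [rng.gauss(0.0, 1.0)]                     # AR(1) chain mimicking MCMC output
for _ in range(n - 1):
    chain.append(phi * chain[-1] + rng.gauss(0.0, (1.0 - phi ** 2) ** 0.5))
e_iid, e_ar = ess(iid), ess(chain)
print("ESS, independent draws:", round(e_iid))
print("ESS, AR(1) chain:      ", round(e_ar))
```

For an AR(1) chain with coefficient $\phi$, the theoretical ESS is $n(1 - \phi)/(1 + \phi)$, about 263 of 5000 draws here.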

Bayesian RC would be expected to underestimate the variance of the substantive model's regression parameter estimates, as it does not propagate the uncertainty due to measurement error from the MCMC model to the substantive model (Section 2.2.1). In theory, bootstrapping may be used to obtain better estimates of the SEs; however, for MCMC-RC this would be feasible only for simple models and small data sets. For INLA-RC, bootstrapping is more feasible but was not used in the simulation studies in this thesis.
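The bootstrap idea can be sketched as follows, assuming (hypothetically) no covariates, two replicates per subject, and a closed-form normal calibration step standing in for the MCMC or INLA step: $E[X \mid W_1]$ from the regression of $W_2$ on $W_1$ with slope $\lambda$, and $\mathrm{var}(X \mid W_1) = (1 - \lambda)\,\mathrm{cov}(W_1, W_2)$. In MCMC-RC or INLA-RC the sampling step would be rerun inside every resample, which is what makes the approach expensive. The essential design choice is that whole subjects are resampled and both the calibration and the outcome regression are refitted each time, so that uncertainty in $E[X_i \mid \boldsymbol{W}_i]$ propagates into the SEs.

```python
import random
import statistics


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rc_fit(w1, w2, y):
    """Calibrate, then fit Equation 4.6: Y on E[X|W1] and E[X^2|W1] = E[X|W1]^2 + var(X|W1)."""
    lam = cov(w1, w2) / cov(w1, w1)
    a = mean(w2) - lam * mean(w1)
    s2 = (1.0 - lam) * cov(w1, w2)
    rows = [(1.0, a + lam * v, (a + lam * v) ** 2 + s2) for v in w1]
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(3)] for j in range(3)]
    xty = [sum(r[j] * t for r, t in zip(rows, y)) for j in range(3)]
    return solve(xtx, xty)


def bootstrap_se(w1, w2, y, n_boot=200, seed=11):
    """Nonparametric bootstrap: resample subjects, rerun calibration AND outcome fit."""
    rng = random.Random(seed)
    n = len(y)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(rc_fit([w1[i] for i in idx], [w2[i] for i in idx], [y[i] for i in idx]))
    return [statistics.stdev(col) for col in zip(*draws)]


rng = random.Random(5)
n = 800
x = [rng.gauss(1.0, 1.0) for _ in range(n)]
w1 = [v + rng.gauss(0.0, 0.7) for v in x]
w2 = [v + rng.gauss(0.0, 0.7) for v in x]
y = [1.0 + 0.5 * v + 0.3 * v ** 2 + rng.gauss(0.0, 0.5) for v in x]

est = rc_fit(w1, w2, y)
se = bootstrap_se(w1, w2, y)
for name, b, s in zip(("b0", "bX1", "bX2"), est, se):
    print(f"{name}: {b:.3f} (bootstrap SE {s:.3f})")
```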

Multiple imputation

Multiple imputation of squared terms in the missing data context

The importance of compatibility between the substantive model and the imputation model when performing MI was discussed in Section 2.6. However, several authors have considered MI methods for covariates that appear as transformed terms in the substantive model, using imputation models that are not compatible with the substantive model [111–113]. The simplest method is to impute the missing variable $X$ assuming a linear relationship with $Y$, then transform it to $X^2$ for the substantive model. This preserves the relationship between $X$ and $X^2$ but violates the theoretical properties of the joint model for the substantive and imputation models that underlies multiple imputation. This method, sometimes referred to as the “passive approach”, has been shown to result in biased regression estimates [111,112]. Alternatively, in what has been called the “Just Another Variable” (JAV) approach, one can impute $X$ and $X^2$ separately using chained equations, as if they were different variables [111]. Unlike the “passive approach”, JAV does not preserve the relationship between $X$ and $X^2$. In some settings, JAV may improve estimates over the “passive approach”, but in many common settings it still results in bias [111–113].

In a method called polynomial combination, Vink and van Buuren proposed to impute not $X$, but $X + X^2$ [112]. While this method results in less bias than JAV and preserves the relationship between $X$ and $X^2$, it is not easily extendable to other transformations of the latent exposure.

SMC-FCS, which uses rejection sampling to ensure compatibility (Section 2.6), was demonstrated in its original publication with a quadratic substantive model (Equation 4.1) [46]. For this transformation of the latent exposure, as well as others, SMC-FCS was shown to be effective at minimizing bias due to missing data.

Multiple imputation of squared terms in the measurement error context

None of the above MI approaches have, to my knowledge, been applied to exposure measurement error in the published literature.

In Chapter 3, I explored the use of SMC-FCS for measurement error correction with either a validation study or a replicate study present. When a validation study has been performed, SMC-FCS is an effective tool for minimizing bias without any alteration to the method as used for missing data. However, for replicate studies, it was necessary to alter the method to incorporate the measurement error model and to specify proper priors for both the $\sigma_X^2$ and $\sigma_U^2$ variances to ensure reliable posterior inference [27]. The simulation studies (Section 3.5.3) further showed that, without somewhat informative priors for the substantive model regression coefficients, the estimated posterior distributions of the regression parameters are uninformative when the likelihood contains little information, e.g. because of high measurement error variance and/or small sample size. Therefore, a fully Bayesian model from which to draw samples of the latent $X$ may be required for reliability across scenarios.

The same MCMC model as used for the fully Bayesian analysis may be used to obtain samples of $\tilde{X}_i$ and $\tilde{X}_i^2$ from $f(X_i \mid \boldsymbol{W}_i, Y_i, \boldsymbol{Z}_i; \boldsymbol{\theta})$, which could then be imputed into data sets to be analyzed in the standard MI fashion. That is, the quadratic model would be fit to each imputed data set, and pooled regression estimates obtained by applying Rubin’s Rules. In this way, estimates with frequentist properties may be obtained in lieu of Bayesian posterior means.
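Pooling with Rubin's Rules can be sketched as follows. The pooled estimate is the mean of the $M$ estimates, and the total variance is the mean within-imputation variance plus $(1 + 1/M)$ times the between-imputation variance; the degrees of freedom follow Rubin's original formula. The input values are hypothetical placeholders for $\hat{\beta}_{X2}$ from $M = 5$ imputed data sets, not results from this thesis.

```python
import math
import statistics


def rubins_rules(estimates, variances):
    """Pool M point estimates and their squared SEs from M imputed data sets."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    u_bar = statistics.fmean(variances)        # average within-imputation variance
    b = statistics.variance(estimates)         # between-imputation variance (M - 1 denominator)
    t = u_bar + (1.0 + 1.0 / m) * b            # total variance
    r = (1.0 + 1.0 / m) * b / u_bar            # relative increase in variance due to imputation
    df = float("inf") if b == 0 else (m - 1) * (1.0 + 1.0 / r) ** 2
    return q_bar, math.sqrt(t), df


# Hypothetical beta_X2 estimates and SEs from M = 5 imputed data sets (illustrative only).
est = [0.31, 0.28, 0.33, 0.30, 0.29]
ses = [0.030, 0.031, 0.029, 0.030, 0.030]
q, se, df = rubins_rules(est, [s ** 2 for s in ses])
print(f"pooled beta_X2 = {q:.3f}, SE = {se:.4f}, df = {df:.1f}")
```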

Given the limitations of MI as demonstrated in Chapter 3 and as explored by others in the field of missing data, in this thesis I will no longer pursue MI as a method of measurement error correction where the error-prone measure has been transformed.

4.5 SIMULATION STUDY WITH A CONTINUOUS OUTCOME

In document Use of the Bayesian family of methods to correct for effects of exposure measurement error in polynomial regression models (Page 80-85)