Modeling Strategy - Methodology for Specific Aim 1

3. METHODOLOGY

3.3 Methodology for Specific Aim 1

3.3.6 Modeling Strategy

A two-stage hierarchical regression model will be used for the primary analyses and for aim 1b. For the sensitivity analyses and aim 1a, only the first-stage model, the polytomous logistic regression model, will be used because of the reduced sample size.

3.3.6.1 Polytomous Logistic Regression

As shown in Table 3.3, eight sets of polytomous logistic regression models will be fit. The outcome will be defined using either the 6 level-3 groupings or the individual CHDs. Exposure will be modeled either as a single average over weeks 2 through 8 or with each week of exposure entered simultaneously. Exposure will be examined both continuously and categorically, based on results from the data exploration analyses described above. For specific aim 1b, the factor scores will be used to assign exposure.
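The two exposure specifications can be illustrated with a minimal sketch. The function name `exposure_specs`, the dictionary layout, and the concentration values are hypothetical and are not taken from the study data; the sketch only assumes that one concentration is available per gestational week.

```python
from statistics import mean

def exposure_specs(weekly, first_week=2, last_week=8):
    """Return (single window average, per-week vector) for one pregnancy.

    weekly: dict mapping gestational week -> pollutant concentration.
    The window defaults to weeks 2 through 8, as described in the text.
    """
    weeks = range(first_week, last_week + 1)
    missing = [w for w in weeks if w not in weekly]
    if missing:
        raise ValueError(f"missing weekly concentrations for weeks {missing}")
    vector = [weekly[w] for w in weeks]
    return mean(vector), vector

# Hypothetical weekly concentrations (illustrative values only)
weekly_pm = {1: 9.0, 2: 10.0, 3: 12.0, 4: 11.0, 5: 13.0,
             6: 9.0, 7: 8.0, 8: 14.0, 9: 15.0}
avg, vec = exposure_specs(weekly_pm)
print(avg, vec)
```

The single average would enter the model as one term, whereas the seven-element vector would enter as seven simultaneous weekly terms.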

The models will be constructed using a backward elimination strategy, starting with a full model that contains all covariates identified as potential modifiers and confounders in the data exploration analyses. Effect measure modification (EMM) will be assessed first, by comparing the full model, which includes the interaction term(s), to a model that does not. This will be done for each potential modifier; if a modifier drops out, the new full model will not contain that interaction term. Likelihood ratio tests with an α-level of 0.20 will be used to determine which modifiers remain.

Any variables not identified as modifiers will be assessed for confounding by removing covariates one at a time and comparing the estimates (stratum-specific, if modifiers are present) for all cardiac levels to those from the full model. We will examine the change-in-estimate and change-in-precision for each calculated estimate. Because the models are polytomous, each produces multiple estimates, one for each CHD grouping (i.e., outcome level); there is therefore only one adjustment set for all categories of defects. If the change-in-estimate for at least one defect category is greater than 0.05, the covariate being examined will be retained in the model. If adjusting for a variable changes the estimate for some defect categories but causes a greater loss of precision in others, we will consider running separate models for the different cardiac outcomes instead of polytomous models. This will be done for all covariates. Covariates will be removed in the following order: those identified through data exploration, then those identified by the DAG only, and then those identified by both the DAG and data exploration. The final determination of the model will consider the results of this strategy along with the DAG and the adjustment sets used in previous research.
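The confounding check can be sketched as follows. The text does not state the scale on which the 0.05 threshold is applied, so this sketch assumes an absolute change in the log-odds coefficient; it also summarizes change-in-precision as a ratio of standard errors, which is likewise an assumption. The function name, grouping labels, and numbers are hypothetical.

```python
def change_in_estimate(full, reduced, threshold=0.05):
    """Compare full-model and reduced-model estimates for every outcome level.

    full, reduced: dict mapping defect grouping -> (beta, se), both on the
    log-odds scale (assumed scale; the source does not specify it).
    Returns (retain_covariate, per-grouping report).
    """
    report = {}
    for defect, (b_full, se_full) in full.items():
        b_red, se_red = reduced[defect]
        report[defect] = {
            "delta_beta": abs(b_red - b_full),  # change-in-estimate
            "se_ratio": se_red / se_full,       # <1 means precision gained
        }
    # One adjustment set serves all groupings, so a change above the
    # threshold in any single grouping is enough to retain the covariate.
    retain = any(r["delta_beta"] > threshold for r in report.values())
    return retain, report

# Hypothetical estimates for two groupings (illustrative values only)
full_fit = {"grouping_A": (0.20, 0.10), "grouping_B": (0.10, 0.15)}
drop_cov1 = {"grouping_A": (0.22, 0.09), "grouping_B": (0.17, 0.14)}
drop_cov2 = {"grouping_A": (0.21, 0.10), "grouping_B": (0.12, 0.15)}

retain1, report1 = change_in_estimate(full_fit, drop_cov1)
retain2, report2 = change_in_estimate(full_fit, drop_cov2)
print(retain1, retain2)
```

In the first comparison one grouping shifts by more than the threshold, so the covariate would stay in the shared adjustment set even though the other grouping barely moves.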

As a sensitivity analysis, the models will be refit after restricting to women who live within 10 km of a monitor. We hypothesize that limiting the population to women whose assigned exposure may be more reliable may reduce exposure misclassification. For aim 1a, if folic acid-containing supplement use is not determined to be a modifier through the model-building process, it will be added back into the final model and reassessed using likelihood ratio tests.
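A likelihood ratio test of an interaction term at the 0.20 level can be sketched in a few lines. The log-likelihood values below are hypothetical, as are the function names. The 6 degrees of freedom are an assumption: a single product term in a polytomous model with six case groupings and one reference level would contribute one coefficient per grouping.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability via the incomplete-gamma series.

    Adequate for the moderate test statistics of an illustration; a
    statistical library should be preferred for real analyses.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total and n < 10000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))

def lrt_keeps_term(loglik_with, loglik_without, df, alpha=0.20):
    """Return (statistic, p-value, keep?) for nested models."""
    stat = 2.0 * (loglik_with - loglik_without)
    p = chi2_sf(stat, df)
    return stat, p, p < alpha

# Hypothetical log-likelihoods with and without the interaction term
stat, p, keep = lrt_keeps_term(-1520.3, -1524.9, df=6)
print(round(stat, 2), round(p, 4), keep)
```

The interaction term is kept whenever the p-value falls below the chosen α-level, so a liberal level such as 0.20 retains modifiers that a conventional 0.05 level would drop.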

3.3.6.2 Hierarchical Model

Because we simultaneously assessed multiple weeks of exposure and multiple defects/groupings, we constructed two-stage hierarchical regression models, using a software program adapted from Witte et al., to account for the correlation between estimates and partially address multiple inference. For the primary analyses and aim 1b, the polytomous logistic regression described above represents the first-stage model. Equation 1 represents the unconditional polytomous logistic regression model containing all individual weeks of exposure, or the single 7-week average, and the full adjustment set determined by the process described above.

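$$\Pr(Y = d \mid x, w) = \frac{\exp(\alpha_d + x\beta_d + w\gamma_d)}{1 + \sum_{d'=1}^{D} \exp(\alpha_{d'} + x\beta_{d'} + w\gamma_{d'})} \qquad (1)$$

Here Y is the outcome, d = 1, ..., D indexes the CHDs (or CHD groupings), α_d is the intercept for a specific CHD (d), x represents either the seven-week average or the vector of weekly pollutant concentrations, β_d is the vector of regression coefficients corresponding to pollutant exposure for that CHD, w represents the covariates, and γ_d is the vector of regression coefficients corresponding to the covariates for that CHD. The second-stage model is given in equation 2,

$$\beta_i = Z_i \pi + \delta_i, \qquad \delta_i \sim N(0, \tau^2) \qquad (2)$$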

where Z_i is a row in the design matrix that includes an intercept term and indicator variables for type of defect, broader defect grouping, and exposure week/level for the i-th β; π is the vector of coefficients estimated from the data; and δ_i are independent normal random variables with a mean of zero and a variance of τ² that describe the residual variation in β_i not captured by the design matrix. The second-stage coefficients are used to estimate the means toward which the first-stage coefficients are shrunk, with the magnitude of the shrinkage depending on the precision of the maximum likelihood estimate obtained in stage 1 and the value of the second-stage variance, τ² (144,145). We fixed τ² at 0.5, corresponding to a prior belief with 95% certainty that the residual odds ratio will fall within a 16-fold span.
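The shrinkage step can be illustrated with a deliberately simplified sketch. It is not the adapted Witte et al. program: it assumes an intercept-only design matrix (a single common prior mean) and treats the first-stage estimates as uncorrelated, whereas the analysis described here uses the full covariance matrix and a richer design matrix. The function name and all numeric inputs are hypothetical.

```python
def semi_bayes_shrink(betas, variances, tau2=0.5):
    """Shrink first-stage log-odds estimates toward an estimated common mean.

    Simplifying assumptions (not the full method): intercept-only second
    stage and a diagonal first-stage covariance matrix.
    """
    # Weighted least squares estimate of the common prior mean, with
    # weights equal to the inverse of (sampling variance + tau2)
    weights = [1.0 / (v + tau2) for v in variances]
    prior_mean = sum(w * b for w, b in zip(weights, betas)) / sum(weights)

    shrunk = []
    for b, v in zip(betas, variances):
        keep = tau2 / (v + tau2)  # weight left on the first-stage estimate
        shrunk.append(keep * b + (1.0 - keep) * prior_mean)
    return prior_mean, shrunk

# Hypothetical first-stage estimates and variances (illustrative only)
betas = [0.10, 0.80, -0.30]
variances = [0.02, 0.50, 0.10]
prior_mean, shrunk = semi_bayes_shrink(betas, variances, tau2=0.5)
print(prior_mean, shrunk)
```

In this sketch the imprecise second estimate, whose variance equals τ², moves halfway to the prior mean, while the precise first estimate barely changes, which is the behavior the paragraph above describes.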

This model was implemented using PROC IML in SAS v9.2. To assess whether our results were robust to changes in model specification, we explored setting τ² to 0.25, corresponding to a 7-fold odds ratio span, and to 1, corresponding to a 50-fold span. Additionally, we explored different specifications of the design matrix, which defined the prior mean as either a common mean for all defects, a common mean for each defect, or a common mean for each exposure week/level across defects.
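The correspondence between τ² and the odds ratio span follows from the normal prior: 95% of the residual log odds ratios fall within ±1.96τ of the prior mean, so the ratio of the upper to the lower limit is exp(2 × 1.96 × τ). A short check of the three values used here (the function name `or_span` is only illustrative):

```python
import math

def or_span(tau2, z=1.96):
    """Ratio of the upper to lower 95% prior limits for the residual odds ratio."""
    return math.exp(2.0 * z * math.sqrt(tau2))

for tau2 in (0.25, 0.5, 1.0):
    print(tau2, round(or_span(tau2), 1))
```

These evaluate to roughly 7.1, 16.0, and 50.4, matching the 7-fold, 16-fold, and 50-fold spans quoted in the text.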
