CHAPTER 3: Methods
3.3 Analytic Approach: Study Aim 2
3.3.5 Data Analysis
Exploratory analyses were first conducted by examining the frequency distributions and descriptive statistics for each variable included in this analysis. Although these data had already undergone extensive editing and consistency checks, I inspected all variables for implausible or out-of-range values and, where possible, used other questionnaire data to check the key variables of interest for logical consistency. I also examined the bivariate distributions of fibroid status with the exposures of interest and with each covariate, as well as the percentage of records with missing covariate responses.

To get a clearer picture of the relationship between age and fibroid status, I grouped ages 21-29 into a single category (because of small numbers) and the remaining ages into successive 2-year categories (e.g., 30-31, 32-33, …, 58-59), and plotted the log-odds of fibroids by age category. The log-odds of fibroids increased roughly linearly with age but appeared to level off (or even form a slight inverted “U” shape) after about age 50.

“Uncorrected” regression model
Logistic regression was used to estimate the association between pesticide use and uterine fibroid prevalence. The first step used the uncorrected outcome, self-reported uterine fibroid diagnosis. Although effect measure modification was not a primary focus, a number of the possible endocrine-disrupting pesticides were removed from the market in the […] dependent (e.g., younger women would not have used DDT). Odds ratios and 95% confidence intervals were therefore estimated for each age stratum (21-34, 35-39, 40-44, 45-49, 50-54, 55-59) and visually inspected for differences.
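The exploratory age grouping, the empirical log-odds, and the stratum-specific odds ratios described above can be sketched in Python. This is a minimal illustration only: the function names and all counts are hypothetical, and the odds ratios here are crude 2 × 2 estimates with Woolf (log-scale) confidence intervals rather than the model-based estimates from the logistic regressions in this section.

```python
import math

def age_category(age: int) -> str:
    """Exploratory age groups: 21-29 pooled, then 2-year bins (30-31, ..., 58-59)."""
    if not 21 <= age <= 59:
        raise ValueError(f"age {age} is outside 21-59")
    if age <= 29:
        return "21-29"
    lower = 30 + 2 * ((age - 30) // 2)
    return f"{lower}-{lower + 1}"

def log_odds(cases: int, noncases: int) -> float:
    """Empirical log-odds of fibroids within one age category."""
    return math.log(cases / noncases)

def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude OR with a Woolf (log-scale) 95% CI from a 2x2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases (all must be > 0).
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se_log), math.exp(math.log(or_) + z * se_log)

# Hypothetical counts for two of the age strata, for illustration only
strata = {"21-34": (12, 300, 20, 900), "35-39": (25, 280, 40, 700)}
for stratum, table in strata.items():
    or_, lower, upper = odds_ratio_ci(*table)
    print(f"{stratum}: OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Plotting `log_odds` for each `age_category` against the category midpoints would give the kind of graph described above when applied to real counts.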
I tested for statistical interaction by age in the associations of fibroids with pesticide use patterns, ever use of hormonally active pesticides, and chemical-class pesticide groupings by including an exposure × age interaction term for each exposure, using P < 0.10 as the significance level. I evaluated the linearity assumption for categorical predictors by including disjoint indicator terms and inspecting graphs of the log-odds of fibroids plotted against the variable’s categories (151). When a linear trend was seen, I modeled the variable as a single ordinal term (e.g., 0, 1, 2) and computed a Wald P value for its coefficient. Because the relationship between the log-odds of fibroids and age was non-linear, I added a quadratic age term to the models. The quadratic term was statistically significant but produced only very small changes in the exposure effect estimates. However, excluding it resulted in a poorly fitting model as assessed by the Hosmer-Lemeshow goodness-of-fit test (P < 0.0001) (152), so it was retained.

A backward elimination approach was used to build the final multivariable logistic regression model. Age (continuous), age squared, and state of residence were forced into the models. Each of the other two covariates was dropped one at a time from the full model, starting with the covariate with the highest P value in the full model, and was retained if dropping it changed the exposure odds ratio by 10% or more relative to the full model.

Outcome correction
The next step in the analysis was to fit logistic regression models using a method proposed by Magder and Hughes to correct for outcome misclassification (12). This method incorporates sensitivity and specificity values into the estimation of the logistic regression parameters and their variances, using the Expectation-Maximization (EM) algorithm to obtain maximum likelihood estimates (153).
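To make the correction concrete, the observed-data log-likelihood that a method of this kind maximizes, and the E-step weight it uses, can be written as follows. The notation is introduced here for illustration and is not taken from Magder and Hughes (12); it assumes that each woman's sensitivity and specificity are fixed, known values determined by her subgroup (e.g., age group) and that misclassification does not otherwise depend on exposure.

```latex
% Hypothetical notation (not from the source): D_i = true fibroid status,
% Y_i^* = self-reported status, x_i = covariate vector (with intercept),
% Se_i, Sp_i = assumed sensitivity and specificity for woman i's subgroup.
\begin{align}
p_i(\beta) &= \Pr(D_i = 1 \mid x_i) = \frac{1}{1 + \exp(-x_i^{\top}\beta)} \\
\Pr(Y_i^* = 1 \mid x_i) &= Se_i\, p_i(\beta) + (1 - Sp_i)\bigl(1 - p_i(\beta)\bigr) \\
\ell(\beta) &= \sum_i \Bigl[\, Y_i^* \log\bigl\{Se_i\, p_i(\beta) + (1 - Sp_i)\bigl(1 - p_i(\beta)\bigr)\bigr\} \nonumber \\
&\qquad + (1 - Y_i^*) \log\bigl\{(1 - Se_i)\, p_i(\beta) + Sp_i\bigl(1 - p_i(\beta)\bigr)\bigr\} \Bigr] \\
w_i &= \Pr(D_i = 1 \mid Y_i^*, x_i) =
\begin{cases}
\dfrac{Se_i\, p_i(\beta)}{Se_i\, p_i(\beta) + (1 - Sp_i)\bigl(1 - p_i(\beta)\bigr)}, & Y_i^* = 1,\\[1.5ex]
\dfrac{(1 - Se_i)\, p_i(\beta)}{(1 - Se_i)\, p_i(\beta) + Sp_i\bigl(1 - p_i(\beta)\bigr)}, & Y_i^* = 0.
\end{cases}
\end{align}
```

In the M-step, woman i contributes a case record with weight w_i and a non-case record with weight 1 − w_i to an ordinary weighted logistic regression; the weights are then recomputed from the new β.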
The procedure can be described as essentially performing a “…standard logistic regression considering each study subject as both diseased and not diseased with weights determined by the probability that the study subject is truly diseased given the data” (12). To paraphrase their illustrative example, suppose a woman reports a fibroid diagnosis. Given the sensitivity and specificity of the self-report and the values of that woman’s covariates, the probability that she truly has fibroids is estimated to be 90%. A standard logistic regression is then performed with that woman entered twice: once as diseased with weight = 0.90 and again as non-diseased with weight = 0.10. Because these probabilities depend partly on the values of the regression parameters, they must be recalculated after the parameters are estimated. The new probabilities lead to new regression parameters, and this two-step process (estimating the probabilities, then the regression parameters) is repeated until the parameter estimates converge.

The benefit of the Magder and Hughes method is that it accommodates different sensitivity and specificity values for different subgroups of the analysis population. Based on results from the validity analysis in Aim 1, sensitivity for white women increased with age (except in the oldest age group), whereas specificity decreased slightly with age. However, the descriptive analysis of the presence or absence of fibroids at ultrasound among women reporting a previous diagnosis suggests that these women may not have been wrong; rather, tumor regression could have occurred, related to intervening factors such as time since diagnosis or pregnancies. I used a SAS macro available from the authors at http://medschool.umaryland.edu/epidemiology/software.asp to perform the outcome correction, and I used results from the Aim 1 analysis to inform the estimates of sensitivity and specificity for self-reported fibroid diagnosis.
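The reweighting loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' SAS macro: all function names are hypothetical, sensitivity and specificity are treated as known per-woman constants, the M-step is a hand-written Newton-Raphson weighted logistic regression, and variance estimation (which the Magder and Hughes method also provides) is omitted. With sensitivity = specificity = 0.9 and a model-based prior probability of 0.5, `posterior_true_case` gives a weight of about 0.90 for a woman reporting a diagnosis, mirroring the paraphrased example.

```python
import math

def expit(t: float) -> float:
    """Numerically stable inverse logit."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def weighted_logistic(X, y, w, beta, iters=25, tol=1e-10):
    """M-step: weighted logistic regression by Newton-Raphson, starting from beta."""
    dim = len(beta)
    for _ in range(iters):
        grad = [0.0] * dim
        info = [[0.0] * dim for _ in range(dim)]
        for xi, yi, wi in zip(X, y, w):
            mu = expit(sum(b * v for b, v in zip(beta, xi)))
            for j in range(dim):
                grad[j] += wi * (yi - mu) * xi[j]
                for k in range(dim):
                    info[j][k] += wi * mu * (1.0 - mu) * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def posterior_true_case(reported: int, p: float, se: float, sp: float) -> float:
    """E-step: probability a woman is truly a case, given her report and prior p."""
    if reported == 1:
        return se * p / (se * p + (1.0 - sp) * (1.0 - p))
    return (1.0 - se) * p / ((1.0 - se) * p + sp * (1.0 - p))

def em_corrected_logistic(X, reported, se, sp, max_iter=1000, tol=1e-9):
    """EM sketch: each woman enters twice (case / non-case) with posterior weights.

    X: covariate rows with a leading 1 for the intercept;
    se, sp: per-woman sensitivity and specificity (may vary by subgroup).
    """
    n = len(X)
    beta = [0.0] * len(X[0])
    X2 = X + X                      # duplicated design: every woman appears twice
    y2 = [1] * n + [0] * n          # first copy as case, second as non-case
    for _ in range(max_iter):
        w = [posterior_true_case(r, expit(sum(b * v for b, v in zip(beta, xi))), s, c)
             for xi, r, s, c in zip(X, reported, se, sp)]
        new_beta = weighted_logistic(X2, y2, w + [1.0 - wi for wi in w], beta)
        if max(abs(a - b) for a, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta

# Hypothetical data: binary exposure; 3/10 unexposed and 6/10 exposed report fibroids
X = [[1.0, 0.0]] * 10 + [[1.0, 1.0]] * 10
reported = [1] * 3 + [0] * 7 + [1] * 6 + [0] * 4
beta = em_corrected_logistic(X, reported, se=[0.8] * 20, sp=[0.95] * 20)
print("corrected OR:", math.exp(beta[1]))
```

In this toy example the corrected odds ratio converges to about 5.5, compared with an uncorrected odds ratio of 3.5, so the correction moves the estimate away from the null here. With sensitivity and specificity both set to 1, the weights become 0 or 1 and the procedure reduces to ordinary logistic regression.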
For the main correction model, specificity was set to 0.95 and sensitivity varied by age: 18-29, 0.15; 30-34, 0.20; 35-39, 0.35; 40-44, 0.40; 45-59, 0.30. Sensitivity was set to 0.85 for women who reported having had a hysterectomy (n = 3,022), on the assumption that they would be better reporters of fibroid diagnosis. As above, all corrected odds ratios were adjusted for age, age squared, and state.

Additional analyses
Several secondary analyses were conducted. First, I examined associations between specific pesticides and uterine fibroid diagnosis and compared effect estimates obtained using two different referent groups: 1) never users of any pesticides together with users of pesticides other than the one of interest, and 2) only users of pesticides other than the one of interest (Appendix B). Next, I evaluated the degree to which assumptions about self-report validity influence the corrected odds ratios and 95% confidence intervals (Appendix C). I used age-specific sensitivity (regardless of hysterectomy status) and specificity = 0.95 as the initial set of assumptions, and then varied sensitivity, specificity, and both. Assumptions about self-report validity among women with hysterectomy were evaluated by varying sensitivity and […] to 59 years old, whereas the validity analysis population includes only women up to age 49, it was difficult to predict the shape of the sensitivity and specificity curves for women in older age ranges. The final sensitivity analysis was conducted to examine the influence of …

In document: Pesticide use and self-reported uterine leiomyomata among farm women: an analysis of the Agricultural Health Study with assessment of outcome misclassification (Page 55-60)
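The main-model validity assumptions above, and the kind of scenario grid used when sensitivity and specificity were varied, can be sketched in Python. This is a simplified illustration only: the function names, the grid values, and the counts are hypothetical, and instead of re-running the Magder and Hughes model for each scenario it swaps in a simple matrix-method back-calculation on a crude 2 × 2 table (no adjustment for age or state), which assumes the same sensitivity and specificity in exposed and unexposed women.

```python
from itertools import product

# Main correction model values, as stated in the text
SPECIFICITY = 0.95
HYSTERECTOMY_SENSITIVITY = 0.85
AGE_SENSITIVITY = (  # (youngest age, oldest age, sensitivity)
    (18, 29, 0.15),
    (30, 34, 0.20),
    (35, 39, 0.35),
    (40, 44, 0.40),
    (45, 59, 0.30),
)

def main_model_sensitivity(age: int, hysterectomy: bool) -> float:
    """Sensitivity assigned to one woman under the main correction model."""
    if hysterectomy:
        return HYSTERECTOMY_SENSITIVITY
    for youngest, oldest, se in AGE_SENSITIVITY:
        if youngest <= age <= oldest:
            return se
    raise ValueError(f"no sensitivity defined for age {age}")

def back_calculated_cases(observed: float, total: float, se: float, sp: float) -> float:
    """Expected true cases given observed cases (simple matrix-method correction)."""
    if se + sp <= 1.0:
        raise ValueError("correction requires se + sp > 1")
    true_cases = (observed - (1.0 - sp) * total) / (se + sp - 1.0)
    if not 0.0 < true_cases < total:
        raise ValueError("se/sp are incompatible with the observed counts")
    return true_cases

def back_calculated_or(exp_cases, exp_total, unexp_cases, unexp_total, se, sp):
    """Corrected crude OR, assuming the same se/sp in exposed and unexposed women."""
    a = back_calculated_cases(exp_cases, exp_total, se, sp)
    c = back_calculated_cases(unexp_cases, unexp_total, se, sp)
    return (a / (exp_total - a)) / (c / (unexp_total - c))

# Hypothetical scenario grid and counts (not study data): vary se, sp, and both
for se, sp in product((0.20, 0.30, 0.40), (0.93, SPECIFICITY, 0.97)):
    try:
        print(f"se={se:.2f} sp={sp:.2f}: OR={back_calculated_or(60, 400, 100, 1000, se, sp):.2f}")
    except ValueError as err:
        print(f"se={se:.2f} sp={sp:.2f}: {err}")
```

The back-calculation fails (and the sketch raises `ValueError`) when an assumed specificity implies more false positives than observed cases, which is one reason implausible validity assumptions show up quickly in this kind of sensitivity analysis.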