3. RESEARCH DESIGN AND METHODS
3.7 DATA ANALYSIS
All statistical analyses in this study were performed using STATA version 8.2 (8). We excluded participants with extreme values for energy intake because we felt that their responses might not be valid. Men reporting a total energy intake of less than 800 kcal or greater than 5000 kcal per day were removed from further analysis. The corresponding values for women were 600 and 4000 kcal. In addition, because we were interested, a priori, in stratifying analyses of the NCCCSs I and II by race, we excluded participants in these studies who self-reported their race as “other,” so that the dataset retained only individuals who self-identified as White or African American.
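The exclusion rules above can be sketched in a few lines of Python. This is only an illustration: the analyses themselves were run in STATA, and the record layout, field names, and sample participants below are hypothetical. The cut-offs are read as inclusive, since the text removes only values below or above them.

```python
# Energy cut-offs (kcal/day) as stated in the text; everything else is hypothetical.
ENERGY_LIMITS_KCAL = {"male": (800, 5000), "female": (600, 4000)}

# Studies in which race-stratified analyses were planned a priori.
RACE_STRATIFIED_STUDIES = {"NCCCS I", "NCCCS II"}


def is_eligible(participant):
    """Return True if the participant passes both exclusion rules."""
    low, high = ENERGY_LIMITS_KCAL[participant["sex"]]
    # Rule 1: drop implausible energy intakes (strictly below or above the limits).
    if participant["kcal_per_day"] < low or participant["kcal_per_day"] > high:
        return False
    # Rule 2: in the NCCCSs only, drop participants reporting race as "other".
    if participant["study"] in RACE_STRATIFIED_STUDIES and participant["race"] == "other":
        return False
    return True


# Made-up records, one per rule or boundary case.
participants = [
    {"id": 1, "sex": "male", "kcal_per_day": 750, "race": "White", "study": "DHS IV"},
    {"id": 2, "sex": "female", "kcal_per_day": 750, "race": "African American", "study": "NCCCS I"},
    {"id": 3, "sex": "male", "kcal_per_day": 2400, "race": "other", "study": "NCCCS II"},
    {"id": 4, "sex": "male", "kcal_per_day": 2400, "race": "other", "study": "DHS IV"},
    {"id": 5, "sex": "female", "kcal_per_day": 4000, "race": "White", "study": "NCCCS II"},
]

eligible = [p for p in participants if is_eligible(p)]
eligible_ids = [p["id"] for p in eligible]
```

In this sketch the race exclusion applies only to NCCCS records, which mirrors the statement that race stratification was planned for those two studies.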
Trans fatty acid intake was analyzed as an energy-adjusted variable. Besides potentially confounding the association, total energy intake can add extraneous variation and weaken the association between trans fatty acid intake and the outcome (9). As stated by Drs. Willett and Stampfer in their paper on total energy intake:
…Before attributing causality to a specific nutrient, the burden is upon the epidemiologist to demonstrate that the effect of this nutrient is independent of caloric intake (10).
They suggest that the best way to do this is to use the residuals from a regression model in which nutrient intake is the dependent variable and total caloric intake is the independent variable (10). While the residual method of calorie adjustment is most often used for continuous values of nutrient intake, it is also appropriate when nutrient intake variables are categorized (11). Therefore, we energy-adjusted trans fatty acid intake using the residual method. The exposure variable, the residuals of trans fatty acid consumption, was then converted from a continuous variable to a categorical one. Quartiles were created based on the distribution in the control population.
Unconditional logistic regression modeling was used to explore the relationship postulated in each of the specific aims. To generate unbiased effect estimates, offset terms were included in the models for the NCCCSs I and II to correct for the randomized recruitment sampling fractions, as discussed above (1, 12). Effect measure modification was assessed using a test of homogeneity and a likelihood ratio test with a p-value cut-off of 0.15. A priori, the variables chosen to be assessed for effect measure modification in the DHS IV analysis were sex and NSAID use. For the NCCCSs I and II, these were sex, NSAID use, and highest level of education attained.
Confounding was assessed for multiple variables. First, a directed acyclic graph depicting the relationships among trans
fatty acid intake, colon/rectal cancer/adenoma, and the possible confounding factors was drawn to clarify which variables might qualify as confounders (Figure 3.1). From this figure and from the potential confounders identified in prior studies of the trans fatty acid-colorectal cancer relationship, the following variables were examined as possible confounding factors: age, sex, race, family history of colorectal cancer, BMI, physical activity, NSAID use, smoking status, highest level of education achieved, vegetable consumption, red meat consumption, calcium intake, and alcohol consumption. When working with the actual data, the criteria for confounding proposed by Drs. Rothman and Greenland were used to identify variables that met the definition of a confounder. Based on these criteria, a confounder must: a) be a risk factor for the disease among the unexposed; b) be associated with the exposure in the source
population; and c) not be affected by the exposure or
the disease (i.e., not on the causal pathway) (13). These variables were then checked to see whether they met the assumption of linearity. If the assumption was not met, indicator variables were created for use in the final model. Once the full model was determined (a model including the exposure, interaction terms for effect measure modifiers, all possible confounding variables, and offset terms, where applicable), backward elimination with a 10 percent change-in-estimate criterion was used to refine the model.
Additional analyses were conducted in each of the three studies using multinomial logistic regression. In the DHS IV study, adenoma characteristics were analyzed to see whether any of them was more strongly associated with trans fatty acid consumption. Location (proximal or distal), size (none, less than one centimeter, or one centimeter or greater based on the size of the largest adenoma), and number (none, one adenoma, or more than one adenoma) were studied, with controls serving as the referent outcome for all analyses. Location was classified as proximal if an adenoma was detected in the cecum, ascending colon, hepatic flexure, or transverse colon and as distal if an adenoma was present in the splenic flexure, descending colon, sigmoid colon, or rectum. Participants with adenomas in both the proximal and distal colon were excluded. In addition to these analyses on the prevalence of colorectal adenomas, we investigated the association between trans fatty acid consumption and the type of polyps identified during colonoscopy. Participants with polyps were categorized as having either hyperplastic or adenomatous polyps, and participants with neither type detected were categorized as controls. Participants who had both hyperplastic and adenomatous polyps were excluded.
Location of the cancer was further investigated in the NCCCSs I and II. For the NCCCS I, cancer was classified as being located in the proximal or distal colon, using the same definition as
used for the DHS IV. For the NCCCS II, cancers were identified as being located in the sigmoid colon, rectosigmoid, or rectum. The associations for specific types of trans fatty acids were also evaluated in the NCCCS II. In each of these additional analyses, confounding factors were re-evaluated to ensure that the final model presented was not biased by inadequate control for confounding.
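As an illustration of the energy-adjustment and categorization steps described earlier in this section, the sketch below regresses nutrient intake on total energy intake, keeps the residuals, and assigns quartiles using cut points taken from controls only. It is a minimal Python sketch, not the STATA code used in the study. Its assumptions are: the regression is fit on all participants, a value equal to a cut point falls in the lower category, the quartile cut points use the default method of Python's `statistics.quantiles`, and all function names and data values are invented for the example.

```python
from statistics import mean, quantiles


def energy_adjusted_residuals(nutrient, energy):
    """Residuals from a simple linear regression of nutrient intake on total energy."""
    mean_e, mean_n = mean(energy), mean(nutrient)
    sxx = sum((e - mean_e) ** 2 for e in energy)
    sxy = sum((e - mean_e) * (n - mean_n) for e, n in zip(energy, nutrient))
    slope = sxy / sxx
    intercept = mean_n - slope * mean_e
    # Residual = observed intake minus the intake predicted from energy alone.
    return [n - (intercept + slope * e) for e, n in zip(energy, nutrient)]


def control_quartile_cutpoints(residuals, is_control):
    """Three cut points (25th, 50th, 75th percentiles) from controls only."""
    control_values = [r for r, c in zip(residuals, is_control) if c]
    return quantiles(control_values, n=4)


def assign_quartile(value, cutpoints):
    """Map a residual to quartile 1-4; ties at a cut point go to the lower quartile (assumption)."""
    return 1 + sum(value > c for c in cutpoints)


# Made-up example data: kcal/day, trans fat in g/day, and case-control status.
energy = [1500, 1800, 2000, 2200, 2500, 2700, 3000, 3200]
trans_fat = [2.1, 3.0, 2.8, 4.1, 3.9, 5.2, 4.8, 6.0]
is_control = [True, False, True, True, False, True, True, False]

residuals = energy_adjusted_residuals(trans_fat, energy)
cutpoints = control_quartile_cutpoints(residuals, is_control)
quartile = [assign_quartile(r, cutpoints) for r in residuals]
```

Because the cut points come from controls alone, cases need not be spread evenly across the four categories, which is what allows the categories to carry information about the exposure-outcome association.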