
2. Research Design and Methods

2.9 Methodological issues

2.9.3 Confounding

Confounding is defined as the mixing of the effect of a third variable with the effect of the exposure on disease, resulting in a biased estimate of the exposure effect. For a covariate to be a confounder, it must be causally related to both the exposure of interest and the outcome of interest (98). In this study, the main exposure in all analyses was a genotype or a haplotype.

Therefore, any confounders would have to causally affect a genotype or haplotype to meet the confounding criteria described by Rothman and Greenland. The effect of potential confounders was also evaluated using statistical models. If the addition of a covariate changed the |ln(OR)| of the exposure of interest by more than 0.10, then that covariate was considered a confounder.
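The change-in-estimate rule can be illustrated with a small Python sketch. It is an illustration only: the study's models were logistic regression models, whereas this sketch swaps in Mantel-Haenszel stratification on a single binary covariate as the adjusted estimate. It also reads the rule as an absolute change in ln(OR) of more than 0.10 between the crude and adjusted estimates. The function names and the counts in the example are hypothetical.

```python
import math


def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary OR across strata of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def change_in_estimate(strata, threshold=0.10):
    """Compare the crude OR (strata collapsed) with the covariate-adjusted OR.

    Returns (is_confounder, crude_or, adjusted_or, abs_change_in_ln_or)."""
    # Summing each cell across strata gives the crude (unadjusted) table.
    crude = odds_ratio(*[sum(column) for column in zip(*strata)])
    adjusted = mantel_haenszel_or(strata)
    change = abs(math.log(adjusted) - math.log(crude))
    return change > threshold, crude, adjusted, change


# Hypothetical counts: one 2x2 table per level of a binary covariate.
strata = [(10, 40, 20, 160), (60, 30, 15, 15)]
flag, crude, adjusted, change = change_in_estimate(strata)
print(f"crude OR={crude:.2f}, adjusted OR={adjusted:.2f}, "
      f"|change in ln(OR)|={change:.3f}, confounder={flag}")
```

In this hypothetical example each stratum has an OR of 2.0 while the collapsed table gives an OR of 5.0, so the covariate would be flagged as a confounder under the 0.10 rule.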

Although some associations between environmental variables and genotype may be observed due to random error, no environmental variable was expected to be causally associated with candidate gene genotypes (that is, on a directed acyclic graph (DAG), genotype was not a descendant of any environmental variable). Therefore, even if an environmental covariate was associated with both genotype and outcome, it did not meet the definition of a confounder. Even more importantly, if that covariate was on the causal pathway between genotype and the outcome, adjusting for it could bias the association between exposure and outcome.
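The DAG reasoning can be made concrete with a short Python sketch that classifies a covariate as a mediator (a descendant of the exposure that lies on a path to the outcome), a potential confounder (an ancestor of the exposure that also reaches the outcome by a path bypassing the exposure), or neither. This is a deliberately simplified reading of the DAG criteria, not the procedure used in this study, and the graph below, including the `hormone_levels` node and the direct edges, is hypothetical.

```python
def reachable(edges, start):
    """Return every node reachable from `start` by following `edges`
    (a dict mapping each node to a list of neighbouring nodes)."""
    seen, stack = set(), [start]
    while stack:
        for nxt in edges.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reverse(dag):
    """Flip every arrow so that walking the result visits ancestors."""
    flipped = {}
    for parent, children in dag.items():
        for child in children:
            flipped.setdefault(child, []).append(parent)
    return flipped


def classify_covariate(dag, covariate, exposure, outcome):
    """Simplified DAG-based classification of a single covariate."""
    if (covariate in reachable(dag, exposure)
            and covariate in reachable(reverse(dag), outcome)):
        return "mediator"  # on the causal pathway: adjusting could bias the estimate
    # Remove the exposure so only covariate -> outcome paths that bypass it count.
    pruned = {node: [c for c in children if c != exposure]
              for node, children in dag.items() if node != exposure}
    if (covariate in reachable(reverse(dag), exposure)
            and covariate in reachable(reverse(pruned), outcome)):
        return "potential confounder"
    return "not a confounder"


# Hypothetical DAG: parent -> list of children.
dag = {
    "ancestry": ["genotype", "breast_cancer"],
    "genotype": ["hormone_levels", "breast_cancer"],
    "hormone_levels": ["breast_cancer"],
    "parity": ["breast_cancer"],
}
for covariate in ["ancestry", "hormone_levels", "parity"]:
    print(covariate, "->", classify_covariate(dag, covariate, "genotype", "breast_cancer"))
```

Because no environmental variable (here, parity) is drawn as a parent of genotype, the sketch cannot flag one as a confounder of the genotype effect, while a variable downstream of genotype is flagged as a mediator that should not be adjusted for.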


candidate genotypes was ancestry. The proportion of African ancestry was estimated for each subject and included in all models, as described above. Adjustment for ancestry had little effect on ORs estimated for breast cancer overall and for basal-like breast cancer. Ancestry adjustment did affect the ORs for a small number of genotypes and luminal A breast cancer; ancestry was therefore adjusted for in all models to control for bias in these associations.

Effect measure modifiers in analyses of genotype-environment interactions can be susceptible to confounding by other environmental variables. However, adjusting for a confounder of the effect modifier was problematic because it has the potential to bias the association of the genotype with the outcome. Even more importantly, if the potential confounder was on the causal pathway between genotype and the outcome, adjusting for that covariate could drive the estimated genotype effect towards the null.

Relationships of WHR, parity, and lactation with basal-like and luminal A breast cancer were explored using DAGs. In the WHR-breast cancer DAG, menopausal status and parity/lactation status were identified as potential confounders. Potential confounders from the DAG analysis were evaluated for a statistical effect on the parameter estimate of interest. Adjustment for these two risk factors did not alter the parameter estimates for WHR, and they were not included in further WHR interaction analyses. Reviews of the literature suggest that BMI is a confounder of the association between WHR and breast cancer in premenopausal women, and that failure to adjust for BMI biases associations towards the null (82, 83). In CBCS data, BMI adjustment changed the ln(OR) for the association between WHR and breast cancer overall by more than 0.10. The change for basal-like and luminal A associations was below this threshold, but closer to 0.10 than to 0. Based on the effect of BMI adjustment in CBCS data and the recognition of BMI as a confounder in the literature, BMI was kept as a confounder in WHR interaction models.

In the DAG examination of relationships between parity, lactation, and breast cancer, menopausal status and age at menarche were identified as potential confounders. Neither of these risk factors affected parameter estimates for the association between the combined parity and lactation variable and basal-like or luminal A breast cancer. Neither factor was included in further analyses as a confounder.

2.9.4 WHR misclassification

Waist and hip circumference were measured at the time of interview. Cases were interviewed a median of 3.9 months (range, 0.8 – 42.5 months) after diagnosis, meaning that waist and hip circumference may have been measured after the start of adjuvant therapy for some cases. If case WHR at the time of interview was systematically different from pre-diagnosis WHR, there was potential for exposure misclassification. There was no systematic event that would have led to WHR change in controls, so any such misclassification would be limited to cases and would therefore be differential with respect to case status.

Weight change is a commonly documented side effect of breast cancer-related therapy [reviewed by (99, 100)]. In most patient series, patients gained approximately 2 to 20 pounds, and the amount of weight gained varied by study cohort and treatment (99, 100). Most studies reported that weight gain began shortly after breast cancer diagnosis, and the amount of weight gained increased over time (100-104). In some studies, patients experienced weight loss during the year following diagnosis (101, 105). Freedman et al. (106) reported that a group of healthy controls gained more weight on average than breast cancer patients receiving adjuvant therapy, but that the breast cancer patients had a greater fluctuation in weight during the time period from shortly before the initiation of chemotherapy until 6 months post-chemotherapy completion.

Ingram et al. (107) reported that post-diagnosis weight change was related to the type of adjuvant therapy, but other studies found no difference by chemotherapy type or regimen (103, 108, 109). Studies have also reported that weight change in breast cancer patients is associated with being premenopausal (99, 101, 106, 110). Two studies reported that lower pre-diagnosis BMI was associated with weight change, but another study did not find an association (101, 103, 109). There is also evidence that African-American breast cancer patients experienced greater weight gain compared to white patients, especially following adjuvant chemotherapy (101, 109).

In addition to changes in weight, studies have reported that breast cancer patients experienced increases in body fat percentage, fat mass, waist size, and hip size (102, 106, 111-113). Goodwin et al. (114) reported that although waist size and hip size increased 1 year after diagnosis, WHR was unchanged over the same time period. However, women in that study may have already started chemotherapy at the time of baseline WHR measurement, which would bias the estimated change in WHR toward the null.

A sensitivity analysis was conducted to estimate the potential effect of WHR misclassification due to chemotherapy on the association between WHR and basal-like breast cancer. The sensitivity analysis was conducted using a publicly available probabilistic bias analysis program <https://sites.google.com/site/biasanalysis/> (115), which calculates a simulated data table of “true” classification based on the observed data table and the estimated sensitivity and specificity of the classification. The CBCS lacks data on whether cases had started chemotherapy by the time of interview. Sensitivity and specificity ranges were therefore estimated based on the stage and race distribution in CBCS basal-like breast cancer cases and the prevalence of chemotherapy treatment by stage in North Carolina Central Cancer Registry data (116).
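The general idea behind such a probabilistic correction can be sketched in Python. This is not the cited program (115) and does not reproduce its options: it assumes uniform distributions for sensitivity (Se) and specificity (Sp), draws them independently, omits the conventional random-error step, and treats controls as correctly classified (consistent with the absence of any systematic WHR change in controls). The function names, ranges, and counts are hypothetical. Each iteration back-calculates the expected true number of exposed subjects in a group as (observed exposed - (1 - Sp) × N) / (Se + Sp - 1) and recomputes the OR.

```python
import random
import statistics


def corrected_exposed(observed_exposed, total, se, sp):
    """Expected true number exposed in one group, given the observed count,
    group size, and the sensitivity/specificity of exposure classification."""
    return (observed_exposed - (1 - sp) * total) / (se + sp - 1)


def probabilistic_bias_or(a, n_cases, b, n_controls,
                          se_cases, sp_cases, se_controls, sp_controls,
                          n_sims=5000, seed=2009):
    """Simple probabilistic bias analysis for exposure misclassification.

    a, b: observed exposed cases and controls; each se_/sp_ argument is a
    (low, high) range sampled uniformly on every iteration."""
    rng = random.Random(seed)
    corrected = []
    for _ in range(n_sims):
        se1, sp1 = rng.uniform(*se_cases), rng.uniform(*sp_cases)
        se0, sp0 = rng.uniform(*se_controls), rng.uniform(*sp_controls)
        if se1 + sp1 <= 1 or se0 + sp0 <= 1:
            continue  # classification no better than chance; correction undefined
        A = corrected_exposed(a, n_cases, se1, sp1)
        B = corrected_exposed(b, n_controls, se0, sp0)
        C, D = n_cases - A, n_controls - B
        if min(A, B, C, D) <= 0:
            continue  # these Se/Sp values are incompatible with the observed data
        corrected.append((A * D) / (B * C))
    if not corrected:
        raise ValueError("no simulation produced a valid corrected table")
    corrected.sort()
    k = len(corrected) - 1
    return {
        "median": statistics.median(corrected),
        "p2.5": corrected[int(0.025 * k)],   # approximate (nearest-rank) percentiles
        "p97.5": corrected[int(0.975 * k)],
        "n_valid": len(corrected),
    }


# Hypothetical data: 120 of 300 cases and 300 of 1000 controls with high WHR.
summary = probabilistic_bias_or(
    a=120, n_cases=300, b=300, n_controls=1000,
    se_cases=(0.85, 0.95), sp_cases=(0.85, 0.95),
    se_controls=(1.0, 1.0), sp_controls=(1.0, 1.0),  # controls assumed error-free
)
print(summary)
```

Iterations whose sampled Se and Sp imply a negative cell are discarded rather than truncated, so `n_valid` reports how many of the simulations contributed to the summary.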

2.9.5 Outcome misclassification

Probabilistic sensitivity analyses were also conducted to evaluate the potential effects of molecular subtype misclassification. There has been some discussion in the literature as to the true definition of ‘basal-like’ breast cancer (117, 118). Not all studies use the same set of markers to define ‘basal-like’, and in studies that have used markers similar to those used by CBCS, there was not 100% agreement between tumors defined as basal-like using microarray expression profiles and immunohistochemistry definitions (27, 119, 120).

Simulations of genotype and basal-like vs. luminal A associations were conducted, assuming non-differential misclassification of case status. Sensitivity and specificity ranges were based on previously published data (27, 119, 120). Sensitivity analyses were conducted using a publicly available program (115). All analyses were run for 5000 simulations.
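The core correction step behind such an outcome-misclassification simulation can be illustrated with a deterministic Python sketch for a basal-like vs. luminal A (case-case) comparison. Here sensitivity is the assumed probability that a truly basal-like tumor is classified as basal-like, specificity is the assumed probability that a truly luminal A tumor is classified as luminal A, and non-differential misclassification means the same values apply to carriers and non-carriers. The function name, counts, and Se/Sp values are hypothetical; a probabilistic version would redraw Se and Sp on each simulation and summarize the resulting ORs.

```python
def corrected_case_case_or(basal_1, n_1, basal_0, n_0, se, sp):
    """Correct a basal-like vs. luminal A odds ratio for non-differential
    subtype misclassification.

    basal_1, n_1: tumors classified basal-like, and the basal-like + luminal A
    total, among carriers; basal_0, n_0: the same among non-carriers."""
    if se + sp <= 1:
        raise ValueError("Se + Sp must exceed 1 for the correction to be defined")

    def odds_true_basal(observed_basal, total):
        # Back-calculate expected true counts within one genotype group.
        true_basal = (observed_basal - (1 - sp) * total) / (se + sp - 1)
        true_luminal = total - true_basal
        if true_basal <= 0 or true_luminal <= 0:
            raise ValueError("Se/Sp values are incompatible with the observed counts")
        return true_basal / true_luminal

    return odds_true_basal(basal_1, n_1) / odds_true_basal(basal_0, n_0)


# Hypothetical counts: 40 of 200 carrier cases and 60 of 500 non-carrier
# cases were classified as basal-like (the rest as luminal A).
observed = (40 / 160) / (60 / 440)
corrected = corrected_case_case_or(40, 200, 60, 500, se=0.90, sp=0.95)
print(f"observed OR = {observed:.2f}, corrected OR = {corrected:.2f}")
```

With these hypothetical inputs the corrected OR (about 2.39) is further from the null than the observed OR (about 1.83).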

2.10 Data interpretation

The results from this analysis were interpreted based on effect size, precision, and any trends or patterns in the data. The precision of the effect estimates was measured by calculating the confidence limit ratio (CLR), which is equal to the upper 95% confidence limit divided by the lower 95% confidence limit. A single CLR has relatively little meaning, but it can be useful for comparing several effect estimates to each other. A lower CLR indicates a more precise estimate. Null hypothesis testing was not used to draw conclusions about SNP or haplotype main effects.
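A short Python sketch of the CLR calculation follows. For a Wald-type interval on the log scale, exp(ln OR ± 1.96 × SE), the ratio of the limits simplifies to exp(2 × 1.96 × SE), so the CLR depends only on the standard error of ln(OR) and not on the size of the OR. The Wald form is an assumption of this sketch (other interval methods need not have this property), and the function names, SNP labels, and numbers are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def wald_ci(odds_ratio, se_log_or, z=Z_95):
    """95% Wald confidence limits for an OR, computed on the log scale."""
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


def confidence_limit_ratio(lower, upper):
    """CLR = upper 95% confidence limit / lower 95% confidence limit."""
    return upper / lower


# Two hypothetical estimates with the same OR: the smaller CLR is more precise.
for label, or_hat, se in [("SNP A", 1.40, 0.15), ("SNP B", 1.40, 0.35)]:
    lo, hi = wald_ci(or_hat, se)
    clr = confidence_limit_ratio(lo, hi)
    print(f"{label}: OR={or_hat:.2f} (95% CI {lo:.2f}-{hi:.2f}), CLR={clr:.2f}")
```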


cancer overall (all cases) were reported here. The basal-like subtype is of interest because it is largely uncharacterized and is a unique type of hormone receptor-negative breast cancer. The luminal A subtype is of interest because luminal A is the most common subtype and therefore a logical point of reference. Also, the candidate genes under study and potential effect measure modifiers were selected based on risk factors for these two subtypes. The luminal A and basal-like subtypes were also the two subtypes with the largest sample size. Even though parameters were estimated for luminal B, HER2+/ER-, and unclassified subtypes in the polytomous model, the associations were not reported due to limited sample size and imprecise OR estimates.

2.11 References

1. Familial breast cancer: collaborative reanalysis of individual data from 52 epidemiological studies including 58,209 women with breast cancer and 101,986 women without the disease. Lancet 2001;358(9291):1389-99.

2. Balmain A, Gray J, Ponder B. The genetics and genomics of cancer. Nat Genet 2003;33 Suppl:238-44.

3. Walsh T, King MC. Ten genes for inherited breast cancer. Cancer Cell 2007;11(2):103-5.

4. Malone KE, Daling JR, Doody DR, Hsu L, Bernstein L, Coates RJ, et al. Prevalence and Predictors of BRCA1 and BRCA2 Mutations in a Population-Based Study of Breast Cancer in White and Black American Women Ages 35 to 64 Years. Cancer Res 2006;66(16):8297-308.

5. Pharoah PD, Antoniou A, Bobrow M, Zimmern RL, Easton DF, Ponder BA. Polygenic susceptibility to breast cancer and implications for prevention. Nat Genet 2002;31(1):33-6.

6. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, et al. Molecular portraits of human breast tumours. Nature 2000;406(6797):747-52.

7. Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 2001;98(19):10869-74.

8. Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A, et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A 2003;100(14):8418-23.

9. Millikan RC, Newman B, Tse CK, Moorman PG, Conway K, Dressler LG, et al. Epidemiology of basal-like breast cancer. Breast Cancer Res Treat 2008;109(1):123-39.

10. Yang XR, Sherman ME, Rimm DL, Lissowska J, Brinton LA, Peplonska B, et al. Differences in risk factors for breast cancer molecular subtypes in a population-based study. Cancer Epidemiol Biomarkers Prev 2007;16(3):439-43.

11. Nordgard SH, Johansen FE, Alnaes GI, Naume B, Borresen-Dale AL, Kristensen VN. Genes harbouring susceptibility SNPs are differentially expressed in the breast cancer subtypes. Breast Cancer Res 2007;9(6):113.

12. Kristensen VN, Borresen-Dale AL. SNPs associated with molecular subtypes of breast cancer: on the usefulness of stratified Genome-wide Association Studies (GWAS) in the identification of novel susceptibility loci. Mol Oncol 2008;2(1):12-5.

13. Carey LA, Perou CM, Livasy CA, Dressler LG, Cowan D, Conway K, et al. Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. JAMA 2006;295(21):2492-502.

14. Hankinson S, Hunter D. Breast Cancer. In: Adami H-O, Hunter D, Trichopoulos D, editors. Textbook of Cancer Epidemiology. New York: Oxford University Press, Inc.; 2002. p. 301-339.

15. Kelsey JL, Gammon MD, John EM. Reproductive factors and breast cancer. Epidemiol Rev 1993;15(1):36-47.

16. Pike MC, Spicer DV, Dahmoush L, Press MF. Estrogens, progestogens, normal breast cell proliferation, and breast cancer risk. Epidemiol Rev 1993;15(1):17-35.

17. Calle EE, Kaaks R. Overweight, obesity and cancer: epidemiological evidence and proposed mechanisms. Nat Rev Cancer 2004;4(8):579-91.

18. Ceschi M, Gutzwiller F, Moch H, Eichholzer M, Probst-Hensch NM. Epidemiology and pathophysiology of obesity as cause of cancer. Swiss Med Wkly 2007;137(3-4):50-6.

19. Schaffler A, Scholmerich J, Buechler C. Mechanisms of disease: adipokines and breast cancer - endocrine and paracrine mechanisms that connect adiposity and breast cancer. Nat Clin Pract Endocrinol Metab 2007;3(4):345-54.

20. Bauer KR, Brown M, Cress RD, Parise CA, Caggiano V. Descriptive analysis of estrogen receptor (ER)-negative, progesterone receptor (PR)-negative, and HER2-negative invasive breast cancer, the so-called triple-negative phenotype: a population-based study from the California cancer Registry. Cancer 2007;109(9):1721-8.

21. Morris GJ, Naidu S, Topham AK, Guiles F, Xu Y, McCue P, et al. Differences in breast carcinoma characteristics in newly diagnosed African-American and Caucasian patients: a single-institution compilation compared with the National Cancer Institute's Surveillance, Epidemiology, and End Results database. Cancer 2007;110(4):876-84.

22. Newman B, Moorman PG, Millikan R, Qaqish BF, Geradts J, Aldrich TE, et al. The Carolina Breast Cancer Study: integrating population-based epidemiology and molecular biology. Breast Cancer Res Treat 1995;35(1):51-60.

23. Millikan R, Eaton A, Worley K, Biscocho L, Hodgson E, Huang WY, et al. HER2 codon 655 polymorphism and risk of breast cancer in African Americans and whites. Breast Cancer Res Treat 2003;79(3):355-64.


24. Weinberg CR, Sandler DP. Randomized recruitment in case-control studies. Am J Epidemiol 1991;134(4):421-32.

25. Weinberg CR, Wacholder S. The design and analysis of case-control studies with biased sampling. Biometrics 1990;46(4):963-75.

26. Huang WY, Newman B, Millikan RC, Schell MJ, Hulka BS, Moorman PG. Hormone-related factors and risk of breast cancer in relation to estrogen receptor and progesterone receptor status. Am J Epidemiol 2000;151(7):703-14.

27. Nielsen TO, Hsu FD, Jensen K, Cheang M, Karaca G, Hu Z, et al. Immunohistochemical and clinical characterization of the basal-like subtype of invasive breast carcinoma. Clin Cancer Res 2004;10(16):5367-74.

28. Livasy CA, Perou CM, Karaca G, Cowan DW, Maia D, Jackson S, et al. Identification of a basal-like subtype of breast ductal carcinoma in situ. Hum Pathol 2007;38(2):197-204.

29. Thomas DC. Statistical Methods in Genetic Epidemiology. New York: Oxford University Press; 2004.

30. PubMed. [Database] [cited 2007; Available from: www.pubmed.gov

31. Johnson GC, Esposito L, Barratt BJ, Smith AN, Heward J, Di Genova G, et al. Haplotype tagging for the identification of common disease genes. Nat Genet 2001;29(2):233-7.

32. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 2007;449(7164):851-61.

33. Parra EJ, Marcini A, Akey J, Martinson J, Batzer MA, Cooper R, et al. Estimating African American admixture proportions by use of population-specific alleles. Am J Hum Genet 1998;63(6):1839-51.

34. Xu S, Huang W, Wang H, He Y, Wang Y, Wang Y, et al. Dissecting linkage disequilibrium in African-American genomes: roles of markers and individuals. Mol Biol Evol 2007;24(9):2049-58.

35. The International HapMap Project. Nature 2003;426(6968):789-96.

36. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001;29(1):308-11.

37. de Bakker PI, Yelensky R, Pe'er I, Gabriel SB, Daly MJ, Altshuler D. Efficiency and

38. Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 2005;21(2):263-5.

39. Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, Nickerson DA. Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet 2004;74(1):106-20.

40. Devlin B, Risch N. A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics 1995;29(2):311-22.

41. Gu S, Pakstis AJ, Li H, Speed WC, Kidd JR, Kidd KK. Significant variation in haplotype block structure but conservation in tagSNP patterns among global populations. Eur J Hum Genet 2007;15(3):302-12.

42. de Bakker PI, Burtt NP, Graham RR, Guiducci C, Yelensky R, Drake JA, et al. Transferability of tag SNPs in genetic association studies in multiple populations. Nat Genet 2006;38(11):1298-303.

43. Haiman CA, Dossus L, Setiawan VW, Stram DO, Dunning AM, Thomas G, et al. Genetic variation at the CYP19A1 locus predicts circulating estrogen levels but not breast cancer risk in postmenopausal women. Cancer Res 2007;67(5):1893-7.

44. Haiman CA, Stram DO, Pike MC, Kolonel LN, Burtt NP, Altshuler D, et al. A comprehensive haplotype analysis of CYP19 and breast cancer risk: the Multiethnic Cohort. Hum Mol Genet 2003;12(20):2679-92.

45. Tian C, Hinds DA, Shigeta R, Kittles R, Ballinger DG, Seldin MF. A genomewide single-nucleotide-polymorphism panel with high ancestry information for African American admixture mapping. Am J Hum Genet 2006;79(4):640-9.

46. Barnholtz-Sloan JS, McEvoy B, Shriver MD, Rebbeck TR. Ancestry estimation and correction for population stratification in molecular epidemiologic association studies. Cancer Epidemiol Biomarkers Prev 2008;17(3):471-7.

47. Illumina, Inc. [cited; Available from: http://www.illumina.com/

48. Shen R, Fan JB, Campbell D, Chang W, Chen J, Doucet D, et al. High-throughput SNP genotyping on universal bead arrays. Mutat Res 2005;573(1-2):70-82.

49. Balding DJ. A tutorial on statistical methods for population association studies. Nat Rev Genet 2006;7(10):781-91.

50. Hosking L, Lumsden S, Lewis K, Yeo A, McCarthy L, Bansal A, et al. Detection of genotyping errors by Hardy-Weinberg equilibrium testing. Eur J Hum Genet


51. Wigginton JE, Cutler DJ, Abecasis GR. A note on exact tests of Hardy-Weinberg equilibrium. Am J Hum Genet 2005;76(5):887-93.

52. Guo SW, Thompson EA. Performing the exact test of Hardy-Weinberg proportion for multiple alleles. Biometrics 1992;48(2):361-72.

53. Hoggart CJ, Parra EJ, Shriver MD, Bonilla C, Kittles RA, Clayton DG, et al. Control of confounding of genetic associations in stratified populations. Am J Hum Genet 2003;72(6):1492-1504.

54. Pritchard JK, Donnelly P. Case-control studies of association in structured or admixed populations. Theor Popul Biol 2001;60(3):227-37.

55. Aldrich MC, Selvin S, Hansen HM, Barcellos LF, Wrensch MR, Sison JD, et al. Comparison of statistical methods for estimating genetic admixture in a lung cancer study of African Americans and Latinos. Am J Epidemiol 2008;168(9):1035-46.

56. Barnholtz-Sloan JS, Chakraborty R, Sellers TA, Schwartz AG. Examining population stratification via individual ancestry estimates versus self-reported race. Cancer Epidemiol Biomarkers Prev 2005;14(6):1545-51.

57. Chakraborty R, Kamboh MI, Nwankwo M, Ferrell RE. Caucasian genes in American blacks: new data. Am J Hum Genet 1992;50(1):145-55.

58. Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 2003;164(4):1567-87.

59. Shriver MD, Parra EJ, Dios S, Bonilla C, Norton H, Jovel C, et al. Skin pigmentation, biogeographical ancestry and admixture mapping. Hum Genet 2003;112(4):387-99.

60. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006;38(8):904-9.

61. Pritchard JK, Rosenberg NA. Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet 1999;65(1):220-8.

62. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics 2000;155(2):945-59.

63. Tsai HJ, Choudhry S, Naqvi M, Rodriguez-Cintron W, Burchard EG, Ziv E. Comparison of three methods to estimate genetic ancestry and control for stratification in genetic association studies among admixed populations. Hum Genet 2005;118(3-4):424-33.


64. Devlin B, Roeder K. Genomic control for association studies. Biometrics 1999;55(4):997-1004.

65. Ziv E, Burchard EG. Human population structure and genetic association studies. Pharmacogenomics 2003;4(4):431-41.

66. Marchini J, Cardon LR, Phillips MS, Donnelly P. The effects of human population

In document Single nucleotide polymorphisms and the etiology of basal-like and luminal A breast cancer : a pathway-based approach (Page 150-175)