
CHAPTER 3: RESEARCH STRATEGY

3.2 Study population

3.3.5 Statistical analysis

We will first examine the distribution of each measure within the study population by comparing medians using the Kruskal-Wallis test and comparing distributions of categorical measures using Chi-square tests. We will also examine the correlation of measures by calculating the Pearson correlation coefficient for pairs of continuous measures and by plotting boxplots with median and interquartile range for categorical versus continuous measures.

Each continuous quantitative measure (ER percent positivity, ESR1 RNA, and Rao’s quadratic entropy) will be evaluated separately as a continuous predictor of survival using Cox proportional hazards models (CPHM). First, the proportional hazards assumption will be assessed by testing for a non-zero slope in a generalized linear regression of the scaled Schoenfeld residuals on functions of time and by graphical diagnostics of residuals over time [117]. For categorical ER-related measures (PAM50 intrinsic subtype and high, medium, low ROR-PT), the proportional hazards assumption will be tested by plotting Kaplan-Meier curves for each category of the predictor and visually assessing whether the curves are parallel. Cox proportional hazards models will then be used to estimate multivariate hazard ratios (HR) and 95% confidence intervals as a measure of association between each continuous and categorical quantitative ER measure and survival, adjusting for relevant clinical characteristics (race, age at diagnosis, stage, lymph node status, treatment regimen, time to first treatment course).

We will assess whether the association between ER and survival varies by race, as recommended by Knol et al. [115]. First, we will estimate the association between each ER measure and recurrence in strata defined by race and ER status, with white women with low-risk ER features as the single referent category. We will also report race-stratified estimates separately for black and white women. Next, we will test for significance of an interaction term of race*ER status in minimally and fully adjusted models using the Wald test. Finally, we will calculate the Relative Excess Risk due to Interaction (RERI), which is HR(Black, High risk ER status) – HR(Black, Low risk ER status) – HR(White, High risk ER status) + 1.
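
The additive interaction measure described above is simple arithmetic on three hazard ratios that share one referent group. The following is a minimal Python sketch of that arithmetic only; the function name `reri` and the hazard ratios passed to it are hypothetical placeholders, not study results.

```python
def reri(hr_black_high: float, hr_black_low: float, hr_white_high: float) -> float:
    """Interaction contrast on the additive scale.

    All three hazard ratios are assumed to be estimated against the same
    referent group (White, low-risk ER status), whose HR is 1 by definition.
    """
    return hr_black_high - hr_black_low - hr_white_high + 1.0


# Hypothetical hazard ratios, chosen only to show the arithmetic.
example = reri(hr_black_high=3.0, hr_black_low=1.5, hr_white_high=1.5)
print(f"RERI = {example:.2f}")
```

A value of 0 corresponds to exactly additive joint effects; values above 0 suggest that the joint effect exceeds the sum of the separate effects.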

We assessed for selection bias by comparing the demographic and clinical covariate distributions of women who have ER quantity and/or gene expression data available to those who have entirely missing ER biomarker data using Chi-square tests (Table 3.4 below). We found that women with entirely missing ER data tended to have smaller tumors (p=0.0012). This is not unexpected, as sufficient tumor material must be available in order to perform IHC and measure gene expression profiling. However, this selection bias would occur outside of CBCS as well, such that any woman with a tumor too small to assess with standard pathology or gene expression assay would be clinically managed without ER information. Thus, this selection bias is not a threat to the generalizability of our results. Other than tumor size, the final study cohort is not significantly different from the women excluded due to missing ER data.

Table 3.4. Selection bias among eligible CBCS3 participants with no ER data and participants with ER quantity and/or gene expression available

Characteristic      Complete case (n=1736)    Missing data (n=304)    P-value
Age                                                                   0.2326
  < 50              491 (50)                  156 (51)
  ≥ 50              496 (50)                  148 (49)
Race                                                                  0.2766
  White             543 (55)                  180 (60)
  Black             444 (45)                  124 (40)
Grade                                                                 0.5932
  1                 222 (22)                  77 (25)
  2                 476 (48)                  149 (49)
  3                 289 (29)                  78 (26)
Node status                                                           0.6377
  Negative          582 (60)                  194 (64)
  Positive          405 (41)                  110 (36)
Size                                                                  0.0012
  < 2 cm            523 (53)                  182 (60)
  2 – 5 cm          385 (39)                  81 (27)
  ≥ 5 cm            79 (8)                    41 (13)

3.3.6 Power calculations

To assess for differences in survival by Cox proportional HRs, we have 80% power to detect a HR of 0.75. This calculation assumes an exponential HR of 0.09, which was estimated by calculating the

80% for ER negative disease (n=833) [69]. It also assumes no loss to follow-up and group sizes of 600 (2,998 divided into quintiles of ER expression). Prior studies that have reported differences in survival between low ER and high ER tumors have reported HRs ranging from 0.6 – 0.81 [78, 79]. To detect difference in survival by quintiles of quantitative ER by race, we have 80% power to detect a HR of 0.65. A prior study of survival by quintiles of ER percent positivity by semi-quantitative IHC by race showed HRs ranging from 0.28 to 0.81 [78]. Power calculations were performed in SAS 9.4 (SAS Institute Inc., Cary, NC).
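
The power figures in this section were computed in SAS. As a rough illustration of how a detectable hazard ratio relates to the number of observed events, the sketch below uses Schoenfeld's approximation for a two-group comparison. It is not the SAS procedure and is not expected to reproduce the figures above. The function name `schoenfeld_events`, the equal allocation between groups, and the two-sided alpha of 0.05 are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hazard_ratio: float, power: float = 0.80,
                      alpha: float = 0.05, allocation: float = 0.5) -> int:
    """Approximate number of events needed to detect `hazard_ratio`.

    Schoenfeld's approximation for comparing two groups in a Cox model;
    `allocation` is the assumed proportion of subjects in one group.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    denom = allocation * (1 - allocation) * math.log(hazard_ratio) ** 2
    return math.ceil((z_alpha + z_beta) ** 2 / denom)


# Hazard ratios closer to 1 require more events for the same power.
for hr in (0.65, 0.75):
    print(hr, schoenfeld_events(hr))
```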

3.3.7 Preliminary studies

We have recently examined the ESR1 gene expression levels of ER subtypes defined clinically by IHC and found that even among ER positive tumors, a wide range of ESR1 expression values exist, with some tumors showing ESR1 levels commensurate with ER negative tumors (Figure 3.1 below). The significance of these ESR1 values in survival outcomes has yet to be determined, though these results confirm that ESR1 offers a more granular view of ER status than IHC classification.

Figure 3.1. ESR1 mRNA expression in immunohistochemistry-defined groups. Text on plot indicates median (interquartile range).

3.4 Aim 2

3.4.1 Data collection

To accomplish Aim 2, we will examine the association between high risk molecular features, race, and breast cancer recurrence using inverse probability weighted (IPW) Cox proportional hazards regression and IPW survival curves. Clinical and treatment data, gene expression profiling, and quantitative ER have already been obtained for cases in CBCS3.

3.4.2 Exposure assessment

For Aim 2, our exposure will be race, and race jointly stratified by molecular markers of high-risk tumor subtypes. We will compare Luminal B to Luminal A tumors, high ROR-PT to low/medium ROR-PT, and low quantitative ER to high quantitative ER stratified among black and white women.

Table 3.5. Aim 2 exposure distribution among study population

3.4.3 Outcome assessment

For Aim 2, the outcome assessed will be breast cancer recurrence defined as the time from diagnosis of first primary breast cancer to subsequent recurrent breast cancer. Enrollment concluded in 2013 for CBCS3 and follow-up is ongoing. Recurrence dates were ascertained via medical record review and follow up with study participants.

3.4.4 Covariate assessment

For Aim 2, we are interested in variables that may confound, predict, or modify the association between molecular features, race, and recurrence. These include age, tumor grade, node status, and tumor size. For parsimoniously adjusted models we will consider stage a proxy for node status, size, and grade. Variables are described in detail in Table 3.6.

Table 3.6. Aim 2 covariate descriptions

Study variable    Role          Measurement              Classification    Range
Age               Confounder    Interview self-report    Binary            0/1
Grade             Mediator      Pathology report         Categorical       1 – 3
Tumor size        Mediator      Medical record           Binary            1 = < 2 cm
                                                                           2 = ≥ 2 cm
Node status       Mediator      Medical record           Binary            0 = negative
                                                                           1 = positive
Stage             Mediator      Medical record           Categorical       I - III

Aim 2 Exposure          White (n=856)    Black (n=658)
Luminal A               348 (74)         221 (59)
Luminal B               83 (18)          94 (25)
Low/Med ROR-PT          442 (92)         315 (83)
High ROR-PT             40 (8)           66 (17)
High quantitative ER    690 (81)         494 (75)
Low quantitative ER     166 (19)         164 (25)

3.4.5 Statistical analysis

We will use linear binomial regression to calculate relative frequency differences (RFD) and 95% confidence intervals (95% CI) as the measure of association between race and clinical covariates, adjusting for age at diagnosis.

Inverse probability weighted (IPW) Cox proportional hazards models will be used to estimate standardized hazard ratios and 95% CI for the effect of race on disease-free interval, overall and within strata defined by molecular features [116, 117]. IPW models are advantageous in that they allow for adjustment for baseline characteristics but do not require the proportional hazards assumption to be met within all cross-strata of covariates. IPW also allows for the calculation of standardized survival curves similar to Kaplan-Meier curves.

To estimate the effect of race on disease-free interval, we will use logistic regression models to calculate the probability of belonging to each group (black vs white) accounting for baseline clinical covariates. Model 1 will adjust for age at diagnosis (continuous linear). Model 2 will additionally adjust for grade (1, 2, or 3), node status (positive vs negative), and tumor size (≤ 2 cm vs > 2 cm). To estimate the effect of race jointly stratified by molecular feature, we will use multinomial logistic regression models to calculate the probability of belonging to each of the four jointly stratified groups (white/low risk, black/low risk, white/high risk, black/high risk) adjusting for age (continuous linear) and stage at diagnosis (I, II, or III). These probabilities will be used to calculate stabilized inverse probability of exposure weights to estimate standardized hazard ratios, calculate 5-year disease-free interval probability, and plot standardized survival curves. We will assess the mean and range of all weights, with an optimal mean of 1 and a reasonable range within 10 - 20. The proportional hazards assumption of IPW Cox models will be assessed graphically by examining the inverse probability-weighted log cumulative hazard function estimates and by Wald test of a product term between exposure group and time.
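
To make the weighting step concrete, the following minimal Python sketch computes stabilized inverse probability of exposure weights for the binary case (two exposure groups). It assumes the conditional probabilities have already been estimated by a logistic model, which is not fitted here. The function name `stabilized_weights` and the toy values are hypothetical; the four-group analysis would use multinomial probabilities in the same way.

```python
from statistics import mean


def stabilized_weights(exposed, propensity):
    """Stabilized inverse probability of exposure weights (binary exposure).

    exposed    : list of 0/1 indicators of group membership
    propensity : list of P(exposed = 1 | covariates), assumed to come from
                 a previously fitted logistic regression model
    """
    p_marginal = mean(exposed)  # numerator: marginal probability of exposure
    weights = []
    for a, ps in zip(exposed, propensity):
        if a == 1:
            weights.append(p_marginal / ps)
        else:
            weights.append((1 - p_marginal) / (1 - ps))
    return weights


# Hypothetical toy data, for illustration only.
exposed = [1, 1, 0, 0, 1, 0]
propensity = [0.6, 0.5, 0.4, 0.5, 0.7, 0.3]
weights = stabilized_weights(exposed, propensity)

# Diagnostics of the kind described above: mean and range of the weights.
print(f"mean={mean(weights):.3f}, min={min(weights):.3f}, max={max(weights):.3f}")
```

The marginal probability in the numerator is what stabilizes the weights; with unstabilized weights (numerator of 1) the mean would not be expected to be near 1.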

We will examine treatment covariates among grade 3 tumors, as grade is a widely assessed tumor marker and correlates strongly with high-risk molecular subgroups [101, 118]. We will assess chemotherapy receipt, endocrine therapy receipt, treatment delay defined as >30 days to first treatment, and health insurance status at interview. Differences will be evaluated using chi-square tests and RFDs adjusted for stage. We will perform sensitivity analyses to assess treatment differences among tumors with low quantitative ER status and high ROR-PT score.

3.4.6 Power calculations

See Aim 1 for power calculations for survival analyses.

3.5 Aim 3

3.5.1 Data collection

We will use exposure data collected from in-person interview and medical records and quantitative ER measures collated as described for Aims 1 and 2. All exposure data has been previously collected and outcomes will be measured as in Aims 1 and 2, so no new data collection will be necessary for Aim 3.

3.5.2 Exposure assessment

For Aim 3, exposures were selected on the basis of demonstrated association with risk of breast cancer and contemporaneity to tumor progression. These data were self-reported during in-person nurse interviews. Rapid case ascertainment resulted in an average time from diagnosis to interview of 6 months [119]. All of these exposures demonstrate at least moderate validity when assessed via self-report [121-123]. BMI was assessed by the nurse administering the in-person interview, which ensured accurate reporting. Exposures will be categorized within the study population as indicated in Table 3.7.

Table 3.7. Aim 3 exposure description.

3.5.3 Outcome assessment

For Aim 3, the outcome assessed will be protein- and RNA-based quantitative ER measures as detailed above in exposures for Aims 1 and 2. We will assess continuous measures of ER percent positivity, ESR1 gene expression and ER intratumoral heterogeneity. The former two measures will also be dichotomized to low versus high, with low category representing the lowest quartile of expression in the study population. ER intratumoral heterogeneity will be dichotomized as high (top tertile of study population) versus low. ROR-PT score will be dichotomized to high (score > 64.71) versus low/medium per established protocol [101]. Oncotype Dx Recurrence Score will be dichotomized to high (score > 25) versus low/medium per established recommendations [119].

3.5.4 Covariate assessment

For Aim 3, the association between individual exposures and quantitative ER will be assessed adjusting for age at diagnosis, race, node status, and stage at diagnosis.

Exposure                Measurement            Classification                       Expected distribution                   Validity
Hormone therapy use     Interview self-report  Ever/Never                           Ever use (among postmenopausal) = 45%   [123]
                                               Recency of use (<5 yrs vs ≥ 5 yrs)
                                               Duration of use (<5 years, 5 – 10 years, >10 years)
Oral contraceptive use  Interview self-report  Ever/Never                           Ever use = 81%                          [123]
                                               Recency of use (<5 yrs vs ≥ 5 yrs)
                                               Duration of use (<5 years, 5 – 10 years, >10 years)
Alcohol use             Interview self-report  Ever/never                           Ever use = 80%                          [121]
                                               < 1 year daily use                   82%
                                               1 – 5 years daily use                10%
                                               6 – 10 years daily use               4%
                                               > 10 years daily use                 5%
Cigarette smoking       Interview self-report  Current                              18% current                             [122]
                                               Former                               27% former
                                               Never                                56% never
                                               ≤ 10 years duration                  13%
                                               11 – 20 years                        11%
                                               > 20 years                           22%
                                               < ½ pack daily dose                  17%
                                               ½ - 1 pack daily dose                20%
                                               > 1 pack daily dose                  10%

3.5.5 Statistical analysis

To evaluate the association of contemporaneous exposures with quantitative measures of ER, we will use linear binomial regression to calculate relative frequency differences (RFD) and 95% confidence intervals (95% CI) as the measure of association between binary ER measures and exposure categories. General linear regression will be used for continuous measures. Estimates will be adjusted for age (continuous linear), race (black vs white), stage (I, II, III, or IV) and node status (positive vs negative). Associations with HT use will be restricted to post-menopausal women only.

3.5.6 Power calculations

To assess the correlation between categorical exposures and continuous measures of quantitative ER, we have 80% power to detect a difference in means of 0.3 at alpha=0.05 with a minimum group size of 174. Our prior study of ESR1 expression and smoking status showed a difference in mean ESR1 expression of 0.3 between never and ever smokers [18].
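
As a rough check on the scale of this calculation, the sketch below computes the minimum detectable difference in means for a two-group comparison using the normal approximation. The standard deviation is not stated above; the sketch assumes a standard deviation of 1 in each group, two groups of equal size, and a two-sided test, so it should be read as an illustration under those assumptions rather than the calculation actually performed. The function name `min_detectable_difference` is hypothetical.

```python
import math
from statistics import NormalDist


def min_detectable_difference(n_per_group: int, sd: float = 1.0,
                              power: float = 0.80, alpha: float = 0.05) -> float:
    """Smallest difference in means detectable with the given power.

    Normal approximation for two independent groups of equal size with a
    common standard deviation `sd` (assumed, not taken from the study).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * sd * math.sqrt(2.0 / n_per_group)


mdd = min_detectable_difference(174)
print(f"Minimum detectable difference: {mdd:.2f}")
```

Under these assumptions, a group size of 174 gives a minimum detectable difference of about 0.30 standard deviations.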

3.5.7 Preliminary studies

We have examined the impact of exogenous hormone use on breast tumor subtype as indicated by Oncotype Dx Recurrence Score (RS) and found that among postmenopausal women, HT users have a significantly higher frequency of intermediate RS tumors relative to low RS tumors [abstract presented at the International Society for Pharmacoepidemiology Annual Conference]. This suggests that HT use may modify tumor gene expression profiles in a clinically relevant manner. Further investigation of tumor molecular features of HT users will help clarify the mechanisms by which HT influences tumor biology and help determine tumor phenotypes which are sensitive to hormone exposure.
