2.4 Simulation study
2.4.3 Data simulated under different configurations
Sensitivity analyses are conducted to further elucidate the statistical properties of the
four methods for the study of etiologic heterogeneity.
First, the sensitivity of the results in Sections 2.4.1 and 2.4.2 to the prevalence of the
risk factor is explored. Additional simulations were conducted for the case of four disease
subtypes and a single binary risk factor, with data generated as described at the beginning
of Section 2.4 using 1000 controls and 1000 cases. For each setting, 1000 simulated data
sets are generated. Settings where the risk factor prevalence is q = 0.3 or q = 0.6 are
investigated separately. Data are first generated under the null hypothesis, and the true
common regression coefficients (β11 = β12 = β13 = β14) are each fixed at 0, 0.1, 0.2, 0.3,
and 0.4. Results are presented in Table 2.4 and follow a pattern similar to that of the null
case presented in Table 2.3, which corresponds to risk factor prevalence q = 0.3 and true
common regression coefficients fixed at 0.1. Polytomous logistic regression and the methods
of Chatterjee (2004) and Wang et al. (2015) perform similarly with respect to type I error for
the test of H0β, whereas the method of Rosner et al. (2013) is anti-conservative. Polytomous
logistic regression and the method of Chatterjee (2004) perform similarly with respect to
type I error for the tests of H0γ11 and H0γ12, whereas the method of Wang et al. (2015) is
conservative and the method of Rosner et al. (2013) is again anti-conservative.
Data are next generated under the alternative hypothesis. For each of the two risk factor
prevalences, three alternative scenarios are investigated, with true values of {β11, β12, β13}
fixed at {0.2, 0.25, 0.25}, {0.2, 0.3, 0.3}, and {0.2, 0.4, 0.4}, and values of β14 ranging from
0.25 to 0.85, 0.3 to 0.9, and 0.4 to 1.0, respectively. Power is calibrated for all results as
described in Section 2.4.2. Results are presented in Figure 2.2 for β14 and Figure 2.3 for γ1k.
In all configurations of parameter values and risk factor prevalences, the pattern of results
is in line with that presented in Figure 1: after calibration for differences in type I error,
all methods have similar power.
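The simulate-then-calibrate logic of this subsection can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure of Section 2.4 or Section 2.4.2: q is taken to be the exposure prevalence among controls, the 1000 cases are split equally across the four subtypes, and the calibrated critical value is assumed to be the empirical 95th percentile of the statistics from the null replicates. Because the exposure odds among subtype-m cases equal the control odds multiplied by exp(β1m), a case-only Pearson chi-square test of equal exposure prevalence across subtypes targets the same null hypothesis β11 = β12 = β13 = β14; it is used purely as a stand-in and is not one of the four methods compared here, so only cases are simulated. The alternative {0.2, 0.3, 0.3, 0.6} is one point from the second scenario above. The names `case_exposure_prevalence`, `homogeneity_stat`, and `simulate_stats` are hypothetical.

```python
import numpy as np


def case_exposure_prevalence(q, beta):
    """Exposure prevalence among cases with log odds ratio beta, ASSUMING q is
    the exposure prevalence among controls (odds_case = odds_control * e^beta)."""
    odds = q / (1.0 - q) * np.exp(beta)
    return odds / (1.0 + odds)


def homogeneity_stat(exposed, n_per_subtype):
    """Pearson chi-square (df = M - 1) for equal exposure prevalence across the
    M case subtypes. Stand-in test only, not one of the four compared methods."""
    n = np.full(len(exposed), n_per_subtype, dtype=float)
    p_bar = exposed.sum() / n.sum()
    e1, e0 = n * p_bar, n * (1.0 - p_bar)  # expected exposed / unexposed counts
    return float(np.sum((exposed - e1) ** 2 / e1 + ((n - exposed) - e0) ** 2 / e0))


def simulate_stats(q, betas, n_per_subtype, n_reps, rng):
    """Heterogeneity statistic for n_reps simulated sets of subtype-specific cases."""
    p = case_exposure_prevalence(q, np.asarray(betas, dtype=float))
    return np.array([homogeneity_stat(rng.binomial(n_per_subtype, p), n_per_subtype)
                     for _ in range(n_reps)])


rng = np.random.default_rng(1)
q, n_per_subtype, n_reps, alpha = 0.3, 250, 1000, 0.05  # 1000 cases over M = 4 subtypes
null_stats = simulate_stats(q, [0.1, 0.1, 0.1, 0.1], n_per_subtype, n_reps, rng)
alt_stats = simulate_stats(q, [0.2, 0.3, 0.3, 0.6], n_per_subtype, n_reps, rng)

nominal_crit = 7.815                                     # upper 5% point of chi-square, 3 df
calibrated_crit = np.quantile(null_stats, 1.0 - alpha)   # assumed calibration rule
print("type I error, nominal cutoff:", np.mean(null_stats > nominal_crit))
print("type I error, calibrated cutoff:", np.mean(null_stats > calibrated_crit))
print("power, calibrated cutoff:", np.mean(alt_stats > calibrated_crit))
```

Swapping in any of the four methods would change only how the statistic is computed; the calibration step, which takes the critical value from the null replicates rather than from the nominal χ² distribution, stays the same.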
Figure 2.2: Log odds ratio required to achieve various levels of power when type I error is calibrated to α = 0.05 to address whether risk factor effects differ across M = 4 disease subtypes
Figure 2.3: Log odds ratio required to achieve various levels of power when type I error is calibrated to α = 0.05 to address whether risk factor effects differ across each of the K = 2 individual tumor markers that form M = 4 disease subtypes

Next, the methods other than polytomous logistic regression were developed to accommodate
multiple tumor markers and thus can take advantage of dimension reduction. A limited
exploration of expanding the number of tumor markers to K = 4 was conducted, in which there
are M = 16 subtypes that must be evaluated separately in the polytomous logistic regression
model. Data are generated for sixteen disease subtypes formed by cross-classification of four
binary tumor markers, as described at the start of Section 2.4. Settings where the risk factor
prevalence is q = 0.3 or q = 0.6 are investigated separately. For each simulation setting,
500 simulated data sets were generated using 1008 controls and 1008 cases, which allows the
cases to be divided equally into M = 16 subtypes (63 per subtype). Data are first generated
under the null hypothesis, and the true common regression coefficients
(β11 = β12 = · · · = β1(16)) are each fixed at 0.05, 0.1, 0.15, and 0.2. Results are presented
in Table 2.5 and follow a pattern similar to that seen with M = 4 subtypes. Data are next
generated under the alternative hypothesis. For each of the two risk factor prevalences, three
alternative scenarios are investigated, with true values
β1m = {0.2, 0.4, 0.4, 0.4, 0.4, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.8, 0.8, 0.8, 0.8, 1.0},
β1m = {0.2, 0.4, 0.4, 0.4, 0.4, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 1.2, 1.2, 1.2, 1.2, 1.2}, and
β1m = {0.2, 0.4, 0.4, 0.4, 0.4, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 1.2, 1.2, 1.2, 1.2, 1.4}.
Results are presented in Table 2.6. Even as the number of subtypes increases to sixteen, the
pattern of results is similar to that seen with four disease subtypes.
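The subtype structure and the dimension reduction mentioned above can be illustrated with a short Python sketch. Two assumptions are made that the text does not state: the sixteen listed values of β1m are taken to be ordered by the number of positive markers (under that ordering, scenario 1 equals 0.2 + 0.2 × the number of positive markers), and the subtype-specific log odds ratios are described by a hypothetical first-order model β1m = γ10 + Σk γ1k ymk, where ymk is the 0/1 status of marker k for subtype m. This model is used only to show how 1 + K = 5 parameters can stand in for M = 16; it is not necessarily the exact parameterization of any of the methods.

```python
import itertools

import numpy as np

K = 4            # binary tumor markers
N_CASES = 1008   # cases in the M = 16 simulations

# Each row is one subtype: its 0/1 status on the K markers (M = 2**K rows).
markers = np.array(list(itertools.product([0, 1], repeat=K)))
M = markers.shape[0]
assert N_CASES % M == 0  # 1008 = 16 * 63, so cases divide equally across subtypes

# ASSUMPTION: the listed beta vectors are ordered by number of positive markers.
n_pos = markers.sum(axis=1)
order = np.argsort(n_pos, kind="stable")
markers, n_pos = markers[order], n_pos[order]

# Alternative scenario 1, copied from the text.
beta_scenario1 = np.array([0.2, 0.4, 0.4, 0.4, 0.4, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6,
                           0.8, 0.8, 0.8, 0.8, 1.0])

# Hypothetical first-order model: beta_m = gamma_0 + sum_k gamma_k * marker_mk,
# i.e. 1 + K = 5 parameters in place of M = 16 subtype-specific log odds ratios.
design = np.column_stack([np.ones(M), markers])
gamma, *_ = np.linalg.lstsq(design, beta_scenario1, rcond=None)

print("cases per subtype:", N_CASES // M)
print("subtypes by number of positive markers:", np.bincount(n_pos).tolist())
print("fitted gamma_0, gamma_1..gamma_K:", np.round(gamma, 3))
```

Under these assumptions the least-squares fit recovers an intercept and four marker effects all equal to 0.2, and the 1008 cases split into 63 per subtype, with 1, 4, 6, 4, and 1 subtypes having 0 through 4 positive markers.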
Finally, to investigate the setting where more than one risk factor is included, a data
example is conducted using the same data and subtypes described in Section 2.3 and
incorporating a variety of continuous and binary risk factors of relevance to breast cancer
risk (Begg et al., 2013). Results are presented in Tables 2.7 and 2.8. The interpretation of
each risk factor must now be made in the context of adjustment for all other risk factors.
Across all risk factors, for the question of whether risk factor effects differ across disease
subtypes, polytomous logistic regression and the methods of Chatterjee (2004) and Wang
et al. (2015) yield similar parameter estimates and p-values, whereas results from the
method of Rosner et al. (2013) differ slightly from those of the other methods. Notably, in
the multivariable analysis the effect of oral contraceptive use is no longer significantly
different across disease subtypes (Table 2.7), whereas in the simplified analysis of Section
2.3 the effect was significantly different across disease subtypes according to all methods.

Table 2.5: Type I error for different risk factor prevalences q and true effect sizes β1m with
M = 16 disease subtypes formed by K = 4 individual tumor markers

q                 0.3                           0.6
β1m               0.05   0.1    0.15   0.2      0.05   0.1    0.15   0.2
Does the risk factor effect differ with respect to subtypes?
  Polytomous¹     0.052  0.050  0.038  0.034    0.036  0.028  0.030  0.044
  Wang²           0.052  0.050  0.038  0.034    0.036  0.028  0.030  0.044
  Chatterjee³     0.052  0.050  0.038  0.034    0.036  0.028  0.030  0.044
  Rosner⁴         0.060  0.058  0.050  0.046    0.048  0.036  0.046  0.052
Does the risk factor effect differ with respect to tumor marker 1 (γ11)?
  Polytomous¹     0.060  0.054  0.040  0.032    0.050  0.054  0.040  0.042
  Wang²           0.050  0.042  0.032  0.032    0.042  0.032  0.028  0.034
  Chatterjee³     0.058  0.052  0.040  0.032    0.048  0.050  0.042  0.040
  Rosner⁴         0.060  0.058  0.046  0.034    0.054  0.056  0.042  0.040
Does the risk factor effect differ with respect to tumor marker 2 (γ12)?
  Polytomous¹     0.044  0.052  0.050  0.042    0.050  0.058  0.050  0.058
  Wang²           0.036  0.042  0.040  0.032    0.036  0.032  0.038  0.038
  Chatterjee³     0.044  0.050  0.048  0.040    0.046  0.054  0.048  0.054
  Rosner⁴         0.044  0.052  0.050  0.040    0.052  0.060  0.048  0.056
Does the risk factor effect differ with respect to tumor marker 3 (γ13)?
  Polytomous¹     0.058  0.042  0.058  0.048    0.046  0.048  0.044  0.042
  Wang²           0.044  0.040  0.042  0.038    0.038  0.036  0.032  0.030
  Chatterjee³     0.058  0.042  0.056  0.044    0.042  0.048  0.042  0.038
  Rosner⁴         0.058  0.044  0.060  0.048    0.048  0.054  0.046  0.044
Does the risk factor effect differ with respect to tumor marker 4 (γ14)?
  Polytomous¹     0.058  0.042  0.058  0.048    0.046  0.048  0.044  0.042
  Wang²           0.044  0.040  0.042  0.038    0.038  0.036  0.032  0.030
  Chatterjee³     0.058  0.042  0.056  0.044    0.042  0.048  0.042  0.038
  Rosner⁴         0.058  0.044  0.060  0.048    0.048  0.054  0.046  0.044
¹Polytomous logistic regression
²Two-stage meta-regression (Wang et al., 2015)
³Two-stage regression with simultaneous estimation (Chatterjee, 2004)
⁴Stratified logistic regression (Rosner et al., 2013)

Table 2.6: Power for different risk factor prevalences q and different alternative hypothesis
scenarios with M = 16 disease subtypes formed by K = 4 individual tumor markers

q                      0.3                  0.6
Alternative scenario*  1      2      3        1      2      3
Does the risk factor effect differ with respect to subtypes?
  Polytomous¹          0.332  0.834  0.872    0.260  0.750  0.800
  Wang²                0.332  0.834  0.872    0.260  0.750  0.800
  Chatterjee³          0.332  0.834  0.872    0.262  0.748  0.798
  Rosner⁴              0.356  0.854  0.880    0.278  0.780  0.816
Does the risk factor effect differ with respect to tumor marker 1 (γ11)?
  Polytomous¹          0.308  0.596  0.644    0.284  0.538  0.590
  Wang²                0.270  0.568  0.638    0.246  0.500  0.558
  Chatterjee³          0.296  0.614  0.674    0.274  0.564  0.608
  Rosner⁴              0.308  0.620  0.682    0.280  0.574  0.622
Does the risk factor effect differ with respect to tumor marker 2 (γ12)?
  Polytomous¹          0.312  0.592  0.646    0.270  0.552  0.602
  Wang²                0.280  0.564  0.636    0.232  0.530  0.576
  Chatterjee³          0.312  0.614  0.678    0.266  0.564  0.610
  Rosner⁴              0.326  0.630  0.682    0.270  0.574  0.620
Does the risk factor effect differ with respect to tumor marker 3 (γ13)?
  Polytomous¹          0.298  0.602  0.648    0.300  0.552  0.606
  Wang²                0.262  0.562  0.620    0.242  0.502  0.554
  Chatterjee³          0.294  0.606  0.654    0.276  0.568  0.606
  Rosner⁴              0.300  0.616  0.666    0.286  0.576  0.618
Does the risk factor effect differ with respect to tumor marker 4 (γ14)?
  Polytomous¹          0.290  0.610  0.660    0.242  0.540  0.590
  Wang²                0.264  0.574  0.634    0.210  0.506  0.560
  Chatterjee³          0.288  0.632  0.672    0.238  0.570  0.622
  Rosner⁴              0.296  0.638  0.676    0.250  0.586  0.630
*Alternative scenarios:
  1: β1m = {0.2, 0.4, 0.4, 0.4, 0.4, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.8, 0.8, 0.8, 0.8, 1.0}
  2: β1m = {0.2, 0.4, 0.4, 0.4, 0.4, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 1.2, 1.2, 1.2, 1.2, 1.2}
  3: β1m = {0.2, 0.4, 0.4, 0.4, 0.4, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 1.2, 1.2, 1.2, 1.2, 1.4}
¹Polytomous logistic regression
²Two-stage meta-regression (Wang et al., 2015)
³Two-stage regression with simultaneous estimation (Chatterjee, 2004)
⁴Stratified logistic regression (Rosner et al., 2013)
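To judge which entries of Table 2.5 depart from the nominal 0.05 by more than simulation noise, a Monte Carlo band can be computed. With 500 data sets per setting, an empirical rejection rate whose true value is 0.05 has standard error √(0.05 × 0.95 / 500) ≈ 0.0097, giving an approximate 95% band of 0.031 to 0.069. The sketch below applies this band to the first block of Table 2.5; the normal approximation, the 1.96 multiplier, and the helper name `mc_band` are choices made here, not taken from the text.

```python
import math


def mc_band(alpha=0.05, n_reps=500, z=1.96):
    """Normal-approximation band for an empirical rejection rate whose true value is alpha."""
    se = math.sqrt(alpha * (1.0 - alpha) / n_reps)
    return alpha - z * se, alpha + z * se


lower, upper = mc_band()

# First block of Table 2.5 (does the effect differ across subtypes?).
# Columns: q = 0.3 with beta = 0.05, 0.1, 0.15, 0.2, then q = 0.6 with the same betas.
table_2_5_subtypes = {
    "Polytomous, Wang, Chatterjee": [0.052, 0.050, 0.038, 0.034, 0.036, 0.028, 0.030, 0.044],
    "Rosner": [0.060, 0.058, 0.050, 0.046, 0.048, 0.036, 0.046, 0.052],
}

print(f"band for 500 replicates: [{lower:.3f}, {upper:.3f}]")
for method, rates in table_2_5_subtypes.items():
    outside = [r for r in rates if not lower <= r <= upper]
    print(f"{method}: outside band -> {outside}")
```

Only the rates of 0.028 and 0.030 (q = 0.6, shared by polytomous logistic regression and the methods of Chatterjee and Wang et al.) fall outside the band. The same check applies to the other blocks, or to the power estimates in Table 2.6 with the nominal rate replaced by the estimated power.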