2.4 Simulation study

2.4.3 Data simulated under different configurations

Sensitivity analyses are conducted in order to further elucidate the statistical properties of

the four methods for the study of etiologic heterogeneity.

First, the sensitivity of the results in Sections 2.4.1 and 2.4.2 to the prevalence of the

risk factor is explored. Additional simulations were conducted for the case of four disease

subtypes and a single binary risk factor, with data generated as described at the beginning

of Section 2.4 using 1000 controls and 1000 cases. For each setting, 1000 simulated data

sets are generated. Here settings where the risk factor prevalence is q = 0.3 or q = 0.6 are

separately investigated. Data are first generated under the null hypothesis, and the true

common regression coefficients (β11 = β12 = β13 = β14) are each fixed at 0, 0.1, 0.2, 0.3,

and 0.4. Results are presented in Table 2.4, with a similar pattern of results to the null

case presented in Table 2.3, which corresponds to risk factor prevalence q = 0.3 and true

common regression coefficients fixed at 0.1. Polytomous logistic regression and the methods

of Chatterjee (2004) and Wang et al. (2015) perform similarly with respect to type I error for

the test of H0β whereas the method of Rosner et al. (2013) is anti-conservative. Polytomous

logistic regression and the method of Chatterjee (2004) perform similarly with respect to

type I error for the tests of H0γ11 and H0γ12 whereas the method of Wang et al. (2015) is

conservative and the method of Rosner et al. (2013) is again anti-conservative. Data are

next generated under the alternative hypothesis. For each of the two risk factor prevalences,

three alternative scenarios are investigated, with true values for {β11, β12, β13} fixed at

{0.2, 0.25, 0.25}, {0.2, 0.3, 0.3}, and {0.2, 0.4, 0.4} and values of β14 ranging from 0.25 to

0.85, 0.3 to 0.9 and 0.4 to 1.0, respectively. Power was calibrated for all results as described

in Section 2.4.2. Results are presented in Figure 2.2 for β14 and Figure 2.3 for γ1k. In

all configurations of parametric values and risk factor prevalences, the pattern of results

is in line with that presented in Figure 1, such that all methods have similar power after

calibration for differences in type I error.
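
As an illustration of power calibrated for differences in type I error, the following sketch assumes one common approach: the critical value of each test is taken from the empirical distribution of its statistic across data sets simulated under the null hypothesis, so that the empirical type I error is at most α = 0.05, and power is then the proportion of data sets simulated under the alternative whose statistic exceeds that value. The procedure actually used is the one described in Section 2.4.2, which may differ from this sketch; the function names and the toy statistics below are hypothetical.

```python
import math


def empirical_critical_value(null_stats, alpha=0.05):
    """Return the null order statistic that leaves at most a fraction
    alpha of the null replicates strictly above it (assumed calibration)."""
    ordered = sorted(null_stats)
    n = len(ordered)
    # Number of null replicates that may be rejected without exceeding alpha.
    allowed_rejections = math.floor(alpha * n)
    return ordered[n - allowed_rejections - 1]


def rejection_rate(stats, critical_value):
    """Proportion of simulated statistics strictly above the critical value."""
    return sum(s > critical_value for s in stats) / len(stats)


# Toy test statistics (hypothetical, not taken from the simulation study).
null_stats = list(range(1, 21))            # 20 replicates under the null
alt_stats = [10, 18, 19, 20, 25, 30]       # 6 replicates under an alternative

critical_value = empirical_critical_value(null_stats, alpha=0.05)
calibrated_type1 = rejection_rate(null_stats, critical_value)
calibrated_power = rejection_rate(alt_stats, critical_value)
```

Because every method is held to the same empirical type I error under this scheme, an anti-conservative method cannot appear more powerful merely because it rejects too often under the null.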

Next, the methods other than polytomous logistic regression were created to accommodate

multiple tumor factors, and thus have the capacity to take advantage of dimension

reduction. A limited exploration of the expansion of the number of tumor markers to K = 4

was conducted, whereby there are M = 16 subtypes that must be evaluated separately in

the logistic regression model. Data are generated for sixteen disease subtypes formed by

cross-classification of four binary tumor markers as described at the start of Section 2.4.

Figure 2.2: Log odds ratio required to achieve various levels of power when type I error is calibrated to α = 0.05 to address whether risk factor effects differ across M = 4 disease subtypes

Figure 2.3: Log odds ratio required to achieve various levels of power when type I error is calibrated to α = 0.05 to address whether risk factor effects differ across each of the K = 2 individual tumor markers that form M = 4 disease subtypes

Settings where the risk factor prevalence is q = 0.3 or q = 0.6 are separately investigated. For each simulation setting, 500 simulated

data sets were generated using 1008 controls and 1008 cases to allow for equal subdivision of

cases into M = 16 subtypes. Data are first generated under the null hypothesis, and the true

common regression coefficients (β11 = β12 = · · · = β1(16)) are each fixed at 0.05, 0.1, 0.15,

and 0.2. Results are presented in Table 2.5. The pattern of results is similar to that in the case of

M = 4 subtypes. Data are next generated under the alternative hypothesis. For

each of the two risk factor prevalences, three alternative scenarios are investigated, with

true values β1m = {0.2, 0.4, 0.4, 0.4, 0.4, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.8, 0.8, 0.8, 0.8, 1.0},

β1m = {0.2, 0.4, 0.4, 0.4, 0.4, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 1.2, 1.2, 1.2, 1.2, 1.2},

and β1m = {0.2, 0.4, 0.4, 0.4, 0.4, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 1.2, 1.2, 1.2, 1.2, 1.4}. Results are presented in Table 2.6.

Even as the number of subtypes increases to sixteen, a similar pattern of results is seen as

in the setting of four disease subtypes.
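
The bookkeeping for the M = 16 setting can be sketched as follows. The sketch enumerates the subtypes formed by cross-classifying K = 4 binary tumor markers, checks that 1008 cases divide equally among them, and computes the Monte Carlo standard error of an estimated type I error when the true rate is 0.05, as a rough guide to how much a table entry based on 500 simulated data sets can vary by chance. The 0/1 coding of the markers and the function name monte_carlo_se are assumptions made for illustration.

```python
import math
from itertools import product

K = 4                                   # number of binary tumor markers
marker_levels = (0, 1)                  # assumed coding: 0 = absent, 1 = present

# Each subtype is one combination of marker levels, e.g. (0, 1, 1, 0).
subtypes = list(product(marker_levels, repeat=K))
M = len(subtypes)                       # 2**K subtypes

# Equal subdivision of cases among subtypes.
n_cases = 1008
cases_per_subtype, remainder = divmod(n_cases, M)


def monte_carlo_se(rate, n_reps):
    """Standard error of a rejection proportion estimated from n_reps
    independent simulated data sets when the true rejection rate is `rate`."""
    return math.sqrt(rate * (1 - rate) / n_reps)


se_500 = monte_carlo_se(0.05, 500)      # M = 16 setting (500 data sets)
se_1000 = monte_carlo_se(0.05, 1000)    # M = 4 setting (1000 data sets)
```

With 500 replicates the standard error is about 0.010, so estimated type I errors between roughly 0.03 and 0.07 lie within two standard errors of the nominal 0.05; with 1000 replicates the standard error falls to about 0.007.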

Finally, to investigate the setting where more than one risk factor is included, a data

example is conducted using the same data and subtypes described in Section 2.3 and

incorporating a variety of continuous and binary risk factors of relevance to breast cancer

risk (Begg et al., 2013). Results are presented in Tables 2.7 and 2.8. The interpretation of

each risk factor must now be made in the context of adjustment for all other risk factors.

Across all risk factors, for the question of whether risk factor effects differ across disease

subtypes, polytomous logistic regression and the methods of Chatterjee (2004) and Wang

et al. (2015) result in similar parameter estimates and p-values whereas results from the

method of Rosner et al. (2013) differ slightly from the other methods. It is of interest to note

that in the context of a multivariable data analysis, the effect of oral contraceptive use is no

longer significantly different across disease subtypes (Table 2.7) whereas in the simplified

Table 2.5: Type I error for different risk factor prevalences q and true effect sizes β1m with M = 16 disease subtypes formed by K = 4 individual tumor markers

                q = 0.3                         q = 0.6
β1m             0.05    0.1     0.15    0.2     0.05    0.1     0.15    0.2

Does the risk factor effect differ with respect to subtypes?
Polytomous1     0.052   0.050   0.038   0.034   0.036   0.028   0.030   0.044
Wang2           0.052   0.050   0.038   0.034   0.036   0.028   0.030   0.044
Chatterjee3     0.052   0.050   0.038   0.034   0.036   0.028   0.030   0.044
Rosner4         0.060   0.058   0.050   0.046   0.048   0.036   0.046   0.052

Does the risk factor effect differ with respect to tumor marker 1 (γ11)?
Polytomous1     0.060   0.054   0.040   0.032   0.050   0.054   0.040   0.042
Wang2           0.050   0.042   0.032   0.032   0.042   0.032   0.028   0.034
Chatterjee3     0.058   0.052   0.040   0.032   0.048   0.050   0.042   0.040
Rosner4         0.060   0.058   0.046   0.034   0.054   0.056   0.042   0.040

Does the risk factor effect differ with respect to tumor marker 2 (γ12)?
Polytomous1     0.044   0.052   0.050   0.042   0.050   0.058   0.050   0.058
Wang2           0.036   0.042   0.040   0.032   0.036   0.032   0.038   0.038
Chatterjee3     0.044   0.050   0.048   0.040   0.046   0.054   0.048   0.054
Rosner4         0.044   0.052   0.050   0.040   0.052   0.060   0.048   0.056

Does the risk factor effect differ with respect to tumor marker 3 (γ13)?
Polytomous1     0.058   0.042   0.058   0.048   0.046   0.048   0.044   0.042
Wang2           0.044   0.040   0.042   0.038   0.038   0.036   0.032   0.030
Chatterjee3     0.058   0.042   0.056   0.044   0.042   0.048   0.042   0.038
Rosner4         0.058   0.044   0.060   0.048   0.048   0.054   0.046   0.044

Does the risk factor effect differ with respect to tumor marker 4 (γ14)?
Polytomous1     0.058   0.042   0.058   0.048   0.046   0.048   0.044   0.042
Wang2           0.044   0.040   0.042   0.038   0.038   0.036   0.032   0.030
Chatterjee3     0.058   0.042   0.056   0.044   0.042   0.048   0.042   0.038
Rosner4         0.058   0.044   0.060   0.048   0.048   0.054   0.046   0.044

1 Polytomous logistic regression
2 Two-stage meta-regression (Wang et al., 2015)
3 Two-stage regression with simultaneous estimation (Chatterjee, 2004)
4 Stratified logistic regression (Rosner et al., 2013)

Table 2.6: Power for different risk factor prevalences q and different alternative hypothesis scenarios with M = 16 disease subtypes formed by K = 4 individual tumor markers

                        q = 0.3                 q = 0.6
Alternative scenario*   1       2       3       1       2       3

Does the risk factor effect differ with respect to subtypes?
Polytomous1             0.332   0.834   0.872   0.260   0.750   0.800
Wang2                   0.332   0.834   0.872   0.260   0.750   0.800
Chatterjee3             0.332   0.834   0.872   0.262   0.748   0.798
Rosner4                 0.356   0.854   0.880   0.278   0.780   0.816

Does the risk factor effect differ with respect to tumor marker 1 (γ11)?
Polytomous1             0.308   0.596   0.644   0.284   0.538   0.590
Wang2                   0.270   0.568   0.638   0.246   0.500   0.558
Chatterjee3             0.296   0.614   0.674   0.274   0.564   0.608
Rosner4                 0.308   0.620   0.682   0.280   0.574   0.622

Does the risk factor effect differ with respect to tumor marker 2 (γ12)?
Polytomous1             0.312   0.592   0.646   0.270   0.552   0.602
Wang2                   0.280   0.564   0.636   0.232   0.530   0.576
Chatterjee3             0.312   0.614   0.678   0.266   0.564   0.610
Rosner4                 0.326   0.630   0.682   0.270   0.574   0.620

Does the risk factor effect differ with respect to tumor marker 3 (γ13)?
Polytomous1             0.298   0.602   0.648   0.300   0.552   0.606
Wang2                   0.262   0.562   0.620   0.242   0.502   0.554
Chatterjee3             0.294   0.606   0.654   0.276   0.568   0.606
Rosner4                 0.300   0.616   0.666   0.286   0.576   0.618

Does the risk factor effect differ with respect to tumor marker 4 (γ14)?
Polytomous1             0.290   0.610   0.660   0.242   0.540   0.590
Wang2                   0.264   0.574   0.634   0.210   0.506   0.560
Chatterjee3             0.288   0.632   0.672   0.238   0.570   0.622
Rosner4                 0.296   0.638   0.676   0.250   0.586   0.630

* Alternative scenarios:
  1: β1m = {0.2, 0.4, 0.4, 0.4, 0.4, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.8, 0.8, 0.8, 0.8, 1.0}
  2: β1m = {0.2, 0.4, 0.4, 0.4, 0.4, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 1.2, 1.2, 1.2, 1.2, 1.2}
  3: β1m = {0.2, 0.4, 0.4, 0.4, 0.4, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 1.2, 1.2, 1.2, 1.2, 1.4}

1 Polytomous logistic regression
2 Two-stage meta-regression (Wang et al., 2015)
3 Two-stage regression with simultaneous estimation (Chatterjee, 2004)
4 Stratified logistic regression (Rosner et al., 2013)

the effect was significantly different across disease subtypes according to all subtypes.
