Simulations were conducted under Assumptions 3.1–3.7. For each simulated data set, $n$ principal strata vectors $Z_{P0}$ were simulated using a multinomial random number generator
with parameter $\theta = (\theta_{00}, \theta_{01}, \theta_{10}, \theta_{11})$, where $\theta_{ij} = \Pr[Z_{P0} = (i, j)]$ ($\theta_{10} = 0$ under Assumption 3.5).
The parameter $\theta_{01}$ is the proportion of the population in the complier principal stratum and provides a measure of the strength of the instrument in determining treatment allocation.
The randomized treatment assignment $R$ was simulated by randomly permuting a vector of size $n$ containing $p_0 n$ zeros and $p_1 n$ ones.
The random variable $Z$ was determined from $R$ and $Z_{P0}$. Censoring times $C$ were generated using
a uniform random number generator on the interval $(C_R, C_R + \Delta C_R)$. The time to the first event $T(Z) = T$ was simulated by sampling from the distribution defined by an overall hazard of $\sum_{j=1}^{J} \lambda^j_{rz}(t)$, where each $\lambda^j_{rz}(t)$ is a Weibull hazard of the form $\kappa\gamma(\gamma t)^{\kappa-1}$ for various scenarios as detailed in Table 3.1. The event indicator $\Delta$ was simulated by sampling from a multinomial random variable with $\Pr[\Delta = j \mid T] = \lambda^j_{rz}(T) / \sum_{j'=1}^{J} \lambda^{j'}_{rz}(T)$ for $j = 1, 2$. If the subject was censored, $\Delta$ was set to 0. All results are based on 5,000 Monte Carlo simulations, $p_0 = 0.5$, $C_0 = 4$, $\Delta C_0 = 6$, $C_1 = 3$, $\Delta C_1 = 3$, $w(t) = 1$ and $t_0 = \min\{\max_i(X_i \mid R_i = 0), \max_i(X_i \mid R_i = 1)\}$.
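The data-generating steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It rests on several assumptions that are not stated in the text: that the principal stratum vector is (Z(0), Z(1)) so that Z = Z(R); that the observed time is X = min(T, C); and that the Weibull parameters are supplied in a hypothetical array `weibull[r, z, j] = (kappa, gamma)`, since the actual values live in Table 3.1. The time T and cause indicator are drawn by taking the minimum of independent cause-specific Weibull times, which is a standard equivalent of sampling T from the summed hazard and then drawing the cause with probability proportional to its hazard at T.

```python
import numpy as np

def simulate_dataset(n, theta, weibull, rng, p0=0.5,
                     cens_start=(4.0, 3.0), cens_width=(6.0, 3.0)):
    """Simulate one data set (sketch; argument names are hypothetical).

    theta   : (th00, th01, th10, th11), probabilities of (Z(0), Z(1)).
    weibull : array of shape (2, 2, J, 2); weibull[r, z, j] = (kappa, gamma).
    """
    # Principal strata, assumed to be the pair (Z(0), Z(1)).
    strata = np.array([(0, 0), (0, 1), (1, 0), (1, 1)])
    zp = strata[rng.choice(4, size=n, p=theta)]

    # Randomized assignment: permute p0*n zeros and the remaining ones.
    n0 = int(round(p0 * n))
    R = rng.permutation(np.repeat([0, 1], [n0, n - n0]))

    # Treatment received, assuming Z = Z(R).
    Z = zp[np.arange(n), R]

    # Censoring: uniform on (C_R, C_R + Delta C_R).
    start = np.asarray(cens_start)[R]
    C = rng.uniform(start, start + np.asarray(cens_width)[R])

    # Latent cause-specific Weibull times with hazard kappa*gamma*(gamma*t)^(kappa-1),
    # i.e. cumulative hazard (gamma*t)^kappa, inverted at a unit exponential draw.
    par = np.asarray(weibull, float)[R, Z]          # shape (n, J, 2)
    kappa, gamma = par[..., 0], par[..., 1]
    latent = rng.exponential(size=kappa.shape) ** (1.0 / kappa) / gamma

    T = latent.min(axis=1)                          # time to first event
    cause = latent.argmin(axis=1) + 1               # causes numbered 1..J
    X = np.minimum(T, C)                            # observed time (assumed)
    delta = np.where(T <= C, cause, 0)              # 0 marks censoring

    # t0 = min over arms of the largest observed time.
    t0 = min(X[R == 0].max(), X[R == 1].max())
    return {"zp": zp, "R": R, "Z": Z, "X": X, "delta": delta, "t0": t0}
```

A call such as `simulate_dataset(1000, (0.25, 0.5, 0.0, 0.25), weibull, np.random.default_rng(0))` would produce one data set with half of the population in the complier stratum; the parameter values here are purely illustrative.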
A naive “as treated” analysis of (3.1) and (3.3) might entail computing the estimators
$\tilde\delta(t) = \hat S_{11}(t) - \hat S_{00}(t)$ and $\tilde\delta^j(t) = \hat F^j_{00}(t) - \hat F^j_{11}(t)$ (3.5)
where $\hat S_{rz}(t)$ and $\hat F^j_{rz}(t)$ are, respectively, the Kaplan-Meier estimator of the survival function and the Aalen-Johansen estimator of the subdistribution function, conditional on $R = r$ and $Z = z$. Pointwise confidence intervals for (3.1) and (3.3) might be computed by appealing to asymptotic normality results (Andersen et al., 1995) for $\hat S_{rz}(t)$ and $\hat F^j_{rz}(t)$. In this “as treated”
analysis, testing the hypotheses $H_0$ and $H_0^j$ might be accomplished using weighted Kaplan-Meier (WKM) tests as in Pepe and Fleming (1989).
The coverage of the pointwise confidence intervals for $\delta(t)$ and $\delta^j(t)$ and the power of the WIV tests for $H_0$ and $H_0^j$ in Propositions 3.1 and
3.2 are compared to the coverage of the pointwise confidence intervals and the power of the WKM tests in this naive analysis.
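To make the naive estimators in (3.5) concrete, the following sketch computes the Kaplan-Meier survival estimate and the Aalen-Johansen cumulative incidence estimates at a single time point, then forms the "as treated" contrasts. The function names and the coding of the event indicator (0 for censored, j for cause j) are assumptions made for illustration; confidence intervals and the WKM test are not included.

```python
import numpy as np

def km_and_cif(x, delta, t, n_causes=2):
    """Kaplan-Meier survival and Aalen-Johansen cumulative incidences at time t.

    x     : observed times
    delta : 0 = censored, j = event from cause j (assumed coding)
    """
    x = np.asarray(x, float)
    delta = np.asarray(delta, int)
    surv = 1.0                       # all-cause KM just before the current time
    cif = np.zeros(n_causes)
    for s in np.unique(x[(delta > 0) & (x <= t)]):
        at_risk = np.sum(x >= s)
        for j in range(1, n_causes + 1):
            d_j = np.sum((x == s) & (delta == j))
            cif[j - 1] += surv * d_j / at_risk   # S(s-) * dN_j(s) / Y(s)
        d_all = np.sum((x == s) & (delta > 0))
        surv *= 1.0 - d_all / at_risk            # KM product-limit update
    return surv, cif

def naive_as_treated(R, Z, X, delta, t, n_causes=2):
    """Contrasts in (3.5): S_11 - S_00 and F^j_00 - F^j_11 at time t."""
    R, Z = np.asarray(R), np.asarray(Z)
    X, delta = np.asarray(X, float), np.asarray(delta, int)
    g11 = (R == 1) & (Z == 1)        # assigned to and received treatment
    g00 = (R == 0) & (Z == 0)        # assigned to and received control
    s11, f11 = km_and_cif(X[g11], delta[g11], t, n_causes)
    s00, f00 = km_and_cif(X[g00], delta[g00], t, n_causes)
    return s11 - s00, f00 - f11
```

Because these contrasts condition on the treatment actually received, they compare groups that mix different principal strata, which is why the text treats them as a naive benchmark rather than as estimators of the complier effects.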
Nonproportional hazards in the treated versus the control within the complier principal stratum are assumed in all scenarios. Scenario 1 describes a situation in which one cause ($j = 2$) exhibits a causal treatment effect in the complier principal stratum. In this scenario, the power to reject $H_0^2$ is similar to the power to reject $H_0$, and the power to reject $H_0^1$ is small (though note that this scenario is
not null for cause 1, i.e., $\delta^1_w(t_0) \neq 0$). Scenario 2 describes a situation in which both causes exhibit
causal treatment effects, but these effects cancel each other out such that $\delta_w(t_0) = 0$ (as
described in Section 3.2.3 and at the end of Section 3.3.2). The power to reject $H_0$ in Scenario
2 reflects that this test is consistent and the type I error is controlled. These opposing causal effects for $j = 1$ and $2$ are roughly the same magnitude as $\delta^2_w(t_0)$ in Scenario 1, and the power
to reject both $H_0^1$ and $H_0^2$ in Scenario 2 is similar to the power to reject $H_0^2$ in Scenario 1.
Scenario 3 describes a situation in which both causes exhibit causal treatment effects of the same sign and magnitude. As would be expected, the power to reject $H_0$ in this situation
is higher than the power to reject $H_0^1$ or $H_0^2$, which are roughly the same. Scenario 4 describes a situation in which there are no causal treatment effects in the complier principal stratum for cause 1 or 2, such that $\delta_w(t_0) = \delta^1_w(t_0) = \delta^2_w(t_0) = 0$. Again, as expected, the results here demonstrate that
measured by θ01 – the proportion of the population who are compliers) has large effects on
the power of the test, with increasing instrument strength yielding increased power.
The power of the corresponding naive weighted Kaplan-Meier tests is higher; however, these tests are not unbiased. The estimated power is greater than 5% in the scenarios where the treatment effect is null (Scenario 4, and Scenario 2 for j = (all)), meaning that these tests have inflated type I error and therefore should not be used to test for the local average treatment effects in (3.1) and (3.3).
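As a rough guide to how far an estimated rejection rate can stray from 5% by chance alone, the Monte Carlo standard error of a rejection proportion with 5,000 replicates can be worked out as follows. This is a back-of-the-envelope calculation that assumes independent replicates, a true level of exactly 5%, and a normal approximation; it is not taken from the source.

\[
\mathrm{SE}_{\mathrm{MC}} = \sqrt{\frac{0.05 \times 0.95}{5000}} \approx 0.0031,
\qquad
0.05 \pm 1.96 \times 0.0031 \approx (0.044,\ 0.056).
\]

Under these assumptions, an estimated rejection rate well outside this range in a null scenario is unlikely to be simulation noise.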
Table 3.2 shows that the IV estimators $\hat\delta(t)$ and $\hat\delta^j(t)$ are unbiased and that the variance
estimators accurately estimate the true variance (as indicated by the ratio of the average estimated variance and the empirical standard error). The coverage of the IV pointwise confidence intervals is close to the nominal 0.95 in almost all scenarios (though there is slight overcoverage in Scenario 2 for $\delta^1(t)$, where the treatment effect in the complier principal stratum has the opposite sign of the difference between the always treated and never treated principal strata). On the other hand, the naive “as treated” estimators $\tilde\delta(t)$ and $\tilde\delta^j(t)$ have higher bias, and the coverage of the corresponding confidence intervals is poor in several scenarios (e.g., see Scenario 2, $j = 1$, or Scenario 4, for all $j$). The power to reject $H_0^j(t) : \delta^j(t) = 0$ based on the IV pointwise confidence intervals is similar to the power reported in Table 3.1, particularly for $t = 5$. These tests are again unbiased, as indicated by the null scenarios yielding estimated power of approximately 5%. However, testing $H_0^j(t)$ using a naive analysis again results in inflated type I error.
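The summary measures discussed above (bias, agreement between estimated and empirical variability, coverage, and rejection rate) can be computed from the Monte Carlo replicates along the following lines. This is a sketch under stated assumptions: the function name and inputs are hypothetical, Wald-type 95% intervals are assumed, and the variance ratio is taken here as average estimated variance over empirical variance, which may differ from the exact definition used in Table 3.2.

```python
import numpy as np

def summarize_mc(est, var_est, truth, z=1.96):
    """Summarize Monte Carlo replicates of one estimator at one time point.

    est     : point estimates, one per replicate
    var_est : estimated variances, one per replicate
    truth   : true value of the estimand (assumed known in the simulation)
    """
    est = np.asarray(est, float)
    var_est = np.asarray(var_est, float)
    se = np.sqrt(var_est)
    # Wald interval covers the truth (assumed interval construction).
    covered = (est - z * se <= truth) & (truth <= est + z * se)
    # Interval excludes zero, i.e. H0: estimand = 0 is rejected.
    rejected = np.abs(est) > z * se
    return {
        "bias": est.mean() - truth,
        "var_ratio": var_est.mean() / est.var(ddof=1),
        "coverage": covered.mean(),
        "reject_rate": rejected.mean(),
    }
```

In a null scenario (`truth = 0`), `reject_rate` is the estimated type I error; in a non-null scenario it is the estimated power of the test based on the pointwise interval.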