The Standard Model

8.3 Simulation Study

Methods

We performed a simulation study to investigate the performance of the cluster-summary method in scenarios with different period effects and intracluster correlation coefficients (ICCs), and for several SWT designs.

We used local-authority-level data on uptake of NHS health checks in England in 2013-2014, available from Public Health England [18]. Health checks were offered to all adults aged 40-74 every five years by general practices (GPs) and third parties to assess risk of diabetes, heart disease, kidney disease, stroke, and dementia [19]. The mean of the local-authority-level percentage of patients accepting health checks when offered was 49% in the first quarter of 2013; this increased to 54% in the last quarter. At the start of 2014, the mean was 46%; this increased to 56% in the last quarter.

Data generation

We used these health check data to generate four scenarios (Figure 8.1). Details of how we generated the scenarios to be used in the simulation study from these data are given in S2.

We simulated both period effects that were common to all clusters and period effects that varied between clusters to the degree observed in the data. This allowed us to check that the cluster-summary method remained unbiased and gave correct confidence interval coverage across a range of scenarios.

Chapter 8. Paper C: Robust Analysis of Stepped-Wedge Trials using Cluster-Level Summaries Within Periods

Figure 8.1: Simulation study scenarios: secular trends and ICC. Based on NHS health-check uptake in England.

We simulated two ICC scenarios to assess the power of the analysis with different values of ICC. For one we used the between-cluster variability observed in the data (ICC=0.08 in the first quarter of 2013, hereafter referred to as ‘high ICC’) and for another we used one-fifth of the observed between-cluster variability (ICC=0.02 in the first quarter of 2013, hereafter referred to as ‘low ICC’).
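As a rough check on how a between-cluster variance maps to an ICC, the sketch below assumes the latent-variable definition for a logistic random-intercept model, ICC = variance / (variance + π²/3). The source does not state which ICC definition was used, so this definition and the function names `icc_from_variance` and `variance_from_icc` are assumptions made purely for illustration.

```python
import math

# Assumption: ICC on the latent logistic scale, where the individual-level
# variance is pi^2 / 3. The source does not say which definition it used.
LOGISTIC_VAR = math.pi ** 2 / 3


def icc_from_variance(between_var: float) -> float:
    """ICC implied by a between-cluster variance on the log-odds scale."""
    return between_var / (between_var + LOGISTIC_VAR)


def variance_from_icc(icc: float) -> float:
    """Between-cluster variance on the log-odds scale implied by an ICC."""
    return icc * LOGISTIC_VAR / (1.0 - icc)


high_var = variance_from_icc(0.08)  # variance matching the 'high ICC' value
low_var = high_var / 5.0            # one-fifth of that variance
low_icc = icc_from_variance(low_var)
```

Under this assumed definition, dividing the variance by five takes an ICC of 0.08 to a little under 0.02, which is consistent with the rounded ‘low ICC’ value.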

The four scenarios were therefore: (1) common period effects and high ICC, (2) common period effects and low ICC, (3) varying period effects and high ICC, and (4) varying period effects and low ICC.

When the period effects varied between the clusters, the between-cluster variance changed over time. Therefore, the ICC changed over time. Over the two years the ICC varied between 0.06 and 0.19 for the high ICC and varying period effects scenario, as observed in the data, and between 0.01 and 0.04 for the low ICC and varying period effects scenario.

Trial designs

We simulated SWTs in each of these four scenarios that assessed the effect of an intervention designed to increase the acceptance of health checks. The simulated intervention effect had an odds ratio of 1.3 favouring the intervention (log odds ratio=0.26).
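To make the size of this effect concrete, the following minimal sketch converts the odds ratio to the log scale and applies it to a baseline proportion. The 49% baseline is taken from the first-quarter 2013 mean purely as an illustration, and `apply_odds_ratio` is a hypothetical helper, not part of the original analysis.

```python
import math


def apply_odds_ratio(baseline_prop: float, odds_ratio: float) -> float:
    """Return the proportion obtained by multiplying the baseline odds."""
    odds = baseline_prop / (1.0 - baseline_prop)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


log_or = math.log(1.3)                        # about 0.26
p_intervention = apply_odds_ratio(0.49, 1.3)  # illustrative 49% baseline
```

With a 49% baseline, an odds ratio of 1.3 corresponds to uptake of about 55.5% under the intervention.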

We simulated four trial designs for each of the four scenarios to assess how the number of sequences, the number of clusters per sequence, and the total number of clusters affected the power of the cluster-summary method. The four trial designs had either 3 or 11 sequences with either 3 or 11 clusters per sequence (Figure 8.2). This resulted in four trial designs with a total of 9 clusters (3 sequences with 3 clusters per sequence), 33 clusters (3 sequences with 11 clusters per sequence, or 11 sequences with 3 clusters per sequence), or 121 clusters (11 sequences with 11 clusters per sequence). Unlike the mixed-effects model, the cluster-summary method requires clusters in both the control and intervention conditions in every period, so our trial designs began after the first sequence switched to the intervention and finished before the final sequence switched to the intervention (Figure 8.2).
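The four designs can be sketched as treatment-indicator matrices. This is an illustrative reconstruction, assuming one sequence switches per period and that trimming the first and last periods of a classic stepped wedge leaves one fewer period than there are sequences. The function name `trimmed_sw_design` is hypothetical, and the exact layouts are those shown in Figure 8.2.

```python
import numpy as np


def trimmed_sw_design(n_sequences: int, clusters_per_sequence: int) -> np.ndarray:
    """Return a clusters x periods 0/1 matrix (1 = intervention).

    Assumption: sequence s (1-based) is under intervention from period s
    onwards, and only periods 1 .. n_sequences - 1 are kept, so every period
    has at least one control and one intervention cluster.
    """
    n_periods = n_sequences - 1
    periods = np.arange(1, n_periods + 1)      # shape (n_periods,)
    sequences = np.arange(1, n_sequences + 1)  # shape (n_sequences,)
    by_sequence = (periods[None, :] >= sequences[:, None]).astype(int)
    # Repeat each sequence row once per cluster allocated to that sequence.
    return np.repeat(by_sequence, clusters_per_sequence, axis=0)


# Keys are (number of sequences, clusters per sequence).
designs = {
    (s, k): trimmed_sw_design(s, k) for s in (3, 11) for k in (3, 11)
}
```

Under this layout the first sequence is in the intervention condition throughout and the last sequence is in the control condition throughout, which is what guarantees both conditions are represented in every period.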

Figure 8.2: Trial schematics used in simulation study

The total number of observations for each cluster across the trial was selected from a log-normal distribution (µ = 5.3, σ² = 0.25) regardless of the trial design; this gave a median cluster size of 200 (IQR 143-281), with observations evenly distributed across the periods. In scenarios with common period effects and high ICC, this would give the smallest trial (9 clusters) approximately 31% power to detect an odds ratio of 1.3 with the standard model, and the largest trial (121 clusters) approximately 100% power [20].
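The quoted median and IQR follow directly from the log-normal parameters, as the sketch below checks analytically before drawing sizes. The rounding of totals and the handling of remainders when splitting across periods are assumptions, since the source says only that observations were evenly distributed. `draw_cluster_sizes` is a hypothetical helper.

```python
import numpy as np
from statistics import NormalDist

MU = 5.3
SIGMA = 0.25 ** 0.5  # sigma^2 = 0.25, so sigma = 0.5

# Analytic quartiles of a log-normal: exp(mu + sigma * z_q).
z75 = NormalDist().inv_cdf(0.75)
median = np.exp(MU)
q1, q3 = np.exp(MU - SIGMA * z75), np.exp(MU + SIGMA * z75)


def draw_cluster_sizes(n_clusters: int, n_periods: int, seed: int = 0) -> np.ndarray:
    """Draw total cluster sizes and split them evenly across periods.

    Assumption: totals are rounded to whole observations and any remainder
    is given to the earliest periods; the source does not describe this step.
    """
    rng = np.random.default_rng(seed)
    totals = np.rint(rng.lognormal(mean=MU, sigma=SIGMA, size=n_clusters)).astype(int)
    base, extra = np.divmod(totals, n_periods)
    sizes = np.tile(base[:, None], (1, n_periods))
    # Give one extra observation to the first `extra` periods of each cluster.
    sizes += np.arange(n_periods)[None, :] < extra[:, None]
    return sizes
```

Rounded to whole observations, `median`, `q1`, and `q3` reproduce the values 200, 143, and 281 given in the text.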

Each of the four trial designs for each of the four scenarios was simulated 1,000 times, allowing us to estimate coverage of 95% confidence intervals to within 1.4%.
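The 1.4% figure is consistent with the usual normal-approximation margin of error for a proportion near 0.95 estimated from 1,000 replicates. The short sketch below reproduces it; `coverage_margin` is a hypothetical helper, and the use of this particular formula is an assumption about how the figure was derived.

```python
import math


def coverage_margin(n_sims: int, nominal: float = 0.95, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% CI for a coverage proportion."""
    return z * math.sqrt(nominal * (1.0 - nominal) / n_sims)


margin = coverage_margin(1000)  # about 0.0135, i.e. roughly 1.4 percentage points
```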

Analysis methods and evaluation

We analysed each simulated trial using the cluster-summary method and the standard model. The cluster-summary method calculated an odds ratio for comparability with the standard model, although this is unlikely to be the measure of choice in practice. We compared the two analysis methods in terms of bias, coverage, and power for each trial design and scenario, in line with recommendations from Burton et al. [21].

We calculated the proportion of standard models that converged and the proportion of cluster-summary analyses that required the heuristic adjustment.

Bias was calculated as the deviation of the mean of the estimated intervention effect log odds ratios from the true log odds ratio. Effect estimates within half a standard deviation of the true effect were considered unbiased; below this cut-off, bias has been shown to have little effect on the type I error rate [21, 22]. We compared the variability of the estimates given by each analysis method using the ratio of the variances. The coverage of the 95% confidence intervals was calculated as the proportion of simulations with p>0.05 against the true effect, i.e. the proportion of confidence intervals that contained the true effect. We calculated the power to detect an effect at the 5% significance level as the proportion of simulations with p<0.05 against no intervention effect.
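These performance measures can be sketched as follows for a vector of simulated log odds ratio estimates and their standard errors. The sketch assumes Wald-type tests with a normal reference distribution and takes the "standard deviation" in the bias criterion to be the empirical standard deviation of the estimates. Both are assumptions, as the analysis methods in the study may derive p-values and the cut-off differently. The function name `performance` is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def performance(estimates, std_errors, true_effect, alpha=0.05):
    """Summarise simulated log odds ratio estimates.

    Assumption: Wald-type tests with a normal reference distribution; the
    analysis methods in the source may derive their p-values differently.
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    bias = est.mean() - true_effect
    sd = est.std(ddof=1)
    return {
        "bias": bias,
        # Bias expressed in units of the empirical SD of the estimates.
        "standardised_bias": bias / sd,
        # Coverage: the test against the true effect is NOT rejected.
        "coverage": np.mean(np.abs(est - true_effect) / se <= z_crit),
        # Power: the test against no effect (log OR = 0) IS rejected.
        "power": np.mean(np.abs(est) / se > z_crit),
        "variance": est.var(ddof=1),
    }
```

Calling `performance` once per analysis method and dividing the two `"variance"` entries gives the ratio of variances used to compare the variability of the estimates, and an absolute `"standardised_bias"` below 0.5 corresponds to the half-standard-deviation criterion.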