
Remaining methodological gaps in sample size methods for CRTs

6.2. RESULTS

[Figure: a school (top tier) containing two classrooms (middle tier), each containing three pupils (bottom tier); the three tiers are labelled Level 1 to Level 3.]

Figure 6.3: Graphical representation of the cluster randomised three-level design

The three-level design lends itself to an individual-level analysis by mixed model or GEE, and simple design effects for these methods have been proposed for binary or continuous outcomes by Teerenstra79, 80 and Heo.81 The three-level design requires two ICCs: in this example, one for pupils within schools and a second for pupils within classrooms. The calculation of the optimal sample sizes at each level under a cost constraint has also been considered by Moerbeek106 and Konstantopoulos.103, 182
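The following is a minimal sketch of a design effect of this kind. It assumes a continuous outcome and a nested random-intercept model in which schools are randomised. Each school has n_class classrooms and each classroom has n_pupil pupils. The parameter rho_school is the correlation between two pupils in the same school but different classrooms, and rho_class is the correlation between two pupils in the same classroom. Summing the covariances of all pupils within a school gives DE = 1 + (n_pupil - 1)·rho_class + n_pupil·(n_class - 1)·rho_school. Authors parameterise the two ICCs differently, so this is not necessarily the exact form used by Teerenstra or Heo. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def design_effect_three_level(n_class, n_pupil, rho_school, rho_class):
    """Design effect when whole schools are randomised (hypothetical helper).

    rho_school: correlation between pupils in the same school, different classrooms.
    rho_class:  correlation between pupils in the same classroom (>= rho_school).
    """
    return 1 + (n_pupil - 1) * rho_class + n_pupil * (n_class - 1) * rho_school


def schools_per_arm(delta, sigma, n_class, n_pupil, rho_school, rho_class,
                    alpha=0.05, power=0.8):
    """Schools per arm for a two-sided test of a difference in means delta."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    # Individuals per arm under individual randomisation.
    n_individual = 2 * (z * sigma / delta) ** 2
    de = design_effect_three_level(n_class, n_pupil, rho_school, rho_class)
    # Inflate by the design effect and divide by pupils per school.
    return ceil(n_individual * de / (n_class * n_pupil))


# Example: 4 classrooms of 25 pupils per school, standardised effect 0.3.
print(schools_per_arm(0.3, 1.0, 4, 25, rho_school=0.02, rho_class=0.05))
```

With one classroom per school the formula reduces to the familiar two-level design effect 1 + (n_pupil - 1)·rho_class.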

Most methods discussed so far for three-level designs have assumed that randomisation takes place at the highest level (e.g. the school). If randomisation were to occur at the second level, these designs could be thought of as multi-centre cluster randomised trials. Cunningham and Johnson have proposed a simple design effect for randomisation at the lower levels.207 The same is not true for longitudinal CRTs, in which outcomes are measured at specific time points within subjects, who are in turn nested within clusters: no comparable design effect for lower-level randomisation was identified. If randomisation were to occur at the second level of such a design, it would be equivalent to a longitudinal multi-centre trial.
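The reason lower-level randomisation can be more efficient can be sketched under the same kind of nested random-intercept model, with some assumptions. Classrooms are randomised within schools, every school contributes equal numbers of classrooms to each arm, and school effects are removed from the comparison (for example by fitting school as a fixed effect). The school-level variance then cancels out of the treatment contrast. This is a derivation under those assumptions, not necessarily the formula of Cunningham and Johnson. The function names are hypothetical.

```python
def design_effect_classroom_randomised(n_pupil, rho_school, rho_class):
    """Design effect when classrooms are randomised within schools (sketch).

    With variance components summing to sigma^2:
      rho_school = var_school / sigma^2
      rho_class  = (var_school + var_class) / sigma^2
    Comparing classrooms within the same school removes var_school, so the
    variance of a classroom mean relative to sigma^2 / n_pupil is
      n_pupil * (var_class + var_pupil / n_pupil) / sigma^2
      = 1 + (n_pupil - 1) * rho_class - n_pupil * rho_school.
    """
    if rho_class < rho_school:
        raise ValueError("rho_class must be at least rho_school in a nested model")
    return 1 + (n_pupil - 1) * rho_class - n_pupil * rho_school


def design_effect_school_randomised(n_class, n_pupil, rho_school, rho_class):
    """Design effect when whole schools are randomised, for comparison."""
    return 1 + (n_pupil - 1) * rho_class + n_pupil * (n_class - 1) * rho_school


# Same hypothetical ICCs, two randomisation levels.
print(design_effect_classroom_randomised(25, 0.02, 0.05))
print(design_effect_school_randomised(4, 25, 0.02, 0.05))
```

Under these assumptions the school-level correlation reduces the design effect rather than inflating it. This is consistent with viewing second-level randomisation as a multi-centre trial.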


The most obvious development of the design effect for three-level trials would be to incorporate variable cluster sizes and other adaptations to the standard design as described in Table 6.2.

6.2.4 Emerging themes

The majority of methods identified in the review derived a sample size calculation to reach a pre-specified level of power for a superiority analysis of the primary outcome. The exceptions were methods based on a non-inferiority design and those which optimised the cluster size and number of clusters to provide maximum precision under a fixed budget constraint. In the update to the review, four additional motivations for sample size calculation emerged; they are briefly described here.

The evidence-based perspective: Rotondi and Donner took an evidence-based approach to sample size determination, in which the appropriate sample size is derived from the trial's potential impact on the literature. That is, the trial should be large enough both to establish on its own whether there is a treatment effect and to provide a definitive answer when included in a subsequent meta-analysis.195
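The idea of sizing a trial by its impact on a future meta-analysis can be sketched in a simplified setting. Assume an existing fixed-effect meta-analysis with pooled estimate theta_old and variance var_old. Assume also that the new CRT's estimate is Normal with mean delta and a variance inflated by the usual design effect. Rotondi and Donner's method may differ, for example in using a random-effects model or simulation. All names and numbers here are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def updated_meta_power(theta_old, var_old, delta, var_new, alpha=0.05):
    """Probability that a fixed-effect meta-analysis updated with one new trial
    is significant (two-sided), if the new estimate ~ Normal(delta, var_new)."""
    nd = NormalDist()
    w_old, w_new = 1 / var_old, 1 / var_new
    w_tot = w_old + w_new
    # Pooled estimate = (w_old*theta_old + w_new*new_estimate) / w_tot, which is
    # Normal with this mean and standard deviation (theta_old treated as fixed).
    mean = (w_old * theta_old + w_new * delta) / w_tot
    sd = sqrt(w_new) / w_tot
    # Critical value on the scale of the pooled estimate.
    crit = nd.inv_cdf(1 - alpha / 2) * sqrt(1 / w_tot)
    return (1 - nd.cdf((crit - mean) / sd)) + nd.cdf((-crit - mean) / sd)


def crt_variance(k, m, sigma, rho):
    """Variance of a difference in means with k clusters of size m per arm."""
    return 2 * sigma ** 2 * (1 + (m - 1) * rho) / (k * m)


def clusters_for_updated_meta(theta_old, var_old, delta, m, sigma, rho,
                              target=0.8, alpha=0.05, k_max=1000):
    """Smallest clusters per arm giving the updated meta-analysis the target power."""
    for k in range(2, k_max + 1):
        var_new = crt_variance(k, m, sigma, rho)
        if updated_meta_power(theta_old, var_old, delta, var_new, alpha) >= target:
            return k
    return None  # target not reached within k_max clusters per arm


print(clusters_for_updated_meta(0.1, 0.01, 0.2, m=20, sigma=1.0, rho=0.05))
```

When var_old is very large the existing evidence carries almost no weight, and the calculation collapses to ordinary power for the new trial alone.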

Powering for tests of mediation: Mediation analysis is undertaken to explain the process by which the intervention affects the response. Using simulation, Hox et al derived the minimum number of clusters required to test and estimate mediation accurately in cluster randomised trials, when either maximum likelihood or Bayesian methods are used for estimation.211

Powering for cost-effectiveness: In 2014, Manju et al considered optimal sample sizes at the individual and cluster levels under a cost constraint, where the outcome is the cost-effectiveness of treatments measured on a continuous scale.212 Their approach uses a maximin design and is therefore robust to misspecification of parameters such as the ICC.
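The maximin principle can be illustrated in a deliberately simpler setting than that of Manju et al. The setting is a two-level trial with a continuous outcome rather than a cost-effectiveness outcome. Each cluster costs c_cluster and each individual costs c_indiv, and the ICC is known only to lie within a range. For a fixed budget, the variance of the treatment effect is proportional to (1 + (n - 1)ρ)(c_cluster + n·c_indiv)/n. For each candidate cluster size n, the sketch computes the worst-case relative efficiency over the ICC range and then picks the n that maximises it. The costs and ICC range are hypothetical.

```python
from math import sqrt


def relative_variance(n, rho, c_cluster, c_indiv):
    """Variance of the treatment effect up to a constant, for a fixed budget
    spent on clusters of size n (number of clusters treated as continuous)."""
    return (1 + (n - 1) * rho) * (c_cluster + n * c_indiv) / n


def optimal_n(rho, c_cluster, c_indiv):
    """Locally optimal (continuous) cluster size when rho is known."""
    return sqrt(c_cluster * (1 - rho) / (c_indiv * rho))


def relative_efficiency(n, rho, c_cluster, c_indiv):
    best = relative_variance(optimal_n(rho, c_cluster, c_indiv), rho, c_cluster, c_indiv)
    return best / relative_variance(n, rho, c_cluster, c_indiv)


def maximin_cluster_size(rhos, c_cluster, c_indiv, n_max=200):
    """Cluster size maximising the minimum relative efficiency over rhos."""
    best_n, best_min_re = None, -1.0
    for n in range(2, n_max + 1):
        min_re = min(relative_efficiency(n, r, c_cluster, c_indiv) for r in rhos)
        if min_re > best_min_re:
            best_n, best_min_re = n, min_re
    return best_n, best_min_re


# ICC believed to lie between 0.01 and 0.10; a cluster costs 30 times an individual.
icc_range = [0.01 + i * 0.001 for i in range(91)]
print(maximin_cluster_size(icc_range, c_cluster=300.0, c_indiv=10.0))
```

The returned cluster size lies between the locally optimal sizes at the two ends of the ICC range. The accompanying efficiency shows how much is lost, in the worst case, for not knowing the ICC.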

Powering for a pre-specified confidence interval width: The final approach, taken by Pornprasertmanit and Schneider, is described as the accuracy in parameter estimation (AIPE) approach.213 This method helps researchers find the smallest sample size that ensures the confidence interval around the treatment effect is sufficiently narrow to be informative. Their methods are also extended to include a covariate and to deal with unequal cluster sizes.
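The following is a minimal sketch of the AIPE idea for a difference in means. It assumes equal cluster sizes m, a known standard deviation sigma and a normal-approximation interval. It searches for the smallest number of clusters per arm whose planned interval width does not exceed a target omega. The method of Pornprasertmanit and Schneider is more refined (for example using t quantiles and an assurance probability that the realised width is narrow enough), so this is only an illustration. The names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(k, m, sigma, rho, alpha=0.05):
    """Full width of a normal-approximation CI for a difference in means
    with k clusters of size m per arm and known sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    design_effect = 1 + (m - 1) * rho
    se = sqrt(2 * sigma ** 2 * design_effect / (k * m))
    return 2 * z * se


def aipe_clusters_per_arm(omega, m, sigma, rho, alpha=0.05, k_max=10_000):
    """Smallest clusters per arm with planned CI width <= omega."""
    for k in range(2, k_max + 1):
        if ci_width(k, m, sigma, rho, alpha) <= omega:
            return k
    return None


# Target a 95% CI no wider than half a standard deviation.
print(aipe_clusters_per_arm(omega=0.5, m=20, sigma=1.0, rho=0.05))
```

Unlike a power calculation, no treatment effect has to be specified. Only the precision required of its estimate is needed.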


The remaining emerging themes centred on aspects of the design: three-arm trials; factorial trials; and the dog-leg design.

Three-arm trial: For a three-arm cluster randomised trial, the simplest approach to sample size estimation is to treat the design as three independent pairwise comparisons between the groups, calculate the sample size for each, and use the maximum for every treatment group.33 Methods of calculation based upon an overall test of treatment effect have recently been proposed.214
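The pairwise approach can be sketched for a continuous outcome with equal cluster sizes m and a common ICC. It computes clusters per arm for each of the three pairwise differences in means and uses the largest. The arm means and parameters are hypothetical. No multiplicity adjustment is applied, since none is specified in the approach described; one could be added by lowering alpha.

```python
from itertools import combinations
from math import ceil
from statistics import NormalDist


def clusters_per_arm(delta, sigma, m, rho, alpha=0.05, power=0.8):
    """Clusters per arm for a two-arm comparison of means (design effect method)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    n_individual = 2 * (z * sigma / delta) ** 2  # per arm, individual randomisation
    design_effect = 1 + (m - 1) * rho
    return ceil(n_individual * design_effect / m)


def three_arm_clusters(means, sigma, m, rho, alpha=0.05, power=0.8):
    """Largest per-arm requirement over all pairwise comparisons."""
    per_comparison = {
        (a, b): clusters_per_arm(abs(means[a] - means[b]), sigma, m, rho, alpha, power)
        for a, b in combinations(means, 2)
    }
    return max(per_comparison.values()), per_comparison


k, detail = three_arm_clusters({"A": 0.0, "B": 0.3, "C": 0.5}, sigma=1.0, m=20, rho=0.05)
print(k, detail)
```

The smallest pairwise difference (here between B and C) drives the sample size for all three arms.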

Factorial trial: Two methods for factorial designs have been proposed since the original review. The first, by Dziak in 2012 for continuous outcomes, can accommodate a pre-test measure of the outcome.215 The second, by Lemme et al in 2015 and also for continuous outcomes, calculates the optimal number of units at each level so as to minimise the variance of the treatment effect estimator under a total budget constraint and heterogeneous variances across treatment groups. The authors conclude that the 2x2 factorial design is quite robust against heterogeneity of variance and that any loss in efficiency can be compensated for by adding one or two clusters per treatment group.

The dog-leg design: The stepped-wedge design is a form of cross forward design in which all clusters cross over to the intervention arm at some point during the trial. These designs often require a large number of individuals, as repeated cross-sectional samples are taken from each cluster. An incomplete cross forward design can reduce the number of individuals required, as it leaves gaps in the assessment schedule in some of the arms. Hooper and Bourke proposed the dog-leg design as the simplest incomplete cross forward design; it can potentially reduce the number of individuals required and help researchers meet ethical and financial requirements for limiting the number of research participants. The name dog-leg comes from the shape of the assessment schedule when it is drawn.216

6.3 Discussion

In the original review of sample size methods, 85 papers were identified, published over the 33 years between 1978 and 2011. When this review was updated in August 2015, an additional 28 papers were identified, published over the 4 years between 2012 and 2015 (see appendix vi for details). This is roughly seven new papers per year, compared with fewer than three per year previously, which shows that the methodology is still growing.


Papers which made reference to a particular trial as the motivation behind the proposed sample size method were in the minority. Therefore, this rapid increase in methodology may not necessarily reflect a trend towards more varied and/or complex designs but instead may indicate that methods are being developed which are yet to have practical applications.

The focus of my thesis has been on sample size and analysis methods for ordinal outcomes. In this chapter I have taken a sidestep from this and presented a very broad overview of all the available methodology. I now describe some of the gaps in the methodology that I consider most striking or that I expect would be of interest to the applied statistician. Statisticians with more detailed knowledge of specific areas, for example time-to-event data or longitudinal designs, may identify more specific issues that I have not raised here.

For the standard parallel-group trial, methods are available across a range of outcome measures: continuous, binary, count, ordinal, time-to-event and rates. In this thesis I have further developed the design effect method for ordinal outcomes to provide guidance around its use. However, for variations on this design and for alternative designs, the methodology centres almost solely on binary and continuous outcomes.

In the vast majority of methods, homogeneity of the between-cluster correlation (the ICC) across treatment groups is assumed, and this assumption has rarely been challenged. It would be useful to investigate how reasonable this assumption is in different situations, and how sensitive the required sample size is to departures from it.
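The following sketch shows one way to explore sensitivity to this assumption. It takes a continuous outcome with a common total variance sigma² in both arms, k clusters of size m per arm, and arm-specific ICCs. The variance of the difference in means is then sigma²·(DE_control + DE_treat)/(k·m), where each DE = 1 + (m - 1)ρ uses that arm's ICC. Solving for k gives the clusters per arm. The model and names are assumptions made for illustration, not a method identified in the review.

```python
from math import ceil
from statistics import NormalDist


def clusters_per_arm_by_arm_icc(delta, sigma, m, rho_control, rho_treat,
                                alpha=0.05, power=0.8):
    """Clusters per arm when the ICC may differ between arms (sketch).

    Assumes equal total variance sigma^2 in both arms and equal cluster size m.
    """
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    de_control = 1 + (m - 1) * rho_control
    de_treat = 1 + (m - 1) * rho_treat
    # Var(difference) = sigma^2 * (de_control + de_treat) / (k * m); solve for k.
    return ceil(z ** 2 * sigma ** 2 * (de_control + de_treat) / (m * delta ** 2))


# ICC of 0.02 observed in control clusters only, versus 0.08 under the intervention.
print(clusters_per_arm_by_arm_icc(0.3, 1.0, 20, 0.02, 0.02))
print(clusters_per_arm_by_arm_icc(0.3, 1.0, 20, 0.02, 0.08))
```

Under these assumptions the two design effects enter only through their sum. If the intervention increases clustering, a sample size based on a control-arm ICC alone would therefore be too small.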

Cluster randomised trials with longitudinal designs produced some of the most complex sample size methods. This complexity makes it difficult to identify how the methods should be implemented, to understand the differences between them and to know the situations to which they can be applied. Further work comparing these methods and providing simple advice on their use is needed.

For outcomes which are not binary or continuous, and for designs other than the standard parallel-group trial, many of the sample size methods require estimates of additional parameters. For example, for time-to-event outcomes an ICC must be defined and estimated; when cluster size is variable, a coefficient of variation in cluster size is required; and when attrition is expected, an estimate of the probability that the outcome is missing and an ICC of the missing data mechanism are needed.
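Variable cluster size is one case where a single extra parameter enters the calculation. The sketch below uses the widely used approximation of Eldridge and colleagues, DE = 1 + ((cv² + 1)·m̄ - 1)ρ. Here m̄ is the mean cluster size and cv is the coefficient of variation of cluster size. It assumes a continuous outcome and a cluster-level analysis. The function names and example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def design_effect_variable(mean_size, cv, rho):
    """Approximate design effect allowing for variable cluster size."""
    return 1 + ((cv ** 2 + 1) * mean_size - 1) * rho


def clusters_per_arm_variable(delta, sigma, mean_size, cv, rho, alpha=0.05, power=0.8):
    """Clusters per arm for a difference in means, inflating by the DE above."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    n_individual = 2 * (z * sigma / delta) ** 2  # per arm, individual randomisation
    return ceil(n_individual * design_effect_variable(mean_size, cv, rho) / mean_size)


# Equal cluster sizes (cv = 0) versus moderately variable sizes (cv = 0.5).
print(clusters_per_arm_variable(0.3, 1.0, 20, 0.0, 0.05))
print(clusters_per_arm_variable(0.3, 1.0, 20, 0.5, 0.05))
```

With cv = 0 the formula reduces to the usual 1 + (m̄ - 1)ρ. The required number of clusters then depends directly on an estimate of cv, which is the kind of parameter that is rarely reported.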


Estimates of these additional parameters are not yet well established or routinely reported, so finding appropriate values to use is one of the biggest barriers to the practical application of these methods. There is scope for much work in this area. Summaries of these parameters from real-life data across a range of health areas are needed. Greater awareness amongst researchers and journal reviewers of the need to report these estimates, for use in the design of future trials, would also be helpful.

This is the first comprehensive review of sample size methodology for cluster randomised trials. The full details of the methods identified in this review, including formulae, have been described in the associated publication.60 Given that sample size methods appear to be still expanding, I plan to publish future updates to this review. A strength of the results presented here is that the areas with the greatest need for sample size development are immediately highlighted. A thorough critique and comparison of the methods within each section was beyond the scope of this thesis but may reveal further areas that warrant development.

In this chapter I have highlighted several avenues for future exploration in sample size calculations for cluster randomised trials. One of the most useful aspects of my research on ordinal outcomes was my review of published cluster randomised trials (Chapter Three). This provided great insight into whether there was a need to develop methods for ordinal outcomes, and the characteristics of these trials then guided the development of the sample size method, tailoring it to the needs of researchers to make it more practical. Before embarking on any of the methodological gaps highlighted in this chapter, I would strongly advise researchers to conduct similar reviews of CRTs to inform their research, so that the methodology developed is pragmatic and genuinely needed.

My research into ordinal outcomes also raised several questions about situations which are not uncommon, such as the appropriate analysis method when the number of clusters is small, what reasonable estimates of the ICC may be, and what to do if non-proportional odds are suspected. These issues will be discussed further in the final chapter, but I think they highlight that there are still many questions of a very practical nature worth exploring before we move on to developing more complex methods for design variations.