analysis method - Sample size calculations for cluster randomised trials, with a focus on ordin

The data generation model assumed a probit regression model for simplicity. However, the logit link is the more popular choice for analysis. Using the ANOVA estimate of the ICC in the design effect, the simulations for the four-level outcome variable were repeated with analysis via a logit link rather than a probit link. The results are summarised in Table 5.13. All empirical powers were above or very close to the calculated 90%. The difference between empirical and expected power ranged from -1.6% to 2.4%. The standard error of calculated power ranged from 0.84 to 1.01.
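The standard errors quoted for empirical power are consistent with a binomial Monte Carlo standard error over the 1000 simulated datasets. The sketch below assumes that this is how they were obtained; the function name `empirical_power_se` is illustrative and not taken from the thesis.

```python
import math

def empirical_power_se(power_pct: float, n_sims: int = 1000) -> float:
    """Binomial Monte Carlo standard error of an empirical power estimate.

    power_pct is the empirical power in percent; the result is in
    percentage points. Assumes independent simulated datasets.
    """
    p = power_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_sims)

# Empirical powers taken from Table 5.13
for power in (88.4, 92.4, 93.9):
    print(f"power {power:.1f}% -> SE {empirical_power_se(power):.2f}")
```

With 1000 replicates, an empirical power of 88.4% gives a standard error of 1.01 and 92.4% gives 0.84, matching the corresponding table entries.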

5.8. RESULTS

Table 5.13: Empirical power when using the ANOVA estimate of the ICC for the ordinal outcome in the design effect for sample size calculation followed by an analysis with a logit model. For each combination of cluster size and ICC: 1000 datasets generated with a 90% target level of power.

Fixed design parameters                    log odds = 0.493                  log odds = 0.887
Cluster size  ρa (ρl)       Design effect  C     Empirical power (SE, θ̂−θ)   C     Empirical power (SE, θ̂−θ)
5             0.01 (0.01)   1.04           62    88.4 (1.01, -1.6)           20    89.7 (0.96, -0.3)
5             0.07 (0.08)   1.28           76    90.8 (0.91, 0.8)            25    91.9 (0.86, 1.9)
5             0.14 (0.16)   1.56           93    90.9 (0.91, 0.9)            30    91.0 (0.91, 1.0)
5             0.21 (0.25)   1.84           109   90.6 (0.92, 0.6)            35    92.2 (0.85, 2.2)
5             0.46 (0.53)   2.84           169   90.2 (0.94, 0.2)            54    89.8 (0.96, -0.2)
10            0.01 (0.01)   1.09           33    91.9 (0.86, 1.9)            11    93.9 (0.76, 3.9)
10            0.07 (0.08)   1.63           49    91.6 (0.88, 1.6)            16    92.1 (0.85, 2.1)
10            0.14 (0.16)   2.26           67    90.6 (0.92, 0.6)            22    90.9 (0.91, 0.9)
10            0.21 (0.25)   2.89           86    92.1 (0.85, 2.1)            28    91.2 (0.90, 1.2)
10            0.46 (0.53)   5.14           153   88.8 (0.99, -1.2)           49    90.6 (0.92, 0.6)
50            0.01 (0.01)   1.49           9     92.8 (0.82, 2.8)            3     92.7 (0.82, 2.7)
50            0.07 (0.08)   4.43           27    91.5 (0.88, 1.5)            9     92.2 (0.85, 2.2)
50            0.14 (0.16)   7.86           47    90.8 (0.91, 0.8)            15    92.1 (0.85, 2.1)
50            0.21 (0.25)   11.29          67    89.8 (0.96, -0.2)           22    92.5 (0.83, 2.5)
50            0.46 (0.53)   23.54          140   90.7 (0.92, 0.7)            45    92.4 (0.84, 2.4)

Notes. Assuming a 4-level ordinal outcome with proportions 0.20, 0.50, 0.20 and 0.10 in the control group and C clusters per group. Empirical power, θ̂, is calculated as the proportion of fitted logit models with a treatment effect significant at the 5% level. θ̂ − θ represents the absolute difference between empirical and nominal power.
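The design effect column of Table 5.13 agrees with the standard design effect for equal cluster sizes, 1 + (m − 1)ρ, evaluated at the ANOVA ICC. The sketch below shows that calculation and how an individually randomised sample size would be inflated and converted to clusters per group. The function names and the value passed as `n_individual` are illustrative assumptions, not values from the thesis.

```python
import math

def design_effect(cluster_size: int, icc: float) -> float:
    """Design effect for equal cluster sizes: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc

def clusters_per_group(n_individual: float, cluster_size: int, icc: float) -> int:
    """Clusters per group after inflating an individually randomised
    per-group sample size by the design effect (rounded up)."""
    inflated = n_individual * design_effect(cluster_size, icc)
    return math.ceil(inflated / cluster_size)

# Cluster sizes and ANOVA ICCs from Table 5.13
for m in (5, 10, 50):
    for icc in (0.01, 0.07, 0.14, 0.21, 0.46):
        print(m, icc, round(design_effect(m, icc), 2))

# Hypothetical per-group sample size of 300 under individual randomisation
print(clusters_per_group(300, cluster_size=5, icc=0.01))
```

Because the tabulated ICCs are rounded to two decimal places, cluster counts recomputed from them may differ by one from those in the table.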

5.9. DISCUSSION

5.9 Discussion

5.9.1 Main findings

To explore the relationship between ICC estimators

With a small number of clusters, the results showed that the ICC on the latent response tended to be largest, followed by the ANOVA and kappa-type ICCs. As the level of clustering increased, so too did the difference between the ANOVA and kappa-type estimates, with the ANOVA estimate being consistently larger. With a large number of clusters, the results also showed that the ICC on the latent response was largest; however, the ANOVA and kappa-type ICCs were almost identical. This was expected, as it has been shown that these two estimators are asymptotically equivalent as the number of clusters increases [125].

For each scenario investigated, as the number of ordinal categories increased so too did the estimated ANOVA and kappa-type ICCs. This was expected because, as the number of ordinal categories increases, the variable more closely resembles a continuous variable, and we would therefore expect the ICC to tend towards the ICC calculated on an assumed underlying continuous variable.

By comparing two possible patterns in the expected proportions across categories for a 4-level ordinal outcome, the observed ANOVA ICC was shown to depend not only on the number of categories but also on the proportions observed in each category. The two patterns of proportions explored both had a fairly even spread in the proportions expected in each category, and the difference in observed ANOVA ICCs for the two categorisations was small. Larger deviations from an even spread of proportions across categories might have a more substantial impact on the ANOVA ICC.

To determine which ICC results in an adequately powered trial

The use of the ANOVA ICC estimate in the design effect resulted in adequately powered trials. The empirical power was within 2% of the nominal level for 3-, 4- and 5-level outcomes. The efficiency of Whitehead's method increases with the number of ordinal categories and is greatest when the proportions in each ordinal category are evenly spread. However, beyond five categories further efficiency gains are marginal. In my simulations, as the number of categories increased the proportions in each ordinal category became more evenly spread. The ANOVA ICC estimate appeared to be conservative when the spread was less even. Hence, I saw a slight decrease in power as the number of categories increased and the proportions became more evenly spread.
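The dependence of efficiency on the number and spread of categories can be illustrated with the commonly quoted form of Whitehead's formula for 1:1 allocation, in which the per-group sample size is 6(z₁₋α/₂ + z₁₋β)² / (θ²(1 − Σ p̄ᵢ³)), with θ the log odds ratio and p̄ᵢ the mean proportion in category i across arms. The sketch below assumes this form; the function names are illustrative and the thesis's own implementation is not shown in this section.

```python
from statistics import NormalDist

def efficiency_factor(mean_props):
    """The term 1 - sum(p_bar_i ** 3) from Whitehead's formula.
    Larger values mean a smaller required sample size."""
    return 1.0 - sum(p ** 3 for p in mean_props)

def whitehead_n_per_group(mean_props, log_odds_ratio, alpha=0.05, power=0.90):
    """Per-group sample size for an individually randomised trial
    (assumed form of Whitehead's formula, 1:1 allocation, unrounded)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return 6.0 * z ** 2 / (log_odds_ratio ** 2 * efficiency_factor(mean_props))

# Evenly spread categories: efficiency rises quickly, then levels off
for k in (2, 3, 4, 5, 6, 10):
    even = [1.0 / k] * k
    print(k, round(efficiency_factor(even), 4))

# A dominant category is less efficient than an even spread
print(round(efficiency_factor([0.7, 0.1, 0.1, 0.1]), 4))
```

For an even spread the factor is 0.75 with two categories, 0.9375 with four and 0.96 with five, so little is left to gain beyond five categories.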

The largest differences between nominal and empirical power were in situations with a small number of clusters, which was expected due to the inflated Type I error rates for these situations.

Use of the ICC of the underlying latent variable in the design effect resulted in overly conservative sample sizes, with empirical power 1.6% to 5.3% above the required 90%.

To determine the effect of non-proportional odds on power

I considered the situation where a minor deviation from proportional odds occurred. For the 4-level ordinal outcome, there are three possible ways to dichotomise the outcome to calculate the log-odds of being in category q or better. I assumed that two of these were the same and one was slightly lower, but that all indicated a beneficial treatment effect. I deemed this a minor deviation from the proportional odds assumption. For this situation, a sample size based upon an average estimate of the log-odds was shown to result in a marginally underpowered trial. Power calculations based on the smallest log-odds were overly conservative.
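The three dichotomisations of a 4-level outcome can be made concrete with a short sketch. It takes the control proportions 0.20, 0.50, 0.20 and 0.10 from the Table 5.13 notes and applies a different log odds ratio at each cut point. The log-odds values 0.5, 0.5 and 0.4, the function names, and the choice to cumulate from the first category upwards are all illustrative assumptions rather than the values or coding used in the thesis.

```python
import math

def shift_proportions(control_props, log_odds_by_cut):
    """Treatment-arm proportions obtained by multiplying the control
    cumulative odds at each cut point k by exp(log_odds_by_cut[k])."""
    cumulative, shifted = 0.0, []
    for prop, log_odds in zip(control_props[:-1], log_odds_by_cut):
        cumulative += prop
        odds = cumulative / (1.0 - cumulative) * math.exp(log_odds)
        shifted.append(odds / (1.0 + odds))
    bounds = [0.0] + shifted + [1.0]
    return [upper - lower for lower, upper in zip(bounds[:-1], bounds[1:])]

def cumulative_log_odds_ratios(control_props, treatment_props):
    """Log odds ratio at each of the K - 1 cut points of a K-level outcome."""
    cum_c = cum_t = 0.0
    ratios = []
    for pc, pt in zip(control_props[:-1], treatment_props[:-1]):
        cum_c += pc
        cum_t += pt
        ratios.append(math.log((cum_t / (1 - cum_t)) / (cum_c / (1 - cum_c))))
    return ratios

control = [0.20, 0.50, 0.20, 0.10]
treatment = shift_proportions(control, [0.5, 0.5, 0.4])  # hypothetical minor deviation
log_odds = cumulative_log_odds_ratios(control, treatment)
average_log_odds = sum(log_odds) / len(log_odds)
print([round(p, 3) for p in treatment])
print([round(v, 3) for v in log_odds], round(average_log_odds, 3), round(min(log_odds), 3))
```

Under proportional odds the three values would be equal. Here the average and the smallest log-odds differ, which is the choice of effect size that the sample size calculation has to resolve.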

There are alternative situations which might also be classified as minor deviations from proportional odds, for example where one odds ratio is slightly larger than the other two. Use of the average log-odds in the design effect for these situations was not explored and may not necessarily result in a marginally underpowered trial.

The use of the design effect sample size approach is not recommended for situations in which major deviations from proportional odds occur, such as all the log-odds being very different, or some log-odds showing inconsistency around the effect of treatment.


Sample size calculation using the ANOVA ICC followed by analysis via a logit link also resulted in adequately powered trials; analysis with the probit link was slightly more conservative. Due to the similarity in shape of the logistic and normal distributions, the fit of a model using either of these links should be similar, and this result was therefore expected.
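The similarity of the two link functions can be checked numerically by comparing the standard normal CDF with a rescaled logistic CDF. The sketch below is illustrative only: the scaling constant 1.702 is a widely used approximation and the variance-matching constant π/√3 is a standard alternative; neither is taken from the thesis.

```python
import math
from statistics import NormalDist

def logistic_cdf(x: float) -> float:
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))

def max_cdf_gap(scale: float, lower: float = -8.0, upper: float = 8.0,
                step: float = 0.01) -> float:
    """Largest absolute difference between the standard normal CDF and
    the logistic CDF evaluated at scale * x, over a grid of x values."""
    normal = NormalDist()
    n_steps = int(round((upper - lower) / step))
    grid = (lower + i * step for i in range(n_steps + 1))
    return max(abs(normal.cdf(x) - logistic_cdf(scale * x)) for x in grid)

print(max_cdf_gap(1.702))                    # scale chosen to minimise the gap
print(max_cdf_gap(math.pi / math.sqrt(3)))   # scale chosen to match variances
```

With either scaling the two curves differ by only a few hundredths at most, which is why coefficients differ in magnitude between links while significance tests usually agree.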

Despite the difference in interpretation and magnitude between a random effects and a GEE model, the significance of the treatment effect is likely to be similar [130]. Therefore, I expect the results of my simulation to be applicable if a GEE model were used to analyse the data. As Stata cannot be used to fit GEE models for ordinal outcomes, I did not test this within my simulations. Given more time, this could have been done using an alternative software package, such as SAS.

5.9.2 Strengths and limitations

There are several strengths to the research described within this chapter. The simulation study was planned and reported following the most commonly used guidance for reporting simulation studies and is therefore described in sufficient detail for this work to be fully reproduced by others. The scenarios chosen for the simulations are largely reflective of the characteristics of real-life cluster randomised trials that have used ordinal outcomes, except that designs with a small number of clusters were excluded. Appropriate analysis methods when the number of clusters is small require some adjustment to account for the inflated Type I error, and these methods have not been well established for ordinal outcomes. Exploring different analysis methods for clustered ordinal outcomes when the number of clusters is small was beyond the scope of this thesis. Some approaches that might be appropriate are discussed in the final chapter of this thesis, alongside some practical guidance on sample size calculation in general for clustered ordinal outcomes.

I explored the relationship between ICC estimators under the simplifying assumption of no treatment effect. This assumption implies that the ICC is the same in both treatment arms, and thus using these ICC estimates in sample size calculations would be equivalent to using an ICC estimate based on control data alone. However, as with binary data, I identified from the simulation study that the value of the ANOVA ICC for the ordinal outcome was dependent upon the proportions observed in each ordinal category. This would suggest that in the presence of a treatment effect the ANOVA ICC estimates would differ across treatment arms, and in these situations it is more appropriate to use a pooled ANOVA ICC estimate in the sample size calculation [147]. To assess the implication that my assumption of no treatment effect may have had on my simulation results, I re-ran one of the simulation scenarios from Table 5.4, this time assuming the same log-odds values of 0.493 and 0.887 that were used in the later sample size calculations (4-level outcome, 100 clusters of size 50 with an underlying latent ICC of 0.53). The results showed that for these treatment effects there was only a minor difference in the observed ANOVA ICCs across the treatment groups, and the pooled ANOVA ICC estimate was the same as, or very close to, the ANOVA ICC estimate calculated when assuming no treatment effect. Therefore, my initial assumption is unlikely to have affected my results significantly.

My simulations generated clustered ordinal data using the latent variable approach. However, as discussed in Section 5.3, this is not the only method that might be used for data generation, and it is unclear whether a different method would have any impact upon my findings.
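A minimal sketch of latent variable data generation under a random-intercept probit model with no treatment effect is given below. It assumes a standard normal residual, a cluster effect with variance ρ/(1 − ρ) so that the latent ICC equals ρ, and thresholds scaled so that the marginal category proportions match the target proportions. These choices, the function name and the seed are assumptions for illustration; the thesis's exact generation code is not shown in this section.

```python
import math
import random
from statistics import NormalDist

def simulate_clustered_ordinal(n_clusters, cluster_size, latent_icc, props, seed=0):
    """Generate ordinal categories 1..K for each member of each cluster.

    Latent response = cluster effect + N(0, 1) residual, cut at
    thresholds that reproduce `props` marginally (an assumption).
    """
    rng = random.Random(seed)
    sigma_u2 = latent_icc / (1.0 - latent_icc)   # between-cluster variance
    total_sd = math.sqrt(1.0 + sigma_u2)         # SD of the latent response
    cumulative, thresholds = 0.0, []
    for p in props[:-1]:
        cumulative += p
        thresholds.append(total_sd * NormalDist().inv_cdf(cumulative))
    data = []
    for _ in range(n_clusters):
        cluster_effect = rng.gauss(0.0, math.sqrt(sigma_u2))
        cluster = []
        for _ in range(cluster_size):
            latent = cluster_effect + rng.gauss(0.0, 1.0)
            # Category is one more than the number of thresholds exceeded
            cluster.append(1 + sum(latent > t for t in thresholds))
        data.append(cluster)
    return data

data = simulate_clustered_ordinal(200, 50, latent_icc=0.25,
                                  props=[0.20, 0.50, 0.20, 0.10], seed=1)
flat = [y for cluster in data for y in cluster]
print([round(flat.count(k) / len(flat), 3) for k in (1, 2, 3, 4)])
```

The printed proportions should lie close to the targets, with more scatter than independent data would show because observations within a cluster share a cluster effect.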

The ANOVA ICC was shown to depend upon both the number of ordinal categories and the proportions observed in each category. In this research I have considered scenarios where there is a fairly even spread in the proportions expected in each category; this is the situation for which Whitehead states his method is most efficient. In situations where this is not the case, the estimate of the ANOVA ICC may be less similar to the latent variable ICC, and therefore the performance of the ANOVA ICC in the design effect may be affected.

5.9.3 Comparison with other work

Gao [125] investigated analysis strategies for clustered ordinal data, with a focus on adjusted Cochran-Armitage tests. She considered the use of both the kappa-type ICC and ANOVA ICCs in the test statistic, and showed that the Cochran-Armitage test had greatest power with the kappa-type ICC estimate. Gao saw similar results to mine in that the ANOVA and kappa-type ICC estimates were asymptotically equivalent as the number of clusters increased. Her work used datasets generated using marginal models and hence the relationship with the ICC on the underlying continuous variable was not included.


size calculations with ordinal outcomes analysed via a random effects model.

5.9.4 Implications

In this research I have identified that using the ANOVA estimate of the ICC, calculated by assigning equally spaced numerical values to the ordinal outcome, within the design effect for sample size calculations results in adequately powered trials. However, before implementing this method the researcher must ask two questions: is a good estimate of the ICC available, and is the assumption of proportional odds reasonable? Both of these elements will affect the power of the trial. With only minor deviations from the proportional odds assumption, the use of the design effect and an analysis which assumes proportional odds may result in only marginal over- or underpowering. Major deviations are likely to require alternative analysis methods, and sample size would be best estimated through simulation. The sensitivity of the sample size calculation to the range of plausible ICCs should be examined. If no reasonable estimates are available, researchers might consider a sample size calculation and analysis based upon the dichotomised version of the outcome, for which ICC estimates may be more readily available.
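Assigning equally spaced scores 1, 2, ..., K to the ordinal categories allows the usual one-way ANOVA estimator of the ICC to be applied. The sketch below uses the standard estimator (MSB − MSW) / (MSB + (m₀ − 1)MSW), with m₀ the adjusted mean cluster size. The function name and the small example dataset are illustrative assumptions, not the thesis's code or data.

```python
def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC.

    `clusters` is a list of lists of numeric scores, for example the
    ordinal categories coded 1, 2, ..., K. Cluster sizes may differ.
    """
    k = len(clusters)
    n_total = sum(len(c) for c in clusters)
    grand_mean = sum(sum(c) for c in clusters) / n_total
    means = [sum(c) / len(c) for c in clusters]
    ss_between = sum(len(c) * (m - grand_mean) ** 2 for c, m in zip(clusters, means))
    ss_within = sum((y - m) ** 2 for c, m in zip(clusters, means) for y in c)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Adjusted mean cluster size; equals the common size when sizes are equal
    m0 = (n_total - sum(len(c) ** 2 for c in clusters) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (m0 - 1) * ms_within)

# Hypothetical 4-level outcome scored 1-4 in three clusters of size 5
example = [[1, 2, 2, 1, 2], [2, 3, 2, 2, 3], [3, 4, 3, 4, 2]]
print(round(anova_icc(example), 3))
```

The estimator can be negative when clusters are more alike than chance would suggest, so in practice the estimate is often truncated at zero before use in a design effect.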

With binary outcomes, the overall prevalence of the observed endpoint is a single proportion. Therefore, in simulation studies of binary outcomes it is straightforward to examine several proportions, for example 10%, 30% and 70%, to gain insight into emerging patterns as the prevalence increases. For ordinal outcomes the situation is less straightforward. There are numerous ways in which the proportions can be distributed across the ordinal categories, and it is not easy to explore all possible combinations systematically or to identify patterns. In Whitehead’s work for individually randomised trials he considered two patterns: the first where there was an even spread across categories and the second where one category was dominant. He showed that an even spread across categories was a more efficient design. In this research I have focused on scenarios where there is generally an even spread of the proportions expected in each ordinal category. For situations in which there is a less even spread, perhaps because one or more categories dominate, the ordinal outcome looks less like a continuous outcome and hence the ANOVA estimate of the ICC may be less similar to the ICC on the latent variable. Therefore, using the ANOVA estimate of the ICC in the design effect for these situations may not perform as well. Simulation studies should be conducted in these situations to confirm the sample size required.

This work has provided some guidance on sample size calculation for those designing trials with ordinal outcomes. However, in order to move forward, estimates of the required ANOVA ICC are needed. I would therefore recommend that authors reporting results with ordinal outcomes report the ANOVA ICC and also provide the estimates of each log-odds, so that the reader may evaluate the assumption of proportional odds.

My work will impact those working in fields where ordinal outcomes are prevalent. However, there are still many other design aspects of cluster randomised trials for which the corresponding sample size development is still lacking. These areas are identified and discussed in the next chapter. In the final chapter I bring all my research together to discuss the future of sample size calculations for CRTs and formulate clear practical guidance for sample size calculations with ordinal outcomes.

Chapter 6

Remaining methodological gaps in

In document Sample size calculations for cluster randomised trials, with a focus on ordinal outcomes. (Page 164-172)