Analysis of ordinal outcomes
4.1
Analysis methods for non-clustered data
An ordinal outcome, as defined in chapter one, is a variable which consists of a set of categories that can be ordered or ranked, for example disease severity (mild, moderate or severe). The absolute difference between the categories is often unmeasurable or unknown.
Alan Agresti has made a large contribution to the work on the analysis of ordinal outcomes. The first edition of his book on the analysis of ordered categorical data was published in 1984 and a second edition, published in 2010, included developments for clustered data. Agresti has contributed to several reviews of methods available to analyse ordinal data, with the earliest in 1989 and the latest in 2005.90, 126–128 These articles highlight the substantial methodological development that has occurred in recent years in the analysis of ordinal outcomes. In the most recent review paper, invited researchers were given the opportunity to discuss the work and give their thoughts on the future direction of ordinal outcome research. Power and sample size calculations for clustered data, the focus of this thesis, were raised as an area for development. Lall et al have also reviewed ordinal regression models applied specifically to health-related quality of life outcomes.129
With an ordinal outcome one can either ignore or incorporate the ordinality in the data into the analysis. Methods which ignore the ordinal nature of the data are described here for completeness, as they are commonly undertaken, but are not recommended as they will likely give different results to an analysis that incorporates ordinality.130
4.1.1
Methods which ignore ordinality
If the ordinal nature of the data is ignored, the outcome can instead be treated as nominal or continuous, or be reduced to a binary outcome. Treating the outcome as nominal means treating the categories as if there is no natural ordering among them. Pearson's chi-squared test is commonly used to analyse nominal data. However, it has been shown that when used to analyse an ordinal outcome the chi-squared test can produce different conclusions to those made using analyses that take the ordering into account.130
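To make concrete what "no natural ordering" means, the following minimal sketch computes the Pearson chi-squared statistic for a two-arm table and for the same table with its outcome columns rearranged. The counts and the function name are invented for illustration and do not come from any study cited here.

```python
def pearson_chi_squared(table):
    """Pearson chi-squared statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows are trial arms, columns are mild, moderate, severe.
ordered = [[10, 20, 30],
           [30, 20, 10]]

# The same counts with the columns rearranged as moderate, severe, mild.
shuffled = [[20, 30, 10],
            [20, 10, 30]]

print(pearson_chi_squared(ordered), pearson_chi_squared(shuffled))
```

The statistic is the same for both tables, because the test uses only the cell counts and the margins. Any information carried by the ordering of the categories is therefore not used.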
Another simple approach which ignores the ordinality is to combine adjacent categories to reduce the ordinal outcome to a binary variable, to which standard statistical methods such as the chi-squared test or logistic regression can be applied. This produces a valid analysis, but dichotomisation of the outcome results in a loss of information; more power can be gained by retaining the full ordinal variable. The power of this dichotomisation approach has been examined using data from the CRASH trial, which investigated the efficacy of corticosteroids in traumatic brain injury patients.131 The primary outcome variable in this trial was the 5-point Glasgow Outcome Scale, which was dichotomised as unfavourable (dead, vegetative, severe disability) or favourable (moderate disability, good recovery). The analysis of the binary outcome was non-significant, yet the analysis of the ordinal outcome with proportional odds regression was highly significant. The authors attributed this difference in conclusions to the increased statistical power of the ordinal approach. In previous simulation studies the authors had also explored the effects of non-proportionality, where a significant treatment effect was present at only one category cut-off. They reported that, surprisingly, even when the assumption of proportionality was not met the ordinal analysis assuming proportional odds had more power than the binary approach (dichotomised at the point of the significant treatment effect).
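The loss of information from dichotomisation can be illustrated with a small sketch. The counts below are invented for illustration and are not CRASH trial data. The category labels follow the favourable/unfavourable split described above, and the function and variable names are hypothetical.

```python
# Categories of the 5-point Glasgow Outcome Scale grouped as unfavourable.
UNFAVOURABLE = {"dead", "vegetative", "severe disability"}


def dichotomise(counts):
    """Collapse counts per ordinal category into unfavourable/favourable."""
    unfavourable = sum(n for level, n in counts.items() if level in UNFAVOURABLE)
    favourable = sum(n for level, n in counts.items() if level not in UNFAVOURABLE)
    return {"unfavourable": unfavourable, "favourable": favourable}


# Hypothetical arms of 100 patients each (illustrative numbers only).
arm_a = {"dead": 20, "vegetative": 5, "severe disability": 25,
         "moderate disability": 30, "good recovery": 20}
arm_b = {"dead": 10, "vegetative": 5, "severe disability": 35,
         "moderate disability": 15, "good recovery": 35}

print(dichotomise(arm_a))
print(dichotomise(arm_b))
```

Both arms collapse to the same binary split, although arm B has fewer deaths and more good recoveries. A binary analysis cannot distinguish the two arms, whereas an analysis of the full ordinal scale can.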
According to the authors, trials in traumatic brain injury have traditionally been powered on the dichotomous variable of a favourable versus unfavourable outcome. The increased power from an ordinal analysis implies that powering on the ordinal outcome could reduce sample sizes. However, the authors investigating the CRASH data did not advise going down this route because they argue that trials in critical care medicine are generally underpowered due to a systematic overestimate of the treatment effect size during the design. Instead, the increased efficiency of an ordinal analysis should aid in the detection of smaller treatment effects for the same sample size. However, I would argue that underpowering in the strictest sense refers to a trial's inability to recruit and measure the number of individuals required by the sample size calculation. The issue here seems to be that the sample size calculation, in particular the way that the minimum clinically important difference is chosen, is not adequate. Investigators may be too optimistic about the expected treatment effect, or the treatment effect may be chosen to give a sample size that is attainable to recruit.
In contrast to the above recommendations on sample size for critical care medicine, the recommendation for stroke trials is to conduct the design and analysis using approaches which utilise the ordinality of the data, as opposed to a dichotomous approach based on stroke/no stroke, as the reduction in sample size can reduce the competition for patients between trials and reduce the cost and complexity of the trial itself.132 For individually randomised trials Whitehead's method of sample size calculation for ordinal outcomes has been shown to produce sample sizes that are on average 28% smaller than those for a binary version of the outcome.57 The impact on power when using more than two categories is large. However, additional power gains are marginal once the number of ordinal categories goes beyond five.2
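As a rough illustration of why an ordinal calculation can need fewer participants, the sketch below uses the form in which Whitehead's formula is commonly quoted for two equally sized groups: n per group = 6(z_{1-α/2} + z_{1-β})² / [(log OR)² (1 − Σ π̄_q³)], where π̄_q is the mean proportion in category q across the two arms. The formula is not stated in the text above and should be checked against the original reference before use. The control proportions, odds ratio and function names are all hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def shift_under_proportional_odds(control_probs, odds_ratio):
    """Category probabilities when every cumulative odds is multiplied by odds_ratio."""
    treated, cumulative, previous = [], 0.0, 0.0
    for p in control_probs[:-1]:
        cumulative += p
        odds = odds_ratio * cumulative / (1.0 - cumulative)
        cumulative_treated = odds / (1.0 + odds)
        treated.append(cumulative_treated - previous)
        previous = cumulative_treated
    treated.append(1.0 - previous)
    return treated


def whitehead_n_per_group(control_probs, odds_ratio, alpha=0.05, power=0.9):
    """Per-group sample size, assuming the commonly quoted equal-allocation formula."""
    treated = shift_under_proportional_odds(control_probs, odds_ratio)
    mean_probs = [(c + t) / 2 for c, t in zip(control_probs, treated)]
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    information = log(odds_ratio) ** 2 * (1 - sum(p ** 3 for p in mean_probs))
    return ceil(6 * z ** 2 / information)


# Hypothetical five-category outcome and the same outcome collapsed to binary.
five_level = [0.2, 0.2, 0.2, 0.2, 0.2]
binary = [0.6, 0.4]  # first three categories versus last two

n_ordinal = whitehead_n_per_group(five_level, odds_ratio=1.5)
n_binary = whitehead_n_per_group(binary, odds_ratio=1.5)
print(n_ordinal, n_binary)
```

With these assumed inputs the five-category calculation gives a smaller per-group sample size than the binary one for the same odds ratio, significance level and power. The size of the saving depends on how the responses are spread across the categories.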
The final approach ignoring the ordinal nature of the data assigns scores to the ordinal categories, assumes the resulting variable is continuous and analyses it using methods such as ANOVA or linear regression. Most commonly equally spaced scores are used across the ordinal categories, although other scoring systems may be used. Aside from the question of how to assign scores, the biggest problem with this approach is that the created variable often violates the normality assumption required for many analyses, more so when the sample size is small. However, the t-test and ANCOVA have been shown by simulation with three-, four- and five-level ordinal data to be robust to violations of the normality assumption (i.e. to produce significance levels close to nominal levels).133, 134 To avoid the assumption of normality, non-parametric methods such as the Mann-Whitney U test can be used.
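The dependence of this approach on the choice of scores can be seen in a short sketch. The counts and both scoring systems are invented for illustration, and the function name is hypothetical.

```python
def mean_score(counts, scores):
    """Mean of the scored outcome, given counts per category and a score per category."""
    return sum(n * s for n, s in zip(counts, scores)) / sum(counts)


# Hypothetical counts per category (mild, moderate, severe) in each arm.
control = [30, 40, 30]
treated = [20, 40, 40]

equal_scores = [1, 2, 3]    # equally spaced scores
unequal_scores = [1, 2, 5]  # an alternative scoring, chosen arbitrarily

diff_equal = mean_score(treated, equal_scores) - mean_score(control, equal_scores)
diff_unequal = mean_score(treated, unequal_scores) - mean_score(control, unequal_scores)
print(diff_equal, diff_unequal)
```

The same data give a different estimated mean difference under each scoring system, so the result of a score-based analysis depends partly on a choice that the data themselves cannot settle.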
Walters et al have explored methods for sample size and analysis within the context of quality of life data. Their results suggest that when the outcome has a limited number of discrete values (fewer than seven) and/or the proportion of cases at either of the bounds is high, Whitehead's method of sample size calculation performs well. However, where seven or more populated categories are present and the proportion of cases at the bounds is low, sample size and analysis methods based on the simplifying assumption of a continuously distributed variable may be used.135, 136
4.1.2
Methods that incorporate ordinality
In this chapter I assume that the ordinal outcome is a discrete measure of an underlying continuous variable and therefore I focus on the model often most appropriate for this situation, the proportional odds model (also referred to as ordered logistic regression), which is described by McCullagh.137 Lall et al have reviewed ordinal regression models applied to health-related quality of life assessments, and include discussion of the stereotype model that can be useful in situations where the categories are not assumed to be a discrete version of an underlying continuous variable. The ordinality of the response is assessed within the model.129
Model formulation
Let us assume an ordinal response variable with $k$ ordered categories $q = 1, 2, \ldots, k$ and let $Y_j$ be the categorical response for the $j$th individual. $Y_j$ takes the value $q$ if the response is in category $q$.
The probability of individual $j$ being in category $q$ is $\pi_{jq}$ and the cumulative probability of being in category $q$ or below, denoted $P_{jq}$, is given by
$$P(Y_j \le q) = P_{jq} = \pi_{j1} + \pi_{j2} + \ldots + \pi_{jq}$$
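As a small numerical illustration of the cumulative probabilities, the sketch below forms the running sum of a set of category probabilities for one individual. The probabilities and the function name are invented for illustration.

```python
from itertools import accumulate


def cumulative_probabilities(category_probs):
    """Running sums of the category probabilities, one value per category."""
    return list(accumulate(category_probs))


# Hypothetical category probabilities for one individual with k = 4 categories.
pi_j = [0.1, 0.2, 0.3, 0.4]
P_j = cumulative_probabilities(pi_j)
print(P_j)
```

The last cumulative probability is always 1 (up to floating-point rounding), so only the first k − 1 values carry information.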
We assume that the ordinal response is a crude measure of some underlying continuous distribution which is unknown and unmeasurable (referred to as a latent response), $Y_j^*$. The ordinal variable is obtained by chopping $Y_j^*$ into categories using a series of cut points $\alpha_q$ where $q = 1, \ldots, k-1$. Figure 4.1 illustrates an unobserved latent response for a four-level ordinal outcome. A value of $Y_j^* < \alpha_1$ corresponds to a response in the first category, values between $\alpha_1$ and $\alpha_2$ correspond to a response in the second category and so on. The cumulative probability of being in category $q$ or below, denoted $P_{jq}$, is now given by
$$P(Y_j^* \le \alpha_q) = P(Y_j \le q) = P_{jq} \qquad (q = 1, 2, \ldots, k-1)$$
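The mapping from a latent value to an observed category can be sketched as follows. The cut points are invented for illustration, and in practice the latent response is not observed. The function name is hypothetical.

```python
from bisect import bisect_left


def categorise(latent_value, cut_points):
    """Return the category q (1-based) such that latent_value <= alpha_q
    and latent_value > alpha_(q-1); cut_points must be sorted ascending."""
    return bisect_left(cut_points, latent_value) + 1


# Hypothetical cut points alpha_1, alpha_2, alpha_3 for a four-level outcome.
alphas = [-1.0, 0.0, 1.5]

for value in (-2.0, -0.5, 0.7, 2.0):
    print(value, categorise(value, alphas))
```

A latent value exactly equal to a cut point is assigned to the lower category here, which matches the "less than or equal to" in the cumulative probability. For a continuous latent response this boundary case has probability zero.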
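If we know the distribution of the latent response this cumulative probability can be easily calculated. The most common choice for the distribution of the latent response is a logistic distribution with mean $\mu$ and variance $\pi^2/3$ (here $\pi$ is the mathematical constant, not a category probability). The cumulative distribution function for this logistic distribution, which has unit scale and reduces to the standard logistic distribution when $\mu = 0$, is:
$$F(x) = P(X \le x) = \frac{1}{1 + e^{-(x-\mu)}} \quad \text{or equivalently} \quad F(x) = \frac{e^{(x-\mu)}}{1 + e^{(x-\mu)}}$$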
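The two forms of the logistic cumulative distribution function, and their use to turn cut points into cumulative and category probabilities, can be checked numerically. This is a minimal sketch: the cut points and the value of the mean are invented for illustration, and the function names are hypothetical.

```python
from math import exp


def logistic_cdf(x, mu=0.0):
    """First form: 1 / (1 + exp(-(x - mu)))."""
    return 1.0 / (1.0 + exp(-(x - mu)))


def logistic_cdf_alt(x, mu=0.0):
    """Equivalent form: exp(x - mu) / (1 + exp(x - mu))."""
    e = exp(x - mu)
    return e / (1.0 + e)


def category_probabilities(cut_points, mu=0.0):
    """Category probabilities implied by a logistic latent response with mean mu."""
    cumulative = [logistic_cdf(a, mu) for a in cut_points] + [1.0]
    lower = [0.0] + cumulative[:-1]
    return [upper - low for upper, low in zip(cumulative, lower)]


# Hypothetical cut points for a four-level outcome and a latent mean of 0.5.
alphas = [-1.0, 0.0, 1.5]
probs = category_probabilities(alphas, mu=0.5)
print(probs, sum(probs))
```

Each cumulative probability is the distribution function evaluated at a cut point, and the category probabilities are the differences between successive cumulative probabilities, so they sum to one.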
Therefore if we assume the underlying latent response Y∗