3.1.2 Methodologies for analysing longitudinal data
In recent years, many authors have developed modelling approaches to deal with the previously mentioned challenges associated with longitudinal models. In this section, we enumerate the most widely used methodologies for analysing longitudinal data, briefly introduce each of them, and describe their advantages and disadvantages.
Reduction: the reduction approach, referred to as the derived variable approach by some authors (Hedeker and Gibbons, 2006), is based on reducing the repeated measurements to a single summary variable. Once the reduction has been performed, the analysis is no longer longitudinal, since there is only one observation per subject. The main problem of this approach is that the uncertainty in the derived variable depends on the number of repeated measurements from which it was computed. In unbalanced cases each subject has a different number of measurements and hence a different uncertainty. Consequently, the homoscedasticity assumption of the model is not ensured. Moreover, reducing the repeated measures decreases the number of observations, and hence there is a considerable loss of statistical power. Finally, reducing the outcome does not allow time-dependent variables to be included in the model, as the temporal aspect of the data is removed.
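The heteroscedasticity argument for the reduction approach can be illustrated with a small simulation. This is a minimal sketch under assumed conditions: the derived variable is taken to be the subject mean, the measurement errors are independent Gaussian with standard deviation `sigma`, and the function name `variance_of_subject_means` is hypothetical.

```python
import random
import statistics


def variance_of_subject_means(n_measurements, n_subjects=4000, sigma=1.0, seed=0):
    """Empirical variance of the per-subject mean (the derived variable)
    when every subject contributes `n_measurements` noisy observations.

    Assumption: independent Gaussian errors with standard deviation `sigma`.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_subjects):
        observations = [rng.gauss(0.0, sigma) for _ in range(n_measurements)]
        means.append(statistics.mean(observations))
    return statistics.variance(means)


if __name__ == "__main__":
    # In theory the variance is sigma**2 / n, so subjects with few
    # measurements yield a much noisier summary than subjects with many.
    for n in (2, 10):
        print(n, variance_of_subject_means(n))
```

Under these assumptions, a summary computed from 2 measurements has about five times the variance of one computed from 10, so pooling both kinds of subjects in a single model violates the constant-variance assumption.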
Analysis of variance: the analysis of variance (ANOVA) for repeated measurements (Winer, 1971) is used to compare three or more group means where the participants are the same in each group. The model assumes compound symmetry, which implies constant variances and covariances over time. This assumption will rarely hold in longitudinal data, for two reasons. First, because of attrition, variances tend to increase over time as the number of responding subjects decreases. Second, it seems reasonable to assume that covariances between proximal measurements will be larger than covariances between distal measurements. The model allows a different trend line per subject; however, the trends differ only in the intercept, which implies that all subjects behave equally over time. It seems more reasonable that subjects deviate from the overall trend line not only in the baseline, but also in the rate of change (slope).
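The compound symmetry assumption can be made explicit with a short worked formulation. The notation below ($y_{ij}$ for the response of subject $i$ at occasion $j$, $\mu_j$ for the occasion mean, $b_i$ for the subject effect, and the variance components $\sigma_b^2$ and $\sigma^2$) is assumed here for illustration and is not taken from the text.

```latex
\begin{align*}
y_{ij} &= \mu_j + b_i + \varepsilon_{ij},
  \qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2), \\
\operatorname{Var}(y_{ij}) &= \sigma_b^2 + \sigma^2
  \quad \text{for all } j, \\
\operatorname{Cov}(y_{ij}, y_{ij'}) &= \sigma_b^2
  \quad \text{for all } j \neq j'.
\end{align*}
```

Under this sketch the variance is the same at every occasion and the covariance is the same for every pair of occasions, however far apart they are. The subject effect $b_i$ shifts the whole trajectory up or down but leaves its shape unchanged, which is the intercept-only heterogeneity described above.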
Multivariate analysis of variance: the multivariate analysis of variance (MANOVA) was proposed for longitudinal data analysis by Bock (1985). The MANOVA model is simply an ANOVA model with several dependent variables: while the ANOVA model tests for differences in means between two or more groups, the MANOVA model tests for differences between two or more vectors of means. This approach transforms the repeated measurements into orthogonal polynomial coefficients (e.g. constant, linear and quadratic growth rates), which are used as multivariate responses in the MANOVA model. The main disadvantage of this approach is that it cannot deal with missing data, so all subjects must have the same number of repeated measurements, which is very unlikely in practice.
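The orthogonal polynomial transformation can be sketched for the simplest case. The example assumes three equally spaced occasions, for which the orthonormal contrasts are known in closed form; the names `CONTRASTS` and `to_polynomial_scores` are hypothetical.

```python
import math

# Orthonormal polynomial contrasts for three equally spaced occasions
# (assumed design; other designs need different coefficients).
CONTRASTS = {
    "constant": [1 / math.sqrt(3), 1 / math.sqrt(3), 1 / math.sqrt(3)],
    "linear": [-1 / math.sqrt(2), 0.0, 1 / math.sqrt(2)],
    "quadratic": [1 / math.sqrt(6), -2 / math.sqrt(6), 1 / math.sqrt(6)],
}


def to_polynomial_scores(measurements):
    """Project one subject's three repeated measurements onto the
    constant, linear and quadratic contrasts."""
    if len(measurements) != 3 or any(m is None for m in measurements):
        # A subject with a missing occasion cannot be transformed,
        # which is why this approach requires complete data.
        raise ValueError("exactly three observed measurements are required")
    return {
        name: sum(c * y for c, y in zip(coefficients, measurements))
        for name, coefficients in CONTRASTS.items()
    }


if __name__ == "__main__":
    # A perfectly linear trajectory has a zero quadratic score.
    print(to_polynomial_scores([1.0, 2.0, 3.0]))
```

The resulting scores would then serve as the multivariate response. The `ValueError` branch reflects the complete-data requirement: a subject with fewer than three observed occasions has no defined score vector.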
Mixed-effects models: mixed-effects regression models are widely used in different frameworks, especially for the analysis of longitudinal data (Laird and Ware, 1982). For example, as introduced in Section 2.1.1, GLMMs are a general methodology that includes random effects in the linear predictor of a GLM, which can easily accommodate the correlation structure of longitudinal data. We will develop these models in more detail in Section 3.2. Mixed-effects regression models include the term mixed because they consist of a fixed component (regression coefficients) and a random component (random effects). Mixed-effects regression models are quite robust to missing data and irregularly spaced measurements. Furthermore, they can easily deal with both time-independent and time-dependent covariates.
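The fixed and random components can be written out in a generic form. This is a sketch in assumed notation ($X_i$ and $Z_i$ for the fixed- and random-effects design matrices of subject $i$, $G$ and $\Sigma_i$ for the covariance matrices, $g$ for the link function), which may differ from the notation used elsewhere in this document.

```latex
\begin{align*}
\text{Linear mixed model:} \quad
  y_i &= X_i \beta + Z_i b_i + \varepsilon_i,
  \qquad b_i \sim N(0, G), \quad \varepsilon_i \sim N(0, \Sigma_i), \\
\text{GLMM:} \quad
  g\bigl(\operatorname{E}[y_{ij} \mid b_i]\bigr) &= x_{ij}^{\top} \beta + z_{ij}^{\top} b_i .
\end{align*}
```

Here $\beta$ is the fixed component shared by all subjects and $b_i$ is the random component specific to subject $i$. Because $X_i$ and $Z_i$ may have a different number of rows for each subject, unequal numbers of measurements and irregular spacing pose no structural problem.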
Generalised estimating equations: generalised estimating equations (GEEs) (Zeger and Liang, 1982) are a general alternative to mixed-effects models and are computationally very convenient. The GEE approach extends the classical GLMs (see Section 2.1.1) to the case of correlated data. GEEs can be used to analyse a wide variety of outcomes and do not require complex numerical evaluation of the likelihood for nonlinear models. They model the overall mean relationship between the variables and the within-subject dependency separately. GEE models are also called marginal models, where the term marginal refers to the assumption that the mean response depends only on the covariates of interest and not on any random effects or previous responses.
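The separation between the mean model and the dependency structure is visible in the standard form of the estimating equations. The notation is assumed for illustration: $\mu_i(\beta)$ is the marginal mean vector of subject $i$, $A_i$ the diagonal matrix of variance functions, $R_i(\alpha)$ the working correlation matrix and $\phi$ a dispersion parameter.

```latex
\begin{align*}
\sum_{i=1}^{N} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) &= 0,
  \qquad D_i = \frac{\partial \mu_i(\beta)}{\partial \beta^{\top}}, \\
V_i &= \phi \, A_i^{1/2} R_i(\alpha) A_i^{1/2} .
\end{align*}
```

The mean relationship enters only through $\mu_i(\beta)$, while the within-subject dependency enters only through the working correlation $R_i(\alpha)$. No joint distribution for $y_i$ has to be specified, which is why no likelihood needs to be evaluated.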
Among the methodologies described above, the most widely used and most appropriate for analysing longitudinal data are the mixed-effects regression approaches and GEE. The main difference between these two approaches is that GEE models are based on quasi-likelihood estimation, so the full likelihood of the data is not specified. Therefore, while GEE models are considered partial-likelihood methods, mixed-effects models are considered full-likelihood methods, as they use all the available data from each subject. The advantage of statistical models based on partial likelihood is that they are computationally easier and generalise readily to different distributional forms of the repeated outcome variables. However, they are more restrictive in their assumptions regarding missing data, which limits their applicability in some cases. Moreover, full-likelihood models provide subject-specific effects, which are useful when analysing within-individual variability and when predicting future responses for a given subject or a group of subjects in hierarchical structures.
On the one hand, the GEE approach calculates the marginal mean for each subject, even if some of those means carry limited information due to subject dropout. Standard errors are then adjusted to take into account the correlation structure of the repeated measures over time and/or subject clustering. On the other hand, mixed-effects regression approaches use all the available information to calculate the subject-specific trends that would have been observed if the subjects had stayed until the end of the study. Hence, if future subject responses are related to previous measurements, the two approaches can yield quite different estimated mean responses at the end of the study. In fact, the main difference between the two methodologies appears when the missing data depend on the previously observed responses of each subject. GEE assumes that the missing data are missing completely at random, that is, that missingness does not depend on the previous measurements. However, it is difficult to imagine that, had the missing data for a given subject been observed, the response would not have been related to the previous measurements of the same subject.
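The effect of dropout that depends on earlier responses can be illustrated with a toy simulation. This is a minimal sketch and not a GEE or mixed-model fit: it assumes two visits with a true mean of zero, a Gaussian random intercept shared by both visits, and a hypothetical rule under which subjects with a non-positive first response drop out. The available-case mean at the second visit stands in for an analysis that ignores the dropout mechanism.

```python
import random
import statistics


def simulate_dropout(n_subjects=5000, seed=1):
    """Return (mean of all second-visit responses,
    mean of second-visit responses among subjects who did not drop out)."""
    rng = random.Random(seed)
    second_all, second_observed = [], []
    for _ in range(n_subjects):
        intercept = rng.gauss(0.0, 1.0)        # subject-specific random effect
        y1 = intercept + rng.gauss(0.0, 0.5)   # first visit, true mean 0
        y2 = intercept + rng.gauss(0.0, 0.5)   # second visit, true mean 0
        second_all.append(y2)
        # Assumed dropout rule: only subjects with a positive first
        # response return for the second visit.
        if y1 > 0.0:
            second_observed.append(y2)
    return statistics.mean(second_all), statistics.mean(second_observed)


if __name__ == "__main__":
    full_mean, observed_mean = simulate_dropout()
    print("mean if nobody had dropped out:", full_mean)
    print("mean among completers only:   ", observed_mean)
```

Because the two visits share the subject's random intercept, the completers are a selected group and their second-visit mean lies well above zero (about 0.7 in expectation under these assumed parameters). A method that models the within-subject correlation through the likelihood can use the first-visit responses of the dropouts to correct for this selection.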
Therefore, in this thesis we consider the mixed-effects approach to be the most appropriate for the analysis of hierarchical or longitudinal PRO data. In fact, this chapter is devoted to the development of a mixed-effects model based on the beta-binomial distribution. To achieve that goal, in Section 3.2 we review the existing literature describing the most widely used mixed-effects regression approaches. Then, in Section 3.3 we describe the model we propose, develop an estimation and inference methodology, and compare its performance with similar approaches in the literature. In order to show the performance of our proposal and compare it with the methodology available in the literature, a simulation study is carried out in Section 3.4. Finally, to show the applicability of the developed methodology, we apply it to both the COPD Study and the Paquid Research Programme, described in Section 1.3.1 and Section 1.3.2, respectively. We finish the chapter with some conclusions in Section 3.6.