
2.4 Drawing inferences from incomplete data

2.4.5 Direct maximum likelihood analysis

Likelihood-based methods can be applied directly to incomplete longitudinal data (i.e., without any pre-processing or prior treatment of the missing values). They can also be applied after deletion (e.g., CC analysis) or after imputation of the missing observations (e.g., MI under MAR). In the latter two cases, since missing values are no longer present in the set of complete cases or in the imputed data set, likelihood approaches are based on the full-data likelihood (3.2) of the complete data. In contrast, for incomplete longitudinal data, any method within a likelihood framework requires working with the observed-data likelihood (3.3). For MAR,

\[
\ell_i = \ln \Pr(R_i \mid Y_i^o, \boldsymbol{\psi}) + \ln \Pr(Y_i^o \mid \boldsymbol{\theta}).
\]
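The factorization can be traced in a few lines. As a sketch in the notation of this section, start from the general observed-data likelihood contribution $L_i$ and use the MAR assumption that non-response does not depend on the unobserved outcomes $Y_i^m$, so that the non-response term can be taken outside the integral:

```latex
\begin{aligned}
L_i &= \int \Pr(Y_i^o, Y_i^m \mid \boldsymbol{\theta})\,
            \Pr(R_i \mid Y_i^o, Y_i^m, \boldsymbol{\psi})\, dY_i^m \\
    &= \Pr(R_i \mid Y_i^o, \boldsymbol{\psi})
       \int \Pr(Y_i^o, Y_i^m \mid \boldsymbol{\theta})\, dY_i^m
       \qquad \text{(MAR)} \\
    &= \Pr(R_i \mid Y_i^o, \boldsymbol{\psi})\, \Pr(Y_i^o \mid \boldsymbol{\theta}).
\end{aligned}
```

Taking logarithms gives the two additive terms of $\ell_i$. The two terms can be maximized separately provided $\boldsymbol{\theta}$ and $\boldsymbol{\psi}$ are distinct parameters.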

Maximum likelihood estimation would thus simply entail separate maximization of the component terms, which would translate, for instance, to fitting two maximum likelihood models: one for the observed responses and another for the non-responses conditional on the observed responses. Additional simplification arises for the case of MCAR,

\[
\ell_i = \ln \Pr(R_i \mid \boldsymbol{\psi}) + \ln \Pr(Y_i^o \mid \boldsymbol{\theta}),
\]

where the model for the non-response need not be conditional on the observed responses. Moreover, if the focus of inference lies on the response process parameters $\boldsymbol{\theta}$, estimation of the (conditional) non-response model (given the observed measurements) can be bypassed altogether. As such, the direct likelihood approach is no more complicated than fitting a likelihood-based model to the complete cases. Standard software procedures that allow for incomplete observations, and thereby ensure that the correct form of the likelihood is used, can obtain such a solution.
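A minimal sketch of direct likelihood under MAR, assuming a bivariate normal response in which the first measurement is always observed and the second may be missing. This setting, the function names, and the simulated data are illustrative assumptions, not taken from the text. For this monotone pattern the observed-data likelihood factors into the marginal of `y1` and the conditional of `y2` given `y1`, so the ML estimates are available in closed form and no model for the non-response is fitted.

```python
import numpy as np

def simulate_mar_dropout(n=5000, seed=0):
    """Hypothetical data: y2 is missing with a probability that depends only on y1 (MAR)."""
    rng = np.random.default_rng(seed)
    y1 = rng.normal(0.0, 1.0, n)
    y2 = 1.0 + 0.8 * y1 + rng.normal(0.0, 0.6, n)   # true mean of y2 is 1.0
    p_missing = 1.0 / (1.0 + np.exp(-2.0 * y1))      # larger y1 -> more dropout
    y2[rng.random(n) < p_missing] = np.nan
    return y1, y2

def direct_ml_bivariate(y1, y2):
    """ML estimates of the mean vector and covariance from the observed-data likelihood."""
    obs = ~np.isnan(y2)
    # Marginal of y1: uses every subject.
    mu1 = y1.mean()
    s11 = y1.var()                                   # ML variance (divisor n)
    # Conditional of y2 given y1: uses subjects with y2 observed.
    x, y = y1[obs], y2[obs]
    slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    intercept = y.mean() - slope * x.mean()
    resid_var = np.mean((y - intercept - slope * x) ** 2)
    # Transform back to the parameters of the joint distribution.
    mu2 = intercept + slope * mu1
    s12 = slope * s11
    s22 = resid_var + slope ** 2 * s11
    return np.array([mu1, mu2]), np.array([[s11, s12], [s12, s22]])

y1, y2 = simulate_mar_dropout()
mu_ml, sigma_ml = direct_ml_bivariate(y1, y2)
print("direct ML mean of y2:    ", mu_ml[1])
print("complete-case mean of y2:", np.nanmean(y2))
```

In this simulated setting the complete-case mean of `y2` is pulled away from its true value because dropout depends on `y1`, whereas the direct likelihood estimate uses the observed `y1` of the dropouts to correct for it.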

Direct likelihood under non-ignorability (e.g., MNAR) is much less straightforward than in the ignorable case. Unlike the latter, the former does not admit further simplification of the observed-data log-likelihood contributions, due to the dependence of non-response on the unobserved outcomes,

\[
\ell_i = \ln L_i = \ln \int \Pr(Y_i^o, Y_i^m \mid \boldsymbol{\theta}) \, \Pr(R_i \mid Y_i^o, Y_i^m, \boldsymbol{\psi}) \, dY_i^m . \qquad (2.1)
\]

The integration over the missing values adds further complexity to the direct likelihood approach for non-ignorable missingness. Moreover, evaluating or approximating the integral to compute $\ell_i$ in (2.1), especially when the missing component is high-dimensional, can be computationally demanding (Molenberghs et al., 2008).
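To make the integral concrete, the following is a minimal sketch of how one such contribution could be approximated numerically. It assumes a bivariate normal response with `y1` observed and `y2` missing, and a logistic model for the probability that `y2` is missing. Both models, the function names, and the parameter values are hypothetical choices for illustration. The one-dimensional integral over the missing `y2` is approximated by Gauss-Hermite quadrature.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def normal_logpdf(y, mean, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)

def mnar_loglik_missing_y2(y1, mu, sigma, psi, n_nodes=30):
    """Log-likelihood contribution of a subject with y1 observed and y2 missing.

    Assumed dropout model: Pr(y2 missing | y1, y2) = expit(psi0 + psi1*y1 + psi2*y2).
    """
    # Conditional distribution of the missing y2 given the observed y1.
    slope = sigma[0, 1] / sigma[0, 0]
    cond_mean = mu[1] + slope * (y1 - mu[0])
    cond_var = sigma[1, 1] - slope * sigma[0, 1]
    # Gauss-Hermite rule: integral of exp(-x^2) g(x) dx ~ sum_k w_k g(x_k).
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    y2 = cond_mean + np.sqrt(2.0 * cond_var) * nodes
    p_missing = expit(psi[0] + psi[1] * y1 + psi[2] * y2)
    integral = np.sum(weights * p_missing) / np.sqrt(np.pi)
    # Density of the observed part times the integrated non-response probability.
    return normal_logpdf(y1, mu[0], sigma[0, 0]) + np.log(integral)

mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
ignorable = mnar_loglik_missing_y2(0.3, mu, sigma, psi=(-1.0, 0.5, 0.0))
nonignorable = mnar_loglik_missing_y2(0.3, mu, sigma, psi=(-1.0, 0.5, 1.0))
print("psi2 = 0:", ignorable)
print("psi2 = 1:", nonignorable)
```

When the coefficient on the missing outcome (`psi2`) is zero, the integral collapses and the contribution reduces to the sum of the two separate terms of the ignorable case. With several missing measurements per subject the integral becomes multi-dimensional, which is where the computational burden arises.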

There is little difference between the direct maximum likelihood (ML) and multiple imputation approaches under a normality assumption for the observed data (Carpenter and Kenward, 2007). Likelihood-based methods can be regarded as imputation methods regardless of whether missingness is ignored or modeled (Fitzmaurice et al., 2008, pp. 401-403). When the missing data are MAR and ignorable, likelihood-based methods effectively impute the missing values by modeling and estimating parameters for the joint distribution of the responses, $\Pr(Y_i \mid X_i, \boldsymbol{\theta})$. When the missing-data mechanism is ignorable, likelihood-based methods impute missing values based on the marginal distribution of the observed data, $\Pr(Y_i^o \mid \boldsymbol{\theta})$ (Rubin, 1976). This means that the maximum likelihood estimates can be obtained by maximizing the log-likelihood contributions $\ell_i(Y_i^o, X_i, \boldsymbol{\theta})$ summed over subjects.

Rubin (1976) showed that likelihood-based inferences can be obtained by integrating over the missing responses from the joint distribution of the responses $\Pr(Y_i \mid X_i, \boldsymbol{\theta})$, defined by

\[
L(\boldsymbol{\theta}) \propto \prod_{i=1}^{N} \int \Pr(Y_i^o, Y_i^m \mid X_i, \boldsymbol{\theta}) \, dY_i^m . \qquad (2.2)
\]

Intuitively, the missing values $Y_i^m$ are validly predicted by the observed data via the model for the conditional mean, $E(Y_i^m \mid Y_i^o, X_i, \boldsymbol{\theta})$. This form of imputation becomes more transparent when the expectation-maximization (EM) algorithm (Dempster et al., 1977) is used.

The EM algorithm is an iterative procedure which allows us to compute the maximum likelihood (ML) estimates in the presence of missing data. Each iteration of the EM algorithm consists of two processes (steps). These steps alternate between (1) filling in the missing values with their conditional means, given the observed responses and parameter estimates from the previous iteration (expectation or E-step) and (2) maximization of the likelihood from the resulting “complete data” (maximization or M-step).

The EM algorithm is closely related to the following ad hoc process for handling missing data: (1) fill in the missing values with their estimated values, (2) estimate the parameters from this completed dataset, (3) use the estimated parameters to re-estimate the missing values, and (4) re-estimate the parameters from this updated completed dataset. Informally, it proceeds as follows:

E-step : In the E-step, the missing data are estimated, given the observed data and the current estimate of the model parameters. This is achieved by using the conditional expectation, $E(Y_i^m \mid Y_i^o, X_i, \boldsymbol{\theta})$.

M-step : In the M-step, the likelihood function is maximized under the assumption that the missing data are known. That is, the estimates of the missing data from the E-step are used in place of the actual missing data.

These steps alternate until the parameter estimates converge. The EM algorithm is thus an iterative procedure for obtaining the ML estimate of $\boldsymbol{\theta}$ that maximizes the likelihood function (2.2). In the presence of missing data, the EM algorithm provides a natural framework for their inclusion. The algorithm achieves this by treating missing values as parameters after obtaining $\boldsymbol{\theta}$ from $\Pr(Y_i^o \mid \boldsymbol{\theta})$. As discussed under MAR in Section 2.2.2.2, MAR and MCAR are often referred to as ignorable mechanisms: as long as one can establish that $\Pr(R_i \mid Y_i, X_i)$ does not depend on $Y_i^m$, $\Pr(R_i \mid Y_i, X_i)$ can be ignored, and a valid likelihood-based analysis can be obtained through a correctly specified model for the joint distribution $\Pr(Y_i \mid X_i)$.
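The E- and M-steps described above can be sketched for a simple case. The example assumes a bivariate normal response with `y1` always observed and `y2` possibly missing. The setting and the function name are illustrative assumptions. Note that for the covariance parameters the E-step has to supply expected second moments as well, so the conditional variance of the missing value is carried along with its conditional mean. Filling in conditional means alone would understate the variance.

```python
import numpy as np

def em_bivariate_normal(y1, y2, max_iter=1000, tol=1e-10):
    """EM estimates of the mean and covariance when y2 contains NaN for missing values."""
    miss = np.isnan(y2)
    # Starting values from the complete cases.
    mu = np.array([y1.mean(), np.nanmean(y2)])
    sigma = np.cov(np.vstack([y1[~miss], y2[~miss]]), ddof=0)
    for _ in range(max_iter):
        # E-step: conditional moments of the missing y2 given y1 and current parameters.
        slope = sigma[0, 1] / sigma[0, 0]
        cond_mean = mu[1] + slope * (y1 - mu[0])
        cond_var = sigma[1, 1] - slope * sigma[0, 1]
        e_y2 = np.where(miss, cond_mean, y2)
        e_y2_sq = np.where(miss, cond_mean ** 2 + cond_var, y2 ** 2)
        # M-step: complete-data ML formulas applied to the expected sufficient statistics.
        new_mu = np.array([y1.mean(), e_y2.mean()])
        s11 = np.mean(y1 ** 2) - new_mu[0] ** 2
        s12 = np.mean(y1 * e_y2) - new_mu[0] * new_mu[1]
        s22 = e_y2_sq.mean() - new_mu[1] ** 2
        new_sigma = np.array([[s11, s12], [s12, s22]])
        change = max(np.max(np.abs(new_mu - mu)), np.max(np.abs(new_sigma - sigma)))
        mu, sigma = new_mu, new_sigma
        if change < tol:
            break
    return mu, sigma

# Hypothetical MAR data: the chance that y2 is missing depends on y1 only.
rng = np.random.default_rng(0)
y1 = rng.normal(size=2000)
y2 = 1.0 + 0.8 * y1 + rng.normal(scale=0.6, size=2000)
y2[rng.random(2000) < 1.0 / (1.0 + np.exp(-2.0 * y1))] = np.nan
mu_em, sigma_em = em_bivariate_normal(y1, y2)
print("EM estimate of the mean vector:", mu_em)
```

For this monotone pattern the EM iterations converge to the same values as the closed-form ML estimates obtained by regressing `y2` on `y1` among the completers, which gives a convenient check on the implementation.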

2.4.6 Comparison of the direct maximum likelihood and multiple imputation approaches

The direct maximum likelihood (ML) method and the multiple imputation (MI) methods are both known to produce efficient estimates. However, each has an advantage over the other in some scenarios. ML is more efficient than MI and produces correct standard errors, whereas full efficiency for MI requires an infinite number of imputed data sets. For a given data set, ML always gives the same results, whereas MI gives a different result each time it is run. However, one can “force” MI to give the same results by setting the “seed”. With MI, there is always a potential conflict between the imputation model and the analysis model (Fitzmaurice et al., 2008). There is no such conflict with ML because only one model is required. ML does not require a model for the missing-data mechanism. Rather, it “predicts” missing values implicitly by maximizing the likelihood function. When the imputation model uses the same variables as the substantive model (the assumed model for the measurement process), estimates from the ML and MI methods are comparable, but the estimates from ML are more efficient. On the other hand, when the imputation model uses additional variables to improve its predictive power, or when particular forms of NMAR mechanism are relevant, MI has an advantage over ML (Su et al., 2011). When the data are MAR, MI can lead to consistent, asymptotically efficient, and asymptotically normal estimates. ML requires specialized software, and it may be challenging and time-consuming. Once missing data are obtained,
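The remark about the seed can be illustrated with a small sketch of MI for the mean of a partially observed variable. It assumes a linear regression imputation model for `y2` given a fully observed `y1`, reflects uncertainty in the imputation-model parameters by bootstrapping the completers (one of several possible ways to do this, chosen here for brevity), and pools the results with Rubin's rules. The function name and the simulated data are hypothetical.

```python
import numpy as np

def multiply_impute_mean(y1, y2, m=20, seed=None):
    """Pooled MI estimate of the mean of y2 and its standard error (Rubin's rules)."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(y2)
    obs_idx = np.flatnonzero(~miss)
    n = len(y2)
    estimates, variances = [], []
    for _ in range(m):
        # Bootstrap the completers so that each imputation uses different parameters.
        boot = rng.choice(obs_idx, size=obs_idx.size, replace=True)
        slope, intercept = np.polyfit(y1[boot], y2[boot], 1)
        resid_sd = np.std(y2[boot] - intercept - slope * y1[boot], ddof=2)
        # Impute with random noise so that imputed values vary like real data.
        completed = y2.copy()
        completed[miss] = (intercept + slope * y1[miss]
                           + rng.normal(0.0, resid_sd, miss.sum()))
        estimates.append(completed.mean())
        variances.append(completed.var(ddof=1) / n)
    estimates = np.array(estimates)
    within = np.mean(variances)                 # average within-imputation variance
    between = estimates.var(ddof=1)             # between-imputation variance
    total_var = within + (1.0 + 1.0 / m) * between
    return float(estimates.mean()), float(np.sqrt(total_var))

# Hypothetical MAR data.
rng = np.random.default_rng(0)
y1 = rng.normal(size=2000)
y2 = 1.0 + 0.8 * y1 + rng.normal(scale=0.6, size=2000)
y2[rng.random(2000) < 1.0 / (1.0 + np.exp(-y1))] = np.nan

print("seed 1:      ", multiply_impute_mean(y1, y2, seed=1))
print("seed 1 again:", multiply_impute_mean(y1, y2, seed=1))
print("seed 2:      ", multiply_impute_mean(y1, y2, seed=2))
```

Runs with the same seed reproduce each other exactly, while a different seed gives a slightly different pooled estimate. The difference shrinks as the number of imputations `m` grows.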

Although full-data likelihood functions exist for marginal models (Molenberghs and Verbeke, 2006), under a non-normal linear model setting marginal models are computationally demanding, since the likelihood function has no closed form. If need be, the requirement of a full likelihood can be avoided by specifying the likelihood only partially, resulting in a semi-parametric method (Bahadur, 1961; Molenberghs and Lesaffre, 1994; Liang and Zeger, 1986). The application of semi-parametric methods is not restricted to longitudinal data, though such methods have gained popularity there, particularly for categorical (e.g., binary) repeated measures.

For binary repeated measures, fully specified marginal models (Bahadur, 1961; Molenberghs and Lesaffre, 1994) exist and can be fitted; however, the intricacies can be restrictive (Sotto, 2009). As an alternative, Liang and Zeger (1986) proposed generalized estimating equations (GEE), which can be used to fit marginal models for non-Gaussian longitudinal data while avoiding the computational complexity of full likelihood. This approach does not rely on specification of a likelihood function for the repeated measurements but assumes a model for the mean response and a model for the variance-covariance structure. However, due to the non-likelihood nature of GEE, additional issues arise when data are missing: GEEs are moment-based estimators, and sample moments are biased when the data are MAR or NMAR. Under MAR, weighted GEE, or inverse probability weighting (IPW) (Seaman and White, 2013), provides consistent estimators (Robins et al., 1995; Robins and Rotnitzky, 1995; Scharfstein et al., 1999). The idea behind weighted GEE is to weight each subject's contribution to the GEEs by the inverse of the probability that the subject drops out at the time he/she dropped out.
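The weighting idea can be sketched in a deliberately reduced setting: two occasions, `y1` always observed, and the marginal mean of `y2` as the only target. This is a weighted estimating equation for a single mean, not a full weighted GEE. Here each completer is weighted by the inverse of the estimated probability of being observed at the second occasion. The logistic model for that probability, the function names, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, r, n_iter=25):
    """Logistic regression of the 0/1 indicator r on X by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (r - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def ipw_mean(y1, y2):
    """Inverse-probability-weighted estimate of the mean of y2 under MAR dropout."""
    observed = ~np.isnan(y2)
    X = np.column_stack([np.ones_like(y1), y1])
    beta = fit_logistic(X, observed.astype(float))
    p_observed = 1.0 / (1.0 + np.exp(-X @ beta))
    weights = 1.0 / p_observed[observed]
    # Solves sum_i w_i * (y2_i - mu) = 0 over the completers.
    return np.sum(weights * y2[observed]) / np.sum(weights)

# Hypothetical MAR data: subjects with larger y1 are less likely to be observed.
rng = np.random.default_rng(0)
n = 20000
y1 = rng.normal(size=n)
y2 = 1.0 + 0.8 * y1 + rng.normal(scale=0.6, size=n)   # true mean of y2 is 1.0
y2[rng.random(n) > 1.0 / (1.0 + np.exp(-(0.5 - y1)))] = np.nan

print("unweighted mean of observed y2:", np.nanmean(y2))
print("IPW mean of y2:                ", ipw_mean(y1, y2))
```

The unweighted mean of the observed `y2` illustrates the bias of a sample moment under MAR, while the weighted version up-weights completers who resemble the dropouts. Very small estimated probabilities produce very large weights, so the stability of the weights is worth checking in practice.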