We start in Section 9.3.1 by proposing a general strategy for building appropriate models, which can be used by social scientists and other researchers. Although designed for analysing data from longitudinal studies, it is also applicable to cross-sectional data. This approach allows the uncertainty from the missing data to be taken into account, and all relevant sources of information relating to the question under investigation to be utilised. In Section 9.3.2 we discuss how the DIC can be used for evaluating such models.
9.3.1 Proposed model building strategy
Based on the research carried out for this thesis, our proposed strategy for building Bayesian models for data with missing covariates and responses is encapsulated by the following steps.
1 Select an initial model of interest based on complete cases. This will include choosing a transform for the response, model structure and a set of explanatory variables. It is worth noting plausible alternatives at this stage, which can then be incorporated into the sensitivity analysis in step 5.
2 Add a covariate model of missingness to produce realistic imputations of any missing covariates, taking account of possible correlation between these covariates as necessary. A latent variable approach can be used for binary or categorical variables. Check the reasonableness of this model by comparing the pattern of the imputed values with the observed values. Crucially, this allows incomplete cases to be included in the model estimation.
3 Add a response model of missingness to allow informative missingness in the response. In the absence of any prior knowledge, the safest strategy is to assume a linear relationship between the probability of missingness and the response (or change in response) in the response model of missingness. More complex models of missingness with vague priors are difficult to estimate, as they are reliant on limited information from assumptions about other parts of the model.
4 To complete the formation of a base model, add additional data and/or expert knowledge into the various sub-models to help with parameter estimation, if available. Information relating to the response model of missingness has the potential to make the biggest impact, in particular regarding its functional form. Additional data may come from other, similar studies or be provided by earlier/later sweeps not under investigation.
5 Carry out a series of sensitivities to determine the robustness of any conclusions to different plausible assumptions for all parts of the model. A particularly key sensitivity is the choice of transform of the response variable in the model of interest. Expert knowledge can help with setting up the parameter sensitivity.
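To illustrate why the response model of missingness in step 3 and the parameter sensitivity in step 5 matter, the following minimal Python sketch simulates dropout under a linear logit and reports the complete-case mean for several fixed values of the slope. It is a toy simulation under stated assumptions, not the WinBUGS joint model used in this thesis: the function names, the parameter names `theta0` and `delta`, and the simulated standard normal response are all hypothetical choices made for illustration.

```python
import math
import random


def missingness_prob(y, theta0, delta):
    """Probability that a response is missing, assuming logit(p) = theta0 + delta * y."""
    return 1.0 / (1.0 + math.exp(-(theta0 + delta * y)))


def complete_case_mean(ys, theta0, delta, seed=0):
    """Mean of the responses that stay observed after simulated dropout."""
    rng = random.Random(seed)
    # A response is kept when its uniform draw is at least its missingness probability.
    observed = [y for y in ys if rng.random() >= missingness_prob(y, theta0, delta)]
    return sum(observed) / len(observed)


# Hypothetical standardised responses (simulated, not MCS data).
data_rng = random.Random(42)
ys = [data_rng.gauss(0.0, 1.0) for _ in range(20000)]

# Parameter sensitivity in miniature: fix delta at a range of values
# and see how far the complete-case mean moves.
for delta in (-1.0, 0.0, 1.0):
    print(delta, round(complete_case_mean(ys, theta0=-1.0, delta=delta), 3))
```

With `delta` fixed at zero the dropout does not depend on the response, so the complete-case mean should stay close to the mean of the full simulated data. Non-zero values shift it in the direction opposite to the sign of `delta`.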
This approach was followed for our MCS income example, except that we carried out steps 2 and 3 in reverse, because of our greater focus on missing response data as opposed to missing covariates. However, when modelling using WinBUGS, the ordering proposed above seems more natural, since a combined model of interest and covariate model of missingness will run with missing responses assuming MAR, whereas a combined model of interest and response model of missingness will not run with missing covariates. So dealing with the covariate model of missingness first avoids the necessity of working with a restricted dataset which excludes records with missing covariates, or of “filling in” the missing covariates using simplistic assumptions prior to running the model.
Our proposed strategy assumes that the covariates are MAR, but in principle step 2 can be elaborated to allow MNAR covariates. This possible extension is discussed in Section 9.4. Also, step 3 is only relevant for MNAR responses, since it is not necessary to model the missing responses explicitly if they are MAR.
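One way to write down the joint model built up in steps 1 to 3 is the selection model factorisation sketched below. The notation is an assumption introduced here for illustration only: $y$ is the response, $x_{obs}$ and $x_{mis}$ are the observed and missing covariates, $m_i$ is the missingness indicator for the response of individual $i$, and $\beta$, $\phi$, $\theta_0$ and $\delta$ are the parameters of the three sub-models. Covariate terms in the response model of missingness are omitted for brevity.

```latex
f(y, x_{mis}, m \mid x_{obs}, \beta, \phi, \theta_0, \delta)
  = \underbrace{f(y \mid x_{obs}, x_{mis}, \beta)}_{\mbox{model of interest}}
    \;\underbrace{f(x_{mis} \mid x_{obs}, \phi)}_{\mbox{covariate model of missingness}}
    \;\underbrace{f(m \mid y, \theta_0, \delta)}_{\mbox{response model of missingness}},
\qquad
m_i \sim \mathrm{Bernoulli}(p_i), \quad
\mathrm{logit}(p_i) = \theta_0 + \delta\, y_i .
```

In this sketch, setting $\delta = 0$ removes the dependence of the missingness on the response, so the third factor carries no information about $\beta$. This is the sense in which step 3 is needed only for MNAR responses.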
At each step, checks of model fit should be carried out to ensure that the models are plausible. Although this strategy forms a base model, the aim is not to select a single model, but a range of models encompassing different realistic assumptions. These are then used to determine the robustness of the conclusions about the substantive questions and examine the uncertainty as necessary.
This strategy is summarised diagrammatically in Figure 9.1. Note that the fit of a hold-out sample and the DIC can be used to help determine which models are plausible. Uses of the latter are described in the next section.
There are situations where it may be necessary to adapt this strategy. For example, if the dataset to be modelled is very large, or there are large numbers of covariates with missingness, then running times may be prohibitive or computational issues may be encountered. In these circumstances, one option is to take a two-stage approach and impute some or all of the covariates prior to running the Bayesian joint model. In this case the compatibility issues surrounding multiple imputation will apply. It may be possible to identify covariates where using simplistic assumptions to impute their missingness is acceptable, as we did with the region covariate, reg, in the MCS income example. If not all the covariates are correlated, another option is to split the covariate model of missingness into several smaller sub-models.
9.3.2 Proposed use of DIC in the presence of missing data
[Figure 9.1: Proposed strategy for Bayesian modelling of non-ignorable missing data. Flowchart with three panels: BASE MODEL; ASSUMPTION SENSITIVITY (run alternative models with key assumptions changed, including MoI error distribution, MoI response transform and MoM functional form); PARAMETER SENSITIVITY (run model with the MoM parameters associated with the informative missingness (δ) fixed to a range of plausible values). The flowchart ends with the question "Are conclusions robust?"]
For complete data, DIC is routinely used by Bayesian statisticians to compare models, a practice facilitated by its automatic generation in WinBUGS. However, using DIC in the presence of missing data is far from straightforward. The usual issues surrounding the choice of plug-ins are heightened, and in addition we must ensure that its construction is sensible. It would be a mistake to use it to select a single model, but we have identified three ways in which it has the potential to provide insight into proposed models. These may give conflicting messages, and deciding how much emphasis is placed on each requires judgement.
1 A DIC based on the observed data likelihood, DICO, can help with the choice of the model of interest. It should be used to compare joint models built with the same model of missingness but different models of interest. This DICO cannot be generated by WinBUGS, but can be calculated from WinBUGS output using other software. Daniels and Hogan (2008) provide an algorithm for its calculation, which we have adapted and implemented for both simulated and real examples. We recommend performing two sets of checks: (1) that the plug-ins are reasonable (i.e. if posterior means are used, they should come from symmetric, unimodal posterior distributions) and (2) that the size of the samples generated from the likelihoods is adequate (we suggest plotting deviance against sample length and checking for stability, as in Figure 4.8). For DICO, we must choose plug-ins that ensure consistency in the calculation of the posterior mean deviance and the plug-in deviance, so that missing values are integrated out in both parts of the DIC. This consideration makes logitp plug-ins for the selection model of missingness unsuitable.
2 The model of missingness scaled pD can indicate the level of departure from missing at random, given the model assumptions. It comes from the conditional DIC generated by WinBUGS, DICW. However, we have found that it is not always robust to the choice of plug-ins used, and recommend caution in its interpretation if the plug-ins are suspect.
3 The model of missingness part of DICW allows comparison of the fit of the model of missingness for joint models with the same model of interest.
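The arithmetic behind any of these DIC measures can be sketched as follows, using the standard definitions: pD is the posterior mean deviance minus the plug-in deviance, and DIC is the posterior mean deviance plus pD. This is a minimal Python sketch, not the adapted Daniels and Hogan algorithm. It assumes the deviance samples and the plug-in deviance have already been computed elsewhere (for DICO, from the observed data likelihood with the missing values integrated out), and the function names and example numbers are hypothetical. The running mean helper is one simple way to check stability against sample length.

```python
def dic_from_deviance(deviance_samples, plugin_deviance):
    """Combine deviance draws and a plug-in deviance into Dbar, pD and DIC."""
    mean_dev = sum(deviance_samples) / len(deviance_samples)  # posterior mean deviance
    p_d = mean_dev - plugin_deviance  # effective number of parameters
    return {"Dbar": mean_dev, "pD": p_d, "DIC": mean_dev + p_d}


def running_mean(values):
    """Mean of the first n values for n = 1, 2, ..., used to check stability."""
    total = 0.0
    means = []
    for n, value in enumerate(values, start=1):
        total += value
        means.append(total / n)
    return means


# Hypothetical deviance draws and plug-in deviance, for illustration only.
deviances = [102.0, 98.0, 101.0, 99.0]
result = dic_from_deviance(deviances, plugin_deviance=97.0)
print(result)
print(running_mean(deviances))
```

If the running mean is still drifting at the end of the sample, a longer sample is probably needed before the resulting DIC is interpreted.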
Although we propose using two component measures from the conditional DIC routinely generated by WinBUGS, the total DICW has asymptotic and coherency difficulties for selection models. We do not recommend using either the WinBUGS total DIC or the WinBUGS model of interest DIC. An alternative to using DIC to compare models is to assess model fit using a set of data not used in the model estimation, if available.