Correlated binary covariates - Example 3: MCS income

6.1.1 Correlated binary covariates

One strategy for modelling correlated binary covariates uses the multivariate probit model (Chib and Greenberg, 1998; Chib, 2000). In this approach each binary variable is linked to an underlying latent variable with a Normal distribution, and thresholds are specified to determine whether the original variable’s value is 0 or 1.

Let $x$ be the set of binary covariates with missing values, $x_1, \ldots, x_p$, that we wish to model, and denote the corresponding latent variables as $x^\star = \{x^\star_1, \ldots, x^\star_p\}$. Then for individual $i$,

$$x^\star_i \sim \mathrm{MVN}(\nu_i, \Sigma),$$

where MVN denotes a Multivariate Normal distribution, and for covariate $k$

$$x_{ik} = \begin{cases} 0: & x^\star_{ik} \le 0 \\ 1: & x^\star_{ik} > 0. \end{cases} \tag{6.1}$$
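As an illustration of Equation 6.1, the following minimal Python sketch draws latent variables from a Multivariate Normal distribution and thresholds them at 0 to obtain correlated binary covariates. The function name `simulate_binary_covariates`, the zero means, and the correlation of 0.6 are assumptions made purely for illustration; they are not taken from the MCS example.

```python
import numpy as np

def simulate_binary_covariates(nu, sigma, rng):
    """Draw latent x* ~ MVN(nu_i, sigma) for each individual and threshold at 0."""
    nu = np.asarray(nu, dtype=float)      # shape (n, p): one mean vector per individual
    chol = np.linalg.cholesky(sigma)      # sigma = chol @ chol.T
    z = rng.standard_normal(nu.shape)     # independent N(0, 1) draws
    latent = nu + z @ chol.T              # each row is x*_i ~ MVN(nu_i, sigma)
    binary = (latent > 0).astype(int)     # Equation 6.1: x_ik = 1 if x*_ik > 0, else 0
    return latent, binary

rng = np.random.default_rng(0)
sigma = np.array([[1.0, 0.6], [0.6, 1.0]])   # illustrative covariance in correlation form
nu = np.zeros((5000, 2))                     # illustrative means (all zero)
latent, binary = simulate_binary_covariates(nu, sigma, rng)
```

The correlation between the binary covariates is induced entirely through the covariance matrix of the latent variables.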

To complete the specification of a model for imputing the missing values of binary covariates, equations are defined for calculating the means, $\nu_i$, of the MVN distribution. These are used to predict the latent variables, $x^\star_i$, from a linear combination of some fully observed explanatory variables (see Equation 6.3 for an example), and hence to predict the $x_i$. In our case, the $x_i$ being predicted are part observed and part missing. Although the model is fully defined, this leads to complications in implementation in WinBUGS. These can be overcome by implementing Equation 6.1 directly only for the missing covariates, and using constraints on the MVN distribution when the $x_i$ are observed to force their underlying latent variables to lie in the correct interval (i.e. if $x_{ik}$ is observed to be 0, then $x^\star_{ik}$ must be less than or equal to 0). This is not a typical use of the probit, which is usually used as an alternative link function to the logit in a generalised linear model for binary data.

The means, $\nu_i$, and covariance, $\Sigma$, are not uniquely likelihood identified. To see why, consider modelling a single binary covariate using a probit model, such that $x^\star_i \sim N(\nu_i, \sigma^2)$, $x_i$ is defined as in Equation 6.1, $\nu_i = \eta_0 + \sum_{j=1}^{q} \eta_j z_{ij}$ and $z_1, \ldots, z_q$ are fully observed variables. If all the $\eta$ parameters and the standard deviation, $\sigma$, are multiplied by the same positive constant, then the probability of the latent variable exceeding 0 will be unchanged. Similarly, in the case of multiple binary covariates, the probability of each latent variable exceeding 0 will not change if the parameters in the equation for the mean of the $k$th latent variable are multiplied by a positive constant, $c_k$, and the covariance matrix is set to $C \Sigma C$, where $C$ is a diagonal matrix with diagonal elements $\{c_1, c_2, \ldots, c_p\}$ (Chib and Greenberg, 1998). If $\Sigma$ is constrained to be in correlation form, the parameters in the $\nu_i$ equations and remaining elements of $\Sigma$ are identifiable.
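The scaling argument can be checked numerically. The sketch below is illustrative only: all parameter values are invented, the scaling constants are assumed to be positive, and the helper `prob_one` is a hypothetical name. It first confirms that the probability of a single latent variable exceeding 0 is unchanged when the linear predictor and standard deviation are multiplied by the same constant, and then that rescaling the means by C and the covariance to CΣC yields identical binary values for the same underlying random draws.

```python
import math
import numpy as np

def prob_one(nu, sd):
    """P(x = 1) = P(x* > 0) for x* ~ N(nu, sd^2), which equals Phi(nu / sd)."""
    return 0.5 * (1.0 + math.erf(nu / (sd * math.sqrt(2.0))))

# Univariate case: multiply the linear predictor and sd by the same constant c > 0.
eta0, eta1, z, sd, c = -0.3, 0.8, 1.5, 1.0, 4.0   # illustrative values
nu = eta0 + eta1 * z
p_original = prob_one(nu, sd)
p_scaled = prob_one(c * nu, c * sd)               # same probability as p_original

# Multivariate case: means C nu and covariance C Sigma C give the same binary pattern.
rng = np.random.default_rng(1)
nu_vec = np.array([0.2, -0.5, 0.1])
sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])
C = np.diag([2.0, 0.5, 3.0])                      # positive scaling constants c_1, c_2, c_3
draws = rng.standard_normal((1000, 3))            # shared standard Normal draws
latent = nu_vec + draws @ np.linalg.cholesky(sigma).T
latent_scaled = C @ nu_vec + draws @ np.linalg.cholesky(C @ sigma @ C).T
same_binary = np.array_equal(latent > 0, latent_scaled > 0)
```

Because the observed data depend on the latent variables only through their signs, the two parameter sets cannot be distinguished by the likelihood.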

The standard conjugate prior for a covariance matrix is the Wishart distribution, but this does not allow structure to be imposed. Setting the diagonal elements of a matrix to 1 requires separate priors for each off-diagonal element of that matrix. The difficulty is then ensuring that the matrix generated is positive definite. For two binary variables this is easy, but it becomes non-trivial for more than two variables. The unusual shapes of the sets of correlation coefficients for three and four variables are discussed by Rousseeuw and Molenberghs (1994), who give some intuition about the constraints that exist between the correlations. Molitor et al. (2009) use this approach for imputing the missing values for two binary covariates, in an application to low birth weight and water disinfection by-products.
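A small numerical sketch may help to show why positive definiteness is easy to guarantee for two variables but not for three. The correlation values and the helper names `is_positive_definite` and `correlation_matrix` below are assumptions chosen for illustration. Each correlation lies inside (-1, 1), yet one of the three-variable combinations does not form a valid correlation matrix.

```python
import numpy as np

def is_positive_definite(matrix):
    """Return True if the symmetric matrix has strictly positive eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(matrix) > 0))

def correlation_matrix(r12, r13, r23):
    """Build a 3 x 3 matrix with unit diagonal from three pairwise correlations."""
    return np.array([[1.0, r12, r13],
                     [r12, 1.0, r23],
                     [r13, r23, 1.0]])

# Two variables: any correlation strictly inside (-1, 1) gives a valid matrix.
valid_pair = is_positive_definite(np.array([[1.0, 0.9], [0.9, 1.0]]))

# Three variables: each correlation is admissible on its own, but not in combination.
invalid_triple = is_positive_definite(correlation_matrix(0.9, 0.9, -0.9))

# Changing only the third correlation restores positive definiteness.
valid_triple = is_positive_definite(correlation_matrix(0.9, 0.9, 0.7))
```

Independent priors on the individual correlations can therefore generate matrices such as the second one, which is why additional constraints are needed once there are more than two variables.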

An alternative approach is to set the model up as a sequence of conditional univariate models, which allows separate priors to be placed on the individual elements of the covariance matrix and guarantees that the covariance matrix of the underlying joint model is always positive definite. This can be done using the following properties of a MVN distribution (see, for example, Congdon (2001), Section 2.8). If a set of variables, $x$, can be modelled as $x \sim \mathrm{MVN}(\nu, \Sigma)$, then $x$ can be partitioned into two subsets of variables, $x_1$ and $x_2$, such that

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} \nu_1 \\ \nu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right).$$

Further, if $x_1$ has a marginal MVN distribution, $x_1 \sim \mathrm{MVN}(\nu_1, \Sigma_{11})$, then the conditional distribution of $x_2$ when $x_1$ is known is

$$x_2 \mid x_1 \sim \mathrm{MVN}\left( \nu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \nu_1),\; \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} \right). \tag{6.2}$$

The covariance matrix can be constrained to correlation form by setting $\Sigma_{11}$ and $\Sigma_{22}$ to 1. Similarly, using an obvious extension of notation, it can be shown that the conditional distribution of $x_j \mid x_{j-1}, \ldots, x_1$ is univariate Normal with mean and variance given by particular functions of $x_{j-1}, \ldots, x_1$, $\nu_j, \ldots, \nu_1$ and $\Sigma_{11}, \ldots, \Sigma_{1j}, \ldots, \Sigma_{jj}$. Examples of the implementation of this approach are provided in Section 6.2. Unfortunately the required equations rapidly become very complicated as the number of variables increases.
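The conditional distribution in Equation 6.2 can be computed generically, as in the following minimal Python sketch. The function name `conditional_mvn`, its index arguments, and the numerical values are assumptions for illustration, and this is not the WinBUGS implementation referred to in Section 6.2. The usage example takes the bivariate case in correlation form, where the conditional mean reduces to ν2 + ρ(x1 − ν1) and the conditional variance to 1 − ρ².

```python
import numpy as np

def conditional_mvn(nu, sigma, idx1, idx2, x1):
    """Mean and covariance of x2 | x1 for x ~ MVN(nu, sigma), following Equation 6.2."""
    nu = np.asarray(nu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    s11 = sigma[np.ix_(idx1, idx1)]
    s12 = sigma[np.ix_(idx1, idx2)]
    s22 = sigma[np.ix_(idx2, idx2)]
    # Sigma21 Sigma11^{-1}, obtained from a linear solve rather than an explicit inverse
    # (valid because Sigma11 is symmetric and Sigma21 is the transpose of Sigma12).
    weights = np.linalg.solve(s11, s12).T
    cond_mean = nu[idx2] + weights @ (np.asarray(x1, dtype=float) - nu[idx1])
    cond_cov = s22 - weights @ s12
    return cond_mean, cond_cov

# Bivariate case in correlation form: unit variances and correlation rho.
rho = 0.6
mean, cov = conditional_mvn(nu=[0.5, -0.2],
                            sigma=[[1.0, rho], [rho, 1.0]],
                            idx1=[0], idx2=[1], x1=[1.3])
# mean is -0.2 + 0.6 * (1.3 - 0.5) and cov is 1 - 0.6 ** 2
```

Applying the same function repeatedly, conditioning each variable on all of its predecessors, gives the sequence of univariate conditional models described above.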

A third option is not to restrict the covariance matrix to correlation form, so that a Wishart distribution can be used as its prior. Although the means and covariances in such a model are not likelihood identified, they will be posterior identified if proper priors are specified (Gelfand and Sahu, 1999). This approach is taken by Lunn et al. (2006), who argue that using an unrestricted covariance matrix is not a major issue provided they have no interest in making inferences about the means or covariances (personal correspondence). It has been pointed out that even with improper priors, using MCMC it is still possible to obtain convergence of a lower-dimensional subset of parameters embedded in a non-identifiable model (Gelfand and Sahu, 1999). While the unidentified parameters will not converge, the identified parameters may be very well behaved (Kass et al., 1998).

In Section 6.2.1 we compare these different strategies, as a preliminary to setting up a covariate model of missingness for our MCS income example.

In document Bayesian methods for modelling non-random missing data mechanisms in longitudinal studies (Page 118-120)