A semi-parametric model for binary variables

We consider a study involving N patients, for which a D−dimensional binary vector is recorded at T subsequent time points. We denote the collection of all binary records with Y , an array having dimensions N × D × T . Let yi,d,tand yi,t

denote a single entry and a row of binary variables of Y for patient i at time t, respectively.

5.2.1 Dynamic multivariate Probit model

We model Y introducing an array of correlated continuous latent variables, which we denote with Z, having entries zi,d,t∈ R and such that:

yi,d,t= 1if and only if zi,d,t≥ 0.

Given the condition above, we can write the likelihood of Y as Pr(Y | Θ) = Z Y {yi,d,t=1} I[0,+∞)(zi,d,t) Y {yi,d,t=0} I(−∞,0)(zi,d,t)F (dZ | Θ),

5.2. A semi-parametric model for binary variables 137

We recall that T represents the time dimension of Y and we assume for the density of F (Z | Θ), which we denote by f (Z | Θ), the following factorisation

f (Z | Θ) = N Y i=1 ( f (zi,1 | Θi,1) T Y t=2

f (zi,t | zi,t−1, Θi,t)

)

, (5.2.1)

where zi,t = (zi,1,t, . . . , zi,D,t), i.e. the row of Z corresponding to the i−th patient

at time t. The latter equation imposes a Markov structure to the distribution of the latent variables, which directly determines the distribution of Y .

A computationally convenient assumption is to assume f (·) to be a multivariate Normal distribution. This has a number of implications. First, at any given time t, the probability of observing yi,tcan be calculated as an integral un-

der a multivariate Normal distribution, as in multivariate Probit models. Sec- ondly, the Markov structure assumed in (5.2.1) can be easily accommodated within the model using appropriate autoregressive terms. Furthermore, covariates can be easily included in the model. Finally, the Normal assumption simplifies the calculations, allowing the use of standard algorithms such as the one proposed by Albert and Chib [1993] for posterior inference.

Let X denote an array of dimension N ×P ×T , containing records of P time- dependent covariates. We write xi,t to indicate the row of X corresponding to

the i−th patient at the t−th time. Recalling (5.2.1), we specify the following distribution for the latent variable at time 1:

f (zi,1 | Θi,1 = (αi,1, Λ, Σi,1)) = MNormalD(zi,1 | αi,1+ Λx0i,1, Σi,1), (5.2.2)

where MNormalD(·)is the D−dimensional Normal distribution, αi,1is a vector

of intercepts of length equal to D and Λ is a D × P matrix of regression coefficients. Equivalently, we specify the following transition density for t = 2, . . . , T : f (zi,t | zi,t−1, Θi,t = (αi,t, Λ, Γ, Σi,t)) = MNormalD(zi,t | αi,t+Λx0i,t+Γz

i,t−1, Σi,t),

(5.2.3) where Γ is a D × D matrix containing the autoregressive coefficients.

The model described above is connected with SUR models in that at a cer- tain time point, given all parameters and lagged latent components, the distributions above imply D distinct regressions on the means of the latent variables.

138 Chapter 5. Dynamic BNP model for correlated binary variables

The latter are linked together via the error distributions governed by the covari- ance matrix Σi,t.

Λand Γ are assumed to be constant across time. This simplifies interpretabil- ity of the coefficients. In particular, simple Bayesian hypothesis testing on temporal dependence among the binary variables can be performed and this is an important requirement for our motivating application. However, extensions in- cluding temporal dependent versions of the matrices of coefficients Λt and Γt

can be easily specified.

5.2.2 Nonparametric prior model

The quantities αi,t and Σi,t characterise the baseline probability of observing

yi,t. We want to employ a prior distribution which is flexible enough to capture

possible heterogeneity across patients at different time point.

Let us introduce an additional array of covariates, U . This is an N × R × T array, where ui,t is a row of U which encodes the information about the i−th

patient at time t, such as treatment arm, risk group and other common indicators which may evolve overtime. This is not uncommon in biostatistics where clinicians classify patients into different classes of risk for developing a specific disease based on clinical history and characteristics. We consider the case in which U contains binary indicators. Consequently, the yi,t’s are implicitly clus-

tered according to the various combinations of the binary covariates contained in ui,t.

We include via a flexible model the information contained in U in the baseline probability of yi,t. Using an approach similar to the one employed in

mixed-effect modelling, we assume each group defined by U to have a specific random prior distribution as follows

(αi,t, Σi,t) | Gui,t ∼ Gui,t,

where Gui,t is a discrete distribution of the form

Gui,t =

∞

h=1

5.2. A semi-parametric model for binary variables 139

where, for each ui,t,

hwh,ui,t = 1, and δa denotes a unit point mass at a. The

use of such random prior distribution implies that the latent variable density can be written as

f (zi,t | zi,t−1, Λ, Γ, Gui,t)

= Z

f (zi,t | zi,t−1, Λ, Γ, αi,t, Σi,t)dGui,t(αi,t, Σi,t)

∞

h=1

wh,ui,tMNormalD(zi,t | αh+ Λx

i,t+ Γz 0

i,t−1, Σh),

which is an infinite location-scale mixture of multivariate Normal distributions, having weights which vary according to the components in ui,t.

Specifying a prior of Gui,t is equivalent to find suitable prior distributions

for the collection of the weights, wh,ui,t, and for the locations, αhand Σh.

Starting from the weights, we need a stochastic process prior indexed at various levels of ui,t, whose realisations are distributions over an infinite di-

mensional simplex. Furthermore, it is desirable to borrow strength across the different groups of observations implied by U . Different solutions for these tasks have been developed in the field of RPMx and a survey of the major alter- natives is in Section 1.4.

In Chapter 4 we have introduced the Dependent Generalised Dirichlet Pro- cess (DGDP), a process over collections of distributions which can be consid- ered as an extension of the DDP. We opt for using the same idea of the DGDP also in this chapter, but we discuss alternative solutions in the next paragraphs. DGDP assumes a particular stick-breaking process prior for the weights of (5.2.4) where w1,ui,t = v1,ui,t and

wh,ui,t = vh,ui,t

l<h

(1 − vl,ui,t), for h = 2, 3, . . .

with

v1,ui,t, v2,ui,t, . . . | µ, φ ∼Beta(vh,ui,t | φlogit

−1

(µui,t), φ(1 −logit −1

(µui,t))),

for all combination of ui,t. In the latter equation, φ is a positive parameter and µ

is vector of real parameters. The almost sure discreteness of DGDP realisations imposes a clustering structure of the attendance visits: observations sharing

140 Chapter 5. Dynamic BNP model for correlated binary variables

the same value of α∗h and Σ ∗

hcan be interpreted as a cluster. Compared to DDP

(which will in principle produce an equivalent clustering effect of attendance visits), the DGDP assumes extra flexibility to the distribution of the partition of the observations especially in terms of number and size of clusters as a conse- quence of the richer parameterisation of the GDP compared to the traditional DP (see Section 4.2.3).

Independently from the generation of the weights, the locations, which are shared among all Gui,t for different values of ui,t, are generated as follows. We

first assume

αh ∼ MNormalD(mD, σ2αID), for h = 1, 2, . . .

Σ∗_hrequires likelihood identifiability conditions imposed by the thresholding of the latent variables. In order to avoid over-restrictive constraints on the covari- ance matrix, we follow the works by Jara et al. [2007] and Pourahmadi [1999] where the conditional variances are constrained. We write Σ∗−1_h = L∗_h0IDL∗h,

where L∗

h is a lower triangular matrix, with ones on the diagonal and uncon-

strained values on the non-zero entries. Collecting these entries in the vector ν_h∗, which has Dν = D(D − 1)/2components, we assume

νh ∼ MNormalDν(mDν, σ

νIDν), for h = 1, 2, . . .

In this chapter we assume shared locations for all ui,t, following the argument

discussed in Chapter 4. However, it is possible to extend this construction in- cluding also covariate dependent locations. This requires the use of suitable stochastic process priors indexed by the covariates.

5.2.3 Prior distribution specification

The model described above requires the specification of prior (and hyperprior) distributions for the remaining unknown parameters. Let λd,p and γd,d0 denote

a single entry in matrix Λ and Γ respectively. We assume λd,p

iid

∼ Normal(λ | mλ, σ2λ)

In document Covariate dependent random measures with applications in biostatistics (Page 136-141)