Bayesian Covariance Estimation

Bayesian Mixed Effects Models

The most common model for repeated measurements is the linear mixed effects model of Laird and Ware [1982]. For individual i with ni repeated measurements, the mixed

effects model is given by

y_i=µ_i+x_iβ+z_ib_i+e_i, i=1,...,n, (1.36)

where y_i is an ni×1 response vector; µ_i is the overall mean vector; x_i is an n_i×p matrix of fixed covariates; β is a p×1 vector of fixed effects; z_i is an n_i×k matrix of random covariates;bi is ak×1 vector of random effects; ei is anni×1 vector of random

errors. It is standard to assumebi and ei are independent of each other and both are

normally distributed with bi ∼N(0,D) and e_i ∼N(0, σ2I_n_i). Under this assumption,

y_i∣µ_i,β,D, σ2∼N(µ_i+x_iβ+z_ib_i, σ2I_n_i). For Bayesian analysis of the random effects model [Zeger and Karim, 1991; Gilks et al., 1993], we specify the priors on µ_i, β, D

andσ2 _{traditionally, as the prior for} _µ_{is chosen to be} _P_(µ)₌d_N_(µ

0, σ2µ), the prior forβ

is chosen to beP(β)=dN(0,Σ_β)and the prior for σ2 is chosen to be P(σ2)=dIG(δ₂0,γ₂0). By far the most common approach to the prior onDis to use an inverse-Wishart prior, motivated by its conjugacy property. That is,P(D)=dIW(n₀,C₀). However, there are many other choices of prior on the covariance D.

Modeling a covariance structure is one of the most difficult and important tasks in statistical analysis [Barnard et al., 2000]. A covariance matrix may have many param- eters, which are constrained by a complex requirement that the matrix is nonnegative definite. There is no standard solution to the problem of choosing a prior on the covariance matrix in the mixed effects model, or hierarchical model [Kass and Natarajan, 2006]. Directly specifying a reasonable prior for a covariance matrix is not a easy task.

The usual inverse-Wishart prior is often inadequate because it is of restrictive form due to the common degrees of freedom for all the diagonal entries ofD. It is helpful to break the covariance matrix down into several components. There are several methods based on well-known matrix decompositions. Barnard et al. [2000] worked with the variance- correlation decomposition of the covariance matrix. Boik [2002] proposed a spectral decomposition on the matrix. Another approach is to use the Cholesky decomposition of the covariance matrix [Pourahmadi, 1999; Chen and Dunson, 2003].

Inverse Wishart Prior

Suppose A is a p×p positive definite random matrix. The Wishart distribution with n0 degree of freedom is characterized by A∼W(n₀,C−₀1) if and only if A= ∑n_i₌0₁zizT_i ,

where p dimensional vectorsz1, ...,zn are i.i.d. random samples from N(0,C−01). The

diagonal elements of a Wishart matrix A are chi-square random variables. That is, aii ∼ σ_iiχ2_n, where a_ii is the ith diagonal element of A and σ_ii is the ith diagonal element of C−₀1. If D = A−1, D have an inverse Wishart distribution, denoted by D ∼ IW(n₀,C₀). The mean of inverse Wishart distribution is E(D) = _nC0

0−p−1,

which means large n0 makes the prior relatively noninformative. Based on the inverse

Wishart prior and other priors as described earlier, the conditional posterior distribution of D is P(D∣y,µ,b, σ2)=dIW(n+n₀,C₀+σ−2∑n_i₌₁bibiT) where y = (yi, ...,yn)T, µ = (µ_i, ...,µ_n)T and b = (b_i, ...,b_n)T. The inverse Wishart prior results in a strong dependence between variance and correlation: high variance implies high correlation and low variance implies low or moderate correlation. To overcome this problem, the inverse Wishart prior for D where the scale matrix is determined from the variance can be considered [Kass and Natarajan, 2006].

Variance-Correlation Decomposition

The covariance matrixDcan be decomposed asD=SRSwhereSis thek×kdiagonal matrix of standard deviations and R is the k×k correlation matrix [Barnard et al., 2000]. This decomposition has a strong practical appeal since these two factors of D

are easily interpreted in terms of standard deviations and correlations. We can write the prior on D in terms of (S,R) as P(S,R) = P(S)P(R∣S). Since S only have k-dimensional elements with component-wise nonnegativity as the only constraint, we can specify the prior for S as P(log(s))=dN(ψ,Λ) where s= (s₁, ..., s_n) are diagonal elements of S and log(s) = (log(s1), ..., log(sn)). The choice of prior for R given S is

more complicated due to the complexity of space of correlation matrices, and often a marginally uniform prior or jointly uniform prior are used forR[Barnard et al., 2000].

Spectral Decomposition

The spectral decomposition of a covariance matrix D is given by D = PΛPT = ∑q_i₌₁λieieT_i , whereΛ is a diagonal matrix of eigenvalues with theith elementλi andP

is the orthogonal matrix of normalized eigenvectors with the ith column, ei [Pourah-

madi, 2011]. There are three classes of priors on Λ and P and details can be found in [Leonard and Hsu, 1992; Yang and Berger, 1994; Daniels and Kass, 1999].

Cholesky Decomposition

The standard Cholesky decomposition of a positive-definite matrix is given by D =

LLT, where L is a unique lower-triangular matrix with positive diagonal elements. Statistical interpretation of the elements ofLis difficult in the present form. However, rescalingLto unit lower-triangular matrices by the inverse of∆=diag(δ_i, ..., δ_k)makes the statistical interpretation of the diagonal elements of Land the components of the modified Cholesky decomposition easier. Pourahmadi [1999] proposed the following

modified Cholesky decomposition

D=L∆−1∆∆∆−1LT =Ψ∆2ΨT, (1.37)

where Ψ = L∆−1. Chen and Dunson [2003] presented another modified Cholesky decomposition as follows.

D=∆∆−1LLT∆−1∆=∆ΨΨT∆, (1.38)

where Ψ=∆−1Lis obtained from L by dividing the elements of its ith row by δ_i. In this decomposition, ∆ is a diagonal matrix with elements proportional to the square roots of the diagonal elements ofD and Ψis a unit lower-triangular matrix solely de- termining its correlation matrix. This total separation of variance and correlation is one of advantages over the more traditional modified Cholesky decomposition of Pourah- madi [1999] [Pourahmadi, 2007]. Given Chen and Dunson [2003]’s decomposition, the reparameterized mixed effects model can be written as

y_i =µ_i+x_iβ+z_i∆Ψc_i+e_i, i=1,...,n, (1.39)

where ci= (c1, ..., ck)T is k×1 vector of independent standard normal latent variables. In choosing priors for ∆ and Ψ and hence for D, we permit the variance of random effects to have zero values [Chen and Dunson, 2003]. Conditionally conjugate prior distributions for ∆ and Ψ are chosen as P(∆,ψ) =P(ψ∣∆)P(∆). The prior distribution for ψ conditional on ∆ is given by P(ψ∣∆) ∝N(ψ∣ψ₀,R₀)I(ψ∈R_∆) where ∆= (δ1, ..., δk)T, ψ = (ψ_ml ∶m = 2, ..., k;l =1, ..., m−1)T and R_∆ = {ψ∶ ψ_ml =ψ_lm′ = 0 if δl=0, l=1, ..., k, m=l+1, ..., k, m′=1, ..., l−1}. Here, the prior forψ, conditional on∆, is proportional to a N(ψ₀,R₀)multiplied by an indicator function that imposes

zero on the elements ofψ corresponding to zero∆. For ∆, we assume that the δ’s are independent of each other, so that P(∆) = Πk_l₌₁P(δ_l). Let ZI −N+(π, µ, σ2) denote the density of a zero inflated half normal distribution comprised of a point mass at zero with probability π and a N(µ, σ2₎ _{density truncated below by zero. The prior}

distribution for ∆ is given by P(∆) = Πk

l=1P(δl) = Π k

l=1ZI−N+(δl∣pl0, ml0, s 2

l0) where

pl0,ml0 and s2_l0 are hyperparameters to be specified.

In document 6061.pdf (Page 37-41)