Geometric ergodicity of the Gibbs sampler for Bayesian quantile regression

Journal of Multivariate Analysis

Kshitij Khare, James P. Hobert*

Department of Statistics, University of Florida, United States

Article info

Article history: Received 10 October 2011; available online 7 June 2012.

AMS subject classifications: primary 60J27; secondary 62F15.

Keywords: Convergence rate; Geometric drift condition; Markov chain Monte Carlo.

Abstract

Consider the quantile regression model $Y = X\beta + \sigma\epsilon$ where the components of $\epsilon$ are i.i.d. errors from the asymmetric Laplace distribution with $r$th quantile equal to 0, where $r \in (0,1)$ is fixed. Kozumi and Kobayashi (2011) [9] introduced a Gibbs sampler that can be used to explore the intractable posterior density that results when the quantile regression likelihood is combined with the usual normal/inverse gamma prior for $(\beta, \sigma)$. In this paper, the Markov chain underlying Kozumi and Kobayashi’s (2011) [9] algorithm is shown to converge at a geometric rate. No assumptions are made about the dimension of $X$, so the result still holds in the “large $p$, small $n$” case.

1. Introduction

In the usual quantile regression model, the conditional quantile function of $Y$ given $X = x$ takes the form $Q(r \mid X = x) = x^T \beta(r)$, where $x$ is a $p \times 1$ vector of covariates and, for fixed $r \in (0,1)$, $\beta(r)$ is a $p \times 1$ regression parameter. The standard (frequentist) estimator of $\beta(r)$ based on a sample of size $n$ is the minimizer of

$$\sum_{i=1}^n \rho_r\big(Y_i - x_i^T \beta\big), \tag{1}$$

where the loss function $\rho_r$ is defined as $\rho_r(u) = u\,\big(r - I(u < 0)\big)$ (see, e.g., [7]).
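As a small illustration of the loss in (1), the following sketch evaluates the objective for an intercept-only model (an assumption made here for simplicity: $x_i = 1$ and $\beta$ is a scalar) and confirms that a sample $r$-quantile minimizes it. The function names `check_loss` and `total_loss` are hypothetical and not part of the paper.

```python
import math
import random

def check_loss(u, r):
    """Check loss rho_r(u) = u * (r - 1{u < 0})."""
    return u * (r - (1.0 if u < 0 else 0.0))

def total_loss(q, ys, r):
    """Objective (1) for an intercept-only model (x_i = 1, beta = q)."""
    return sum(check_loss(y - q, r) for y in ys)

random.seed(0)
ys = sorted(random.gauss(0.0, 1.0) for _ in range(201))
r = 0.25

# The objective is piecewise linear in q, so it suffices to search the data points.
best = min(ys, key=lambda q: total_loss(q, ys, r))

# Since n * r is not an integer here, the minimizer is the order statistic
# with (0-based) index ceil(n * r) - 1.
quantile = ys[math.ceil(len(ys) * r) - 1]
```

In this setting `best` and `quantile` coincide, which is the sense in which minimizing (1) generalizes the sample quantile to the regression case.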

Yu and Moyeed [14] pointed out that the minimizer of (1) is, in fact, the maximum likelihood estimator of $\beta$ under the fully parametric model $Y_i = x_i^T\beta + \epsilon_i$ where $\{\epsilon_i\}_{i=1}^n$ are assumed to be i.i.d. with common density given by

$$g(\epsilon; r) = r(1-r)\Big[e^{(1-r)\epsilon}\, I_{\mathbb{R}_-}(\epsilon) + e^{-r\epsilon}\, I_{\mathbb{R}_+}(\epsilon)\Big], \tag{2}$$

where $\mathbb{R}_+ := (0,\infty)$ and $\mathbb{R}_- := (-\infty, 0]$. It is easy to see that this error density, which is called the asymmetric Laplace density, has $r$th quantile equal to zero. (When $r = 1/2$, $g$ becomes the standard Laplace density with location and scale equal to 0 and 1/2, respectively.)

In this paper, we consider a Bayesian version of a fully parametric quantile regression model in which the errors are from an unknown member of a scale family based on the asymmetric Laplace distribution. In particular, we assume that $Y_i = x_i^T\beta + \sigma\epsilon_i$, where $\{\epsilon_i\}_{i=1}^n$ are i.i.d. with common density (2) and $\sigma \in \mathbb{R}_+$ is an unknown scale parameter. We do not assume that $n \ge p$. Suppose that $\pi(\beta, \sigma)$ is a proper prior density for $(\beta, \sigma)$. The posterior density of $(\beta, \sigma)$ given the data, $y = (y_1, \dots, y_n)^T$, is defined to be

$$\pi(\beta, \sigma \mid y) = \frac{f(y; \beta, \sigma)\, \pi(\beta, \sigma)}{m(y)},$$

where $f(y; \beta, \sigma)$ is the joint density of $Y_1, \dots, Y_n$ at the point $y$, that is,

$$f(y; \beta, \sigma) = r^n (1-r)^n \sigma^{-n} \prod_{i=1}^n \Big[e^{(1-r)(y_i - x_i^T\beta)/\sigma}\, I_{\mathbb{R}_-}(y_i - x_i^T\beta) + e^{-r(y_i - x_i^T\beta)/\sigma}\, I_{\mathbb{R}_+}(y_i - x_i^T\beta)\Big],$$

and the marginal density (normalizing constant) is given by

$$m(y) := \int_{\mathbb{R}^p} \int_{\mathbb{R}_+} f(y; \beta, \sigma)\, \pi(\beta, \sigma)\, d\sigma\, d\beta.$$

* Corresponding author. E-mail address: [email protected] (J.P. Hobert).

Unfortunately, any non-trivial prior on $(\beta, \sigma)$ leads to an intractable posterior. However, Kozumi and Kobayashi [9] showed that, if the usual normal/inverse gamma prior is adopted, then there is a simple Gibbs sampler that can be used to explore the resulting posterior density. Their algorithm exploits a latent data formulation of the quantile regression model that is based on a normal/exponential mixture representation of the asymmetric Laplace distribution [8, Chapter 3].

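Define $\theta = \theta(r) = \frac{1-2r}{r(1-r)}$ and $\tau^2 = \tau^2(r) = \frac{2}{r(1-r)}$. Let $\{(Y_i, Z_i)\}_{i=1}^n$ be independent random pairs such that $Y_i \mid Z_i = z_i \sim N(x_i^T\beta + \theta z_i,\, z_i\sigma\tau^2)$ and, marginally, $Z_i \sim \text{Exp}(\sigma)$. Straightforward calculations (provided in Appendix A) show that the marginal density of $Y_i$ is given by

$$
\begin{aligned}
&\int_0^\infty \frac{1}{\sqrt{2\pi z\sigma\tau^2}} \exp\Big\{-\frac{1}{2z\sigma\tau^2}\big(y_i - x_i^T\beta - \theta z\big)^2\Big\}\, \frac{1}{\sigma} \exp\{-z/\sigma\}\, dz \\
&\qquad = \frac{r(1-r)}{\sigma} \Big[e^{(1-r)(y_i - x_i^T\beta)/\sigma}\, I_{\mathbb{R}_-}(y_i - x_i^T\beta) + e^{-r(y_i - x_i^T\beta)/\sigma}\, I_{\mathbb{R}_+}(y_i - x_i^T\beta)\Big],
\end{aligned} \tag{3}
$$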

which is precisely the distribution of $Y_i$ under the original model. This establishes the $Z_i$s as latent data. Of course, the joint density of $\{(Y_i, Z_i)\}_{i=1}^n$ is given by

$$f^*(y, z; \beta, \sigma) = \prod_{i=1}^n \bigg[\frac{1}{\sqrt{2\pi z_i\sigma\tau^2}} \exp\Big\{-\frac{1}{2z_i\sigma\tau^2}\big(y_i - x_i^T\beta - \theta z_i\big)^2\Big\}\, \sigma^{-1} \exp\Big\{-\frac{z_i}{\sigma}\Big\}\bigg],$$

where $z = (z_1, \dots, z_n)^T$, and (3) implies that

$$\int_{\mathbb{R}_+^n} f^*(y, z; \beta, \sigma)\, dz = f(y; \beta, \sigma). \tag{4}$$
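The mixture representation behind (3) can be checked by simulation. The sketch below is only an illustration under assumptions chosen here ($x_i^T\beta = 0$, $r = 0.3$, $\sigma = 2$): it draws $Z$ from an exponential distribution with mean $\sigma$, then $Y$ from the conditional normal, and estimates $P(Y \le 0)$, which should be close to $r$ because the asymmetric Laplace error has $r$th quantile zero. The helper name `sample_y` is hypothetical.

```python
import math
import random

def sample_y(r, sigma, mean, rng):
    """One draw of Y from the normal/exponential mixture with location `mean`."""
    theta = (1 - 2 * r) / (r * (1 - r))
    tau2 = 2 / (r * (1 - r))
    z = rng.expovariate(1 / sigma)  # exponential with mean sigma
    return rng.gauss(mean + theta * z, math.sqrt(z * sigma * tau2))

rng = random.Random(1)
r, sigma = 0.3, 2.0
draws = [sample_y(r, sigma, 0.0, rng) for _ in range(200_000)]

# Fraction of draws at or below the location; should be close to r.
frac_below = sum(y <= 0 for y in draws) / len(draws)
```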

Combining the latent data model with the prior $\pi(\beta, \sigma)$ yields the augmented posterior density defined as

$$\pi(\beta, \sigma, z \mid y) = \frac{f^*(y, z; \beta, \sigma)\, \pi(\beta, \sigma)}{m(y)}. \tag{5}$$

It follows immediately from (4) that

$$\int_{\mathbb{R}_+^n} \pi(\beta, \sigma, z \mid y)\, dz = \pi(\beta, \sigma \mid y),$$

which is our target posterior density. The key fact underlying Kozumi and Kobayashi’s [9] Gibbs sampler is that, if a normal/inverse gamma prior is used for $(\beta, \sigma)$, then simulating from certain conditional densities associated with $\pi(\beta, \sigma, z \mid y)$ is straightforward. Indeed, assume that $\beta$ and $\sigma$ are a priori independent with $\beta \sim N_p(m, \Sigma)$ and $\sigma \sim \text{IG}(\alpha, \gamma)$. (We say $W \sim \text{IG}(a, b)$ if its density is proportional to $w^{-a-1} e^{-b/w}\, I_{\mathbb{R}_+}(w)$.) Then, given $(\beta, \sigma, y)$, the components of $Z = (Z_1, \dots, Z_n)^T$ are independent, and the reciprocal of $Z_i$ has an inverse Gaussian distribution. Moreover, $\beta \mid z, \sigma, y$ is multivariate normal, and $\sigma \mid z, \beta, y$ is inverted gamma. The precise forms of these conditional densities are provided in Section 2.

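Let $\{(\beta_m, \sigma_m)\}_{m=0}^\infty$ be a Markov chain (with state space $\mathbb{R}^p \times \mathbb{R}_+$) whose dynamics are defined (implicitly) through the following three-step procedure for moving from the current state, $(\beta_n, \sigma_n) = (\beta, \sigma)$, to $(\beta_{n+1}, \sigma_{n+1})$. (In this display only, $n$ indexes the iteration rather than the sample size.)

Iteration $n + 1$ of Kozumi and Kobayashi’s Gibbs sampler:
1. Draw $Z \sim \pi(\cdot \mid \beta, \sigma, y)$, and call the observed value $z$.
2. Draw $\sigma_{n+1} \sim \pi(\cdot \mid z, \beta, y)$.
3. Draw $\beta_{n+1} \sim \pi(\cdot \mid z, \sigma_{n+1}, y)$.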

In Section 2, the Markov transition density (Mtd) of the Gibbs Markov chain, $\{(\beta_m, \sigma_m)\}_{m=0}^\infty$, is defined and then used to establish that the chain is well behaved (i.e., Harris ergodic) and converges to the target posterior distribution. Thus, we can use this chain to construct strongly consistent estimators of intractable posterior expectations. To be specific, for $k > 0$, let $L^k(\pi)$ denote the set of functions $g : \mathbb{R}^p \times \mathbb{R}_+ \to \mathbb{R}$ such that

$$E_\pi |g|^k := \int_{\mathbb{R}^p} \int_{\mathbb{R}_+} |g(\beta, \sigma)|^k\, \pi(\beta, \sigma \mid y)\, d\sigma\, d\beta < \infty.$$

Harris ergodicity implies that, if $g \in L^1(\pi)$, then the estimator $\bar g_m := \frac{1}{m} \sum_{i=0}^{m-1} g(\beta_i, \sigma_i)$ is strongly consistent for $E_\pi g$, no matter how the chain is started. Of course, in practice, an estimator is only useful if it is possible to compute an associated standard error. All available methods of computing a valid asymptotic standard error for $\bar g_m$ are based on the existence of a central limit theorem (CLT) for $\bar g_m$; that is, we require that

$$\sqrt{m}\,\big(\bar g_m - E_\pi g\big) \xrightarrow{d} N(0, \varphi^2),$$

for some positive, finite $\varphi^2$. Unfortunately, even if $g \in L^k(\pi)$ for all $k > 0$, Harris ergodicity is not enough to guarantee the existence of such a CLT (see, e.g., [11,12]). The standard method of establishing the existence of CLTs is to prove that the underlying Markov chain converges at a geometric rate.
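To make the role of the CLT concrete, here is a minimal sketch of a batch-means standard error, one of the estimator families studied in [4]. It is illustrative only: the chain is a toy AR(1) process rather than the Gibbs chain of this paper, and the function name `batch_means_se` and the choice of 30 batches are assumptions made here.

```python
import math
import random

def batch_means_se(chain, n_batches=30):
    """Batch-means standard error of the sample mean of `chain`."""
    b = len(chain) // n_batches  # batch length
    means = [sum(chain[k * b:(k + 1) * b]) / b for k in range(n_batches)]
    grand = sum(means) / n_batches
    var_of_batch_means = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    # b * var_of_batch_means estimates phi^2; dividing by n_batches * b gives
    # the estimated variance of the overall sample mean.
    return math.sqrt(var_of_batch_means / n_batches)

# Toy chain: AR(1) with coefficient 0.5 and unit-variance innovations.
rng = random.Random(2)
x, chain = 0.0, []
for _ in range(60_000):
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    chain.append(x)

se = batch_means_se(chain)
```

For this toy chain the asymptotic variance is $1/(1-0.5)^2 = 4$, so the standard error should be near $\sqrt{4/60000} \approx 0.008$.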

Let $\mathcal{B}(\mathsf{X})$ denote the Borel sets in $\mathsf{X} := \mathbb{R}^p \times \mathbb{R}_+$, and let $P^m : \mathsf{X} \times \mathcal{B}(\mathsf{X}) \to [0, 1]$ denote the $m$-step Markov transition function of the Gibbs Markov chain. That is, $P^m\big((\beta, \sigma), A\big)$ is the probability that $(\beta_m, \sigma_m) \in A$, given that the chain is started at $(\beta_0, \sigma_0) = (\beta, \sigma)$. Also, let $\Pi(\cdot)$ denote the posterior distribution. The chain is called geometrically ergodic if there exist a function $M : \mathsf{X} \to [0, \infty)$ and a constant $\lambda \in [0, 1)$ such that, for all $(\beta, \sigma) \in \mathsf{X}$ and all $m = 0, 1, \dots$, we have

$$\big\| P^m\big((\beta, \sigma), \cdot\big) - \Pi(\cdot) \big\|_{TV} \le M(\beta, \sigma)\, \lambda^m,$$

where $\|\cdot\|_{TV}$ denotes the total variation norm. The relationship between geometric convergence and CLTs is simple: if the chain is geometrically ergodic and $E_\pi |g|^{2+\delta} < \infty$ for some $\delta > 0$, then $\bar g_m$ satisfies a CLT. Moreover, because the Mtd is strictly positive on $\mathsf{X}$ (see Section 2), the same $2 + \delta$ moment condition implies that the usual estimators of the asymptotic variance, $\varphi^2$, are consistent [2–5]. Our main result, which is proven in Section 3 using a geometric drift condition, is the following.

Proposition 1. Kozumi and Kobayashi’s [9] Gibbs Markov chain is geometrically ergodic.
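The definition of geometric ergodicity can be illustrated on a toy example where $M$ and $\lambda$ are available in closed form. The sketch below uses a two-state chain (an assumption made purely for illustration, unrelated to the Gibbs chain) and takes the total variation distance to be $\sup_A |P(A) - Q(A)|$; the name `tv_to_stationary` is hypothetical.

```python
def tv_to_stationary(a, b, m, start=0):
    """TV distance between the m-step law of a two-state chain and its stationary law.

    Transition matrix: P = [[1 - a, a], [b, 1 - b]].
    """
    pi0 = b / (a + b)                # stationary probability of state 0
    p0 = 1.0 if start == 0 else 0.0  # probability of state 0 at time 0
    for _ in range(m):
        p0 = p0 * (1 - a) + (1 - p0) * b
    return abs(p0 - pi0)             # on two points, sup_A |P(A) - Q(A)| = |p0 - pi0|

a, b = 0.3, 0.1
lam = abs(1 - a - b)                 # geometric rate (second eigenvalue of P)
M = a / (a + b)                      # distance at m = 0 when started in state 0
dists = [tv_to_stationary(a, b, m) for m in range(10)]
```

Here the bound holds with equality, `dists[m] == M * lam**m` up to rounding, whereas Proposition 1 asserts only the existence of such an $M$ and $\lambda$ for the Gibbs chain.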

We note that Khare and Hobert [6] considered a simplified version of our parametric Bayesian quantile regression model in which the scale parameter, $\sigma$, is known. The posterior density is still intractable in that case, despite the absence of a scale parameter. However, the latent data described above can be used to build a two-step Gibbs sampler for exploring that intractable posterior [9]. Khare and Hobert [6] established geometric ergodicity of the Markov chain underlying that algorithm. It is important to note that their result is not a special case of Proposition 1.

2. The conditional densities and the Gibbs Markov chain

Implementation of Kozumi and Kobayashi’s [9] algorithm is quite simple because all three conditional densities have standard forms. Indeed, since $\pi(\sigma \mid z, \beta, y) \propto \pi(\beta, \sigma, z \mid y)$, it is easy to see that $\sigma \mid z, \beta, y \sim \text{IG}(\alpha', \gamma')$ where

$$\alpha' = \alpha + \frac{3n}{2} \quad\text{and}\quad \gamma' = \gamma + \sum_{i=1}^n \frac{\big(y_i - x_i^T\beta - \theta z_i\big)^2}{2 z_i \tau^2} + \sum_{i=1}^n z_i.$$

Now, let $X$ be the $n \times p$ matrix with $i$th row equal to $x_i^T$. (Note that we do not assume that $n \ge p$.) Also, let $U$ denote an $n \times n$ diagonal matrix whose $i$th diagonal element is $(\sigma\tau^2 z_i)^{-1}$, and let $l$ denote an $n \times 1$ vector of ones. Standard Bayesian regression-type calculations show that $\beta \mid z, \sigma, y \sim N_p(m', \Sigma')$ where

$$m' = \big(X^T U X + \Sigma^{-1}\big)^{-1} \Big[X^T U y - \frac{\theta}{\sigma\tau^2} X^T l + \Sigma^{-1} m\Big] \quad\text{and}\quad \Sigma' = \big(X^T U X + \Sigma^{-1}\big)^{-1}.$$

Finally, it follows from (5) that the components of $Z = (Z_1, \dots, Z_n)^T$ are conditionally independent given $(\beta, \sigma, y)$, and

$$\pi(z_i \mid \beta, \sigma, y) \propto \frac{1}{\sqrt{z_i}} \exp\bigg\{-\frac{\big(y_i - x_i^T\beta\big)^2}{2 z_i \sigma\tau^2} - \frac{\big(\theta^2 + 2\tau^2\big) z_i}{2\sigma\tau^2}\bigg\}.$$

When $y_i - x_i^T\beta = 0$, this is a gamma density. Otherwise, it is the density of the reciprocal of an inverse Gaussian random variable with parameters

$$\mu_i = \frac{\sqrt{\theta^2 + 2\tau^2}}{|y_i - x_i^T\beta|} \quad\text{and}\quad \lambda_i = \frac{\theta^2 + 2\tau^2}{\sigma\tau^2}.$$

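In either case, we can write

$$\pi(z_i \mid \beta, \sigma, y) = \sqrt{\frac{\theta^2 + 2\tau^2}{2\pi\sigma\tau^2 z_i}}\, \exp\bigg\{-\frac{\big(y_i - x_i^T\beta\big)^2}{2 z_i \sigma\tau^2} + \frac{\sqrt{\theta^2 + 2\tau^2}\, |y_i - x_i^T\beta|}{\sigma\tau^2} - \frac{\big(\theta^2 + 2\tau^2\big) z_i}{2\sigma\tau^2}\bigg\}.$$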
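The three conditional densities above can be turned into a short sampler. The following is a minimal sketch, not the authors' implementation, and it assumes a single covariate ($p = 1$) so that no matrix algebra is needed. The inverse Gaussian draw uses the Michael-Schucany-Haas transformation, a technique chosen here and not taken from the paper; the names `rinvgauss` and `gibbs`, the prior settings, and the simulated data are all hypothetical.

```python
import math
import random

def rinvgauss(mu, lam, rng):
    """Inverse Gaussian(mu, lam) draw via the Michael-Schucany-Haas method."""
    nu = rng.gauss(0.0, 1.0) ** 2
    x = mu + mu * mu * nu / (2 * lam) - (mu / (2 * lam)) * math.sqrt(
        4 * mu * lam * nu + (mu * nu) ** 2)
    return x if rng.random() <= mu / (mu + x) else mu * mu / x

def gibbs(y, x, r, m0, s0, alpha, gamma, n_iter, rng):
    """Gibbs sampler for scalar beta ~ N(m0, s0) and sigma ~ IG(alpha, gamma)."""
    theta = (1 - 2 * r) / (r * (1 - r))
    tau2 = 2 / (r * (1 - r))
    c = theta ** 2 + 2 * tau2
    n, beta, sigma, out = len(y), m0, 1.0, []
    for _ in range(n_iter):
        # Step 1: z_i | beta, sigma, y (reciprocal inverse Gaussian, or gamma if residual is 0).
        z = []
        for yi, xi in zip(y, x):
            d = abs(yi - xi * beta)
            if d > 0:
                z.append(1.0 / rinvgauss(math.sqrt(c) / d, c / (sigma * tau2), rng))
            else:
                z.append(rng.gammavariate(0.5, 2 * sigma * tau2 / c))
        # Step 2: sigma | z, beta, y ~ IG(alpha', gamma'), drawn as 1 / Gamma.
        a1 = alpha + 1.5 * n
        g1 = gamma + sum((yi - xi * beta - theta * zi) ** 2 / (2 * zi * tau2) + zi
                         for yi, xi, zi in zip(y, x, z))
        sigma = 1.0 / rng.gammavariate(a1, 1.0 / g1)
        # Step 3: beta | z, sigma, y ~ N(m', Sigma') specialised to p = 1.
        prec = sum(xi * xi / (sigma * tau2 * zi) for xi, zi in zip(x, z)) + 1.0 / s0
        lin = (sum(xi * yi / (sigma * tau2 * zi) for xi, yi, zi in zip(x, y, z))
               - theta / (sigma * tau2) * sum(x) + m0 / s0)
        beta = rng.gauss(lin / prec, math.sqrt(1.0 / prec))
        out.append((beta, sigma))
    return out

rng = random.Random(3)
x = [rng.uniform(0.5, 1.5) for _ in range(50)]
y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
draws = gibbs(y, x, r=0.5, m0=0.0, s0=100.0, alpha=1.0, gamma=1.0, n_iter=1500, rng=rng)
post_mean_beta = sum(b for b, _ in draws[500:]) / 1000
```

With median regression ($r = 0.5$) on data simulated around slope 2, the posterior mean of $\beta$ after discarding 500 iterations should land near 2. The burn-in length is an arbitrary choice for this sketch.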

Let $\eta$ denote Lebesgue measure on $\mathbb{R}^p \times \mathbb{R}_+$. The Gibbs Markov chain has an Mtd (with respect to $\eta$) given by

$$k(\beta, \sigma \mid \beta', \sigma') = \int_{\mathbb{R}_+^n} \pi(\beta \mid \sigma, z, y)\, \pi(\sigma \mid z, \beta', y)\, \pi(z \mid \beta', \sigma', y)\, dz. \tag{6}$$

A straightforward calculation shows that

$$\int_{\mathbb{R}^p} \int_{\mathbb{R}_+} k(\beta, \sigma \mid \beta', \sigma')\, \pi(\beta', \sigma' \mid y)\, d\sigma'\, d\beta' = \pi(\beta, \sigma \mid y),$$

so the target density is invariant. The Mtd is strictly positive, which implies that the chain is aperiodic and $\eta$-irreducible [10, p. 87]. Moreover, the existence of an invariant probability density together with $\eta$-irreducibility implies that the chain is positive Harris recurrent (see, e.g., [1]). Note also that $\eta$ is equivalent to the maximal irreducibility measure.

3. The Gibbs Markov chain is geometrically ergodic

In this section, we prove Proposition 1 by establishing a geometric drift condition. In particular, we will prove the following result.

Proposition 2. There exist a $\rho \in [0, 1)$ and a finite constant $L$ such that, for every $(\beta', \sigma') \in \mathbb{R}^p \times \mathbb{R}_+$,

$$E\big[v(\beta, \sigma) \mid \beta', \sigma'\big] \le \rho\, v(\beta', \sigma') + L, \tag{7}$$

where the drift function is defined as

$$v(\beta, \sigma) = \sigma + \frac{1}{\sigma} + \sum_{i=1}^n \big(y_i - x_i^T\beta\big)^2 + \beta^T \Sigma^{-1} \beta.$$

The reason why the geometric drift condition (7) implies geometric ergodicity of the Markov chain is laid out in Appendix B.

Proof of Proposition 2. The expectation on the left-hand side of (7) can be broken down into three conditional expectations. Indeed,

$$
\begin{aligned}
E\big[v(\beta, \sigma) \mid \beta', \sigma'\big] &= \int_{\mathbb{R}_+} \int_{\mathbb{R}^p} v(\beta, \sigma)\, k(\beta, \sigma \mid \beta', \sigma')\, d\beta\, d\sigma \\
&= \int_{\mathbb{R}_+^n} \bigg[\int_{\mathbb{R}_+} \bigg[\int_{\mathbb{R}^p} v(\beta, \sigma)\, \pi(\beta \mid \sigma, z, y)\, d\beta\bigg] \pi(\sigma \mid z, \beta', y)\, d\sigma\bigg] \pi(z \mid \beta', \sigma', y)\, dz.
\end{aligned} \tag{8}
$$

Here is a brief outline of the remainder of the proof. First, we develop an upper bound of the form $b_1(\sigma) + c_1$ (where $c_1$ is constant) for the inner-most integral in (8). We then construct a function $b_2(z, \beta')$ such that $\int_{\mathbb{R}_+} b_1(\sigma)\, \pi(\sigma \mid z, \beta', y)\, d\sigma \le b_2(z, \beta') + c_2$. Finally, we show that $\int_{\mathbb{R}_+^n} b_2(z, \beta')\, \pi(z \mid \beta', \sigma', y)\, dz \le \rho\, v(\beta', \sigma') + c_3$, and the result follows immediately.

Before we begin analyzing the inner-most integral, we need a few definitions and facts. For a vector $a$, define $\|a\| = \sqrt{a^T a}$, and for a matrix $A$, define $\|A\| = \sup_{\|x\| = 1} \|Ax\|$. In general, $\|a + b\|^2 \le 2\|a\|^2 + 2\|b\|^2$, and $\|ABx\| \le \|A\| \|Bx\|$. Of course, $\sum_{i=1}^n \big(y_i - x_i^T\beta\big)^2 = \|y - X\beta\|^2$ and we have

$$\|y - X\beta\|^2 \le 2\|y\|^2 + 2\|X\beta\|^2 = 2\|y\|^2 + 2\big\|X\Sigma^{1/2}\Sigma^{-1/2}\beta\big\|^2 \le 2\|y\|^2 + 2\big\|X\Sigma^{1/2}\big\|^2 \big\|\Sigma^{-1/2}\beta\big\|^2. \tag{9}$$

It follows from (9) that

$$v(\beta, \sigma) \le \sigma + \frac{1}{\sigma} + 2\|y\|^2 + \Big(2\big\|X\Sigma^{1/2}\big\|^2 + 1\Big) \big\|\Sigma^{-1/2}\beta\big\|^2. \tag{10}$$

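Now using (10) we see that

$$\int_{\mathbb{R}^p} v(\beta, \sigma)\, \pi(\beta \mid \sigma, z, y)\, d\beta \le \sigma + \frac{1}{\sigma} + 2\|y\|^2 + \Big(2\big\|X\Sigma^{1/2}\big\|^2 + 1\Big)\, E\Big[\big\|\Sigma^{-1/2}\beta\big\|^2 \,\Big|\, \sigma, z, y\Big]. \tag{11}$$

Let $\tilde X = X\Sigma^{1/2}$. Given $(\sigma, z, y)$, $\Sigma^{-1/2}\beta$ is multivariate normal with mean

$$\big(\tilde X^T U \tilde X + I\big)^{-1} \Big[\tilde X^T U y - \frac{\theta}{\sigma\tau^2} \tilde X^T l + \Sigma^{-1/2} m\Big]$$

and covariance matrix $\big(\tilde X^T U \tilde X + I\big)^{-1}$. Therefore, letting $\tilde x_i$ denote the $i$th column of $\tilde X^T$, we have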

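$$
\begin{aligned}
E\Big[\big\|\Sigma^{-1/2}\beta\big\|^2 \,\Big|\, \sigma, z, y\Big]
&= \Big\|\big(\tilde X^T U \tilde X + I\big)^{-1} \Big[\tilde X^T U y - \frac{\theta}{\sigma\tau^2} \tilde X^T l + \Sigma^{-1/2} m\Big]\Big\|^2 + \operatorname{tr}\Big(\big(\tilde X^T U \tilde X + I\big)^{-1}\Big) \\
&\le 2\Big\|\big(\tilde X^T U \tilde X + I\big)^{-1} \tilde X^T U y\Big\|^2 + 2\Big\|\frac{\theta}{\sigma\tau^2} \tilde X^T l\Big\|^2 + 2\big\|\Sigma^{-1/2} m\big\|^2 + \operatorname{tr}(I) \\
&= 2\bigg\|\sum_{i=1}^n \bigg(\sum_{j=1}^n \frac{\tilde x_j \tilde x_j^T}{\sigma\tau^2 z_j} + I\bigg)^{-1} \frac{\tilde x_i y_i}{\sigma\tau^2 z_i}\bigg\|^2 + \frac{2\theta^2}{\sigma^2\tau^4} \big\|\tilde X^T l\big\|^2 + 2\big\|\Sigma^{-1/2} m\big\|^2 + p,
\end{aligned} \tag{12}
$$

where the inequality is due to the fact that $I - \big(\tilde X^T U \tilde X + I\big)^{-1}$ is non-negative definite. Now, the triangle inequality and some rearrangement yields

$$
\begin{aligned}
\bigg\|\sum_{i=1}^n \bigg(\sum_{j=1}^n \frac{\tilde x_j \tilde x_j^T}{\sigma\tau^2 z_j} + I\bigg)^{-1} \frac{\tilde x_i y_i}{\sigma\tau^2 z_i}\bigg\|^2
&\le \bigg(\sum_{i=1}^n \bigg\|\bigg(\frac{\tilde x_i \tilde x_i^T}{\sigma\tau^2 z_i} + \sum_{j \ne i} \frac{\tilde x_j \tilde x_j^T}{\sigma\tau^2 z_j} + I\bigg)^{-1} \frac{\tilde x_i y_i}{\sigma\tau^2 z_i}\bigg\|\bigg)^2 \\
&= \bigg(\sum_{i=1}^n |y_i|\, \bigg\|\bigg(\tilde x_i \tilde x_i^T + \sum_{j \ne i} \frac{z_i}{z_j} \tilde x_j \tilde x_j^T + \sigma\tau^2 z_i I\bigg)^{-1} \tilde x_i\bigg\|\bigg)^2.
\end{aligned} \tag{13}
$$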

We now employ the following result.

Lemma 1 ([6]). Fix $n \in \{2, 3, \dots\}$ and $p \in \mathbb{N}$, and let $t_1, \dots, t_n$ be vectors in $\mathbb{R}^p$. Then

$$C_{p,n}(t_1; t_2, \dots, t_n) := \sup_{c \in \mathbb{R}_+^n} t_1^T \bigg(t_1 t_1^T + \sum_{i=2}^n c_i t_i t_i^T + c_1 I\bigg)^{-2} t_1$$

is finite.

It follows from Lemma 1 that (13) is bounded above by a finite constant that we will call $C$. This fact combined with (12) yields

$$E\Big[\big\|\Sigma^{-1/2}\beta\big\|^2 \,\Big|\, \sigma, z, y\Big] \le 2C + \frac{2\theta^2}{\sigma^2\tau^4} \big\|\tilde X^T l\big\|^2 + 2\big\|\Sigma^{-1/2} m\big\|^2 + p. \tag{14}$$

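Combining (11) with (14), we have

$$\int_{\mathbb{R}^p} v(\beta, \sigma)\, \pi(\beta \mid \sigma, z, y)\, d\beta \le \sigma + \frac{1}{\sigma} + \frac{1}{\sigma^2} \bigg[\frac{2\theta^2 \big(2\|X\Sigma^{1/2}\|^2 + 1\big) \|\tilde X^T l\|^2}{\tau^4}\bigg] + C', \tag{15}$$

where

$$C' = 2\|y\|^2 + \Big(2\big\|X\Sigma^{1/2}\big\|^2 + 1\Big) \Big(2C + 2\big\|\Sigma^{-1/2} m\big\|^2 + p\Big).$$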

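The next step is to bound the integral of the right-hand side of (15) against $\pi(\sigma \mid z, \beta', y)$. First, note that

$$E\Big[\frac{1}{\sigma} \,\Big|\, \beta', z, y\Big] = \frac{\alpha'}{\gamma'} = \Big(\alpha + \frac{3n}{2}\Big) \bigg(\gamma + \sum_{i=1}^n \frac{\big(y_i - x_i^T\beta' - \theta z_i\big)^2}{2 z_i \tau^2} + \sum_{i=1}^n z_i\bigg)^{-1} \le \frac{2\alpha + 3n}{2\gamma}. \tag{16}$$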

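Similarly,

$$E\Big[\frac{1}{\sigma^2} \,\Big|\, \beta', z, y\Big] = \frac{\alpha'(\alpha' + 1)}{(\gamma')^2} = \Big(\alpha + \frac{3n}{2}\Big) \Big(\alpha + \frac{3n}{2} + 1\Big) \bigg(\gamma + \sum_{i=1}^n \frac{\big(y_i - x_i^T\beta' - \theta z_i\big)^2}{2 z_i \tau^2} + \sum_{i=1}^n z_i\bigg)^{-2} \le \frac{(2\alpha + 3n)(2\alpha + 3n + 2)}{4\gamma^2}. \tag{17}$$

Finally,

$$E\big[\sigma \mid \beta', z, y\big] = \frac{\gamma'}{\alpha' - 1} = \frac{2}{2\alpha + 3n - 2} \bigg(\gamma + \sum_{i=1}^n \frac{\big(y_i - x_i^T\beta' - \theta z_i\big)^2}{2 z_i \tau^2} + \sum_{i=1}^n z_i\bigg). \tag{18}$$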
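The moment formulas used in (16)–(18) follow from the fact that, if $W \sim \text{IG}(a, b)$ in the parametrization of the Introduction, then $1/W$ has a gamma distribution with shape $a$ and rate $b$. The sketch below checks the three identities by Monte Carlo for one arbitrary choice of $(a, b)$; the function name and the values are assumptions made for illustration.

```python
import random

def ig_moment_checks(a, b, n_draws, rng):
    """Monte Carlo estimates of E[1/W], E[1/W^2], E[W] for W ~ IG(a, b)."""
    # 1/W ~ Gamma(shape a, rate b); gammavariate takes (shape, scale).
    inv = [rng.gammavariate(a, 1.0 / b) for _ in range(n_draws)]
    m1 = sum(inv) / n_draws
    m2 = sum(v * v for v in inv) / n_draws
    mw = sum(1.0 / v for v in inv) / n_draws
    return m1, m2, mw

a, b = 6.0, 3.0
m1, m2, mw = ig_moment_checks(a, b, 200_000, random.Random(4))

# Closed forms: a / b, a (a + 1) / b^2, and b / (a - 1) (the last needs a > 1).
exact = (a / b, a * (a + 1) / b ** 2, b / (a - 1))
```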

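Now, combining (15)–(18), we have

$$\int_{\mathbb{R}_+} \bigg[\int_{\mathbb{R}^p} v(\beta, \sigma)\, \pi(\beta \mid \sigma, z, y)\, d\beta\bigg] \pi(\sigma \mid z, \beta', y)\, d\sigma \le \frac{2}{2\alpha + 3n - 2} \bigg[\sum_{i=1}^n \frac{\big(y_i - x_i^T\beta' - \theta z_i\big)^2}{2 z_i \tau^2} + \sum_{i=1}^n z_i\bigg] + C'', \tag{19}$$

where

$$C'' = \frac{2\alpha + 3n}{2\gamma} + \bigg[\frac{2\theta^2 \big(2\|X\Sigma^{1/2}\|^2 + 1\big) \|\tilde X^T l\|^2}{\tau^4}\bigg] \frac{(2\alpha + 3n)(2\alpha + 3n + 2)}{4\gamma^2} + \frac{2\gamma}{2\alpha + 3n - 2} + C'.$$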

The last step is to bound the integral of the right-hand side of (19) against $\pi(z \mid \beta', \sigma', y)$. First, note that

$$\sum_{i=1}^n \frac{\big(y_i - x_i^T\beta' - \theta z_i\big)^2}{2 z_i \tau^2} + \sum_{i=1}^n z_i = \frac{1}{2\tau^2} \sum_{i=1}^n \frac{\big(y_i - x_i^T\beta'\big)^2}{z_i} + \Big(\frac{\theta^2}{2\tau^2} + 1\Big) \sum_{i=1}^n z_i - \frac{\theta}{\tau^2} \sum_{i=1}^n \big(y_i - x_i^T\beta'\big). \tag{20}$$

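Assume for the moment that the $y_i - x_i^T\beta'$ are all non-zero. Then it follows from properties of the inverse Gaussian distribution that

$$E\big[z_i \mid \beta', \sigma', y\big] = \frac{1}{\mu_i} + \frac{1}{\lambda_i} = \frac{|y_i - x_i^T\beta'|}{\sqrt{\theta^2 + 2\tau^2}} + \frac{\sigma'\tau^2}{\theta^2 + 2\tau^2} \quad\text{and}\quad E\Big[\frac{1}{z_i} \,\Big|\, \beta', \sigma', y\Big] = \mu_i = \frac{\sqrt{\theta^2 + 2\tau^2}}{|y_i - x_i^T\beta'|}.$$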

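Thus, the integral of (20) against $\pi(z \mid \beta', \sigma', y)$ is equal to

$$\frac{\sqrt{\theta^2 + 2\tau^2}}{2\tau^2} \sum_{i=1}^n |y_i - x_i^T\beta'| + \Big(\frac{\theta^2}{2\tau^2} + 1\Big) \sum_{i=1}^n \frac{|y_i - x_i^T\beta'|}{\sqrt{\theta^2 + 2\tau^2}} + \frac{n\sigma'}{2} - \frac{\theta}{\tau^2} \sum_{i=1}^n \big(y_i - x_i^T\beta'\big). \tag{21}$$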

Now note that, if $y_i - x_i^T\beta' = 0$, then the only term containing $z_i$ on the right-hand side of (20) is $\big(\frac{\theta^2}{2\tau^2} + 1\big) z_i$, which has expectation $\sigma'/2$. Hence, (21) continues to hold even when $y_i - x_i^T\beta' = 0$ for some (or all) $i$. It is clear that (21) is bounded above by

$$\frac{\sqrt{\theta^2 + 2\tau^2}}{2\tau^2} \sum_{i=1}^n |y_i - x_i^T\beta'| + \Big(\frac{\theta^2}{2\tau^2} + 1\Big) \sum_{i=1}^n \frac{|y_i - x_i^T\beta'|}{\sqrt{\theta^2 + 2\tau^2}} + \frac{n\sigma'}{2} + \frac{\theta}{\tau^2} \sum_{i=1}^n |y_i - x_i^T\beta'|. \tag{22}$$

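Now, using the inequality $|x| \le (x^2 + 1)/2$ three times (twice with $x = |y_i - x_i^T\beta'|$ and once with $x = |y_i - x_i^T\beta'| / \sqrt{\theta^2 + 2\tau^2}$), we can show that (22) is bounded above by

$$\Big(\sqrt{\theta^2 + 2\tau^2} + 2\theta + 1\Big) \sum_{i=1}^n \frac{\big(y_i - x_i^T\beta'\big)^2}{4\tau^2} + \frac{n\sqrt{\theta^2 + 2\tau^2} + 2n\theta + n\big(\theta^2 + 2\tau^2\big)}{4\tau^2} + \frac{n\sigma'}{2}. \tag{23}$$

Combining (20)–(23), we have

$$
\begin{aligned}
&\int_{\mathbb{R}_+^n} \bigg[\sum_{i=1}^n \frac{\big(y_i - x_i^T\beta' - \theta z_i\big)^2}{2 z_i \tau^2} + \sum_{i=1}^n z_i\bigg] \pi(z \mid \beta', \sigma', y)\, dz \\
&\qquad \le \Big(\sqrt{\theta^2 + 2\tau^2} + 2\theta + 1\Big) \sum_{i=1}^n \frac{\big(y_i - x_i^T\beta'\big)^2}{4\tau^2} + \frac{n\sqrt{\theta^2 + 2\tau^2} + 2n\theta + n\big(\theta^2 + 2\tau^2\big)}{4\tau^2} + \frac{n\sigma'}{2}.
\end{aligned} \tag{24}
$$

Finally, (19) together with (24) yields

$$
\begin{aligned}
E\big[v(\beta, \sigma) \mid \beta', \sigma'\big] &= \int_{\mathbb{R}_+^n} \bigg[\int_{\mathbb{R}_+} \bigg[\int_{\mathbb{R}^p} v(\beta, \sigma)\, \pi(\beta \mid \sigma, z, y)\, d\beta\bigg] \pi(\sigma \mid z, \beta', y)\, d\sigma\bigg] \pi(z \mid \beta', \sigma', y)\, dz \\
&\le \frac{1}{2\alpha + 3n - 2} \bigg[\frac{\sqrt{\theta^2 + 2\tau^2} + 2\theta + 1}{2\tau^2} \sum_{i=1}^n \big(y_i - x_i^T\beta'\big)^2 + n\sigma'\bigg] + L,
\end{aligned} \tag{25}
$$

where

$$L = \frac{2}{2\alpha + 3n - 2} \bigg[\frac{n\sqrt{\theta^2 + 2\tau^2} + 2n\theta + n\big(\theta^2 + 2\tau^2\big)}{4\tau^2}\bigg] + C''.$$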

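Now, recalling that $\theta = \frac{1-2r}{r(1-r)}$ and $\tau^2 = \frac{2}{r(1-r)}$, we have

$$\frac{\sqrt{\theta^2 + 2\tau^2} + 2\theta + 1}{2\tau^2} = \frac{1}{4} + \frac{1 - 2r}{2} + \frac{r(1-r)}{4} \le \frac{1}{4} + \frac{1}{2} + \frac{1}{16} < 1.$$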
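The identity above rests on the simplification $\theta^2 + 2\tau^2 = 1/\big(r(1-r)\big)^2$. As a quick numerical sanity check (a sketch with hypothetical function names, not part of the proof), one can compare both sides over a grid of $r$ values and confirm that the coefficient stays below 1:

```python
import math

def drift_coefficient(r):
    """(sqrt(theta^2 + 2 tau^2) + 2 theta + 1) / (2 tau^2) for a given r."""
    theta = (1 - 2 * r) / (r * (1 - r))
    tau2 = 2 / (r * (1 - r))
    return (math.sqrt(theta ** 2 + 2 * tau2) + 2 * theta + 1) / (2 * tau2)

def closed_form(r):
    """Simplified expression 1/4 + (1 - 2r)/2 + r(1 - r)/4."""
    return 0.25 + (1 - 2 * r) / 2 + r * (1 - r) / 4

grid = [k / 1000 for k in range(1, 1000)]
max_gap = max(abs(drift_coefficient(r) - closed_form(r)) for r in grid)
largest = max(closed_form(r) for r in grid)
```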

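This fact in conjunction with (25) leads to

$$
\begin{aligned}
E\big[v(\beta, \sigma) \mid \beta', \sigma'\big] &\le \frac{1}{2\alpha + 3n - 2} \bigg[n \sum_{i=1}^n \big(y_i - x_i^T\beta'\big)^2 + n\sigma'\bigg] + L \\
&= \frac{n}{2\alpha + 3n - 2} \bigg[\sum_{i=1}^n \big(y_i - x_i^T\beta'\big)^2 + \sigma'\bigg] + L \\
&\le \frac{n}{2\alpha + 3n - 2}\, v(\beta', \sigma') + L \\
&= \rho(n, \alpha)\, v(\beta', \sigma') + L,
\end{aligned}
$$

where $\rho(n, \alpha) = n/(2\alpha + 3n - 2)$. Since $n \ge 1$ and $\alpha > 0$, $\rho(n, \alpha) < 1$ and the proof is complete.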
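For a sense of scale, the drift rate $\rho(n, \alpha) = n/(2\alpha + 3n - 2)$ can be tabulated for a few values. The sample sizes and prior shapes below are arbitrary choices made for illustration.

```python
def rho(n, alpha):
    """Drift rate rho(n, alpha) = n / (2 alpha + 3 n - 2) from the proof."""
    return n / (2 * alpha + 3 * n - 2)

values = {(n, alpha): rho(n, alpha)
          for n in (1, 10, 1000)
          for alpha in (0.01, 1.0, 5.0)}
```

For fixed $\alpha$ the rate tends to $1/3$ as $n$ grows, and it approaches 1 only when $n = 1$ and $\alpha$ is close to 0.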

4. Discussion

We have established the existence of a function $M : \mathbb{R}^p \times \mathbb{R}_+ \to [0, \infty)$ and a constant $\lambda \in [0, 1)$ such that, for all $(\beta, \sigma) \in \mathbb{R}^p \times \mathbb{R}_+$ and all $m = 0, 1, 2, \dots$,

$$\big\| P^m\big((\beta, \sigma), \cdot\big) - \Pi(\cdot) \big\|_{TV} \le M(\beta, \sigma)\, \lambda^m.$$

This is a qualitative geometric convergence result in the sense that we have not actually identified $M$ and $\lambda$. However, as explained in the Introduction, this qualitative result is enough to guarantee the existence of CLTs. On the other hand, there are techniques for constructing $M$ and $\lambda$ (see, e.g., [13]), and these require both a drift condition (with explicit formulas for $\rho$ and $L$), and an associated minorization condition. We have provided an explicit formula for $\rho$. Indeed, $\rho = \rho(n, \alpha) = n/(2\alpha + 3n - 2)$. However, we do not have an explicit formula for $L$. The sole reason for this is that

Acknowledgments

The first author was supported by NSF Grant DMS-11-06084, and the second author by NSF Grants DMS-08-05860 & DMS-11-06395.

Appendix A. The marginal of $Y_i$ under the two-stage hierarchy

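Here we establish (3). First,

$$
\begin{aligned}
&\int_0^\infty \frac{1}{\sqrt{2\pi z\sigma\tau^2}} \exp\Big\{-\frac{1}{2z\sigma\tau^2}\big(y_i - x_i^T\beta - \theta z\big)^2\Big\}\, \frac{1}{\sigma} \exp\{-z/\sigma\}\, dz \\
&\qquad = \frac{1}{\sqrt{2\pi\sigma^3\tau^2}} \exp\Big\{\frac{\theta\big(y_i - x_i^T\beta\big)}{\sigma\tau^2}\Big\} \int_0^\infty \frac{1}{\sqrt{z}} \exp\Big\{-\frac{1}{2z\sigma\tau^2}\Big[\big(y_i - x_i^T\beta\big)^2 + z^2\big(\theta^2 + 2\tau^2\big)\Big]\Big\}\, dz.
\end{aligned}
$$

Now

$$
\begin{aligned}
&\int_0^\infty \frac{1}{\sqrt{z}} \exp\Big\{-\frac{1}{2z\sigma\tau^2}\Big[\big(y_i - x_i^T\beta\big)^2 + z^2\big(\theta^2 + 2\tau^2\big)\Big]\Big\}\, dz \\
&\qquad = \int_0^\infty \frac{1}{w^{3/2}} \exp\Big\{-\frac{1}{2w\sigma\tau^2}\Big[w^2\big(y_i - x_i^T\beta\big)^2 + \big(\theta^2 + 2\tau^2\big)\Big]\Big\}\, dw \\
&\qquad = \frac{\sqrt{2\pi\sigma\tau^2}}{\sqrt{\theta^2 + 2\tau^2}} \exp\Big\{-\frac{\sqrt{\theta^2 + 2\tau^2}\, |y_i - x_i^T\beta|}{\sigma\tau^2}\Big\},
\end{aligned}
$$

where the first equality follows from the transformation $w = 1/z$, and the second follows from the fact that the inverse Gaussian density integrates to unity. Now, putting things back together and using the definitions of $\theta$ and $\tau^2$ (which give $\sqrt{\theta^2 + 2\tau^2} = 1/(r(1-r))$), we see that the marginal density of $Y_i$ is

$$
\begin{aligned}
&\frac{1}{\sqrt{2\pi\sigma^3\tau^2}} \exp\Big\{\frac{\theta\big(y_i - x_i^T\beta\big)}{\sigma\tau^2}\Big\}\, \frac{\sqrt{2\pi\sigma\tau^2}}{\sqrt{\theta^2 + 2\tau^2}} \exp\Big\{-\frac{\sqrt{\theta^2 + 2\tau^2}\, |y_i - x_i^T\beta|}{\sigma\tau^2}\Big\} \\
&\qquad = \frac{r(1-r)}{\sigma} \exp\Big\{\frac{(1 - 2r)\big(y_i - x_i^T\beta\big)}{2\sigma} - \frac{|y_i - x_i^T\beta|}{2\sigma}\Big\} \\
&\qquad = \frac{r(1-r)}{\sigma} \Big[e^{(1-r)(y_i - x_i^T\beta)/\sigma}\, I_{\mathbb{R}_-}(y_i - x_i^T\beta) + e^{-r(y_i - x_i^T\beta)/\sigma}\, I_{\mathbb{R}_+}(y_i - x_i^T\beta)\Big].
\end{aligned}
$$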

Appendix B. The drift condition implies geometric convergence

Recall that the drift function is given by

$$v(\beta, \sigma) = \sigma + \frac{1}{\sigma} + \sum_{i=1}^n \big(y_i - x_i^T\beta\big)^2 + \beta^T \Sigma^{-1} \beta.$$

We now show that this function is unbounded off compact sets; that is, for every $d \in \mathbb{R}$, the set

$$S_d = \bigg\{(\beta, \sigma) \in \mathbb{R}^p \times \mathbb{R}_+ : \sigma + \frac{1}{\sigma} + \sum_{i=1}^n \big(y_i - x_i^T\beta\big)^2 + \beta^T \Sigma^{-1} \beta \le d\bigg\}$$

is compact. If $d$ is such that $S_d = \emptyset$, then $S_d$ is clearly compact. So assume that $S_d$ is non-empty. Since $v(\beta, \sigma)$ is continuous, $S_d$ is closed, so it suffices to show that $|\beta_i|$ is bounded for each $i \in \{1, 2, \dots, p\}$, and that $\sigma$ is bounded away from both 0 and $\infty$. Since $\sigma + 1/\sigma \le d$, $\sigma$ is clearly contained as specified. Furthermore, since $\Sigma^{-1}$ is positive definite, the condition $\beta^T \Sigma^{-1} \beta \le d$ implies that the $|\beta_i|$ are all bounded. Hence, $v(\beta, \sigma)$ is unbounded off compact sets.

Because the product $\pi(\sigma \mid z, \beta', y)\, \pi(z \mid \beta', \sigma', y)$ is continuous in $(\beta', \sigma')$, a standard argument using Fatou’s Lemma can be used to show that the Gibbs Markov chain $\{(\beta_m, \sigma_m)\}_{m=0}^\infty$ is a Feller chain [10, p. 127]. Hence, Meyn and Tweedie’s [10] Theorem 6.0.1 implies that all compact sets in $\mathbb{R}^p \times \mathbb{R}_+$ are petite sets for the chain. Therefore, the drift function $v(\beta, \sigma)$ is unbounded off petite sets [10, p. 191]. It now follows from [10] Lemma 15.2.8 that the geometric drift condition in

References

[1] S. Asmussen, P.W. Glynn, A new proof of convergence of MCMC via the ergodic theorem, Statistics & Probability Letters 81 (2011) 1482–1485.

[2] W. Bednorz, K. Łatuszyński, A few remarks on “Fixed-width output analysis for Markov chain Monte Carlo” by Jones et al., Journal of the American Statistical Association 102 (2007) 1485–1486.

[3] J.M. Flegal, M. Haran, G.L. Jones, Markov chain Monte Carlo: can we trust the third significant figure? Statistical Science 23 (2008) 250–260.

[4] J.M. Flegal, G.L. Jones, Batch means and spectral variance estimators in Markov chain Monte Carlo, The Annals of Statistics 38 (2010) 1034–1070.

[5] G.L. Jones, M. Haran, B.S. Caffo, R. Neath, Fixed-width output analysis for Markov chain Monte Carlo, Journal of the American Statistical Association 101 (2006) 1537–1547.

[6] K. Khare, J.P. Hobert, A spectral analytic comparison of trace-class data augmentation algorithms and their sandwich variants, The Annals of Statistics 39 (2011) 2585–2606.

[7] R. Koenker, Quantile Regression, in: Econometric Society Monographs, vol. 38, Cambridge University Press, Cambridge, 2005.

[8] S. Kotz, T.J. Kozubowski, K. Podgórski, The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance, Birkhäuser, Boston, 2001.

[9] H. Kozumi, G. Kobayashi, Gibbs sampling methods for Bayesian quantile regression, Journal of Statistical Computation and Simulation 81 (2011) 1565–1578.

[10] S.P. Meyn, R.L. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, London, 1993.

[11] G.O. Roberts, J.S. Rosenthal, Markov chain Monte Carlo: some practical implications of theoretical results (with discussion), Canadian Journal of Statistics 26 (1998) 5–31.

[12] G.O. Roberts, J.S. Rosenthal, General state space Markov chains and MCMC algorithms, Probability Surveys 1 (2004) 20–71.

[13] J.S. Rosenthal, Minorization conditions and convergence rates for Markov chain Monte Carlo, Journal of the American Statistical Association 90 (1995) 558–566.