Journal of Multivariate Analysis
journal homepage: www.elsevier.com/locate/jmva

Geometric ergodicity of the Gibbs sampler for Bayesian quantile regression

Kshitij Khare, James P. Hobert*
Department of Statistics, University of Florida, United States

Article info
Article history: Received 10 October 2011. Available online 7 June 2012.
AMS subject classifications: primary 60J27; secondary 62F15.
Keywords: Convergence rate; Geometric drift condition; Markov chain; Monte Carlo

Abstract
Consider the quantile regression model $Y = X\beta + \sigma\epsilon$, where the components of $\epsilon$ are i.i.d. errors from the asymmetric Laplace distribution with $r$th quantile equal to 0, where $r \in (0,1)$ is fixed. Kozumi and Kobayashi (2011) [9] introduced a Gibbs sampler that can be used to explore the intractable posterior density that results when the quantile regression likelihood is combined with the usual normal/inverse gamma prior for $(\beta, \sigma)$. In this paper, the Markov chain underlying Kozumi and Kobayashi's (2011) [9] algorithm is shown to converge at a geometric rate. No assumptions are made about the dimension of $X$, so the result still holds in the "large $p$, small $n$" case.
© 2012 Elsevier Inc. All rights reserved.
1. Introduction
In the usual quantile regression model, the conditional quantile function of $Y$ given $X = x$ takes the form $Q(r \mid X = x) = x^T \beta(r)$, where $x$ is a $p \times 1$ vector of covariates and, for fixed $r \in (0,1)$, $\beta(r)$ is a $p \times 1$ regression parameter. The standard (frequentist) estimator of $\beta(r)$ based on a sample of size $n$ is the minimizer of
$$\sum_{i=1}^n \rho_r\big(Y_i - x_i^T \beta\big), \tag{1}$$
where the loss function $\rho_r$ is defined as $\rho_r(u) = u\,[r - I(u < 0)]$ (see, e.g., [7]). Yu and Moyeed [14] pointed out that the minimizer of (1) is, in fact, the maximum likelihood estimator of $\beta$ under the fully parametric model $Y_i = x_i^T \beta + \epsilon_i$, where $\{\epsilon_i\}_{i=1}^n$ are assumed to be i.i.d. with common density given by
$$g(\epsilon; r) = r(1-r)\Big[e^{(1-r)\epsilon} I_{\mathbb{R}_-}(\epsilon) + e^{-r\epsilon} I_{\mathbb{R}_+}(\epsilon)\Big], \tag{2}$$
where $\mathbb{R}_+ := (0, \infty)$ and $\mathbb{R}_- := (-\infty, 0]$. It is easy to see that this error density, which is called the asymmetric Laplace density, has $r$th quantile equal to zero. (When $r = 1/2$, $g$ becomes the standard Laplace density with location and scale equal to 0 and 1/2, respectively.)

* Corresponding author.
E-mail address: [email protected] (J.P. Hobert).

In this paper, we consider a Bayesian version of a fully parametric quantile regression model in which the errors are from an unknown member of a scale family based on the asymmetric Laplace distribution. In particular, we assume that $Y_i = x_i^T \beta + \sigma \epsilon_i$, where $\{\epsilon_i\}_{i=1}^n$ are i.i.d. with common density (2) and $\sigma \in \mathbb{R}_+$ is an unknown scale parameter. We do not assume that $n \ge p$. Suppose that $\pi(\beta, \sigma)$ is a proper prior density for $(\beta, \sigma)$. The posterior density of $(\beta, \sigma)$ given the data, $y = (y_1, \dots, y_n)^T$, is defined to be
$$\pi(\beta, \sigma \mid y) = \frac{f(y; \beta, \sigma)\, \pi(\beta, \sigma)}{m(y)},$$
where $f(y; \beta, \sigma)$ is the joint density of $Y_1, \dots, Y_n$ at the point $y$, that is,
$$f(y; \beta, \sigma) = r^n (1-r)^n \sigma^{-n} \prod_{i=1}^n \Big[e^{(1-r)(y_i - x_i^T\beta)/\sigma} I_{\mathbb{R}_-}(y_i - x_i^T\beta) + e^{-r(y_i - x_i^T\beta)/\sigma} I_{\mathbb{R}_+}(y_i - x_i^T\beta)\Big],$$
and the marginal density (normalizing constant) is given by
$$m(y) := \int_{\mathbb{R}^p} \int_{\mathbb{R}_+} f(y; \beta, \sigma)\, \pi(\beta, \sigma)\, d\sigma\, d\beta.$$
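The link between the check loss in (1) and the density in (2) can be made concrete with a short sketch. It uses the observation that $g(\epsilon; r) = r(1-r)\exp\{-\rho_r(\epsilon)\}$, so the log-likelihood is a constant minus the scaled check loss. The function names, the integration range, and the step count are illustrative assumptions, not part of the paper.

```python
import math

def check_loss(u, r):
    """Check loss rho_r(u) = u * (r - I(u < 0)) from Eq. (1)."""
    return u * (r - (1.0 if u < 0 else 0.0))

def ald_density(eps, r):
    """Asymmetric Laplace density g(eps; r) from Eq. (2)."""
    if eps <= 0:
        return r * (1 - r) * math.exp((1 - r) * eps)
    return r * (1 - r) * math.exp(-r * eps)

def log_likelihood(y, X, beta, sigma, r):
    """log f(y; beta, sigma); X is a list of covariate rows (lists)."""
    n = len(y)
    total = n * (math.log(r) + math.log(1 - r) - math.log(sigma))
    for yi, xi in zip(y, X):
        resid = yi - sum(a * b for a, b in zip(xi, beta))
        # each factor of f equals exp(-rho_r(resid / sigma)) = exp(-rho_r(resid) / sigma)
        total -= check_loss(resid, r) / sigma
    return total

def mass_below_zero(r, lower=-200.0, steps=200000):
    """Midpoint-rule approximation of the integral of g over (lower, 0]."""
    h = -lower / steps
    return h * sum(ald_density(lower + (k + 0.5) * h, r) for k in range(steps))
```

Under these assumptions, `mass_below_zero(r)` should be close to `r`, which is the statement that the $r$th quantile of the error density is zero.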
Unfortunately, any non-trivial prior on $(\beta, \sigma)$ leads to an intractable posterior. However, Kozumi and Kobayashi [9] showed that, if the usual normal/inverse gamma prior is adopted, then there is a simple Gibbs sampler that can be used to explore the resulting posterior density. Their algorithm exploits a latent data formulation of the quantile regression model that is based on a normal/exponential mixture representation of the asymmetric Laplace distribution [8, Chapter 3].

Define
$$\theta = \theta(r) = \frac{1-2r}{r(1-r)} \quad\text{and}\quad \tau^2 = \tau^2(r) = \frac{2}{r(1-r)}.$$
Let $\{(Y_i, Z_i)\}_{i=1}^n$ be independent random pairs such that $Y_i \mid Z_i = z_i \sim N(x_i^T\beta + \theta z_i,\; z_i \sigma \tau^2)$ and, marginally, $Z_i \sim \mathrm{Exp}(\sigma)$. Straightforward calculations (provided in Appendix A) show that the marginal density of $Y_i$ is given by
$$\int_0^\infty \frac{1}{\sqrt{2\pi z \sigma \tau^2}} \exp\Big\{-\frac{1}{2 z \sigma \tau^2}\big(y_i - x_i^T\beta - \theta z\big)^2\Big\} \frac{1}{\sigma} \exp\{-z/\sigma\}\, dz = \frac{r(1-r)}{\sigma}\Big[e^{(1-r)(y_i - x_i^T\beta)/\sigma} I_{\mathbb{R}_-}(y_i - x_i^T\beta) + e^{-r(y_i - x_i^T\beta)/\sigma} I_{\mathbb{R}_+}(y_i - x_i^T\beta)\Big], \tag{3}$$
which is precisely the density of $Y_i$ under the original model. This establishes the $Z_i$s as latent data. Of course, the joint density of $\{(Y_i, Z_i)\}_{i=1}^n$ is given by
$$f^*(y, z; \beta, \sigma) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi z_i \sigma \tau^2}} \exp\Big\{-\frac{1}{2 z_i \sigma \tau^2}\big(y_i - x_i^T\beta - \theta z_i\big)^2\Big\}\, \sigma^{-1} \exp\Big\{-\frac{z_i}{\sigma}\Big\},$$
where $z = (z_1, \dots, z_n)^T$, and (3) implies that
$$\int_{\mathbb{R}_+^n} f^*(y, z; \beta, \sigma)\, dz = f(y; \beta, \sigma). \tag{4}$$
Combining the latent data model with the prior $\pi(\beta, \sigma)$ yields the augmented posterior density defined as
$$\pi(\beta, \sigma, z \mid y) = \frac{f^*(y, z; \beta, \sigma)\, \pi(\beta, \sigma)}{m(y)}. \tag{5}$$
It follows immediately from (4) that
$$\int_{\mathbb{R}_+^n} \pi(\beta, \sigma, z \mid y)\, dz = \pi(\beta, \sigma \mid y),$$
which is our target posterior density. The key fact underlying Kozumi and Kobayashi's [9] Gibbs sampler is that, if a normal/inverse gamma prior is used for $(\beta, \sigma)$, then simulating from certain conditional densities associated with $\pi(\beta, \sigma, z \mid y)$ is straightforward. Indeed, assume that $\beta$ and $\sigma$ are a priori independent with $\beta \sim N_p(m, \Sigma)$ and $\sigma \sim \mathrm{IG}(\alpha, \gamma)$. (We say $W \sim \mathrm{IG}(a, b)$ if its density is proportional to $w^{-a-1} e^{-b/w} I_{\mathbb{R}_+}(w)$.) Then, given $(\beta, \sigma, y)$, the components of $Z = (Z_1, \dots, Z_n)^T$ are independent, and the reciprocal of $Z_i$ has an inverse Gaussian distribution. Moreover, $\beta \mid z, \sigma, y$ is multivariate normal, and $\sigma \mid z, \beta, y$ is inverted gamma. The precise forms of these conditional densities are provided in Section 2.
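The normal/exponential mixture identity (3) can be checked numerically. The sketch below evaluates the left-hand side by a midpoint rule after the substitution $z = u^2$ (which removes the $z^{-1/2}$ factor) and compares it with the closed form on the right-hand side. The function names, the truncation point of the integral, and the step count are assumptions made for illustration only.

```python
import math

def theta_tau2(r):
    """Return (theta, tau^2) as defined in the text for quantile level r."""
    theta = (1 - 2 * r) / (r * (1 - r))
    tau2 = 2 / (r * (1 - r))
    return theta, tau2

def ald_scale_density(resid, sigma, r):
    """Right-hand side of (3), evaluated at resid = y_i - x_i^T beta."""
    rate = (1 - r) if resid <= 0 else -r
    return r * (1 - r) / sigma * math.exp(rate * resid / sigma)

def mixture_density(resid, sigma, r, steps=20000):
    """Left-hand side of (3) by the midpoint rule in u, where z = u**2."""
    theta, tau2 = theta_tau2(r)
    s = sigma * tau2
    # assumed truncation point; the integrand is negligible beyond it
    u_max = math.sqrt(80 * sigma + 10 * abs(resid))
    h = u_max / steps
    # dz = 2u du cancels the 1/sqrt(z) = 1/u factor of the normal density
    const = 2 / (sigma * math.sqrt(2 * math.pi * s))
    total = 0.0
    for k in range(steps):
        z = ((k + 0.5) * h) ** 2
        total += math.exp(-(resid - theta * z) ** 2 / (2 * z * s) - z / sigma)
    return const * total * h
```

For example, `mixture_density(-1.5, 2.0, 0.3)` and `ald_scale_density(-1.5, 2.0, 0.3)` should agree to several decimal places.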
Let $\{(\beta_m, \sigma_m)\}_{m=0}^\infty$ be a Markov chain (with state space $\mathbb{R}^p \times \mathbb{R}_+$) whose dynamics are defined (implicitly) through the following three-step procedure for moving from the current state, $(\beta_n, \sigma_n) = (\beta, \sigma)$, to $(\beta_{n+1}, \sigma_{n+1})$.

Iteration $n+1$ of Kozumi and Kobayashi's Gibbs sampler:
1. Draw $Z \sim \pi(\cdot \mid \beta, \sigma, y)$, and call the observed value $z$.
2. Draw $\sigma_{n+1} \sim \pi(\cdot \mid z, \beta, y)$.
3. Draw $\beta_{n+1} \sim \pi(\cdot \mid z, \sigma_{n+1}, y)$.

In Section 2, the Markov transition density (Mtd) of the Gibbs Markov chain, $\{(\beta_m, \sigma_m)\}_{m=0}^\infty$, is defined and then used to establish that the chain is well behaved (i.e., Harris ergodic) and converges to the target posterior distribution. Thus, we can use this chain to construct strongly consistent estimators of intractable posterior expectations. To be specific, for $k > 0$, let $L^k(\pi)$ denote the set of functions $g: \mathbb{R}^p \times \mathbb{R}_+ \to \mathbb{R}$ such that
$$E_\pi |g|^k := \int_{\mathbb{R}^p} \int_{\mathbb{R}_+} |g(\beta, \sigma)|^k\, \pi(\beta, \sigma \mid y)\, d\sigma\, d\beta < \infty.$$
Harris ergodicity implies that, if $g \in L^1(\pi)$, then the estimator $\bar{g}_m := \frac{1}{m} \sum_{i=0}^{m-1} g(\beta_i, \sigma_i)$ is strongly consistent for $E_\pi g$, no matter how the chain is started. Of course, in practice, an estimator is only useful if it is possible to compute an associated standard error. All available methods of computing a valid asymptotic standard error for $\bar{g}_m$ are based on the existence of a central limit theorem (CLT) for $\bar{g}_m$; that is, we require that
$$\sqrt{m}\,\big(\bar{g}_m - E_\pi g\big) \xrightarrow{d} N(0, \phi^2),$$
for some positive, finite $\phi^2$. Unfortunately, even if $g \in L^k(\pi)$ for all $k > 0$, Harris ergodicity is not enough to guarantee the existence of such a CLT (see, e.g., [11,12]). The standard method of establishing the existence of CLTs is to prove that the underlying Markov chain converges at a geometric rate.
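To illustrate how a CLT is used in practice, here is a minimal sketch of a batch means standard error for $\bar{g}_m$, one of the estimators of $\phi^2$ studied in the references cited for output analysis. The function name, the default choice of roughly $\sqrt{m}$ batches, and the decision to discard any leftover draws are assumptions of this sketch, not prescriptions from the paper.

```python
import math

def batch_means_se(values, n_batches=None):
    """Return (mean, standard error) of `values` using non-overlapping batch means.

    `values` holds g(beta_i, sigma_i) along the chain. The chain is cut into
    n_batches consecutive batches of equal length b; draws left over at the
    end are ignored (an assumption made for simplicity).
    """
    m = len(values)
    if n_batches is None:
        n_batches = int(math.sqrt(m))  # assumed default: about sqrt(m) batches
    if n_batches < 2:
        raise ValueError("need at least two batches")
    b = m // n_batches
    if b < 1:
        raise ValueError("chain too short for the requested number of batches")
    used = values[: n_batches * b]
    overall = sum(used) / len(used)
    batch_avgs = [sum(used[k * b:(k + 1) * b]) / b for k in range(n_batches)]
    # b times the sample variance of the batch averages estimates phi^2
    phi2_hat = b * sum((a - overall) ** 2 for a in batch_avgs) / (n_batches - 1)
    return overall, math.sqrt(phi2_hat / len(used))
```

The returned standard error is only meaningful when a CLT holds, which is exactly what geometric ergodicity plus a moment condition on $g$ provides.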
Let $\mathcal{B}(\mathsf{X})$ denote the Borel sets in $\mathsf{X} := \mathbb{R}^p \times \mathbb{R}_+$, and let $P^m: \mathsf{X} \times \mathcal{B}(\mathsf{X}) \to [0, 1]$ denote the $m$-step Markov transition function of the Gibbs Markov chain. That is, $P^m\big((\beta, \sigma), A\big)$ is the probability that $(\beta_m, \sigma_m) \in A$, given that the chain is started at $(\beta_0, \sigma_0) = (\beta, \sigma)$. Also, let $\Pi(\cdot)$ denote the posterior distribution. The chain is called geometrically ergodic if there exist a function $M: \mathsf{X} \to [0, \infty)$ and a constant $\lambda \in [0, 1)$ such that, for all $(\beta, \sigma) \in \mathsf{X}$ and all $m = 0, 1, \dots$, we have
$$\big\| P^m\big((\beta, \sigma), \cdot\big) - \Pi(\cdot) \big\|_{\mathrm{TV}} \le M(\beta, \sigma)\, \lambda^m,$$
where $\|\cdot\|_{\mathrm{TV}}$ denotes the total variation norm. The relationship between geometric convergence and CLTs is simple: if the chain is geometrically ergodic and $E_\pi |g|^{2+\delta} < \infty$ for some $\delta > 0$, then $\bar{g}_m$ satisfies a CLT. Moreover, because the Mtd is strictly positive on $\mathsf{X}$ (see Section 2), the same $2 + \delta$ moment condition implies that the usual estimators of the asymptotic variance, $\phi^2$, are consistent [2–5]. Our main result, which is proven in Section 3 using a geometric drift condition, is the following.

Proposition 1. Kozumi and Kobayashi's [9] Gibbs Markov chain is geometrically ergodic.
We note that Khare and Hobert [6] considered a simplified version of our parametric Bayesian quantile regression model in which the scale parameter, $\sigma$, is known. The posterior density is still intractable in that case, despite the absence of a scale parameter. However, the latent data described above can be used to build a two-step Gibbs sampler for exploring that intractable posterior [9]. Khare and Hobert [6] established geometric ergodicity of the Markov chain underlying that algorithm. It is important to note that their result is not a special case of Proposition 1.

2. The conditional densities and the Gibbs Markov chain
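Implementation of Kozumi and Kobayashi's [9] algorithm is quite simple because all three conditional densities have standard forms. Indeed, since $\pi(\sigma \mid z, \beta, y) \propto \pi(\beta, \sigma, z \mid y)$, it is easy to see that $\sigma \mid z, \beta, y \sim \mathrm{IG}(\alpha', \gamma')$ where
$$\alpha' = \alpha + \frac{3n}{2} \quad\text{and}\quad \gamma' = \gamma + \sum_{i=1}^n \frac{\big(y_i - x_i^T\beta - \theta z_i\big)^2}{2 z_i \tau^2} + \sum_{i=1}^n z_i.$$
Now, let $X$ be the $n \times p$ matrix with $i$th row equal to $x_i^T$. (Note that we do not assume that $n \ge p$.) Also, let $U$ denote an $n \times n$ diagonal matrix whose $i$th diagonal element is $(\sigma \tau^2 z_i)^{-1}$, and let $l$ denote an $n \times 1$ vector of ones. Standard Bayesian regression-type calculations show that $\beta \mid z, \sigma, y \sim N_p(m', \Sigma')$ where
$$m' = \big(X^T U X + \Sigma^{-1}\big)^{-1} \Big(X^T U y - \frac{\theta}{\sigma \tau^2} X^T l + \Sigma^{-1} m\Big) \quad\text{and}\quad \Sigma' = \big(X^T U X + \Sigma^{-1}\big)^{-1}.$$
Finally, it follows from (5) that the components of $Z = (Z_1, \dots, Z_n)^T$ are conditionally independent given $(\beta, \sigma, y)$, and
$$\pi(z_i \mid \beta, \sigma, y) \propto \frac{1}{\sqrt{z_i}} \exp\Big\{-\frac{\big(y_i - x_i^T\beta\big)^2}{2 z_i \sigma \tau^2} - \frac{\big(\theta^2 + 2\tau^2\big) z_i}{2 \sigma \tau^2}\Big\}.$$
When $y_i - x_i^T\beta = 0$, this is a gamma density. Otherwise, it is the density of the reciprocal of an inverse Gaussian random variable with parameters
$$\mu_i = \frac{\sqrt{\theta^2 + 2\tau^2}}{|y_i - x_i^T\beta|} \quad\text{and}\quad \lambda_i = \frac{\theta^2 + 2\tau^2}{\sigma \tau^2}.$$
In either case, we can write
$$\pi(z_i \mid \beta, \sigma, y) = \sqrt{\frac{\theta^2 + 2\tau^2}{2\pi \sigma \tau^2 z_i}}\, \exp\Big\{-\frac{\big(y_i - x_i^T\beta\big)^2}{2 z_i \sigma \tau^2} + \frac{\sqrt{\theta^2 + 2\tau^2}\, |y_i - x_i^T\beta|}{\sigma \tau^2} - \frac{\big(\theta^2 + 2\tau^2\big) z_i}{2 \sigma \tau^2}\Big\}.$$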
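The three conditional draws above can be sketched in a few lines. To stay within the standard library, this sketch assumes a single scalar covariate ($p = 1$), so that the prior is $\beta \sim N(m_0, s_0^2)$ and the matrix formulas for $m'$ and $\Sigma'$ reduce to scalars. The inverse Gaussian draw uses the Michael, Schucany and Haas transformation method, which is a choice made here and is not taken from the paper. All function and argument names are hypothetical.

```python
import math
import random

def sample_inverse_gaussian(mu, lam, rng):
    """Draw from inverse Gaussian(mu, lam) (Michael-Schucany-Haas method)."""
    y = rng.gauss(0.0, 1.0) ** 2
    # larger root of the defining quadratic; the smaller root is mu^2 / big
    big = mu + mu * mu * y / (2 * lam) \
        + (mu / (2 * lam)) * math.sqrt(4 * mu * lam * y + mu * mu * y * y)
    small = mu * mu / big
    return small if rng.random() <= mu / (mu + small) else big

def gibbs_sampler(y, x, r, m0, s0sq, alpha, gamma, n_iter,
                  beta0=0.0, sigma0=1.0, seed=0):
    """Sketch of the three-step Gibbs sampler for scalar beta (p = 1)."""
    rng = random.Random(seed)
    theta = (1 - 2 * r) / (r * (1 - r))
    tau2 = 2 / (r * (1 - r))
    K = theta ** 2 + 2 * tau2
    n = len(y)
    beta, sigma = beta0, sigma0
    draws = []
    for _ in range(n_iter):
        # Step 1: z_i | beta, sigma, y (reciprocal inverse Gaussian, or gamma)
        z = []
        for yi, xi in zip(y, x):
            resid = abs(yi - xi * beta)
            if resid == 0.0:
                z.append(rng.gammavariate(0.5, 2 * sigma * tau2 / K))
            else:
                w = sample_inverse_gaussian(math.sqrt(K) / resid,
                                            K / (sigma * tau2), rng)
                z.append(1.0 / w)
        # Step 2: sigma | z, beta, y ~ IG(alpha', gamma'), using the current beta
        shape = alpha + 1.5 * n
        rate = gamma + sum((yi - xi * beta - theta * zi) ** 2 / (2 * zi * tau2) + zi
                           for yi, xi, zi in zip(y, x, z))
        sigma = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        # Step 3: beta | z, sigma, y ~ N(m', Sigma'), using the new sigma
        u = [1.0 / (sigma * tau2 * zi) for zi in z]
        prec = sum(ui * xi * xi for ui, xi in zip(u, x)) + 1.0 / s0sq
        lin = (sum(ui * xi * yi for ui, xi, yi in zip(u, x, y))
               - theta * sum(x) / (sigma * tau2) + m0 / s0sq)
        beta = rng.gauss(lin / prec, math.sqrt(1.0 / prec))
        draws.append((beta, sigma))
    return draws
```

A call such as `gibbs_sampler(y, x, r=0.5, m0=0.0, s0sq=100.0, alpha=2.0, gamma=1.0, n_iter=1000)` returns a list of $(\beta, \sigma)$ pairs. For general $p$, Step 3 would instead require a linear solve and a multivariate normal draw.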
Let $\eta$ denote Lebesgue measure on $\mathbb{R}^p \times \mathbb{R}_+$. The Gibbs Markov chain has an Mtd (with respect to $\eta$) given by
$$k(\beta, \sigma \mid \beta', \sigma') = \int_{\mathbb{R}_+^n} \pi(\beta \mid \sigma, z, y)\, \pi(\sigma \mid z, \beta', y)\, \pi(z \mid \beta', \sigma', y)\, dz. \tag{6}$$
A straightforward calculation shows that
$$\int_{\mathbb{R}^p} \int_{\mathbb{R}_+} k(\beta, \sigma \mid \beta', \sigma')\, \pi(\beta', \sigma' \mid y)\, d\sigma'\, d\beta' = \pi(\beta, \sigma \mid y),$$
so the target density is invariant. The Mtd is strictly positive, which implies that the chain is aperiodic and $\eta$-irreducible [10, p. 87]. Moreover, the existence of an invariant probability density together with $\eta$-irreducibility implies that the chain is positive Harris recurrent (see, e.g., [1]). Note also that $\eta$ is equivalent to the maximal irreducibility measure.

3. The Gibbs Markov chain is geometrically ergodic
In this section, we prove Proposition 1 by establishing a geometric drift condition. In particular, we will prove the following result.

Proposition 2. There exist a $\rho \in [0, 1)$ and a finite constant $L$ such that, for every $(\beta', \sigma') \in \mathbb{R}^p \times \mathbb{R}_+$,
$$E\big[v(\beta, \sigma) \mid \beta', \sigma'\big] \le \rho\, v(\beta', \sigma') + L, \tag{7}$$
where the drift function is defined as
$$v(\beta, \sigma) = \sigma + \frac{1}{\sigma} + \sum_{i=1}^n \big(y_i - x_i^T\beta\big)^2 + \beta^T \Sigma^{-1} \beta.$$
The reason why the geometric drift condition (7) implies geometric ergodicity of the Markov chain is laid out in Appendix B.
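As a side remark that is not part of the authors' argument, it may help to see what a drift condition of the form (7) gives when it is iterated. Applying (7) repeatedly, together with the Markov property, bounds the expected drift function after $m$ steps; the display below is a standard consequence and assumes only that (7) holds with $\rho < 1$.

```latex
\begin{aligned}
E\big[v(\beta_m, \sigma_m) \mid \beta_0, \sigma_0\big]
  &\le \rho\, E\big[v(\beta_{m-1}, \sigma_{m-1}) \mid \beta_0, \sigma_0\big] + L \\
  &\le \rho^m\, v(\beta_0, \sigma_0) + L\,\big(1 + \rho + \cdots + \rho^{m-1}\big) \\
  &\le \rho^m\, v(\beta_0, \sigma_0) + \frac{L}{1 - \rho}.
\end{aligned}
```

In words, the influence of the starting value on the expected value of $v$ decays geometrically, and the chain is pulled back toward the region where $v$ is small.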
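Proof of Proposition 2. The expectation on the left-hand side of (7) can be broken down into three conditional expectations. Indeed,
$$\begin{aligned}
E\big[v(\beta, \sigma) \mid \beta', \sigma'\big] &= \int_{\mathbb{R}_+} \int_{\mathbb{R}^p} v(\beta, \sigma)\, k(\beta, \sigma \mid \beta', \sigma')\, d\beta\, d\sigma \\
&= \int_{\mathbb{R}_+^n} \bigg\{ \int_{\mathbb{R}_+} \bigg[ \int_{\mathbb{R}^p} v(\beta, \sigma)\, \pi(\beta \mid \sigma, z, y)\, d\beta \bigg] \pi(\sigma \mid z, \beta', y)\, d\sigma \bigg\}\, \pi(z \mid \beta', \sigma', y)\, dz.
\end{aligned} \tag{8}$$
Here is a brief outline of the remainder of the proof. First, we develop an upper bound of the form $b_1(\sigma) + c_1$ (where $c_1$ is constant) for the inner-most integral in (8). We then construct a function $b_2(z, \beta')$ such that
$$\int_{\mathbb{R}_+} b_1(\sigma)\, \pi(\sigma \mid z, \beta', y)\, d\sigma \le b_2(z, \beta') + c_2.$$
Finally, we show that
$$\int_{\mathbb{R}_+^n} b_2(z, \beta')\, \pi(z \mid \beta', \sigma', y)\, dz \le \rho\, v(\beta', \sigma') + c_3,$$
and the result follows immediately.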
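Before we begin analyzing the inner-most integral, we need a few definitions and facts. For a vector $a$, define $\|a\| = \sqrt{a^T a}$, and for a matrix $A$, define $\|A\| = \sup_{\|x\|=1} \|Ax\|$. In general, $\|a + b\|^2 \le 2\|a\|^2 + 2\|b\|^2$, and $\|ABx\| \le \|A\| \|Bx\|$. Of course, $\sum_{i=1}^n (y_i - x_i^T\beta)^2 = \|y - X\beta\|^2$ and we have
$$\|y - X\beta\|^2 \le 2\|y\|^2 + 2\|X\beta\|^2 = 2\|y\|^2 + 2\big\|X\Sigma^{1/2}\Sigma^{-1/2}\beta\big\|^2 \le 2\|y\|^2 + 2\big\|X\Sigma^{1/2}\big\|^2 \big\|\Sigma^{-1/2}\beta\big\|^2. \tag{9}$$
It follows from (9) that
$$v(\beta, \sigma) \le \sigma + \frac{1}{\sigma} + 2\|y\|^2 + \Big(2\big\|X\Sigma^{1/2}\big\|^2 + 1\Big) \big\|\Sigma^{-1/2}\beta\big\|^2. \tag{10}$$
Now using (10) we see that
$$\int_{\mathbb{R}^p} v(\beta, \sigma)\, \pi(\beta \mid \sigma, z, y)\, d\beta \le \sigma + \frac{1}{\sigma} + 2\|y\|^2 + \Big(2\big\|X\Sigma^{1/2}\big\|^2 + 1\Big)\, E\Big[\big\|\Sigma^{-1/2}\beta\big\|^2 \,\Big|\, \sigma, z, y\Big]. \tag{11}$$
Let $\tilde{X} = X\Sigma^{1/2}$. Given $(\sigma, z, y)$, $\Sigma^{-1/2}\beta$ is multivariate normal with mean
$$\big(\tilde{X}^T U \tilde{X} + I\big)^{-1} \Big(\tilde{X}^T U y - \frac{\theta}{\sigma\tau^2} \tilde{X}^T l + \Sigma^{-1/2} m\Big)$$
and covariance matrix $\big(\tilde{X}^T U \tilde{X} + I\big)^{-1}$. Therefore, letting $\tilde{x}_i$ denote the $i$th column of $\tilde{X}^T$, we have
$$\begin{aligned}
E\Big[\big\|\Sigma^{-1/2}\beta\big\|^2 \,\Big|\, \sigma, z, y\Big]
&= \Big\|\big(\tilde{X}^T U \tilde{X} + I\big)^{-1} \Big(\tilde{X}^T U y - \frac{\theta}{\sigma\tau^2} \tilde{X}^T l + \Sigma^{-1/2} m\Big)\Big\|^2 + \mathrm{tr}\Big(\big(\tilde{X}^T U \tilde{X} + I\big)^{-1}\Big) \\
&\le 2 \Big\|\big(\tilde{X}^T U \tilde{X} + I\big)^{-1} \tilde{X}^T U y\Big\|^2 + 2 \Big\|\frac{\theta}{\sigma\tau^2} \tilde{X}^T l\Big\|^2 + 2 \big\|\Sigma^{-1/2} m\big\|^2 + \mathrm{tr}(I) \\
&= 2 \bigg\|\sum_{i=1}^n \Big(\sum_{j=1}^n \frac{\tilde{x}_j \tilde{x}_j^T}{\sigma\tau^2 z_j} + I\Big)^{-1} \frac{\tilde{x}_i y_i}{\sigma\tau^2 z_i}\bigg\|^2 + \frac{2\theta^2}{\sigma^2\tau^4} \big\|\tilde{X}^T l\big\|^2 + 2 \big\|\Sigma^{-1/2} m\big\|^2 + p,
\end{aligned} \tag{12}$$
where the inequality is due to the fact that $I - \big(\tilde{X}^T U \tilde{X} + I\big)^{-1}$ is non-negative definite. Now, the triangle inequality and some rearrangement yields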
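$$\begin{aligned}
\bigg\|\sum_{i=1}^n \Big(\sum_{j=1}^n \frac{\tilde{x}_j \tilde{x}_j^T}{\sigma\tau^2 z_j} + I\Big)^{-1} \frac{\tilde{x}_i y_i}{\sigma\tau^2 z_i}\bigg\|
&\le \sum_{i=1}^n \bigg\|\Big(\frac{\tilde{x}_i \tilde{x}_i^T}{\sigma\tau^2 z_i} + \sum_{j \ne i} \frac{\tilde{x}_j \tilde{x}_j^T}{\sigma\tau^2 z_j} + I\Big)^{-1} \frac{\tilde{x}_i y_i}{\sigma\tau^2 z_i}\bigg\| \\
&= \sum_{i=1}^n |y_i|\, \bigg\|\Big(\tilde{x}_i \tilde{x}_i^T + \sum_{j \ne i} \frac{z_i}{z_j} \tilde{x}_j \tilde{x}_j^T + \sigma\tau^2 z_i I\Big)^{-1} \tilde{x}_i\bigg\|.
\end{aligned} \tag{13}$$
We now employ the following result.

Lemma 1 ([6]). Fix $n \in \{2, 3, \dots\}$ and $p \in \mathbb{N}$, and let $t_1, \dots, t_n$ be vectors in $\mathbb{R}^p$. Then
$$C_{p,n}(t_1; t_2, \dots, t_n) := \sup_{c \in \mathbb{R}_+^n} t_1^T \Big(t_1 t_1^T + \sum_{i=2}^n c_i t_i t_i^T + c_1 I\Big)^{-2} t_1$$
is finite.

Since $\|A^{-1} t\|^2 = t^T A^{-2} t$ for a symmetric invertible matrix $A$, it follows from Lemma 1 that the right-hand side of (13) is bounded above by a finite constant that does not depend on $(\sigma, z)$. Let $C$ denote the square of this constant, so that $C$ bounds the squared norm appearing in (12). This fact combined with (12)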
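yields
$$E\Big[\big\|\Sigma^{-1/2}\beta\big\|^2 \,\Big|\, \sigma, z, y\Big] \le 2C + \frac{2\theta^2}{\sigma^2\tau^4} \big\|\tilde{X}^T l\big\|^2 + 2 \big\|\Sigma^{-1/2} m\big\|^2 + p. \tag{14}$$
Combining (11) with (14), we have
$$\int_{\mathbb{R}^p} v(\beta, \sigma)\, \pi(\beta \mid \sigma, z, y)\, d\beta \le \sigma + \frac{1}{\sigma} + \frac{1}{\sigma^2} \cdot \frac{2\theta^2 \Big(2\big\|X\Sigma^{1/2}\big\|^2 + 1\Big) \big\|\tilde{X}^T l\big\|^2}{\tau^4} + C', \tag{15}$$
where
$$C' = 2\|y\|^2 + \Big(2\big\|X\Sigma^{1/2}\big\|^2 + 1\Big) \Big(2C + 2\big\|\Sigma^{-1/2} m\big\|^2 + p\Big).$$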
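The next step is to bound the integral of the right-hand side of (15) against $\pi(\sigma \mid z, \beta', y)$. First, note that
$$E\Big[\frac{1}{\sigma} \,\Big|\, \beta', z, y\Big] = \frac{\alpha'}{\gamma'} = \Big(\alpha + \frac{3n}{2}\Big) \bigg[\gamma + \sum_{i=1}^n \frac{\big(y_i - x_i^T\beta' - \theta z_i\big)^2}{2 z_i \tau^2} + \sum_{i=1}^n z_i\bigg]^{-1} \le \frac{2\alpha + 3n}{2\gamma}. \tag{16}$$
Similarly,
$$E\Big[\frac{1}{\sigma^2} \,\Big|\, \beta', z, y\Big] = \frac{\alpha'(\alpha' + 1)}{\gamma'^2} = \Big(\alpha + \frac{3n}{2}\Big)\Big(\alpha + \frac{3n}{2} + 1\Big) \bigg[\gamma + \sum_{i=1}^n \frac{\big(y_i - x_i^T\beta' - \theta z_i\big)^2}{2 z_i \tau^2} + \sum_{i=1}^n z_i\bigg]^{-2} \le \frac{(2\alpha + 3n)(2\alpha + 3n + 2)}{4\gamma^2}. \tag{17}$$
Finally,
$$E\big[\sigma \mid \beta', z, y\big] = \frac{\gamma'}{\alpha' - 1} = \frac{2}{2\alpha + 3n - 2} \bigg[\gamma + \sum_{i=1}^n \frac{\big(y_i - x_i^T\beta' - \theta z_i\big)^2}{2 z_i \tau^2} + \sum_{i=1}^n z_i\bigg]. \tag{18}$$
Now, combining (15)–(18), we have
$$\int_{\mathbb{R}_+} \bigg[\int_{\mathbb{R}^p} v(\beta, \sigma)\, \pi(\beta \mid \sigma, z, y)\, d\beta\bigg] \pi(\sigma \mid z, \beta', y)\, d\sigma \le \frac{2}{2\alpha + 3n - 2} \bigg[\sum_{i=1}^n \frac{\big(y_i - x_i^T\beta' - \theta z_i\big)^2}{2 z_i \tau^2} + \sum_{i=1}^n z_i\bigg] + C'', \tag{19}$$
where
$$C'' = \frac{2\alpha + 3n}{2\gamma} + \frac{2\theta^2 \Big(2\big\|X\Sigma^{1/2}\big\|^2 + 1\Big) \big\|\tilde{X}^T l\big\|^2}{\tau^4} \cdot \frac{(2\alpha + 3n)(2\alpha + 3n + 2)}{4\gamma^2} + \frac{2\gamma}{2\alpha + 3n - 2} + C'.$$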
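The last step is to bound the integral of the right-hand side of (19) against $\pi(z \mid \beta', \sigma', y)$. First, note that
$$\sum_{i=1}^n \frac{\big(y_i - x_i^T\beta' - \theta z_i\big)^2}{2 z_i \tau^2} + \sum_{i=1}^n z_i = \frac{1}{2\tau^2} \sum_{i=1}^n \frac{\big(y_i - x_i^T\beta'\big)^2}{z_i} + \Big(\frac{\theta^2}{2\tau^2} + 1\Big) \sum_{i=1}^n z_i - \frac{\theta}{\tau^2} \sum_{i=1}^n \big(y_i - x_i^T\beta'\big). \tag{20}$$
Assume for the moment that $y_i - x_i^T\beta'$ are all non-zero. Then it follows from properties of the inverse Gaussian distribution that
$$E\big[z_i \mid \beta', \sigma', y\big] = \frac{1}{\mu_i} + \frac{1}{\lambda_i} = \frac{|y_i - x_i^T\beta'|}{\sqrt{\theta^2 + 2\tau^2}} + \frac{\sigma' \tau^2}{\theta^2 + 2\tau^2}$$
and
$$E\Big[\frac{1}{z_i} \,\Big|\, \beta', \sigma', y\Big] = \mu_i = \frac{\sqrt{\theta^2 + 2\tau^2}}{|y_i - x_i^T\beta'|}.$$
Thus, the integral of (20) against $\pi(z \mid \beta', \sigma', y)$ is equal to
$$\frac{\sqrt{\theta^2 + 2\tau^2}}{2\tau^2} \sum_{i=1}^n |y_i - x_i^T\beta'| + \Big(\frac{\theta^2}{2\tau^2} + 1\Big) \sum_{i=1}^n \frac{|y_i - x_i^T\beta'|}{\sqrt{\theta^2 + 2\tau^2}} + \frac{n\sigma'}{2} - \frac{\theta}{\tau^2} \sum_{i=1}^n \big(y_i - x_i^T\beta'\big). \tag{21}$$
Now note that, if $y_i - x_i^T\beta' = 0$, then the only term containing $z_i$ on the right-hand side of (20) is $\big(\frac{\theta^2}{2\tau^2} + 1\big) z_i$, which has expectation $\sigma'/2$. Hence, (21) continues to hold even when $y_i - x_i^T\beta' = 0$ for some (or all) $i$. It is clear that (21) is bounded above by
$$\frac{\sqrt{\theta^2 + 2\tau^2}}{2\tau^2} \sum_{i=1}^n |y_i - x_i^T\beta'| + \Big(\frac{\theta^2}{2\tau^2} + 1\Big) \sum_{i=1}^n \frac{|y_i - x_i^T\beta'|}{\sqrt{\theta^2 + 2\tau^2}} + \frac{n\sigma'}{2} + \frac{\theta}{\tau^2} \sum_{i=1}^n |y_i - x_i^T\beta'|. \tag{22}$$
Now, using the inequality $|x| \le (x^2 + 1)/2$ three times (twice with $x = |y_i - x_i^T\beta'|$ and once with $x = |y_i - x_i^T\beta'| / \sqrt{\theta^2 + 2\tau^2}$), we can show that (22) is bounded above by
$$\frac{\sqrt{\theta^2 + 2\tau^2} + 2\theta + 1}{4\tau^2} \sum_{i=1}^n \big(y_i - x_i^T\beta'\big)^2 + \frac{n\sqrt{\theta^2 + 2\tau^2} + 2n\theta + n\big(\theta^2 + 2\tau^2\big)}{4\tau^2} + \frac{n\sigma'}{2}. \tag{23}$$
Combining (20)–(23), we have
$$\int_{\mathbb{R}_+^n} \bigg[\sum_{i=1}^n \frac{\big(y_i - x_i^T\beta' - \theta z_i\big)^2}{2 z_i \tau^2} + \sum_{i=1}^n z_i\bigg] \pi(z \mid \beta', \sigma', y)\, dz \le \frac{\sqrt{\theta^2 + 2\tau^2} + 2\theta + 1}{4\tau^2} \sum_{i=1}^n \big(y_i - x_i^T\beta'\big)^2 + \frac{n\sqrt{\theta^2 + 2\tau^2} + 2n\theta + n\big(\theta^2 + 2\tau^2\big)}{4\tau^2} + \frac{n\sigma'}{2}. \tag{24}$$
Finally, (19) together with (24) yields
$$\begin{aligned}
E\big[v(\beta, \sigma) \mid \beta', \sigma'\big] &= \int_{\mathbb{R}_+^n} \bigg\{\int_{\mathbb{R}_+} \bigg[\int_{\mathbb{R}^p} v(\beta, \sigma)\, \pi(\beta \mid \sigma, z, y)\, d\beta\bigg] \pi(\sigma \mid z, \beta', y)\, d\sigma\bigg\}\, \pi(z \mid \beta', \sigma', y)\, dz \\
&\le \frac{1}{2\alpha + 3n - 2} \bigg[\frac{\sqrt{\theta^2 + 2\tau^2} + 2\theta + 1}{2\tau^2} \sum_{i=1}^n \big(y_i - x_i^T\beta'\big)^2 + n\sigma'\bigg] + L,
\end{aligned} \tag{25}$$
where
$$L = \frac{2}{2\alpha + 3n - 2} \cdot \frac{n\sqrt{\theta^2 + 2\tau^2} + 2n\theta + n\big(\theta^2 + 2\tau^2\big)}{4\tau^2} + C''.$$
Now, recalling that $\theta = \frac{1-2r}{r(1-r)}$ and $\tau^2 = \frac{2}{r(1-r)}$, we have
$$\frac{\sqrt{\theta^2 + 2\tau^2} + 2\theta + 1}{2\tau^2} = \frac{1}{4} + \frac{1 - 2r}{2} + \frac{r(1-r)}{4} \le \frac{1}{4} + \frac{1}{2} + \frac{1}{16} < 1.$$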
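The identity in the last display, and the contraction factor $\rho(n, \alpha) = n/(2\alpha + 3n - 2)$ that the proof arrives at, are easy to evaluate numerically. The following sketch does so; the function names and the grid of $r$ values used for the check are assumptions chosen for illustration.

```python
import math

def drift_coefficient(r):
    """(sqrt(theta^2 + 2 tau^2) + 2 theta + 1) / (2 tau^2) for quantile level r."""
    theta = (1 - 2 * r) / (r * (1 - r))
    tau2 = 2 / (r * (1 - r))
    return (math.sqrt(theta ** 2 + 2 * tau2) + 2 * theta + 1) / (2 * tau2)

def closed_form(r):
    """The simplified expression 1/4 + (1 - 2r)/2 + r(1 - r)/4."""
    return 0.25 + (1 - 2 * r) / 2 + r * (1 - r) / 4

def rho(n, alpha):
    """Contraction factor rho(n, alpha) = n / (2 alpha + 3 n - 2)."""
    return n / (2 * alpha + 3 * n - 2)

def check_grid(num_points=99):
    """Compare both expressions on an evenly spaced grid in (0, 1)."""
    for k in range(1, num_points + 1):
        r = k / (num_points + 1)
        if abs(drift_coefficient(r) - closed_form(r)) > 1e-9:
            return False
        if not closed_form(r) < 1:
            return False
    return True
```

Because $\rho(n, \alpha)$ depends only on the sample size and the prior shape parameter, it tends to $1/3$ as $n$ grows with $\alpha$ fixed.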
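This fact in conjunction with (25) leads to
$$\begin{aligned}
E\big[v(\beta, \sigma) \mid \beta', \sigma'\big] &\le \frac{1}{2\alpha + 3n - 2} \bigg[n \sum_{i=1}^n \big(y_i - x_i^T\beta'\big)^2 + n\sigma'\bigg] + L \\
&= \frac{n}{2\alpha + 3n - 2} \bigg[\sum_{i=1}^n \big(y_i - x_i^T\beta'\big)^2 + \sigma'\bigg] + L \\
&\le \frac{n}{2\alpha + 3n - 2}\, v(\beta', \sigma') + L \\
&= \rho(n, \alpha)\, v(\beta', \sigma') + L,
\end{aligned}$$
where $\rho(n, \alpha) = n/(2\alpha + 3n - 2)$. Since $n \ge 1$ and $\alpha > 0$, $\rho(n, \alpha) < 1$ and the proof is complete.

4. Discussion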
We have established the existence of a function $M: \mathbb{R}^p \times \mathbb{R}_+ \to [0, \infty)$ and a constant $\lambda \in [0, 1)$ such that, for all $(\beta, \sigma) \in \mathbb{R}^p \times \mathbb{R}_+$ and all $m = 0, 1, 2, \dots$,
$$\big\| P^m\big((\beta, \sigma), \cdot\big) - \Pi(\cdot) \big\|_{\mathrm{TV}} \le M(\beta, \sigma)\, \lambda^m.$$
This is a qualitative geometric convergence result in the sense that we have not actually identified $M$ and $\lambda$. However, as explained in the Introduction, this qualitative result is enough to guarantee the existence of CLTs. On the other hand, there are techniques for constructing $M$ and $\lambda$ (see, e.g., [13]), and these require both a drift condition (with explicit formulas for $\rho$ and $L$) and an associated minorization condition. We have provided an explicit formula for $\rho$. Indeed, $\rho = \rho(n, \alpha) = n/(2\alpha + 3n - 2)$. However, we do not have an explicit formula for $L$. The sole reason for this is that $L$ depends on the constant $C$, which Lemma 1 shows to be finite without supplying its value.

Acknowledgments
The first author was supported by NSF Grant DMS-11-06084, and the second author by NSF Grants DMS-08-05860 & DMS-11-06395.
Appendix A. The marginal of $Y_i$ under the two-stage hierarchy
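Here we establish (3). First,
$$\int_0^\infty \frac{1}{\sqrt{2\pi z \sigma \tau^2}} \exp\Big\{-\frac{1}{2 z \sigma \tau^2}\big(y_i - x_i^T\beta - \theta z\big)^2\Big\} \frac{1}{\sigma} \exp\{-z/\sigma\}\, dz = \frac{1}{\sqrt{2\pi \sigma^3 \tau^2}} \exp\Big\{\frac{\theta \big(y_i - x_i^T\beta\big)}{\sigma \tau^2}\Big\} \int_0^\infty \frac{1}{\sqrt{z}} \exp\Big\{-\frac{1}{2 z \sigma \tau^2}\Big[\big(y_i - x_i^T\beta\big)^2 + z^2 \big(\theta^2 + 2\tau^2\big)\Big]\Big\}\, dz.$$
Now
$$\begin{aligned}
\int_0^\infty \frac{1}{\sqrt{z}} \exp\Big\{-\frac{1}{2 z \sigma \tau^2}\Big[\big(y_i - x_i^T\beta\big)^2 + z^2 \big(\theta^2 + 2\tau^2\big)\Big]\Big\}\, dz
&= \int_0^\infty \frac{1}{w^{3/2}} \exp\Big\{-\frac{1}{2 w \sigma \tau^2}\Big[w^2 \big(y_i - x_i^T\beta\big)^2 + \big(\theta^2 + 2\tau^2\big)\Big]\Big\}\, dw \\
&= \frac{\sqrt{2\pi \sigma \tau^2}}{\sqrt{\theta^2 + 2\tau^2}} \exp\Big\{-\frac{\sqrt{\theta^2 + 2\tau^2}\, |y_i - x_i^T\beta|}{\sigma \tau^2}\Big\},
\end{aligned}$$
where the first equality follows from the transformation $w = 1/z$, and the second follows from the fact that the inverse Gaussian density integrates to unity. Now, putting things back together and using the definitions of $\theta$ and $\tau^2$, we see that the marginal density of $Y_i$ is
$$\begin{aligned}
&\frac{1}{\sqrt{2\pi \sigma^3 \tau^2}} \exp\Big\{\frac{\theta \big(y_i - x_i^T\beta\big)}{\sigma \tau^2}\Big\} \frac{\sqrt{2\pi \sigma \tau^2}}{\sqrt{\theta^2 + 2\tau^2}} \exp\Big\{-\frac{\sqrt{\theta^2 + 2\tau^2}\, |y_i - x_i^T\beta|}{\sigma \tau^2}\Big\} \\
&\qquad = \frac{r(1-r)}{\sigma} \exp\Big\{\frac{(1 - 2r)\big(y_i - x_i^T\beta\big)}{2\sigma} - \frac{|y_i - x_i^T\beta|}{2\sigma}\Big\} \\
&\qquad = \frac{r(1-r)}{\sigma} \Big[e^{(1-r)(y_i - x_i^T\beta)/\sigma} I_{\mathbb{R}_-}\big(y_i - x_i^T\beta\big) + e^{-r(y_i - x_i^T\beta)/\sigma} I_{\mathbb{R}_+}\big(y_i - x_i^T\beta\big)\Big].
\end{aligned}$$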
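The step that uses the definitions of $\theta$ and $\tau^2$ rests on three small identities. They are spelled out here as a worked aid; each follows directly from $\theta = (1-2r)/(r(1-r))$ and $\tau^2 = 2/(r(1-r))$.

```latex
\begin{aligned}
\theta^2 + 2\tau^2 &= \frac{(1-2r)^2 + 4r(1-r)}{r^2(1-r)^2} = \frac{1}{r^2(1-r)^2},
  &&\text{so}\quad \frac{1}{\sqrt{\theta^2 + 2\tau^2}} = r(1-r), \\
\frac{\theta}{\tau^2} &= \frac{1-2r}{r(1-r)} \cdot \frac{r(1-r)}{2} = \frac{1-2r}{2}, \\
\frac{\sqrt{\theta^2 + 2\tau^2}}{\tau^2} &= \frac{1}{r(1-r)} \cdot \frac{r(1-r)}{2} = \frac{1}{2}.
\end{aligned}
```

Substituting these into the product of the two exponential factors gives the prefactor $r(1-r)/\sigma$ and the exponent $\big[(1-2r)(y_i - x_i^T\beta) - |y_i - x_i^T\beta|\big]/(2\sigma)$, which equals $(1-r)(y_i - x_i^T\beta)/\sigma$ when the residual is non-positive and $-r(y_i - x_i^T\beta)/\sigma$ when it is positive.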
Appendix B. The drift condition implies geometric convergence
Recall that the drift function is given by
$$v(\beta, \sigma) = \sigma + \frac{1}{\sigma} + \sum_{i=1}^n \big(y_i - x_i^T\beta\big)^2 + \beta^T \Sigma^{-1} \beta.$$
We now show that this function is unbounded off compact sets; that is, for every $d \in \mathbb{R}$, the set
$$S_d = \bigg\{(\beta, \sigma) \in \mathbb{R}^p \times \mathbb{R}_+ : \sigma + \frac{1}{\sigma} + \sum_{i=1}^n \big(y_i - x_i^T\beta\big)^2 + \beta^T \Sigma^{-1} \beta \le d\bigg\}$$
is compact. If $d$ is such that $S_d = \emptyset$, then $S_d$ is clearly compact. So assume that $S_d$ is non-empty. Since $v(\beta, \sigma)$ is continuous, $S_d$ is closed, so it suffices to show that $|\beta_i|$ is bounded for each $i \in \{1, 2, \dots, p\}$, and that $\sigma$ is bounded away from both 0 and $\infty$. Since $\sigma + 1/\sigma \le d$, $\sigma$ is clearly contained as specified. Furthermore, since $\Sigma^{-1}$ is positive definite, the condition $\beta^T \Sigma^{-1} \beta \le d$ implies that $|\beta_i|$ are all bounded. Hence, $v(\beta, \sigma)$ is unbounded off compact sets.

Because the product $\pi(\sigma \mid z, \beta', y)\, \pi(z \mid \beta', \sigma', y)$ is continuous in $(\beta', \sigma')$, a standard argument using Fatou's Lemma can be used to show that the Gibbs Markov chain $\{(\beta_m, \sigma_m)\}_{m=0}^\infty$ is a Feller chain [10, p. 127]. Hence, Meyn and Tweedie's [10] Theorem 6.0.1 implies that all compact sets in $\mathbb{R}^p \times \mathbb{R}_+$ are petite sets for the chain. Therefore, the drift function $v(\beta, \sigma)$ is unbounded off petite sets [10, p. 191]. It now follows from [10] Lemma 15.2.8 that the geometric drift condition in Proposition 2 implies that the Gibbs Markov chain is geometrically ergodic.
References
[1] S. Asmussen, P.W. Glynn, A new proof of convergence of MCMC via the ergodic theorem, Statistics & Probability Letters 81 (2011) 1482–1485.
[2] W. Bednorz, K. Łatuszyński, A few remarks on "Fixed-width output analysis for Markov chain Monte Carlo" by Jones et al., Journal of the American Statistical Association 102 (2007) 1485–1486.
[3] J.M. Flegal, M. Haran, G.L. Jones, Markov chain Monte Carlo: can we trust the third significant figure? Statistical Science 23 (2008) 250–260.
[4] J.M. Flegal, G.L. Jones, Batch means and spectral variance estimators in Markov chain Monte Carlo, The Annals of Statistics 38 (2010) 1034–1070.
[5] G.L. Jones, M. Haran, B.S. Caffo, R. Neath, Fixed-width output analysis for Markov chain Monte Carlo, Journal of the American Statistical Association 101 (2006) 1537–1547.
[6] K. Khare, J.P. Hobert, A spectral analytic comparison of trace-class data augmentation algorithms and their sandwich variants, The Annals of Statistics 39 (2011) 2585–2606.
[7] R. Koenker, Quantile Regression, in: Econometric Society Monographs, vol. 38, Cambridge University Press, Cambridge, 2005.
[8] S. Kotz, T.J. Kozubowski, K. Podgórski, The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance, Birkhäuser, Boston, 2001.
[9] H. Kozumi, G. Kobayashi, Gibbs sampling methods for Bayesian quantile regression, Journal of Statistical Computation and Simulation 81 (2011) 1565–1578.
[10] S.P. Meyn, R.L. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, London, 1993.
[11] G.O. Roberts, J.S. Rosenthal, Markov chain Monte Carlo: some practical implications of theoretical results (with discussion), Canadian Journal of Statistics 26 (1998) 5–31.
[12] G.O. Roberts, J.S. Rosenthal, General state space Markov chains and MCMC algorithms, Probability Surveys 1 (2004) 20–71.
[13] J.S. Rosenthal, Minorization conditions and convergence rates for Markov chain Monte Carlo, Journal of the American Statistical Association 90 (1995) 558–566.