Contents lists available atSciVerse ScienceDirect
Journal of Multivariate Analysis
journal homepage:www.elsevier.com/locate/jmva
Parametric bootstrap methods for bias correction in linear
mixed models
Tatsuya Kubokawa
a,∗, Bui Nagashima
baFaculty of Economics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan bGraduate School of Economics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
a r t i c l e i n f o
Article history: Received 22 April 2011
Available online 16 December 2011
AMS 2010 subject classification: 62F40
62J07 62F12 62F15
Keywords:
Best linear unbiased predictor Confidence interval Empirical Bayes procedure Fay–Herriot model Second-order correction Linear mixed model
Maximum likelihood estimator Mean squared error
Nested error regression model Parametric bootstrap
Restricted maximum likelihood estimator Small area estimation
a b s t r a c t
The empirical best linear unbiased predictor (EBLUP) in the linear mixed model (LMM) is useful for the small area estimation, and the estimation of the mean squared error (MSE) of EBLUP is important as a measure of uncertainty of EBLUP. To obtain a second-order unbiased estimator of the MSE, the second-second-order bias correction has been derived based on Taylor series expansions. However, this approach is hard to implement in complicated models with many unknown parameters like variance components, since we need to compute asymptotic bias, variance and covariance for estimators of unknown parameters as well as partial derivatives of some quantities. A similar difficulty occurs in the construction of confidence intervals based on EBLUP with second-order correction and in the derivation of second-order bias correction in the Akaike Information Criterion (AIC) and the conditional AIC. To avoid such difficulty in the derivation of second-order bias correction in these problems, the parametric bootstrap methods are suggested in this paper, and their second-order justifications are established. Finally, performances of the suggested procedures are numerically investigated in comparison with some existing procedures given in the literature.
©2011 Elsevier Inc. All rights reserved.
1. Introduction
The linear mixed models (LMM) and the model-based estimates including empirical best linear unbiased predictor (EBLUP) or the empirical Bayes estimator (EB) have been recognized useful in small area estimation. The typical models used for the small area estimation are the Fay–Herriot model and the nested error regression model (NERM), and the usefulness of EBLUP is illustrated by Fay and Herriot [15] and Battese et al. [4]. For a good review and account on this topic, see [16,26,24]. When EBLUP is used to estimate a small area mean based on real data, it is important to estimate the mean squared error (MSE) since it can assess how much EBLUP is reliable. Asymptotically unbiased estimators of the MSE with the second-order bias correction have been derived based on the Taylor series expansion by Prasad and Rao [25], Datta and Lahiri [11], Datta et al. [12], Das et al. [8], Kubokawa [20] and others. A drawback of this method is that it is harder to compute the second-order bias, variance and covariance of estimators of more unknown parameters including variance components, and that it is troublesome to derive partial derivatives of some matrices with respect to unknown parameters. To avoid
∗Corresponding author.
E-mail addresses:[email protected](T. Kubokawa),[email protected](B. Nagashima). 0047-259X/$ – see front matter©2011 Elsevier Inc. All rights reserved.
this difficulty, Butar and Lahiri [5] proposed the parametric bootstrap method, which is easy to implement, since we do not have to compute the second-order bias, variance and partial derivatives. For some recent results including nonparametric methods, see [22,17,7].
In the construction of a confidence interval based on EBLUP with the second-order accuracy, we are faced with a similar problem. Basu et al. [3], Datta et al. [9], Kubokawa [19] derived such confidence intervals using the Taylor series expansion. To avoid the difficulty in derivation of second-order moments, Chatterjee et al. [6], Hall and Maiti [18] proposed the confidence intervals using the parametric bootstrap method.
A similar difficulty occurs in evaluating the bias terms ofAICand conditionalAIC. The Akaike Information Criterion (AIC) originated from [1,2] is recognized very useful for selecting models in general situations, and it is also useful for selecting variables in LMM. When unknown parameters in the model are estimated by the maximum likelihood estimator, the penalty term, which is a kind of bias, is known to be 2
×
pfor dimensionpof unknown parameters. When the unknown parameters in LMM are estimated by other estimators, however, Kubokawa [21] showed that the penalty term includes partial derivatives of the estimator and the covariance matrix. Concerning the conditionalAIC, on the other hand, Liang et al. [23] Vaida and Blanchard [29] proposed the conditionalAICin LMM, but their derivations were limited to the cases that the parameters in LMM are partly known. Recently, Kubokawa [21] derived the second-order bias correction for the conditionalAIC, but it is harder to compute in more complicated models.In this paper, we treat the problems mentioned above, and provide useful procedures based on the parametric bootstrap methods to avoid the computational difficulties. In Section2, we suggest the MSE estimator, the confidence interval,AIC
and the conditionalAICusing the parametric bootstrap. Concerning the MSE estimation, Butar and Lahiri [5] estimated the third term of the MSE, denoted byg3, based on the parametric bootstrap, while in this paper, we consider to estimate the
second-order approximation ofg3using the parametric bootstrap method. A similar approach applies to the confidence
interval, and we estimate the second-order correction term based on the parametric bootstrap method. This is different from the parametric bootstrap procedure suggested by Chatterjee et al. [6] who obtained two end-points of a confidence interval based on a distribution generated by the parametric bootstrap sampling. Simulation and empirical studies are given in Section3. The proofs for the second-order justifications of the proposed procedures are given in theAppendix.
2. MSE estimation, confidence interval andAICbased on the parametric bootstrap method
2.1. Linear mixed model and the parametric bootstrap method
Consider the following general linear mixed model.
[1]Model1. AnN
×
1 observation vectoryof the response variable has the modely
=
Xβ
+
Zv+
ϵ
,
(1)whereX andZ areN
×
pandN×
Mmatrices, respectively, of the explanatory variables,β
is ap×
1 unknown vector of the regression coefficients,vis anM×
1 vector of the random effects, andϵ
is anN×
1 vector of the random errors. Here, vandϵ
are mutually independently distributed asv∼
NM(
0,
G(
θ
))
andϵ
∼
NN(
0,
R(
θ
))
, whereθ
=
(θ
1, . . . , θ
q)
′is aq-dimensional vector of unknown parameters, andG
=
G(
θ
)
andR=
R(
θ
)
are positive definite matrices. Then,yhas a marginal distributionNN(
Xβ
,
Σ(
θ
))
forΣ
=
Σ(
θ
)
=
R(
θ
)
+
ZG(
θ
)
Z′.
Throughout the paper, we often drop
(
θ
)
inG(
θ
)
,R(
θ
)
,Σ(
θ
)
and others for notational convenience.For simplicity, it is here assumed thatX is of full rank. However, the results given in this paper can be extended to the case thatXis not of full rank by modifyingXand
β
as follows. Letrbe a rank ofX, and suppose thatr<
p. Then,Xcan be written as X=
P
Er 0 0′ 0
Q,
wherePandQare, respectively,N
×
Nandp×
porthogonal matrices andEr=
diag(λ
1, . . . , λ
r)
for positive constantsλ
i’s. LetP=
(
P1,
P2)
andQ′=
(
Q1′,
Q′
2
)
forN×
rmatrixP1andr×
pmatrixQ1. Thus,Xβ
can be rewritten asX
β
=
P1ErQ1β
.
Hence, all the results given in this paper still hold if we replace
(
X,
β
,
p)
with(
P1Er,
Q1β
,
r)
.The unknown parameters in Model 1 are
β
andθ
. Whenθ
is known, the regression coefficients vectorβ
is estimated by the generalized least squares estimator given by
β
(
θ
)
=
(
X′Σ(
θ
)
−1X)
−1X′Σ(
θ
)
−1y.
The parameter
θ
consists of variance components and others, and it is estimated by consistent estimator
θ
based onywhichcan be constructed by maximum likelihood, restricted maximum likelihood and other methods. Then,
β
is estimated byWe next consider the following conditional linear mixed model giveny. (2)Model2. AnN
×
1 random vectory∗has the modely∗
=
X
β
(
θ
)
+
Zv∗+
ϵ
∗,
(2)whereXandZare the same matrices as given in(1), and giveny, the random vectorsv∗and
ϵ
∗are conditionally mutually independently distributed asv∗|
y∼
NM
(
0,
G(
θ
))
andϵ
∗|
y∼
NN(
0,
R(
θ
))
.Before stating the main results, we briefly explain the intuitive idea of the parametric bootstrap method based on Model 2. Letg
(
θ
)
be a differentiable function withg(
θ
)
=
O(
1)
. Althoughg(
θ
)
is an asymptotically unbiased estimator ofg(
θ
)
, ingeneral, there exists a second-order bias. Then, we need to approximate the expectationE
[
g(
θ
)
]
up toO(
N−1)
. It is supposedthat the approximation is given by
E
[
g(
θ
)
] =
g(
θ
)
+
bg(
θ
)
+
O(
N−3/2),
(3)wherebg
(
θ
)
is a continuously differentiable function withO(
N−1)
. Then,E
[
g(
θ
)
−
bg(
θ
)
] =
E[
g(
θ
)
] −
E[
bg(
θ
)
]
= {
g(
θ
)
+
bg(
θ
)
} −
bg(
θ
)
+
O(
N−3/2)
=
g(
θ
)
+
O(
N−3/2).
(4)Using Model 2, from(3), we can see that
E∗
[
g(
θ
∗
)
|
y] =
g(
θ
)
+
bg(
θ
)
+
Op(
N−3/2),
whereE∗
[·|
y]
is the conditional expectation with respect to Model 2 giveny, and the calculation of
θ
∗
is the same as that of
θ
except that
θ
∗
is calculated based ony∗instead ofy. Hence from(4), it is seen that
E
2g(
θ
)
−
E∗[
g(
θ
∗)
|
y]
=
E
g(
θ
)
−
E∗[
g(
θ
∗)
−
g(
θ
)
|
y]
=
E[
g(
θ
)
−
bg(
θ
)
] +
O(
N −3/2)
=
g(
θ
)
+
O(
N−3/2).
(5) Thus, we obtain the second-order unbiased estimator 2g(
θ
)
−
E∗[
g(
θ
∗
)
|
y]
.Since, in general, the functionbg
(
θ
)
can be derived by the Taylor series expansion, it includes partial derivatives withrespect to
θ
i,i=
1, . . . ,
q, and moments of estimator
θ
. It is clear that the second-order unbiased estimator 2g(
θ
)
−
E∗
[
g(
θ
∗
)
|
y]
is free from differentiations or moments of
θ
. This idea was used by Butar and Lahiri [5], Chatterjee et al. [6]and others in the framework of small area estimation, and we heavily employ the parametric bootstrap method in this paper.
2.2. Estimation of MSE of EBLUP
Based on the parametric bootstrap method, we first derive an estimator of MSE of EBLUP for the general scalar quantity
µ
=
a′β
+
b′v,
whereaandbbep
×
1 andM×
1 vectors of fixed constants. It is noted that the marginal and the conditional distributions ofygivenvare, respectively,y
∼
NN(
Xβ
,
Σ(
θ
)),
y
|
v∼
NN(
Xβ
+
Zv,
R(
θ
)).
Let
µ
v=
GZ′Σ−1(
y−
Xβ
)
andΣv
=
G−
GZ′Σ−1ZG. Since the conditional distribution ofvgivenyisv
|
y∼
NM(
µ
v,
Σv),
(6)the conditional expectationE
[
µ
|
y]
is written as
µ
B
(
β
,
θ
)
=
a′β
+
s
(
θ
)
′(
y−
Xβ
),
(7)wheres
(
θ
)
=
Σ(
θ
)
−1ZG(
θ
)
b. This can be interpreted as the Bayes estimator ofµ
in the Bayesian context. The generalized least squares estimator
β
(
θ
)
is substituted into
µ
B
(
β
,
θ
)
to get the estimator
µ
EB(
θ
)
=
µ
B(
β
(
θ
),
θ
)
=
a ′
β
(
θ
)
+
s(
θ
)
′(
y−
X
β
(
θ
)),
(8)which is the best linear unbiased predictor (BLUP) of
µ
. When an estimator
θ
is available forθ
, we can estimateµ
by theempirical (or estimated) best linear unbiased predictor (EBLUP)
µ
EB(
θ
)
, which is also called an empirical Bayes estimator inThe MSE function of EBLUP
µ
EB(
θ
)
isMSE(
θ
,
µ
EB(
θ
))
=
E[{
µ
EB(
θ
)
−
µ
}
2]
, which can be decomposed asMSE(
θ
,
µ
EB(
θ
))
=
g1
(
θ
)
+
g2(
θ
)
+
g3(
θ
)
as shown in [25,11], whereg1(
θ
)
=
E[{
µ
B(
β
,
θ
)
−
µ
}
2]
,g 2(
θ
)
=
E[{
µ
EB(
θ
)
−
µ
B(
β
,
θ
)
}
2]
and g3(
θ
)
=
E[{
µ
EB(
θ
)
−
µ
EB(
θ
)
}
2]
. The termsg1
(
θ
)
andg2(
θ
)
can be rewritten asg1
(
θ
)
=
b′(
G(
θ
)
−1+
Z′R(
θ
)
−1Z)
−1b,
g2
(
θ
)
=
(
a−
X′s(
θ
))
′(
X′Σ(
θ
)
−1X)
−1(
a−
X′s(
θ
)).
Using the argument as in(5), we can estimateg1
(
θ
)
+
g2(
θ
)
by2
{
g1(
θ
)
+
g2(
θ
)
} −
E∗[
g1(
θ
∗
)
+
g2(
θ
∗
)
|
y]
.
Forg3
(
θ
)
, in this paper we use the estimator given byg3∗
(
θ
)
=
E∗
{
s(
θ
∗)
−
s(
θ
)
}
′Σ(
θ
)
{
s(
θ
∗)
−
s(
θ
)
}
y
.
(9)Thus, we get the estimator
mse∗
(
θ
,
µ
EB(
θ
))
=
2{
g1(
θ
)
+
g2(
θ
)
} −
E∗
g1(
θ
∗)
+
g2(
θ
∗)
y
+
g3∗(
θ
).
(10)In theAppendix, we shall show thatE
[
mse∗(
θ
,
µ
EB(
θ
))
] =
MSE(
θ
,
µ
EB(
θ
))
+
O(
N−3/2)
under some conditions.2.3. Confidence interval based on EBLUP
We next construct a confidence interval of
µ
=
a′β
+
b′vbased on EBLUP which satisfies the nominal confidence level with the second-order accuracy. Sincemse∗(
θ
,
µ
EB(
θ
))
is an asymptotically unbiased estimator of the MSE of EBLUP, it isreasonable to consider the confidence interval of the form
IEB
(
θ
)
:
µ
EB(
θ
)
±
zα/2
max{
mse∗(
θ
,
µ
EB(
θ
)),
0}
,
(11)where zα/2 is the 100
×
α/
2% upper quantile of the standard normal distribution. However, the coverage probabilityP
[
µ
∈
IEB(
θ
)
]
cannot be guaranteed to be greater than or equal to the nominal confidence coefficient 1−
α
. To addressthe problem, we consider the correction function given by
h∗1
(
θ
)
=
1+
zα/22 8g1(
θ
)
2 E∗
{
g1(
θ
∗)
−
g1(
θ
)
}
2
y
,
(12)which is an asymptotically unbiased estimator ofh1
(
θ
)
given in(A.12). Then, the corrected confidence interval is provided byI1CEB∗
(
θ
)
:µ
EB(
θ
)
±
zα/2
1+
h∗1(
θ
)
max{
mse∗(
θ
,
µ
EB(
θ
)),
0}
.
(13)A drawback ofI1CEB∗
(
θ
)
is that it cannot give an interval whenmse∗(
θ
,
µ
EB(
θ
))
takes a negative value. A simulation experimentgiven in Section3.2shows that such a shortcoming occurs in an extreme case. Thus, we suggest the alternative corrected confidence interval I2CEB∗
(
θ
)
:
µ
EB(
θ
)
±
zα/2
1+
h∗2(
θ
)
g1(
θ
)
+
g2(
θ
),
(14) where h∗2(
θ
)
=
h ∗ 1(
θ
)
+
g1(
θ
)
−
E∗[
g1(
θ
∗)
|
y] +
g∗3(
θ
)
2g1(
θ
)
.
(15)In theAppendix, it can be shown thatP
[
µ
∈
ICEB∗i
(
θ
)
] =
1−
α
+
O(
N−3/2)
fori=
1,
2.2.4. AIC and conditional AIC
We here provide the Akaike Information Criterion (AIC) and conditionalAICbased on the parametric bootstrap method. [1]AIC∗
1and AIC
∗
2. Let us define theAkaike Information(AI) by
AI
(
θ
)
= −
2
{
logfm(
y|
β
(
y),
θ
(
y))
}
fm(
y|
β
,
θ
)
fm(
y|
β
,
θ
)
d
ydy,
(16)where
β
(
y)
=
β
(
θ
)
and
θ
(
y)
are estimators based ony, andfm(
y|
β
,
θ
)
is a marginal density function ofygiven by−
2 logfm(
y|
β
,
θ
)
=
Nlog(
2π)
+
log|
Σ(
θ
)
| +
(
y−
Xβ
)
′Σ(
θ
)
−1(
y−
Xβ
).
Akaike’sAIC can be derived as an asymptotically unbiased estimator ofAI
(
θ
)
, namely,E[
AIC] =
AI(
θ
)
+
o(
1)
asN→ ∞
. WhenAIC is an exact unbiased estimator ofAI(
θ
)
, it is called the exactAIC, which was suggested by Sugiura [28], but in general, it is difficult to get in LMM.Define∆∗1
(
y)
and∆∗2(
y)
by ∆∗ 1(
y)
= −
2E∗
u∗′P(
θ
∗)
u∗
y
,
∆∗ 2(
y)
=
E∗
u∗′
Σ ∗−1 u∗−
tr[
Σ
Σ ∗−1]
y
,
(17) whereu∗=
y∗−
X
β
(
θ
)
,
Σ=
Σ(
θ
)
,
Σ ∗=
Σ(
θ
∗)
and P(
θ
)
=
Σ(
θ
)
−1X(
X′Σ(
θ
)
−1X)
−1X′Σ(
θ
)
−1.
Then, we suggest two kinds ofAICgiven byAIC1∗
= −
2 logfm(
y|
β
(
θ
),
θ
)
+
2p−
∆∗2(
y),
AIC2∗
= −
2 logfm(
y|
β
(
θ
),
θ
)
−
∆∗1(
y)
−
∆∗2(
y).
(18) In theAppendix, it can be shown thatE
[
AICi∗] =
AI(
θ
)
+
O(
N−1/2)
fori=
1,
2.[2]cAIC∗.AIC is derived from the marginal (or unconditional) distribution ofy, and it measures the prediction error of the predictor based on the marginal distribution. Since the marginal distribution is integrated out with respect to random effects, EBLUP does not appear inAIC. Vaida and Blanchard [29] proposed the conditionalAICas a criterion incorporating EBLUP.
The conditionalAIC is derived as an (asymptotically) unbiased estimator of theconditional Akaike information(cAI) defined by
cAI
(
θ
)
= −
2
log
{
f(
y|
v(
y),
β
(
y),
θ
(
y))
}
f(
y|
v,
β
,
θ
)
f(
y|
v,
β
,
θ
)
f(
v|
θ
)
d
ydydv,
where
β
(
y)
=
β
(
θ
)
and
v(
y)
=
v(
θ
)
=
G(
θ
)
Z′Σ(
θ
)
−1{
y−
X
β
(
θ
)
}
are estimators based ony, andf(
y|
v,
β
,
θ
)
andf(
v|
θ
)
,respectively, are a conditional density function ofygivenvand a marginal density function ofv. Note that
−
2 logf(
y|
v,
β
,
θ
)
=
Nlog(
2π)
+
log|
R(
θ
)
| +
(
y−
Xβ
−
Zv)
′R(
θ
)
−1(
y−
Xβ
−
Zv).
(19) Define∆∗ c1(
y)
and∆ ∗ c2(
y)
by ∆∗ c1(
y)
= −
2
E∗
u∗′
Σ −1
RP(
θ
∗)
u∗+
tr[
R∗
Σ ∗−1]
y
+
N−
2tr[
R
Σ −1]
,
∆∗ c2(
y)
=
E∗
u∗′(
2
Σ −1
RΣ
∗−1−
Σ −1
R
R ∗−1
R
Σ −1)
u∗−
tr[
R ∗−1(
2
R−
R
Σ −1
R)
]
y
+
2N−
2tr[
Σ
−1
R]
,
(20) for
R=
R(
θ
)
and
R∗=
R(
θ
∗)
. Then, we propose the conditionalAICgiven bycAIC∗
= −
2 logf(
y|
v(
θ
),
β
(
θ
),
θ
)
−
∆∗c1(
y)
−
∆∗c2(
y).
(21)In theAppendix, it can be shown thatE
[
cAIC∗] =
cAI(
θ
)
+
O(
N−1/2)
.3. Simulation and empirical studies
In this section, we investigate performances of the proposed procedures through simulation and empirical studies.
3.1. Fay–Herriot model and procedures used for comparison
In the present and next subsections, we treat the Fay–Herriot model and compare the proposed procedures with ones given in the literature by simulation. The basic area level model proposed by Fay and Herriot [15] is described by
ya
=
x′aβ
+
v
a+
ε
a,
a=
1, . . . ,
k,
(22)wherekis the number of small areas,xais ap
×
1 vector of explanatory variables,β
is ap×
1 unknown common vectorof regression coefficients, and
v
a’s andε
a’s are mutually independently distributed random errors such thatv
a∼
N(
0, θ)
and
ε
a∼
N(
0,
da)
for knownda’s. LetX=
(
x1, . . . ,
xk)
′,y=
(
y1, . . . ,
yk)
′, and letvandϵ
be similarly defined. Then,the model is expressed in vector notations asy
=
Xβ
+
v+
ϵ
andy∼
N(
Xβ
,
Σ)
, whereΣ=
Σ(θ)
=
θ
Ik+
DforD
=
diag(
d1, . . . ,
dk)
. In this model,R=
DandG=
θ
Ik.For an estimator
θ
ˆ
ofθ
, Model 2 in(2)is described aswhere
v
a∗’s andε
∗a’s are mutually independently distributed random errors such thatv
∗a|
y∼
N(
0,
θ)
ˆ
andε
a∗∼
N(
0,
da)
for knownda’s. The estimators
θ
ˆ
∗and
β
∗
(
θ
ˆ
∗)
can be obtained fromy∗a,a
=
1, . . . ,
k, by using the same techniques used toobtain
θ
ˆ
and
β
(
θ)
ˆ
.[1]Estimation of MSE of EBLUP.It is supposed that we want to predict
µ
s=
x′sβ
+
v
sfor some indexsamong 1, . . . ,
k,namely, the vectorsaandbused in Section2.2correspond toa
=
xsandb=
jswherejs=
(
0, . . . ,
0,
1,
0, . . . ,
0)
′, namely,thes-th element is one, and the other elements are zero. EBLUP of
µ
sis written as
µ
EB s=
µ
EB
s
(
ys;
β
(
θ),
ˆ
θ)
ˆ
=
x′s
β
(
θ)
ˆ
+ {
1−
γ
s(
θ)
ˆ
}
(
ys−
x′s
β
(
θ)),
ˆ
for
γ
s(θ)
=
ds/(θ
+
ds)
, and the functionss(θ)
,g1(θ)
andg2(θ)
are expressed ass(θ)
= {
1−
γ
s(θ)
}
js,g1(θ)
=
ds{
1−
γ
s(θ)
}
andg2
(θ)
=
γ
s(θ)
2x′s(
X′Σ−1X
)
−1xs.
Concerning the estimation of the MSE of EBLUP
µ
EBs
(
ys;
β
(
θ),
ˆ
θ)
ˆ
, we here handle the following four estimators: One isthe MSE estimator based on the parametric bootstrap method given by
mse∗
(
θ,
ˆ
µ
EB s)
=
2{
g1(
θ)
ˆ
+
g2(
θ)
ˆ
} −
E∗
g1(
θ
ˆ
∗)
+
g2(
θ
ˆ
∗)
y
+
g3∗(
θ),
ˆ
(24) which is from(10), whereg3∗(
θ)
ˆ
=
E∗
{
γ
s(
θ
ˆ
∗)
−
γ
s
(
θ)
ˆ
}
2(
θ
ˆ
+
ds)
|
y
. Butar and Lahiri [5] suggested another MSE estimator based on the parametric bootstrap method given by
mseBL
(
θ,
ˆ
µ
EBs)
=
2{
g1(
θ)
ˆ
+
g2(
θ)
ˆ
} −
E∗
g1(
θ
ˆ
∗)
+
g2(
θ
ˆ
∗)
y
+
E∗
{
µ
EB s(
ys;
β
(
θ
ˆ
∗),
θ
ˆ
∗)
−
µ
EB s(
ys;
β
(
θ),
ˆ
θ)
ˆ
}
2|
y
,
(25) Prasad and Rao [25] and Datta and Lahiri [11] suggested theMSEestimator based on the Taylor series expansion given bymse
(
θ,
ˆ
µ
EBs)
=
g1(
θ)
ˆ
+
g2(
θ)
ˆ
+
2{
γ
s(
θ)
ˆ
3/
ds}
V(
θ)
ˆ
− {
γ
s(
θ)
ˆ
}
2B(
θ),
ˆ
(26)forB
(θ)
=
E[ ˆ
θ
ĎĎ]
andV(θ)
=
Var(
θ
ˆ
Ď)
. This can be also derived from(A.4). In this model, an exact unbiased estimator of theMSEcan be derived by using the Stein identity and [10] provided the unbiased estimator given by
mseE
(
θ,
ˆ
µ
EBs)
=
ds−
2ds∂
∂
ys
γ
s(
θ)
ˆ
{
ys−
x′s
β
(
θ)
ˆ
}
+ {
γ
s(
θ)
ˆ
}
2{
ys−
x′s
β
(
θ)
ˆ
}
2.
(27)[2]Corrected confidence interval. We here treat the following four confidence intervals based on EBLUP: The confidence intervals(13)and(14)with the correction terms using the parametric bootstrap method written by
I1CEB∗
(
θ)
ˆ
:
µ
EB(
θ)
ˆ
±
zα/ 2
1+
h∗1(
θ)
ˆ
max{
mse∗(
θ,
ˆ
µ
EB),
0}
,
(28) I2CEB∗(
θ)
ˆ
:
µ
EB(
θ)
ˆ
±
zα/2
1+
h∗2(
θ)
ˆ
g1(
θ)
ˆ
+
g2(
θ),
ˆ
(29) wheremse∗(
θ,
ˆ
µ
EB)
is given in(24)andh∗ 1(
θ)
ˆ
=
(
1+
zα/22)
{
8g1(
θ)
ˆ
2}
−1E∗[{
g1(
θ
ˆ
∗)
−
g1(
θ)
ˆ
}
2
y]
andh∗2(
θ)
ˆ
=
h∗1(
θ)
ˆ
+ {
g1(
θ)
ˆ
−
E∗[
g1(
θ
ˆ
∗)
|
y] +
g3 ∗(
θ)
ˆ
}
/
{
2g1
(
θ)
ˆ
}
. As a confidence interval based on the Taylor series expansion, we treat the confidence interval with the correction term, given byICEB
(
θ)
ˆ
:
µ
EB(
θ)
ˆ
±
zα/2
1
+
h(
θ)
ˆ
mse
(
θ,
ˆ
µ
EB),
(30)wheremse
(
θ,
ˆ
µ
EB)
is given in(26)andh(θ)
=
(
z2α/2
+
1)
{
8θ
2(θ
+
ds)
2}
−1d2sVar(
θ
ˆ
Ď)
. The confidence interval proposedby Chatterjee et al. [6] is different from ours. As seen from(6), the conditional distribution of
µ
s given yisµ
s|
y∼
N
(
µ
s(
ys, θ,
β
), σ
2
s
(θ))
, whereµ
s=
x′sβ
+
v
s,
µ
s(
ys, θ,
β
)
=
x′
s
β
+
θ(θ
+
ds)
−1(
ys−
x′sβ
)
andσ
s2(θ)
=
dsθ(θ
+
ds)
−1.Thus,
σ
s(θ)
−1{
µ
s−
µ
s(
ys, θ,
β
)
} ∼
N(
0,
1)
. Although this suggests to construct a confidence interval from the distributionof
σ
s(
θ)
ˆ
−1{
µ
s−
µ
s(
ys,
ˆ
θ,
β
(
θ))
ˆ
}
, the distribution is not normal. DefineL(
z)
andL∗(
z)
by L(
z)
=
P
σ
s(
θ)
ˆ
−1{
µ
s−
µ
s(
ys,
θ,
ˆ
β
(
θ))
ˆ
} ≤
z
,
L∗(
z)
=
P
σ
s(
θ
ˆ
∗)
−1{
µ
∗s−
µ
s(
y ∗ s,
θ
ˆ
∗,
β
∗(
θ
ˆ
∗))
} ≤
z
,
whereµ
∗ s=
x ′s
β
(
θ)
ˆ
+
v
s∗. Chatterjee et al. [6] proved thatL(
z)
can be approximated byL∗(
z)
with the second-orderaccuracy, and proposed the confidence interval of
µ
sgiven byICLL
=
µ
s(
ys,
θ,
ˆ
β
(
θ))
ˆ
−
q1σ
s(
θ),
ˆ
µ
s(
ys,
θ,
ˆ
β
(
θ))
ˆ
−
q2σ
s(
θ)
ˆ
,
(31)[3]AIC. Two kinds of Akaike Information Criteria based on the parametric bootstrap method are given by AIC1∗
= −
2 logfm(
y|
β
(
θ),
ˆ
θ)
ˆ
+
2p−
∆∗2(
y),
AIC2∗= −
2 logfm(
y|
β
(
θ),
ˆ
θ)
ˆ
−
∆ ∗ 1(
y)
−
∆ ∗ 2(
y),
(32) where∆∗ 1(
y)
= −
2E∗[
u∗′P(
θ
ˆ
∗)
u∗
y]
and∆∗2(
y)
=
E∗[
u∗′
Σ ∗−1 u∗−
tr[
Σ
Σ ∗−1]
y]
for
Σ=
Σ(
θ
)
,Σ
∗=
Σ(
θ
∗)
and u∗=
y∗−
X
β
(
θ)
ˆ
. On the other hand,AICbased on the Taylor expansion is described asAIC
= −
2 logfm(
y|
β
(
θ),
ˆ
θ)
ˆ
−
∆(θ),
ˆ
(33)where∆(θ)
= −
2p−
E
tr[
∇
y∇
′yθ
ˆ
Ď]
. This was derived by Kubokawa [21].
[4]Conditional AIC. The conditionalAICbased on the parametric bootstrap method is given by
cAIC∗
= −
2 logf(
y|
v(
θ),
ˆ
β
(
θ),
ˆ
θ)
ˆ
−
∆∗c1(
y)
−
∆∗c2(
y),
(34) where∆∗c1(
y)
= −
2{
E∗
u∗′
Σ−1 DP(
θ
ˆ
∗)
u∗+
tr[
D
Σ ∗−1]
y
+
k−
2tr[
D
Σ −1]}
and∆∗c2(
y)
=
2E∗
u∗′
Σ−1 D
Σ ∗−1 u∗
y
−
2tr[
D
Σ −1]
. Kubokawa [21] derived the conditionalAICbased on the Taylor series expansion given ascAIC
= −
2 logf(
y|
v(
θ),
ˆ
β
(
θ),
ˆ
θ)
ˆ
−
∆c
(
θ),
ˆ
(35)where∆c
(θ)
= −
2ρ(θ)
−
2tr[
DΣ−1E[
∇
y∇
′yθ
ˆ
Ď]]
.[5]Estimation of
θ
. As an estimator ofθ
, in this paper, we use the truncated Fay–Herriot estimatorθ
ˆ
=
max{
θ
0,
k−1}
,where
θ
0is the solution of the equationLFH(θ
0)
=
0, whereLFH(θ
0)
=
y′(
Σ(θ
0)
−1−
P(θ
0))
y−
(
k−
p)
. For the truncatedFay–Herriot estimator,B
(θ)
=
2{
ktr[
Σ−2] −
(
tr[
Σ−1]
)
2}
/(
tr[
Σ−1]
)
3,V(θ)
=
2k/(
tr[
Σ−1]
)
2,∇
yθ
ˆ
=
2Σ−1y/
y′Σ−2yI(
θ >
ˆ
k−1)
,∆(θ)= −
2(
p+
1)
and∆c
(θ)
= −
2ρ(θ)
−
2tr[
Σ−2D]
/
tr[
Σ−1]
. For other estimators including ML, REML,Prasad–Rao and modified Fay–Herriot estimators, see [20] who summarized their biases and variances up to second order.
3.2. Simulation results
We investigate the performances of the proposed procedures by simulation and compare them with some existing procedures given in the literature. For the purpose, we adopt part of the simulation framework of [12] for our study. We consider the Fay–Herriot model(22)withk
=
15,θ
=
1 and twodi-patterns: (a) 0.7, 0.6, 0.5, 0.4, 0.3; (b) 4.0, 0.6, 0.5, 0.4,0.1, which correspond to patterns (a) and (c) of [12]. Pattern (a) is less variable indi-values, while pattern (b) has larger
variability. There are five groupsG1
, . . . ,
G5and three small areas in each group. The sampling variancesdiare the same forarea within the same group.
[Simulation experiment I].Let us consider the case thatx′
a
β
=
0 for simplicity as handled in [6]. Then,µ
s=
v
s,
µ
EB
s
= {
1−
γ
s(
θ)
ˆ
}
ysandMSE(θ,
µ
s(
ˆ
θ))
=
ds−
dsγ
s(θ)
+
E[{
γ
s(
θ)
ˆ
−
γ
s(θ)
}
2y2s]
forγ
s(θ)
=
ds/(θ
+
ds)
. We preparethe true values ofMSE
(θ,
µ
s(
θ))
ˆ
in advance, which can be computed based on 100,000 simulated data. The relative bias andthe risk functions ofMSEestimatormsesare given by
Bs
(θ,
mses)
=
E
mses−
MSE(θ,
µ
s(
ˆ
θ))
/
MSE(θ,
µ
s(
θ)),
ˆ
Rs(θ,
mses)
=
E
mses−
MSE(θ,
µ
s(
θ))
ˆ
2
/
{
MSE(θ,
µ
s(
θ))
ˆ
}
2.
For confidence intervalCIsof
µ
s=
v
s, the coverage probability and the expected length ofCIsare given byCPs
(θ,
CIs)
=
P
µ
s∈
CIs
,
ELs(θ,
CIs)
=
E
Length ofCIs
.
These values are computed as average values based on 10,000 simulation runs where the size of the bootstrap sample is 1000. Further, those values are averaged over areas within groupsGi,i
=
1, . . . ,
5. The simulation results are generatedusing Ox version 5.10 (see [13]) and the Arfima package version 1.00 [14].
Concerning theMSEestimation, we handle the four estimatorsmse,mse∗,mseBL andmseEgiven in(24)–(27), which are referred as TLap, PBap, PBbl and ExactU, respectively. The values of their relative biases 102
×
Bs
(θ,
mses)
and risks102
×
Rs(θ,
mses)
by simulation are reported inTable 1. Since the ExactU is an unbiased estimator, it is clear that the valuesof the bias of ExactU are small, but it has very large risks. This means that estimating theMSEunbiasedly does not necessarily lead to improvement of the risk, but rather yields large variability in general. Investigating the relative biases and risks of TLap, PBap and PBbl in details, we can see that those values of TLap, PBap and PBbl are reasonably small. In comparison of their relative risks, it is seen that TLap has a slightly smaller risk than PBap, which has also a little bit smaller risk than PBbl,
Table 1
Values of relative biases and risks of the fourMSEestimators forθ=1.
Bias102×B(θ,mse) Risk102×R(θ,mse)
di MSE TLap PBap PBbl ExactU TLap PBap PBbl ExactU
Pattern (a) G1 0.7 0.441 −0.28 −0.20 −0.24 0.37 4.3 6.8 8.0 140.0 G2 0.6 0.402 −0.09 0.36 0.22 −1.02 3.4 6.0 7.2 108.3 G3 0.5 0.357 0.12 1.06 1.06 −0.13 2.5 5.1 6.5 86.8 G4 0.4 0.305 0.40 1.96 2.00 −0.56 1.5 4.2 5.7 63.6 G5 0.3 0.246 0.89 3.25 3.52 −0.54 0.6 3.3 5.2 43.5 Pattern (b) G1 4.0 0.854 −1.69 −2.57 −2.16 1.97 22.3 22.7 24.9 2785.6 G2 0.6 0.403 −0.38 0.25 0.09 0.26 4.6 6.6 7.9 115.3 G3 0.5 0.358 −0.17 0.93 0.86 −0.37 3.5 5.6 7.0 87.7 G4 0.4 0.306 0.07 1.82 1.92 −0.13 2.4 4.5 6.2 64.9 G5 0.1 0.094 2.10 6.86 8.08 −0.11 0.2 1.6 4.6 11.3 Table 2
Values of coverage probability and expected length of the four confidence intervals with nominal confidence coefficient 95% forθ=1. Cov. prob.102×CP Expected lengthEL
di TLap PB1 PB2 PBcll TLap PB1 PB2 PBcll Pattern (a) G1 0.7 96.2 95.0 95.2 95.4 2.85 2.72 2.73 2.82 G2 0.6 96.3 95.2 95.4 95.5 2.72 2.59 2.60 2.68 G3 0.5 96.2 95.2 95.4 95.5 2.56 2.44 2.45 2.51 G4 0.4 96.1 95.3 95.4 95.3 2.36 2.26 2.27 2.31 G5 0.3 96.2 95.6 95.7 95.5 2.11 2.03 2.03 2.05 Pattern (b) G1 4.0 96.2 94.7 95.8 95.2 4.05 3.95 3.99 4.15 G2 0.6 96.1 95.1 95.3 95.2 2.66 2.60 2.61 2.67 G3 0.5 96.1 95.1 95.3 95.2 2.50 2.45 2.45 2.50 G4 0.4 96.2 95.4 95.5 95.4 2.30 2.26 2.27 2.30 G5 0.1 95.6 95.8 95.9 95.3 1.25 1.26 1.26 1.23
but their differences are little. Thus, in this simulation experiment, the estimators PBap and PBbl based on the parametric bootstrap method are as good as TLap in both patterns (a) and (b).
Concerning the interval estimation, we handle the four confidence intervalsICEB,ICEB∗
1 ,I
CEB∗
2 andICLLgiven in(28)–(31),
which are referred as TLap,PB1,PB2and PBcll, respectively. The values of their coverage probabilities 102
×
CPand expectedlengthsELby simulation are reported inTable 2. In both patterns, the four confidence intervals satisfy the nominal confidence level, but TLap has a tendency to take slightly larger coverage probability than 95%. It is also seen thatPB1andPB2have
smaller expected lengths than TLap and PBcll exceptG5in pattern (b). The difference betweenPB1andPB2 appears in
the coverage probability atG1in pattern (b). In fact,mse∗sometimes takes negative values in this case, and the resulting confidence intervalPB1cannot construct an interval, which yields 94.7% coverage probability atG1, slightly smaller than
the nominal confidence coefficient.PB2and PBcll are free from such a drawback. Thus, these observations show that the
confidence intervalPB2has a good performance in this simulation experiment.
Although simulation results are not reported here, we have investigated the performances of the parametric bootstrap proceduresPB1,PB2and PBcll with the Prasad–Rao estimator for
θ
. The simulation results show that their performances arenot bad in pattern (a), but for pattern (b), their coverage probabilities are much smaller than 95%. As defined in(23), the conditional distribution ofy∗
agivenyisN
(
x′
a
β
(
θ),
ˆ
θ
ˆ
+
da)
, and the variability ofy∗strongly depends on the estimateθ
ˆ
. Thisimplies that we need to use an estimator of
θ
with higher precision. Thus, in this simulation experiment, we employ the Fay–Herriot estimator instead of the Prasad–Rao estimator.Since [6] used 1000 bootstrap sample size, we have treated the same sample size. It may be interesting to investigate what happens in the case of 100 bootstrap sample size. Although the simulation result is omitted here, it tells us that in the case of 100 bootstrap sample size, the coverage probability of PBcll is lower than 95%, butPB1andPB2still satisfies
the nominal level. This suggests that 1000 bootstrap sample size is necessary for PBcll, butPB1andPB2work well in the
simulation experiment with 100 bootstrap sample size.
[Simulation experiment II].We next investigate the performances of AIC and conditionalAIC derived in the previous sections through simulation and compare them in terms of the relative frequencies of selecting the true model.
Table 3
Relative frequencies (%) of candidate models selected by the five criteriaAIC,AIC∗
1,AIC
∗
2,cAICandcAIC
∗
fork=15 andθ =1: the dimension of a full model isp=7 and the true model is (3).
(m) Pattern (a) Pattern (b)
AIC AIC∗ 1 AIC ∗ 2 cAIC cAIC ∗ AIC AIC∗ 1 AIC ∗ 2 cAIC cAIC ∗ (1) 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.6 (2) 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.3 3.2 (3) 61.7 59.8 65.2 68.1 75.6 59.0 57.0 64.8 62.5 66.2 (4) 13.2 13.3 12.0 13.0 10.3 13.0 13.3 11.2 13.7 11.5 (5) 9.2 9.4 7.8 7.9 5.1 10.0 10.3 8.1 9.8 6.5 (6) 7.2 7.7 6.5 5.5 3.8 8.4 8.8 7.1 7.0 5.0 (7) 8.6 9.7 8.2 5.2 5.0 9.4 10.3 8.6 6.5 5.7 (3) + (4) 74.9 73.1 77.2 81.1 85.9 72.0 70.3 76.0 76.2 77.7
Letx1
, . . . ,
xkbe generated fromNp(
0,
Σx)
whereΣx=
(
1−
ρ
x)
Ip+
ρ
xJpforρ
x=
0.
1, whereJp=
jpjp′forjp=
(
1, . . . ,
1)
′,ap-vector of ones. In this experiment, we consider a class of the nested models denoted by
(
m)
, which is described by(
m)
y=
Xβ
(m)+
Zv+
ϵ
,
for m=
1, . . . ,
p,
where
β
(m)=
(β
1, . . . , β
m,
0, . . . ,
0)
′, andvandϵ
are mutually independent random variables havingv∼
Nk(
0, θ
I)
andϵ
∼
Nk(
0,
D)
. It is noted that the firstmcomponents ofβ
(m)are not zero and the others are zero. Suppose that the truemodel is given by
(
p∗)
y=
Xβ
∗+
v+
ϵ
,
where 1
≤
p∗≤
pandβ
∗=
(β
1, . . . , β
p∗,
0, . . . ,
0)
′forβ
ℓ=
2(
−
1)
ℓ+1{
1+
1.
3ℓ+1}
, 1≤
ℓ
≤
p∗. We here handle the case thatθ
=
1,p=
7,p∗=
3 andk=
15. Thus, the true model is(
3)
, and the candidate models are(
1), . . . , (
7)
. Corresponding to the model(
m)
, we generate random variables from the parametric bootstrap modely∗=
X
β
(m)
+
Zv∗+
ϵ
∗, wherev∗ andϵ
∗are defined below(23).We compare selection criteria in the sense of relative frequency of selecting the true model
(
3)
. The criteria we examine are the existing proceduresAICandcAICbased on the Taylor series expansion, and the proposed onesAIC∗1,AIC
∗
2andcAIC
∗
based on the parametric bootstrap method. For each criterion and each candidate model
(
m)
, the number of selecting the model(
m)
is counted for 10,000 data set. We thus obtain the relative frequencies of the model(
m)
selected by the criteria by dividing the number by 10,000. These relative frequencies are reported inTable 3, where the last column denotes the sum of two relative frequencies of (3) and (4). Although the model (4) is not true, it includes the true model (3), so that it may be not bad to look at the sum of the two relative frequencies. FromTable 3, it is seen thatAIC,AIC1∗andAIC2∗have similar performances, andAIC∗2is slightly better thanAICandAIC
∗
1. Concerning the conditionalAIC,cAIC
∗performs similarly tocAIC, andcAIC∗is better thancAIC. In this simulation experiment, it is seen that among the five criteria,cAIC∗is superior in light of relative frequency of selecting the true model.
3.3. Example of posted land price data
We apply the proposed procedures to the posted land price data along the Keikyu train line which connects the suburbs in Kanagawa prefecture to the Tokyo metropolitan area. This data set was used by Kubokawa [19] who gave confidence intervals based on EBLUP.
A data set of the posted land price data in 2001 and their covariates are available for 48 stations on the Keikyu train line, and we consider each station as a small area, namely,k
=
48. For theath station, there are data ofnaland spots, where theaverage ofna’s is 3.73. Since those who live in the suburbs take this line to work or study in Tokyo on weekdays, it may be
expected that the land price depends on the distance from Tokyo. Forb
=
1, . . . ,
na, we use five kinds of observationsyab,TRNa,DSTab,FOOTabandFARab, whereyabdenotes the value of the posted land price (Yen in hundred of thousands) perm2
of thebth spot,TRNais the time to take by train from the stationato the Tokyo station around 8:30 in the morning,DSTab
is the geographical distance from the spotbto the nearby stationa,FOOTabis the time to take on foot from the spotbto
the nearby stationaandFARabdenotes the floor-area ratio of the spotb. As regressor variables, we consider nine variables
FARab,TRNa,TRNa2,DSTab,DSTab2,FOOTab,FOOTab2,TRNa
×
DSTabandTRNa×
FOOTab, which are denoted byx1,x2,x3,x4,x5,x6,x7,x8andx9. Also, the constant term is denoted byx0.
To analyze the data, we employ the nested error regression model (NERM), which is described as
yab
=
x′abβ
+
v
a+
ε
ab,
a=
1, . . . ,
k,
b=
1, . . . ,
na,
(36)wherex′abis a vector of values of the explanatory variables
(
x0,
x1, . . . ,
x9)
,N=
ka=1na, and
v
a’s andε
ab’s are mutuallyindependently distributed as
v
a∼
N(
0, σ
v2)
andε
ab∼
N(
0, σ
2)
for unknown variance componentsσ
v2andσ
2. For theunknown variance components, we here use the Prasad–Rao estimators given by Prasad and Rao [25]. We consider nested error regression model(36)with the regressorsx0–x9as a full model.