Small Area Estimation Under Linear and Generalized
Linear Mixed Models With Time and Area Effects
Ayoub Saei, Ray Chambers
Abstract
This is the theory component of the report on small area estimation theory that was prepared as part of Southampton’s involvement in the EURAREA “Enhancing small area estimation techniques to meet European Needs” project IST 2000-26290 in the European Union’s Fifth Research And Technological Development Framework Programme.
Southampton Statistical Sciences Research Institute
University of Southampton
Highfield, Southampton
SO17 1BJ
United Kingdom
September 2003
1 Estimation Under a Linear Mixed Model With Time and Area Effects
Let the vector y = {ydti} consist of the population values of the survey variable of interest Y.
The subscripts d, t and i (d = 1,2,…,D; t = 1,2,…,T; i = 1,2,…,Ndt) represent area, time and
unit respectively. Let X be the matrix of population values of the auxiliary covariates Xdti.
Assuming that the $u_{j(dt)}$, j = 1, 2, …, J, represent random effects that are related to area and time, the objective is to estimate/predict the value of the vector $\theta = ay$ for a given matrix a. Initially we assume that the values in y follow a linear mixed model with random area and time effects of the form:
\[ y_{dti} = \beta_0 + X_{dti}'\beta_t + \sum_{j=1}^{J} u_{j(dt)} + e_{dti}. \tag{1.1} \]
The vector $\beta_t$ contains regression coefficients at time t = 1, 2, …, T, $\beta_0$ is the intercept, and the random effects $\{u_{j(dt)}\}$ and $\{e_{dti}\}$ are assumed to be mutually independent and to follow normal distributions with zero means and respective variances. The (dt) in $u_{j(dt)}$ indicates that these
random effects are related to the specific area d and the specific time t. Linear models with spatially correlated area effects and linear models with independent and autocorrelated time effects are special cases of model (1.1) with J = 1 and J = 2 respectively. Put
$u = [u_1', u_2', \ldots, u_J']'$ and $e = [e_{dti}]$, and let $Z_j$ be the “incidence” matrix for the random effects vector $u_j$. The model (1.1) can then be written in matrix form as
(1.2) y = Xβ +Zu + e
where $Z = [Z_1, Z_2, \ldots, Z_J]$. The random vectors $u_j$ are assumed to be distributed as multivariate normal with zero mean vectors and variance-covariance matrices $\sigma_j^2\Omega_j(\rho)$, where ρ is a vector of correlation/covariance parameters. The vector e is a normal error vector with zero mean and variance $\sigma^2 W$, where W is a known square matrix of order N and $\Omega_j(\rho)$ is a matrix of order equal to the rank of the matrix $Z_j$. The covariance matrix of y is then
\[ \sigma^2\Big(W + \sum_{j=1}^{J}\phi_j Z_j\Omega_j Z_j'\Big) = \sigma^2(W + Z\Omega Z') = \sigma^2\Sigma \]
where $\phi_j = \sigma_j^2/\sigma^2$ and $\Omega = \mathrm{diag}(\phi_j\Omega_j(\rho))$.
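Let the subscripts s and r denote the sampled and non-sampled population units respectively. These subscripts will be used below to denote conformable partitions of other vectors and matrices. Thus, the model (1.2) is partitioned conformably as
\[ \begin{bmatrix} y_s \\ y_r \end{bmatrix} = \begin{bmatrix} X_s \\ X_r \end{bmatrix}\beta + \begin{bmatrix} Z_s \\ Z_r \end{bmatrix}u + \begin{bmatrix} e_s \\ e_r \end{bmatrix} = \begin{bmatrix} X_s\beta + Z_su + e_s \\ X_r\beta + Z_ru + e_r \end{bmatrix} = \begin{bmatrix} \eta_s + e_s \\ \eta_r + e_r \end{bmatrix} \tag{1.3} \]
where $X = [X_s', X_r']'$, $Z = [Z_s', Z_r']'$, $\eta_s = X_s\beta + Z_su$ and $\eta_r = X_r\beta + Z_ru$. Similarly, the matrix a is partitioned conformably as $a = [a_s, a_r]$. The vector-valued parameter of interest $\theta = ay$ can then be written
\[ \theta = a_sy_s + a_ry_r. \tag{1.4} \]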
The first term in (1.4) depends only on the sample values and is known after the sample is observed. The second term, which depends on the non-sample values, is unknown. The estimate or predicted value of θ, say $\hat\theta$, is then obtained by replacing $y_r$ with its predicted value in (1.4). Let $E(y_r|u) = \eta_r = X_r\beta + Z_ru$. The predictor of $y_r$ is then $\hat\eta_r = X_r\hat\beta + Z_r\hat u$, where the estimates $\hat\beta$ and $\hat u$ are obtained by fitting the linear mixed model
\[ y_s = X_s\beta + Z_su + e_s \tag{1.5} \]
to the sample data. This leads to the predicted value
\[ \hat\theta = a_sy_s + a_r\hat y_r = a_sy_s + a_r\hat\eta_r = a_sy_s + a_r(X_r\hat\beta + Z_r\hat u). \tag{1.6} \]
2 Best Linear Unbiased Prediction (BLUP)
The linear mixed model (1.5) characterises variation between areas and variation over time in the values of the characteristic Y. The aim is to predict or estimate $\theta = ay$ using the sample values $y_s$. For known variance components in the model (1.5), the BLUP method provides a predictor of θ that is an unbiased linear function of the sample values $y_s$ and has minimum variance among all linear unbiased predictors/estimators of θ. The method requires that the covariance between the observable random variable $y_s$ and the non-observable random variable u be known. It does not impose a normality assumption on the random effect u. Under (1.5) the covariance between u and $y_s$ is $\sigma^2\Omega Z_s'$. In this case, following Henderson (1963), the best linear unbiased predictor (BLUP) of $\eta_r$ is
\[ \tilde\eta_r = X_r\hat\beta + Z_r\hat u \tag{2.1} \]
where, with $\Sigma_s = W_s + Z_s\Omega Z_s'$,
\[ \hat\beta = (X_s'\Sigma_s^{-1}X_s)^{-1}X_s'\Sigma_s^{-1}y_s \quad\text{and}\quad \hat u = \Omega Z_s'\Sigma_s^{-1}(y_s - X_s\hat\beta). \tag{2.2} \]
Note that $\hat\beta$ is Aitken's generalized least squares (GLS) estimator of β under (1.5). Replacing $\hat u$ in (2.1) by its value from (2.2), and $\hat\eta_r$ by $\tilde\eta_r$ in the right hand side of (1.6) gives
\[ \tilde\theta = a_sy_s + a_r\tilde\eta_r = a_sy_s + a_rX_r\hat\beta + a_rZ_r\Omega Z_s'\Sigma_s^{-1}(y_s - X_s\hat\beta). \tag{2.3} \]
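As an illustration of (2.2) and (2.3), the following is a minimal NumPy sketch, not the authors' implementation. The function name `blup`, the toy data and the choice of area totals for `a_s` and `a_r` are assumptions made only for this example, and the variance ratios in `Omega` are treated as known.

```python
import numpy as np

def blup(ys, Xs, Zs, Ws, Omega, Xr, Zr, a_s, a_r):
    """Return (beta_hat, u_hat, theta_tilde) from (2.2) and (2.3)."""
    Sigma_s = Ws + Zs @ Omega @ Zs.T                    # Sigma_s = W_s + Z_s Omega Z_s'
    Si_X = np.linalg.solve(Sigma_s, Xs)                 # Sigma_s^{-1} X_s
    beta = np.linalg.solve(Xs.T @ Si_X, Si_X.T @ ys)    # GLS estimator in (2.2)
    resid = ys - Xs @ beta
    u = Omega @ Zs.T @ np.linalg.solve(Sigma_s, resid)  # BLUP of u in (2.2)
    theta = a_s @ ys + a_r @ (Xr @ beta + Zr @ u)       # predictor (2.3)
    return beta, u, theta

# Hypothetical toy population: 4 areas, 3 sampled and 2 non-sampled units each.
rng = np.random.default_rng(0)
area_s, area_r = np.repeat(np.arange(4), 3), np.repeat(np.arange(4), 2)
Zs, Zr = np.eye(4)[area_s], np.eye(4)[area_r]           # incidence matrices
Xs = np.column_stack([np.ones(12), rng.normal(size=12)])
Xr = np.column_stack([np.ones(8), rng.normal(size=8)])
ys = rng.normal(size=12)
Ws, Omega = np.eye(12), 0.5 * np.eye(4)                 # assumed known
a_s, a_r = Zs.T, Zr.T                                   # theta = vector of area totals
beta_hat, u_hat, theta_tilde = blup(ys, Xs, Zs, Ws, Omega, Xr, Zr, a_s, a_r)
```

Linear solves are used in place of explicit inverses of $\Sigma_s$; this is a numerical convenience and does not change the quantities being computed.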
An alternative form of (2.3), which is useful in computation, can be obtained from the following theorem.
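Theorem: Put $T_s^* = (\Omega^{-1} + Z_s'W_s^{-1}Z_s)^{-1} = [T^*_{sjj'}]$ and $\Sigma_s = W_s + Z_s\Omega Z_s'$. Then
a) $\Sigma_s^{-1} = W_s^{-1} - W_s^{-1}Z_sT_s^*Z_s'W_s^{-1}$
b) $Z_s'\Sigma_s^{-1} = \Omega^{-1}T_s^*Z_s'W_s^{-1}$
c) $\Omega Z_s'\Sigma_s^{-1} = T_s^*Z_s'W_s^{-1}$
d) $Z_s'\Sigma_s^{-1}Z_s = \Omega^{-1} - \Omega^{-1}T_s^*\Omega^{-1}$
e) \[ Z_{sj}'\Sigma_s^{-1}Z_{sj'} = \begin{cases} \phi_j^{-1}\Omega_j^{-1} - \phi_j^{-2}\Omega_j^{-1}T^*_{sjj}\Omega_j^{-1} & \text{for } j = j' \\ -\phi_j^{-1}\phi_{j'}^{-1}\Omega_j^{-1}T^*_{sjj'}\Omega_{j'}^{-1} & \text{otherwise.} \end{cases} \]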
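Parts (a) to (d) of the Theorem can be checked numerically. The sketch below does so on randomly generated positive definite matrices; the dimensions and the random matrices are arbitrary choices made for this example only.

```python
import numpy as np
from numpy.linalg import inv

rng = np.random.default_rng(1)
n, q = 10, 3
Zs = rng.normal(size=(n, q))
A = rng.normal(size=(n, n))
Ws = A @ A.T + n * np.eye(n)          # arbitrary positive definite W_s
B = rng.normal(size=(q, q))
Omega = B @ B.T + np.eye(q)           # arbitrary positive definite Omega

Wi, Oi = inv(Ws), inv(Omega)
T = inv(Oi + Zs.T @ Wi @ Zs)          # T_s^*
Si = inv(Ws + Zs @ Omega @ Zs.T)      # Sigma_s^{-1}

checks = {
    "a": np.allclose(Si, Wi - Wi @ Zs @ T @ Zs.T @ Wi),
    "b": np.allclose(Zs.T @ Si, Oi @ T @ Zs.T @ Wi),
    "c": np.allclose(Omega @ Zs.T @ Si, T @ Zs.T @ Wi),
    "d": np.allclose(Zs.T @ Si @ Zs, Oi - Oi @ T @ Oi),
}
print(checks)
```

The practical point of the Theorem is that $T_s^*$ has the dimension of u rather than of $y_s$, so only a small matrix has to be inverted when $W_s$ is diagonal.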
Using the above Theorem, an alternative formulation of (2.3) is
\[ \tilde\theta = a_sy_s + a_rX_r\hat\beta + a_rZ_rT_s^*Z_s'W_s^{-1}(y_s - X_s\hat\beta) = a_sy_s + a_r\hat y_r. \tag{2.4} \]
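The mean cross-product error (MCPE) matrix of the BLUP $\tilde\theta$ is obtained by writing
\[ \tilde\theta - \theta = a_r(\hat y_r - y_r) = a_r(\tilde\eta_r - y_r) = a_r(\tilde\eta_r - \eta_r + \eta_r - y_r) = a_r[(\tilde\eta_r - \eta_r) - (y_r - \eta_r)]. \]
Let $X_r^* = a_rX_r$ and $Z_r^* = a_rZ_r$. The mean cross-product error matrix of the BLUP $\tilde\theta$ is then
\[ \begin{aligned} MCPE(\tilde\theta) &= E[(\tilde\theta - \theta)(\tilde\theta - \theta)'] \\ &= \begin{pmatrix} X_r^* & Z_r^* \end{pmatrix} E\left\{ \begin{bmatrix} \hat\beta - \beta \\ \hat u - u \end{bmatrix}\begin{bmatrix} \hat\beta - \beta \\ \hat u - u \end{bmatrix}' \right\} \begin{pmatrix} X_r^{*\prime} \\ Z_r^{*\prime} \end{pmatrix} + \sigma^2a_rW_ra_r' \\ &= \sigma^2Z_r^*T_s^*Z_r^{*\prime} + \sigma^2[X_r^* - Z_r^*T_s^*Z_s'W_s^{-1}X_s](X_s'\Sigma_s^{-1}X_s)^{-1}[X_r^{*\prime} - X_s'W_s^{-1}Z_sT_s^*Z_r^{*\prime}] + \sigma^2a_rW_ra_r' \\ &= G_1(\omega) + G_2(\omega) + G_4(\omega) \end{aligned} \tag{2.5} \]
where
\[ G_1(\omega) = \sigma^2Z_r^*T_s^*Z_r^{*\prime}, \]
\[ G_2(\omega) = \sigma^2[X_r^* - Z_r^*T_s^*Z_s'W_s^{-1}X_s](X_s'\Sigma_s^{-1}X_s)^{-1}[X_r^{*\prime} - X_s'W_s^{-1}Z_sT_s^*Z_r^{*\prime}], \]
\[ G_4(\omega) = \sigma^2a_rW_ra_r'. \]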
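The three components of (2.5) are simple matrix products once $T_s^*$ is available. The following sketch computes them with NumPy; the function name `mcpe_blup` and the toy inputs are assumptions made for this example, and the variance components are taken as known.

```python
import numpy as np

def mcpe_blup(Xs, Zs, Ws, Omega, Xr, Zr, Wr, a_r, sigma2):
    """Return G1, G2 and G4 of (2.5) for known variance components."""
    Wi = np.linalg.inv(Ws)
    T = np.linalg.inv(np.linalg.inv(Omega) + Zs.T @ Wi @ Zs)   # T_s^*
    Sigma_s = Ws + Zs @ Omega @ Zs.T
    Xr_star, Zr_star = a_r @ Xr, a_r @ Zr                      # X_r^*, Z_r^*
    C = Xr_star - Zr_star @ T @ Zs.T @ Wi @ Xs
    G1 = sigma2 * Zr_star @ T @ Zr_star.T
    G2 = sigma2 * C @ np.linalg.inv(Xs.T @ np.linalg.solve(Sigma_s, Xs)) @ C.T
    G4 = sigma2 * a_r @ Wr @ a_r.T
    return G1, G2, G4

# Hypothetical toy inputs: 3 areas, 3 sampled and 2 non-sampled units each.
rng = np.random.default_rng(2)
n, n_r, p, q = 9, 6, 2, 3
Xs, Xr = rng.normal(size=(n, p)), rng.normal(size=(n_r, p))
Zs = np.eye(q)[np.repeat(np.arange(q), 3)]
Zr = np.eye(q)[np.repeat(np.arange(q), 2)]
Ws, Wr, Omega = np.eye(n), np.eye(n_r), np.diag([0.5, 1.0, 2.0])
a_r = Zr.T                                                     # area totals
G1, G2, G4 = mcpe_blup(Xs, Zs, Ws, Omega, Xr, Zr, Wr, a_r, sigma2=1.5)
```

$G_1$ reflects the uncertainty in predicting u when β is known, $G_2$ the additional uncertainty from estimating β, and $G_4$ the variability of the non-sampled errors.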
3 Empirical Best Linear Unbiased Prediction (EBLUP)
The BLUP development set out in the preceding section assumes variance components are known. In practice of course, this is hardly ever the case. We therefore need to estimate these parameters from the sample data. The empirical best linear unbiased prediction (EBLUP) method replaces them by sample-based estimates in the BLUP.
The two methods most frequently used to estimate variance components are maximum likelihood (ML) and residual maximum likelihood (REML). Kackar and Harville (1981) showed that the ML and REML variance component estimators are translation invariant and even functions of the data. In this section we derive the estimating equations for the variance component vector, $\omega = (\sigma^2, \phi', \rho')'$, under ML and REML.
3.1 Maximum Likelihood Estimation of the Variance Components (ML)
Maximum likelihood estimation of the regression parameter vector β and the variance components vector ω requires parametric specification of the distribution of the random variable ys. A standard assumption is that this variable has a multivariate normal distribution.
Given that assumption, we can write down the log-likelihood function generated by the observation vector ys under the general linear mixed model (1.5) as
\[ l = -\tfrac{1}{2}\left[n\ln(2\pi\sigma^2) + \ln|\Sigma_s| + \sigma^{-2}(y_s - X_s\beta)'\Sigma_s^{-1}(y_s - X_s\beta)\right]. \]
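Differentiation of this log-likelihood function with respect to the parameters of this distribution leads to the ML score functions
\[ \partial l/\partial\beta = \sigma^{-2}X_s'\Sigma_s^{-1}(y_s - X_s\beta) \tag{3.1.1} \]
\[ \partial l/\partial\sigma^2 = -\tfrac{1}{2}\left[n\sigma^{-2} - \sigma^{-4}(y_s - X_s\beta)'\Sigma_s^{-1}(y_s - X_s\beta)\right] \tag{3.1.2} \]
\[ \partial l/\partial\phi_j = -\tfrac{1}{2}\left[\mathrm{tr}(\Sigma_s^{-1}Z_{sj}\Omega_jZ_{sj}') - \sigma^{-2}(y_s - X_s\beta)'\Sigma_s^{-1}Z_{sj}\Omega_jZ_{sj}'\Sigma_s^{-1}(y_s - X_s\beta)\right] \quad\text{for } j = 1, 2, \ldots \tag{3.1.3} \]
\[ \partial l/\partial\rho_h = -\tfrac{1}{2}\left[\mathrm{tr}(\Sigma_s^{-1}Z_s(\partial\Omega/\partial\rho_h)Z_s') - \sigma^{-2}(y_s - X_s\beta)'\Sigma_s^{-1}Z_s(\partial\Omega/\partial\rho_h)Z_s'\Sigma_s^{-1}(y_s - X_s\beta)\right] \quad\text{for } h = 1, 2, \ldots \tag{3.1.4} \]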
Equating (3.1.1) to zero yields the ML estimating equations for β. For a fixed variance components vector ω, it is clear that the MLE for this parameter is then just its GLS estimator
\[ \hat\beta_{ML} = (X_s'\Sigma_s^{-1}X_s)^{-1}X_s'\Sigma_s^{-1}y_s. \]
Equating the score functions in (3.1.2) – (3.1.4) to zero yields the ML estimating equations for $\sigma^2$, $\phi_j$ and $\rho_h$. For fixed $\phi_j$ and $\rho_h$, equating (3.1.2) to zero gives the ML estimate of $\sigma^2$ as
\[ \hat\sigma^2_{ML} = n^{-1}y_s'\Sigma_s^{-1}(y_s - X_s\hat\beta) = n^{-1}y_s'W_s^{-1}(y_s - X_s\hat\beta - Z_s\hat u) \tag{3.1.5} \]
where $\hat u$ is given by (2.2) in the previous section.
The estimating equations for the variance components ϕj and ρh have no analytic
solution, and so have to be solved numerically. Various methods for computing ML estimates of ω have been used in the literature. Henderson (1963, 1973, 1975), Harville (1977) and Fellner (1986, 1987) have proposed iterative procedures for calculating ML estimates under different normal variance components models. We adapt these methods in the following iterative algorithm for calculating the ML estimates of $\phi_j$, $\sigma^2$ and $\rho_h$.
1. Assign initial values to the variance components $\phi_j$, $\rho_h$ and $\sigma^2$.
2. Using the current values for these variance components, calculate $\Sigma_s$ and Ω.
3. Update $\beta = (X_s'\Sigma_s^{-1}X_s)^{-1}X_s'\Sigma_s^{-1}y_s$.
4. Update $u = \Omega Z_s'\Sigma_s^{-1}(y_s - X_s\beta)$.
5. Update $\sigma^2 = n^{-1}y_s'W_s^{-1}(y_s - X_s\beta - Z_su)$.
6. Calculate $T_s^* = (\Omega^{-1} + Z_s'W_s^{-1}Z_s)^{-1} = [T^*_{sjj'}]$.
7. Update $\phi_j = \nu_j^{-1}(\mathrm{tr}(T^*_{sjj}\Omega_j^{-1}) + \sigma^{-2}u_j'\Omega_j^{-1}u_j)$ where $\nu_j$ is the rank of the matrix $Z_{sj}$.
8. Update $\rho_h = f_h(\rho, \phi, T_s^*, \sigma^2, u)$ where $f_h$ is a known function whose specification depends on the parameterization of Ω(ρ), and current values for variance components are used in the right hand side of this equation.
9. Return to step 2 and repeat the procedure until the values of the different parameters converge.
At convergence the MLE-based empirical best linear unbiased estimate (EBLUP) of θ is calculated by substituting the converged values of β and u above as the corresponding estimates in (2.1) and then computing (2.3).
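The iteration above can be sketched in a few lines of NumPy for the simplest case. The sketch below assumes $W_s = I_n$ and $\Omega_j = I$ for every j, so there is no ρ and step 8 is skipped; the function name `ml_variance_components`, the starting values, the stopping rule and the simulated data are all choices made for this example rather than part of the algorithm as stated.

```python
import numpy as np

def ml_variance_components(ys, Xs, Z_list, n_iter=500, tol=1e-8):
    """Steps 1-7 and 9 of the ML algorithm, assuming W_s = I and Omega_j = I."""
    n = len(ys)
    q = [Z.shape[1] for Z in Z_list]           # nu_j, assuming full column rank Z_sj
    Zs = np.hstack(Z_list)
    idx = np.cumsum([0] + q)                   # block boundaries of u
    phi, sigma2 = np.ones(len(Z_list)), 1.0    # step 1: starting values
    for _ in range(n_iter):
        Omega = np.diag(np.repeat(phi, q))     # step 2
        Sigma = np.eye(n) + Zs @ Omega @ Zs.T
        SiX = np.linalg.solve(Sigma, Xs)
        beta = np.linalg.solve(Xs.T @ SiX, SiX.T @ ys)            # step 3
        resid = ys - Xs @ beta
        u = Omega @ Zs.T @ np.linalg.solve(Sigma, resid)          # step 4
        sigma2_new = ys @ (resid - Zs @ u) / n                    # step 5
        T = np.linalg.inv(np.diag(np.repeat(1.0 / phi, q)) + Zs.T @ Zs)  # step 6
        phi_new = np.array([
            (np.trace(T[a:b, a:b]) + u[a:b] @ u[a:b] / sigma2_new) / (b - a)
            for a, b in zip(idx[:-1], idx[1:])])                  # step 7
        done = max(abs(sigma2_new - sigma2), np.max(np.abs(phi_new - phi))) < tol
        sigma2, phi = sigma2_new, phi_new
        if done:                               # step 9: stop at convergence
            break
    return beta, u, sigma2, phi

# Simulated data (hypothetical): 10 areas, 5 time points, 4 units per cell.
rng = np.random.default_rng(3)
D, T, m = 10, 5, 4
area = np.repeat(np.arange(D), T * m)
time = np.tile(np.repeat(np.arange(T), m), D)
x = rng.normal(size=D * T * m)
ys = (1.0 + 2.0 * x + rng.normal(size=T)[time]
      + rng.normal(size=D)[area] + rng.normal(size=D * T * m))
Xs = np.column_stack([np.ones_like(x), x])
beta_hat, u_hat, sigma2_hat, phi_hat = ml_variance_components(
    ys, Xs, [np.eye(T)[time], np.eye(D)[area]])
```

Updates of this type tend to converge slowly when a variance ratio is close to zero, so in practice the iteration limit and tolerance may need tuning.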
3.2 Residual Maximum Likelihood Estimation of the Variance Components (REML)
The REML method is based on transforming $y_s$ into two independent vectors $y_1 = K_1y_s$ and $y_2 = K_2y_s$. The $y_1$ vector has a distribution that does not depend on the fixed effect β and hence satisfies $E(K_1y_s) = 0$, i.e. $K_1X_s = 0$, while the $y_2$ vector is independent of $y_1$ and satisfies $K_1\Sigma_sK_2' = 0$. The matrix $K_1$ is chosen to have maximum rank, i.e. n−p, and so the rank of $K_2$ is p. The likelihood function of $y_s$ is then the product of the likelihoods generated by $y_1$ and $y_2$. Following Patterson and Thompson (1971), the REML estimators of the variance components are then the maximum likelihood estimators of these parameters based on $y_1$. That is, the REML method estimates the variance components vector ω by maximising the log-likelihood function
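\[ l_{REML} = -\tfrac{1}{2}\left[(n-p)\ln(2\pi\sigma^2) + \ln|K_1\Sigma_sK_1| + \sigma^{-2}y_s'K_1(K_1\Sigma_sK_1)^{-1}K_1y_s\right] \]
where $K_1 = W_s^{-1} - W_s^{-1}X_s(X_s'W_s^{-1}X_s)^{-1}X_s'W_s^{-1}$ for the model (1.5). Note that if $K_1\Sigma_sK_1$ is not of full rank, then $|K_1\Sigma_sK_1|$ must be interpreted as the determinant of its linearly independent rows and columns. Given this definition of $K_1$, the matrix $K_2$ is defined as $K_2 = X_s'\Sigma_s^{-1}$. The log-likelihood function defined by $y_2 = K_2y_s$ is
\[ \begin{aligned} l_L &= -\tfrac{1}{2}\left[p\ln(2\pi\sigma^2) + \ln|K_2\Sigma_sK_2'| + \sigma^{-2}(y_2 - E(y_2))'(K_2\Sigma_sK_2')^{-1}(y_2 - E(y_2))\right] \\ &= -\tfrac{1}{2}\left[p\ln(2\pi\sigma^2) + \ln|X_s'\Sigma_s^{-1}X_s| + \sigma^{-2}(y_s - X_s\beta)'\Sigma_s^{-1}X_s(X_s'\Sigma_s^{-1}X_s)^{-1}X_s'\Sigma_s^{-1}(y_s - X_s\beta)\right]. \end{aligned} \]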
For given values of the variance components, β is then estimated by maximizing $l_L$, leading to
\[ \hat\beta = (X_s'\Sigma_s^{-1}X_s)^{-1}X_s'\Sigma_s^{-1}y_s. \]
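Differentiation of $l_{REML}$ with respect to the variance components yields the REML estimating equations
\[ \begin{aligned} \partial l_{REML}/\partial\sigma^2 &= -\tfrac{1}{2}\left[(n-p)\sigma^{-2} - \sigma^{-4}y_s'K_1(K_1\Sigma_sK_1)^{-1}K_1y_s\right] \\ \partial l_{REML}/\partial\phi_j &= -\tfrac{1}{2}\left[\mathrm{tr}(K_{j\phi}) - \sigma^{-2}y_s'K_{j\phi}K_1(K_1\Sigma_sK_1)^{-1}K_1y_s\right] \\ \partial l_{REML}/\partial\rho_h &= -\tfrac{1}{2}\left[\mathrm{tr}(K_{h\rho}) - \sigma^{-2}y_s'K_{h\rho}K_1(K_1\Sigma_sK_1)^{-1}K_1y_s\right] \end{aligned} \tag{3.2.1} \]
where $K_{j\phi} = K_1(K_1\Sigma_sK_1)^{-1}K_1\,\partial\Sigma_s/\partial\phi_j$ and $K_{h\rho} = K_1(K_1\Sigma_sK_1)^{-1}K_1\,\partial\Sigma_s/\partial\rho_h$. For given values of $\phi_j$ and $\rho_h$, equating the REML score function for $\sigma^2$ in (3.2.1) to zero leads to a REML estimate of this parameter that satisfies
\[ \hat\sigma^2_{REML} = (n-p)^{-1}y_s'K_1(K_1\Sigma_sK_1)^{-1}K_1y_s = (n-p)^{-1}y_s'\Sigma_s^{-1}(y_s - X_s\hat\beta) = (n-p)^{-1}y_s'W_s^{-1}(y_s - X_s\hat\beta - Z_s\hat u). \tag{3.2.2} \]
Equating the REML score functions for $\phi_j$ and $\rho_h$ in (3.2.1) to zero leads to non-linear equations that have to be solved numerically. Following Fellner (1986, 1987) we define an iterative algorithm to calculate the REML estimates of the variance components in (1.5). It is identical to the algorithm for calculating the ML estimates defined earlier when one replaces $T_s^*$ there by $T_s$ below.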
1. Assign initial values to the variance components $\phi_j$, $\rho_h$ and $\sigma^2$.
2. Using the current values for these variance components, calculate $\Sigma_s$ and Ω.
3. Update $\beta = (X_s'\Sigma_s^{-1}X_s)^{-1}X_s'\Sigma_s^{-1}y_s$.
4. Update $u = \Omega Z_s'\Sigma_s^{-1}(y_s - X_s\beta)$.
5. Update $\sigma^2 = (n-p)^{-1}y_s'W_s^{-1}(y_s - X_s\beta - Z_su)$.
6. Calculate $T_s^* = (\Omega^{-1} + Z_s'W_s^{-1}Z_s)^{-1} = [T^*_{sjj'}]$.
7. Calculate $T_s = [T_{sjj'}] = T_s^* + T_s^*Z_s'W_s^{-1}X_s(X_s'\Sigma_s^{-1}X_s)^{-1}X_s'W_s^{-1}Z_sT_s^*$.
8. Update $\phi_j = \nu_j^{-1}(\mathrm{tr}(T_{sjj}\Omega_j^{-1}) + \sigma^{-2}u_j'\Omega_j^{-1}u_j)$ where $\nu_j$ is the rank of the matrix $Z_{sj}$.
9. Update $\rho_h = f_h(\rho, \phi, T_s, \sigma^2, u)$ where $f_h$ is a known function whose specification depends on the parameterization of Ω(ρ), and current values for variance components are used in the right hand side of this equation.
10. Return to step 2 and repeat the procedure until the values of the different parameters converge.
As with the MLE, the REML-based empirical best linear unbiased estimate (EBLUP) of θ is then calculated by substituting the converged values of β and u above as the corresponding estimates in (2.1) and then computing (2.3).
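The only new computation relative to the ML algorithm is the matrix $T_s$ of step 7. A minimal sketch is given below, written for a general $W_s$ (with $W_s = I_n$ the $W_s^{-1}$ factors drop out); the function name `reml_T` and the toy inputs are assumptions made for this example.

```python
import numpy as np

def reml_T(Xs, Zs, Ws, Omega):
    """Return (T_s^*, T_s) as used in steps 6 and 7 of the REML algorithm."""
    Wi = np.linalg.inv(Ws)
    T_star = np.linalg.inv(np.linalg.inv(Omega) + Zs.T @ Wi @ Zs)   # step 6
    Sigma_s = Ws + Zs @ Omega @ Zs.T
    V_beta = np.linalg.inv(Xs.T @ np.linalg.solve(Sigma_s, Xs))     # (X_s' Sigma_s^{-1} X_s)^{-1}
    M = T_star @ Zs.T @ Wi @ Xs
    return T_star, T_star + M @ V_beta @ M.T                        # step 7

# Hypothetical toy inputs: 4 areas with 3 sampled units each.
rng = np.random.default_rng(4)
n, p, q = 12, 2, 4
Xs = np.column_stack([np.ones(n), rng.normal(size=n)])
Zs = np.eye(q)[np.repeat(np.arange(q), 3)]
Ws, Omega = np.diag(rng.uniform(0.5, 2.0, size=n)), 0.7 * np.eye(q)
T_star, T_s = reml_T(Xs, Zs, Ws, Omega)
```

Because the added term is positive semi-definite, $T_s$ is never smaller than $T_s^*$, which is how the REML update accounts for the estimation of β.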
3.3 Estimating the Mean Squared Error of the EBLUP
To start, we assume that the variance components are estimated via ML. The prediction error of $\hat\theta$ is $\hat\theta - \theta = a_r(\hat y_r - y_r) = a_r(\hat\eta_r - y_r) = a_r(\hat\eta_r - \eta_r + \eta_r - y_r) = a_r[(\hat\eta_r - \eta_r) - (y_r - \eta_r)]$. The mean cross-product error matrix is therefore $MCPE(\hat\theta) = E[(\hat\theta - \theta)(\hat\theta - \theta)']$. After some algebra, it can be shown that this matrix is
\[ MCPE(\hat\theta) \cong MCPE(\tilde\theta) + G_3(\omega) = G_1(\omega) + G_2(\omega) + G_3(\omega) + G_4(\omega) \tag{3.3.1} \]
where G1(ω), G2(ω) and G4(ω) are given in section 2, and G3(ω) is the square matrix of the
of the ML estimates of the variance components $\hat\gamma = (\hat\phi', \hat\rho')'$ and, taking $\tilde\theta_\alpha$ to be the αth element of the BLUP $\tilde\theta$, $G_{\alpha'\alpha} = \mathrm{Cov}(\partial\tilde\theta_\alpha/\partial\gamma, \partial\tilde\theta_{\alpha'}/\partial\gamma)$. Note that B is defined by the “γ-component” of the inverse of the Fisher information matrix for the variance components ω. Under the model (1.5) this information matrix is
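\[ \frac{1}{2}\begin{bmatrix} n\sigma^{-4} & \sigma^{-2}\left[\phi_j^{-1}(\nu_j - r_j^*)\right] & \sigma^{-2}\left[\sum_{j=1}^{J}(-v_j^{(h)} + r_j^{*(h)})\right] \\ & \left[\phi_j^{-2}(\nu_j - 2r_j^*)\delta_{jj'} + \phi_j^{-2}\phi_{j'}^{-2}r^*_{jj'}\right] & \left[\phi_j^{-1}\Big(2r_j^{*(h)} - v_j^{(h)} - \phi_j^{-1}\sum_{j'=1}^{J}\phi_{j'}^{-1}r_{jj'}^{*(h)}\Big)\right] \\ & & \left[\sum_{j=1}^{J}\Big(\sum_{j'=1}^{J}\phi_j^{-1}\phi_{j'}^{-1}r_{jj'}^{*(hh')} + v_j^{(hh')} - 2r_j^{*(hh')}\Big)\right] \end{bmatrix} \]
(only the upper triangle of this symmetric matrix is shown), where $\delta_{jj'}$ is a Kronecker delta,
\[ \begin{aligned} r_j^* &= \phi_j^{-1}\mathrm{tr}(\Omega_j^{-1}T^*_{sjj}), \\ r^*_{jj'} &= \mathrm{tr}(T^*_{sj'j}\Omega_j^{-1}T^*_{sjj'}\Omega_{j'}^{-1}), \\ r_j^{*(h)} &= \mathrm{tr}[\phi_j^{-1}(\partial\Omega_j^{-1}/\partial\rho_h)T^*_{sjj}], \\ v_j^{(h)} &= \mathrm{tr}[(\partial\Omega_j^{-1}/\partial\rho_h)\Omega_j], \\ r_{jj'}^{*(h)} &= \mathrm{tr}[T^*_{sjj'}(\partial\Omega_{j'}^{-1}/\partial\rho_h)T^*_{sj'j}\Omega_j^{-1}], \\ v_j^{(hh')} &= \mathrm{tr}[(\partial\Omega_j^{-1}/\partial\rho_h)\Omega_j(\partial\Omega_j^{-1}/\partial\rho_{h'})\Omega_j], \\ r_{jj'}^{*(hh')} &= \mathrm{tr}[T^*_{sjj'}(\partial\Omega_{j'}^{-1}/\partial\rho_h)T^*_{sj'j}(\partial\Omega_j^{-1}/\partial\rho_{h'})], \\ r_j^{*(hh')} &= \phi_j^{-1}\mathrm{tr}[(\partial\Omega_j^{-1}/\partial\rho_h)T^*_{sjj}(\partial\Omega_j^{-1}/\partial\rho_{h'})\Omega_j] \end{aligned} \tag{3.3.2} \]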
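and $\nu_j$ is the rank of the matrix $Z_{sj}$. Recall that $X_r^* = a_rX_r$ and $Z_r^* = a_rZ_r$, and put $X_s^* = Z_s'W_s^{-1}X_s$ and $y_s^* = Z_s'W_s^{-1}y_s$. The EBLUP of θ can then be written
\[ \hat\theta = a_sy_s + X_r^*\hat\beta + Z_r^*T_s^*(y_s^* - X_s^*\hat\beta). \tag{3.3.3} \]
Let $\Delta = [\Delta_1', \Delta_2', \ldots, \Delta_n']'$ be the coefficient matrix of $(y_s^* - X_s^*\hat\beta)$ in (3.3.3), i.e., $\Delta = Z_r^*T_s^*$ and, when $Z_\alpha^*$ is the αth row of the matrix $Z_r^*$, put $\partial\Delta_\alpha/\partial\gamma = \partial(Z_\alpha^*T_s^*)/\partial\gamma = \nabla_\alpha$. The mean cross-product error matrix of the EBLUP $\hat\theta$ then has $G_3(\omega) = \sigma^2[\mathrm{tr}(\nabla_\alpha\Sigma_s^*\nabla_{\alpha'}'B)]$ where
\[ \Sigma_s^* = Z_s'W_s^{-1}Z_s + Z_s'W_s^{-1}Z_s\Omega Z_s'W_s^{-1}Z_s. \]
For model (1.5) we have
\[ \nabla_\alpha = -(Z_\alpha^*T_s^* \otimes I_{J+H})(\partial\Omega^{-1}/\partial\gamma)T_s^*. \]
An estimate of the MCPE matrix of the EBLUP $\hat\theta$ is therefore $\widehat{MCPE}(\hat\theta) = G_1(\hat\omega) + G_2(\hat\omega) + 2G_3(\hat\omega) + G_4(\hat\omega)$, where $\hat\omega$ is the ML estimate of the variance components vector ω.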
The preceding development still holds when the variance components in (1.5) are estimated via REML. The only change is that in this case the Fisher information matrix of the REML estimators $\hat\sigma^2$, $\hat\phi$ and $\hat\rho$ is now of the form
\[ \frac{1}{2}\begin{bmatrix} (n-p)\sigma^{-4} & \sigma^{-2}\left[\phi_j^{-1}(\nu_j - r_j)\right] & \sigma^{-2}\left[\sum_{j=1}^{J}(-v_j^{(h)} + r_j^{(h)})\right] \\ & \left[\phi_j^{-2}(\nu_j - 2r_j)\delta_{jj'} + \phi_j^{-2}\phi_{j'}^{-2}r_{jj'}\right] & \left[\phi_j^{-1}\Big(2r_j^{(h)} - v_j^{(h)} - \phi_j^{-1}\sum_{j'=1}^{J}\phi_{j'}^{-1}r_{jj'}^{(h)}\Big)\right] \\ & & \left[\sum_{j=1}^{J}\Big(\sum_{j'=1}^{J}\phi_j^{-1}\phi_{j'}^{-1}r_{jj'}^{(hh')} + v_j^{(hh')} - 2r_j^{(hh')}\Big)\right] \end{bmatrix} \]
where $\nu_j$, $v_j^{(h)}$ and $v_j^{(hh')}$ are defined in (3.3.2) and $r_j$, $r_{jj'}$, $r_j^{(h)}$, $r_j^{(hh')}$, $r_{jj'}^{(h)}$ and $r_{jj'}^{(hh')}$ are obtained simply by deleting the star (*) from all notation used in (3.3.2).
4 Application to Unit Level Linear Mixed Models
In this section we illustrate the application of the preceding theory to a number of commonly used unit level linear mixed models. These are appropriate when individual level data (both for Y and X) are available within the small areas.
4.1 A Model with IID Area Effects and IID Time Effects
A special case of (1.1) is a linear mixed model with independent and identically distributed (iid) area effects and corresponding iid time effects. Note that this corresponds to an assumption of a constant covariance between two observations from the same small area at two different points in time. In terms of the general model (1.1), we have two random components, a time effect and an area effect. The model in this special case is
\[ y_{dti} = \beta_0 + X_{dti}'\beta_t + u_{1t} + u_{2d} + e_{dti}. \tag{4.1.1} \]
The $u_{1t}$ and $u_{2d}$ are assumed to be mutually independent and to follow normal distributions with zero means and respective variances. The $e_{dti}$ is a random error, independent of the random components $u_{1t}$ and $u_{2d}$, and is assumed to follow a normal distribution with zero mean and variance $\sigma^2$. Let $u_1 = [u_{11}, u_{12}, \ldots, u_{1T}]'$, $u_2 = [u_{21}, u_{22}, \ldots, u_{2D}]'$ and $e = [e_{dti}]$, and let $Z_1$ and $Z_2$ be the incidence matrices for the random effect vectors $u_1$ and $u_2$ respectively. The model (4.1.1) can then be written in matrix form as
(4.1.2) y = Xβ +Z1u1+ Z2u2 + e.
Here $Z_1 = Z_1^*//Z_2^*//\ldots//Z_D^*$, where // denotes the “stacking” operator, with $Z_d^* = \mathrm{diag}(1_{N_{dt}}; t = 1, \ldots, T)$, where $1_{N_{dt}}$ is a vector of dimension $N_{dt}$ with all elements equal to one. Similarly, $Z_2 = \mathrm{diag}(1_{N_d}; d = 1, \ldots, D)$, where $N_d = \sum_{t=1}^{T}N_{dt}$. The random vectors $u_1$, $u_2$ and e are assumed to be distributed as multivariate normal with zero mean vectors and variance-covariance matrices given by $\sigma_1^2I_T$, $\sigma_2^2I_D$ and $\sigma^2I_N$ respectively, where N is the sum of the population sizes at t = 1, 2, …, T and the matrices $I_D$, $I_N$ and $I_T$ are identity matrices with dimensions equal to D, N and T respectively. In the context of the notation introduced in section 1, we therefore have J = 2, $\phi_1 = \sigma_1^2/\sigma^2$, $\phi_2 = \sigma_2^2/\sigma^2$, $\Omega_1 = I_T$, $\Omega_2 = I_D$, $W = I_N$ and therefore
\[ \Omega = \begin{bmatrix} \phi_1I_T & 0 \\ 0 & \phi_2I_D \end{bmatrix} \quad\text{and}\quad \Sigma = I_N + \phi_1Z_1Z_1' + \phi_2Z_2Z_2'. \]
Note that there is no parameter ρ. Setting $u = (u_1', u_2')'$ and $Z = (Z_1, Z_2)$, (4.1.2) further simplifies to
(4.1.3) y = Xβ +Zu + e
with corresponding model for the sample values ys given by
(4.1.4) ys= Xsβ + Zsu + es.
In this case Σs=In+ϕ1Zs1Z ′ s1+ϕ2Zs2Z ′ s2, where n denotes the overall number of sample values of Y (across all times and all areas) and Zsj denotes the sample version of Zj.
The ML and REML estimation algorithms described in sections 3.1 and 3.2 and the MCPE estimator described in section 3.3 can then be used (with these versions of Ω and Σs) to
calculate the EBLUPs of any set of linear combinations of the values in the population vector
y. Note that in both the ML and REML estimation algorithms we ignore the step that updates
ρh since this parameter does not exist in the model (4.1.1). We also have ν1 = T and ν2 = D.
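The observed information matrix for the ML estimates of $\omega = (\sigma^2, \phi_1, \phi_2)'$ is then
\[ \hat I_{ML} = \frac{1}{2}\begin{bmatrix} n\hat\sigma^{-4} & \hat\sigma^{-2}\left[\hat\phi_j^{-1}(\nu_j - \hat r_j^*)\right] \\ & \left[\hat\phi_j^{-2}(\nu_j - 2\hat r_j^*)\delta_{jj'} + \hat\phi_j^{-2}\hat\phi_{j'}^{-2}\hat r^*_{jj'}\right] \end{bmatrix} \]
where a “hat” denotes an ML estimate, $\delta_{jj'}$ is the Kronecker delta, $\hat r_j^* = \hat\phi_j^{-1}\mathrm{tr}(\hat T^*_{sjj})$ for j = 1, 2 and $\hat r^*_{jj'} = \mathrm{tr}(\hat T^*_{sj'j}\hat T^*_{sjj'})$. Note that the estimate of the matrix B needed to evaluate the $G_3$ term of the MCPE is the 2 × 2 submatrix in the bottom right hand corner of the inverse of $\hat I_{ML}$ above. The estimated MCPE matrix for the ML-based $\hat\theta$ is therefore $\widehat{MCPE}(\hat\theta) = G_1(\hat\omega) + G_2(\hat\omega) + 2G_3(\hat\omega) + G_4(\hat\omega)$, where
\[ \begin{aligned} G_1(\hat\omega) &= \hat\sigma^2Z_r^*\hat T_s^*Z_r^{*\prime} \\ G_2(\hat\omega) &= \hat\sigma^2[X_r^* - Z_r^*\hat T_s^*Z_s'X_s](X_s'\hat\Sigma_s^{-1}X_s)^{-1}[X_r^{*\prime} - X_s'Z_s\hat T_s^*Z_r^{*\prime}] \\ G_4(\hat\omega) &= \hat\sigma^2a_ra_r' \\ G_3(\hat\omega) &= \hat\sigma^2[\mathrm{tr}(\hat\nabla_\alpha\hat\Sigma_s^*\hat\nabla_{\alpha'}'\hat B)] \\ \hat\Sigma_s^* &= Z_s'Z_s + Z_s'Z_s\hat\Omega Z_s'Z_s \\ \hat\nabla_\alpha &= -(Z_\alpha^*\hat T_s^* \otimes I_2)(\partial\hat\Omega^{-1}/\partial\hat\gamma)\hat T_s^* \\ \partial\hat\Omega^{-1}/\partial\hat\gamma &= -\mathrm{diag}\left\{\hat\phi_1^{-2}I_T \otimes \begin{bmatrix} 1 \\ 0 \end{bmatrix},\; \hat\phi_2^{-2}I_D \otimes \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right\}. \end{aligned} \]
Here $\hat\theta = (\hat\theta_\alpha)$, $\hat\omega = (\hat\sigma^2, \hat\phi_1, \hat\phi_2)'$ and $\hat\gamma = (\hat\phi_1, \hat\phi_2)'$. The only change when calculating the estimate of the MCPE matrix for the REML-based $\hat\theta$ is the estimate of B. This is given by the 2 × 2 submatrix in the bottom right hand corner of the inverse of the estimated information matrix for the REML estimates of the variance components, which itself is defined by
\[ \hat I_{REML} = \frac{1}{2}\begin{bmatrix} (n-p)\hat\sigma^{-4} & \hat\sigma^{-2}\left[\hat\phi_j^{-1}(\nu_j - \hat r_j)\right] \\ & \left[\hat\phi_j^{-2}(\nu_j - 2\hat r_j)\delta_{jj'} + \hat\phi_j^{-2}\hat\phi_{j'}^{-2}\hat r_{jj'}\right] \end{bmatrix} \]
where $\hat r_j = \hat\phi_j^{-1}\mathrm{tr}(\hat T_{sjj})$ for j = 1, 2 and $\hat r_{jj'} = \mathrm{tr}(\hat T_{sj'j}\hat T_{sjj'})$.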
A common application is to predict the population totals of Y in each of the small areas at time T, in which case θ = ay where $a = \mathrm{diag}[a_d'; d = 1, 2, \ldots, D]$, with $a_d$ the $N_d$ vector with zeros everywhere except for the last $N_{dT}$ of its elements, which are one.
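The incidence matrices $Z_1$ and $Z_2$ and the matrix a for area totals at time T can be built directly from the cell sizes $N_{dt}$. The sketch below assumes the population vector is ordered by area, then time, then unit; the function name `incidence_matrices` and the example cell sizes are illustrative only.

```python
import numpy as np

def incidence_matrices(N_dt):
    """Z1 (time effects), Z2 (area effects) and a (area totals at the last time)."""
    N_dt = np.asarray(N_dt)
    D, T = N_dt.shape
    area = np.repeat(np.arange(D), N_dt.sum(axis=1))            # area label of each unit
    time = np.repeat(np.tile(np.arange(T), D), N_dt.ravel())    # time label of each unit
    Z1 = np.eye(T)[time]                                        # N x T, stack of the Z_d^*
    Z2 = np.eye(D)[area]                                        # N x D, diag(1_{N_d})
    # Row d of a picks out the units of area d observed at the last time point.
    a = ((area[None, :] == np.arange(D)[:, None]) & (time[None, :] == T - 1)).astype(float)
    return Z1, Z2, a

# Two areas and two time points with cell sizes N_11 = 2, N_12 = 1, N_21 = 1, N_22 = 3.
Z1, Z2, a = incidence_matrices([[2, 1], [1, 3]])
print(a)
```

With this ordering, `a @ y` returns the D area totals of Y at time T, and `Z1`, `Z2` can be row-subset to obtain the sample versions $Z_{s1}$ and $Z_{s2}$.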
4.2 A Model with IID Area Effects and Autocorrelated Time Effects
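In this model the time effects $u_{1t}$ in (4.1.1) are assumed to follow an AR(1) process, while the area effects $u_{2d}$ remain iid. The matrix $\Omega_1$ is then
\[ \Omega_1 = \left(\frac{1}{1-\rho^2}\right)\begin{bmatrix} 1 & \rho & \cdots & \rho^{T-1} \\ \rho & 1 & \cdots & \rho^{T-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \cdots & 1 \end{bmatrix} \]
where ρ is the correlation parameter. The inverse of this matrix is $\Omega_1^{-1} = I_T + \rho^2E + \rho F$, where E is a diagonal matrix of dimension T with diagonal elements (0,1,1,...,1,0) and F is a T×T matrix with elements on the principal diagonal equal to zero, elements in the diagonals immediately above and below the principal diagonal equal to minus one and all other elements zero. In this case
\[ \Omega = \begin{bmatrix} \phi_1\Omega_1 & 0 \\ 0 & \phi_2I_D \end{bmatrix} \]
so $\Sigma_s = I_n + \phi_1Z_{s1}\Omega_1Z_{s1}' + \phi_2Z_{s2}Z_{s2}'$.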
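The closed form of the AR(1) inverse is easy to verify numerically. The sketch below builds $\Omega_1$, E and F for a given T and ρ and checks that $I_T + \rho^2E + \rho F$ is the inverse; the function name `ar1_matrices` and the values T = 5, ρ = 0.6 are chosen only for illustration.

```python
import numpy as np

def ar1_matrices(T, rho):
    """Return Omega_1, its closed-form inverse, E and F for an AR(1) process (T >= 2)."""
    t = np.arange(T)
    Omega1 = rho ** np.abs(t[:, None] - t[None, :]) / (1.0 - rho**2)
    E = np.diag(np.r_[0.0, np.ones(T - 2), 0.0])        # diag(0, 1, ..., 1, 0)
    F = -(np.eye(T, k=1) + np.eye(T, k=-1))             # -1 on the first off-diagonals
    Omega1_inv = np.eye(T) + rho**2 * E + rho * F
    return Omega1, Omega1_inv, E, F

Omega1, Omega1_inv, E, F = ar1_matrices(5, 0.6)
print(np.allclose(Omega1 @ Omega1_inv, np.eye(5)))
```

Because the inverse is tridiagonal and linear in E and F, its derivative with respect to ρ is available in closed form, which keeps the later updating and information matrix calculations cheap.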
ML and REML estimation of the parameters β and $\omega = (\sigma^2, \phi_1, \phi_2, \rho)'$ of this model follows along the same lines as in section 4.1 above. The main difference is that in this case H = 1 (so we drop the h subscript) and we need to update the estimate of ρ in the iterative estimation process. This is done by replacing step 8 in the ML algorithm (see section 3.1) by the identity
\[ \rho = -\frac{\phi_1^{-1}[\mathrm{tr}(T^*_{s11}F) + \sigma^{-2}u_1'Fu_1]}{2/(1-\rho^2) + 2\phi_1^{-1}[\mathrm{tr}(T^*_{s11}E) + \sigma^{-2}u_1'Eu_1]}, \]
with the current value of ρ used on the right hand side. Step 9 in the corresponding REML algorithm (see section 3.2) also uses this identity, but with $T^*_{s11}$ replaced by $T_{s11}$.
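The updating identity for ρ translates directly into code. The following sketch evaluates its right hand side at the current parameter values; the function name `update_rho` and the numerical inputs are hypothetical and serve only to show the calculation.

```python
import numpy as np

def update_rho(rho, phi1, sigma2, T11, u1):
    """One update of rho from current values; T11 is the time block of T_s^* (or T_s)."""
    T = len(u1)
    E = np.diag(np.r_[0.0, np.ones(T - 2), 0.0])
    F = -(np.eye(T, k=1) + np.eye(T, k=-1))
    num = -(np.trace(T11 @ F) + u1 @ F @ u1 / sigma2) / phi1
    den = 2.0 / (1.0 - rho**2) + 2.0 * (np.trace(T11 @ E) + u1 @ E @ u1 / sigma2) / phi1
    return num / den

# Hypothetical current values with T = 3 time points.
rho_new = update_rho(rho=0.0, phi1=1.0, sigma2=1.0,
                     T11=np.zeros((3, 3)), u1=np.array([1.0, 0.5, 0.0]))
print(rho_new)
```

Passing the time block of $T_s$ instead of $T_s^*$ gives the REML version of the same step.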
Estimation of the MCPE of the small area estimates requires the Fisher information matrix for the estimated variance components $\hat\omega = (\hat\sigma^2, \hat\phi_1, \hat\phi_2, \hat\rho)'$,
\[ \hat I_{ML} = \frac{1}{2}\begin{bmatrix} n\hat\sigma^{-4} & \hat\sigma^{-2}\hat\phi_1^{-1}(\nu_1 - \hat r_1^*) & \hat\sigma^{-2}\hat\phi_2^{-1}(\nu_2 - \hat r_2^*) & \hat\sigma^{-2}(\hat r_1^{*(1)} - v_1^{(1)}) \\ & \hat\phi_1^{-2}(\nu_1 - 2\hat r_1^*) + \hat\phi_1^{-4}\hat r^*_{11} & \hat\phi_1^{-2}\hat\phi_2^{-2}\hat r^*_{12} & \hat\phi_1^{-1}(2\hat r_1^{*(1)} - v_1^{(1)} - \hat\phi_1^{-2}\hat r_{11}^{*(1)}) \\ & & \hat\phi_2^{-2}(\nu_2 - 2\hat r_2^*) + \hat\phi_2^{-4}\hat r^*_{22} & -\hat\phi_1^{-1}\hat\phi_2^{-2}\hat r_{21}^{*(1)} \\ & & & \hat\phi_1^{-2}\hat r_{11}^{*(11)} + v_1^{(11)} - 2\hat r_1^{*(11)} \end{bmatrix} \]
where the individual terms are defined in (3.3.2). In order to calculate them we replace unknown parameter values by ML estimates and make use of the fact that $\partial\hat\Omega_1^{-1}/\partial\hat\rho = 2\hat\rho E + F$ in this case. The matrix B can be calculated as the 3 × 3 submatrix in the bottom right hand corner of the inverse of the corresponding estimate of this information matrix. Estimation of the MCPE matrix of the MLE-based EBLUP $\hat\theta$ under this model then follows along exactly the same lines as in section 4.1, the only difference being that we now calculate
\[ \hat\nabla_\alpha = -(Z_\alpha^*\hat T_s^* \otimes I_3)(\partial\hat\Omega^{-1}/\partial\hat\gamma)\hat T_s^* \]
where $\hat\gamma = (\hat\phi_1, \hat\phi_2, \hat\rho)'$, and
\[ \partial\hat\Omega^{-1}/\partial\hat\gamma = -\mathrm{diag}\left\{\left[(\hat\phi_1^{-2}\hat\Omega_1^{-1} \otimes E_1) - \hat\phi_1^{-1}(\partial\hat\Omega_1^{-1}/\partial\hat\rho) \otimes E_3\right],\; \hat\phi_2^{-2}I_D \otimes E_2\right\} \]
where
\[ E_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},\quad E_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},\quad E_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}. \]
The only change when calculating the corresponding estimate of the MCPE matrix for the REML-based version of $\hat\theta$ is estimation of B. This is calculated as the 3 × 3 submatrix in the bottom right hand corner of the inverse of the estimated Fisher information matrix for the REML estimates of the variance components under this model, given by
\[ \hat I_{REML} = \frac{1}{2}\begin{bmatrix} (n-p)\hat\sigma^{-4} & \hat\sigma^{-2}\hat\phi_1^{-1}(\nu_1 - \hat r_1) & \hat\sigma^{-2}\hat\phi_2^{-1}(\nu_2 - \hat r_2) & \hat\sigma^{-2}(\hat r_1^{(1)} - v_1^{(1)}) \\ & \hat\phi_1^{-2}(\nu_1 - 2\hat r_1) + \hat\phi_1^{-4}\hat r_{11} & \hat\phi_1^{-2}\hat\phi_2^{-2}\hat r_{12} & \hat\phi_1^{-1}(2\hat r_1^{(1)} - v_1^{(1)} - \hat\phi_1^{-2}\hat r_{11}^{(1)}) \\ & & \hat\phi_2^{-2}(\nu_2 - 2\hat r_2) + \hat\phi_2^{-4}\hat r_{22} & -\hat\phi_1^{-1}\hat\phi_2^{-2}\hat r_{21}^{(1)} \\ & & & \hat\phi_1^{-2}\hat r_{11}^{(11)} + v_1^{(11)} - 2\hat r_1^{(11)} \end{bmatrix} \]
where we refer again to (3.3.2) for definitions of the various components of this matrix, and note that this requires removal of all star (*) superscripts in (3.3.2).
4.3 A Model with a Time Varying Area Effect
The AR(1) model introduced in section 4.2 requires a reasonable number of observations from different times to obtain an efficient estimate of the correlation parameter ρ. If there are only data available for a few time periods then the ML and REML estimates of this correlation parameter will be negatively biased. Furthermore, the model assumes that the area effects do not themselves evolve over time, with the only change over time arising because of the evolving (global) time effect. An alternative AR(1) model that uses more information, reduces biases and allows area random effects to vary over time is therefore of interest. Such a model is specified by
\[ y_{dti} = \beta_0 + X_{dti}'\beta_t + u_{dt} + e_{dti} \tag{4.3.1} \]
where the $\{u_{dt}\}$ are random effects that follow independent AR(1) processes for d = 1, 2,…,D. As usual, the area level random effects $u_{dt}$ and the individual level random effects $e_{dti}$ are assumed to be mutually independent. Let $u = [u_{dt}]$ and $e = [e_{dti}]$, and let Z be the incidence matrix for the random effect vector u. The model (4.3.1) can then be written in matrix form as
(4.3.2) y = Xβ +Zu + e.
The random vectors u and e are distributed as independent multivariate normal with zero mean vectors and covariance matrices given by $\sigma_u^2\Omega_1$ and $\sigma^2I_N$ respectively. Assuming that the random effects for the same area and different points in time can be modelled as a realisation of an AR(1) process, and the same process applies in all areas, the matrix $\Omega_1$ is then
\[ \Omega_1 = \left(\frac{1}{1-\rho^2}\right)I_D \otimes \begin{bmatrix} 1 & \rho & \cdots & \rho^{T-1} \\ \rho & 1 & \cdots & \rho^{T-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \cdots & 1 \end{bmatrix} \]
where ρ is the (common) autocorrelation parameter. The matrix $\Omega_1$ is a block-diagonal matrix with $\Omega_1^{-1} = \mathrm{diag}(I_T + \rho^2E + \rho F)$, where E and F were defined in section 4.2. The variance-covariance matrix of y is then $\sigma^2I_N + \sigma_u^2Z\Omega_1Z' = \sigma^2\Sigma$ where $\Sigma = I_N + Z\Omega Z'$, $\Omega = \phi\Omega_1$ and $\phi = \sigma_u^2/\sigma^2$. The corresponding model for the sample values is therefore
\[ y_s = X_s\beta + Z_su + e_s \tag{4.3.3} \]
with the covariance of $y_s$ given by $\sigma^2\Sigma_s$ where $\Sigma_s = I_n + Z_s\Omega Z_s'$. Here n is the total sample size.
Again, we observe that ML and REML estimation of the parameters β and $\omega = (\sigma^2, \phi, \rho)'$ of this model follow the same lines as in section 4.1. In this case we update the estimate of ρ in the iterative estimation process by replacing step 8 in the ML algorithm (see section 3.1) by the identity
\[ \rho = -\frac{\phi^{-1}[\mathrm{tr}(T_s^*F^*) + \sigma^{-2}u'F^*u]}{2D/(1-\rho^2) + 2\phi^{-1}[\mathrm{tr}(T_s^*E^*) + \sigma^{-2}u'E^*u]}. \]
Here $E^*$ and $F^*$ are block-diagonal matrices made up of D diagonal blocks equal to E and F respectively. Step 9 in the corresponding REML algorithm (see section 3.2) also uses this identity, but with $T_s^*$ replaced by $T_s$, where $T_s = T_s^* + T_s^*Z_s'X_s(X_s'\Sigma_s^{-1}X_s)^{-1}X_s'Z_sT_s^*$.
Turning now to estimation of the MCPE of the small area estimates, we note that the estimated Fisher information matrix for the MLEs of the variance components $\hat\omega = (\hat\sigma^2, \hat\phi, \hat\rho)'$ in this case is
\[ \hat I_{ML} = \frac{1}{2}\begin{bmatrix} n\hat\sigma^{-4} & \hat\sigma^{-2}\hat\phi^{-1}(\nu - \hat r_1^*) & \hat\sigma^{-2}(\hat r_1^{*(1)} - v_1^{(1)}) \\ & \hat\phi^{-2}(\nu - 2\hat r_1^*) + \hat\phi^{-4}\hat r^*_{11} & \hat\phi^{-1}(2\hat r_1^{*(1)} - v_1^{(1)} - \hat\phi^{-2}\hat r_{11}^{*(1)}) \\ & & \hat\phi^{-2}\hat r_{11}^{*(11)} + v_1^{(11)} - 2\hat r_1^{*(11)} \end{bmatrix} \]
where (3.3.2) provides definitions of the individual terms. This matrix can be evaluated by replacing unknown parameter values by ML estimates and making use of the fact that $\partial\hat\Omega_1^{-1}/\partial\hat\rho = 2\hat\rho E^* + F^*$ in this case. As usual, we estimate B by the 2 × 2 submatrix in the bottom right hand corner of the inverse of this estimated information matrix, with estimation of the MCPE matrix of the MLE-based $\hat\theta$ under this model then as in section 4.1, except that
\[ \hat\nabla_\alpha = -(Z_\alpha^*\hat T_s^* \otimes I_2)(\partial\hat\Omega^{-1}/\partial\hat\gamma)\hat T_s^* \]
where $\hat\gamma = (\hat\phi, \hat\rho)'$, and
\[ \partial\hat\Omega^{-1}/\partial\hat\gamma = -\mathrm{diag}\left\{\hat\phi^{-2}\hat\Omega_1^{-1} \otimes \begin{bmatrix} 1 \\ 0 \end{bmatrix} - \hat\phi^{-1}(\partial\hat\Omega_1^{-1}/\partial\hat\rho) \otimes \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right\}. \]
The corresponding estimate of the MCPE of the REML-based EBLUP $\hat\theta$ differs only in computation of the estimate of the matrix B. Here this is obtained as the 2 × 2 submatrix in the bottom right hand corner of the inverse of the estimated Fisher information matrix for the REML estimates of the variance components, which is given by
\[ \hat I_{REML} = \frac{1}{2}\begin{bmatrix} (n-p)\hat\sigma^{-4} & \hat\sigma^{-2}\hat\phi^{-1}(\nu - \hat r_1) & \hat\sigma^{-2}(\hat r_1^{(1)} - v_1^{(1)}) \\ & \hat\phi^{-2}(\nu - 2\hat r_1) + \hat\phi^{-4}\hat r_{11} & \hat\phi^{-1}(2\hat r_1^{(1)} - v_1^{(1)} - \hat\phi^{-2}\hat r_{11}^{(1)}) \\ & & \hat\phi^{-2}\hat r_{11}^{(11)} + v_1^{(11)} - 2\hat r_1^{(11)} \end{bmatrix}. \]
As usual, (3.3.2) provides definitions of individual terms, after star (*) superscripts used there are discarded.
4.4 A Model with Spatial Correlated Area Effects
points in this case, we drop the ‘t’ subscript and assume all our data relate to a single time point. Generalization to the models with both spatially and temporally correlated random effects is straightforward.
Let $y_{di}$ denote the ith population value for a characteristic of interest within area d (i = 1,2,…,$N_d$; d = 1,2,…,D). The vector $x_{di}$ represents the corresponding values of auxiliary population information (covariates). The objective is to estimate/predict the value of the small area characteristic θ, which is a linear function of the population values $y_{di}$. Let $u_d$ be the dth area effect. The model of interest is then
\[ y_{di} = \beta_0 + x_{di}'\beta + u_d + e_{di} \tag{4.4.1} \]
where $\beta_0$ is an intercept, β is a vector of regression coefficients, the $e_{di}$ are independent random errors with $E(e_{di}) = 0$ and $\mathrm{Var}(e_{di}) = \sigma^2$ and the $u_d$ are normally distributed variables with zero mean and covariances given by
\[ \mathrm{Cov}(u_d, u_{d'}) = \sigma_u^2f(\mathrm{Dist}(d,d');\rho) \tag{4.4.2} \]
where ρ is an unknown parameter and Dist(d,d′) is an appropriate measure of the “distance” between areas d and d′.
Let y = {ydi} be the vector of population values of the response variable, with ys
denoting the values observed in the sample, X ={xdi} be the matrix of regression variables
(covariates), with Xs denoting the corresponding sample values, and u and e the random area
effect and error vectors respectively. Let Z be the incidence matrix for the random effect vector u. The population model (4.4.1) can then be written
(4.4.3) y = Xβ +Zu + e.
The error vector e and area effect vector u have independent multivariate normal distributions with zero mean vectors and covariance matrices given by $\sigma^2I_N$ and $\sigma_u^2\Omega_1$ respectively, with the matrix $\Omega_1$ reflecting the spatial autocorrelation of the area effects. For example, this is achieved via a model of the form
\[ \Omega_1 = \left[\left\{1 + \delta_{dd'}\exp\left(\frac{\mathrm{Dist}(d,d')}{\rho}\right)\right\}^{-1}\right] \]
where ρ is an unknown parameter and $\delta_{dd'}$ is zero for d = d′ and 1 otherwise. The model for the sample vector $y_s$ is
\[ y_s = X_s\beta + Z_su + e_s \]
where the incidence matrix $Z_s = \mathrm{diag}(1_{n_d}; d = 1, \ldots, D)$ and $1_{n_d}$ is a vector of dimension $n_d$ with all elements equal to one. It follows that the covariance matrix of $y_s$ is then $\sigma^2\Sigma_s$ where $\Sigma_s = I_n + Z_s\Omega Z_s'$, $\Omega = \phi\Omega_1$ and $\phi = \sigma_u^2/\sigma^2$.
Given this set-up we are back with the model (1.1) and so can apply the ML and REML estimation theory set out in section 3 to prediction of θ. In this context we observe that both the ML estimation algorithm described in section 3.1 and the REML estimation algorithm described in section 3.2 still apply, with the updating step 8 in the ML algorithm (step 9 in the REML algorithm) replaced by:
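Put $\rho_{new} = \rho_{old} + \theta\,(\partial l/\partial\rho_{old})$, where
$\partial l/\partial\rho = -\frac{1}{2}\left[ \mathrm{tr}\left(\Omega_1^{-1}\frac{\partial\Omega_1}{\partial\rho}\right) - \phi^{-1}\,\mathrm{tr}\left(\Omega_1^{-1}T_s^{*}\Omega_1^{-1}\frac{\partial\Omega_1}{\partial\rho}\right) - \sigma^{-2}\phi^{-1}\,\hat{u}'\Omega_1^{-1}\frac{\partial\Omega_1}{\partial\rho}\Omega_1^{-1}\hat{u} \right]$
and $\theta$ is the (3,3) element of the inverse of the information matrix of the ML/REML estimate of the variance components $\hat\omega = (\hat\sigma^2, \hat\phi, \hat\rho)'$.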
Note that for the spatial correlation model defined above, we have
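$\frac{\partial\Omega_1}{\partial\rho} = \left[ \frac{\mathrm{Dist}(d,d')}{\rho^2}\,\delta_{dd'}\exp\left(\frac{\mathrm{Dist}(d,d')}{\rho}\right)\left(1 + \delta_{dd'}\exp\left(\frac{\mathrm{Dist}(d,d')}{\rho}\right)\right)^{-2} \right].$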
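The derivative above can be checked numerically. The sketch below, which uses hypothetical function names and an arbitrary distance matrix, compares the analytic expression for $\partial\Omega_1/\partial\rho$ with a central finite difference of $\Omega_1$; it is intended only as a sanity check of the formula, not as part of the estimation algorithm.

```python
import numpy as np

def omega1(dist, rho):
    """Omega_1 with entries 1 / (1 + delta_dd' * exp(Dist(d, d') / rho))."""
    delta = 1.0 - np.eye(dist.shape[0])
    return 1.0 / (1.0 + delta * np.exp(dist / rho))

def d_omega1_d_rho(dist, rho):
    """Analytic elementwise derivative of Omega_1 with respect to rho."""
    delta = 1.0 - np.eye(dist.shape[0])
    e = delta * np.exp(dist / rho)  # delta_dd' * exp(Dist / rho)
    return (dist / rho**2) * e / (1.0 + e) ** 2

# Hypothetical distances between D = 3 areas.
dist = np.array([[0.0, 1.0, 2.0],
                 [1.0, 0.0, 1.5],
                 [2.0, 1.5, 0.0]])
rho, h = 1.3, 1e-6

# Central finite difference approximation to the derivative.
numeric = (omega1(dist, rho + h) - omega1(dist, rho - h)) / (2.0 * h)
analytic = d_omega1_d_rho(dist, rho)
max_abs_diff = np.max(np.abs(numeric - analytic))
```

The diagonal of the derivative is zero because the diagonal elements of $\Omega_1$ equal one for every ρ.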
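For estimation of the MCPE matrix in this case, we observe that the estimated Fisher information matrix of the ML estimates of the variance components $\hat\omega = (\hat\sigma^2, \hat\phi, \hat\rho)'$ is
$\hat{I}_{ML} = \frac{1}{2}\begin{bmatrix} n\hat\sigma^{-4} & \hat\sigma^{-2}\hat\phi^{-1}(\nu - \hat{r}_1^{*}) & \hat\sigma^{-2}(\hat{r}_1^{*(1)} - v_1^{(1)}) \\ & \hat\phi^{-2}(\nu - 2\hat{r}_1^{*}) + \hat\phi^{-4}\hat{r}_{11}^{*} & \hat\phi^{-1}(2\hat{r}_1^{*(1)} - v_1^{(1)} - \hat\phi^{-1}\hat{r}_{11}^{*(1)}) \\ & & \hat\phi^{-2}\hat{r}_{11}^{*(11)} + v_1^{(11)} - 2\hat{r}_1^{*(11)} \end{bmatrix}$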
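while that of the REML estimates is
$\hat{I}_{REML} = \frac{1}{2}\begin{bmatrix} (n-p)\hat\sigma^{-4} & \hat\sigma^{-2}\hat\phi^{-1}(\nu - \hat{r}_1) & \hat\sigma^{-2}(\hat{r}_1^{(1)} - v_1^{(1)}) \\ & \hat\phi^{-2}(\nu - 2\hat{r}_1) + \hat\phi^{-4}\hat{r}_{11} & \hat\phi^{-1}(2\hat{r}_1^{(1)} - v_1^{(1)} - \hat\phi^{-1}\hat{r}_{11}^{(1)}) \\ & & \hat\phi^{-2}\hat{r}_{11}^{(11)} + v_1^{(11)} - 2\hat{r}_1^{(11)} \end{bmatrix}.$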
In both cases see (3.3.2) for the definitions of the various components, remembering that in the REML case we need to discard the star (*) superscripts. Note also that evaluation of these components depends on $\partial\hat\Omega_1^{-1}/\partial\hat\rho$ and is therefore dependent on the actual spatial correlation
of the matrix B defined by the 2 × 2 submatrix in the bottom right-hand corner of the inverse of the estimated information matrix for either the ML or REML estimates, we can then estimate the MCPE matrix of the EBLUP $\hat\theta$ following the same steps as outlined in section 4.3 above.
5 Application to Area Level Linear Mixed Models
There are many applications of small area estimation where the survey information is available at area level rather than at individual level. For example, survey-based estimates of small area averages (the so-called direct estimates) are available, but individual level data are not. The theory developed in sections 1–3 can be adapted to this situation provided it is reasonable to assume that these direct estimates can also be modelled linearly. We now illustrate this via application of area level versions of the models considered in section 4.
5.1 A Model with IID Area and IID Time Effects
Let the vector y = {ydt; t = 1,2,…, T; d = 1,2,…,D} consist of the direct survey estimates of
the survey variable Y. The subscripts d and t (t = 1,2,…,T; d = 1,2,…,D) represent area and time respectively. Let X be the matrix of the auxiliary covariates Xdt, all measured at area level. Assuming that u1t and u2d represent time and area effects respectively, let ηdt denote the
mean of Y in small area d at time t (i.e. after conditioning on these effects). The objective then is to predict the value of θ=aη, where η = (ηdt), given the following model
(5.1.1)  $\eta_{dt} = \beta_0 + X_{dt}'\beta_t + u_{1t} + u_{2d}$.
The vector βt contains regression coefficients at time t = 1,2, .., T, β0 is the intercept and the
random effects u1t and u2d are assumed to be mutually independent and normally distributed
with zero means and variances as defined below. To illustrate, let T = 2 and D = 3. The a matrix that defines the small area means at the last time period (t = 2) is then
$a = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$
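For general D and T this selection matrix can be generated with a Kronecker product. The sketch below assumes that the elements of y and η are ordered by area, with time varying fastest within each area, which is the ordering implied by the T = 2, D = 3 example; the function name is hypothetical.

```python
import numpy as np

def last_period_selector(D, T):
    """Matrix a that picks out the last time period (t = T) for each area.

    Assumes the elements of eta are ordered by area, with time varying
    fastest within each area.
    """
    e_T = np.zeros((1, T))
    e_T[0, T - 1] = 1.0  # row vector selecting the last of T periods
    return np.kron(np.eye(D), e_T)

# Reproduces the 3 x 6 matrix of the T = 2, D = 3 illustration.
a = last_period_selector(D=3, T=2)
```

Other linear targets, such as the average over time for each area, follow by replacing the row vector `e_T` with the appropriate weights.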
The linear predictor $\eta_{dt}$ is related to the direct estimator $y_{dt}$ via the following model
(5.1.2)  $y_{dt} = \eta_{dt} + e_{dt}$
where $e_{dt}$ represents sampling error. This is often referred to as a Fay-Herriot-type model. Let $u_1 = [u_{11}, u_{12}, \ldots, u_{1T}]'$, $u_2 = [u_{21}, u_{22}, \ldots, u_{2D}]'$ and $e = [e_{dt}]$, and let $Z_1$ and $Z_2$ be the incidence matrices for the vectors $u_1$ and $u_2$ respectively. Put $Z_1 = 1_D \otimes I_T$, so that $Z_1' = (Z_1^{*\prime}, Z_2^{*\prime}, \ldots, Z_D^{*\prime})$ with $Z_d^{*} = I_T$ for $d = 1, \ldots, D$, and $Z_2 = I_D \otimes 1_T = \mathrm{diag}(1_T;\ d = 1, \ldots, D)$, where $1_T$ is a vector of dimension T with all elements equal to one and $I_D$ and $I_T$ are identity matrices of order D and T respectively. The model (5.1.2) can then be written in matrix form as
(5.1.3)  $y = X\beta + Z_1u_1 + Z_2u_2 + e$.
The random vectors $u_1$, $u_2$ and e are assumed to be mutually independent with covariance matrices given by $\sigma_1^2 I_T$, $\sigma_2^2 I_D$ and $\sigma^2 W$ respectively. The matrix W is a known positive definite square matrix of order n = T × D. Typically W is a diagonal matrix with elements that are functions of the sample sizes $n_{dt}$, where $n_{dt}$ denotes the sample size in small area d at time t. The covariance matrix of y is then $\mathrm{Var}(y) = \sigma^2 W + \sigma_1^2 Z_1Z_1' + \sigma_2^2 Z_2Z_2'$. Setting $u = (u_1', u_2')'$ and $Z = (Z_1, Z_2)$, the model (5.1.3) further simplifies to
(5.1.4)  $y = X\beta + Zu + e$
in which case $\mathrm{Var}(y) = \sigma^2\Sigma$, where $\Sigma = W + \phi_1 Z_1Z_1' + \phi_2 Z_2Z_2'$ and $\phi_j = \sigma_j^2/\sigma^2$.
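The structure of Σ can be made concrete with a short NumPy sketch. The function name and the values of ϕ1 and ϕ2 are hypothetical, and W defaults to the identity matrix purely for illustration; in practice W would be a known matrix, typically diagonal with elements depending on the sample sizes.

```python
import numpy as np

def area_time_sigma(D, T, phi1, phi2, W=None):
    """Return (Z1, Z2, Sigma) with Sigma = W + phi1 Z1 Z1' + phi2 Z2 Z2'.

    Rows are ordered by area, with time varying fastest within each area.
    W defaults to the identity matrix (an illustrative assumption only).
    """
    n = D * T
    W = np.eye(n) if W is None else np.asarray(W, dtype=float)
    Z1 = np.kron(np.ones((D, 1)), np.eye(T))  # time effects: 1_D (x) I_T
    Z2 = np.kron(np.eye(D), np.ones((T, 1)))  # area effects: I_D (x) 1_T
    Sigma = W + phi1 * (Z1 @ Z1.T) + phi2 * (Z2 @ Z2.T)
    return Z1, Z2, Sigma

# Hypothetical values for D = 3 areas and T = 2 time periods.
Z1, Z2, Sigma = area_time_sigma(D=3, T=2, phi1=0.4, phi2=0.7)
```

Two direct estimates from the same area at different times then share only the area component ϕ2, two estimates from different areas at the same time share only the time component ϕ1, and estimates that differ in both area and time are uncorrelated.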
ML and REML estimation of the parameters of (5.1.4), as well as calculation of the corresponding EBLUP and its estimated MCPE matrix, then follows along exactly the same lines as in section 4.1. The only difference is that we replace ys and Σs there by y and Σ here.
Note that this model can only be fitted if D > 1 and T > 1. If either of these conditions is not met, then $\sigma^2$ is not identifiable. In such a case we need to estimate this parameter using other methods and then substitute this estimate in the preceding development.
5.2 A Model with IID Area and Autocorrelated Time Effects
Here we extend the model in the previous section to allow the time effect to be the outcome of a stochastic process. In particular, we assume that this process is AR(1). The model for the observed direct estimates is therefore
(5.2.1)  $y_{dt} = \eta_{dt} + e_{dt}$
where $\eta_{dt} = \beta_0 + X_{dt}'\beta_t + u_{1t} + u_{2d}$ and $e_{dt}$ is sampling error. The vector $\beta_t$ contains the
regression coefficients for time t = 1, 2, .., T, β0 is the intercept and the random effects u1t and