Small Area Estimation Under Linear and Generalized Linear Mixed Models With Time and Area Effects

Ayoub Saei, Ray Chambers

Abstract

This is the theory component of the report on small area estimation theory that was prepared as part of Southampton’s involvement in the EURAREA “Enhancing small area estimation techniques to meet European Needs” project IST 2000-26290 in the European Union’s Fifth Research And Technological Development Framework Programme.

Southampton Statistical Sciences Research Institute

University of Southampton

Highfield, Southampton

SO17 1BJ

United Kingdom

September 2003

1 Estimation Under a Linear Mixed Model With Time and Area Effects

Let the vector $y = \{y_{dti}\}$ consist of the population values of the survey variable of interest Y. The subscripts d, t and i ($d = 1, 2, \ldots, D$; $t = 1, 2, \ldots, T$; $i = 1, 2, \ldots, N_{dt}$) represent area, time and unit respectively. Let X be the matrix of population values of the auxiliary covariates $X_{dti}$. Assuming that the $u_{j(dt)}$, $j = 1, 2, \ldots, J$, represent random effects that are related to area and time, the objective is to estimate/predict the value of the vector $\theta = ay$ for a given matrix a. Initially we assume that the values in y follow a linear mixed model with random area and time effects of the form:

(1.1) $y_{dti} = \beta_0 + X_{dti}'\beta_t + \sum_{j=1}^{J} u_{j(dt)} + e_{dti}.$

The vector $\beta_t$ contains regression coefficients at time $t = 1, 2, \ldots, T$, $\beta_0$ is the intercept, and the random effects $\{u_{j(dt)}\}$ and $\{e_{dti}\}$ are assumed to be mutually independent and to follow normal distributions with zero means and respective variances. The $(dt)$ in $u_{j(dt)}$ indicates that these random effects are related to the specific area d and the specific time t. Linear models with spatially correlated area effects and linear models with independent and autocorrelated time effects are special cases of model (1.1) with J = 1 and J = 2 respectively. Put $u = [u_1', u_2', \ldots, u_J']'$ and $e = [e_{dti}]$, and let $Z_j$ be the “incidence” matrix for the random effects vector $u_j$. The model (1.1) can then be written in matrix form as

(1.2) $y = X\beta + Zu + e$

where $Z = [Z_1, Z_2, \ldots, Z_J]$. The random vectors $u_j$ are assumed to be distributed as multivariate normal with zero mean vectors and variance-covariance matrices $\sigma_j^2\Omega_j(\rho)$, where $\rho$ is a vector of correlation/covariance parameters. The vector e is a normal error vector with zero mean and variance $\sigma^2 W$, where W is a known square matrix of order N and $\Omega_j(\rho)$ is a matrix of order equal to the rank of the matrix $Z_j$. The covariance matrix of y is then

$\sigma^2\Bigl(W + \sum_{j=1}^{J}\phi_j Z_j\Omega_j Z_j'\Bigr) = \sigma^2(W + Z\Omega Z') = \sigma^2\Sigma$

where $\phi_j = \sigma_j^2/\sigma^2$ and $\Omega = \mathrm{diag}(\phi_j\Omega_j(\rho))$.

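Suppose that the population units are partitioned into sampled units, denoted by the subscript s, and non-sampled units, denoted by the subscript r. These subscripts will be used below to denote conformable partitions of other vectors and matrices. Thus, the model (1.2) is partitioned conformably as

(1.3) $\begin{bmatrix} y_s \\ y_r \end{bmatrix} = \begin{bmatrix} X_s \\ X_r \end{bmatrix}\beta + \begin{bmatrix} Z_s \\ Z_r \end{bmatrix}u + \begin{bmatrix} e_s \\ e_r \end{bmatrix} = \begin{bmatrix} X_s\beta + Z_s u + e_s \\ X_r\beta + Z_r u + e_r \end{bmatrix} = \begin{bmatrix} \eta_s + e_s \\ \eta_r + e_r \end{bmatrix}$

where $X = [X_s', X_r']'$, $Z = [Z_s', Z_r']'$, $\eta_s = X_s\beta + Z_s u$ and $\eta_r = X_r\beta + Z_r u$. Similarly, the matrix a is partitioned conformably as $a = [a_s, a_r]$. The vector-valued parameter of interest $\theta = ay$ can then be written

(1.4) $\theta = a_s y_s + a_r y_r.$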

The first term in (1.4) depends only on the sample values and is known after the sample is observed. The second term, which depends on the non-sample values, is unknown. The estimate or predicted value of $\theta$, say $\hat\theta$, is then obtained by replacing $y_r$ with its predicted value in (1.4). Let $E(y_r \mid u) = \eta_r = X_r\beta + Z_r u$. The predictor of $y_r$ is then $\hat\eta_r = X_r\hat\beta + Z_r\hat u$, where the estimates $\hat\beta$ and $\hat u$ are obtained by fitting the linear mixed model

(1.5) $y_s = X_s\beta + Z_s u + e_s$

to the sample data. This leads to the predicted value

(1.6) $\hat\theta = a_s y_s + a_r\hat y_r = a_s y_s + a_r\hat\eta_r = a_s y_s + a_r(X_r\hat\beta + Z_r\hat u).$

2 Best Linear Unbiased Prediction (BLUP)

The linear mixed model (1.5) characterises variation between areas and variation over time in the values of the characteristic Y. The aim is to predict or estimate $\theta = ay$ using the sample values $y_s$. For known variance components in the model (1.5), the BLUP method provides a predictor of $\theta$ that is an unbiased linear function of the sample values $y_s$ and has minimum variance among all other linear unbiased predictors/estimators of $\theta$. The method requires that the covariance between the observable random variable $y_s$ and the non-observable random variable u be known. It does not impose a normality assumption on the random effect u. Under (1.5) the covariance between u and $y_s$ is $\sigma^2\Omega Z_s'$. In this case, following Henderson (1963), the best linear unbiased predictor (BLUP) of $\eta_r$ is

(2.1) $\tilde\eta_r = X_r\hat\beta + Z_r\hat u$

where

(2.2) $\hat\beta = (X_s'\Sigma_s^{-1}X_s)^{-1}X_s'\Sigma_s^{-1}y_s$ and $\hat u = \Omega Z_s'\Sigma_s^{-1}(y_s - X_s\hat\beta).$

Note that $\hat\beta$ is Aitken's generalized least squares (GLS) estimator of $\beta$ under (1.5). Replacing $\hat u$ in (2.1) by its value from (2.2), and $\hat\eta_r$ by $\tilde\eta_r$ on the right hand side of (1.6), gives

(2.3) $\tilde\theta = a_s y_s + a_r\tilde\eta_r = a_s y_s + a_r X_r\hat\beta + a_r Z_r\Omega Z_s'\Sigma_s^{-1}(y_s - X_s\hat\beta).$
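The formulas (2.2) and (2.3) can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the function name `blup` and its argument names are hypothetical, the variance components (and hence $\Omega$) are assumed known, and $W_s$ defaults to the identity. Since $\sigma^2$ cancels in (2.2), it does not appear.

```python
import numpy as np

def blup(ys, Xs, Zs, Xr, Zr, a_s, a_r, Omega, Ws=None):
    """Sketch of (2.2)-(2.3): GLS estimate of beta, predicted u, and BLUP of theta.

    Omega is diag(phi_j * Omega_j); Ws defaults to the identity (an assumption).
    """
    n = len(ys)
    Ws = np.eye(n) if Ws is None else Ws
    Sigma_s = Ws + Zs @ Omega @ Zs.T                     # Sigma_s = W_s + Z_s Omega Z_s'
    Si_X = np.linalg.solve(Sigma_s, Xs)                  # Sigma_s^{-1} X_s
    beta = np.linalg.solve(Xs.T @ Si_X, Si_X.T @ ys)     # GLS estimator in (2.2)
    resid = ys - Xs @ beta
    u = Omega @ Zs.T @ np.linalg.solve(Sigma_s, resid)   # predicted random effects in (2.2)
    theta = a_s @ ys + a_r @ (Xr @ beta + Zr @ u)        # BLUP (2.3)
    return beta, u, theta
```

Linear solves are used instead of explicit inverses of $\Sigma_s$; this is a numerical design choice and not something the text prescribes.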

An alternative form of (2.3) which is useful in computation can be obtained from the following theorem.

Theorem: Put $T_s^* = (\Omega^{-1} + Z_s'W_s^{-1}Z_s)^{-1} = [T_{sjj'}^*]$ and $\Sigma_s = W_s + Z_s\Omega Z_s'$. Then

a) $\Sigma_s^{-1} = W_s^{-1} - W_s^{-1}Z_s T_s^* Z_s' W_s^{-1}$

b) $Z_s'\Sigma_s^{-1} = \Omega^{-1} T_s^* Z_s' W_s^{-1}$

c) $\Omega Z_s'\Sigma_s^{-1} = T_s^* Z_s' W_s^{-1}$

d) $Z_s'\Sigma_s^{-1}Z_s = \Omega^{-1} - \Omega^{-1} T_s^* \Omega^{-1}$

e) $Z_{sj}'\Sigma_s^{-1}Z_{sj'} = \begin{cases} \phi_j^{-1}\Omega_j^{-1} - \phi_j^{-2}\Omega_j^{-1}T_{sjj}^*\Omega_j^{-1} & \text{for } j = j' \\ -\phi_j^{-1}\phi_{j'}^{-1}\Omega_j^{-1}T_{sjj'}^*\Omega_{j'}^{-1} & \text{otherwise.} \end{cases}$

Using the above Theorem an alternative formulation of (2.3) is

(2.4) $\tilde\theta = a_s y_s + a_r X_r\hat\beta + a_r Z_r T_s^* Z_s' W_s^{-1}(y_s - X_s\hat\beta) = a_s y_s + a_r\hat y_r.$
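Parts (a) to (d) of the Theorem can be checked numerically on random matrices. The sketch below is only such a sanity check, not a proof; the dimensions, the random seed and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 10, 4
Zs = rng.normal(size=(n, q))
A = rng.normal(size=(q, q))
Omega = A @ A.T + np.eye(q)            # arbitrary positive definite Omega
B = rng.normal(size=(n, n))
Ws = B @ B.T + np.eye(n)               # arbitrary positive definite W_s

Wi = np.linalg.inv(Ws)
Oi = np.linalg.inv(Omega)
Sigma_i = np.linalg.inv(Ws + Zs @ Omega @ Zs.T)
T_star = np.linalg.inv(Oi + Zs.T @ Wi @ Zs)   # T_s^* of the Theorem

checks = {
    "a": np.allclose(Sigma_i, Wi - Wi @ Zs @ T_star @ Zs.T @ Wi),
    "b": np.allclose(Zs.T @ Sigma_i, Oi @ T_star @ Zs.T @ Wi),
    "c": np.allclose(Omega @ Zs.T @ Sigma_i, T_star @ Zs.T @ Wi),
    "d": np.allclose(Zs.T @ Sigma_i @ Zs, Oi - Oi @ T_star @ Oi),
}
print(checks)
```

The computational appeal of (2.4) is that $T_s^*$ has the order of the random effects vector, which is typically far smaller than the order n of $\Sigma_s$.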

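The mean cross-product error (MCPE) matrix of the BLUP $\tilde\theta$ is obtained by writing

$\tilde\theta - \theta = a_r(\hat y_r - y_r) = a_r(\tilde\eta_r - y_r) = a_r(\tilde\eta_r - \eta_r + \eta_r - y_r) = a_r[(\tilde\eta_r - \eta_r) - (y_r - \eta_r)].$

Let $X_r^* = a_r X_r$ and $Z_r^* = a_r Z_r$. The mean cross-product error matrix of the BLUP estimator $\tilde\theta$ is then

(2.5) $\mathrm{MCPE}(\tilde\theta) = E[(\tilde\theta - \theta)(\tilde\theta - \theta)'] = \begin{pmatrix} X_r^* & Z_r^* \end{pmatrix} E\left( \begin{bmatrix} \hat\beta - \beta \\ \hat u - u \end{bmatrix} \begin{bmatrix} \hat\beta - \beta \\ \hat u - u \end{bmatrix}' \right) \begin{pmatrix} X_r^{*\prime} \\ Z_r^{*\prime} \end{pmatrix} + \sigma^2 a_r W_r a_r'$

$= \sigma^2 Z_r^* T_s^* Z_r^{*\prime} + \sigma^2[X_r^* - Z_r^* T_s^* Z_s' W_s^{-1} X_s](X_s'\Sigma_s^{-1}X_s)^{-1}[X_r^{*\prime} - X_s' W_s^{-1} Z_s T_s^* Z_r^{*\prime}] + \sigma^2 a_r W_r a_r' = G_1(\omega) + G_2(\omega) + G_4(\omega)$

where

$G_1(\omega) = \sigma^2 Z_r^* T_s^* Z_r^{*\prime},$

$G_2(\omega) = \sigma^2[X_r^* - Z_r^* T_s^* Z_s' W_s^{-1} X_s](X_s'\Sigma_s^{-1}X_s)^{-1}[X_r^{*\prime} - X_s' W_s^{-1} Z_s T_s^* Z_r^{*\prime}],$

$G_4(\omega) = \sigma^2 a_r W_r a_r'.$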
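The three components of (2.5) can be computed directly from their definitions. The following is a sketch under stated assumptions: the function name `mcpe_blup` and its arguments are hypothetical, the variance components are taken as known, and $W_s$ and $W_r$ default to identity matrices.

```python
import numpy as np

def mcpe_blup(Xs, Zs, Xr, Zr, a_r, Omega, sigma2, Ws=None, Wr=None):
    """Sketch of G1, G2 and G4 in (2.5) for known variance components."""
    n, n_r = Xs.shape[0], Xr.shape[0]
    Ws = np.eye(n) if Ws is None else Ws
    Wr = np.eye(n_r) if Wr is None else Wr
    Xr_star, Zr_star = a_r @ Xr, a_r @ Zr                 # X_r^* and Z_r^*
    Wi = np.linalg.inv(Ws)
    T_star = np.linalg.inv(np.linalg.inv(Omega) + Zs.T @ Wi @ Zs)
    Sigma_i = Wi - Wi @ Zs @ T_star @ Zs.T @ Wi           # Theorem, part (a)
    d = Xr_star - Zr_star @ T_star @ Zs.T @ Wi @ Xs
    G1 = sigma2 * Zr_star @ T_star @ Zr_star.T
    G2 = sigma2 * d @ np.linalg.solve(Xs.T @ Sigma_i @ Xs, d.T)
    G4 = sigma2 * a_r @ Wr @ a_r.T
    return G1, G2, G4
```

The sum `G1 + G2 + G4` can be compared against the variance of the prediction error computed directly from the joint covariance matrix of $(y_s, y_r)$, which is a useful check when adapting the sketch.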

3 Empirical Best Linear Unbiased Prediction (EBLUP)

The BLUP development set out in the preceding section assumes variance components are known. In practice of course, this is hardly ever the case. We therefore need to estimate these parameters from the sample data. The empirical best linear unbiased prediction (EBLUP) method replaces them by sample-based estimates in the BLUP.

The two methods most frequently used to estimate variance components are maximum likelihood (ML) and residual maximum likelihood (REML). Kackar and Harville (1981) showed that the ML and REML variance component estimators are translation invariant and even functions of the data. In this section we derive the estimating equations for the variance component vector, $\omega = (\sigma^2, \phi', \rho')'$, under ML and REML.

3.1 Maximum Likelihood Estimation of the Variance Components (ML)

Maximum likelihood estimation of the regression parameter vector $\beta$ and the variance components vector $\omega$ requires parametric specification of the distribution of the random variable $y_s$. A standard assumption is that this variable has a multivariate normal distribution. Given that assumption, we can write down the log-likelihood function generated by the observation vector $y_s$ under the general linear mixed model (1.5) as

$l = -\tfrac{1}{2}\bigl[n\ln(2\pi\sigma^2) + \ln|\Sigma_s| + \sigma^{-2}(y_s - X_s\beta)'\Sigma_s^{-1}(y_s - X_s\beta)\bigr].$

Differentiation of this log-likelihood function with respect to the parameters of this distribution leads to the ML score functions

(3.1.1) $\partial l/\partial\beta = \sigma^{-2}X_s'\Sigma_s^{-1}(y_s - X_s\beta)$

(3.1.2) $\partial l/\partial\sigma^2 = -\tfrac{1}{2}\bigl[n\sigma^{-2} - \sigma^{-4}(y_s - X_s\beta)'\Sigma_s^{-1}(y_s - X_s\beta)\bigr]$

(3.1.3) $\partial l/\partial\phi_j = -\tfrac{1}{2}\bigl[\mathrm{tr}(\Sigma_s^{-1}Z_{sj}\Omega_j Z_{sj}') - \sigma^{-2}(y_s - X_s\beta)'\Sigma_s^{-1}Z_{sj}\Omega_j Z_{sj}'\Sigma_s^{-1}(y_s - X_s\beta)\bigr]$ for j = 1, 2, …

(3.1.4) $\partial l/\partial\rho_h = -\tfrac{1}{2}\bigl[\mathrm{tr}(\Sigma_s^{-1}Z_s(\partial\Omega/\partial\rho_h)Z_s') - \sigma^{-2}(y_s - X_s\beta)'\Sigma_s^{-1}Z_s(\partial\Omega/\partial\rho_h)Z_s'\Sigma_s^{-1}(y_s - X_s\beta)\bigr]$ for h = 1, 2, …

Equating (3.1.1) to zero yields the ML estimating equations for $\beta$. For a fixed variance components vector $\omega$, it is clear that the MLE for this parameter is then just its GLS estimator

$\hat\beta_{ML} = (X_s'\Sigma_s^{-1}X_s)^{-1}X_s'\Sigma_s^{-1}y_s.$

Equating the score functions in (3.1.2) – (3.1.4) to zero yields the ML estimating equations for $\sigma^2$, $\phi_j$ and $\rho_h$. For fixed $\phi_j$ and $\rho_h$, equating (3.1.2) to zero gives the ML estimate of $\sigma^2$ as

(3.1.3) $\hat\sigma^2_{ML} = n^{-1}y_s'\Sigma_s^{-1}(y_s - X_s\hat\beta) = n^{-1}y_s'W_s^{-1}(y_s - X_s\hat\beta - Z_s\hat u)$

where $\hat u$ is given by (2.2) in the previous section.

The estimating equations for the variance components $\phi_j$ and $\rho_h$ have no analytic solution, and so have to be solved numerically. Various methods for computing ML estimates of $\omega$ have been used in the literature. Henderson (1963, 1973, 1975), Harville (1977) and Fellner (1986, 1987) have proposed iterative procedures for calculating ML estimates under different normal variance components models. We adapt these methods in the following iterative algorithm for calculating the ML estimates of $\phi_j$, $\sigma^2$ and $\rho_h$.

1. Assign initial values to the variance components $\phi_j$, $\rho_h$ and $\sigma^2$.

2. Using the current values for these variance components, calculate $\Sigma_s$ and $\Omega$.

3. Update $\beta = (X_s'\Sigma_s^{-1}X_s)^{-1}X_s'\Sigma_s^{-1}y_s$.

4. Update $u = \Omega Z_s'\Sigma_s^{-1}(y_s - X_s\beta)$.

5. Update $\sigma^2 = n^{-1}y_s'W_s^{-1}(y_s - X_s\beta - Z_s u)$.

6. Calculate $T_s^* = (\Omega^{-1} + Z_s'W_s^{-1}Z_s)^{-1} = [T_{sjj'}^*]$.

7. Update $\phi_j = \nu_j^{-1}\bigl(\mathrm{tr}(T_{sjj}^*\Omega_j^{-1}) + \sigma^{-2}u_j'\Omega_j^{-1}u_j\bigr)$ where $\nu_j$ is the rank of the matrix $Z_{sj}$.

8. Update $\rho_h = f_h(\rho, \phi, T_s^*, \sigma^2, u)$ where $f_h$ is a known function whose specification depends on the parameterization of $\Omega(\rho)$, and current values for variance components are used in the right hand side of this equation.

9. Return to step 2 and repeat the procedure until the values of the different parameters converge.

At convergence the MLE-based empirical best linear unbiased estimate (EBLUP) of $\theta$ is calculated by substituting the converged values of $\beta$ and u above as the corresponding estimates in (2.1) and then computing (2.3).
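The iteration above can be sketched for the simplest situation, in which there is no $\rho$ parameter (so step 8 is skipped), every $\Omega_j$ is an identity matrix and $W_s = I_n$. These restrictions, the function name `ml_variance_components`, the starting values and the stopping rule are all assumptions made for illustration; $\nu_j$ is taken to be the number of columns of $Z_{sj}$, which equals its rank when every level is sampled.

```python
import numpy as np

def ml_variance_components(ys, Xs, Z_blocks, max_iter=500, tol=1e-8):
    """Sketch of ML steps 1-7 and 9 with Omega_j = I, W_s = I and no rho."""
    n = len(ys)
    Zs = np.hstack(Z_blocks)
    sizes = [Z.shape[1] for Z in Z_blocks]
    idx = np.split(np.arange(Zs.shape[1]), np.cumsum(sizes)[:-1])
    sigma2, phi = 1.0, np.ones(len(Z_blocks))                    # step 1
    for _ in range(max_iter):
        Omega = np.diag(np.repeat(phi, sizes))                   # step 2
        Sigma_s = np.eye(n) + Zs @ Omega @ Zs.T
        Si_X = np.linalg.solve(Sigma_s, Xs)
        beta = np.linalg.solve(Xs.T @ Si_X, Si_X.T @ ys)         # step 3
        u = Omega @ Zs.T @ np.linalg.solve(Sigma_s, ys - Xs @ beta)   # step 4
        sigma2_new = ys @ (ys - Xs @ beta - Zs @ u) / n          # step 5
        T_star = np.linalg.inv(np.linalg.inv(Omega) + Zs.T @ Zs)      # step 6
        phi_new = np.array([                                     # step 7
            (np.trace(T_star[np.ix_(j, j)]) + u[j] @ u[j] / sigma2_new) / len(j)
            for j in idx
        ])
        change = abs(sigma2_new - sigma2) + np.abs(phi_new - phi).sum()
        sigma2, phi = sigma2_new, phi_new
        if change < tol:                                         # step 9
            break
    return beta, u, sigma2, phi
```

The returned `beta` and `u` are those computed in the final pass through steps 3 and 4. Updates of this fixed-point type can converge slowly, so `max_iter` may need to be raised in practice.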

3.2. Residual Maximum Likelihood Estimation of the Variance Components (REML)

The REML method is based on transforming $y_s$ into two independent vectors $y_1 = K_1 y_s$ and $y_2 = K_2 y_s$. The $y_1$ vector has a distribution that does not depend on the fixed effect $\beta$ and hence satisfies $E(K_1 y_s) = 0$, i.e. $K_1 X_s = 0$, while the $y_2$ vector is independent of $y_1$ and satisfies $K_1\Sigma_s K_2' = 0$. The matrix $K_1$ is chosen to have maximum rank, i.e. n − p, and so the rank of $K_2$ is p. The likelihood function of $y_s$ is then the product of the likelihoods generated by $y_1$ and $y_2$. Following Patterson and Thompson (1971), the REML estimators of the variance components are then the maximum likelihood estimators of these parameters based on $y_1$. That is, the REML method estimates the variance components vector $\omega$ by maximising the log-likelihood function

$l_{REML} = -\tfrac{1}{2}\bigl[(n-p)\ln 2\pi\sigma^2 + \ln|K_1\Sigma_s K_1| + \sigma^{-2}y_s'K_1(K_1\Sigma_s K_1)^{-1}K_1 y_s\bigr]$

where $K_1 = W_s^{-1} - W_s^{-1}X_s(X_s'W_s^{-1}X_s)^{-1}X_s'W_s^{-1}$ for the model (1.5). Note that if $K_1\Sigma_s K_1$ is not of full rank, then $|K_1\Sigma_s K_1|$ must be interpreted as the determinant of its linearly independent rows and columns. Given this definition of $K_1$, the matrix $K_2$ is defined as $K_2 = X_s'\Sigma_s^{-1}$. The log-likelihood function defined by $y_2 = K_2 y_s$ is

$l_L = -\tfrac{1}{2}\bigl[p\ln 2\pi\sigma^2 + \ln|K_2\Sigma_s K_2'| + \sigma^{-2}(y_2 - E(y_2))'(K_2\Sigma_s K_2')^{-1}(y_2 - E(y_2))\bigr]$

$= -\tfrac{1}{2}\bigl[p\ln 2\pi\sigma^2 + \ln|X_s'\Sigma_s^{-1}X_s| + \sigma^{-2}(y_s - X_s\beta)'\Sigma_s^{-1}X_s(X_s'\Sigma_s^{-1}X_s)^{-1}X_s'\Sigma_s^{-1}(y_s - X_s\beta)\bigr].$

For given values of the variance components, $\beta$ is then estimated by maximizing $l_L$, leading to

$\hat\beta = (X_s'\Sigma_s^{-1}X_s)^{-1}X_s'\Sigma_s^{-1}y_s.$

Differentiation of $l_{REML}$ with respect to the variance components yields the REML estimating equations

(3.2.1)

$\partial l_{REML}/\partial\sigma^2 = -\tfrac{1}{2}\bigl[(n-p)\sigma^{-2} - \sigma^{-4}y_s'K_1(K_1\Sigma_s K_1)^{-1}K_1 y_s\bigr]$

$\partial l_{REML}/\partial\phi_j = -\tfrac{1}{2}\bigl[\mathrm{tr}(K_{j\phi}) - \sigma^{-2}y_s'K_{j\phi}K_1(K_1\Sigma_s K_1)^{-1}K_1 y_s\bigr]$

$\partial l_{REML}/\partial\rho_h = -\tfrac{1}{2}\bigl[\mathrm{tr}(K_{h\rho}) - \sigma^{-2}y_s'K_{h\rho}K_1(K_1\Sigma_s K_1)^{-1}K_1 y_s\bigr]$

where $K_{j\phi} = K_1(K_1\Sigma_s K_1)^{-1}K_1\,\partial\Sigma_s/\partial\phi_j$ and $K_{h\rho} = K_1(K_1\Sigma_s K_1)^{-1}K_1\,\partial\Sigma_s/\partial\rho_h$. For given values of $\phi_j$ and $\rho_h$, equating the REML score function for $\sigma^2$ in (3.2.1) to zero leads to a REML estimate of this parameter that satisfies

(3.2.2) $\hat\sigma^2_{REML} = (n-p)^{-1}y_s'K_1(K_1\Sigma_s K_1)^{-1}K_1 y_s = (n-p)^{-1}y_s'\Sigma_s^{-1}(y_s - X_s\hat\beta) = (n-p)^{-1}y_s'W_s^{-1}(y_s - X_s\hat\beta - Z_s\hat u).$

Equating the REML score functions for $\phi_j$ and $\rho_h$ in (3.2.1) to zero leads to non-linear equations that must be solved numerically. Following Fellner (1986, 1987) we define an iterative algorithm to calculate the REML estimates of the variance components in (1.5). It is identical to the algorithm for calculating the ML estimates defined earlier when one replaces $T_s^*$ there by $T_s$ below.

1. Assign initial values to the variance components $\phi_j$, $\rho_h$ and $\sigma^2$.

2. Using the current values for these variance components, calculate $\Sigma_s$ and $\Omega$.

3. Update $\beta = (X_s'\Sigma_s^{-1}X_s)^{-1}X_s'\Sigma_s^{-1}y_s$.

4. Update $u = \Omega Z_s'\Sigma_s^{-1}(y_s - X_s\beta)$.

5. Update $\sigma^2 = (n-p)^{-1}y_s'W_s^{-1}(y_s - X_s\beta - Z_s u)$.

6. Calculate $T_s^* = (\Omega^{-1} + Z_s'W_s^{-1}Z_s)^{-1} = [T_{sjj'}^*]$.

7. Calculate $T_s = [T_{sjj'}] = T_s^* + T_s^* Z_s'W_s^{-1}X_s(X_s'\Sigma_s^{-1}X_s)^{-1}X_s'W_s^{-1}Z_s T_s^*$.

8. Update $\phi_j = \nu_j^{-1}\bigl(\mathrm{tr}(T_{sjj}\Omega_j^{-1}) + \sigma^{-2}u_j'\Omega_j^{-1}u_j\bigr)$ where $\nu_j$ is the rank of the matrix $Z_{sj}$.

9. Update $\rho_h = f_h(\rho, \phi, T_s, \sigma^2, u)$ where $f_h$ is a known function whose specification depends on the parameterization of $\Omega(\rho)$, and current values for variance components are used in the right hand side of this equation.

10. Return to step 2 and repeat the procedure until the values of the different parameters converge.

As with the MLE, the REML-based empirical best linear unbiased estimate (EBLUP) of $\theta$ is then calculated by substituting the converged values of $\beta$ and u above as the corresponding estimates in (2.1) and then computing (2.3).
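The only new quantity in the REML algorithm is the matrix $T_s$ of step 7. The sketch below computes it under the assumption that $T_s$ is the random effects block of the inverse of the coefficient matrix of Henderson's mixed model equations, which gives $T_s = T_s^* + T_s^* Z_s'W_s^{-1}X_s(X_s'\Sigma_s^{-1}X_s)^{-1}X_s'W_s^{-1}Z_s T_s^*$. The function name `reml_T` is hypothetical and $W_s$ defaults to the identity.

```python
import numpy as np

def reml_T(Xs, Zs, Omega, Ws=None):
    """Sketch of steps 6-7 of the REML algorithm: returns (T_s^*, T_s)."""
    n = Xs.shape[0]
    Wi = np.eye(n) if Ws is None else np.linalg.inv(Ws)
    T_star = np.linalg.inv(np.linalg.inv(Omega) + Zs.T @ Wi @ Zs)   # step 6
    Sigma_i = Wi - Wi @ Zs @ T_star @ Zs.T @ Wi                     # Theorem, part (a)
    C = T_star @ Zs.T @ Wi @ Xs
    T = T_star + C @ np.linalg.solve(Xs.T @ Sigma_i @ Xs, C.T)      # step 7
    return T_star, T
```

Because $T_s - T_s^*$ is positive semi-definite, replacing $T_s^*$ by $T_s$ in the update for $\phi_j$ can only increase the trace term, reflecting the extra uncertainty due to estimating $\beta$.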

3.3 Estimating the Mean Squared Error of the EBLUP

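To start, we assume that the variance components are estimated via ML. The prediction error of $\hat\theta$ is

$\hat\theta - \theta = a_r(\hat y_r - y_r) = a_r(\hat\eta_r - y_r) = a_r(\hat\eta_r - \eta_r + \eta_r - y_r) = a_r[(\hat\eta_r - \eta_r) - (y_r - \eta_r)].$

The mean cross-product error matrix is therefore $\mathrm{MCPE}(\hat\theta) = E[(\hat\theta - \theta)(\hat\theta - \theta)']$. After some algebra, it can be shown that this matrix is

(3.3.1) $\mathrm{MCPE}(\hat\theta) \cong \mathrm{MCPE}(\tilde\theta) + G_3(\omega) = G_1(\omega) + G_2(\omega) + G_3(\omega) + G_4(\omega)$

where $G_1(\omega)$, $G_2(\omega)$ and $G_4(\omega)$ are given in section 2, and $G_3(\omega)$ is the square matrix of the same order whose $(\alpha, \alpha')$ element is $\mathrm{tr}(G_{\alpha\alpha'}B)$. Here B is the asymptotic covariance matrix of the ML estimates of the variance components $\hat\gamma = (\hat\phi', \hat\rho')'$ and, taking $\tilde\theta_\alpha$ to be the αth element of the BLUP $\tilde\theta$, $G_{\alpha\alpha'} = \mathrm{Cov}(\partial\tilde\theta_\alpha/\partial\gamma, \partial\tilde\theta_{\alpha'}/\partial\gamma)$. Note that B is defined by the “γ-component” of the inverse of the Fisher information matrix for the variance components $\omega$. Under the model (1.5) this information matrix is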

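$$\frac{1}{2}\begin{bmatrix} n\sigma^{-4} & \sigma^{-2}\bigl[\phi_j^{-1}(\nu_j - r_j^*)\bigr] & \sigma^{-2}\bigl[\sum_{j=1}^{J}(-v_j^{(h)} + r_j^{*(h)})\bigr] \\ & \bigl[\phi_j^{-2}(\nu_j - 2r_j^*)\delta_{jj'} + \phi_j^{-2}\phi_{j'}^{-2}r_{jj'}^*\bigr] & \bigl[\phi_j^{-1}\bigl(2r_j^{*(h)} - v_j^{(h)} - \phi_j^{-1}\sum_{j'=1}^{J}\phi_{j'}^{-1}r_{jj'}^{*(h)}\bigr)\bigr] \\ & & \bigl[\sum_{j=1}^{J}\bigl(\bigl\{\sum_{j'=1}^{J}\phi_j^{-1}\phi_{j'}^{-1}r_{jj'}^{*(hh')}\bigr\} + v_j^{(hh')} - 2r_j^{*(hh')}\bigr)\bigr] \end{bmatrix}$$

(only the upper triangle of this symmetric matrix is shown), where $\delta_{jj'}$ is a Kronecker delta and

(3.3.2)

$r_j^* = \phi_j^{-1}\mathrm{tr}(\Omega_j^{-1}T_{sjj}^*),$

$r_{jj'}^* = \mathrm{tr}(T_{sj'j}^*\Omega_j^{-1}T_{sjj'}^*\Omega_{j'}^{-1}),$

$r_j^{*(h)} = \mathrm{tr}[\phi_j^{-1}(\partial\Omega_j^{-1}/\partial\rho_h)T_{sjj}^*],$

$v_j^{(h)} = \mathrm{tr}[(\partial\Omega_j^{-1}/\partial\rho_h)\Omega_j],$

$r_{jj'}^{*(h)} = \mathrm{tr}[T_{sjj'}^*(\partial\Omega_{j'}^{-1}/\partial\rho_h)T_{sj'j}^*\Omega_j^{-1}],$

$v_j^{(hh')} = \mathrm{tr}[(\partial\Omega_j^{-1}/\partial\rho_h)\Omega_j(\partial\Omega_j^{-1}/\partial\rho_{h'})\Omega_j],$

$r_{jj'}^{*(hh')} = \mathrm{tr}[T_{sjj'}^*(\partial\Omega_{j'}^{-1}/\partial\rho_h)T_{sj'j}^*(\partial\Omega_j^{-1}/\partial\rho_{h'})],$

$r_j^{*(hh')} = \phi_j^{-1}\mathrm{tr}[(\partial\Omega_j^{-1}/\partial\rho_h)T_{sjj}^*(\partial\Omega_j^{-1}/\partial\rho_{h'})\Omega_j]$

and $\nu_j$ is the rank of the matrix $Z_{sj}$. Recall that $X_r^* = a_r X_r$ and $Z_r^* = a_r Z_r$, and put $X_s^* = Z_s'W_s^{-1}X_s$ and $y_s^* = Z_s'W_s^{-1}y_s$. The EBLUP of $\theta$ can then be written

(3.3.3) $\hat\theta = a_s y_s + X_r^*\hat\beta + Z_r^* T_s^*(y_s^* - X_s^*\hat\beta).$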

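Let $\Delta = [\Delta_1', \Delta_2', \ldots, \Delta_n']'$ be the coefficient matrix of $(y_s^* - X_s^*\hat\beta)$ in (3.3.3), i.e., $\Delta = Z_r^* T_s^*$ and, when $Z_\alpha^*$ is the αth row of the matrix $Z_r^*$, put $\partial\Delta_\alpha/\partial\gamma = \partial(Z_\alpha^* T_s^*)/\partial\gamma = \nabla_\alpha$. The mean cross-product error matrix of the EBLUP $\hat\theta$ then has $G_3(\omega) = \sigma^2[\mathrm{tr}(\nabla_\alpha\Sigma_s^*\nabla_{\alpha'}'B)]$ where

$\Sigma_s^* = Z_s'W_s^{-1}Z_s + Z_s'W_s^{-1}Z_s\Omega Z_s'W_s^{-1}Z_s.$

For model (1.5) we have

$\nabla_\alpha = -(Z_\alpha^* T_s^* \otimes I_{J+H})(\partial\Omega^{-1}/\partial\gamma)T_s^*.$

The estimated MCPE matrix of the EBLUP $\hat\theta$ is therefore $\widehat{\mathrm{MCPE}}(\hat\theta) = G_1(\hat\omega) + G_2(\hat\omega) + 2G_3(\hat\omega) + G_4(\hat\omega)$, where $\hat\omega$ is the ML estimate of the variance components vector $\omega$.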

The preceding development still holds when the variance components in (1.5) are estimated via REML. The only change is that in this case the Fisher information matrix of the REML estimators $\hat\sigma^2$, $\hat\phi$ and $\hat\rho$ is now of the form

$$\frac{1}{2}\begin{bmatrix} (n-p)\sigma^{-4} & \sigma^{-2}\bigl[\phi_j^{-1}(\nu_j - r_j)\bigr] & \sigma^{-2}\bigl[\sum_{j=1}^{J}(-v_j^{(h)} + r_j^{(h)})\bigr] \\ & \bigl[\phi_j^{-2}(\nu_j - 2r_j)\delta_{jj'} + \phi_j^{-2}\phi_{j'}^{-2}r_{jj'}\bigr] & \bigl[\phi_j^{-1}\bigl(2r_j^{(h)} - v_j^{(h)} - \phi_j^{-1}\sum_{j'=1}^{J}\phi_{j'}^{-1}r_{jj'}^{(h)}\bigr)\bigr] \\ & & \bigl[\sum_{j=1}^{J}\bigl(\bigl\{\sum_{j'=1}^{J}\phi_j^{-1}\phi_{j'}^{-1}r_{jj'}^{(hh')}\bigr\} + v_j^{(hh')} - 2r_j^{(hh')}\bigr)\bigr] \end{bmatrix}$$

where $\nu_j$, $v_j^{(h)}$ and $v_j^{(hh')}$ are defined in (3.3.2) and $r_j$, $r_j^{(h)}$, $r_j^{(hh')}$, $r_{jj'}^{(h)}$ and $r_{jj'}^{(hh')}$ are obtained simply by deleting the star (*) from all notation used in (3.3.2).

4 Application to Unit Level Linear Mixed Models

In this section we illustrate the application of the preceding theory to a number of commonly used unit level linear mixed models. These are appropriate when individual level data (both for Y and X) are available within the small areas.

4.1 A Model with IID Area Effects and IID Time Effects

A special case of (1.1) is a linear mixed model with independent and identically distributed (iid) area effects and corresponding iid time effects. Note that this corresponds to an assumption of a constant covariance between two observations from the same small area at two different points in time. In terms of the general model (1.1), we have two random components, an area effect and a time effect. The model in this special case is

(4.1.1) $y_{dti} = \beta_0 + X_{dti}'\beta_t + u_{1t} + u_{2d} + e_{dti}.$

The $u_{1t}$ and $u_{2d}$ are assumed to be mutually independent and to follow normal distributions with zero means and respective variances. The $e_{dti}$ is a random error, independent of the random components $u_{1t}$ and $u_{2d}$, and is assumed to follow a normal distribution with zero mean and variance $\sigma^2$. Let $u_1 = [u_{11}, u_{12}, \ldots, u_{1T}]'$, $u_2 = [u_{21}, u_{22}, \ldots, u_{2D}]'$ and $e = [e_{dti}]$, and let $Z_1$ and $Z_2$ be the incidence matrices for the random effect vectors $u_1$ and $u_2$ respectively. The model (4.1.1) can then be written in matrix form as

(4.1.2) $y = X\beta + Z_1u_1 + Z_2u_2 + e.$

Here $Z_1 = Z_1^* // Z_2^* // \ldots // Z_D^*$ where // denotes the “stacking” operator, with $Z_d^* = \mathrm{diag}(1_{N_{dt}}; t = 1, \ldots, T)$, where $1_{N_{dt}}$ is a vector of dimension $N_{dt}$ with all elements equal to one. Similarly, $Z_2 = \mathrm{diag}(1_{N_d}; d = 1, \ldots, D)$, where $N_d = \sum_{t=1}^{T}N_{dt}$. The random vectors $u_1$, $u_2$ and e are assumed to be distributed as multivariate normal with zero mean vectors and variance-covariance matrices given by $\sigma_1^2I_T$, $\sigma_2^2I_D$ and $\sigma^2I_N$ respectively, where N is the sum of the population sizes at t = 1, 2, …, T and the matrices $I_D$, $I_N$ and $I_T$ are identity matrices with dimensions equal to D, N and T respectively. In the context of the notation introduced in section 1, we therefore have J = 2, $\phi_1 = \sigma_1^2/\sigma^2$, $\phi_2 = \sigma_2^2/\sigma^2$, $\Omega_1 = I_T$, $\Omega_2 = I_D$, $W = I_N$ and therefore

$\Omega = \begin{bmatrix} \phi_1I_T & 0 \\ 0 & \phi_2I_D \end{bmatrix}$ and $\Sigma = I_N + \phi_1Z_1Z_1' + \phi_2Z_2Z_2'.$

Note that there is no parameter $\rho$. Setting $u = (u_1', u_2')'$ and $Z = (Z_1, Z_2)$, (4.1.2) further simplifies to

(4.1.3) $y = X\beta + Zu + e$

with corresponding model for the sample values $y_s$ given by

(4.1.4) $y_s = X_s\beta + Z_su + e_s.$

In this case $\Sigma_s = I_n + \phi_1Z_{s1}Z_{s1}' + \phi_2Z_{s2}Z_{s2}'$, where n denotes the overall number of sample values of Y (across all times and all areas) and $Z_{sj}$ denotes the sample version of $Z_j$.

The ML and REML estimation algorithms described in sections 3.1 and 3.2 and the MCPE estimator described in section 3.3 can then be used (with these versions of $\Omega$ and $\Sigma_s$) to calculate the EBLUPs of any set of linear combinations of the values in the population vector y. Note that in both the ML and REML estimation algorithms we ignore the step that updates $\rho_h$ since this parameter does not exist in the model (4.1.1). We also have $\nu_1 = T$ and $\nu_2 = D$.
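The incidence matrices and the resulting $\Sigma_s$ for model (4.1.1) can be illustrated with a toy sample. In the sketch below the helper name `incidence`, the label arrays and the values chosen for $\phi_1$ and $\phi_2$ are all hypothetical; the sample has two areas, two time points and two units per area-time cell.

```python
import numpy as np

def incidence(labels):
    """0/1 incidence matrix with one column per distinct label."""
    levels, codes = np.unique(labels, return_inverse=True)
    Z = np.zeros((len(labels), len(levels)))
    Z[np.arange(len(labels)), codes] = 1.0
    return Z

# Toy sample: unit-level area and time labels (illustrative values only).
area = np.array([0, 0, 0, 0, 1, 1, 1, 1])
time = np.array([0, 0, 1, 1, 0, 0, 1, 1])
Zs1 = incidence(time)   # time effects u_1
Zs2 = incidence(area)   # area effects u_2

phi1, phi2 = 0.3, 0.5
Sigma_s = np.eye(len(area)) + phi1 * Zs1 @ Zs1.T + phi2 * Zs2 @ Zs2.T
print(Sigma_s[0])
```

In the first row of `Sigma_s`, the entry for a unit in the same area at a different time equals `phi2`, which is the constant within-area covariance across time noted at the start of this section.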

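The observed information matrix for the ML estimates of $\omega = (\sigma^2, \phi_1, \phi_2)'$ is then

$$\hat I_{ML} = \frac{1}{2}\begin{bmatrix} n\hat\sigma^{-4} & \hat\sigma^{-2}\bigl[\hat\phi_j^{-1}(\nu_j - \hat r_j^*)\bigr] \\ & \bigl[\hat\phi_j^{-2}(\nu_j - 2\hat r_j^*)\delta_{jj'} + \hat\phi_j^{-2}\hat\phi_{j'}^{-2}\hat r_{jj'}^*\bigr] \end{bmatrix}$$

where a “hat” denotes an ML estimate, $\delta_{jj'}$ is the Kronecker delta, $\hat r_j^* = \hat\phi_j^{-1}\mathrm{tr}(\hat T_{sjj}^*)$ for j = 1, 2 and $\hat r_{jj'}^* = \mathrm{tr}(\hat T_{sj'j}^*\hat T_{sjj'}^*)$. Note that the estimate of the matrix B needed to evaluate the $G_3$ term of the MCPE is the 2 × 2 submatrix in the bottom right hand corner of the inverse of $\hat I_{ML}$ above. The estimated MCPE matrix for the ML-based $\hat\theta$ is therefore $\widehat{\mathrm{MCPE}}(\hat\theta) = G_1(\hat\omega) + G_2(\hat\omega) + 2G_3(\hat\omega) + G_4(\hat\omega)$, where

$G_1(\hat\omega) = \hat\sigma^2 Z_r^*\hat T_s^* Z_r^{*\prime}$

$G_2(\hat\omega) = \hat\sigma^2[X_r^* - Z_r^*\hat T_s^* Z_s'X_s](X_s'\hat\Sigma_s^{-1}X_s)^{-1}[X_r^{*\prime} - X_s'Z_s\hat T_s^* Z_r^{*\prime}]$

$G_4(\hat\omega) = \hat\sigma^2 a_r a_r'$

$G_3(\hat\omega) = \hat\sigma^2[\mathrm{tr}(\hat\nabla_\alpha\hat\Sigma_s^*\hat\nabla_{\alpha'}'\hat B)]$

$\hat\Sigma_s^* = Z_s'Z_s + Z_s'Z_s\hat\Omega Z_s'Z_s$

$\hat\nabla_\alpha = -(Z_\alpha^*\hat T_s^* \otimes I_2)(\partial\hat\Omega^{-1}/\partial\hat\gamma)\hat T_s^*$

$\partial\hat\Omega^{-1}/\partial\hat\gamma = -\mathrm{diag}\left\{\hat\phi_1^{-2}I_T \otimes \begin{bmatrix} 1 \\ 0 \end{bmatrix},\ \hat\phi_2^{-2}I_D \otimes \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right\}.$

Here $\hat\theta = (\hat\theta_\alpha)$, $\hat\omega = (\hat\sigma^2, \hat\phi_1, \hat\phi_2)'$ and $\hat\gamma = (\hat\phi_1, \hat\phi_2)'$. The only change when calculating the estimate of the MCPE matrix for the REML-based $\hat\theta$ is the estimate of B. This is given by the 2 × 2 submatrix in the bottom right hand corner of the inverse of the estimated information matrix for the REML estimates of the variance components, which itself is defined by

$$\hat I_{REML} = \frac{1}{2}\begin{bmatrix} (n-p)\hat\sigma^{-4} & \hat\sigma^{-2}\bigl[\hat\phi_j^{-1}(\nu_j - \hat r_j)\bigr] \\ & \bigl[\hat\phi_j^{-2}(\nu_j - 2\hat r_j)\delta_{jj'} + \hat\phi_j^{-2}\hat\phi_{j'}^{-2}\hat r_{jj'}\bigr] \end{bmatrix}$$

where $\hat r_j = \hat\phi_j^{-1}\mathrm{tr}(\hat T_{sjj})$ for j = 1, 2 and $\hat r_{jj'} = \mathrm{tr}(\hat T_{sj'j}\hat T_{sjj'})$.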

A common application is to predict the population totals of Y in each of the small areas at time T, in which case $\theta = ay$ where $a = \mathrm{diag}[a_d'; d = 1, 2, \ldots, D]$, with $a_d$ the $N_d$ vector with zeros everywhere except for the last $N_{dT}$ of its elements, which are one.
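The matrix a for area totals at the final time point can be built from the cell sizes $N_{dt}$. The sketch below assumes that the population vector y is ordered by area and, within area, by time, so that the last $N_{dT}$ units of each area are those observed at time T; the function name `area_total_matrix` is hypothetical.

```python
import numpy as np

def area_total_matrix(N_dt):
    """Build a = diag[a_d'] selecting, for each area d, its units at the last time point.

    N_dt is a D x T array of population cell sizes; units are assumed ordered
    by area and then by time within area.
    """
    N_dt = np.asarray(N_dt)
    D = N_dt.shape[0]
    N_d = N_dt.sum(axis=1)
    ends = np.cumsum(N_d)                     # index just past the last unit of each area
    a = np.zeros((D, int(N_d.sum())))
    for d in range(D):
        a[d, ends[d] - N_dt[d, -1]:ends[d]] = 1.0
    return a

a = area_total_matrix([[2, 3], [1, 2]])
y = np.arange(8.0)
print(a @ y)
```

With these cell sizes area 1 occupies positions 0 to 4 and area 2 positions 5 to 7, so the product returns the sums of `y[2:5]` and `y[6:8]`.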

4.2 A Model with IID Area Effects and Autocorrelated Time Effects

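In this model the time effects $u_{1t}$ in (4.1.1) are assumed to follow an AR(1) process, so that the variance-covariance matrix of $u_1$ is $\sigma_1^2\Omega_1$ with

$$\Omega_1 = \frac{1}{1-\rho^2}\begin{bmatrix} 1 & \rho & \cdots & \rho^{T-1} \\ \rho & 1 & \cdots & \vdots \\ \vdots & \vdots & \ddots & \rho \\ \rho^{T-1} & \cdots & \rho & 1 \end{bmatrix}$$

where $\rho$ is the correlation parameter. The inverse of this matrix is $\Omega_1^{-1} = I_T + \rho^2E + \rho F$, where E is a diagonal matrix of dimension T with diagonal elements (0, 1, 1, ..., 1, 0) and F is a T×T matrix with elements on the principal diagonal equal to zero, elements in the diagonals immediately above and below the principal diagonal equal to minus one and all other elements zero. In this case

$\Omega = \begin{bmatrix} \phi_1\Omega_1 & 0 \\ 0 & \phi_2I_D \end{bmatrix}$ so $\Sigma_s = I_n + \phi_1Z_{s1}\Omega_1Z_{s1}' + \phi_2Z_{s2}Z_{s2}'.$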
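The closed form for $\Omega_1^{-1}$ is easy to confirm numerically. The following sketch builds $\Omega_1$, E and F as described above and checks that their combination inverts $\Omega_1$; the function names and the values of $\rho$ and T are arbitrary illustrative choices, and T is assumed to be at least 2.

```python
import numpy as np

def ar1_cov(rho, T):
    """Omega_1: AR(1) matrix with (t, t') element rho^|t-t'| / (1 - rho^2)."""
    t = np.arange(T)
    return rho ** np.abs(t[:, None] - t[None, :]) / (1.0 - rho**2)

def ar1_precision(rho, T):
    """Omega_1^{-1} = I_T + rho^2 E + rho F, returned together with E and F."""
    E = np.diag([0.0] + [1.0] * (T - 2) + [0.0])
    F = -(np.eye(T, k=1) + np.eye(T, k=-1))
    return np.eye(T) + rho**2 * E + rho * F, E, F

rho, T = 0.6, 5
P, E, F = ar1_precision(rho, T)
print(np.allclose(P @ ar1_cov(rho, T), np.eye(T)))
```

Since $\Omega_1^{-1}$ is quadratic in $\rho$, its derivative with respect to $\rho$ is simply $2\rho E + F$, which is what makes this parameterization convenient in the iterative algorithms.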

ML and REML estimation of the parameters $\beta$ and $\omega = (\sigma^2, \phi_1, \phi_2, \rho)'$ of this model follows along the same lines as in section 4.1 above. The main difference is that in this case H = 1 (so we drop the h subscript) and we need to update the estimate of $\rho$ in the iterative estimation process. This is done by replacing step 8 in the ML algorithm (see section 3.1) by the identity

$$\rho = -\frac{\phi_1^{-1}[\mathrm{tr}(T_{s11}^*F) + \sigma^{-2}u_1'Fu_1]}{2/(1-\rho^2) + 2\phi_1^{-1}[\mathrm{tr}(T_{s11}^*E) + \sigma^{-2}u_1'Eu_1]}.$$

Step 9 in the corresponding REML algorithm (see section 3.2) also uses this identity, but with $T_{s11}^*$ replaced by $T_{s11}$.
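The update for $\rho$ is a one-line computation once the current values of the other quantities are available. The sketch below evaluates the right hand side of the identity at the current $\rho$; the function name `update_rho` and its argument names are hypothetical, and `T11` stands for $T_{s11}^*$ (ML) or $T_{s11}$ (REML).

```python
import numpy as np

def update_rho(rho, phi1, sigma2, u1, T11, E, F):
    """One update of rho for AR(1) time effects, using current parameter values."""
    a_F = np.trace(T11 @ F) + u1 @ F @ u1 / sigma2
    a_E = np.trace(T11 @ E) + u1 @ E @ u1 / sigma2
    return -(a_F / phi1) / (2.0 / (1.0 - rho**2) + 2.0 * a_E / phi1)

# Small worked example with T = 3 (illustrative numbers only).
E = np.diag([0.0, 1.0, 0.0])
F = -(np.eye(3, k=1) + np.eye(3, k=-1))
u1 = np.array([1.0, 0.5, 0.0])
rho_new = update_rho(0.0, 1.0, 1.0, u1, np.zeros((3, 3)), E, F)
print(rho_new)
```

In this example $u_1'Fu_1 = -1$ and $u_1'Eu_1 = 0.25$, so the update gives $1/(2 + 0.5) = 0.4$. Nothing in this sketch constrains the result to lie in (−1, 1); a practical implementation may need to guard against that.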

Estimation of the MCPE of the small area estimates requires the Fisher information matrix for the estimated variance components $\hat\omega = (\hat\sigma^2, \hat\phi_1, \hat\phi_2, \hat\rho)'$

$$\hat I_{ML} = \frac{1}{2}\begin{bmatrix} n\hat\sigma^{-4} & \hat\sigma^{-2}\hat\phi_1^{-1}(\nu_1 - \hat r_1^*) & \hat\sigma^{-2}\hat\phi_2^{-1}(\nu_2 - \hat r_2^*) & \hat\sigma^{-2}(\hat r_1^{*(1)} - v_1^{(1)}) \\ & \hat\phi_1^{-2}(\nu_1 - 2\hat r_1^*) + \hat\phi_1^{-4}\hat r_{11}^* & \hat\phi_1^{-2}\hat\phi_2^{-2}\hat r_{12}^* & \hat\phi_1^{-1}(2\hat r_1^{*(1)} - v_1^{(1)} - \hat\phi_1^{-1}\hat r_{11}^{*(1)}) \\ & & \hat\phi_2^{-2}(\nu_2 - 2\hat r_2^*) + \hat\phi_2^{-4}\hat r_{22}^* & -\hat\phi_1^{-1}\hat\phi_2^{-2}\hat r_{12}^{*(1)} \\ & & & \hat\phi_1^{-2}\hat r_{11}^{*(11)} + v_1^{(11)} - 2\hat r_1^{*(11)} \end{bmatrix}$$

where the individual terms are defined in (3.3.2). In order to calculate them we replace unknown parameter values by ML estimates and make use of the fact that $\partial\hat\Omega_1^{-1}/\partial\hat\rho = 2\hat\rho E + F$ in this case. The matrix B can be calculated as the 3 × 3 submatrix in the bottom right hand corner of the inverse of the corresponding estimate of this information matrix. Estimation of the MCPE matrix of the MLE-based EBLUP $\hat\theta$ under this model then follows along exactly the same lines as in section 4.1, the only difference being that we now calculate

$\hat\nabla_\alpha = -(Z_\alpha^*\hat T_s^* \otimes I_3)(\partial\hat\Omega^{-1}/\partial\hat\gamma)\hat T_s^*$

where $\hat\gamma = (\hat\phi_1, \hat\phi_2, \hat\rho)'$, and

$\partial\hat\Omega^{-1}/\partial\hat\gamma = -\mathrm{diag}\{[(\hat\phi_1^{-2}\hat\Omega_1^{-1} \otimes E_1) - \hat\phi_1^{-1}(\partial\hat\Omega_1^{-1}/\partial\hat\rho) \otimes E_3],\ \hat\phi_2^{-2}I_D \otimes E_2\}$

where

$E_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $E_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, $E_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$.

The only change when calculating the corresponding estimate of the MCPE matrix for the REML-based version of $\hat\theta$ is estimation of B. This is calculated as the 3 × 3 submatrix in the bottom right hand corner of the inverse of the estimated Fisher information matrix for the REML estimates of the variance components under this model, given by

$$\hat I_{REML} = \frac{1}{2}\begin{bmatrix} (n-p)\hat\sigma^{-4} & \hat\sigma^{-2}\hat\phi_1^{-1}(\nu_1 - \hat r_1) & \hat\sigma^{-2}\hat\phi_2^{-1}(\nu_2 - \hat r_2) & \hat\sigma^{-2}(\hat r_1^{(1)} - v_1^{(1)}) \\ & \hat\phi_1^{-2}(\nu_1 - 2\hat r_1) + \hat\phi_1^{-4}\hat r_{11} & \hat\phi_1^{-2}\hat\phi_2^{-2}\hat r_{12} & \hat\phi_1^{-1}(2\hat r_1^{(1)} - v_1^{(1)} - \hat\phi_1^{-1}\hat r_{11}^{(1)}) \\ & & \hat\phi_2^{-2}(\nu_2 - 2\hat r_2) + \hat\phi_2^{-4}\hat r_{22} & -\hat\phi_1^{-1}\hat\phi_2^{-2}\hat r_{12}^{(1)} \\ & & & \hat\phi_1^{-2}\hat r_{11}^{(11)} + v_1^{(11)} - 2\hat r_1^{(11)} \end{bmatrix}$$

where we refer again to (3.3.2) for definitions of the various components of this matrix, and note that this requires removal of all star (*) superscripts in (3.3.2).

4.3 A Model with a Time Varying Area Effect

The AR(1) model introduced in section 4.2 requires a reasonable number of observations from different times to obtain an efficient estimate of the correlation parameter $\rho$. If there are only data available for a few time periods then the ML and REML estimates of this correlation parameter will be negatively biased. Furthermore, the model assumes that the area effects do not themselves evolve over time, with the only change over time arising because of the evolving (global) time effect. An alternative AR(1) model that uses more information, reduces biases and allows area random effects to vary over time is therefore of interest. Such a model is specified by

(4.3.1) $y_{dti} = \beta_0 + X_{dti}'\beta_t + u_{dt} + e_{dti}$

where the $\{u_{dt}\}$ are random effects that follow independent AR(1) processes for d = 1, 2, …, D. As usual, the area level random effects $u_{dt}$ and the individual level random effects $e_{dti}$ are assumed to be mutually independent. Put $u = [u_{dt}]$ and $e = [e_{dti}]$, and let Z be the incidence matrix for the random effect vector u. The model (4.3.1) can then be written in matrix form as

(4.3.2) $y = X\beta + Zu + e.$

The random vectors u and e are distributed as independent multivariate normal with zero mean vectors and covariance matrices given by $\sigma_u^2\Omega_1$ and $\sigma^2I_N$ respectively. Assuming that the random effects for the same area and different points in time can be modelled as a realisation of an AR(1) process, and the same process applies in all areas, the matrix $\Omega_1$ is then

$$\Omega_1 = \frac{1}{1-\rho^2}\, I_D \otimes \begin{bmatrix} 1 & \rho & \cdots & \rho^{T-1} \\ \rho & 1 & \cdots & \vdots \\ \vdots & \vdots & \ddots & \rho \\ \rho^{T-1} & \cdots & \rho & 1 \end{bmatrix}$$

where $\rho$ is the (common) autocorrelation parameter. The matrix $\Omega_1$ is a block-diagonal matrix with $\Omega_1^{-1} = \mathrm{diag}(I_T + \rho^2E + \rho F)$, where E and F were defined in section 4.2. The variance-covariance matrix of y is then $\sigma^2I_N + \sigma_u^2Z\Omega_1Z' = \sigma^2\Sigma$ where $\Sigma = I_N + Z\Omega Z'$, $\Omega = \phi\Omega_1$ and $\phi = \sigma_u^2/\sigma^2$. The corresponding model for the sample values is therefore

(4.3.3) $y_s = X_s\beta + Z_su + e_s$

with the covariance of $y_s$ given by $\sigma^2\Sigma_s$ where $\Sigma_s = I_n + Z_s\Omega Z_s'$. Here n is the total sample size.

Again, we observe that ML and REML estimation of the parameters $\beta$ and $\omega = (\sigma^2, \phi, \rho)'$ of this model follow the same lines as in section 4.1. In this case we update the estimate of $\rho$ in the iterative estimation process by replacing step 8 in the ML algorithm (see section 3.1) by the identity

$$\rho = -\frac{\phi^{-1}[\mathrm{tr}(T_s^*F^*) + \sigma^{-2}u'F^*u]}{2D/(1-\rho^2) + 2\phi^{-1}[\mathrm{tr}(T_s^*E^*) + \sigma^{-2}u'E^*u]}.$$

Here $E^*$ and $F^*$ are block-diagonal matrices made up of D blocks equal to E and F respectively. Step 9 in the corresponding REML algorithm (see section 3.2) also uses this identity, but with $T_s^*$ replaced by $T_s$, where $T_s$ is the matrix calculated in step 7 of that algorithm.

Turning now to estimation of the MCPE of the small area estimates, we note that the estimated Fisher information matrix for the MLEs of the variance components $\hat\omega = (\hat\sigma^2, \hat\phi, \hat\rho)'$ in this case is

$$\hat I_{ML} = \frac{1}{2}\begin{bmatrix} n\hat\sigma^{-4} & \hat\sigma^{-2}\hat\phi^{-1}(\nu - \hat r_1^*) & \hat\sigma^{-2}(\hat r_1^{*(1)} - v_1^{(1)}) \\ & \hat\phi^{-2}(\nu - 2\hat r_1^*) + \hat\phi^{-4}\hat r_{11}^* & \hat\phi^{-1}(2\hat r_1^{*(1)} - v_1^{(1)} - \hat\phi^{-1}\hat r_{11}^{*(1)}) \\ & & \hat\phi^{-2}\hat r_{11}^{*(11)} + v_1^{(11)} - 2\hat r_1^{*(11)} \end{bmatrix}$$

where (3.3.2) provides definitions of the individual terms. This matrix can be evaluated by replacing unknown parameter values by ML estimates and making use of the fact that $\partial\hat\Omega_1^{-1}/\partial\hat\rho = 2\hat\rho E^* + F^*$ in this case. As usual, we estimate B by the 2 × 2 submatrix in the bottom right hand corner of the inverse of this estimated information matrix, with estimation of the MCPE matrix of the MLE-based $\hat\theta$ under this model then as in section 4.1, except that

$\hat\nabla_\alpha = -(Z_\alpha^*\hat T_s^* \otimes I_2)(\partial\hat\Omega^{-1}/\partial\hat\gamma)\hat T_s^*$

where $\hat\gamma = (\hat\phi, \hat\rho)'$, and

$\partial\hat\Omega^{-1}/\partial\hat\gamma = -\mathrm{diag}\left\{\hat\phi^{-2}\hat\Omega_1^{-1} \otimes \begin{bmatrix} 1 \\ 0 \end{bmatrix} - \hat\phi^{-1}(\partial\hat\Omega_1^{-1}/\partial\hat\rho) \otimes \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right\}.$

The corresponding estimate of the MCPE of the REML-based EBLUP $\hat\theta$ differs only in computation of the estimate of the matrix B. Here this is obtained as the 2 × 2 submatrix in the bottom right hand corner of the inverse of the estimated Fisher information matrix for the REML estimates of the variance components, which is given by

$$\hat I_{REML} = \frac{1}{2}\begin{bmatrix} (n-p)\hat\sigma^{-4} & \hat\sigma^{-2}\hat\phi^{-1}(\nu - \hat r_1) & \hat\sigma^{-2}(\hat r_1^{(1)} - v_1^{(1)}) \\ & \hat\phi^{-2}(\nu - 2\hat r_1) + \hat\phi^{-4}\hat r_{11} & \hat\phi^{-1}(2\hat r_1^{(1)} - v_1^{(1)} - \hat\phi^{-1}\hat r_{11}^{(1)}) \\ & & \hat\phi^{-2}\hat r_{11}^{(11)} + v_1^{(11)} - 2\hat r_1^{(11)} \end{bmatrix}.$$

As usual, (3.3.2) provides definitions of individual terms, after star (*) superscripts used there are discarded.

4.4 A Model with Spatial Correlated Area Effects

In this case we drop the ‘t’ subscript and assume that all our data relate to a single time point. Generalization to models with both spatially and temporally correlated random effects is straightforward.

Let $y_{di}$ denote the ith population value for a characteristic of interest within an area d ($i = 1, 2, \ldots, N_d$; $d = 1, 2, \ldots, D$). The vector $x_{di}$ represents the corresponding values of auxiliary population information (covariates). The objective is to estimate/predict the value of the small area characteristic $\theta$, which is a linear function of the population values $y_{di}$. Let $u_d$ be the dth area effect and $y_{di}$ the population response variable. The model of interest is then

(4.4.1) $y_{di} = \beta_0 + x_{di}'\beta + u_d + e_{di}$

where $\beta_0$ is an intercept, $\beta$ is a vector of regression coefficients, the $e_{di}$ are independent random errors with $E(e_{di}) = 0$ and $\mathrm{Var}(e_{di}) = \sigma^2$ and the $u_d$ are normally distributed variables with zero mean and covariances given by

(4.4.2) $\mathrm{Cov}(u_d, u_{d'}) = \sigma_u^2 f(\mathrm{Dist}(d, d'); \rho)$

where $\rho$ is an unknown parameter and $\mathrm{Dist}(d, d')$ is an appropriate measure of the “distance” between areas d and d′.

Let y = {ydi} be the vector of population values of the response variable, with ys

denoting the values observed in the sample, X ={xdi} be the matrix of regression variables

(covariates), with Xs denoting the corresponding sample values, and u and e the random area

effect and error vectors respectively. Let Z be the incidence matrix for the random effect vector u. The population model (4.4.1) can then be written

(4.4.3) y = Xβ +Zu + e.

The error vector $e$ and area effect vector $u$ have independent multivariate normal distributions with zero mean vectors and covariance matrices given by $\sigma^2 I_N$ and $\sigma_u^2\Omega_1$ respectively, with the matrix $\Omega_1$ reflecting the spatial autocorrelation of the area effects. For example, this is achieved via a model of the form

$$\Omega_1 = \left[\left\{1 + \delta_{dd'}\exp\left(\frac{\mathrm{Dist}(d,d')}{\rho}\right)\right\}^{-1}\right]$$

where $\rho$ is an unknown parameter and $\delta_{dd'}$ is zero for $d = d'$ and 1 otherwise. The model for the sample vector $y_s$ is

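$$y_s = X_s\beta + Z_s u + e_s$$

where the incidence matrix is

$$Z_s = \begin{bmatrix} 1_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1_{n_D} \end{bmatrix}$$

with $1_{n_d}$ a vector of dimension $n_d$ with all elements equal to one. It follows that the covariance matrix of $y_s$ is then $\sigma^2\Sigma_s$, where $\Sigma_s = I_n + Z_s\Omega Z_s'$, $\Omega = \phi\Omega_1$ and $\phi = \sigma_u^2/\sigma^2$.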
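The construction of $\Sigma_s$ can be illustrated numerically. The following is a minimal sketch in plain Python, assuming the example exponential form for $\Omega_1$ given above. The function names, the distance matrix, the area sample sizes and the values of $\rho$ and $\phi$ are hypothetical and chosen only for illustration.

```python
import math

def spatial_corr(dist, rho):
    """Omega_1 with entries 1 / (1 + delta * exp(Dist(d, d') / rho)),
    where delta is 0 on the diagonal and 1 elsewhere."""
    D = len(dist)
    return [[1.0 if d == k else 1.0 / (1.0 + math.exp(dist[d][k] / rho))
             for k in range(D)] for d in range(D)]

def sample_sigma(dist, rho, phi, n_d):
    """Sigma_s = I_n + Z_s (phi * Omega_1) Z_s',
    for Z_s = blockdiag(1_{n_1}, ..., 1_{n_D})."""
    omega1 = spatial_corr(dist, rho)
    # area[i] is the area of sample unit i. This encodes Z_s without storing it,
    # because (Z_s Omega Z_s')[i][j] = Omega[area[i]][area[j]].
    area = [d for d, n in enumerate(n_d) for _ in range(n)]
    n = len(area)
    return [[(1.0 if i == j else 0.0) + phi * omega1[area[i]][area[j]]
             for j in range(n)] for i in range(n)]

# Hypothetical inputs: three areas with invented distances and sample sizes.
dist = [[0.0, 1.0, 2.0],
        [1.0, 0.0, 1.5],
        [2.0, 1.5, 0.0]]
sigma_s = sample_sigma(dist, rho=1.0, phi=0.5, n_d=[2, 1, 3])
```

Units in the same area share the full variance ratio $\phi$ in their off-diagonal entries, while units in different areas are linked only through the corresponding off-diagonal element of $\Omega_1$.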

Given this set-up we are back with the model (1.1) and so can apply the ML and REML estimation theory set out in section 3 to prediction of θ. In this context we observe that both the ML estimation algorithm described in section 3.1 and the REML estimation algorithm described in section 3.2 still apply, with the updating step 8 in the ML algorithm (step 9 in the REML algorithm) replaced by:

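Put $\rho_{new} = \rho_{old} + \theta(\partial l/\partial\rho_{old})$, where

$$\partial l/\partial\rho = -\frac{1}{2}\left[\mathrm{tr}\left(\Omega_1^{-1}(\partial\Omega_1/\partial\rho)\right) - \phi^{-1}\mathrm{tr}\left(\Omega_1^{-1}T_s^{*}\Omega_1^{-1}(\partial\Omega_1/\partial\rho)\right) - \sigma^{-2}\phi^{-1}\hat{u}'\Omega_1^{-1}(\partial\Omega_1/\partial\rho)\Omega_1^{-1}\hat{u}\right]$$

and $\theta$ is the (3,3) element of the inverse of the information matrix of the ML/REML estimate of the variance components $\hat{\omega} = (\hat{\sigma}^2, \hat{\phi}, \hat{\rho})'$.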

Note that for the spatial correlation model defined above, we have

$$\partial\Omega_1/\partial\rho = \left[\frac{\mathrm{Dist}(d,d')}{\rho^2}\,\delta_{dd'}\exp\left(\frac{\mathrm{Dist}(d,d')}{\rho}\right)\left\{1 + \delta_{dd'}\exp\left(\frac{\mathrm{Dist}(d,d')}{\rho}\right)\right\}^{-2}\right].$$
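As a quick sanity check, the analytic derivative of an element of $\Omega_1$ can be compared with a central finite difference. This is only an illustrative sketch in plain Python: the function names and the values of the distance, $\rho$ and the step size `h` are hypothetical.

```python
import math

def omega1_entry(dist, rho, same_area):
    """One element of Omega_1: 1 / (1 + delta * exp(dist / rho))."""
    delta = 0.0 if same_area else 1.0
    return 1.0 / (1.0 + delta * math.exp(dist / rho))

def d_omega1_entry(dist, rho, same_area):
    """Analytic derivative of that element with respect to rho."""
    delta = 0.0 if same_area else 1.0
    e = math.exp(dist / rho)
    return (dist / rho ** 2) * delta * e / (1.0 + delta * e) ** 2

# Hypothetical distance, parameter value and finite-difference step.
dist, rho, h = 1.7, 0.9, 1e-6
numeric = (omega1_entry(dist, rho + h, False)
           - omega1_entry(dist, rho - h, False)) / (2.0 * h)
analytic = d_omega1_entry(dist, rho, False)
```

The diagonal elements of $\Omega_1$ are identically one, so their derivative with respect to $\rho$ is zero, which the factor $\delta_{dd'}$ in the formula enforces.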

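For estimation of the MCPE matrix in this case, we observe that the estimated Fisher information matrix of the ML estimates of the variance components $\hat{\omega} = (\hat{\sigma}^2, \hat{\phi}, \hat{\rho})'$ is

$$\hat{I}_{ML} = \frac{1}{2}\begin{bmatrix}
n\hat{\sigma}^{-4} & \hat{\sigma}^{-2}\hat{\phi}^{-1}(\nu - \hat{r}_{1}^{*}) & \hat{\sigma}^{-2}(\hat{r}_{1}^{*(1)} - v_{1}^{(1)}) \\
 & \hat{\phi}^{-2}(\nu - 2\hat{r}_{1}^{*}) + \hat{\phi}^{-4}\hat{r}_{11}^{*} & \hat{\phi}^{-1}(2\hat{r}_{1}^{*(1)} - v_{1}^{(1)} - \hat{\phi}^{-1}\hat{r}_{11}^{*(1)}) \\
 & & (\hat{\phi}^{-2}\hat{r}_{11}^{*(11)} + v_{1}^{(11)} - 2\hat{r}_{1}^{*(11)})
\end{bmatrix}$$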

while that of the REML estimates is

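$$\hat{I}_{REML} = \frac{1}{2}\begin{bmatrix}
(n-p)\hat{\sigma}^{-4} & \hat{\sigma}^{-2}\hat{\phi}^{-1}(\nu - \hat{r}_{1}) & \hat{\sigma}^{-2}(\hat{r}_{1}^{(1)} - v_{1}^{(1)}) \\
 & \hat{\phi}^{-2}(\nu - 2\hat{r}_{1}) + \hat{\phi}^{-4}\hat{r}_{11} & \hat{\phi}^{-1}(2\hat{r}_{1}^{(1)} - v_{1}^{(1)} - \hat{\phi}^{-1}\hat{r}_{11}^{(1)}) \\
 & & (\hat{\phi}^{-2}\hat{r}_{11}^{(11)} + v_{1}^{(11)} - 2\hat{r}_{1}^{(11)})
\end{bmatrix}.$$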

In both cases see (3.3.2) for the definitions of the various components, remembering that in the REML case we need to discard the star (*) superscripts. Note also that evaluation of these components depends on $\partial\hat{\Omega}_1^{-1}/\partial\hat{\rho}$ and is therefore dependent on the actual spatial correlation model. Given an estimate of the matrix B defined by the 2 × 2 submatrix in the bottom right hand corner of the inverse of the estimated information matrix for either the ML or REML estimates, we can then estimate the MCPE matrix of the EBLUP $\hat{\theta}$ following the same steps as outlined in section 4.3 above.

5 Application to Area Level Linear Mixed Models

There are many applications of small area estimation where the survey information is available at area level rather than at individual level. For example, survey-based estimates of small area averages (the so-called direct estimates) are available, but individual level data are not. The theory developed in sections 1 – 3 can be adapted to this situation provided it is reasonable to assume these direct estimates can also be modelled linearly. We now illustrate this via application of area level versions of the models considered in section 4.

5.1 A Model with IID Area and IID Time Effects

Let the vector $y = \{y_{dt};\ t = 1,2,\dots,T;\ d = 1,2,\dots,D\}$ consist of the direct survey estimates of the survey variable $Y$, where the subscripts $d$ and $t$ represent area and time respectively. Let $X$ be the matrix of the auxiliary covariates $X_{dt}$, all measured at area level. Assuming that $u_{1t}$ and $u_{2d}$ represent time and area effects respectively, let $\eta_{dt}$ denote the mean of $Y$ in small area $d$ at time $t$ (i.e. after conditioning on these effects). The objective then is to predict the value of $\theta = a\eta$, where $\eta = (\eta_{dt})$, given the following model

(5.1.1) $\eta_{dt} = \beta_0 + X_{dt}'\beta_t + u_{1t} + u_{2d}$.

The vector $\beta_t$ contains regression coefficients at time $t = 1,2,\dots,T$, $\beta_0$ is the intercept and the random effects $u_{1t}$ and $u_{2d}$ are assumed to be mutually independent and normally distributed with zero means and variances as defined below. To illustrate, let $T = 2$ and $D = 3$. The matrix $a$ that defines the small area means at the last time period ($t = 2$) is then

$$a = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$$
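The matrix $a$ in this illustration can be read as a Kronecker product $I_D \otimes (0,\dots,0,1)$, which assumes that the elements of $\eta$ are ordered with time varying fastest within area. The short plain-Python sketch below builds it for general $D$ and $T$; the helper names and the example values of $\eta$ are hypothetical.

```python
def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def last_period_selector(D, T):
    """a = I_D (x) e_T', where e_T' = (0, ..., 0, 1) picks the final time point."""
    e_last = [[0] * (T - 1) + [1]]  # 1 x T row vector
    return kron(identity(D), e_last)

a = last_period_selector(D=3, T=2)

# Hypothetical eta, labelled 10*d + t, ordered with time varying fastest within area.
eta = [11, 12, 21, 22, 31, 32]
theta = [sum(a_ij * eta_j for a_ij, eta_j in zip(row, eta)) for row in a]
```

With this ordering, `theta` picks out the $t = 2$ element for each of the three areas.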

The linear predictor $\eta_{dt}$ is related to the direct estimator $y_{dt}$ via the following model

(5.1.2) $y_{dt} = \eta_{dt} + e_{dt}$

where $e_{dt}$ represents sampling error. This is often referred to as a Fay-Herriot-type model. Let $u_1 = [u_{11}, u_{12}, \dots, u_{1T}]'$, $u_2 = [u_{21}, u_{22}, \dots, u_{2D}]'$ and $e = [e_{dt}]$, and let $Z_1$ and $Z_2$ be the incidence matrices for the vectors $u_1$ and $u_2$ respectively. Put $Z_1' = 1_D' \otimes I_T = (Z_1^{*\prime}, Z_2^{*\prime}, \dots, Z_D^{*\prime})$ (with $Z_d^{*} = I_T$ for $d = 1,\dots,D$) and $Z_2 = I_D \otimes 1_T = \mathrm{diag}(1_T;\ d = 1,\dots,D)$, where $1_T$ is a vector of dimension $T$ with all elements equal to one and $I_D$ and $I_T$ are identity matrices of order $D$ and $T$ respectively. The model (5.1.2) can then be written in matrix form as

(5.1.3) $y = X\beta + Z_1u_1 + Z_2u_2 + e$.

The random vectors $u_1$, $u_2$ and $e$ are assumed to be mutually independent with covariance matrices given by $\sigma_1^2 I_T$, $\sigma_2^2 I_D$ and $\sigma^2 W$ respectively. The matrix $W$ is a known positive definite square matrix of order $n = T \times D$. Typically $W$ is a diagonal matrix with elements that are functions of the sample sizes $n_{dt}$, where $n_{dt}$ denotes the sample size in small area $d$ at time $t$. The covariance matrix of $y$ is then $\mathrm{Var}(y) = \sigma^2 W + \sigma_1^2 Z_1Z_1' + \sigma_2^2 Z_2Z_2'$. Setting $u = (u_1', u_2')'$ and $Z = (Z_1, Z_2)$, the model (5.1.3) further simplifies to

(5.1.4) $y = X\beta + Zu + e$

in which case $\mathrm{Var}(y) = \sigma^2\Sigma$ where $\Sigma = W + \phi_1 Z_1Z_1' + \phi_2 Z_2Z_2'$ and $\phi_j = \sigma_j^2/\sigma^2$.
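To make the structure of $\Sigma$ concrete, the sketch below assembles it in plain Python for the illustrative case $T = 2$, $D = 3$. It is a minimal example under stated assumptions: the helper names are hypothetical, the values of $\phi_1$, $\phi_2$ and the sample sizes are invented, and taking $W$ to be diagonal with elements $1/n_{dt}$ is just one possible choice of a function of the sample sizes.

```python
def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def ones_col(n):
    return [[1.0] for _ in range(n)]

def gram(Z):
    """Z Z' for a matrix stored as a list of rows."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in Z] for r1 in Z]

def area_level_sigma(w_diag, phi1, phi2, D, T):
    """Sigma = W + phi1 * Z1 Z1' + phi2 * Z2 Z2', with W = diag(w_diag)."""
    Z1 = kron(ones_col(D), identity(T))  # (DT x T): time effect incidence
    Z2 = kron(identity(D), ones_col(T))  # (DT x D): area effect incidence
    G1, G2 = gram(Z1), gram(Z2)
    n = D * T
    return [[(w_diag[i] if i == j else 0.0) + phi1 * G1[i][j] + phi2 * G2[i][j]
             for j in range(n)] for i in range(n)]

# Hypothetical sample sizes n_dt, ordered with time varying fastest within area.
n_dt = [10, 20, 10, 20, 40, 40]
w_diag = [1.0 / n for n in n_dt]  # assumed form of W, for illustration only
Sigma = area_level_sigma(w_diag, phi1=0.3, phi2=0.6, D=3, T=2)
```

Two direct estimates for the same area share $\phi_2$, two estimates for the same time point share $\phi_1$, and estimates that differ in both area and time are uncorrelated under this model.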

ML and REML estimation of the parameters of (5.1.4), as well as calculation of the corresponding EBLUP and its estimated MCPE matrix, then follows along exactly the same lines as in section 4.1. The only difference is that we replace ys and Σs there by y and Σ here.

Note that this model can only be fitted if D > 1 and T > 1. If either of these conditions is not met, then σ2 is not identifiable. In such a case we need to estimate this parameter using other methods and then substitute this estimate in the preceding development.

5.2 A Model with IID Area and Autocorrelated Time Effects

Here we extend the model in the previous section to allow the time effect to be the outcome of a stochastic process. In particular, we assume that this process is AR(1). The model for the observed direct estimates is therefore

(5.2.1) $y_{dt} = \eta_{dt} + e_{dt}$

where $\eta_{dt} = \beta_0 + X_{dt}'\beta_t + u_{1t} + u_{2d}$ and $e_{dt}$ is sampling error. The vector $\beta_t$ contains the

regression coefficients for time t = 1, 2, .., T, β0 is the intercept and the random effects u1t and