Working Paper 11-43 (32)
Statistics and Econometrics Series
December 2011

Departamento de Estadística
Universidad Carlos III de Madrid
Calle Madrid, 126
28903 Getafe (Spain)
Fax (34) 91 624-98-49

Abstract

Common methods for estimating variance components in Linear Mixed Models include Maximum Likelihood (ML) and Restricted Maximum Likelihood (REML). These methods are based on the strong assumption of multivariate normal distribution and it is well known that they are very sensitive to outlying observations with respect to any of the random components. Several robust alternatives of these methods have been proposed (e.g. Fellner 1986, Richardson and Welsh 1995). In this work we present several robust alternatives based on the Henderson method III which do not rely on the normality assumption and provide explicit solutions for the variance components estimators. These estimators can later be used to derive robust estimators of regression coefficients. Finally, we describe an application of this procedure to small area estimation, in which the main target is the estimation of the means of areas or domains when the within-area sample sizes are small.

Keywords: Henderson method III, Linear mixed model, robust estimators, variance components estimators.
Robust Henderson III Estimators of Variance
Components in the Nested Error Model
Betsabé Pérez, Daniel Peña and Isabel Molina
December 23, 2011
1 Introduction
In the last decades linear mixed models have received considerable attention in the literature from both a practical and a theoretical point of view (e.g. McCulloch and Searle [14], Verbeke and Molenberghs [23], Demidenko [3]). These models are frequently used in fields such as small area estimation or longitudinal studies because they adequately model the within-subject correlation typically present in this type of data. Other fields of application include clinical trials (Vangeneugden et al. [22]) and environmental studies (Wellenius et al. [24]). Despite the many applications of these models, diagnostic methods for them are still not well developed. Christensen et al. [2] studied case deletion diagnostics. Banerjee and Frees [1] studied case deletion and subject deletion diagnostics. Galpin and Zewotir [5], [6] extended some diagnostic tools of ordinary linear regression, such as residuals, leverages and outliers, to linear mixed models (LMMs) when the variances of the random factors are known. In practice, however, these variances are unknown and need to be estimated from sample data. If the sample data are contaminated, the estimation of the variance components might be seriously affected, and this will in turn affect all diagnostic tools.

Given the importance of adequately estimating variance components, we introduce new robust estimators of variance components based on Henderson method III. This method has been chosen for three reasons: first, it provides explicit formulas for the estimators, avoiding iterative procedures and the need for starting values and reducing the computational time; second, it does not need any assumption on the shape of the probability distribution apart from the existence of first and second order moments; third, the estimation procedure consists simply of solving two standard regression problems. These estimators can later be used to derive robust estimators of regression coefficients. Finally, we describe an application of this procedure to small area estimation, in which the main target is the estimation of the means of areas or domains when the within-area sample sizes are small.
2 Linear model with random effects
Let us consider sample data that come from $D$ different population groups. Suppose that there are $n_d$ observations from group $d$, $d = 1, \ldots, D$, where $n = \sum_{d=1}^{D} n_d$ is the total sample size. Denote by $y_{dj}$ the value of the study variable for the $j$-th sample unit from the $d$-th group and by $x_{dj}$ a (column) vector containing the values of $p$ auxiliary variables for the same unit. The model at the individual level is given by
$$y_{dj} = x_{dj}^T\beta + u_d + e_{dj}, \quad j = 1, \ldots, n_d, \quad d = 1, \ldots, D, \tag{1}$$
where $\beta$ is the $p \times 1$ vector of fixed parameters, $u_d$ is the random effect of the $d$-th group and $e_{dj}$ is the model error. Random group effects and errors are assumed to be independent, with distributions $u_d \overset{iid}{\sim} N(0, \sigma_u^2)$ and $e_{dj} \overset{iid}{\sim} N(0, \sigma_e^2)$.
Observe that under this model, in contrast with model (3.1), the means of the observations are not affected by the group effect $u_d$, since $E(y_{dj}) = x_{dj}^T\beta$. However, the random group effects induce a (constant) correlation between all pairs of observations in the same group, because $cov(y_{dj}, y_{dk}) = \sigma_u^2$ for $k \neq j$. Still, observations in different groups are uncorrelated. Stacking the elements of the model in columns, we obtain $y = (y_{11}, y_{12}, \ldots, y_{Dn_D})^T$ of size $n$, $u = (u_1, u_2, \ldots, u_D)^T$ of size $D$ and $e = (e_{11}, e_{12}, \ldots, e_{Dn_D})^T$ of size $n$. In turn, concatenation of the predictor vectors gives the $n \times p$ matrix $X = (x_{11}, x_{12}, \ldots, x_{Dn_D})^T$. Additionally, we define the $n \times D$ block diagonal matrix
$$Z = \begin{pmatrix} 1_{n_1} & 0 & \cdots & 0 \\ 0 & 1_{n_2} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1_{n_D} \end{pmatrix},$$
where $1_{n_d}$ denotes a vector of ones of size $n_d$. Then, in matrix notation, the model can be written as
$$y = X\beta + Zu + e. \tag{2}$$
The expectation and covariance matrix of $y$ are given by
$$E(y) = X\beta \quad \text{and} \quad var(y) = \sigma_u^2 ZZ^T + \sigma_e^2 I_n = V,$$
which means that
$$y \sim N(X\beta, \sigma_u^2 ZZ^T + \sigma_e^2 I_n).$$
Let us define the vector of variance components $\theta = (\sigma_u^2, \sigma_e^2)^T$. When $\theta$ is known, Henderson [32] obtained the Best Linear Unbiased Estimator (BLUE) of $\beta$ and the Best Linear Unbiased Predictor (BLUP) of $u$, which are defined respectively as
$$\tilde\beta = (X^TV^{-1}X)^{-1}X^TV^{-1}y, \tag{3}$$
$$\tilde u = \sigma_u^2 Z^TV^{-1}(y - X\tilde\beta). \tag{4}$$
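As an illustration of (3) and (4), the following is a minimal NumPy sketch, not the authors' code. The function name `blue_blup`, the simulated data and the chosen parameter values are assumptions made only for this example; the variance components are treated as known.

```python
import numpy as np

def blue_blup(y, X, group, sigma2_u, sigma2_e):
    """BLUE of beta, eq. (3), and BLUP of u, eq. (4), for known variance components."""
    groups = np.unique(group)
    # n x D indicator matrix Z: column d flags the units that belong to group d
    Z = (group[:, None] == groups[None, :]).astype(float)
    n = len(y)
    V = sigma2_u * Z @ Z.T + sigma2_e * np.eye(n)
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
    u = sigma2_u * Z.T @ np.linalg.solve(V, y - X @ beta)
    return beta, u

# Hypothetical balanced example: D = 5 groups with n_d = 8 units each
rng = np.random.default_rng(0)
group = np.repeat(np.arange(5), 8)
X = np.column_stack([np.ones(40), rng.normal(size=40)])
u_true = rng.normal(scale=0.5, size=5)
y = X @ np.array([1.0, 2.0]) + u_true[group] + rng.normal(scale=0.5, size=40)
beta_tilde, u_tilde = blue_blup(y, X, group, sigma2_u=0.25, sigma2_e=0.25)
```

In this balanced case each predicted effect is the group mean of the residuals $y - X\tilde\beta$ shrunk by the factor $\sigma_u^2/(\sigma_u^2 + \sigma_e^2/n_d)$, which gives a quick sanity check of the output.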
3 Estimation of variance components
The estimator (3) and the predictor (4) depend on $\theta$, which in practice is unknown and needs to be estimated from sample data. The empirical versions of (3) and (4), called EBLUE and EBLUP respectively, are obtained by replacing $\theta$ in (3) and (4) by a suitable estimator $\hat\theta$, and are given by
$$\hat\beta = (X^T\hat V^{-1}X)^{-1}X^T\hat V^{-1}y, \tag{5}$$
$$\hat u = \hat\sigma_u^2 Z^T\hat V^{-1}(y - X\hat\beta), \tag{6}$$
where the hat over $V$ indicates that $\theta$ has been replaced by its estimator $\hat\theta$.
Traditional methods for estimating variance components include those based on the likelihood, namely maximum likelihood (ML) and restricted/residual ML (REML), and a moments method called Henderson method III, see e.g. Searle et al. [21]. However, when outliers are present, these methods may deliver estimators with poor properties. Below we briefly review each of these methods.
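3.1 Maximum likelihood
Maximum likelihood estimation is usually carried out under the assumption that $y$ has a multivariate normal distribution. Under this assumption, the joint likelihood is given by
$$f(\beta, \theta \mid y) = (2\pi)^{-n/2}|V|^{-1/2}\exp\left\{-\frac{1}{2}(y - X\beta)^TV^{-1}(y - X\beta)\right\}.$$
The joint log-likelihood is
$$\ell(\beta, \theta \mid y) = \ln f(\beta, \theta \mid y) = c - \frac{1}{2}\left[\ln|V| + (y - X\beta)^TV^{-1}(y - X\beta)\right],$$
where $c$ denotes a constant. Using the relations
$$\frac{\partial \ln|V|}{\partial\theta} = tr\left\{V^{-1}\frac{\partial V}{\partial\theta}\right\} \quad \text{and} \quad \frac{\partial V^{-1}}{\partial\theta} = -V^{-1}\frac{\partial V}{\partial\theta}V^{-1},$$
the first order partial derivatives of $\ell$ with respect to $\beta$, $\sigma_u^2$ and $\sigma_e^2$ are
$$\frac{\partial\ell(\beta, \theta \mid y)}{\partial\beta} = X^TV^{-1}(y - X\beta),$$
$$\frac{\partial\ell(\beta, \theta \mid y)}{\partial\sigma_u^2} = -\frac{1}{2}tr\{V^{-1}ZZ^T\} + \frac{1}{2}(y - X\beta)^TV^{-1}ZZ^TV^{-1}(y - X\beta),$$
$$\frac{\partial\ell(\beta, \theta \mid y)}{\partial\sigma_e^2} = -\frac{1}{2}tr\{V^{-1}\} + \frac{1}{2}(y - X\beta)^TV^{-1}V^{-1}(y - X\beta),$$
and equating them to zero we obtain the equations
$$X^TV^{-1}y = X^TV^{-1}X\beta, \tag{7}$$
$$tr\{V^{-1}ZZ^T\} = (y - X\beta)^TV^{-1}ZZ^TV^{-1}(y - X\beta), \tag{8}$$
$$tr\{V^{-1}\} = (y - X\beta)^TV^{-1}V^{-1}(y - X\beta). \tag{9}$$
Solving for $\beta$ in (7), we obtain the ML estimating equation for $\beta$,
$$\hat\beta = (X^TV^{-1}X)^{-1}X^TV^{-1}y,$$
where here $V$ depends on the ML estimator of $\theta = (\sigma_u^2, \sigma_e^2)^T$. Equations (8) and (9) do not have an analytic solution and need to be solved numerically by iterative methods such as Newton-Raphson or Fisher scoring.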
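To make the score functions concrete, the sketch below evaluates the log-likelihood (up to the constant $c$) and its three partial derivatives with NumPy. It is an illustrative sketch only: the function name `loglik_and_scores` and the small simulated data set are assumptions, and no iterative solver is included.

```python
import numpy as np

def loglik_and_scores(beta, s2u, s2e, y, X, Z):
    """Log-likelihood (constant c dropped) and its partial derivatives."""
    n = len(y)
    V = s2u * Z @ Z.T + s2e * np.eye(n)
    Vinv = np.linalg.inv(V)
    r = y - X @ beta
    Vr = Vinv @ r                      # V^{-1}(y - X beta)
    _, logdet = np.linalg.slogdet(V)
    ll = -0.5 * (logdet + r @ Vr)
    score_beta = X.T @ Vr
    score_s2u = -0.5 * np.trace(Vinv @ Z @ Z.T) + 0.5 * Vr @ (Z @ Z.T) @ Vr
    score_s2e = -0.5 * np.trace(Vinv) + 0.5 * Vr @ Vr
    return ll, score_beta, score_s2u, score_s2e

# Hypothetical data: D = 4 groups of 5 units
rng = np.random.default_rng(1)
group = np.repeat(np.arange(4), 5)
Z = (group[:, None] == np.arange(4)[None, :]).astype(float)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = X @ np.array([0.5, 1.0]) + rng.normal(size=4)[group] + rng.normal(size=20)
beta0, s2u0, s2e0 = np.array([0.5, 1.0]), 0.3, 0.6
ll, g_beta, g_u, g_e = loglik_and_scores(beta0, s2u0, s2e0, y, X, Z)
```

Setting the three returned derivatives to zero reproduces equations (7) to (9); a Newton-Raphson or Fisher scoring routine would iterate on these quantities.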
3.2 Restricted maximum likelihood
A criticism of ML estimators of variance components is that they are biased downward, because they do not take into account the loss in degrees of freedom from the estimation of $\beta$. The REML method corrects for this problem by transforming $y$ into two independent vectors, $y_1 = K_1y$ and $y_2 = K_2y$. The probability density function of $y_1$ does not depend on $\beta$ and it holds that $E(y_1) = 0$, which means that $K_1X = 0$. On the other hand, $y_2$ is independent of $y_1$, which means that $K_1VK_2^T = 0$. The matrix $K_1$ is chosen to have maximum rank, i.e. $n - p$, so the rank of $K_2$ is $p$. The likelihood function of $y$ is the product of the likelihoods of $y_1$ and $y_2$. The variance components coming from the REML approach are the ML estimators of these parameters based on $y_1$. As in the ML case, the resulting equations do not have analytic solutions and need to be solved using iterative techniques such as the EM algorithm, Fisher scoring or Newton-Raphson.

Jennrich and Schluchter [25] compared the performance of these three algorithms and noted the following: (1) Direct comparison of the algorithms in terms of computational burden is difficult, because this depends to a large degree on how efficiently the algorithms are coded. (2) The Newton-Raphson algorithm, with a quadratic convergence rate, generally converges in a small number of iterations, with a higher cost per iteration. (3) The EM method has the lowest cost per iteration, but at times requires a large number of iterations. (4) The Fisher scoring algorithm is intermediate in terms of cost per iteration and required number of iterations. However, its cost per iteration is often not much smaller than that of Newton-Raphson, whereas it sometimes requires a considerably larger number of iterations. Lindstrom and Bates [26] provided arguments favoring the use of the Newton-Raphson method.
3.3 Henderson method III
ML and REML estimators of $\theta$ are usually based on the assumption that the vector $y$ has a multivariate normal distribution, although under some regularity conditions they remain consistent even when normality is not satisfied exactly (Jiang [10]). An alternative method which does not rely on normality and provides explicit formulas for the estimators of the variance components is Henderson method III (H3). This method works as follows. First, consider a linear mixed model $y = X\beta + e$, where $\beta$ might contain fixed and random effects. Let us split $\beta$ into two subvectors $\beta_1$ and $\beta_2$ and define the full model as
$$y = X_1\beta_1 + X_2\beta_2 + e. \tag{10}$$
The partition in sums of squares of model (10) is given by
$$SSR(\beta_1, \beta_2) = y^TX(X^TX)^{-1}X^Ty,$$
$$SSE(\beta_1, \beta_2) = \hat e^T\hat e = [(I_n - X(X^TX)^{-1}X^T)y]^T[(I_n - X(X^TX)^{-1}X^T)y], \tag{11}$$
$$SST(\beta_1, \beta_2) = y^Ty,$$
with their corresponding expected values given by
$$E[SSR(\beta_1, \beta_2)] = tr\left\{\begin{pmatrix} X_1^TX_1 & X_1^TX_2 \\ X_2^TX_1 & X_2^TX_2 \end{pmatrix}E(\beta\beta^T)\right\} + rank(X)\sigma_e^2,$$
$$E[SSE(\beta_1, \beta_2)] = [n - rank(X)]\sigma_e^2, \tag{12}$$
$$E[SST(\beta_1, \beta_2)] = tr\left\{\begin{pmatrix} X_1^TX_1 & X_1^TX_2 \\ X_2^TX_1 & X_2^TX_2 \end{pmatrix}E(\beta\beta^T)\right\} + n\sigma_e^2.$$
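Now consider the reduced model with only $\beta_1$,
$$y = X_1\beta_1 + \varepsilon. \tag{13}$$
Analogously, the partition in sums of squares of model (13) is given by
$$SSR(\beta_1) = y^TX_1(X_1^TX_1)^{-1}X_1^Ty,$$
$$SSE(\beta_1) = \hat\varepsilon^T\hat\varepsilon = [(I_n - X_1(X_1^TX_1)^{-1}X_1^T)y]^T[(I_n - X_1(X_1^TX_1)^{-1}X_1^T)y], \tag{14}$$
$$SST(\beta_1) = y^Ty,$$
with their corresponding expected values
$$E[SSR(\beta_1)] = tr\left\{\begin{pmatrix} X_1^TX_1 & X_1^TX_2 \\ X_2^TX_1 & X_2^TX_1(X_1^TX_1)^{-1}X_1^TX_2 \end{pmatrix}E(\beta\beta^T)\right\} + rank(X_1)\sigma_e^2,$$
$$E[SSE(\beta_1)] = tr\{X^T[I_n - X_1(X_1^TX_1)^{-1}X_1^T]^T[I_n - X_1(X_1^TX_1)^{-1}X_1^T]XE(\beta\beta^T)\} + [n - rank(X_1)]\sigma_e^2, \tag{15}$$
$$E[SST(\beta_1)] = tr\left\{\begin{pmatrix} X_1^TX_1 & X_1^TX_2 \\ X_2^TX_1 & X_2^TX_2 \end{pmatrix}E(\beta\beta^T)\right\} + n\sigma_e^2.$$
The reduction in sum of squares due to introducing $X_2$ in the model with only $X_1$ is
$$SSR(\beta_2 \mid \beta_1) = SSR(\beta_1, \beta_2) - SSR(\beta_1). \tag{16}$$
The expectation of this reduction is given by
$$E[SSR(\beta_2 \mid \beta_1)] = tr\{X_2^T[I_n - X_1(X_1^TX_1)^{-1}X_1^T]X_2E(\beta_2\beta_2^T)\} + [rank(X) - rank(X_1)]\sigma_e^2. \tag{17}$$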
Now consider model (1) and rewrite it as (10) taking $\beta_1 = \beta$, $\beta_2 = u$, $X_1 = X$ and $X_2 = Z$. This method equates the sums of squares $SSE(\beta_1, \beta_2)$ in (11) and $SSR(\beta_2 \mid \beta_1)$ in (16) to their expectations in (12) and (17) respectively, obtaining two equations. Solving for $\sigma_e^2$ and $\sigma_u^2$ gives the H3 estimators (for more details see [21], chapter 5). Let $\hat e$ and $\hat\varepsilon$ be the vectors of residuals obtained by fitting the two models (10) and (13) respectively, considering $\beta_2$ as fixed. If $rank(X) = p$ and $rank(X|Z) = p + D$, then the Henderson III estimators of the variance components are given by
$$\hat\sigma^2_{e,H3} = \frac{\sum_{d=1}^{D}\sum_{j=1}^{n_d}\hat e_{dj}^2}{n - p - D}, \qquad \hat\sigma^2_{u,H3} = \frac{\sum_{d=1}^{D}\sum_{j=1}^{n_d}\hat\varepsilon_{dj}^2 - \hat\sigma_e^2(n - p)}{tr\{Z^T[I - X(X^TX)^{-1}X^T]Z\}}, \tag{18}$$
where $\hat e_{dj}$ is the residual corresponding to observation $(x_{dj}^T, y_{dj})$ in model (10) and $\hat\varepsilon_{dj}$ is the corresponding residual in model (13).
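The two regressions behind (18) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `henderson3` and the simulated data are hypothetical, both models are fitted by ordinary least squares, and $X$ is generated without an intercept column so that $rank(X|Z) = p + D$ holds.

```python
import numpy as np

def henderson3(y, X, group):
    """Henderson III estimators (18), using two ordinary least squares fits."""
    groups = np.unique(group)
    Z = (group[:, None] == groups[None, :]).astype(float)
    n, p = X.shape
    D = Z.shape[1]
    # Full model (10): group effects treated as fixed, design matrix (X|Z)
    XZ = np.hstack([X, Z])
    e_hat = y - XZ @ np.linalg.lstsq(XZ, y, rcond=None)[0]
    # Reduced model (13): regression on X only
    eps_hat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    s2e = e_hat @ e_hat / (n - p - D)
    M = np.eye(n) - X @ np.linalg.pinv(X)      # I - X (X'X)^{-1} X'
    s2u = (eps_hat @ eps_hat - s2e * (n - p)) / np.trace(Z.T @ M @ Z)
    return s2u, s2e

# Hypothetical data: D = 30 groups of 20 units, p = 2 covariates, no intercept
rng = np.random.default_rng(2)
group = np.repeat(np.arange(30), 20)
X = rng.normal(size=(600, 2))
y = X @ np.array([1.0, -1.0]) + rng.normal(scale=0.5, size=30)[group] \
    + rng.normal(scale=0.5, size=600)
s2u_h3, s2e_h3 = henderson3(y, X, group)
```

No starting values or iterations are needed, which is the practical appeal of the method noted in the Introduction.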
4 Diagnostic methods
Limited work has been done on diagnostic methods for linear mixed models. Christensen et al. [2] considered the case deletion diagnostics and Galpin and Zewotir [6] provided a definition of residuals, leverages and outliers when some variance components are known.
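Fitted values of the response variable are
$$\hat y = X\hat\beta + Z\hat u,$$
and residuals are then
$$\hat e = y - \hat y = Ry, \quad \text{with } R = V^{-1} - V^{-1}X(X^TV^{-1}X)^{-1}X^TV^{-1}.$$
Studentized residuals (internal studentization):
$$t_{dj} = \frac{\hat e_{dj}}{\sqrt{var(\hat e_{dj})}} = \frac{\hat e_{dj}}{\hat\sigma_e\sqrt{r_{dj}}},$$
where $r_{dj}$ is the $dj$-th diagonal element of the matrix $R$ and $\hat e_{dj}$ is the $dj$-th element of the vector $\hat e = Ry$.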
Studentized residuals (external studentization): Let $\hat\sigma_{e(dj)}$ denote the estimate of $\sigma_e$ when the $dj$-th observation is deleted. If $\hat\sigma^2_{e(dj)}$ is used in place of $\hat\sigma_e^2$, we obtain the $dj$-th externally studentized residual, given by
$$t^*_{dj} = \frac{\hat e_{dj}}{\hat\sigma_{e(dj)}\sqrt{r_{dj}}}.$$
The statistic $t^*_{dj}$ satisfies
$$t^{*2}_{dj} \sim \frac{n - 1}{n - p - 1}F(1, n - p - 1),$$
where $F(1, n - p - 1)$ is an F distribution with 1 and $n - p - 1$ degrees of freedom.
Note that the element $r_{dj}$ used to standardize the residuals depends on the variance components $\sigma_e^2$ and $\sigma_u^2$, which are unknown. When there are outliers, these might affect the estimators of the variance components, and these estimators will in turn change the distribution of the standardized residuals.

To illustrate this, we have simulated data from model (1), with $D = 15$ groups and total sample size $n = 2500$. The theoretical values of the variance components are $\sigma_e^2 = 0.5$ and $\sigma_u^2 = 0.5$. In order to inflate the estimator of the error variance $\sigma_e^2$, we introduced atypical data in $y$ as mean shifts, by increasing some of the response values by $k$ times the theoretical standard deviation, with $k = 5$. Index plots of internally studentized residuals, using the true variance components and the estimated ones, appear in the left and right panels of Figure 1 respectively. This example illustrates how the estimation of the variance components affects the studentized residuals. In the right plot, obtained with estimated variances, all residuals lie in the interval $(-2.5, 2.5)$; as a consequence, if the standard rule is applied to these residuals, the outlying observations will not be detected.
[Figure 1: two index plots of residuals (vertical axis) against observations (horizontal axis). Panel (a): Variance components known. Panel (b): Henderson III.]
Figure 1: Internally studentized residuals (a) using the true variance components and (b) when they are estimated using H3 method.
Leverage effect in the nested-error model
Assuming that $\theta$ is known, the vector of predicted values is
$$\tilde y = (I - R)y. \tag{19}$$
This relation evokes the definition of the hat matrix, as
$$H = I - R.$$
The diagonal elements $(1 - r_{dj})$ of this matrix are measures of the leverage effect of the observations and are called leverages. Galpin and Zewotir [6] proposed the use of the $r_{dj}$ to identify influential observations. If $r_{dj}$ approaches zero, this indicates that the corresponding observation has a large leverage effect.
Due to the grouped data structure in linear mixed models with one random factor, it seems more relevant to study the leverage effect of groups instead of that of isolated observations. The leverage effect of group $d$ is defined here as
$$h_d = \bar x_d^T(X^TV^{-1}X)^{-1}\bar x_d, \quad d = 1, \ldots, D, \tag{20}$$
where $\bar x_d = n_d^{-1}\sum_{j=1}^{n_d}x_{dj}$. In practice, $V$ could be estimated using the robust variance components estimators described in the next section.
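The group leverages in (20) are straightforward to compute once a value of $V$ is available. The sketch below is only illustrative: the function name `group_leverages`, the simulated covariates and the variance components plugged into $V$ are assumptions for this example.

```python
import numpy as np

def group_leverages(X, group, V):
    """Leverage h_d of each group, eq. (20), for a given covariance matrix V."""
    A = np.linalg.inv(X.T @ np.linalg.solve(V, X))      # (X' V^{-1} X)^{-1}
    groups = np.unique(group)
    # Row d holds the vector of covariate means of group d
    xbar = np.vstack([X[group == g].mean(axis=0) for g in groups])
    return np.einsum("dp,pq,dq->d", xbar, A, xbar)

# Hypothetical example: D = 4 groups of 6 units, sigma_u^2 = sigma_e^2 = 0.25
rng = np.random.default_rng(3)
group = np.repeat(np.arange(4), 6)
Z = (group[:, None] == np.arange(4)[None, :]).astype(float)
X = np.column_stack([np.ones(24), rng.normal(size=24)])
V = 0.25 * Z @ Z.T + 0.25 * np.eye(24)
h = group_leverages(X, group, V)
```

Groups whose covariate means lie far from the bulk of the data receive larger values of $h_d$.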
5 Robust Henderson method III
Consider the linear regression model with random effects given in (1). The estimators of variance components obtained by Henderson method III (H3 estimators) are given by
$$\hat\sigma^2_{e,H3} = \frac{\sum_{d=1}^{D}\sum_{j=1}^{n_d}\hat e_{dj}^2}{n - (p + D)}, \qquad \hat\sigma^2_{u,H3} = \frac{\sum_{d=1}^{D}\sum_{j=1}^{n_d}\hat\varepsilon_{dj}^2 - \hat\sigma_e^2(n - p)}{tr\{Z^T[I - X(X^TX)^{-1}X^T]Z\}}, \tag{21}$$
where $\hat e_{dj}$ is the residual corresponding to observation $(x_{dj}^T, y_{dj})$ in the full model (10) with group effects assumed to be fixed and $\hat\varepsilon_{dj}$ is the corresponding residual in the reduced model (13).
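Remark 1 Henderson III estimators are scale equivariant, that is,
$$\hat\sigma_{e,H3}(cy) = |c|\,\hat\sigma_{e,H3}(y) \quad \text{and} \quad \hat\sigma_{u,H3}(cy) = |c|\,\hat\sigma_{u,H3}(y).$$
The estimator $\hat\sigma^2_{e,H3}$ can be expressed as
$$\hat\sigma^2_{e,H3} = \hat\sigma^2_{e,H3}(y) = \frac{SSE(\beta^*)}{n - rank(X^*)} = \frac{y^T(I_n - H^*)y}{n - (p + D)},$$
where $H^* = X^*(X^{*T}X^*)^{-1}X^{*T}$, $X^* = (X|Z)$ and $\beta^* = (\beta^T, u^T)^T$. Then,
$$\hat\sigma_{e,H3}(cy) = \sqrt{\frac{(cy)^T(I_n - H^*)(cy)}{n - (p + D)}} = \sqrt{\frac{c^2\,y^T(I_n - H^*)y}{n - (p + D)}} = |c|\sqrt{\frac{y^T(I_n - H^*)y}{n - (p + D)}} = |c|\,\hat\sigma_{e,H3}(y).$$
Therefore, the estimator $\hat\sigma_{e,H3}$ is scale equivariant. Now we check that $\hat\sigma_{u,H3}$ is also scale equivariant.

The estimator $\hat\sigma^2_{u,H3}$ is given by
$$\hat\sigma^2_{u,H3} = \hat\sigma^2_{u,H3}(y) = \frac{SSE(\beta) - \hat\sigma^2_{e,H3}(n - p)}{tr[Z^T(I_n - H)Z]} = \frac{y^T(I_n - H)y - \left[\frac{y^T(I_n - H^*)y}{n - (p + D)}\right](n - p)}{tr[Z^T(I_n - H)Z]},$$
where $H = X(X^TX)^{-1}X^T$. Denoting $m = tr[Z^T(I_n - H)Z]$,
$$\hat\sigma^2_{u,H3} = \frac{1}{m}\left\{y^T(I_n - H)y - \frac{n - p}{n - (p + D)}\,y^T(I_n - H^*)y\right\},$$
thus
$$\hat\sigma_{u,H3}(y) = \sqrt{\frac{1}{m}\left\{y^T(I_n - H)y - \frac{n - p}{n - (p + D)}\,y^T(I_n - H^*)y\right\}}.$$
Then,
$$\hat\sigma_{u,H3}(cy) = \sqrt{\frac{1}{m}\left\{(cy)^T(I_n - H)(cy) - \frac{n - p}{n - (p + D)}(cy)^T(I_n - H^*)(cy)\right\}} = \sqrt{\frac{c^2}{m}\left\{y^T(I_n - H)y - \frac{n - p}{n - (p + D)}\,y^T(I_n - H^*)y\right\}} = |c|\,\hat\sigma_{u,H3}(y).$$
Therefore, the estimator $\hat\sigma_{u,H3}$ is scale equivariant.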
Let us express the Henderson III estimators in terms of the means of squared residuals,
$$\hat\sigma^2_{e,H3} = \frac{n\left[\sum_{d=1}^{D}\sum_{j=1}^{n_d}\hat e_{dj}^2/n\right]}{n - (p + D)}, \qquad \hat\sigma^2_{u,H3} = \frac{n\left[\sum_{d=1}^{D}\sum_{j=1}^{n_d}\hat\varepsilon_{dj}^2/n\right] - \hat\sigma_e^2(n - p)}{tr\{Z^T[I - X(X^TX)^{-1}X^T]Z\}}. \tag{22}$$
We propose to robustify these estimators by, first, using robust methods to fit the two models (10) and (13) and, after that, replacing in (22) the means of squared residuals by other robust functions.

Model (13) is a standard linear regression model, which can be robustly fitted using any method available in the literature, such as $L_1$ estimation, M estimation or the fast method of Peña and Yohai [30]. Model (10) is a model with fixed group effects, which can be robustly fitted using an adaptation of the principal sensitivity components method of Peña and Yohai [30] to the grouped data structure. An alternative approach is the M-S estimation of Maronna and Yohai [31].

These fitting methods will provide better residuals $\hat e_{dj}$ and $\hat\varepsilon_{dj}$, which are in turn used to find robust estimators of the variance components. Below we describe different estimators based on robust functions of these new residuals.
MADH3 estimators: In the two estimators given in (22), we substitute the means of squared residuals by the square of the normalized median of absolute deviations (MAD), given by
$$MAD = 1.4826\,median(|\hat\xi_{dj}|, \hat\xi_{dj} \neq 0),$$
where $\hat\xi_{dj}$ is the residual of observation $(x_{dj}^T, y_{dj})$ under the corresponding fitted model, either (10) or (13).
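A possible way to code the MADH3 substitution is sketched below. It assumes that the residual vectors of the two robust fits are already available (the robust fitting step itself is not shown), uses the usual normal-consistency constant 1.4826, and the names `mad_scale` and `madh3` as well as the toy residuals are hypothetical.

```python
import numpy as np

def mad_scale(res):
    """Normalized median of the absolute nonzero residuals."""
    a = np.abs(np.asarray(res, dtype=float))
    return 1.4826 * np.median(a[a != 0])

def madh3(e_hat, eps_hat, X, Z):
    """MADH3: replace the means of squared residuals in (22) by MAD^2."""
    n, p = X.shape
    D = Z.shape[1]
    s2e = n * mad_scale(e_hat) ** 2 / (n - (p + D))
    M = np.eye(n) - X @ np.linalg.pinv(X)      # I - X (X'X)^{-1} X'
    s2u = (n * mad_scale(eps_hat) ** 2 - s2e * (n - p)) / np.trace(Z.T @ M @ Z)
    return s2u, s2e

# Toy inputs standing in for residuals of robust fits of (10) and (13)
rng = np.random.default_rng(4)
group = np.repeat(np.arange(3), 4)
Z = (group[:, None] == np.arange(3)[None, :]).astype(float)
X = rng.normal(size=(12, 1))
e_res = rng.normal(size=12)
eps_res = rng.normal(size=12)
s2u_mad, s2e_mad = madh3(e_res, eps_res, X, Z)
```

Because the median ignores the magnitude of the largest residuals, a few gross outliers leave both estimates essentially unchanged.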
TH3 estimators: Trimming consists of giving zero weight to a percentage of extreme cases. In this case, in the two equations given in (22) we trim residuals that are outside the interval (b1, b2) with
Here, $q_1$ and $q_2$ are the first and third sample quartiles of the residuals and $k$ is a constant. Based on results obtained from different simulation studies, we propose to use the constant $k = 2$, slightly smaller than the one used as outer frontier in the box plot for detecting outliers.
RH3 estimators: Instead of replacing extreme residuals by zero as in the previous proposal, we can smooth the residuals appearing in (22) according to an appropriate smoothing function. Here we consider Tukey's biweight function, given by
$$\varphi(x) = x[1 - (x/k)^2]^2, \quad \text{if } |x| \leq k, \tag{24}$$
and $\varphi(x) = 0$ otherwise. In this case, the robust Henderson III estimators are given by
$$\hat\sigma^2_{e,RH3} = \frac{\sigma^2_{e,MAD}\sum_{d=1}^{D}\sum_{j=1}^{n_d}\varphi^2(\hat e_{dj}/\sigma_{e,MAD})}{n - (p + D)}, \tag{25}$$
$$\hat\sigma^2_{u,RH3} = \frac{\sigma^2_{u,MAD}\sum_{d=1}^{D}\sum_{j=1}^{n_d}\varphi^2(\hat\varepsilon_{dj}/\sigma_{u,MAD}) - \hat\sigma^2_{e,RH3}(n - p)}{tr\{Z^T(I_n - X(X^TX)^{-1}X^T)Z\}}. \tag{26}$$
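Equations (25) and (26) could be coded along the following lines. This is a sketch under assumptions: the residuals of the two robust fits are taken as given, the MAD scales are computed from the nonzero absolute residuals with the constant 1.4826, and the tuning constant `k = 4.685` is a common default for the biweight that is not stated in the text. The names `tukey_biweight`, `mad_scale` and `rh3` are hypothetical.

```python
import numpy as np

def tukey_biweight(x, k):
    """Tukey's biweight function (24); zero outside [-k, k]."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= k, x * (1.0 - (x / k) ** 2) ** 2, 0.0)

def mad_scale(res):
    a = np.abs(np.asarray(res, dtype=float))
    return 1.4826 * np.median(a[a != 0])

def rh3(e_hat, eps_hat, X, Z, k=4.685):   # k is an assumed tuning constant
    """RH3 estimators (25) and (26) from given robust residuals."""
    n, p = X.shape
    D = Z.shape[1]
    s_e, s_u = mad_scale(e_hat), mad_scale(eps_hat)
    s2e = s_e ** 2 * np.sum(tukey_biweight(e_hat / s_e, k) ** 2) / (n - (p + D))
    M = np.eye(n) - X @ np.linalg.pinv(X)
    num = s_u ** 2 * np.sum(tukey_biweight(eps_hat / s_u, k) ** 2) - s2e * (n - p)
    return num / np.trace(Z.T @ M @ Z), s2e

# Toy inputs standing in for residuals of robust fits of (10) and (13)
rng = np.random.default_rng(5)
group = np.repeat(np.arange(3), 4)
Z = (group[:, None] == np.arange(3)[None, :]).astype(float)
X = rng.normal(size=(12, 1))
e_res, eps_res = rng.normal(size=12), rng.normal(size=12)
s2u_rh3, s2e_rh3 = rh3(e_res, eps_res, X, Z)
```

Residuals beyond $k$ scale units contribute nothing to the sums, while moderate residuals are downweighted smoothly rather than discarded.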
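Remark 2 The function $h(x) = \sigma_x\varphi(x/\sigma_x)$ is scale equivariant, where $\sigma_x$ is a scale such that $\sigma_{cx} = c\,\sigma_x$, $c > 0$. If we consider $\sigma_x = MAD(x)$, let us verify that
$$MAD(cx) = c\,MAD(x), \quad c > 0.$$
By definition, $MAD(x) = 1.4826\,median(|x - median(x)|)$, so
$$MAD(cx) = 1.4826\,median(|cx - median(cx)|) = 1.4826\,median(|c|\,|x - median(x)|) = |c|\,[1.4826\,median(|x - median(x)|)] = |c|\,MAD(x).$$
Since $\sigma_{cx} = c\,\sigma_x$, we have that
$$h(cx) = c\,\sigma_x\varphi\left(\frac{cx}{c\,\sigma_x}\right) = c\,\sigma_x\varphi\left(\frac{x}{\sigma_x}\right) = c\,h(x).$$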
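Remark 3 RH3 estimators of $\sigma_e$ and $\sigma_u$ are scale equivariant.

Consider the estimator $\hat\sigma^2_{e,RH3}$,
$$\hat\sigma^2_{e,RH3} = \frac{\sigma^2_{e,MAD}\sum_{d=1}^{D}\sum_{j=1}^{n_d}\varphi^2(\hat e_{dj}/\sigma_{e,MAD})}{n - (p + D)} = \frac{\sum_{d=1}^{D}\sum_{j=1}^{n_d}h^2(\hat e_{dj})}{n - (p + D)}, \quad \text{so that} \quad \hat\sigma_{e,RH3} = \sqrt{\frac{\sum_{d=1}^{D}\sum_{j=1}^{n_d}h^2(\hat e_{dj})}{n - (p + D)}},$$
where $h(\cdot)$ is scale equivariant. Therefore, $\hat\sigma_{e,RH3}$ is scale equivariant.

Let $m = tr\{Z^T(I_n - X(X^TX)^{-1}X^T)Z\}$. The estimator $\hat\sigma^2_{u,RH3}$ is given by
$$\hat\sigma^2_{u,RH3} = \frac{1}{m}\left\{\sigma^2_{u,MAD}\sum_{d=1}^{D}\sum_{j=1}^{n_d}\varphi^2\left(\frac{\hat\varepsilon_{dj}}{\sigma_{u,MAD}}\right) - \hat\sigma^2_{e,RH3}(n - p)\right\} = \frac{1}{m}\left\{\sum_{d=1}^{D}\sum_{j=1}^{n_d}h^2(\hat\varepsilon_{dj}) - \frac{\sum_{d=1}^{D}\sum_{j=1}^{n_d}h^2(\hat e_{dj})}{n - (p + D)}(n - p)\right\}.$$
Similarly, since $h(\cdot)$ is scale equivariant, $\hat\sigma_{u,RH3}$ is scale equivariant.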
5.1 Simulation experiment
This section describes a Monte Carlo simulation study that compares the robust estimators of the variance components with the traditional non-robust ones. For this, we generated data coming from $D = 10$ groups. The group sample sizes $n_d$, $d = 1, \ldots, D$, were respectively 20, 20, 30, 30, 40, 40, 50, 50, 60 and 60, with a total sample size of $n = 400$. We considered $p = 4$ auxiliary variables, generated from normal distributions with means and standard deviations coming from a real data set from the Australian Agricultural and Grazing Industries Survey. Thus, the values of the four auxiliary variables were generated respectively as $X_1 \sim N(3.3, 0.6)$, $X_2 \sim N(1.7, 1.2)$, $X_3 \sim N(1.7, 1.6)$ and $X_4 \sim N(2.4, 2.6)$. The simulation study is based on $L = 500$ Monte Carlo replicates. In each iteration, we generated group effects as $u_d \overset{iid}{\sim} N(0, \sigma_u^2)$ with $\sigma_u^2 = 0.25$. Similarly, we generated errors as $e_{dj} \overset{iid}{\sim} N(0, \sigma_e^2)$ with $\sigma_e^2 = 0.25$. Then we generated the model responses $y_{dj}$, $j = 1, \ldots, n_d$, $d = 1, \ldots, D$, from model (1). Observe that in principle there is no contamination. Finally, we introduced contamination according to three different scenarios:
A. No contamination.
B. Groups with a mean shift: A subset $D_c \subseteq \{1, 2, \ldots, D\}$ of groups was selected for contamination. For each selected group $d \in D_c$, half of the observations were replaced by $c_{d1} = \bar y_d + k\,s_{Y,d}$ and the other half by $c_{d2} = \bar y_d - k\,s_{Y,d}$ with $k = 5$, where $\bar y_d$ and $s_{Y,d}$ are respectively the mean and the standard deviation of the outcome for the clean data in the $d$-th group. This increases the between-group variability $\sigma_u^2$.
C. Groups with high variability: A small percentage of contaminated observations was introduced, which increases the within-group variability $\sigma_e^2$.

Table 1: Theoretical values $\sigma_u^2 = \sigma_e^2 = 0.25$. Scenario A: No contamination.

Method    Estimators         Bias                 MSE ×10²
          σ̂u²     σ̂e²       σ̂u²       σ̂e²       σ̂u²     σ̂e²
H3        0.24    0.25    -0.0081    0.0014     1.43    0.03
ML        0.22    0.25    -0.0298   -0.0011     1.16    0.03
REML      0.25    0.25    -0.0046    0.0014     1.32    0.03
MADH3     0.25    0.25     0.0041    0.0018     2.33    0.09
TH3       0.23    0.25    -0.0189   -0.0019     1.04    0.04
RH3       0.24    0.23    -0.0136   -0.0179     1.25    0.06
Then, we calculated the traditional estimators H3, ML and REML, and the proposed robust estimators, MADH3, TH3 and RH3. After the L = 500 replicates, we computed the empirical bias and mean squared error (MSE) of the estimators.
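The empirical bias and MSE over the Monte Carlo replicates can be computed as in the short sketch below. The function name `bias_mse` and the two illustrative estimates are assumptions for the example; note that the tables report the MSE multiplied by $10^2$.

```python
import numpy as np

def bias_mse(estimates, truth):
    """Empirical bias and mean squared error over Monte Carlo replicates."""
    est = np.asarray(estimates, dtype=float)
    return est.mean() - truth, np.mean((est - truth) ** 2)

# Two made-up replicates of an estimator of a variance equal to 0.25
bias, mse = bias_mse([0.20, 0.30], truth=0.25)
mse_x100 = 100.0 * mse   # scale used in the MSE columns of the tables
```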
Table 1 reports the resulting empirical bias and percent MSE of each estimator under Scenario A, without contamination. Observe in that table that in the absence of outlying observations, the traditional non-robust estimators, H3, ML and REML, provide the minimum MSE, but the robust alternatives TH3 and RH3 are not too far away from them. However, under Scenario B with full groups contaminated with a mean shift (Tables 2 and 3), the estimators ML, REML and H3 of $\sigma_u^2$ increase their MSE considerably. The estimator TH3 achieves the minimum MSE, followed by RH3. Under Scenario C, with contamination introduced to make the within-cluster variability increase (Tables 4 and 5), it is the estimators ML, REML and H3 of $\sigma_e^2$ that increase their MSE considerably, whereas the robust estimators resist quite well.
5.2 Discussion
This part introduces three robust versions of H3 estimators called MADH3, TH3 and RH3 estimators. These estimators are obtained by first, fitting in a robust way the two submodels (10) and (13) and, then, replacing the means of squared residuals in H3 estimators by other robust functions of the residuals coming from those robust fittings.
Table 2: Theoretical values $\sigma_u^2 = \sigma_e^2 = 0.25$. Scenario B: One outlying group.

Method    Estimators         Bias                 MSE ×10²
          σ̂u²     σ̂e²       σ̂u²       σ̂e²       σ̂u²      σ̂e²
H3        1.28    0.24     1.0286   -0.0095    123.73    0.04
ML        1.15    0.24     0.9000   -0.0120    123.27    0.04
REML      1.28    0.24     1.0285   -0.0096    123.38    0.04
MADH3     0.44    0.23     0.1884   -0.0169      7.84    0.10
TH3       0.24    0.24    -0.0089   -0.0142      1.25    0.05
RH3       0.46    0.22     0.2106   -0.0277      6.04    0.10
Table 3: Theoretical values $\sigma_u^2 = \sigma_e^2 = 0.25$. Scenario B: Two outlying groups.

Method    Estimators         Bias                 MSE ×10²
          σ̂u²     σ̂e²       σ̂u²       σ̂e²       σ̂u²      σ̂e²
H3        2.79    0.23     2.5375   -0.0242    715.98    0.08
ML        2.13    0.22     1.8807   -0.0266    495.49    0.10
REML      2.37    0.23     2.1179   -0.0242    500.14    0.08
MADH3     1.10    0.21     0.8529   -0.0437     91.67    0.25
TH3       0.27    0.22     0.0227   -0.0319      2.13    0.13
RH3       0.76    0.21     0.5088   -0.0412     31.52    0.19
Table 4: Theoretical values $\sigma_u^2 = \sigma_e^2 = 0.25$. Scenario C: 10% of atypical observations shared among groups.

Method    Estimators         Bias                 MSE ×10²
          σ̂u²     σ̂e²       σ̂u²       σ̂e²       σ̂u²     σ̂e²
H3        0.23    0.60    -0.0175    0.3512     1.47   12.58
ML        0.21    0.60    -0.0397    0.3450     1.23   12.15
REML      0.24    0.60    -0.0144    0.3512     1.35   12.58
MADH3     0.28    0.27     0.0253    0.0198     2.78    0.14
TH3       0.24    0.25    -0.0073   -0.0012     1.17    0.04
RH3       0.22    0.30    -0.0266    0.0487     1.22    0.26
Table 5: Theoretical values $\sigma_u^2 = \sigma_e^2 = 0.25$. Scenario C: 20% of atypical observations shared among groups.

Method    Estimators         Bias                 MSE ×10²
          σ̂u²     σ̂e²       σ̂u²       σ̂e²       σ̂u²     σ̂e²
H3        0.22    0.93    -0.0268    0.6814     1.50   47.19
ML        0.20    0.92    -0.0489    0.6719     1.32   45.89
REML      0.23    0.93    -0.0236    0.6814     1.39   47.19
MADH3     0.30    0.29     0.0473    0.0406     3.48    0.29
TH3       0.25    0.25     0.0045    0.0003     1.27    0.04
RH3       0.21    0.37    -0.0400    0.1151     1.18    1.35
In the simulations we have analyzed the robustness of our proposed estimators under two different contamination scenarios: when the between-group variability is increased by including a mean shift in some of the groups, and when the within-group variability is increased by introducing given percentages of outliers within the groups. The new robust estimator RH3 achieves high efficiency under both types of contamination and at the same time preserves good efficiency when there is no contamination.
6 Robust estimation of regression coefficients
This section deals with robust estimation of regression coefficients using the estimators of variance components introduced above. These estimators are then used to derive robust predictors of the means in small areas.
6.1 Small area estimators
Small area estimation is usually done under a finite population setup. Thus, we have a population $U$ of size $N$ that is assumed to be partitioned into $D$ subpopulations $U_1, \ldots, U_D$ of sizes $N_1, \ldots, N_D$, called small areas. Particular quantities of interest are the means of the small areas,
$$\bar Y_d = \frac{1}{N_d}\sum_{j=1}^{N_d}y_{dj}, \quad d = 1, \ldots, D.$$
A sample $s_d$ of size $n_d$ is drawn from $U_d$, $d = 1, \ldots, D$. We assume that the model holds for all population units, that is, for units in the sample and out of the sample. Under this setup, the target area means are random. Therefore, it is common to speak of predicting $\bar Y_d$ rather than estimating $\bar Y_d$. The mean of small area $d$ can be split into two terms, one for the sample elements and the other for the out-of-sample elements, obtaining a linear combination of the sample mean $\bar y_{s_d}$ and the out-of-sample mean $\bar y_{s_d^c}$,
$$\bar Y_d = \frac{1}{N_d}\left(\sum_{j \in s_d}y_{dj} + \sum_{j \in s_d^c}y_{dj}\right) = \frac{n_d}{N_d}\,\bar y_{s_d} + \frac{N_d - n_d}{N_d}\,\bar y_{s_d^c}, \quad d = 1, \ldots, D.$$
When studying outliers in finite population inference, the existing literature is developed exclusively under one of the following assumptions:

Assumption 1. Non-representative outliers: We assume that atypical observations appear only in the sample but not in the non-sample part of the population. Then, it seems natural to project the working model onto the entire non-sampled part of the population. Chambers [27] calls this type of outliers non-representative outliers. In this case, the appropriate methods for estimating model parameters are called Robust Projective, meaning that they project the sample non-outlier behavior onto the non-sampled part of the population.

Assumption 2. Representative outliers: We assume that atypical observations appear in both the sample and the non-sample part of the population. In this case, robust projective methods will provide biased estimators of the small area means; therefore, it is necessary to correct for this bias using an appropriate correction factor.

The next section introduces two robust projective methods given in the literature, Fellner's approach and Sinha and Rao's procedure.
6.2 Previous robust procedures
6.2.1 Fellner’s approach
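Fellner [4] derived robust estimators of the variance components and of the regression coefficients $\beta$, together with a robust predictor of $u$, which could in turn be used to derive a robust EBLUP. The joint probability density function of $y$ is given by
$$f(\beta, \theta \mid y) = (2\pi)^{-n/2}|V|^{-1/2}\exp\left\{-\frac{1}{2}(y - X\beta)^TV^{-1}(y - X\beta)\right\}. \tag{27}$$
Similarly, the joint density function of $u = (u_1, \ldots, u_D)^T$ is
$$g(u; \sigma_u^2) = (2\pi\sigma_u^2)^{-D/2}\exp\{-u^Tu/(2\sigma_u^2)\}.$$
Assuming $\theta$ known, the BLUE of $\beta$ and the BLUP of $u$ can be obtained simultaneously by maximizing the joint log-likelihood of $y$ and $u$, $\ln f(\beta, \theta \mid y, u) = \ln f(y \mid u) + \ln g(u)$, with respect to $\beta$ and $u$, where $f(y \mid u)$ denotes the conditional density of $y$ given $u$. The resulting system of normal equations is given by
$$\begin{pmatrix} X^TX/\sigma_e^2 & X^TZ/\sigma_e^2 \\ Z^TX/\sigma_e^2 & I/\sigma_u^2 + Z^TZ/\sigma_e^2 \end{pmatrix}\begin{pmatrix} \beta \\ u \end{pmatrix} = \begin{pmatrix} X^Ty/\sigma_e^2 \\ Z^Ty/\sigma_e^2 + (I/\sigma_u^2)0_D \end{pmatrix}.$$
Fellner's method is based on the idea of replacing in these equations the observations $y_{dj}$ and random effects $u_d$ that are far from their predicted values $\hat y_{dj} = x_{dj}^T\hat\beta + \hat u_d$ and $\hat u_d$ by what he called pseudo-observations. More explicitly, Fellner's method solves the system
$$\begin{pmatrix} X^TX/\sigma_e^2 & X^TZ/\sigma_e^2 \\ Z^TX/\sigma_e^2 & I/\sigma_u^2 + Z^TZ/\sigma_e^2 \end{pmatrix}\begin{pmatrix} \beta \\ u \end{pmatrix} = \begin{pmatrix} X^Ty^*/\sigma_e^2 \\ Z^Ty^*/\sigma_e^2 + (I/\sigma_u^2)0_D^* \end{pmatrix}, \tag{28}$$
where $y^* = (y^*_{dj};\ j = 1, \ldots, n_d,\ d = 1, \ldots, D)$ with $y^*_{dj} = x_{dj}^T\hat\beta + \hat u_d + \sigma_e\psi(\hat e_{dj}/\sigma_e)$, $0_D^* = (\hat u_d - \sigma_u\psi(\hat u_d/\sigma_u);\ d = 1, \ldots, D)$ and $\psi$ is an odd, monotonic and bounded function such as Huber's psi function.

Equations (28) assume that the variance components are known, but Fellner [4] also gave REML equations for the variance components which, solved jointly with (28), also yield a robust estimator of $\beta$ together with a robust predictor of $u$. For this, he proposes to robustify the REML equations in the form
$$\hat\sigma_u^2 = \{h(D - v^*)\}^{-1}\hat\sigma_u^2\sum_{d=1}^{D}\psi^2(\hat u_d/\hat\sigma_u), \qquad \hat\sigma_e^2 = \{h(n - p - D + v^*)\}^{-1}\hat\sigma_e^2\sum_{d=1}^{D}\sum_{j=1}^{n_d}\psi^2(\hat e_{dj}/\hat\sigma_e),$$
where $h$ is an appropriately chosen constant to adjust for the bias in $\hat\sigma_u^2$ and $\hat\sigma_e^2$ at the normal distribution. This leads to $h = E\{\psi^2(X)\}$, where $X \sim N(0, 1)$.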
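The pseudo-observations and the constant $h$ can be illustrated with Huber's psi function. The sketch below is not Fellner's code: the function names are hypothetical, the tuning constant `b = 1.345` is an assumption borrowed from common practice, and the closed form used for $h$ follows from integrating $\psi^2$ against the standard normal density.

```python
import math
import numpy as np

def huber_psi(x, b=1.345):
    """Huber's psi: identity on [-b, b], constant beyond."""
    return np.clip(x, -b, b)

def pseudo_observations(y_fit, e_hat, sigma_e, b=1.345):
    """y* = fitted value + sigma_e * psi(residual / sigma_e)."""
    return y_fit + sigma_e * huber_psi(np.asarray(e_hat, dtype=float) / sigma_e, b)

def huber_h(b=1.345):
    """h = E[psi^2(X)] for X ~ N(0, 1), in closed form for Huber's psi."""
    Phi = 0.5 * (1.0 + math.erf(b / math.sqrt(2.0)))          # normal cdf at b
    phi = math.exp(-0.5 * b * b) / math.sqrt(2.0 * math.pi)   # normal pdf at b
    return (2.0 * Phi - 1.0) - 2.0 * b * phi + 2.0 * b * b * (1.0 - Phi)

# A small residual is kept as is, a large one is pulled back to b * sigma_e
y_star = pseudo_observations(np.zeros(2), np.array([0.5, 10.0]), sigma_e=1.0)
h_const = huber_h()
```

As $b$ grows, $\psi$ approaches the identity, $h$ tends to 1 and the pseudo-observations coincide with the original data, which recovers the non-robust equations.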
6.2.2 REBLUP estimators
Sinha and Rao [28] proposed a two-step procedure for constructing robust estimators of model parameters. The steps of the procedure are the following:
• Step 1. The estimators ˆβSR and ˆθSR are obtained simultaneously based on robus-tified ML equations.
• Step 2. The predictor ˆuSR is obtained using the estimators of Step 1.
In Step 1, the ML equations for β and θ are defined by
XTV−1(y−Xβ) = 0, (y−Xβ)TV−1∂V ∂θ` V−1(y−Xβ)−tr V−1∂V ∂θ` = 0, `= 1,2,
If some fitted values ˆydj =xTdjβˆ are unusually different from the corresponding observed values ydj, then we have the indication of apparent outliers in the data. To handle outliers in the response values, they proposed robustified ML equations in the form
XTV−1U12Ψ(r) = 0, Ψ(r)TU12V−1∂V ∂θ` V−1U12Ψ(r)−tr KV−1∂V ∂θ` = 0, ` = 1,2, where r = U−12(y − Xβ), U = diag(V), K = E{ψ2 b(X)}In with X ∼ N(0,1), Ψ(u) = (ψb(u1), ψb(u2), . . .)T with ψb(u) = u·min(1,|u|b ) and b = 1.345.
The complete algorithm for robust estimation of β and θ is:
1. Choose starting values β(0) and θ(0). Set m= 0.
2. (a) Calculate β(m+1). (b) Calculate θ(m+1). (c) Set m=m+ 1.
3. Repeat until convergence is achieved. Denote the estimates at convergence as ˆβSR and ˆθSR.
In Step 2, the predictor ˆuSR is obtained using the estimators of β and θ obtained in Step 1 and solving the following robustified equation
$$\hat\sigma_e\,Z^T\Psi\{(y - X\beta - Zu)/\hat\sigma_e\} - \hat\sigma_u\,\Psi(u/\hat\sigma_u) = 0.$$
Sinha and Rao [28] proposed to solve this equation using the Newton-Raphson method. Finally, the Robust EBLUPs (REBLUPs) of the small area means are given by
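$$\hat{\bar Y}_d^{SR} = \frac{1}{N_d}\left(\sum_{j\in s_d} y_{dj} + \sum_{j\in s_d^c} \hat y_{dj}^{SR}\right), \quad d = 1,\dots,D,$$
where $\hat y_{dj}^{SR} = x_{dj}^T\hat\beta^{SR} + \hat u_d^{SR}$.
Some comments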
The Newton-Raphson procedure is a commonly used iterative method for the solution of nonlinear equations. To solve the equation $h(t) = 0$, at each iteration the function $h$ is linearized, in the sense that it is replaced by its Taylor expansion of order one about the current approximation. Let us denote by $t_m$ the $m$-th approximation. Then the next value $t_{m+1}$ is the solution of
$$h(t_m) + h'(t_m)(t_{m+1} - t_m) = 0,$$
that is,
$$t_{m+1} = t_m - \frac{h(t_m)}{h'(t_m)}.$$
If the procedure converges, the convergence is very fast; but it is not guaranteed to converge. If $h'$ is not bounded away from zero, the denominator may become very small, making the sequence $t_m$ unstable unless the initial value $t_0$ is very near to the solution (Maronna et al. [29]).
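The iteration and its failure mode can be sketched for a scalar equation as follows. This is a generic illustration under stated assumptions, not the multivariate solver used for the REBLUP equation: the function name, tolerance, iteration cap and slope threshold are all hypothetical choices.

```python
import math

def newton_raphson(h, h_prime, t0, tol=1e-10, max_iter=50, min_slope=1e-12):
    """Solve h(t) = 0 by t_{m+1} = t_m - h(t_m) / h'(t_m).

    Returns (root, number of iterations). Raises RuntimeError if the
    derivative gets too close to zero or the iteration cap is reached.
    """
    t = t0
    for iteration in range(1, max_iter + 1):
        slope = h_prime(t)
        if abs(slope) < min_slope:
            raise RuntimeError("derivative too close to zero; iteration unstable")
        t_next = t - h(t) / slope
        if abs(t_next - t) < tol:
            return t_next, iteration
        t = t_next
    raise RuntimeError("no convergence within max_iter iterations")

if __name__ == "__main__":
    # Well-behaved case: h(t) = t^2 - 2, started at t0 = 1.
    root, n_iter = newton_raphson(lambda t: t * t - 2.0, lambda t: 2.0 * t, 1.0)
    print(root, n_iter)

    # Unstable case: h(t) = arctan(t) has a derivative that is not bounded
    # away from zero, and a start at t0 = 1.5 is too far from the root at 0.
    try:
        newton_raphson(math.atan, lambda t: 1.0 / (1.0 + t * t), 1.5)
    except RuntimeError as err:
        print("failed:", err)
```

The second call shows the behaviour described above: the iterates overshoot with growing magnitude until the derivative is numerically negligible, which is why a starting value near the solution matters.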
6.3 Procedure using RH3
We propose a two-step procedure that provides robust estimators of model parameters based on the robust estimators of variance components given in (5).
• Step 1. Obtain the estimator $\hat\theta^{RH3}$ using the robustified version of Henderson method III given in (25) and (26).
• Step 2. Obtain the estimator $\hat\beta^{RH3}$ and the predictor $\hat u^{RH3}$ as in Sinha and Rao [28], by solving the robustified normal equations (28).
Then the new robust EBLUPs of the small area means, called here RH3-EBLUPs, are given by
$$\hat{\bar Y}_d^{RH3} = \frac{1}{N_d}\left(\sum_{j\in s_d} y_{dj} + \sum_{j\in s_d^c} \hat y_{dj}^{RH3}\right), \quad d = 1,\dots,D,$$
where $\hat y_{dj}^{RH3} = x_{dj}^T\hat\beta^{RH3} + \hat u_d^{RH3}$.
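The predictor of an area mean combines the observed responses of sampled units with model predictions for the non-sampled units. A minimal sketch for a single area follows; the function and argument names are hypothetical, and the estimates passed in are assumed to come from whichever fitting procedure is used (for example Steps 1 and 2 above).

```python
def predicted_area_mean(sampled_y, x_nonsampled, beta_hat, u_hat_d):
    """Predict the mean of one area d.

    sampled_y    : observed responses y_dj for units in the sample s_d
    x_nonsampled : covariate rows x_dj for units outside the sample
    beta_hat     : estimated regression coefficients
    u_hat_d      : predicted random effect of area d
    """
    # Model prediction x_dj^T beta_hat + u_hat_d for each non-sampled unit.
    predictions = [
        sum(b * x for b, x in zip(beta_hat, row)) + u_hat_d
        for row in x_nonsampled
    ]
    n_population = len(sampled_y) + len(x_nonsampled)   # N_d
    return (sum(sampled_y) + sum(predictions)) / n_population

if __name__ == "__main__":
    # Illustrative numbers only.
    mean_d = predicted_area_mean(
        sampled_y=[2.0, 3.0],
        x_nonsampled=[[1.0, 1.0], [1.0, 2.0]],
        beta_hat=[0.5, 1.0],
        u_hat_d=0.25,
    )
    print(mean_d)
```

Because the sampled responses enter unchanged, robustness of the predictor comes entirely through the estimates used for the non-sampled part.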
6.4
Simulation experiment
In this simulation study we generated data from $D = 30$ groups. Half of the groups had sample size $n_d = 10$ and the other half $n_d = 20$, giving a total sample size of $n = 450$. We considered $p = 4$ auxiliary variables, generated from normal distributions with means and standard deviations taken from a real data set from the Australian Agricultural and Grazing Industries Survey.
More concretely, the values of the four auxiliary variables were generated respectively as $X_1 \sim N(3.31, 0.68)$, $X_2 \sim N(1.74, 1.23)$, $X_3 \sim N(1.70, 1.65)$ and $X_4 \sim N(2.41, 2.61)$.
The number of Monte Carlo samples was $L = 200$. In each replicate, group effects were generated as $u_d \overset{iid}{\sim} N(0, \sigma_u^2)$ with $\sigma_u^2 = 1$. Similarly, individual errors were generated as $e_{dj} \overset{iid}{\sim} N(0, \sigma_e^2)$ with $\sigma_e^2 = 1$. Finally, model responses $y_{dj}$, $j = 1,\dots,n_d$, $d = 1,\dots,D$, were generated from model (1). Using each Monte Carlo sample, the two models (10) and (13) were fitted robustly using, respectively, the M-S estimator of Maronna and Yohai [31] and the PSC method of Peña and Yohai [30]. We assume that outliers are representative and use the correction factor proposed by Joingo et al. [?]. First, data are generated without contamination. Then contamination is introduced according to the following scenarios:
• Type 0. No contamination.
• Type 1. Outlying areas: for each selected outlying domain, we replace all of its sample observations $y_{dj}$ by the constant $C_1 = \bar Y_d + c\sqrt{\sum_{j=1}^{N_d}(y_{dj} - \bar Y_d)^2/N_d}$, where $c = 4$ and $\bar Y_d = N_d^{-1}\sum_{j=1}^{N_d} y_{dj}$.
• Type 2. Outlying individuals within areas: we replace some observations within selected domains by $C_1$ and some others by $C_2 = \bar Y_d - c\sqrt{\sum_{j=1}^{N_d}(y_{dj} - \bar Y_d)^2/N_d}$.
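The data-generating step and the Type 1 contamination can be sketched as below. Several elements are assumptions made only for illustration: the coefficient vector passed as `beta` is hypothetical (its true value is not given in this section), model (1) is taken to be the nested error model without further terms, and the constants $C_1$ and $C_2$ are computed from whatever list of domain values is supplied, whereas the text defines them from the $N_d$ population values.

```python
import math
import random

# (mean, standard deviation) of the four auxiliary variables, from the text.
AUX_MEANS_SDS = [(3.31, 0.68), (1.74, 1.23), (1.70, 1.65), (2.41, 2.61)]
# D = 30 groups: half with n_d = 10, half with n_d = 20 (n = 450 in total).
DOMAIN_SIZES = [10] * 15 + [20] * 15

def generate_sample(beta, sigma_u=1.0, sigma_e=1.0, rng=random):
    """One Monte Carlo sample: a list of domains, each a list of (x, y)."""
    data = []
    for n_d in DOMAIN_SIZES:
        u_d = rng.gauss(0.0, sigma_u)            # group effect
        rows = []
        for _ in range(n_d):
            x = [rng.gauss(m, s) for m, s in AUX_MEANS_SDS]
            e_dj = rng.gauss(0.0, sigma_e)       # individual error
            y = sum(b * xi for b, xi in zip(beta, x)) + u_d + e_dj
            rows.append((x, y))
        data.append(rows)
    return data

def contamination_constants(values, c=4.0):
    """C1 and C2: domain mean plus/minus c times the root mean squared deviation."""
    mean = sum(values) / len(values)
    spread = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean + c * spread, mean - c * spread

def contaminate_area(rows, c=4.0):
    """Type 1 scenario: replace every response in the domain by C1."""
    c1, _ = contamination_constants([y for _, y in rows], c)
    return [(x, c1) for x, _ in rows]

if __name__ == "__main__":
    rng = random.Random(2011)
    sample = generate_sample(beta=[1.0, 1.0, 1.0, 1.0], rng=rng)  # hypothetical beta
    sample[0] = contaminate_area(sample[0])
    print(len(sample), sum(len(rows) for rows in sample))
```

A Type 2 scenario would follow the same pattern, replacing only a chosen subset of responses within a domain by $C_1$ or $C_2$.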
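To compare the predictors of the small area means, we use the following measures, averaged over areas, where $\hat{\bar Y}_d^{(t)}$ denotes the predictor of $\bar Y_d$ obtained in Monte Carlo replicate $t$:
Average Absolute Relative Bias (ARB):
$$ARB = \frac{1}{D}\sum_{d=1}^{D}\left|\frac{1}{L}\sum_{t=1}^{L}\frac{\hat{\bar Y}_d^{(t)} - \bar Y_d}{\bar Y_d}\right|$$
Average Relative Root MSE (RRMSE):
$$RRMSE = \frac{1}{D}\sum_{d=1}^{D}\frac{MSE(\hat{\bar Y}_d)^{1/2}}{\bar Y_d}$$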
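A minimal sketch of how the two measures could be computed from simulation output is given below. It assumes that the MSE of each predictor is approximated by the empirical mean squared error over the replicates and that the true area means are positive and fixed across replicates; the function name and data layout are hypothetical.

```python
import math

def arb_and_rrmse(predictions, true_means):
    """Average absolute relative bias and average relative root MSE.

    predictions[t][d] : predictor of the mean of area d in replicate t
    true_means[d]     : true mean of area d
    """
    n_rep = len(predictions)      # L
    n_areas = len(true_means)     # D
    arb = 0.0
    rrmse = 0.0
    for d in range(n_areas):
        errors = [predictions[t][d] - true_means[d] for t in range(n_rep)]
        relative_bias = sum(errors) / n_rep / true_means[d]
        empirical_mse = sum(e * e for e in errors) / n_rep
        arb += abs(relative_bias)
        rrmse += math.sqrt(empirical_mse) / true_means[d]
    return arb / n_areas, rrmse / n_areas

if __name__ == "__main__":
    # Two replicates, two areas; illustrative numbers only.
    preds = [[12.0, 20.0], [12.0, 20.0]]
    print(arb_and_rrmse(preds, [10.0, 20.0]))
```

Note that positive and negative errors cancel within an area before the absolute value is taken, so ARB can be zero for an unbiased but highly variable predictor, whereas RRMSE cannot.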
Method        Bias                MSE
          σ_u²     σ_e²      σ_u²     σ_e²
ML       -0.044    0.070     0.160    0.125
RML      -0.125    0.141     0.247    0.195
RH3      -0.174    0.075     0.279    0.142
Table 6: Scenario Type 0: No contamination
Parameter        ML               REML               RH3
            Bias     MSE      Bias     MSE      Bias     MSE
β0         -0.037   0.264    -0.033   0.312    -0.034   0.321
β1          0.316   0.014     0.314   0.015     0.312   0.014
β2          0.001   0.012     0.001   0.013     0.003   0.013
β3         -0.007   0.004    -0.006   0.005    -0.008   0.005
Table 7: Scenario Type 0: No contamination
6.5 Discussion
In this part we compare two ways of estimating the regression coefficients in the linear model with random effects. These estimators are then used to derive robust predictors of the small area means. Our simulation studies show that the new robust procedure RH3 gives the best results in the case of outlying areas, while retaining good efficiency when there is no contamination.
Method        ARB      RRMSE
EBLUP         0.3667   0.3825
REBLUP        0.4015   0.5056
RH3-EBLUP     0.3843   0.4884
Method        Bias                MSE
          σ_u²     σ_e²      σ_u²     σ_e²
ML        2.346   -0.022     6.248    0.119
RML       0.838    0.335     1.430    0.362
RH3       0.437   -0.167     0.586    0.227
Table 9: Scenario Type 1: One outlying domain.
Parameter        ML               REML               RH3
            Bias     MSE      Bias     MSE      Bias     MSE
β0          0.250   0.319     0.092   0.308     0.087   0.306
β1          0.318   0.015     0.324   0.016     0.324   0.016
β2         -0.013   0.012    -0.005   0.013    -0.006   0.013
β3         -0.003   0.004    -0.007   0.005    -0.008   0.005
Table 10: Scenario Type 1: One outlying domain.
Method        ARB      RRMSE
EBLUP         0.4161   0.5301
REBLUP        0.4192   0.5251
RH3-EBLUP     0.4193   0.5248
Table 11: Scenario Type 1: One outlying domain.
Method        Bias                MSE
          σ_u²     σ_e²      σ_u²     σ_e²
ML        5.027   -0.267    26.706    0.186
RML       3.205    0.478    15.848    0.541
RH3       2.386   -0.319     6.076    0.277
Parameter        ML               REML               RH3
            Bias     MSE      Bias     MSE      Bias     MSE
β0          0.637   0.688     0.336   0.453     0.307   0.459
β1          0.304   0.015     0.317   0.018     0.318   0.018
β2         -0.016   0.013    -0.009   0.014    -0.008   0.015
β3         -0.009   0.004    -0.012   0.005    -0.010   0.005
Table 13: Scenario Type 1: Two outlying domains.
Method        ARB      RRMSE
EBLUP         0.4162   0.6296
REBLUP        0.4316   0.5652
RH3-EBLUP     0.4338   0.5502
Table 14: Scenario Type 1: Two outlying domains.
Method        Bias                MSE
          σ_u²     σ_e²      σ_u²     σ_e²
ML       -0.095    0.959     0.173    1.046
RML      -0.159    0.363     0.296    0.349
RH3      -0.216    0.364     0.293    0.322
Table 15: Scenario Type 2: 10% outlying observations within groups.
Parameter        ML               REML               RH3
            Bias     MSE      Bias     MSE      Bias     MSE
β0         -0.014   0.342    -0.028   0.337    -0.018   0.323
β1          0.316   0.016     0.315   0.016     0.314   0.015
β2          0.001   0.009     0.002   0.014     0.004   0.009
β3         -0.006   0.005    -0.006   0.005    -0.008   0.005
Table 16: Scenario Type 2: 10% outlying observations within groups.
Method        ARB      RRMSE
EBLUP         0.4417   0.5286
REBLUP        0.3963   0.5002
RH3-EBLUP     0.3849   0.4881
Table 17: Scenario Type 2: 10% outlying observations within groups.
Method        Bias                MSE
          σ_u²     σ_e²      σ_u²     σ_e²
ML       -0.180    1.912     0.184    3.783
RML      -0.214    0.604     0.293    0.567
RH3      -0.232    0.575     0.286    0.554
Table 18: Scenario Type 2: 20% outlying observations within groups.
Parameter        ML               REML               RH3
            Bias     MSE      Bias     MSE      Bias     MSE
β0          0.005   0.367    -0.028   0.352    -0.018   0.353
β1          0.306   0.020     0.316   0.018     0.314   0.017
β2         -0.006   0.015    -0.002   0.015    -0.001   0.015
β3         -0.007   0.006    -0.008   0.005    -0.009   0.005
Table 19: Scenario Type 2: 20% outlying observations within groups.
Method        ARB      RRMSE
EBLUP         0.4265   0.5440
REBLUP        0.3895   0.4920
RH3-EBLUP     0.3825   0.4845
References
[1] Banerjee M, Frees EW (1997) Influence diagnostics for linear longitudinal models. Journal of the American Statistical Association 92:999–1005.
[2] Christensen R, Pearson LM, Johnson W (1992) Case-deletion diagnostics for mixed models. Technometrics 34:38–45.
[3] Demidenko E (2004) Mixed models. Theory and applications. Wiley.
[4] Fellner WH (1986) Robust estimation of variance components. Technometrics 28:51–60.
[5] Galpin JS, Zewotir T (2005) Influence diagnostics for linear mixed models. Journal of Data Science 3:153–177.
[6] Galpin JS, Zewotir T (2007) A unified approach on residuals, leverages and outliers in the linear mixed models. Test 16:58–75.
[7] Henderson CR (1953) Estimation of variance and covariance components. Biometrics 9:226–252.
[8] Henderson CR (1975) Best linear unbiased estimation and prediction under a selection model. Biometrics 31:423–447.
[9] Huggins RM (1993) A robust approach to the analysis of repeated measures. Biometrics 49:715–720.
[10] Jiang J (2007) Linear and generalized linear models and their applications. Springer Series in Statistics.
[11] Laird NM, Ware JH (1982) Random-effects models for longitudinal data. Biometrics 38:963–974.
[12] Maronna R, Martin D, Yohai V (2006) Robust statistics. Theory and methods. Wiley.
[13] Maronna RA, Yohai VJ (2000) Robust regression with both continuous and categorical predictors. Journal of Statistical Planning and Inference 89:197–214.
[15] Patterson HD, Thompson R (1971) Recovery of inter-block information when block sizes are unequal. Biometrika 58:545–554.
[16] Peña D, Yohai VJ (1999) A fast procedure for outlier diagnostics in large regression problems. Journal of the American Statistical Association 94:434–445.
[17] Richardson AM (1995) Some problems in estimation in mixed linear models. PhD thesis, Australian National University, Canberra.
[18] Richardson AM (1997) Bounded influence estimation in the mixed linear model. Journal of the American Statistical Association 92:154–161.
[19] Richardson AM, Welsh AH (1995) Robust restricted maximum likelihood in mixed linear models. Biometrics 51:1429–1439.
[20] Royall RM (1976) The linear least squared prediction approach to two-stage sampling. Journal of the American Statistical Association 71:657–664.
[21] Searle SR, Casella G, McCulloch CE (1992) Variance components. Wiley Series in Probability and Mathematical Statistics.
[22] Vangeneugden T, Laenen A, Geys H, Renard D, Molenberghs G (2004) Applying linear mixed models to estimate reliability in clinical trial data with repeated measurements. Controlled Clinical Trials 25:13–30.
[23] Verbeke G, Molenberghs G (2009) Linear mixed models for longitudinal data. Springer.
[24] Wellenius GA, Yeh GY, Coull BA, Suh HH, Phillips RS, Mittlemann MA (2007) Effects of ambient air pollution on functional status in patients with chronic congestive heart failure: a repeated-measures study. Environmental Health 6:26.
[25] Jennrich RI, Schluchter MD (1986) Unbalanced repeated-measures models with structured covariance matrices. Biometrics 42:805–820.
[26] Lindstrom MJ, Bates DM (1988) Newton-Raphson and EM algorithm for linear mixed-effects model for repeated-measures. Journal of the American Statistical Association 83:1014–1022.
[27] Chambers RL (1986) Outlier robust finite population estimation. Journal of the American Statistical Association 81:1063–1069.
[28] Sinha SK, Rao JNK (2009) Robust small area estimation. The Canadian Journal of Statistics 37:381–399.
[29] Maronna R, Martin D, Yohai V (2006) Robust statistics. Theory and methods. Wiley.
[30] Peña D, Yohai VJ (1999) A fast procedure for outlier diagnostics in large regression problems. Journal of the American Statistical Association, Theory and Methods 94:434–445.
[31] Maronna RA, Yohai VJ (2000) Robust regression with both continuous and categorical predictors. Journal of Statistical Planning and Inference 89:197–214.
[32] Henderson CR (1975) Best linear unbiased estimation and prediction under a selection model. Biometrics 31:423–447.