Simulation experiment - Small area estimation methods under complex sampling designs

This section compares the performance of direct, calibration and small area estimation methods when the sample is drawn by cut-off sampling. Specifically, we compare the two calibration estimators proposed in Section 4.3.1, LCAL and LCALN, the naïve HT direct estimator (4.11), which ignores the cut-off sampling, and the EBLUP of the domain mean $\bar{Y}_i$.

Calibration estimators preserve good design properties even when the model does not hold. Under the model, the EBLUP of a linear parameter is known to be approximately the most efficient linear unbiased estimator, so here we compare its design-based properties with those of the calibration estimators. For this reason, we run design-based simulations: we generate one population vector $y$, keep it fixed and repeatedly draw samples from it. The population vector $y$ is generated from the nested error model in (4.38). To allocate the units to the sets of included and excluded units, we generate a binary variable $c_{ij}$ for each $j = 1, \ldots, N_i$ and $i = 1, \ldots, m$, where $c_{ij} = 0$ if $j \in U_{iE}$ and $c_{ij} = 1$ otherwise. In each Monte Carlo (MC) replicate, we draw a simple random sample without replacement from the units with $c_{ij} = 1$, independently for each domain $i$, $i = 1, \ldots, m$.
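The design-based scheme just described can be sketched in a few lines. This is only an illustration in Python, not the R code used for the simulations: the single toy domain, the threshold rule that defines the excluded units and the helper name `design_based_mc` are all assumptions made for the example. The population is generated once and kept fixed, and only the sample changes between replicates.

```python
import random
import statistics

def design_based_mc(y, c, n, replicates, seed=0):
    """Draw `replicates` simple random samples without replacement of size n
    from the units with c == 1 and return the naive sample mean of each."""
    rng = random.Random(seed)
    included = [yj for yj, cj in zip(y, c) if cj == 1]
    return [statistics.fmean(rng.sample(included, n)) for _ in range(replicates)]

# Toy fixed population for one domain (illustrative values only).
pop_rng = random.Random(1)
y = [pop_rng.gauss(10.0, 4.0) for _ in range(250)]

# Hypothetical cut-off rule: units with small y can never be sampled.
c = [1 if yj > 8.0 else 0 for yj in y]

true_mean = statistics.fmean(y)  # target: mean over ALL units, excluded ones too
estimates = design_based_mc(y, c, n=10, replicates=1000)
design_bias = statistics.fmean(estimates) - true_mean
print(f"true mean {true_mean:.2f}, MC design bias {design_bias:.2f}")
```

Because the excluded units are never observed, the naive mean targets the mean of the included units only, which is why an estimator that ignores the cut-off is expected to be design biased.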

The simulations were implemented in the statistical software environment R (R Development Core Team, 2016) using the packages sampling (Tillé and Matei, 2016), nlme (Pinheiro et al., 2017) and sae (Molina and Marhuenda, 2015). The sampling package contains functions for drawing samples and obtaining calibration estimators. The nlme package fits Gaussian linear and nonlinear mixed-effects models. The sae package contains functions for small area estimation.

We consider a population of $N = 20{,}000$ individuals divided into $m = 80$ domains of equal size $N_i = 250$, $i = 1, \ldots, m$. We generate values $x_{q,ij}$ of three auxiliary variables, $q = 1, 2, 3$, each drawn from a $N(3, 2)$ distribution. The variables $c_{ij}$ are generated independently for each $j$ and $i$ from a Bernoulli distribution with probability $p_{ij} = \Pr(c_{ij} = 1)$, which is related to the auxiliary variables $x_{q,ij}$ through a logit model, that is,

$$p_{ij} = \frac{\exp(x_{ij}'\zeta)}{1 + \exp(x_{ij}'\zeta)}, \quad j = 1, \ldots, N_i, \; i = 1, \ldots, m.$$

We choose $\zeta = (0.75, 1, 1)'$. With these model parameters, the set of included units, that is, those with $c_{ij} = 1$, represents approximately half of the total population.

We generate the values of the target variable $y_{ij}$ from those of the auxiliary variables $(x_{1,ij}, x_{2,ij}, x_{3,ij})'$ in such a way that the coefficient of determination is approximately 0.5. To achieve this, the vector of regression coefficients is taken as $\beta = (1, 1.5, 1)'$, and the standard deviations (sd) of the domain effects and of the errors are taken as $\sigma_u = 0.75$ and $\sigma_e = 4$, respectively.
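As a rough check of the stated coefficient of determination, the variance explained by the auxiliary variables can be compared with the total variance. The calculation below rests on three assumptions that the text does not state: the three auxiliary variables are mutually independent, the 2 in $N(3, 2)$ is a standard deviation, and the domain effect is counted as unexplained variation.

```latex
\operatorname{Var}(x_{ij}'\beta) = 2^2\,(1^2 + 1.5^2 + 1^2) = 17,
\qquad
\sigma_u^2 + \sigma_e^2 = 0.75^2 + 4^2 = 16.5625,
\qquad
R^2 \approx \frac{17}{17 + 16.5625} \approx 0.51 .
```

Reading the 2 as a variance instead would give $8.5 / (8.5 + 16.5625) \approx 0.34$, so the standard-deviation reading is the one that agrees with a coefficient of determination of about 0.5.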

We draw $L = 1{,}000$ samples from the units with $c_{ij} = 1$, $j \in U_i$, by independent simple random sampling without replacement (srswor) of size $n_i$ within each domain $i = 1, \ldots, m$, taking $n_i = 5$ for $1 \le i \le 20$, $n_i = 10$ for $21 \le i \le 40$, $n_i = 30$ for $41 \le i \le 60$ and

With the sample data from the $\ell$-th MC replicate, we compute the direct HT estimator in (4.11), the calibration estimators with calibration at the domain level (LCAL) and at the population level (LCALN), and the EBLUP of $\bar{Y}_i$. We do not report results for the calibration after reweighting (RWCAL) and generalized calibration (GCAL) estimators because they were unstable in our simulations. To obtain the new weights $h_{ij}$ in the calibration estimators, we use the function calib from the package sampling (Tillé and Matei, 2016). The EBLUPs are computed using REML estimates of the model parameters $\sigma_v^2$, $\sigma_e^2$ and $\beta$.
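To make the idea of calibrated weights concrete, the following Python sketch implements generic linear calibration (chi-square distance with unit scale factors): the weights take the form $h_j = d_j (1 + x_j'\lambda)$, with $\lambda$ chosen so that the weighted sample totals of the auxiliary variables match known totals. It is not the calib function of the sampling package, and it is not necessarily the exact LCAL or LCALN estimator of Section 4.3.1. The helper names `solve` and `linear_calibration` and all numbers are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination
    with partial pivoting (a is a small square matrix)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def linear_calibration(d, x, totals):
    """Return weights h_j = d_j * (1 + x_j' lam) such that
    sum_j h_j x_j equals the known totals."""
    p = len(totals)
    # t = sum_j d_j x_j x_j'
    t = [[sum(dj * xj[a] * xj[b] for dj, xj in zip(d, x)) for b in range(p)]
         for a in range(p)]
    # Gap between the known totals and their design-weighted estimates.
    gap = [totals[a] - sum(dj * xj[a] for dj, xj in zip(d, x)) for a in range(p)]
    lam = solve(t, gap)
    return [dj * (1.0 + sum(l * v for l, v in zip(lam, xj))) for dj, xj in zip(d, x)]

# Toy domain (illustrative values): 250 units, sample of 5, design weight 250 / 5.
d = [50.0] * 5
x = [[1.0, 2.1], [1.0, 3.4], [1.0, 1.7], [1.0, 4.0], [1.0, 3.3]]  # intercept, one auxiliary
totals = [250.0, 760.0]  # assumed known domain size and auxiliary total
h = linear_calibration(d, x, totals)
calibrated = [sum(hj * xj[a] for hj, xj in zip(h, x)) for a in range(2)]
print(calibrated)  # weighted sample totals after calibration
```

With the calibrated weights in hand, a domain mean would be estimated by the weighted sum of the sampled $y$ values divided by the domain size. The linear method can produce negative weights in small samples, which is one reason calibration estimators may be unstable when $n_i$ is small.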

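Let $\hat{\bar{Y}}_i$ be a generic estimator (HT, LCAL, LCALN or EBLUP) of $\bar{Y}_i$ and $\hat{\bar{Y}}_i^{(b)}$ its value obtained in MC replicate $b$. We evaluate the performance of the estimators in terms of relative bias (RB) and relative root MSE (RRMSE) under the design, approximated empirically as

$$\mathrm{RB}_\pi(\hat{\bar{Y}}_i) = 100\, \frac{B^{-1} \sum_{b=1}^{B} \left(\hat{\bar{Y}}_i^{(b)} - \bar{Y}_i\right)}{\bar{Y}_i}, \qquad \mathrm{RRMSE}_\pi(\hat{\bar{Y}}_i) = 100\, \frac{\sqrt{B^{-1} \sum_{b=1}^{B} \left(\hat{\bar{Y}}_i^{(b)} - \bar{Y}_i\right)^2}}{\bar{Y}_i}.$$

Averages across domains of the absolute RB and of the RRMSE are also calculated as

$$\mathrm{ARB} = m^{-1} \sum_{i=1}^{m} \left|\mathrm{RB}_\pi(\hat{\bar{Y}}_i)\right|, \qquad \mathrm{RRMSE} = m^{-1} \sum_{i=1}^{m} \mathrm{RRMSE}_\pi(\hat{\bar{Y}}_i).$$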
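The performance measures above translate directly into code. The sketch below is a plain Python illustration with hypothetical function names (`rb_rrmse`, `averages`) and made-up numbers. It takes the values of an estimator over the MC replicates for each domain, together with the true domain mean.

```python
import math

def rb_rrmse(estimates, true_mean):
    """Percent relative bias and relative root MSE for one domain.

    estimates: values of the estimator over the MC replicates.
    true_mean: true domain mean of the fixed population.
    """
    b = len(estimates)
    bias = sum(e - true_mean for e in estimates) / b
    mse = sum((e - true_mean) ** 2 for e in estimates) / b
    return 100.0 * bias / true_mean, 100.0 * math.sqrt(mse) / true_mean

def averages(per_domain):
    """Average absolute RB and average RRMSE over a list of (RB, RRMSE) pairs."""
    m = len(per_domain)
    arb = sum(abs(rb) for rb, _ in per_domain) / m
    avg_rrmse = sum(rrmse for _, rrmse in per_domain) / m
    return arb, avg_rrmse

# Two toy domains with two replicates each (illustrative numbers only).
per_domain = [
    rb_rrmse([9.0, 11.0], 10.0),   # unbiased but variable: RB = 0, RRMSE = 10
    rb_rrmse([22.0, 22.0], 20.0),  # biased with no variance: RB = 10, RRMSE = 10
]
arb, avg_rrmse = averages(per_domain)
print(arb, avg_rrmse)  # ARB = 5.0, average RRMSE = 10.0
```

The two toy domains have the same RRMSE for opposite reasons, one purely from variance and the other purely from bias, which is why bias and MSE are reported separately.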

Figure 4.1: Percent RB (left) and RRMSE (right) of HT, LCAL, LCALN and EBLUP estimators of domain mean, $\bar{Y}_i$, for each area.

[Figure: left panel, relative bias (%) by area; right panel, relative MSE (%) by area; series HT, LCAL, LCALN and EBLUP.]

Figure 4.1 shows the percent RB (left) and RRMSE (right) of the HT, LCAL, LCALN and EBLUP estimators of the mean $\bar{Y}_i$ for each area $i$ (x-axis), under srswor within the included elements ($c_{ij} = 1$) in each area $i$. The HT direct estimator obtained ignoring the cut-off sampling has large design bias and MSE for all areas. The LCALN estimator also shows a large bias for several areas, which may be because it does not account for the area effect: both the minimization and the calibration restriction are set at the national level. The LCAL estimator is the best in terms of bias; note that this estimator fits a different regression parameter $\beta_i$ for each area. Moreover, the right panel of Figure 4.1 shows very large RRMSEs for the two calibration estimators, LCAL and LCALN, in the areas with the smallest sample sizes ($n_i \le 20$). The EBLUP exhibits the best results in terms of MSE while keeping a small design bias. In fact, the difference in bias between the EBLUP and LCAL estimators is small.

Table 4.1: Averages across areas of percent absolute RB and RRMSE, and average $B_\pi^2/\mathrm{MSE}_\pi$ (in percentage), for HT, LCAL, LCALN and EBLUP.

Method    ARB     RRMSE   $B_\pi^2/\mathrm{MSE}_\pi$
HT        21.82   24.45   98.32
LCAL       2.96   27.33    2.48
LCALN      8.97   30.44    0.04
EBLUP      3.13    4.56    0.18

Table 4.1 displays the ARB, the average RRMSE and the average ratio of squared bias to MSE under the design (in percentage) for the considered estimators. Again, HT exhibits a large design bias, and its $B_\pi^2/\mathrm{MSE}_\pi$ ratio is practically 100%, whereas the calibration estimators and the EBLUP reduce the bias considerably. The EBLUP again shows the best performance in terms of efficiency. The LCALN estimator has the smallest $B_\pi^2/\mathrm{MSE}_\pi$ ratio, but only because its MSE is large, so we consider that LCAL performs better of the two calibration estimators. Overall, the EBLUP clearly performs best when both MSE and bias under the design are considered.
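The ratio in the last column of Table 4.1 can be read through the usual decomposition of the design MSE into squared bias and variance. The symbols $E_\pi$ and $V_\pi$ for design expectation and design variance are notation assumed here for illustration.

```latex
\mathrm{MSE}_\pi(\hat{\bar{Y}}_i)
  = E_\pi\!\left(\hat{\bar{Y}}_i - \bar{Y}_i\right)^2
  = B_\pi^2(\hat{\bar{Y}}_i) + V_\pi(\hat{\bar{Y}}_i),
\qquad
100\,\frac{B_\pi^2(\hat{\bar{Y}}_i)}{\mathrm{MSE}_\pi(\hat{\bar{Y}}_i)}
  = 100\left(1 - \frac{V_\pi(\hat{\bar{Y}}_i)}{\mathrm{MSE}_\pi(\hat{\bar{Y}}_i)}\right).
```

A ratio close to 100% means the error is almost entirely bias. A ratio close to 0% means the error is mostly variance, which can happen either because the bias is small or because the variance is very large, so the ratio has to be read together with the RRMSE.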
