Simulation study with non-informative selection

3.6 Simulation experiments

3.6.1 Simulation study with non-informative selection

We consider the same simulation setup as in Chapter 2, where the population contains $N = 20{,}000$ units distributed into $m = 80$ domains, with $N_i = 250$ units in each domain $i = 1, \ldots, m$. We consider two dummy auxiliary variables, $x_q \in \{0, 1\}$, $q = 1, 2$, whose values are generated as $x_{q,ij} \sim \mathrm{Bern}(p_{qi})$, $q = 1, 2$, with success probabilities given by $p_{1i} = 0.3 + 0.5\,i/m$ and $p_{2i} = 0.2$, $i = 1, \ldots, m$. The values of the auxiliary variables $x_{q,ij}$ are kept fixed across simulations. The vector of true regression coefficients is taken as $\beta = (3, 0.03, -0.04)'$, and the domain effects variance and error variance are, respectively, $\sigma_v^2 = 0.15^2$ and $\sigma_e^2 = 0.5^2$.
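As an illustration of the setup just described, the following minimal sketch generates the two dummy auxiliary variables once, so that they can be held fixed across replicates. The seed, the array layout (one row per domain) and the variable names are assumptions made for this example, not part of the original study.

```python
import numpy as np

rng = np.random.default_rng(2024)   # arbitrary seed (assumption), for reproducibility only

m, N_i = 80, 250                    # domains and units per domain, so N = 20,000
i = np.arange(1, m + 1)
p1 = 0.3 + 0.5 * i / m              # success probability of x1 grows with the domain index
p2 = np.full(m, 0.2)                # success probability of x2 is constant

# One row per domain, one column per unit; generated once and then kept fixed.
x1 = rng.binomial(1, p1[:, None], size=(m, N_i))
x2 = rng.binomial(1, p2[:, None], size=(m, N_i))

print(x1.shape, x1.mean(), x2.mean())
```

Under these probabilities, the overall share of ones is about 0.55 for the first variable and 0.2 for the second.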

In each of $K = 1{,}000$ Monte Carlo (MC) simulations, we construct a population vector $y^{(k)}$, whose elements $Y_{ij}^{(k)}$ are generated from the nested error model (3.1)-(3.2). Using the population vector $y^{(k)}$, we calculate the true values of the domain parameters $F_{\alpha i}^{(k)}$, $i = 1, \ldots, m$. We take the poverty line as $z = 12$, which is approximately 0.6 times the median of a population of incomes $\{E_{ij};\ j = 1, \ldots, N_i,\ i = 1, \ldots, m\}$, where $E_{ij} = \exp(Y_{ij})$ with $Y_{ij}$ generated from the nested error model as described above. For each MC population $k = 1, \ldots, K$, we draw a sample $s^{(k)}$. We use independent Poisson sampling within each domain $i$, with the inclusion probability of individual $j$ in the sample from domain $i$ taken as $\pi_{ij} \sim \mathrm{Beta}(\alpha_1, \alpha_2)$. We set $\alpha_1 = 2.5$ and select $\alpha_2$ to achieve a specified expected domain sample size, $\bar{n}_i = K^{-1} \sum_{k=1}^{K} n_i^{(k)}$, where $n_i^{(k)}$ is the realized sample size in domain $i$ in the $k$-th MC replicate. We consider three expected domain sample sizes: $\bar{n}_i = 25, 50, 75$. To achieve approximately those domain sample sizes, we take $\alpha_2 = 25$, $\alpha_2 = 10$ and $\alpha_2 = 5.5$, respectively. We consider the proposed sample sizes to be small enough, since we are estimating small proportions.
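The data-generating and sampling steps above can be sketched as follows. This is only an illustration under stated assumptions: the nested error model (3.1)-(3.2) is assumed to have the usual form $Y_{ij} = x_{ij}'\beta + v_i + e_{ij}$ with normal effects and errors, and the domain parameters are assumed to be the usual FGT measures, i.e. the domain mean of $\{(z - E_{ij})/z\}^{\alpha} I(E_{ij} < z)$. The function names and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)            # arbitrary seed (assumption)
m, N_i, z = 80, 250, 12.0
beta = np.array([3.0, 0.03, -0.04])
sigma_v, sigma_e = 0.15, 0.5
alpha1 = 2.5

# Auxiliary variables, generated once and held fixed.
p1 = 0.3 + 0.5 * np.arange(1, m + 1) / m
x1 = rng.binomial(1, p1[:, None], size=(m, N_i))
x2 = rng.binomial(1, 0.2, size=(m, N_i))

def draw_population():
    """One population under the assumed model Y_ij = b0 + b1*x1 + b2*x2 + v_i + e_ij."""
    v = rng.normal(0.0, sigma_v, size=(m, 1))      # domain effects
    e = rng.normal(0.0, sigma_e, size=(m, N_i))    # unit-level errors
    return beta[0] + beta[1] * x1 + beta[2] * x2 + v + e

def fgt(income, alpha):
    """Assumed FGT form: domain mean of ((z - E)/z)^alpha * 1(E < z)."""
    poor = income < z
    gap = np.where(poor, (z - income) / z, 0.0)
    return (gap ** alpha * poor).mean(axis=1)

def poisson_sample(alpha2):
    """Poisson sampling with pi_ij ~ Beta(alpha1, alpha2), drawn independently of Y."""
    pi = rng.beta(alpha1, alpha2, size=(m, N_i))
    selected = rng.random((m, N_i)) < pi           # independent Bernoulli(pi_ij) draws
    return pi, selected

income = np.exp(draw_population())
F0, F1 = fgt(income, 0), fgt(income, 1)            # true poverty incidence and gap
print("0.6 * median income:", 0.6 * np.median(income))

realized = {}
for alpha2 in (25.0, 10.0, 5.5):
    expected_n = N_i * alpha1 / (alpha1 + alpha2)  # N_i * E(pi_ij)
    _, selected = poisson_sample(alpha2)
    realized[alpha2] = selected.sum(axis=1).mean() # average realized domain sample size
    print(alpha2, round(expected_n, 1), realized[alpha2])
```

Since the mean of a Beta$(\alpha_1, \alpha_2)$ variable is $\alpha_1/(\alpha_1 + \alpha_2)$, the expected domain sample size is $250 \times 2.5/(2.5 + \alpha_2)$, which gives about 22.7, 50 and 78.1 for $\alpha_2 = 25$, 10 and 5.5. This agrees with the three target sizes only approximately, as stated in the text.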

With the sample data from the $k$-th Monte Carlo population, $y_s^{(k)}$, we compute two direct estimators of $F_{\alpha i}^{(k)}$, namely the SM and the WSM as in (3.19), using as weights $w_{ij} = \pi_{ij}^{-1}$. We also compute EB and pseudo EB estimates of $F_{\alpha i}^{(k)}$, for $\alpha = 0, 1$ and $i = 1, \ldots, m$, using the population values of the auxiliary variables. For the EB estimator, we computed $\hat{\sigma}_v^2$, $\hat{\sigma}_e^2$ and $\hat{\beta}$ by the REML method. For the pseudo EB estimator, we used the weighted estimator $\hat{\beta}_w$ given in You and Rao (2002) and the REML estimators of $\sigma_v^2$ and $\sigma_e^2$. Let $\hat{F}_{\alpha i}^{(k)}$ be one of the mentioned estimates (SM, WSM, EB or pseudo EB) in MC replicate $k$. We evaluate the performance of the estimators in terms of relative bias (RB) and relative root MSE (RRMSE), under the model and the design, approximated empirically as

$$
\mathrm{RB}_{m,\pi}(\hat{F}_{\alpha i}) = \frac{K^{-1} \sum_{k=1}^{K} \left( \hat{F}_{\alpha i}^{(k)} - F_{\alpha i}^{(k)} \right)}{K^{-1} \sum_{k=1}^{K} F_{\alpha i}^{(k)}}, \qquad
\mathrm{RRMSE}_{m,\pi}(\hat{F}_{\alpha i}) = \frac{\sqrt{K^{-1} \sum_{k=1}^{K} \left( \hat{F}_{\alpha i}^{(k)} - F_{\alpha i}^{(k)} \right)^2}}{K^{-1} \sum_{k=1}^{K} F_{\alpha i}^{(k)}}.
$$

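Averages across domains of absolute RB and of RRMSE are also calculated as

$$
\mathrm{ARB}_{\alpha} = m^{-1} \sum_{i=1}^{m} \left| \mathrm{RB}_{m,\pi}(\hat{F}_{\alpha i}) \right|, \qquad
\mathrm{RRMSE}_{\alpha} = m^{-1} \sum_{i=1}^{m} \mathrm{RRMSE}_{m,\pi}(\hat{F}_{\alpha i}).
$$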
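The empirical performance measures can be computed directly from the MC output. The sketch below assumes that estimates and true values are stored as arrays of shape $(K, m)$, with replicates in rows and domains in columns. The function name and the toy inputs are hypothetical placeholders, not results from the study.

```python
import numpy as np

def performance(F_hat, F_true):
    """Empirical RB and RRMSE per domain, plus their averages across domains.

    F_hat, F_true: arrays of shape (K, m), MC replicates by domains.
    """
    err = F_hat - F_true
    denom = F_true.mean(axis=0)                     # K^{-1} sum_k F^{(k)}, per domain
    rb = err.mean(axis=0) / denom                   # relative bias
    rrmse = np.sqrt((err ** 2).mean(axis=0)) / denom
    return rb, rrmse, np.abs(rb).mean(), rrmse.mean()

# Toy inputs only, to show the call; these are not values from the simulation.
rng = np.random.default_rng(0)
K, m = 1000, 80
F_true = rng.uniform(0.1, 0.3, size=(K, m))
F_hat = F_true + rng.normal(0.0, 0.02, size=(K, m))

rb, rrmse, arb, avg_rrmse = performance(F_hat, F_true)
print("percent ARB:", 100 * arb, "percent average RRMSE:", 100 * avg_rrmse)
```

Both measures share the same denominator, the MC average of the true values, so they are expressed relative to the size of the target parameter and can be reported in percent.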

Figures 3.1, 3.2 and 3.3 display, for approximate expected domain sample sizes $\bar{n}_i = 25$, 50 and 75 respectively, the percent RB (left) and RRMSE (right) of the estimators of the poverty gap, $F_{1i}$, for each domain $i = 1, \ldots, m$ (x-axis). In these figures, all the estimators display a small RB for the three expected sample sizes, although the WSM appears to be more unstable across domains than the others. This estimator also performs the worst in terms of RRMSE, followed by the unweighted SM. Thus, the model-based estimators (EB and pseudo EB) appear to be considerably more efficient than the two direct estimators (SM and WSM) for all the domains. In this simulation experiment with non-informative sampling, the weighted estimators (WSM and pseudo EB) lose efficiency with respect to their unweighted counterparts, but the efficiency loss of the pseudo EB relative to the EB turns out to be much smaller than the loss of the WSM relative to the SM. As expected, the gain in efficiency of the model-based estimators over the direct estimators decreases as the expected sample size increases, with the SM becoming close to the model-based estimators for the largest expected domain sample size $\bar{n}_i$ (Figure 3.3). Conclusions for the poverty incidence, $F_{0i}$, are similar and hence the figures are not shown.

Table 3.1 displays averages of absolute RB and RRMSE across domains for the considered expected domain sample sizes. This table shows an ARB smaller than 2% for all the considered estimators and sample sizes. The EB and pseudo EB estimators have considerably smaller RRMSE than the direct estimators for small $\bar{n}_i$ and preserve a smaller RRMSE even for the largest value of $\bar{n}_i$. Since the sample selection mechanism is non-informative in this case, weighting brings no gain, and the average RRMSE of the pseudo EB estimator turns out to be roughly 2.5 to 3.7 percentage points larger than that of the EB estimator. This suggests that EB estimators work well under unequal probability sampling as long as the inclusion probabilities do not depend on the outcomes. Nevertheless, in this case the pseudo EB estimator does not lose too much.

            $\bar{n}_i = 25$            $\bar{n}_i = 50$            $\bar{n}_i = 75$
            ARB          RRMSE          ARB          RRMSE          ARB          RRMSE
Method    F0i    F1i    F0i    F1i    F0i    F1i    F0i    F1i    F0i    F1i    F0i    F1i
SM       1.34   1.65  46.27  58.69   0.69   0.87  29.03  36.85   0.54   0.66  21.41  27.93
WSM      1.65   1.94  56.46  71.59   0.83   1.12  36.26  45.95   0.68   0.82  26.98  34.34
EB       0.74   0.89  28.21  35.60   0.46   0.60  20.99  26.73   0.40   0.47  17.58  22.29
PEB      0.88   1.04  31.25  39.29   0.54   0.72  24.13  30.43   0.49   0.61  20.07  25.39

Table 3.1: Averages across domains of percent absolute RB and RRMSE for SM, WSM, EB and pseudo EB (PEB) estimators of poverty incidence, $F_{0i}$, and poverty gap, $F_{1i}$, under non-informative selection with $\bar{n}_i = 25, 50, 75$.
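The efficiency comparisons drawn from Table 3.1 can be checked with simple arithmetic on the reported average RRMSE values. The sketch below copies those values from the table; the dictionary layout and variable names are assumptions made for this example.

```python
import numpy as np

# Average percent RRMSE from Table 3.1.
# Column order: (n=25: F0i, F1i), (n=50: F0i, F1i), (n=75: F0i, F1i).
rrmse = {
    "SM":  np.array([46.27, 58.69, 29.03, 36.85, 21.41, 27.93]),
    "WSM": np.array([56.46, 71.59, 36.26, 45.95, 26.98, 34.34]),
    "EB":  np.array([28.21, 35.60, 20.99, 26.73, 17.58, 22.29]),
    "PEB": np.array([31.25, 39.29, 24.13, 30.43, 20.07, 25.39]),
}

peb_minus_eb = rrmse["PEB"] - rrmse["EB"]    # cost of weighting, model-based estimators
wsm_minus_sm = rrmse["WSM"] - rrmse["SM"]    # cost of weighting, direct estimators
sm_over_eb = rrmse["SM"] / rrmse["EB"]       # gain of EB over the unweighted direct estimator

print("PEB - EB:", np.round(peb_minus_eb, 2))
print("WSM - SM:", np.round(wsm_minus_sm, 2))
print("SM / EB: ", np.round(sm_over_eb, 2))
```

The PEB minus EB differences lie between 2.49 and 3.70 percentage points, well below the corresponding WSM minus SM differences, and the SM to EB ratio shrinks as the expected domain sample size grows.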

In document Small area estimation methods under complex sampling designs (Page 70-72)