
Small Area Estimation of a Single

Classifications

For multi-category response variables such as labour force status (unemployed, employed and inactive), linear models are inappropriate (Saei and Taylor, 2012). SAEs for three-category labour force status have been developed using multinomial logit mixed models; see Molina et al. (2007), Saei and Taylor (2012) and López-Vizcaíno et al. (2013).

Molina et al. (2007) describe an SAE methodology based on a multinomial logit mixed model with constant random effects for estimating three-category labour force status counts by area. Let $y_{d1i}$, $y_{d2i}$ and $y_{d3i}$ denote the numbers of sampled unemployed, employed and inactive people, respectively, in sex/age group $i$ ($i = 1, 2, \ldots, I_d$) of the $d$th small area. Let $m_{di}$ be the sample size for group $i$ in area $d$, and let $p_{d1i}$, $p_{d2i}$, $p_{d3i}$ be the respective probabilities of being unemployed, employed and inactive. Let $u_d$ denote the random effect of area $d$. If the vectors $(y_{d1i}, y_{d2i}, y_{d3i})$, given $u_d$ and $m_{di}$, are independent across $d$ and $i$ with multinomial distributions, then the probability function may be written as:

$$f(y_{d1i}, y_{d2i} \mid u_d) = \frac{m_{di}!}{y_{d1i}!\, y_{d2i}!\, y_{d3i}!}\; p_{d1i}^{y_{d1i}}\, p_{d2i}^{y_{d2i}}\, p_{d3i}^{y_{d3i}} \tag{2.48}$$

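(Molina et al., 2007) where $y_{d3i} = m_{di} - y_{d1i} - y_{d2i}$ is the count in the reference category. The probabilities are related to the auxiliary variables and the random area effects through the logit link as

$$\log\frac{p_{dri}}{p_{d3i}} = \mathbf{Z}_d^{\prime}\beta_r + u_d \tag{2.49}$$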

(Molina et al., 2007) where $\mathbf{Z}_d$ is the vector of auxiliary variables, $\beta_r$ is the vector of regression coefficients for response level $r$, and the $u_d$ are independently and identically distributed area-specific random effects with mean 0 and variance $\sigma_v^2$.
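As a minimal sketch of how (2.48) and (2.49) fit together, the following Python snippet inverts the logit link to obtain the three category probabilities and then evaluates the multinomial probability function. The function names and all numerical values are hypothetical and chosen only for illustration; they are not taken from Molina et al. (2007).

```python
import math

def category_probs(eta1, eta2):
    """Invert the logit link (2.49), where eta_r = log(p_r / p_3).

    The reference category (r = 3) has linear predictor 0.
    """
    shift = max(eta1, eta2, 0.0)  # subtract the maximum for numerical stability
    e1 = math.exp(eta1 - shift)
    e2 = math.exp(eta2 - shift)
    e3 = math.exp(-shift)
    total = e1 + e2 + e3
    return e1 / total, e2 / total, e3 / total

def multinomial_pmf(y1, y2, m, probs):
    """Probability of the counts (y1, y2, m - y1 - y2), as in (2.48)."""
    y3 = m - y1 - y2
    if min(y1, y2, y3) < 0:
        return 0.0
    p1, p2, p3 = probs
    coef = math.factorial(m) // (
        math.factorial(y1) * math.factorial(y2) * math.factorial(y3)
    )
    return coef * p1**y1 * p2**y2 * p3**y3

# Hypothetical linear predictors Z'beta_r + u_d for one area and group
u_d = 0.3
eta1 = -2.0 + u_d  # unemployed vs inactive
eta2 = 0.5 + u_d   # employed vs inactive

probs = category_probs(eta1, eta2)
pmf_value = multinomial_pmf(1, 6, 10, probs)  # 1 unemployed, 6 employed, 3 inactive
```

Because the same $u_d$ enters both linear predictors, a change in the area effect shifts both log-odds by the same amount; this is the sense in which the random effects are constant across categories.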

The model parameters in Molina et al. (2007) are estimated by penalized quasi-likelihood, with ML or REML used to estimate the variance components. The estimated parameters are then used to estimate labour force counts for each small area. The MSE of the estimated counts was estimated using both a parametric bootstrap method and an analytical approximation. The latter followed the analytical approximation (2.25) of Prasad and Rao (1990), with a Taylor series expansion to linearize the fixed and random effect estimates. In a simulation study, Molina et al. (2007) compared the two MSE estimates and recommended the bootstrap approach.

Fabrizi et al. (2011) focused on model-based small area methods for estimating poverty rates based on different thresholds for subsets of the Italian population. The subsets are obtained by cross-classifying by household type and administrative region. They proposed an area-level multivariate logistic-normal model (2.42), giving a hierarchical two-stage model that differs from the model of Molina et al. (2007). They recommended multivariate models for their application because the three poverty rates for each area are correlated. A hierarchical Bayesian approach is adopted to

estimate the parameters. Posterior distributions are approximated by means of

the ‘European Union-Statistics on Income and Living Conditions’ survey (EUSILC

2nd wave, year 2005). The results of Fabrizi et al. (2011) compare the incidence of poverty by household type across the Italian administrative regions.

Saei and Taylor (2012) use the multinomial logit mixed model (2.49) with category-specific area random effects to estimate totals of unemployed, employed and inactive people at the Local Authority District (LAD) level in Britain. They assume that the category-specific random effects follow a bivariate normal distribution. Their proposed model is compared with the constant random effects model of Molina et al. (2007) in a simulation study and an empirical study. They report that the MSEs of SAEs based on their proposed model are smaller than those based on the model with constant random effects.

López-Vizcaíno et al. (2013) also examined the multinomial logit mixed model (2.49) through two simulation studies and an empirical study based on data from the Spanish Labour Force Survey (GLFS) in Galicia. In the first simulation study, the behaviour of three models was compared: the multinomial logit mixed model with category-specific random effects (Model A), the multinomial model of Molina et al. (2007) (Model B) and an independent binomial logit mixed model (Model C). For each model, they consider a multinomial logit mixed model with two response categories and one reference category. Two auxiliary variables are generated from a bivariate normal distribution with means $\mu_1 = \mu_2 = 1$, variances $\sigma_1^2 = \sigma_2^2 = 1$ and correlation $\rho = 0.75$.

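Under Model A, the random effects are generated from a multivariate normal distribution with $\Sigma_{v22} = 2$. The response variable is then generated as

$$y_d \mid u_d \sim \text{Multinomial}(m_d, p_{d1}, p_{d2}), \quad d = 1, 2, \ldots, D \tag{2.50}$$

where

$$p_{dr} = \frac{\exp(\eta_{dr})}{1 + \sum_{\ell=1}^{q-1} \exp(\eta_{d\ell})}, \quad d = 1, 2, \ldots, D, \; r = 1, 2 \tag{2.51}$$

and

$$\eta_{dr} = \beta_{r0} + Z_{dr}\beta_{r1} + u_{dr}, \quad d = 1, 2, \ldots, D, \; r = 1, 2 \tag{2.52}$$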

(López-Vizcaíno et al., 2013). The regression coefficients are set to $\beta_{10} = 1.3$, $\beta_{20} = -1.2$, $\beta_{11} = -1.3$ and $\beta_{21} = 1$. For Model B, the data generation is similar, but the random effect is generated from a normal distribution with mean 0 and variance $\sigma_v^2$. For Model C, the data are generated as in Model A, but the response variables are generated from independent binomial distributions as

$$y_{dr} \mid u_{dr} \sim \text{Binomial}(m_d, p_{dr}), \quad d = 1, 2, \ldots, D, \; r = 1, 2 \tag{2.53}$$

(López-Vizcaíno et al., 2013) where $p_{d3} = 1 - p_{d1} - p_{d2}$. The simulation process is repeated 1000 times for each of Models A, B and C. Small area estimates based on Model A perform better (lower RMSE) than those based on Models B and C when the data are generated under Model A. They also perform better than estimates based on Models B and C even when the data are generated under Model B or Model C.
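The data-generating step of this simulation can be sketched in Python with NumPy as follows. This is an illustrative reading of (2.50) to (2.53), not the code of López-Vizcaíno et al. (2013): the number of areas `D`, the area sample size `m_d`, the random effect variances `var_u`, the independence of the two category-specific random effects and the seed are all assumptions made here. Only the regression coefficients and the distribution of the auxiliary variables come from the text.

```python
import numpy as np

rng = np.random.default_rng(2013)  # arbitrary seed

D, m_d = 50, 100                 # hypothetical number of areas and area sample size
beta0 = np.array([1.3, -1.2])    # beta_10, beta_20 (from the text)
beta1 = np.array([-1.3, 1.0])    # beta_11, beta_21 (from the text)

# Auxiliary variables: bivariate normal, means 1, variances 1, correlation 0.75
cov_z = np.array([[1.0, 0.75], [0.75, 1.0]])
Z = rng.multivariate_normal(mean=[1.0, 1.0], cov=cov_z, size=D)  # shape (D, 2)

def probs_from_eta(eta):
    """(2.51) with q = 3: returns an array of shape (D, 3), reference category last."""
    expo = np.exp(eta)
    denom = 1.0 + expo.sum(axis=1, keepdims=True)
    return np.hstack([expo / denom, 1.0 / denom])

def generate(model, var_u=(1.0, 2.0)):
    """Generate counts for the two response categories under Model A, B or C.

    var_u holds hypothetical random effect variances; they are assumptions.
    """
    if model == "B":
        # One random effect per area, shared by both response categories
        u = rng.normal(0.0, np.sqrt(var_u[0]), size=(D, 1))
    else:
        # Category-specific random effects (assumed independent here)
        u = rng.normal(0.0, np.sqrt(var_u), size=(D, 2))
    eta = beta0 + Z * beta1 + u  # (2.52), broadcasting over areas
    p = probs_from_eta(eta)
    if model == "C":
        y = rng.binomial(m_d, p[:, :2])  # independent binomials, (2.53)
    else:
        y = np.vstack([rng.multinomial(m_d, row) for row in p])[:, :2]  # (2.50)
    return y, p

y_a, p_a = generate("A")
y_b, p_b = generate("B")
y_c, p_c = generate("C")
```

Under Model C the two counts are drawn independently, so their sum is not constrained to be at most `m_d`; under Models A and B the multinomial draw enforces that constraint.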

Another simulation study in López-Vizcaíno et al. (2013) investigates the be-

estimators give better results under all three models. In an empirical study, GLFS data are used to examine the performance of small area estimates based on Models A, B and C. The results of the empirical study show that small area estimates from Model A have lower RMSE than those from the other two models.
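For reference, the RMSE used in simulation comparisons of this kind is usually computed per area over the replicates. The short Python sketch below assumes that convention; the function name and the toy numbers are hypothetical, and the exact performance measures of López-Vizcaíno et al. (2013) may differ (for example, relative versions).

```python
import numpy as np

def rmse_by_area(estimates, truths):
    """Root mean squared error per area over simulation replicates.

    Both inputs have shape (n_replicates, n_areas).
    """
    estimates = np.asarray(estimates, dtype=float)
    truths = np.asarray(truths, dtype=float)
    return np.sqrt(np.mean((estimates - truths) ** 2, axis=0))

# Toy example: two replicates, two areas
est = np.array([[10.0, 20.0], [14.0, 20.0]])
true = np.array([[12.0, 20.0], [12.0, 20.0]])
rmse = rmse_by_area(est, true)  # area 1: errors of -2 and +2; area 2: no error
```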

The multinomial logit mixed model (2.49) has to date been used only to estimate the three-category labour force status variable. In Chapters 5 and 6 of this thesis, methods for estimating two-way and higher-order cross-classified counts will be developed using models related to (2.49).
