Classifications
For multi-category response variables such as labour force status (unemployed,
employed and inactive), linear models are inappropriate (Saei and Taylor, 2012). SAEs
for three-category labour force status have been developed using multinomial logit
mixed models; see Molina et al. (2007), Saei and Taylor (2012) and
López-Vizcaíno et al. (2013).
Molina et al. (2007) describe an SAE methodology based on a multinomial logit
mixed model with constant random effects for estimating three-category labour force
status counts by area. Let $y_{d1i}$, $y_{d2i}$ and $y_{d3i}$ denote the numbers of sampled
unemployed, employed and inactive people, respectively, in sex/age group $i$
($i = 1, 2, \ldots, I_d$) of the $d$th small area. Let $m_{di}$ be the sample size for group $i$
in area $d$, and let $p_{d1i}$, $p_{d2i}$ and $p_{d3i}$ be the respective probabilities of being
unemployed, employed and inactive. Let $u_d$ denote the random effect of area $d$. If the
vectors $(y_{d1i}, y_{d2i}, y_{d3i})$, given $u_d$ and $m_{di}$, are independent across $d$ and $i$
with multinomial distributions, then the probability function may be written as:
\[
f(y_{d1i}, y_{d2i} \mid u_d) = \frac{m_{di}!}{y_{d1i}!\, y_{d2i}!\, y_{d3i}!}\; p_{d1i}^{y_{d1i}}\, p_{d2i}^{y_{d2i}}\, p_{d3i}^{y_{d3i}} \tag{2.48}
\]
(Molina et al., 2007), where $y_{d3i} = m_{di} - y_{d1i} - y_{d2i}$ is the count for the
reference category. It is assumed that the category probabilities are related to the
auxiliary variables and the random area effects through the logit link as
\[
\log\left(\frac{p_{dri}}{p_{d3i}}\right) = Z_d'\beta_r + u_d \tag{2.49}
\]
(Molina et al., 2007), where $Z_d$ is the vector of auxiliary variables, $\beta_r$ is the vector
of regression coefficients for response level $r$, and the $u_d$ are independently and
identically distributed area-specific random effects with mean 0 and variance $\sigma_v^2$.
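As an illustration of how (2.48) and (2.49) fit together, the following minimal Python sketch inverts the logit link to obtain the three category probabilities and then evaluates the multinomial probability function for one cell. The function names and all numerical values are hypothetical and chosen only for illustration; they are not taken from Molina et al. (2007).

```python
import math

def category_probs(z, betas, u):
    """Invert the logit link (2.49), with category 3 as the reference.

    z     : auxiliary values for one area (list of floats)
    betas : [beta_1, beta_2], one coefficient list per non-reference category
    u     : area random effect, shared by both categories (constant effect)
    """
    # Linear predictors eta_r = z' beta_r + u for r = 1, 2
    etas = [sum(zj * bj for zj, bj in zip(z, beta)) + u for beta in betas]
    denom = 1.0 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    probs.append(1.0 / denom)  # probability of the reference category
    return probs

def multinomial_pmf(counts, probs):
    """Multinomial probability function (2.48) for one cell."""
    coef = math.factorial(sum(counts))
    for y in counts:
        coef //= math.factorial(y)  # exact integer multinomial coefficient
    return coef * math.prod(p ** y for p, y in zip(probs, counts))

# Hypothetical values, for illustration only
p = category_probs(z=[1.0, 0.5], betas=[[-1.0, 0.4], [0.8, 0.2]], u=0.1)
print(p, multinomial_pmf([2, 5, 3], p))
```

By construction the three probabilities sum to one, and the log ratio of each non-reference probability to the reference probability recovers the linear predictor in (2.49).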
The model parameters in Molina et al. (2007) are estimated by penalized
quasi-likelihood, with ML or REML used to estimate the variance components.
The estimated parameters are then used to estimate labour force counts for each
small area. The MSEs of the estimated counts were estimated using both a parametric
bootstrap method and an analytical approximation. The latter followed the Prasad
and Rao (1990) analytical approximation (2.25), with a Taylor series expansion for
linearizing the fixed and random effect estimates. In a simulation study, Molina et al.
(2007) compared the two MSE estimators and recommended the bootstrap approach.
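The general logic of a parametric bootstrap MSE estimator can be sketched in a few lines of Python. The sketch below is deliberately a toy: it uses a single multinomial cell with no random effects, so the "fitted model" is just the vector of sample proportions. It is not the procedure of Molina et al. (2007), which refits the mixed model in every replicate; the function name, the counts and the number of replicates are all hypothetical.

```python
import random

def bootstrap_mse(counts, n_boot=500, seed=0):
    """Parametric bootstrap MSE of estimated category proportions.

    Toy setting: one multinomial cell and no random effects, so fitting
    the model reduces to computing sample proportions.
    """
    rng = random.Random(seed)
    m = sum(counts)
    k = len(counts)
    p_hat = [y / m for y in counts]  # step 1: fit the model to the data
    sq_err = [0.0] * k
    for _ in range(n_boot):
        # step 2: generate a bootstrap sample from the fitted model
        draws = rng.choices(range(k), weights=p_hat, k=m)
        # step 3: re-estimate from the bootstrap sample
        p_star = [draws.count(c) / m for c in range(k)]
        for c in range(k):
            sq_err[c] += (p_star[c] - p_hat[c]) ** 2
    # step 4: average the squared errors over the replicates
    return [s / n_boot for s in sq_err]

# Hypothetical counts of unemployed, employed and inactive people
print(bootstrap_mse([12, 70, 18]))
```

In this toy setting the bootstrap values should be close to the binomial variances p(1 - p)/m of each proportion, which gives a simple sanity check on the resampling loop.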
Fabrizi et al. (2011) focused on model-based small area methods for calculating
estimates of poverty rates based on different thresholds for subsets of the Italian
population. The subsets are obtained by cross-classifying by household type and
administrative region. They proposed an area-level multivariate logistic-normal
model (2.42), giving a hierarchical two-stage model that differs from the model of
Molina et al. (2007). They recommended multivariate models for their application because
the three poverty rates for each area are correlated. A hierarchical Bayesian approach is adopted to
estimate the parameters. Posterior distributions are approximated by means of
the ‘European Union-Statistics on Income and Living Conditions’ survey (EUSILC
2nd wave, year 2005). Fabrizi et al. (2011) used their results to compare the
incidence of poverty by household type across the Italian administrative regions.
Saei and Taylor (2012) use the multinomial logit mixed model (2.49) with
category-specific area random effects to estimate totals of unemployed, employed and
inactive people at the Local Authority District (LAD) level in Britain. They
assume that the category-specific random effects follow a bivariate normal
distribution. Their proposed model is compared with the constant random effects model
of Molina et al. (2007) in a simulation study and an empirical study. They report that
the MSEs of SAEs based on their proposed model are smaller than those based on
the model with constant random effects.
López-Vizcaíno et al. (2013) also examined the multinomial logit mixed model
(2.49) through two simulation studies and an empirical study based on data from
the Spanish Labour Force Survey (GLFS) in Galicia. In the first simulation study,
the behaviour of three models was compared: the multinomial logit mixed model
with category-specific random effects (Model A), the multinomial model of Molina
et al. (2007) (Model B) and an independent binomial logit mixed model (Model C).
For each model, they consider a multinomial logit mixed model with two response
categories and one reference category. Two auxiliary variables are generated from
a bivariate normal distribution with means $\mu_1 = \mu_2 = 1$, variances
$\sigma_1 = \sigma_2 = 1$ and correlation $\rho = 0.75$.
Under Model A, the random effects are generated from a multivariate normal
distribution with Σv22 = 2. The response variable is then generated as
\[
y_d \mid u_d \sim \text{Multinomial}(m_d, p_{d1}, p_{d2}), \quad d = 1, 2, \ldots, D \tag{2.50}
\]
where
\[
p_{dr} = \frac{\exp(\eta_{dr})}{1 + \sum_{\ell=1}^{q-1} \exp(\eta_{d\ell})}, \quad d = 1, 2, \ldots, D, \quad r = 1, 2 \tag{2.51}
\]
and
\[
\eta_{dr} = \beta_{r0} + Z_{dr}\beta_{r1} + u_{dr}, \quad d = 1, 2, \ldots, D, \quad r = 1, 2 \tag{2.52}
\]
(López-Vizcaíno et al., 2013). The regression coefficients are set to $\beta_{10} = 1.3$,
$\beta_{20} = -1.2$, $\beta_{11} = -1.3$ and $\beta_{21} = 1$. For Model B, the data generation is
similar, but the random effect is generated from the normal distribution with mean 0 and
variance $\sigma_v^2$. For Model C, the data are generated as in Model A, but the response
variables are generated from independent binomial distributions as
\[
y_{dr} \mid u_{dr} \sim \text{Binomial}(m_d, p_{dr}), \quad d = 1, 2, \ldots, D, \quad r = 1, 2 \tag{2.53}
\]
(López-Vizcaíno et al., 2013), where $p_{d3} = 1 - p_{d1} - p_{d2}$. The simulation process is
repeated 1000 times for each of A, B, and C. Small area estimates based on Model
A show better performance (lower values of RMSE) than those based on Models B
and C when the data is generated under Model A. Small area estimates based on
Model A behave better than small area estimates based on Models B and C even
when the data is generated under Models B and C.
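The data-generating step of this simulation design can be sketched in Python as follows. The regression coefficients and the bivariate normal settings for the auxiliary variables are those quoted above; the number of areas, the area sample size, the random-effect standard deviations and the independence of the category-specific effects are assumptions made only for this sketch, and the helper names are hypothetical. Only the multinomial generation of Models A and B is shown, not the binomial variant of Model C.

```python
import math
import random

# Regression coefficients quoted in the text
BETA0 = (1.3, -1.2)   # beta_10, beta_20
BETA1 = (-1.3, 1.0)   # beta_11, beta_21

def draw_aux(rng, mean=1.0, sd=1.0, rho=0.75):
    """One pair (Z_d1, Z_d2) from a bivariate normal via a Cholesky step."""
    e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
    z1 = mean + sd * e1
    z2 = mean + sd * (rho * e1 + math.sqrt(1 - rho ** 2) * e2)
    return z1, z2

def probs(eta):
    """Equation (2.51): two response categories plus a reference category."""
    denom = 1.0 + sum(math.exp(e) for e in eta)
    p = [math.exp(e) / denom for e in eta]
    return p + [1.0 / denom]  # equals 1 - p_d1 - p_d2

def simulate(rng, n_areas, m, sd_u, shared_effect):
    """Multinomial counts per area; shared_effect=True gives a constant effect."""
    data = []
    for _ in range(n_areas):
        z = draw_aux(rng)
        if shared_effect:
            u0 = rng.gauss(0, sd_u[0])
            u = (u0, u0)  # one random effect for both categories
        else:
            # Assumption: category-specific effects drawn independently
            u = (rng.gauss(0, sd_u[0]), rng.gauss(0, sd_u[1]))
        eta = [BETA0[r] + z[r] * BETA1[r] + u[r] for r in range(2)]  # (2.52)
        draws = rng.choices(range(3), weights=probs(eta), k=m)
        data.append([draws.count(c) for c in range(3)])  # counts per category
    return data

# Hypothetical design values: 50 areas, 100 sampled people per area
rng = random.Random(2013)
model_a = simulate(rng, n_areas=50, m=100, sd_u=(1.0, 1.0), shared_effect=False)
model_b = simulate(rng, n_areas=50, m=100, sd_u=(1.0, 1.0), shared_effect=True)
print(model_a[0], model_b[0])
```

Repeating such draws many times and fitting each candidate model to every generated data set is what allows the RMSE comparison described above.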
Another simulation study in López-Vizcaíno et al. (2013) investigates the be-
estimators give better results under all three models. In an empirical study, GLFS
data is used to examine the performance of small area estimates based on Models
A, B, and C. The results of the empirical study show that small area estimates from
Model A have lower RMSE than those from the other two models.
The multinomial logit mixed model (2.49) has to date only been used to estimate
the three-category labour force status variable. In Chapters 5 and 6 of this thesis,
methods for estimating two-way and higher order cross-classified counts will be
developed using models related to (2.49).