Structure Preserving Estimation (SPREE) is a small area estimation approach that
combines auxiliary information from a previous census with current survey data to
improve the precision of estimators of the cell totals in a contingency table (Purcell
and Kish, 1980). The idea underlying SPREE is that the dependence structure
in the previous census holds in the current time period, while the census marginal
levels may be out of date. SPREE adjusts the design of the table from the previous
census in a way that preserves the interactions from the census and the margins
from the current survey.
SPREE is a generalization of synthetic estimation. Synthetic estimates are
obtained by multiplying small area total population estimates by national level
estimates of population proportions in each cross-classified cell (Zhang and
Chambers, 2004). Zhang and Chambers (2004) proposed a number of log-linear
structural models for contingency tables, and SPREE is a special case of these models.
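The synthetic estimator described above can be illustrated with a minimal sketch. The function name, the area totals and the national proportions below are made-up values for illustration only; they do not come from any data set used in this thesis.

```python
def synthetic_estimates(area_totals, national_proportions):
    """Synthetic cell estimates: area total times national cell proportion.

    area_totals: one population total per small area (hypothetical values).
    national_proportions: national proportion in each category (sums to 1).
    """
    return [[total * p for p in national_proportions] for total in area_totals]


# Hypothetical inputs: two areas and three categories.
area_totals = [120.0, 80.0]
national_proportions = [0.35, 0.45, 0.20]

synthetic = synthetic_estimates(area_totals, national_proportions)
for area, row in enumerate(synthetic, start=1):
    print(f"area {area}:", [round(v, 2) for v in row])
```

Every area receives the same within-area proportions under this estimator, which is the restriction that SPREE relaxes by borrowing the area-by-category interactions from the census table.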
Let $\{Y_{dr}\}$ be the set of cross-classified counts of interest corresponding to
$r = 1, 2, \ldots, q$ classification cells in $d = 1, 2, \ldots, D$ areas. Similarly, let
$\{Z_{dr}\}$ be the set of auxiliary cross-classifications, which are obtained from the
census or administrative registers. Let the area population be
$Y_{d\cdot} = \sum_r Y_{dr}$ (assumed known) and let $\hat{Y}_{\cdot r}$ be the estimate of
the national classification total $Y_{\cdot r} = \sum_d Y_{dr}$ based on national survey
data. The within area proportion $\theta^Y_{dr}$ is the target quantity, which is defined as
$$\theta^Y_{dr} = \frac{Y_{dr}}{Y_{d\cdot}}.$$
Let the within area proportion of auxiliary variable counts be denoted by $\theta^Z_{dr}$,
which is defined as
$$\theta^Z_{dr} = \frac{Z_{dr}}{Z_{d\cdot}},$$
where $Z_{d\cdot} = \sum_r Z_{dr}$. Under SPREE, the auxiliary classification $\{Z_{dr}\}$ is
iteratively rescaled to agree with both the known $Y_{d\cdot}$ and the national estimates
$\hat{Y}_{\cdot r}$. The log-linear models which describe SPREE are
$$\log(Y_{dr}) = \lambda^Y_0 + \lambda^Y_d + \lambda^Y_r + \lambda^Y_{dr}, \qquad \log(Z_{dr}) = \lambda^Z_0 + \lambda^Z_d + \lambda^Z_r + \lambda^Z_{dr} \tag{2.54}$$
(Zhang and Chambers, 2004), where $\lambda^Y_0$ and $\lambda^Z_0$ are the overall means,
$\lambda^Y_d$ and $\lambda^Z_d$ are the marginal effects of the areas, $\lambda^Y_r$ and
$\lambda^Z_r$ are the marginal effects of the categories, and $\lambda^Y_{dr}$ and
$\lambda^Z_{dr}$ are the interaction effects of each cross-classified area-category cell.
SPREE is equivalent to using these log-linear models assuming equal interaction
parameters, that is, $\lambda^Y_{dr} = \lambda^Z_{dr}$.
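The iterative rescaling used by SPREE can be sketched as iterative proportional fitting of the auxiliary table to the two sets of margins. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: the function name `spree_ipf`, the census table and the margin values are all hypothetical, the auxiliary counts are assumed strictly positive, and the area totals and national totals are assumed to sum to the same grand total.

```python
def spree_ipf(z, area_totals, national_totals, tol=1e-10, max_iter=1000):
    """Rescale the auxiliary table z (areas x categories) so that its rows sum
    to area_totals and its columns sum to national_totals.

    Assumes all z entries are positive and both margins share one grand total.
    """
    est = [[float(v) for v in row] for row in z]
    n_areas, n_cats = len(est), len(est[0])
    for _ in range(max_iter):
        # Row step: match the known area totals Y_d.
        for d in range(n_areas):
            row_sum = sum(est[d])
            est[d] = [v * area_totals[d] / row_sum for v in est[d]]
        # Column step: match the estimated national category totals.
        for r in range(n_cats):
            col_sum = sum(est[d][r] for d in range(n_areas))
            for d in range(n_areas):
                est[d][r] *= national_totals[r] / col_sum
        # Columns now match exactly; stop once the rows match as well.
        max_dev = max(abs(sum(est[d]) - area_totals[d]) for d in range(n_areas))
        if max_dev < tol:
            break
    return est


# Hypothetical census table (2 areas x 3 categories) and current margins.
z_census = [[40.0, 30.0, 30.0], [20.0, 50.0, 30.0]]
area_totals = [120.0, 80.0]            # known Y_d.
national_totals = [70.0, 90.0, 40.0]   # survey estimates of Y_.r

spree = spree_ipf(z_census, area_totals, national_totals)
for row in spree:
    print([round(v, 2) for v in row])
```

Each step multiplies a whole row or a whole column by a constant, so the cross-product ratios of the table are unchanged. The fitted table therefore keeps the census interactions while taking its margins from the current data.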
The Zhang and Chambers (2004) approach requires cross-classified small area
counts based on survey data and the corresponding auxiliary cross-classification
from the previous census or administrative registers. In Chapter 6, log-linear models
will be used to estimate small area cross-classified counts of target variables by area
rather than considering area cross-classification of target variables and auxiliary
variables.
Noble and Arnold (2002) stated that the model supporting SPREE is a special
case of a generalized linear model. The estimators of the cell totals obtained from
SPREE are the maximum likelihood estimators of the expected counts under a
generalized linear model with a Poisson random component and a log link. Main
effects of rows and columns are estimated with the direct estimators. Interactions
are set equal to the interactions in a saturated log-linear model fitted to the census
two-way table. Census counts by the cross-classification of interest are needed. In
this thesis, Chapter 6 will make use of log-linear models with random effects which
will enable small area estimates for cross-classified counts when only survey data is
available.

Theoretical and Numerical Evaluation of Bivariate Fay-Herriot Model

The purpose of this chapter is to compare the efficiency of small area estimators
obtained from a bivariate Fay-Herriot (BFH) model with those obtained from a
univariate Fay-Herriot (UFH) model. A numerical and a simulation study have been
conducted to examine the performance of these two estimators. The relative
efficiency (RE) of BFH and UFH estimators is measured theoretically and numerically.
The relative efficiencies of BFH over UFH estimators are related to the correlation
between dependent variables’ sampling errors and random effects, and other
parameters such as the relative magnitude of the sampling variances and the random
effects variances.
3.1 Introduction
The area level Fay-Herriot model (Fay and Herriot, 1979) is widely used in small
area estimation. The basic area level univariate Fay-Herriot model can be extended
to a multivariate model taking into account the correlation of several target variables
together. The BFH model extends the UFH model for two target variables.
Statisticians are often required to estimate correlated descriptive measures, like
poverty or unemployment indicators. Bivariate models are useful when the aim
is to estimate area level statistics for two correlated variables. Various methods,
such as the method of moments, maximum likelihood, and restricted maximum
likelihood, can be used to estimate the variance components of the BFH model, and
the estimated parameters are then used to fit the model.
The purpose of this chapter is to explore the question of whether, and under
what conditions, the BFH model is advantageous compared to the UFH model for
small area estimation. The UFH and BFH models are specified by fixing the parameter
values that determine the variance-covariance matrices of the sampling errors and
the random effects. Expressions for the REs of BFH estimators
over UFH estimators are derived in general, and for some special cases where the
expression can be simplified. These special cases will give insight into when BFH
should be worthwhile in practice. A numerical study and a simulation study are
also conducted to compare the two approaches.
In the numerical study, a simple approximation of the RE of BFH estimators
is calculated for many different sets of parameter values. Fixing and controlling
the parameter values provides a way of measuring the efficiency of BFH over UFH
SAEs. The results of the numerical study are summarized using an analysis of
variance (ANOVA) approach and a novel regression tree approach.
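To make the kind of calculation behind such a numerical study concrete, the sketch below evaluates one possible approximation of the RE for a single set of parameter values. It is an illustration under stated assumptions, not the expression derived in this chapter: only the leading MSE term of the best predictor is used (all variance parameters and regression coefficients are treated as known), the RE is taken as the UFH value divided by the BFH value for the first target variable, and every function name and parameter value is hypothetical.

```python
import math


def cov2(var1, var2, rho):
    """2 x 2 covariance matrix built from two variances and a correlation."""
    c = rho * math.sqrt(var1 * var2)
    return [[var1, c], [c, var2]]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(x, y):
    """Product of two 2 x 2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def leading_mse_ufh(sigma2_v, psi):
    """Leading MSE term under UFH: sigma2_v * psi / (sigma2_v + psi)."""
    return sigma2_v * psi / (sigma2_v + psi)


def leading_mse_bfh(sigma_v, sigma_e):
    """Leading MSE matrix under BFH: S_v - S_v (S_v + S_e)^{-1} S_v."""
    total = [[sigma_v[i][j] + sigma_e[i][j] for j in range(2)] for i in range(2)]
    shrink = matmul2(matmul2(sigma_v, inv2(total)), sigma_v)
    return [[sigma_v[i][j] - shrink[i][j] for j in range(2)] for i in range(2)]


def relative_efficiency(s2v1, s2v2, rho_v, psi1, psi2, rho_e):
    """Approximate RE of BFH over UFH for the first target variable.

    s2v1, s2v2: random effects variances; rho_v: their correlation.
    psi1, psi2: sampling variances; rho_e: sampling error correlation.
    Values above 1 favour the BFH estimator.
    """
    bfh = leading_mse_bfh(cov2(s2v1, s2v2, rho_v), cov2(psi1, psi2, rho_e))[0][0]
    return leading_mse_ufh(s2v1, psi1) / bfh


# Hypothetical settings: no correlation versus strongly correlated random effects.
re_uncorrelated = relative_efficiency(1.0, 1.0, 0.0, 1.0, 1.0, 0.0)
re_correlated = relative_efficiency(1.0, 1.0, 0.9, 1.0, 1.0, 0.0)
print(re_uncorrelated, re_correlated)
```

Under these assumptions the two estimators coincide when both correlations are zero, and a gain appears once the random effects are correlated while the sampling errors are not. Looping the function over a grid of parameter values would give a table of REs of the kind that could then be summarized by ANOVA or a regression tree.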
In the simulation study, the behaviour of the UFH and BFH estimators is in-
the UFH and BFH models are fitted and MSEs of the corresponding empirical best
linear unbiased predictor (EBLUP) are calculated using the estimated parameters
from the simulated data. A further purpose is to check whether one of
the thousands of cases in the numerical study is similar to the simulation parameter
settings, and whether that case gives results consistent with those of the
simulation.
The chapter is organized as follows. Section 3.2 outlines the model
structure and the MSE for both the UFH and BFH estimators. The relative
performance of UFH and BFH estimators is measured by the relative efficiency of these
MSEs, and in Section 3.3 some special cases regarding the relative efficiencies are
discussed. Section 3.4 describes the numerical study, including an exploratory
analysis of the results as well as a formal analysis of the numerical study results using
analysis of variance and regression trees. Section 3.5 further investigates the
performance of the UFH and the BFH estimators via a simulation experiment. Concluding
remarks are presented in Section 3.6.