Structure Preserving Estimation Model - Multivariate Small Area Estimation for Health Indicator

Structure Preserving Estimation (SPREE) is a small area estimation approach that

combines auxiliary information from a previous census with current survey data to

improve the precision of estimators of the cell totals in a contingency table (Purcell

and Kish, 1980). The idea underlying SPREE is that the dependence structure

in the previous census holds in the current time period, while the census marginal

levels may be out of date. SPREE adjusts the design of the table from the previous

census in a way that preserves the interactions from the census and the margins

from the current survey.

SPREE is a generalization of synthetic estimation. Synthetic estimates are obtained by multiplying small area total population estimates by national level estimates of population proportions in each cross-classified cell (Zhang and Chambers, 2004). Zhang and Chambers (2004) proposed a number of log-linear structural models for contingency tables, and SPREE is a special case of these models.
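
The synthetic estimator described above can be illustrated with a minimal sketch. The area totals and national proportions below are hypothetical numbers chosen only for illustration; they do not come from the thesis.

```python
import numpy as np

# Hypothetical inputs: population totals for D = 3 small areas.
area_totals = np.array([1000.0, 400.0, 600.0])
# Hypothetical national-level estimates of the proportion in each of q = 2 cells.
national_props = np.array([0.25, 0.75])

# Synthetic estimate for cell (d, r): area total times national proportion.
synthetic = np.outer(area_totals, national_props)

# Each row sums back to its area total, and every area has the same
# within-area proportions as the nation.
print(synthetic)
```

Because every area inherits the national proportions, the synthetic estimator carries no area-by-category interaction; SPREE relaxes this by borrowing the interaction from the census table.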

Let $\{Y_{dr}\}$ be the set of cross-classified counts of interest corresponding to $r = 1, 2, \ldots, q$ classification cells in $d = 1, 2, \ldots, D$ areas. Similarly, let $\{Z_{dr}\}$ be the set of auxiliary cross-classifications which are obtained from the census or administrative registers. Let the area population be $Y_{d\cdot} = \sum_r Y_{dr}$ (assumed known) and let $\hat{Y}_{\cdot r}$, the estimate of $\sum_d Y_{dr}$, be the estimated national classification based on national survey data. The within area proportion $\theta^Y_{dr}$ is the target quantity, which is defined as

$$\theta^Y_{dr} = \frac{Y_{dr}}{Y_{d\cdot}}.$$

Let the within area proportion of auxiliary variable counts be denoted by $\theta^Z_{dr}$, which is defined as

$$\theta^Z_{dr} = \frac{Z_{dr}}{Z_{d\cdot}},$$

where $Z_{d\cdot} = \sum_r Z_{dr}$. Under SPREE, the auxiliary classification $\{Z_{dr}\}$ is iteratively rescaled to agree with both the known $Y_{d\cdot}$ and the national estimates $\hat{Y}_{\cdot r}$. The log-linear models which describe SPREE are

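$$\log(Y_{dr}) = \lambda^Y_0 + \lambda^Y_d + \lambda^Y_r + \lambda^Y_{dr}, \qquad \log(Z_{dr}) = \lambda^Z_0 + \lambda^Z_d + \lambda^Z_r + \lambda^Z_{dr} \tag{2.54}$$

(Zhang and Chambers, 2004), where $\lambda^Y_0$ and $\lambda^Z_0$ are the overall means, $\lambda^Y_d$ and $\lambda^Z_d$ are the marginal effects of the areas, $\lambda^Y_r$ and $\lambda^Z_r$ are the marginal effects of the categories, and $\lambda^Y_{dr}$ and $\lambda^Z_{dr}$ are the interaction effects of each cross-classified area-category. SPREE is equivalent to using these log-linear models assuming equal interaction parameters, that is, $\lambda^Y_{dr} = \lambda^Z_{dr}$.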
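
The iterative rescaling behind SPREE can be sketched with iterative proportional fitting (IPF). This is a minimal illustration rather than the implementation used in the thesis: the function names `spree_ipf` and `log_odds_ratios`, the census table `Z`, and both sets of target margins are hypothetical. It assumes all census cells are strictly positive and that the two target margins sum to the same grand total.

```python
import numpy as np

def spree_ipf(Z, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Rescale table Z by IPF until its margins match the target totals."""
    est = np.asarray(Z, dtype=float).copy()
    for _ in range(max_iter):
        est *= (row_totals / est.sum(axis=1))[:, None]  # match area totals
        est *= (col_totals / est.sum(axis=0))[None, :]  # match category totals
        # Columns are exact after the second step, so only rows need checking.
        if np.allclose(est.sum(axis=1), row_totals, rtol=0, atol=tol):
            break
    return est

def log_odds_ratios(T):
    """Local log odds ratios, which capture the area-by-category interaction."""
    L = np.log(T)
    return L[:-1, :-1] - L[:-1, 1:] - L[1:, :-1] + L[1:, 1:]

# Hypothetical census table (D = 3 areas, q = 2 categories).
Z = np.array([[40.0, 60.0], [30.0, 20.0], [10.0, 40.0]])
# Hypothetical current margins: known area totals and national category estimates.
row_totals = np.array([120.0, 60.0, 70.0])
col_totals = np.array([110.0, 140.0])

est = spree_ipf(Z, row_totals, col_totals)
print(est.round(2))
```

Each IPF step multiplies a whole row or a whole column by a constant, which changes the main effects but leaves the odds ratios untouched. The fitted table therefore has the current margins and the census interactions, in line with the equal-interaction condition above.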

The Zhang and Chambers (2004) approach requires cross-classified small area

counts based on survey data and the corresponding auxiliary cross-classification

from the previous census or administrative registers. In Chapter 6, log-linear models

will be used to estimate small area cross-classified counts of target variables by area

rather than considering area cross-classification of target variables and auxiliary

variables.

Noble and Arnold (2002) stated that the model supporting SPREE is a special

case of a generalized linear model. The estimators of the cell totals obtained from

SPREE are the maximum likelihood estimators of the expected counts under a

generalized linear model with a Poisson random component and a log link. Main

effects of rows and columns are estimated with the direct estimators. Interactions

are set equal to the interactions in a saturated log-linear model fitted to the census

two-way table. Census counts by the cross-classification of interest are needed. In

this thesis, Chapter 6 will make use of log-linear models with random effects which

will enable small area estimates for cross-classified counts when only survey data is available.

Theoretical and Numerical Evaluation of Bivariate Fay-Herriot Model

The purpose of this chapter is to compare the efficiency of small area estimators obtained from a bivariate Fay-Herriot (BFH) model with those obtained from a univariate Fay-Herriot (UFH) model. A numerical and a simulation study have been conducted to examine the performance of these two estimators. The relative efficiency (RE) of BFH and UFH estimators is measured theoretically and numerically. The relative efficiencies of BFH over UFH estimators are related to the correlation between dependent variables’ sampling errors and random effects, and other parameters such as the relative magnitude of the sampling variances and the random effects variances.

3.1 Introduction

The area level Fay-Herriot model (Fay and Herriot, 1979) is widely used in small

area estimation. The basic area level univariate Fay-Herriot model can be extended

to a multivariate model taking into account the correlation of several target variables

together. The BFH model extends the UFH model for two target variables.
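
For orientation, the two models can be written in their standard area-level form. The notation below ($\hat{\theta}_d$, $x_d$, $\beta$, $u_d$, $e_d$, $\psi_d$, $\Sigma_u$, $\Sigma_{e,d}$) is an assumption made for this sketch and may differ from the notation the thesis adopts in Section 3.2.

```latex
\begin{align}
\text{UFH:}\quad & \hat{\theta}_d = x_d^{\top}\beta + u_d + e_d,
  \qquad u_d \sim N(0, \sigma_u^2), \quad e_d \sim N(0, \psi_d),
  \qquad d = 1, \ldots, D, \\
\text{BFH:}\quad & \hat{\boldsymbol{\theta}}_d = X_d \boldsymbol{\beta} + \mathbf{u}_d + \mathbf{e}_d,
  \qquad \mathbf{u}_d \sim N_2(\mathbf{0}, \Sigma_u), \quad \mathbf{e}_d \sim N_2(\mathbf{0}, \Sigma_{e,d}).
\end{align}
```

Here the random effects and sampling errors are independent of each other, and the sampling variances are treated as known. The off-diagonal entries of $\Sigma_u$ and $\Sigma_{e,d}$ carry the correlation between the two target variables; when both are zero, the BFH model reduces to two separate UFH models.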

Statisticians are often required to estimate correlated descriptive measures, like

poverty or unemployment indicators. Bivariate models are useful when the aim

is to estimate area level statistics for two correlated variables. Various methods

such as the method of moments, maximum likelihood, and restricted maximum

likelihood can be used to estimate the variance components of the BFH model and then

the estimated parameters are used to fit the model.

The purpose of this chapter is to explore the question of whether, and under

what conditions, the BFH model is advantageous compared to the UFH model for

small area estimation. The UFH and BFH models are fitted by setting the parameters.

These parameters are used in determining the variance-covariance matrices of

sampling errors and the random effects. Expressions of the REs of BFH estimators

over UFH estimators are derived in general, and for some special cases where the

expression can be simplified. These special cases will give insight into when BFH

should be worthwhile in practice. A numerical study and a simulation study are

also conducted to compare the two approaches.

In the numerical study, a simple approximation of the RE of BFH estimators

is calculated for many different sets of parameter values. Fixing and controlling

the parameter values provides a way of measuring the efficiency of BFH over UFH

SAEs. The results of the numerical study are summarized using an analysis of

variance (ANOVA) approach and a novel regression tree approach.
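
A numerical comparison of this kind can be sketched as follows. The sketch is an assumption-laden stand-in, not the approximation used in the thesis: it takes the regression coefficients and variance components as known, so each MSE is only the leading term of the predictor's MSE, and it defines RE as the UFH MSE divided by the BFH MSE for the first target variable. The function names and the parameter grid are hypothetical.

```python
import numpy as np

def cov2(var1, var2, rho):
    """2x2 covariance matrix from two variances and a correlation."""
    c = rho * np.sqrt(var1 * var2)
    return np.array([[var1, c], [c, var2]])

def re_bfh_over_ufh(sigma_u, sigma_e):
    """Leading-term RE for variable 1: UFH MSE / BFH MSE, parameters known."""
    su, se = sigma_u[0, 0], sigma_e[0, 0]
    # Univariate best predictor MSE: sigma_u^2 * psi / (sigma_u^2 + psi).
    mse_ufh = su * se / (su + se)
    # Bivariate best predictor MSE matrix: S_u - S_u (S_u + S_e)^{-1} S_u.
    mse_bfh = sigma_u - sigma_u @ np.linalg.solve(sigma_u + sigma_e, sigma_u)
    return mse_ufh / mse_bfh[0, 0]

# Hypothetical grid: unit variances, varying the two correlations.
results = {}
for rho_u in (-0.9, 0.0, 0.9):
    for rho_e in (-0.9, 0.0, 0.9):
        sigma_u = cov2(1.0, 1.0, rho_u)
        sigma_e = cov2(1.0, 1.0, rho_e)
        results[(rho_u, rho_e)] = re_bfh_over_ufh(sigma_u, sigma_e)

for (rho_u, rho_e), value in results.items():
    print(f"rho_u={rho_u:+.1f}  rho_e={rho_e:+.1f}  RE={value:.3f}")
```

In this simplified setting the RE equals 1 whenever the sampling-error covariance matrix is proportional to the random-effects covariance matrix, which on this grid means equal correlations, and it rises as the two correlations move apart.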

In the simulation study, the behaviour of the UFH and BFH estimators is investigated: the UFH and BFH models are fitted and MSEs of the corresponding empirical best linear unbiased predictors (EBLUPs) are calculated using the estimated parameters from the simulated data. A further purpose is to check whether one of the thousands of cases in the numerical study is similar to the simulation parameter settings, and whether this particular case in the numerical study gives results consistent with the simulation.
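
The consistency check between a numerical case and a simulation can be illustrated with a deliberately simplified sketch. Unlike the simulation study in this chapter, it does not estimate any parameters: the means and covariance matrices are treated as known, so the predictors are best linear unbiased predictors rather than EBLUPs. All parameter values, the seed, and the numbers of areas and replicates are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2024)
D, reps = 50, 2000                                # areas, Monte Carlo replicates

sigma_u = np.array([[1.0, 0.9], [0.9, 1.0]])      # random-effects covariance
sigma_e = np.array([[1.0, 0.0], [0.0, 1.0]])      # sampling-error covariance
beta = np.array([2.0, -1.0])                      # known means (intercept only)

# Simulate true area values and direct estimates, shape (reps, D, 2).
u = rng.multivariate_normal(np.zeros(2), sigma_u, size=(reps, D))
e = rng.multivariate_normal(np.zeros(2), sigma_e, size=(reps, D))
theta = beta + u
y = theta + e

# Univariate predictor for variable 1: shrink the direct estimate towards its mean.
gamma = sigma_u[0, 0] / (sigma_u[0, 0] + sigma_e[0, 0])
pred_uni = beta[0] + gamma * (y[..., 0] - beta[0])

# Bivariate predictor: beta + S_u (S_u + S_e)^{-1} (y - beta), applied row-wise.
B = sigma_u @ np.linalg.inv(sigma_u + sigma_e)
pred_biv = beta + (y - beta) @ B.T

mse_uni = np.mean((pred_uni - theta[..., 0]) ** 2)
mse_biv = np.mean((pred_biv[..., 0] - theta[..., 0]) ** 2)

# Closed-form counterpart under the same known-parameter assumption.
g1_biv = sigma_u - sigma_u @ np.linalg.solve(sigma_u + sigma_e, sigma_u)
re_theory = (sigma_u[0, 0] * sigma_e[0, 0]
             / (sigma_u[0, 0] + sigma_e[0, 0])) / g1_biv[0, 0]

print(f"simulated RE = {mse_uni / mse_biv:.3f}, closed-form RE = {re_theory:.3f}")
```

With estimated parameters, as in the chapter's simulation, the MSEs would pick up extra terms for estimation uncertainty, so agreement with the closed form would be looser than in this sketch.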

The chapter is divided into the following sections. Section 3.2 outlines the model structure and the MSE for both the UFH and BFH estimators. The relative performance of UFH and BFH estimators is measured by the relative efficiency of these MSEs, and in Section 3.3 some special cases regarding the relative efficiencies are discussed. Section 3.4 describes the numerical study, including an exploratory analysis of the results as well as a formal analysis of the numerical study results using analysis of variance and regression trees. Section 3.5 further investigates the performance of the UFH and the BFH estimators via a simulation experiment. Concluding remarks are presented in Section 3.6.
