Abdulhakeem A.H. Eideh
Department of Mathematics
Al-Quds University, Al-Quds, Palestine [email protected]
Abstract
In this paper we consider Bayes estimation of the parameters that characterise the superpopulation model, and Bayes prediction of the finite population total, from sample survey data selected from a finite population using an informative probability sampling design, that is, a design in which the sample first-order inclusion probabilities depend on the values of the model outcome variable (or the model outcome variable is correlated with design variables not included in the model). To this end we first define the sample predictive distribution and the sample posterior distribution, and then use the sample posterior likelihood function to obtain the sample Bayes estimates of the superpopulation model parameters and Bayes predictors of the finite population total. These new predictors take the informative sampling design into account. This provides a new justification for the broad use of best linear unbiased predictors (model-based school) in predicting finite population parameters when the complex sampling design is not accounted for. Furthermore, we show that the behaviour of the proposed estimators and predictors depends on the informativeness parameters, and that Bayes estimators and predictors that ignore the informative sampling design are biased. An important feature of this paper is that specifying a prior distribution for the parameters of the sample distribution is easier than specifying one for the population parameters.
Keywords: Bayes estimator, Finite Population Sampling, Informative Sampling, Sample Distribution.
1. Introduction
Pak.j.stat.oper.res. Vol.XIII No.2 2017 pp327-353 328
Nathan (2006a, 2006b, 2009), Eideh (2003, 2012a, 2012b). In these articles the authors reviewed many examples reported in the literature that illustrate the effect of ignoring the sampling process when fitting models to survey data from complex samples, and they discussed methods for dealing with this problem.
schizophrenia”, and offers examples of the problems this creates. He discusses an alternative philosophy for survey inference, called calibrated Bayes (CB), in which inferences for a particular data set are Bayesian but models are chosen to yield inferences that have good design-based properties. Little argues that CB resolves DMC conflicts and capitalizes on the strengths of both frequentist and Bayesian approaches. Features of the CB approach to surveys include the incorporation of survey design information into the model, and models with weak prior distributions that avoid strong parametric assumptions. Savitsky and Toth (2016) propose a pseudo-posterior distribution that uses sampling weights based on the marginal inclusion probabilities to exponentiate the likelihood contribution of each sampled unit, which weights the information in the sample back to the population. They give conditions on known marginal and pairwise inclusion probabilities that define a class of sampling designs for which consistency of the pseudo-posterior is guaranteed.
In this paper we will account for informative probability sampling design in the Bayesian approach to sample survey inference, for example Bayes estimation of the parameters of superpopulation model and prediction of finite population total.
The plan of the paper is as follows. Section 2 is devoted to the definitions of the sample and sample-complement distributions. In Section 3 we define the sample posterior distribution. In Section 4 we develop the sample posterior likelihood and Bayesian inference for the superpopulation parameter. The Bayesian approach is studied for normal and Bernoulli superpopulation models in Sections 5 and 6. In Section 7 we propose a new Bayes predictor of the finite population total. Finally, some conclusions are presented in Section 8.
2. Sample and Sample-Complement Distributions
Let $U = \{1, \ldots, N\}$ denote a finite population consisting of $N$ units. Let $y$ be the study variable of interest and let $y_i$ be the value of $y$ for the $i$th population unit. We consider the population values $y_1, \ldots, y_N$ as random variables, which are independent realizations from a distribution with probability density function (pdf) $f_p(y_i \mid \theta)$, indexed by a vector of parameters $\theta \in \Theta$, where $\Theta$ is the parameter space of $\theta$. Let $x_i = (x_{i1}, \ldots, x_{ip})'$, $i \in U$, be the values of a vector of auxiliary variables, $x_1, \ldots, x_p$, and let $z = \{z_1, \ldots, z_N\}$ be the values of known design variables, used for the sample selection process but not included in the model under consideration. In what follows, we consider a sampling design with selection probabilities $\pi_i = \Pr(i \in s)$ and sampling weights $w_i = 1/\pi_i$, $i = 1, \ldots, N$. In practice, the $\pi_i$'s may depend on the population values $(x, y, z)$. We express this dependence by writing $\pi_i = \Pr(i \in s \mid x, y, z)$ for all units $i \in U$. Since $\pi_1, \ldots, \pi_N$ are defined by the realizations $(x_i, y_i, z_i)$, $i = 1, \ldots, N$, they are random realizations defined on the space of possible populations. The sample $s$ consists of the subset of $U$ selected at random by the sampling scheme with inclusion probabilities $\pi_1, \ldots, \pi_N$.
Let $I_i$ denote the sample indicator, with $I_i = 1$ if unit $i \in U$ is selected to the sample and $I_i = 0$ otherwise. The sample $s$ is defined accordingly as $s = \{i \mid i \in U, I_i = 1\}$ and its complement by $c = \bar{s} = \{i \mid i \in U, I_i = 0\}$. We assume probability sampling, so that $\pi_i = \Pr(i \in s) > 0$ for all units $i \in U$. Let $f_p$ and $E_p(\cdot)$ denote the pdf and the mathematical expectation of the population distribution, respectively; $f_s$ and $E_s(\cdot)$ denote the pdf and the mathematical expectation of the sample distribution, respectively; and $f_c$ and $E_c(\cdot)$ denote the pdf and the mathematical expectation of the sample-complement distribution. Assume that the population pdf depends on known values of the auxiliary variables $x_i$, so that $y_i \underset{p}{\sim} f_p(y_i \mid x_i, \theta)$. According to Pfeffermann, Krieger and Rinott (1998), the (marginal) sample pdf of $y_i$ is given by:
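$$f_s(y_i \mid x_i, \theta, \gamma) = f_p(y_i \mid x_i, \theta, \gamma, i \in s) = \frac{E_p(\pi_i \mid x_i, y_i, \gamma)\, f_p(y_i \mid x_i, \theta)}{E_p(\pi_i \mid x_i, \theta, \gamma)} \tag{1}$$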
where
$$E_p(\pi_i \mid x_i, \theta, \gamma) = \int E_p(\pi_i \mid x_i, y_i, \gamma)\, f_p(y_i \mid x_i, \theta)\, dy_i$$
and $\gamma$ is the informativeness parameter.
Similarly, Sverchkov and Pfeffermann (2004) define the sample-complement pdf of $y_i$ as:
$$f_c(y_i \mid x_i) = f_p(y_i \mid x_i, i \notin s) = \frac{E_p(1 - \pi_i \mid x_i, y_i)\, f_p(y_i \mid x_i)}{E_p(1 - \pi_i \mid x_i)} \tag{2}$$
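For a vector of random variables $(y_i, x_i)$, the following relationships hold:
$$E_s(w_i \mid y_i) = \frac{1}{E_p(\pi_i \mid y_i)} \tag{3a}$$
$$E_p(y_i \mid x_i) = \left[E_s(w_i \mid x_i)\right]^{-1} E_s(w_i y_i \mid x_i) \tag{3b}$$
$$E_s(w_i) = \frac{1}{E_p(\pi_i)} \tag{3c}$$
$$E_c(y_i \mid x_i) = \frac{E_p\left[(1 - \pi_i)\, y_i \mid x_i\right]}{E_p(1 - \pi_i \mid x_i)} = \frac{E_s\left[(w_i - 1)\, y_i \mid x_i\right]}{E_s(w_i - 1 \mid x_i)} \tag{3d}$$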
See Pfeffermann and Sverchkov (1999) and Sverchkov and Pfeffermann (2004).
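Relationships (3b) and (3c) can be checked numerically. The following is a minimal simulation sketch, not part of the original paper: it assumes a normal population, Poisson (independent) sampling and inclusion probabilities $\pi_i = \exp(a_0 + a_1 y_i)$; the function name and all default values are hypothetical choices made only for illustration.

```python
import math
import random


def simulate_poisson_sampling(n_pop=200_000, mu=1.0, sigma=1.0,
                              a0=-3.0, a1=0.5, seed=2017):
    """Draw a N(mu, sigma^2) population and select unit i independently with
    probability pi_i = exp(a0 + a1 * y_i) (an assumed informative design)."""
    rng = random.Random(seed)
    pop_sum_y = pop_sum_pi = 0.0
    n = 0
    s_sum_y = s_sum_w = s_sum_wy = 0.0
    for _ in range(n_pop):
        y = rng.gauss(mu, sigma)
        pi = min(1.0, math.exp(a0 + a1 * y))   # inclusion probability
        pop_sum_y += y
        pop_sum_pi += pi
        if rng.random() < pi:                  # unit enters the sample
            w = 1.0 / pi                       # sampling weight w_i = 1/pi_i
            n += 1
            s_sum_y += y
            s_sum_w += w
            s_sum_wy += w * y
    return {
        "pop_mean": pop_sum_y / n_pop,              # estimate of E_p(y)
        "sample_mean": s_sum_y / n,                 # estimate of E_s(y), biased for E_p(y)
        "weighted_mean": s_sum_wy / s_sum_w,        # E_s(w y) / E_s(w), cf. (3b)
        "Es_w_times_Ep_pi": (s_sum_w / n) * (pop_sum_pi / n_pop),  # close to 1, cf. (3c)
    }


if __name__ == "__main__":
    for name, value in simulate_poisson_sampling().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions the unweighted sample mean overstates the population mean, while the weighted ratio in (3b) recovers it up to simulation error.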
Having derived the sample distribution, Pfeffermann, Krieger and Rinott (1998) proved that if the population measurements $y_i$ are independent, then as $N \to \infty$ with $n$ fixed
Step-one: Estimate the informativeness parameters $\gamma$ using equation (3a), and denote the resulting estimate of $\gamma$ by $\tilde{\gamma}$.
Step-two: Substitute $\tilde{\gamma}$ in the sample log-likelihood function, and then maximize the resulting sample log-likelihood function with respect to the population parameters $\theta$:
$$l_{rs}(\theta, \tilde{\gamma}) = l_{srs}(\theta) - \sum_{i=1}^{n} \log E_p(\pi_i \mid x_i, \theta, \tilde{\gamma}) = l_{srs}(\theta) + \sum_{i=1}^{n} \log E_s(w_i \mid x_i, \theta, \tilde{\gamma}) \tag{4}$$
where $l_{rs}(\theta, \tilde{\gamma})$ is the sample log-likelihood after substituting $\tilde{\gamma}$ in the sample log-likelihood function and where $l_{srs}(\theta) = \sum_{i=1}^{n} \log f_p(y_i \mid x_i, \theta)$ is the classical log-likelihood.
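The two steps can be sketched in code for one concrete case. This is an illustrative sketch under stated assumptions, not the paper's implementation: it assumes a normal population $N(\theta, \sigma^2)$ with known $\sigma^2$, no auxiliary variables, and $E_s(w_i \mid y_i) = \exp(-a_0 - a_1 y_i)$, so that step one reduces to a least-squares fit of $\log w_i$ on $y_i$. The function names are hypothetical.

```python
import math
from statistics import fmean


def estimate_a1(ys, ws):
    """Step one (assumed model): regress log(w_i) on y_i by least squares.
    Under E_s(w_i | y_i) = exp(-a0 - a1 * y_i) the slope estimates -a1."""
    log_w = [math.log(w) for w in ws]
    y_bar, l_bar = fmean(ys), fmean(log_w)
    s_xy = sum((y - y_bar) * (l - l_bar) for y, l in zip(ys, log_w))
    s_xx = sum((y - y_bar) ** 2 for y in ys)
    return -s_xy / s_xx


def two_step_mean(ys, ws, sigma2):
    """Step two: maximise l_rs(theta, a1) = l_srs(theta) - n * log E_p(pi | theta).
    For the normal model log E_p(pi | theta) = a0 + a1*theta + a1^2*sigma2/2,
    so the maximiser has the closed form  y_bar - a1 * sigma2."""
    a1_tilde = estimate_a1(ys, ws)
    return fmean(ys) - a1_tilde * sigma2, a1_tilde


if __name__ == "__main__":
    ys = [0.0, 1.0, 2.0, 3.0]
    ws = [math.exp(3.0 - 0.5 * y) for y in ys]   # weights generated with a1 = 0.5
    print(two_step_mean(ys, ws, sigma2=1.0))
```

With other population models the second step generally has no closed form and the maximisation of (4) would be carried out numerically.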
For more discussion on the use of sample and sample-complement distributions for analytic inference, see Pfeffermann and Sverchkov (1999, 2003), Sverchkov and Pfeffermann (2004), Eideh and Nathan (2009) and Eideh (2012a, 2012b).
3. Sample Posterior Distribution
In this section we describe the Bayesian approach to the problem of estimation. This approach takes into account any prior knowledge of the survey that the sampler has. Let $q(\theta)$ be the prior pdf of $\theta$; here we regard $\theta$ as a possible value of a random variable. Assume that the sample conditional pdf of $y_i$ given $\theta$ is $f_s(y_i \mid \theta)$. The informativeness parameter $\gamma$ (the parameter of the sample selection process) that indexes the selection model is a characteristic of the data collection; it is not generally of scientific interest and it is not identifiable (see Nandram et al. (2006)). Therefore, in this paper we treat $\gamma$ as fixed in repeated sampling, so that $f_s(y_i \mid x_i, \theta, \gamma)$ can be written as $f_s(y_i \mid \theta)$. For more information on the estimation of $\gamma$, see Eideh (2012a).
We next define the sample joint pdf of $y_i$ and $\theta$, the Bayes sample marginal (or Bayes sample predictive) pdf of $y_i$, and the sample posterior distribution.
(a) The sample joint pdf of $y_i$ and $\theta$, denoted by $m_s(y_i, \theta)$, is defined by:
$$m_s(y_i, \theta) = q(\theta)\, f_s(y_i \mid \theta) \tag{5}$$
(b) An important quantity in Bayesian inference and statistical decision theory is the marginal or Bayes predictive distribution of $Y$, which plays an important role in various scientific applications. The Bayes sample marginal or Bayes sample predictive pdf of $y_i$, denoted by $m_s(y_i)$, is given by:
$$m_s(y_i) = \begin{cases} \displaystyle\int m_s(y_i, \theta)\, d\theta = \int q(\theta)\, f_s(y_i \mid \theta)\, d\theta, & \text{if } \theta \text{ is continuous} \\[2ex] \displaystyle\sum_{\theta} m_s(y_i, \theta) = \sum_{\theta} q(\theta)\, f_s(y_i \mid \theta), & \text{if } \theta \text{ is discrete} \end{cases} \tag{6}$$
The Bayes sample predictive pdf of $y_i$ represents the evidence for a particular model, defined by the prior distribution $q(\theta)$. It is also known as the prior predictive distribution, since it represents the probability of observing the data $y_i$ before they are actually observed.
(c) Now we introduce the sample posterior distribution, denoted by $q_s(\theta \mid y_i)$, which is a way, after (posterior to) observing $y_i$, of updating the prior distribution $q(\theta)$ (the beliefs about $\theta$ prior to experimentation or prior to the survey) to $q_s(\theta \mid y_i)$. Accordingly, all parametric statistical inferences about the parameters of interest, say $\theta$, are derived from the sample posterior distribution, so it represents a complete solution to the inference problem.
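The sample posterior distribution can be viewed as a way of combining the prior information $q(\theta)$ and the sample information $f_s(y_i \mid \theta)$; in other words, $q_s(\theta \mid y_i)$ contains all of the information combining prior knowledge and observations. The sample posterior distribution of $\theta$ given $y_i$, denoted by $q_s(\theta \mid y_i)$, is defined by:
$$q_s(\theta \mid y_i) = \frac{m_s(y_i, \theta)}{m_s(y_i)} = \frac{q(\theta)\, f_s(y_i \mid \theta)}{m_s(y_i)} = \frac{q(\theta)\, E_p(\pi_i \mid y_i, \theta)\, f_p(y_i \mid \theta)}{E_p(\pi_i \mid \theta)\, m_s(y_i)} = \frac{q(\theta)\, \Pr(i \in s \mid y_i, \theta)\, f_p(y_i \mid \theta)}{\Pr(i \in s \mid \theta)\, m_s(y_i)} \tag{7}$$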
Note that:
(i) If the sampling design is noninformative, that is, $\Pr(i \in s \mid y_i, \theta) = \Pr(i \in s \mid \theta)$ for all $y_i$, then the sample joint pdf of $y_i$ and $\theta$ is the same as the population joint pdf of $y_i$ and $\theta$, the Bayes sample predictive pdf of $y_i$ is the same as the Bayes population predictive pdf of $y_i$, and the sample posterior distribution of $\theta$ given $y_i$ is the same as the population posterior distribution of $\theta$ given $y_i$. In this case the data collection method (sampling design) does not influence Bayesian inference.
(ii) The Bayes sample predictive pdf of $y_i$, $m_s(y_i)$, is the normalizing constant in Bayes' theorem, so that
$$q_s(\theta \mid y_i) = \frac{q(\theta)\, f_s(y_i \mid \theta)}{m_s(y_i)} \tag{8}$$
is called the normalized sample posterior distribution, while
$$q_s(\theta \mid y_i) \propto q(\theta)\, f_s(y_i \mid \theta) \tag{9}$$
is the unnormalized sample posterior distribution.
(iii) The sample posterior distribution (the distribution of $\theta$ after the sample is drawn) can be viewed as a way of combining the prior information $q(\theta)$ (which reflects the subjective belief about $\theta$ before the sample is drawn), the sampling design, $\Pr(i \in s \mid y_i, \theta)$, and the population information $f_p(y_i \mid \theta)$. In other words, $q_s(\theta \mid y_i)$ contains all of the information combining prior knowledge, sampling design and observations.
(iv) With fixed $f_p(y_i \mid \theta)$ and $q(\theta)$, the sample posterior distribution is completely determined by $E_p(\pi_i \mid y_i, \theta)$. For more information on this issue, see Pfeffermann, Krieger and Rinott (1998), and Eideh (2010). Hence, the sample posterior distribution combines the population distribution $f_p(y_i \mid \theta)$, the prior distribution $q(\theta)$ and $E_p(\pi_i \mid y_i, \theta)$.
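(v) Furthermore,
$$\frac{q_s(\theta_1 \mid y_i)}{q_s(\theta_2 \mid y_i)} = \frac{q(\theta_1)\, f_s(y_i \mid \theta_1)}{q(\theta_2)\, f_s(y_i \mid \theta_2)} = \frac{q(\theta_1)}{q(\theta_2)} \cdot \frac{f_s(y_i \mid \theta_1)}{f_s(y_i \mid \theta_2)} \tag{10}$$
That is, the sample posterior odds are equal to the prior odds, $q(\theta_1)/q(\theta_2)$, multiplied by the sample likelihood ratio,
$$\frac{L_s(\theta_1; y_i)}{L_s(\theta_2; y_i)} = \frac{f_s(y_i \mid \theta_1)}{f_s(y_i \mid \theta_2)} \tag{11}$$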
4. Sample Posterior Likelihood and Bayesian Inference for Superpopulation Parameter
Data collected by sample surveys are used extensively to make inferences on assumed population models. Often, survey design features (clustering, stratification, unequal probability selection, etc.) are ignored and the sample data are then analyzed using classical methods based on simple random sampling. In this section we account for the informative sampling design in the analysis of survey data using Bayesian inference.
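If $y_s = (y_1, \ldots, y_n)$ are independent and identically distributed random variables from $f_s(y_i \mid \theta)$, then we can write the sample joint conditional pdf of $y_s = (y_1, \ldots, y_n)$ given $\theta$ as:
$$f_s(y_s \mid \theta) = L_s(\theta; y_s) = \prod_{i=1}^{n} f_s(y_i \mid \theta) \tag{12}$$
Thus the sample joint pdf of $y_s = (y_1, \ldots, y_n)$ and $\theta$ is:
$$m_s(y_s, \theta) = q(\theta)\, L_s(\theta; y_s) \tag{13}$$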
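If $\theta$ is a continuous random variable, then the Bayes sample predictive pdf of $y_s = (y_1, \ldots, y_n)$ is:
$$m_s(y_s) = \int m_s(y_s, \theta)\, d\theta = \int q(\theta)\, L_s(\theta; y_s)\, d\theta \tag{14}$$
Hence the conditional sample pdf of $\theta \mid y_s$, or the sample posterior likelihood, is:
$$L_s(\theta \mid y_s) = q_s(\theta \mid y_s) = \frac{q(\theta)\, L_s(\theta; y_s)}{m_s(y_s)} \propto q(\theta)\, L_s(\theta; y_s) \tag{15}$$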
Now we are in a position to deal with Bayesian inference, via the sample posterior distribution and the sample posterior likelihood. This provides the following representation of the sample posterior pdf of $\theta \mid y_s$:
Sample Posterior information = Prior information + Sampling design information + Sampling information
And in terms of pdfs:
Sample Posterior pdf = Prior pdf + Sample Likelihood function = Prior pdf + sampling design + population pdf
Therefore the sample posterior pdf of $\theta \mid y_s$ summarizes the total information, after viewing the sample data collected under the informative sampling design, and provides a basis for sample posterior or Bayesian inferences concerning the population parameters of interest.
The generalized maximum likelihood estimator (MLE) $\hat{\theta}_g$ of $\theta$ is the value of $\theta$ that maximizes the sample posterior likelihood function, $L_s(\theta \mid y_s)$. That is, $\hat{\theta}_g$ has the interpretation of being the "most likely" value of $\theta$, given the prior and the sample data $y_s = (y_1, \ldots, y_n)$.
From the Bayesian viewpoint, an estimate of the population parameter $\theta$ is obtained by choosing a decision function $\delta(y_s)$, which depends on a loss function $L(\delta(y_s), \theta)$, whose conditional expectation under the sample posterior distribution is minimum. That is,
$$\delta(y_s) = \arg\min_{\delta} E_s\left[L(\delta(y_s), \theta) \mid y_s\right] = \arg\min_{\delta} \int L(\delta(y_s), \theta)\, q_s(\theta \mid y_s)\, d\theta \tag{16}$$
If $L(\delta(y_s), \theta) = (\delta(y_s) - \theta)^2$ (squared error loss function), then it is easy to show that the sample Bayes estimate of $\theta$ is given by:
$$\hat{\theta}_B = \delta(y_s) = E_s(\theta \mid y_s) \tag{17}$$
and the sample Bayes estimate of a function $g(\theta)$ is given by:
$$\hat{g}(\theta)_B = E_s\left(g(\theta) \mid y_s\right) \tag{18}$$
Moreover, the $100(1 - \alpha)\%$ highest sample posterior density (HSPD) credible set for $\theta$ is the subset $C$ of $\Theta$ of the form:
$$C = \{\theta \in \Theta : q_s(\theta \mid y_s) \ge k(\alpha)\} \tag{19}$$
where $y_s = (y_1, \ldots, y_n)$, and $k(\alpha)$ is the largest constant such that:
$$P_s(C \mid y_s) \ge 1 - \alpha \tag{20}$$
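When the sample posterior has no closed form, the quantities in (15) and (17) can be approximated on a grid. The sketch below is illustrative only and is not taken from the paper: the helper names `grid_posterior` and `bayes_estimate` are hypothetical, and the example assumes a normal sample pdf with mean shifted by $a_1\sigma^2$ (as in Section 5) and a $N(0, 1)$ prior on the population mean.

```python
import math


def grid_posterior(grid, log_prior, log_fs, ys):
    """Normalised posterior weights on an equally spaced grid:
    q_s(theta | y_s) is proportional to q(theta) * prod_i f_s(y_i | theta), cf. (15)."""
    log_post = [log_prior(t) + sum(log_fs(y, t) for y in ys) for t in grid]
    top = max(log_post)                       # subtract the max for numerical stability
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def bayes_estimate(grid, probs):
    """Posterior mean, i.e. the Bayes estimate under squared error loss, cf. (17)."""
    return sum(t * p for t, p in zip(grid, probs))


if __name__ == "__main__":
    sigma2, a1 = 1.0, 0.5                     # assumed known values
    ys = [1.0, 2.0, 3.0]
    grid = [-10.0 + 0.001 * k for k in range(20001)]
    log_prior = lambda t: -0.5 * t * t        # N(0, 1) prior, up to a constant
    # Sample pdf: normal with mean theta + a1 * sigma2 (informative design).
    log_fs = lambda y, t: -0.5 * (y - t - a1 * sigma2) ** 2 / sigma2
    probs = grid_posterior(grid, log_prior, log_fs, ys)
    print(bayes_estimate(grid, probs))
```

A grid is only practical for one or two parameters; for higher-dimensional $\theta$ a sampling-based approximation would be the more usual choice.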
5. Application 1 – Normal Superpopulation Model
According to the definition of the sample posterior distribution, Bayesian inference requires specifying $f_p(y_i \mid \theta)$, $q(\theta)$ and $E_p(\pi_i \mid y_i, \theta)$. From now on, we denote the parameter indexing the population distribution by $\theta_p$, and the parameter indexing the sample distribution by $\theta_s$. Consider the following population model:
$$y_i \mid \theta_p \underset{p}{\sim} N(\theta_p, \sigma^2) \tag{21}$$
independently for $i = 1, \ldots, N$, where $\sigma^2 > 0$ is known. Let $y_s = (y_1, \ldots, y_n)$ be the sample data. Suppose that
$$E_p(\pi_i \mid y_i) = \exp(a_1 y_i) \tag{22}$$
Using equation (1), it is easy to verify that the sample model is:
$$y_i \mid \theta_s \underset{s}{\sim} N(\theta_s, \sigma^2) \tag{23}$$
where $\theta_s = \theta_p + a_1\sigma^2 = \theta_s(\theta_p, a_1, \sigma^2)$ is a linear function of $\theta_p$. So in order to find the Bayes estimator of $\theta_p$, we (for simplicity) choose a prior distribution for $\theta_s$ and then derive the Bayes estimator for $\theta_p$ via the relationship between $\theta_p$ and $\theta_s$. We consider different cases based on the prior distribution.
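The step from (21) and (22) to (23) can be written out by completing the square. The following is a worked derivation added for clarity; it uses only equation (1) and the normal population model, and is a sketch of one way to verify the result rather than the author's own derivation.

```latex
\begin{align*}
E_p(\pi_i \mid \theta_p)
  &= \int \exp(a_1 y)\,\frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left\{-\frac{(y-\theta_p)^2}{2\sigma^2}\right\} dy
   = \exp\!\left(a_1\theta_p + \tfrac{1}{2}a_1^2\sigma^2\right), \\
a_1 y - \frac{(y-\theta_p)^2}{2\sigma^2}
  &= -\frac{\left(y-\theta_p-a_1\sigma^2\right)^2}{2\sigma^2}
     + a_1\theta_p + \tfrac{1}{2}a_1^2\sigma^2, \\
f_s(y_i \mid \theta_p)
  &= \frac{\exp(a_1 y_i)\, f_p(y_i \mid \theta_p)}{E_p(\pi_i \mid \theta_p)}
   = \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left\{-\frac{\left(y_i-(\theta_p+a_1\sigma^2)\right)^2}{2\sigma^2}\right\}.
\end{align*}
```

So the sample pdf is again normal with the same variance and with mean shifted from $\theta_p$ to $\theta_p + a_1\sigma^2$.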
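Case 1: Assume that the prior distribution for $\theta_s$ is $N(\eta, \tau^2)$ with known $\eta, \tau^2$.
(a) The sample posterior distribution of $\theta_s \mid y_s$ is given by:
$$\theta_s \mid y_s \underset{s}{\sim} N\left(\frac{n\tau^2 \bar{y}_s + \sigma^2 \eta}{n\tau^2 + \sigma^2},\ \frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2}\right) \tag{24}$$
where $\bar{y}_s = n^{-1}\sum_{i=1}^{n} y_i$, so that
$$E_s(\theta_s \mid y_s) = \frac{n\tau^2 \bar{y}_s + \sigma^2 \eta}{n\tau^2 + \sigma^2} \tag{25}$$
and
$$V_s(\theta_s \mid y_s) = \frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2} \tag{26}$$
Now, for given $(a_1, \sigma^2, \tau^2, \eta)$ and using $\theta_s = \theta_p + a_1\sigma^2$, we have
$$E_s(\theta_p \mid y_s) = E_s(\theta_s \mid y_s) - a_1\sigma^2 = \frac{n\tau^2 \bar{y}_s + \sigma^2 \eta}{n\tau^2 + \sigma^2} - a_1\sigma^2 \tag{27}$$
and
$$V_s(\theta_p \mid y_s) = V_s(\theta_s - a_1\sigma^2 \mid y_s) = V_s(\theta_s \mid y_s) = \frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2} \tag{28}$$
Furthermore, the sample posterior distribution of $\theta_p \mid y_s$ is given by:
$$\theta_p \mid y_s \underset{s}{\sim} N\left(\frac{n\tau^2 \bar{y}_s + \sigma^2 \eta}{n\tau^2 + \sigma^2} - a_1\sigma^2,\ \frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2}\right) \tag{29}$$
Hence the sample and population posterior pdfs of $\theta_p \mid y_s$ belong to the same family of normal distributions, but the mean under the sample posterior pdf is different from the mean under the population posterior pdf of $\theta_p \mid y_s$. Furthermore, if $a_1 = 0$, that is, the sampling design is ignorable, then the sample and population posterior pdfs coincide.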
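(b) For given $(a_1, \sigma^2, \tau^2, \eta)$, the Bayes estimate of $\theta_p$ under the sample posterior pdf is:
$$\hat{\theta}_{pB} = E_s(\theta_p \mid y_s) = \frac{n\tau^2 \bar{y}_s + \sigma^2 \eta}{n\tau^2 + \sigma^2} - a_1\sigma^2 = \lambda\eta + (1 - \lambda)\bar{y}_s - a_1\sigma^2 = \hat{\theta}_{pB}^{srs} - a_1\sigma^2 \tag{30}$$
where
$$\lambda = \frac{\sigma^2/n}{\tau^2 + \sigma^2/n} \tag{31}$$
and
$$\hat{\theta}_{pB}^{srs} = \lambda\eta + (1 - \lambda)\bar{y}_s \tag{32}$$
is the Bayes estimate of $\theta_p$ under a noninformative sampling design. Note that:
(1) If $a_1 = 0$ (that is, the sampling design is noninformative), then the Bayes estimates of $\theta_p$ based on the population model and on the sample model coincide.
(2) If $a_1 > 0$ and $y_i > 0$, then $E_p(\pi_i \mid y_i) = e^{a_1 y_i}$ is an increasing function of $y_i$, so that larger values are more likely to be in the sample than smaller values. In this case, $\hat{\theta}_{pB}^{srs}$ overestimates $\theta_p$. The Bayes estimate $\hat{\theta}_{pB} = \hat{\theta}_{pB}^{srs} - a_1\sigma^2$ adjusts $\hat{\theta}_{pB}^{srs}$ downward by the positive quantity $a_1\sigma^2$.
(3) If $a_1 < 0$ and $y_i > 0$, then $E_p(\pi_i \mid y_i) = e^{a_1 y_i}$ is a decreasing function of $y_i$, so that smaller values are more likely to be in the sample than larger values. In this case, $\hat{\theta}_{pB}^{srs}$ underestimates $\theta_p$. The Bayes estimate $\hat{\theta}_{pB} = \hat{\theta}_{pB}^{srs} - a_1\sigma^2$ adjusts $\hat{\theta}_{pB}^{srs}$ upward, since $a_1\sigma^2$ is negative.
(4) $\hat{\theta}_{pB} + a_1\sigma^2 = \hat{\theta}_{sB}$ is a weighted average of $\eta$ (the prior mean) and $\bar{y}_s$ (the maximum likelihood estimator of $\theta_s$), with weights $\lambda$ and $1 - \lambda$, respectively.
(5) $\hat{\theta}_{pB}$ can be written as:
$$\hat{\theta}_{pB} = \frac{(\sigma^2/n)\,\eta + \tau^2 \bar{y}_s}{\tau^2 + \sigma^2/n} - a_1\sigma^2 \tag{33}$$
so that
$$\lim_{n \to \infty} \hat{\theta}_{pB} = \lim_{n \to \infty} \left[\frac{(\sigma^2/n)\,\eta + \tau^2 \bar{y}_s}{\tau^2 + \sigma^2/n} - a_1\sigma^2\right] = \bar{y}_s - a_1\sigma^2 \tag{34}$$
Hence, for large sample size $n$, the Bayes estimate of $\theta_p$ shrinks to $\bar{y}_s - a_1\sigma^2$. That is, as the sample size $n$ increases, the influence of the prior distribution on posterior inferences decreases.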
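The Case 1 quantities are simple enough to compute directly. The following sketch is an illustration rather than code from the paper: it assumes the normal model above with known $\sigma^2$ and $a_1$ and a $N(\eta, \tau^2)$ prior on $\theta_s$; the function names and the symbols `eta`, `tau2` and `lam` are naming choices made here.

```python
import math
from statistics import NormalDist, fmean


def normal_case1_posterior(ys, sigma2, eta, tau2, a1):
    """Posterior mean and variance of theta_p = theta_s - a1 * sigma2 when
    theta_s has a N(eta, tau2) prior and y_i | theta_s ~ N(theta_s, sigma2)."""
    n = len(ys)
    y_bar = fmean(ys)
    lam = (sigma2 / n) / (tau2 + sigma2 / n)        # weight on the prior mean
    mean_theta_s = lam * eta + (1.0 - lam) * y_bar   # posterior mean of theta_s
    var = sigma2 * tau2 / (n * tau2 + sigma2)        # posterior variance (same for theta_p)
    return mean_theta_s - a1 * sigma2, var, lam


def credible_interval(mean, var, alpha=0.05):
    """Symmetric interval mean +/- z_{1-alpha/2} * sd; for a normal posterior
    this is also the highest posterior density interval."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * math.sqrt(var)
    return mean - half, mean + half


if __name__ == "__main__":
    mean_p, var_p, lam = normal_case1_posterior(
        [1.0, 2.0, 3.0], sigma2=1.0, eta=0.0, tau2=1.0, a1=0.5)
    print(mean_p, var_p, lam, credible_interval(mean_p, var_p))
```

Setting `a1=0.0` returns the usual normal-normal posterior, which makes the size of the informativeness adjustment easy to inspect.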
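(c) It is easy to verify that for given $(a_1, \sigma^2, \tau^2, \eta)$, the generalized MLE of $\theta_p$ is given by:
$$\hat{\theta}_{pg} = \lambda\eta + (1 - \lambda)\bar{y}_s - a_1\sigma^2 \tag{35}$$
which is the sample Bayes estimate of $\theta_p$.
(d) Furthermore, the $100(1 - \alpha)\%$ highest sample posterior density (HSPD) credible set for $\theta_p$ is given by:
$$C = \lambda\eta + (1 - \lambda)\bar{y}_s - a_1\sigma^2 \pm z_{1-\alpha/2}\sqrt{\frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2}} \tag{36}$$
where $z_{1-\alpha/2}$ is the $100(1 - \alpha/2)\%$ percentile of the standard normal distribution.
Note that if $a_1 = 0$ (that is, the sampling design is noninformative), then the $100(1 - \alpha)\%$ highest sample posterior density (HSPD) credible set for $\theta_p$ is given by:
$$C = \lambda\eta + (1 - \lambda)\bar{y}_s \pm z_{1-\alpha/2}\sqrt{\frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2}} \tag{37}$$
which is the classical $100(1 - \alpha)\%$ highest posterior density (HPD) credible set for $\theta_p$.
Note that $\{q(\theta_s);\ \theta_s \in \mathbb{R}\}$ is a conjugate family for $\{f_s(y_i \mid \theta_s);\ \theta_s \in \mathbb{R}\}$, because $q_s(\theta_s \mid y_i)$ is in the class of $\{q(\theta_s);\ \theta_s \in \mathbb{R}\}$.
Case 2: Assume that the prior distribution for $\theta_s = \theta_p + a_1\sigma^2$ is $N(\eta + a_1\sigma^2, \tau^2)$ with known $(\eta, \tau^2, a_1)$. Under this case we have: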
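$$\theta_s \mid y_s \underset{s}{\sim} N\left(\frac{n\tau^2 \bar{y}_s + \sigma^2(\eta + a_1\sigma^2)}{n\tau^2 + \sigma^2},\ \frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2}\right) \tag{38}$$
$$E_s(\theta_s \mid y_s) = \lambda(\eta + a_1\sigma^2) + (1 - \lambda)\bar{y}_s = E_p(\theta_p \mid y_s) + \lambda a_1\sigma^2 \tag{39}$$
$$V_s(\theta_s \mid y_s) = \frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2} = V_p(\theta_p \mid y_s) \tag{40}$$
$$E_s(\theta_p \mid y_s) = E_s(\theta_s \mid y_s) - a_1\sigma^2 = \lambda\eta + (1 - \lambda)(\bar{y}_s - a_1\sigma^2) \tag{41}$$
For given $(a_1, \sigma^2, \tau^2, \eta)$, the Bayes estimator of $\theta_s$ under the sample posterior pdf is:
$$\hat{\theta}_{sB} = E_s(\theta_s \mid y_s) = \lambda(\eta + a_1\sigma^2) + (1 - \lambda)\bar{y}_s \tag{42}$$
where
$$\lambda = \frac{\sigma^2/n}{\tau^2 + \sigma^2/n}$$
Hence, the Bayes estimator of $\theta_p$ is:
$$\hat{\theta}_{pB} = \hat{\theta}_{sB} - a_1\sigma^2 = \lambda\eta + (1 - \lambda)\bar{y}_s - n a_1 \lambda \tau^2 = \lambda\eta + (1 - \lambda)(\bar{y}_s - a_1\sigma^2) \tag{43}$$
It is easy to verify that for given $(a_1, \sigma^2, \tau^2, \eta)$, the generalized MLE of $\theta_s$ is given by:
$$\hat{\theta}_{sg} = \lambda(\eta + a_1\sigma^2) + (1 - \lambda)\bar{y}_s \tag{44}$$
So that:
$$\hat{\theta}_{pg} = \lambda\eta + (1 - \lambda)\bar{y}_s - n a_1 \lambda \tau^2 = \lambda\eta + (1 - \lambda)(\bar{y}_s - a_1\sigma^2) \tag{45}$$
The $100(1 - \alpha)\%$ highest sample posterior density (HSPD) credible set for $\theta_s$ is given by:
$$C_s = \lambda(\eta + a_1\sigma^2) + (1 - \lambda)\bar{y}_s \pm z_{1-\alpha/2}\sqrt{\frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2}} \tag{46}$$
So that for $\theta_p = \theta_s - a_1\sigma^2$ it is:
$$C = \lambda\eta + (1 - \lambda)(\bar{y}_s - a_1\sigma^2) \pm z_{1-\alpha/2}\sqrt{\frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2}} \tag{47}$$
Note that $\{q(\theta_s);\ \theta_s \in \mathbb{R}\}$ is a conjugate family for $\{f_s(y_i \mid \theta_s);\ \theta_s \in \mathbb{R}\}$, because $q_s(\theta_s \mid y_i)$ is in the class of $\{q(\theta_s);\ \theta_s \in \mathbb{R}\}$.
Case 3: Assume that the prior distribution for $\theta_p$ is $N(\eta, \tau^2)$ with known $\eta, \tau^2$. Similar to Cases 1 and 2, we have:
$$\theta_p \mid y_s \underset{s}{\sim} N\left(\frac{n\tau^2 \bar{y}_s + \sigma^2\eta - n a_1 \sigma^2\tau^2}{n\tau^2 + \sigma^2},\ \frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2}\right) \tag{48}$$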
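so that
$$E_s(\theta_p \mid y_s) = \frac{n\tau^2 \bar{y}_s + \sigma^2\eta - n a_1 \sigma^2\tau^2}{n\tau^2 + \sigma^2} = \frac{n\tau^2 \bar{y}_s + \sigma^2\eta}{n\tau^2 + \sigma^2} - \frac{n a_1 \sigma^2\tau^2}{n\tau^2 + \sigma^2} = E_p(\theta_p \mid y_s) - n a_1 V_p(\theta_p \mid y_s) \tag{49}$$
and
$$V_s(\theta_p \mid y_s) = \frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2} = V_p(\theta_p \mid y_s) \tag{50}$$
which is similar to Case 2 (a).
(b) For given $(a_1, \sigma^2, \tau^2, \eta)$, the Bayes estimator of $\theta_p$ under the sample posterior pdf is:
$$\hat{\theta}_{pB} = E_s(\theta_p \mid y_s) = \lambda\eta + (1 - \lambda)(\bar{y}_s - a_1\sigma^2) \tag{51}$$
where
$$\lambda = \frac{\sigma^2/n}{\tau^2 + \sigma^2/n}$$
which is similar to Case 2 (b).
Note that, here, $\hat{\theta}_{pB}$ is a weighted average of $\eta$ (the prior mean) and $\bar{y}_s - a_1\sigma^2$ (the maximum likelihood estimator of $\theta_p$), with weights $\lambda$ and $1 - \lambda$, respectively.
(c) It is easy to verify that for given $(a_1, \sigma^2, \tau^2, \eta)$, the generalized MLE of $\theta_p$ is given by:
$$\hat{\theta}_{pg} = \lambda\eta + (1 - \lambda)\bar{y}_s - n a_1 \lambda \tau^2 = \lambda\eta + (1 - \lambda)(\bar{y}_s - a_1\sigma^2) \tag{52}$$
which is the sample Bayes estimate of $\theta_p$, similar to Case 2 (c).
(d) The $100(1 - \alpha)\%$ highest sample posterior density (HSPD) credible set for $\theta_p$ is given by:
$$C = \lambda\eta + (1 - \lambda)(\bar{y}_s - a_1\sigma^2) \pm z_{1-\alpha/2}\sqrt{\frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2}} \tag{53}$$
which is similar to Case 2 (d).
Note that if $a_1 = 0$ (that is, the sampling design is noninformative), then the $100(1 - \alpha)\%$ highest sample posterior density (HSPD) credible set for $\theta_p$ is given by
$$C = \lambda\eta + (1 - \lambda)\bar{y}_s \pm z_{1-\alpha/2}\sqrt{\frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2}} \tag{54}$$
which is the $100(1 - \alpha)\%$ highest posterior density (HPD) credible set for $\theta_p$.
Hence, the prior distribution $\theta_p \sim N(\eta, \tau^2)$ with known $\eta, \tau^2$ is equivalent to the prior distribution $\theta_s = \theta_p + a_1\sigma^2 \sim N(\eta + a_1\sigma^2, \tau^2)$ with known $(\eta, \tau^2, a_1)$, in the sense that they give the same predictor of $\theta_p$.
Note that $\{q(\theta_p);\ \theta_p \in \mathbb{R}\}$ is a conjugate family for $\{f_s(y_i \mid \theta_s);\ \theta_s \in \mathbb{R}\}$, because $q_s(\theta_p \mid y_i)$ is in the class of $\{q(\theta_p);\ \theta_p \in \mathbb{R}\}$.
It should be noted that the similarity between Case 2 and Case 3 is interpreted as follows: since under Case 3, $\theta_p \sim N(\eta, \tau^2)$, therefore $\theta_s = \theta_p + a_1\sigma^2 \sim N(\eta + a_1\sigma^2, \tau^2)$, which is the prior distribution under Case 2. The situation is different under Case 1, where $\theta_s = \theta_p + a_1\sigma^2 \sim N(\eta, \tau^2)$.
Case 4: Suppose that the prior distribution of $\theta_s$ is noninformative or improper: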
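$$q(\theta_s) \propto 1, \quad -\infty < \theta_s < \infty \tag{55}$$
(a) Since $\bar{y}_s$ is a sufficient statistic for $\theta_s = \theta_p + a_1\sigma^2$, the conditional sample pdf of $\theta_s \mid y_1, \ldots, y_n$, or the sample posterior likelihood, is:
$$q_s(\theta_s \mid y_s) = L_s(\theta_s \mid y_s) = \frac{q(\theta_s)\, L_s(\theta_s; y_s)}{\int q(\theta_s)\, L_s(\theta_s; y_s)\, d\theta_s} = q_s(\theta_s \mid \bar{y}_s) = \sqrt{\frac{n}{2\pi\sigma^2}}\, \exp\left\{-\frac{n(\theta_s - \bar{y}_s)^2}{2\sigma^2}\right\} \tag{56}$$
Hence the posterior distribution of $\theta_s \mid y_s$ is given by:
$$\theta_s \mid y_s \underset{s}{\sim} N\left(\bar{y}_s,\ \frac{\sigma^2}{n}\right) \tag{57}$$
so that
$$E_s(\theta_s \mid y_s) = \bar{y}_s \tag{58}$$
and
$$V_s(\theta_s \mid y_s) = \frac{\sigma^2}{n} \tag{59}$$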
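We are interested in the sample posterior distribution of $\theta_p \mid y_s$.
Since $\theta_p = \theta_s - a_1\sigma^2$, the sample posterior distribution of $\theta_p \mid y_s$ is given by:
$$\theta_p \mid y_s \underset{s}{\sim} N\left(\bar{y}_s - a_1\sigma^2,\ \frac{\sigma^2}{n}\right) \tag{60}$$
so that
$$E_s(\theta_p \mid y_s) = \bar{y}_s - a_1\sigma^2 = E_p(\theta_p \mid y_s) - a_1\sigma^2 \tag{61}$$
and
$$V_s(\theta_p \mid y_s) = \frac{\sigma^2}{n} = V_p(\theta_p \mid y_s) \tag{62}$$
Note that the sample and population posterior pdfs of $\theta_p \mid y_s$ belong to the same family of normal distributions, but the mean under the sample posterior pdf is different from the mean under the population posterior pdf of $\theta_p \mid y_s$. Furthermore, if $a_1 = 0$, that is, the sampling design is ignorable, then the sample and population posterior pdfs coincide.
(b) For given $(a_1, \sigma^2)$, the Bayes estimator of $\theta_p$ under the sample posterior pdf is
$$\hat{\theta}_{pB} = E_s(\theta_p \mid y_s) = \bar{y}_s - a_1\sigma^2 \tag{63}$$
which is the same as the maximum likelihood estimator of $\theta_p$ under the sample model $y_i \mid \theta_s \underset{s}{\sim} N(\theta_s, \sigma^2)$, where $\theta_p = \theta_s - a_1\sigma^2$.
(c) It is easy to verify that for given $(a_1, \sigma^2)$, the generalized MLE of $\theta_p$ is given by:
$$\hat{\theta}_{pg} = \bar{y}_s - a_1\sigma^2 \tag{64}$$
(d) The $100(1 - \alpha)\%$ highest sample posterior density (HSPD) credible set for $\theta_p$ is given by:
$$C = \bar{y}_s - a_1\sigma^2 \pm z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} \tag{65}$$
Note that if $a_1 = 0$ (that is, the sampling design is noninformative), then the $100(1 - \alpha)\%$ highest sample posterior density (HSPD) credible set for $\theta_p$ is given by:
$$C = \bar{y}_s \pm z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} \tag{66}$$
Note that $\{q(\theta_s);\ \theta_s \in \mathbb{R}\}$ is a conjugate family for $\{f_s(y_i \mid \theta_s);\ \theta_s \in \mathbb{R}\}$, because $q_s(\theta_s \mid y_i)$ is in the class of $\{q(\theta_s);\ \theta_s \in \mathbb{R}\}$.
6. Application 2 – Bernoulli Superpopulation Model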
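Let $y_i \mid \theta_p \underset{p}{\sim} \mathrm{Ber}(\theta_p)$ be independent random variables, where $\theta_p \in (0, 1)$, $i = 1, \ldots, N$. Suppose that the sample inclusion probabilities have expectations:
$$E_p(\pi_i \mid y_i) = \Pr(i \in s \mid y_i) = \exp(a_0 + a_1 y_i) = \begin{cases} \exp(a_0) = \pi_0, & \text{if } y_i = 0 \\ \exp(a_0 + a_1) = \pi_1, & \text{if } y_i = 1 \end{cases} \tag{67}$$
So that $a_0 = \ln \pi_0$ and $a_1 = \ln(\pi_1/\pi_0)$. In this case $\pi_0$ and $\pi_1$ are known before sampling. We can show that the sample distribution of $y_i$ is:
$$f_s(y_i \mid \theta_p) = \frac{\Pr(i \in s \mid y_i)\, f_p(y_i \mid \theta_p)}{\Pr(i \in s \mid \theta_p)} = \frac{\pi_1^{y_i}\, \pi_0^{1-y_i}\, \theta_p^{y_i} (1 - \theta_p)^{1-y_i}}{\pi_0(1 - \theta_p) + \pi_1\theta_p} = \theta_s^{y_i}(1 - \theta_s)^{1-y_i}, \quad y_i = 0, 1 \tag{68}$$
Thus the distribution of $y_i$, $i \in s$, is the same as the distribution of $y_i$, $i \in U$, except that the population parameter $\theta_p$ changes to:
$$\theta_s = \theta_s(\theta_p, \pi_0, \pi_1) = \frac{\pi_1\theta_p}{\pi_0(1 - \theta_p) + \pi_1\theta_p} \tag{69}$$
which is not a linear function of $\theta_p$.
Notice that if $a_1 = 0$, or equivalently $\pi_0 = \pi_1$, that is, the sampling design is ignorable, then the sample and population distributions are the same.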
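The map between the population and sample success probabilities, and its inverse, can be sketched as follows. This is an illustrative sketch; the function names are hypothetical, and the inverse is obtained by solving the relationship above for the population parameter.

```python
def theta_s_from_theta_p(theta_p, pi0, pi1):
    """Sample success probability implied by the population one when units with
    y = 0 and y = 1 are selected with probabilities pi0 and pi1."""
    return pi1 * theta_p / (pi0 * (1.0 - theta_p) + pi1 * theta_p)


def theta_p_from_theta_s(theta_s, pi0, pi1):
    """Inverse map, obtained by solving the relationship above for theta_p."""
    return pi0 * theta_s / (pi0 * theta_s + pi1 * (1.0 - theta_s))


if __name__ == "__main__":
    theta_s = theta_s_from_theta_p(0.3, pi0=0.1, pi1=0.2)
    print(theta_s, theta_p_from_theta_s(theta_s, pi0=0.1, pi1=0.2))
```

When `pi0 == pi1` both functions reduce to the identity, which corresponds to an ignorable design.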
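Case 1: Now suppose that the prior distribution of $\theta_p$ is $\theta_p \sim B(\alpha, \beta)$, where $\alpha, \beta$ are known. The conditional sample pdf of $\theta_p \mid y_s$, or the sample posterior likelihood, is given by:
$$q_s(\theta_p \mid y_s) = L_s(\theta_p \mid y_s) \propto q(\theta_p)\, L_s(\theta_p; y_s) \tag{70}$$
where
$$q(\theta_p)\, L_s(\theta_p; y_s) = \frac{1}{\mathrm{Beta}(\alpha, \beta)}\, \theta_p^{\alpha - 1}(1 - \theta_p)^{\beta - 1}\, \theta_s^{n\bar{y}_s}(1 - \theta_s)^{n - n\bar{y}_s}, \quad \theta_s = \theta_s(\theta_p, \pi_0, \pi_1) \tag{71}$$
and $\mathrm{Beta}(\alpha, \beta)$ denotes the beta function,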
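which cannot be written in the form of a beta distribution.
Note that $\{q(\theta_p);\ \theta_p \in (0, 1)\}$ is not a conjugate family for $\{f_s(y_i \mid \theta_p);\ \theta_p \in (0, 1)\}$, because $q_s(\theta_p \mid y_i)$ is not in the class $\{q(\theta_p);\ \theta_p \in (0, 1)\}$. Recall that $\theta_s = \pi_1\theta_p / \left[\pi_0(1 - \theta_p) + \pi_1\theta_p\right]$ is not a linear function of $\theta_p$.
Case 2: For simplicity, assume that the prior distribution of $\theta_s$ is $\theta_s \sim B(\alpha, \beta)$. It is easy to verify that the conditional sample pdf of $\theta_s \mid y_s$, or the sample posterior likelihood, is given by:
$$q_s(\theta_s \mid y_s) = L_s(\theta_s \mid y_s) \propto q(\theta_s)\, L_s(\theta_s; y_s) \tag{72}$$
where
$$q(\theta_s)\, L_s(\theta_s; y_s) = q(\theta_s) \prod_{i=1}^{n} f_s(y_i \mid \theta_s) = \frac{1}{\mathrm{Beta}(\alpha, \beta)}\, \theta_s^{n\bar{y}_s + \alpha - 1}(1 - \theta_s)^{n - n\bar{y}_s + \beta - 1} \tag{73}$$
and $\mathrm{Beta}(\alpha, \beta)$ denotes the beta function. Hence,
$$q_s(\theta_s \mid y_s) \propto \frac{1}{\mathrm{Beta}(\alpha, \beta)}\, \theta_s^{n\bar{y}_s + \alpha - 1}(1 - \theta_s)^{n - n\bar{y}_s + \beta - 1} \tag{74}$$
After some algebra, we can show that the sample posterior distribution of $\theta_s \mid y_s$ is given by:
$$\theta_s \mid y_s \underset{s}{\sim} B\left(n\bar{y}_s + \alpha,\ n - n\bar{y}_s + \beta\right) \tag{75}$$
So that
$$E_s(\theta_s \mid y_s) = \frac{n\bar{y}_s + \alpha}{(n\bar{y}_s + \alpha) + (n - n\bar{y}_s + \beta)} = \frac{n\bar{y}_s + \alpha}{n + \alpha + \beta} \tag{76}$$
and
$$V_s(\theta_s \mid y_s) = \frac{(n\bar{y}_s + \alpha)(n - n\bar{y}_s + \beta)}{(n + \alpha + \beta)^2 (n + \alpha + \beta + 1)} \tag{77}$$
Note that $\{q(\theta_s);\ \theta_s \in (0, 1)\}$ is a conjugate family for $\{f_s(y_i \mid \theta_s);\ \theta_s \in (0, 1)\}$, because $q_s(\theta_s \mid y_i)$ is in the class $\{q(\theta_s);\ \theta_s \in (0, 1)\}$.
Hence, for given $(\alpha, \beta, \pi_0, \pi_1)$, the Bayes estimate of $\theta_s$ under the sample posterior pdf is given by:
$$\hat{\theta}_{sB} = E_s(\theta_s \mid y_s) = \frac{n\bar{y}_s + \alpha}{n + \alpha + \beta} \tag{78}$$
But we are interested in the Bayes estimate of $\theta_p$. For this, using the relationship between $\theta_s$ and $\theta_p$, we get:
$$\hat{\theta}_{pB} = \frac{\pi_0\hat{\theta}_{sB}}{\pi_0\hat{\theta}_{sB} + \pi_1(1 - \hat{\theta}_{sB})} = \frac{\pi_0(n\bar{y}_s + \alpha)}{\pi_0(n\bar{y}_s + \alpha) + \pi_1(n - n\bar{y}_s + \beta)} \tag{79}$$
Notice that if $\pi_0 = \pi_1$, that is, the sampling design is ignorable, then
$$\hat{\theta}_{pB} = \frac{n\bar{y}_s + \alpha}{n + \alpha + \beta} = \hat{\theta}_{pB}^{srs} \tag{80}$$
which is the classical Bayes estimate of $\theta_p$ under simple random sampling.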
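The Bernoulli calculation under Case 2 can be sketched in a few lines. This is an illustration under the stated assumptions (beta prior on the sample parameter, known selection probabilities `pi0` and `pi1`); the function name is hypothetical.

```python
def bernoulli_bayes_estimates(ys, alpha, beta, pi0, pi1):
    """Return (theta_sB, theta_pB): the posterior mean of the sample parameter
    under a beta(alpha, beta) prior, and the implied estimate of the population
    parameter obtained by inverting the theta_p -> theta_s relationship."""
    n = len(ys)
    successes = sum(ys)                                   # equals n * y_bar_s
    theta_sB = (successes + alpha) / (n + alpha + beta)   # beta posterior mean
    theta_pB = pi0 * theta_sB / (pi0 * theta_sB + pi1 * (1.0 - theta_sB))
    return theta_sB, theta_pB


if __name__ == "__main__":
    # Units with y = 1 are twice as likely to be sampled as units with y = 0.
    print(bernoulli_bayes_estimates([1, 1, 0, 1], alpha=1.0, beta=1.0,
                                    pi0=0.1, pi1=0.2))
```

In this example the sample-based estimate 2/3 is pulled down to 1/2 once the over-representation of successes is accounted for.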
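The Bayes estimate $\hat{\theta}_{pB}$ can be written as:
$$\hat{\theta}_{pB} = \frac{\pi_0\hat{\theta}_{sB}}{\pi_0\hat{\theta}_{sB} + \pi_1(1 - \hat{\theta}_{sB})}, \quad \hat{\theta}_{sB} = \frac{n}{n + \alpha + \beta}\,\bar{y}_s + \frac{\alpha + \beta}{n + \alpha + \beta} \cdot \frac{\alpha}{\alpha + \beta} \tag{81}$$
so that $\hat{\theta}_{pB}$ depends on the data through a linear combination of $\bar{y}_s$ (the maximum likelihood estimator of $\theta_p$ under the population model $y_i \mid \theta_p \underset{p}{\sim} \mathrm{Ber}(\theta_p)$) and $\alpha/(\alpha + \beta)$ (the mean of $\theta_s \sim B(\alpha, \beta)$).
Furthermore, we have:
$$\lim_{n \to \infty} \hat{\theta}_{pB} = \lim_{n \to \infty} \frac{\pi_0(n\bar{y}_s + \alpha)}{\pi_0(n\bar{y}_s + \alpha) + \pi_1(n - n\bar{y}_s + \beta)} = \frac{\pi_0\bar{y}_s}{\pi_0\bar{y}_s + \pi_1(1 - \bar{y}_s)} \tag{82}$$
Hence, for large sample size $n$, the Bayes estimate of $\theta_p$ shrinks to:
$$\hat{\theta}_p^{\,inf} = \frac{\pi_0\bar{y}_s}{\pi_0\bar{y}_s + \pi_1(1 - \bar{y}_s)} \tag{83}$$
where $\hat{\theta}_p^{\,inf}$ denotes the maximum likelihood estimate of $\theta_p$ under the informative sampling design, treating $\theta_p$ as a fixed unknown constant.
Notice that if $\pi_0 = \pi_1$, that is, the sampling design is ignorable, then $\lim_{n \to \infty} \hat{\theta}_{pB} = \bar{y}_s$.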
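If the sampling design is informative, then for large sample size $n$, the Bayes estimate of $\theta_p$ becomes $\pi_0\bar{y}_s / \left[\pi_0\bar{y}_s + \pi_1(1 - \bar{y}_s)\right]$.
It is easy to verify that for given $(\alpha, \beta)$, the generalized MLE of $\theta_s$ is given by:
$$\hat{\theta}_{sg} = \frac{n\bar{y}_s + \alpha}{(n\bar{y}_s + \alpha) + (n - n\bar{y}_s + \beta)} = \frac{n\bar{y}_s + \alpha}{n + \alpha + \beta} \tag{84}$$
which is the sample Bayes estimate of $\theta_s$. Hence, using the relationship between $\theta_s$ and $\theta_p$, we have
$$\hat{\theta}_{pg} = \frac{\pi_0(n\bar{y}_s + \alpha)}{\pi_0(n\bar{y}_s + \alpha) + \pi_1(n - n\bar{y}_s + \beta)} \tag{85}$$
Notice that if $\pi_0 = \pi_1$, that is, the sampling design is ignorable, then
$$\hat{\theta}_{pg} = \frac{n\bar{y}_s + \alpha}{n + \alpha + \beta} = \hat{\theta}_{sg} \tag{86}$$
which is the Bayes estimate of $\theta_p$ under a noninformative sampling design.
The important feature of this formula is that, if $\theta_p$ is treated as a fixed unknown constant (classical or frequentist statistics), that is, $\alpha = \beta = 0$, and the sampling design is noninformative, then for large sample size the Bayes estimate of $\theta_p$ becomes $\bar{y}_s$, which is the MLE of $\theta_p$.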
7. Bayes Prediction of Finite Population Total
Sverchkov and Pfeffermann (2004) use sample and sample complement distributions for the prediction of finite population totals under informative sampling for single-stage sampling designs. Later Eideh and Nathan (2009) extend the theory to general linear functions of the population values and to two-stage informative cluster sampling. None of the above studies consider prediction of finite population total from Bayesian perspective. In this section we use sample posterior and sample complement posterior distributions, to predict the finite population total under single-stage informative sampling design. Let
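$$T = \sum_{i=1}^{N} y_i = \sum_{i \in s} y_i + \sum_{i \notin s} y_i = \sum_{i \in s} y_i + \sum_{i \in c} y_i \tag{87}$$
The available information for the prediction process is $O = \{(y_i, \pi_i),\ i \in s;\ I_i,\ i \in U;\ N, n\}$, where $I_i = 1$ for $i \in s$ and $I_i = 0$ for $i \notin s$.
The sample-complement posterior distribution of $\theta_p$ given $y_i$ is
$$q_c(\theta_p \mid y_i) = \frac{q(\theta_p)\, f_c(y_i \mid \theta_p)}{m_c(y_i)} \tag{88}$$
where $f_c(y_i \mid \theta_p)$ is the sample-complement distribution of $y_i$ given $\theta_p$, which is given by:
$$f_c(y_i \mid \theta_p) = \frac{E_p(1 - \pi_i \mid y_i, \theta_p)\, f_p(y_i \mid \theta_p)}{E_p(1 - \pi_i \mid \theta_p)} \tag{89}$$
Also, we have:
$$E_c(y_i \mid \theta_p) = \frac{E_p\left[(1 - \pi_i)\, y_i \mid \theta_p\right]}{E_p(1 - \pi_i \mid \theta_p)} = \frac{E_s\left[(w_i - 1)\, y_i \mid \theta_p\right]}{E_s(w_i - 1 \mid \theta_p)} \tag{90}$$
and
$$m_c(y_i) = \begin{cases} \displaystyle\int q(\theta_p)\, f_c(y_i \mid \theta_p)\, d\theta_p, & \text{if } \theta_p \text{ is continuous} \\[2ex] \displaystyle\sum_{\theta_p} q(\theta_p)\, f_c(y_i \mid \theta_p), & \text{if } \theta_p \text{ is discrete} \end{cases} \tag{91}$$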
p<