Bayes Estimation and Prediction under Informative Sampling Design

(1)

Abdulhakeem A.H. Eideh

Department of Mathematics

Al-Quds University, Al-Quds, Palestine [email protected]

Abstract

In this research we will deal with the problem of Bayes estimation of the parameter that characterise the superpopulation model, and Bayes prediction of finite population total, from a sample survey data selected from a finite population using informative probability sampling design, that is, the sample first order inclusion probabilities depend on the values of the model outcome variable (or the model outcome variable is correlated with design variables not included in the model). In order to achieve this we will first define the sample predictive distribution and the sample posterior distribution and then we use the sample posterior likelihood function to obtain the sample Bayes estimate of the superpopulation model parameters, and Bayes predictors of finite population total. These new predictors take into account informative sampling design. Thus, provides new justification for the broad use of best linear unbiased predictors (model-based school) in predicting finite population parameters in case of not accounting of complex sampling design. Furthermore, we show that the behaviours of the present estimators and predictors depends on the informativeness parameters. Also the use of the Bayes estimators and predictors that ignore the informative sampling design yields biased Bayes estimators and predictors. One of the most important feature of this paper is, specifying prior distribution for the parameters of the sample distribution makes life easier than the population parameters.

Keywords: Bayes estimator, Finite Population Sampling, Informative Sampling, Sample Distribution.

1. Introduction

(2)

Pak.j.stat.oper.res. Vol.XIII No.2 2017 pp327-353 328

Nathan (2006a, 2006b, 2009), Eideh (2003, 2012a, 2012b). In these articles the authors reviewed many examples reported in the literature that illustrate the effect of ignoring the sampling process when fitting models to survey data based on complex sample and they discussed methods to deal with this problem.

(3)

schizophrenia”, and offer examples of the problems this creates. He discuss an alternative philosophy for survey inference which is called calibrated Bayes (CB), where inferences for a particular data set are Bayesian, but models are chosen to yield inferences that have good design-based properties. Little argue that CB resolves DMC conflicts, and capitalizes on the strengths of both frequentist and Bayesian approaches. Features of the CB approach to surveys include the incorporation of survey design information into the model, and models with weak prior distributions that avoid strong parametric assumptions. Savitsky and Toth (2016) propose to construct a pseudo-posterior distribution that utilizes sampling weights based on the marginal inclusion probabilities to exponentiate the likelihood contribution of each sampled unit, which weights the information in the sample back to the population. They construct conditions on known marginal and pairwise inclusion probabilities that define a class of sampling designs where consistency of the pseudo posterior is guaranteed.

In this paper we will account for informative probability sampling design in the Bayesian approach to sample survey inference, for example Bayes estimation of the parameters of superpopulation model and prediction of finite population total.

The plan of the paper is as follows, Section 2 is devoted to the definitions of sample and sample-complement distributions. In Section 3 we define the sample posterior distribution. In Section 4, we develop the sample posterior likelihood and Bayesian inference for superpopulation parameter. The Bayesian approach are studied for normal and Bernoulli superpoulation models in Sections 5 and 6. In Section 7, we propose a new Bayes prediction of finite population total. Finally some conclusions are presented in Section 8.

2. Sample and Sample-Complement Distributions

Let U 



1,...,N



denote a finite population consisting of N units. Let y be the study variable of interest and let y_i be the value of y for the ith population unit. We consider the population values y₁,...,y_N as random variables, which are independent realizations from a distribution with probability density function (pdf) f_p



y_i |



, indexed by a vector of parameters , where  is the parameter space of . Let xi 



xi1,...,xip



, iU

be the values of a vector of auxiliary variables, x₁,...,x_p, and z



z₁,...,z_N



be the values of known design variables, used for the sample selection process not included in the model under consideration. In what follows, we consider a sampling design with selection probabilities _i Pr(is), and sampling weight w_i 1_i ; i1,...,N. In practice, the _i’s may depend on the population values



x,y,z



. We express this dependence by writing: _i Pr(is|x,y,z) for all units iU. Since ₁,...,_N are defined by the realizations



xi,yi,zi



,i1,...,N, therefore they are random realizations

defined on the space of possible populations. The sample s consists of the subset of U

selected at random by the sampling scheme with inclusion probabilities ₁,...,_N.

(4)

Pak.j.stat.oper.res. Vol.XIII No.2 2017 pp327-353 330

if unit iU is selected to the sample and I_i 0 if otherwise. The sample s is defined accordingly as s



i|iU,I_i 1



and its complement by cs 



i|iU,I_i 0



. We assume probability sampling, so that _i Pr(is)0 for all units iU.

Let f_p and E_p

 

 denote the pdf and the mathematical expectation of the population distribution, respectively; f_s and E_s

 

 denote the pdf and the mathematical expectation of the sample distribution, respectively; and f_c and E_c

 

 denote the pdf and the mathematical expectation of the sample-complement distribution. Assume that the population pdf depends on known values of the auxiliary variables x_i, so that



i i p



p i f y

y ~ |x , . According to Pfeffermann, Krieger and Rinott, Y. (1998), the (marginal) sample pdf of y_i is given by:











 





  



 



  



, , |

, | ,

, |

s , , , | ,

, |

i i p

i i p i i i p

i i p i

i s

E

y f y E

i y

f y

f

x

x x



 

(1)

where



_i _i







_p



_i _i _i

 

_p _i _i



_i

p E y f y dy

E  |x ,,  |x , , |x ,

and  is the informativeness parameter.

Similarly, Sverchkov and Pfeffermann (2004) define the sample-complement pdf of yi

as:







 



i i p

i i p i i i p i

i c

E

y f y E

y f

x x x

x

| 1

| ,

| 1 |

 

 

 (2)

For vector of random variables



y x_i, _i



, the following relationships hold:













1

|

| _i  _p _i _i 

i

s w y E y

E  (3a)



_i _i





_s



_i _i





_s



_i _i _i



p y E w E wy

E |x  |x 1 |x (3b)

 

_



 



1

i p i

s w E

E 

(3c)





























i i

s

i i i s i

i p

i i i p i i c

x w

E

x y w E x

E

x y E

y E

| 1

| 1 |

1

| 1

|

  

 

x

(3d)

See Pfeffermann and Sverchkov (1999) and Sverchkov and Pfeffermann (2004).

Having derived the sample distribution, Pfeffermann, Krieger and Rinott (1998) proved that if the population measurements y_i are independent, then as N



with n fixed



(5)

Step-one: Estimate the informativeness parameters  using equation (3a). Denoting the resulting estimate of  by ~.

Step-two: Substitute ~ in the sample log-likelihood function, and then maximize the resulting sample log-likelihood function with respect to the population parameters, :

 

_,~

 

 _log



 _| _,_,~



 

 _log



_| _,_,~



1 1

i i s n

i srs i

i p n

i srs

rs l E l E w

l



x



x

 

 



 (4)

where lrs

 

,~ is the sample log-likelihood after substituting ~ in the sample

log-likelihood function and where

 













 n

i

i i p

srs f y

l

1

, |

log 

 x is the classical

log-likelihood.

For more discussion on the use of sample and sample-complement distributions for analytic inference, see Pfeffermann and Sverchkov (1999, 2003), Sverchkov and Pfeffermann (2004), Eideh and Nathan (2009) and Eideh (2012a, 2012b).

3. Sample Posterior Distribution

In this section we describe Bayesian approach to the problem of estimation. This approach takes into account any prior knowledge of the survey that the sampler has. Let

 



q be the prior pdf of , here we look upon  as a possible value of . Assume that the sample conditional pdf of y_i given  is f_s



y_i |



.

The informativess parameter (or the parameter of sample selection process)  that index the selection model is a characteristic of the data collection but is not generally of scientific interestis and it is not identifiable (see Nandram et al. (2006)), therefore in this paper we treat _{as fixed in repeated sampling. So that} f_s



y_i |x_i,,



can be written as



_i|



s y

f . For more information on estimation of , see Eideh (2012a).

We next define the sample joint pdf of y_i and , Bayes sample marginal or Bayes sample predictive pdf ofy_i, and sample posterior distribution.

(a) The sample joint pdf of y_i and _{denoted by,}m_s



y_i,



is defined by:



_i,

   

 _s _i |



s y q f y

m  (5)

(b) An important quantity in Bayesian inferences and statistical decision theory is the marginal or Bayes predictive distribution of Y , which plays an important role in various scientific applications. The Bayes sample marginal or Bayes sample predictive or Bayes sample predictive pdf of y_i, denoted by, m_s

 

y_i is given by:

 





  







  



    

 







 

discrete is

if |

,

continuous is

if |

,

 



  

 

i s i

s

i s i

s i

s

y f q y

m

d y f q d y m y

m

(6)

Pak.j.stat.oper.res. Vol.XIII No.2 2017 pp327-353 332

The Bayes sample predictive pdf of y_i presents the evidence for a particular model, defined by prior distribution q

 

 known as prior predictive as it represents the probability of observing the data y_i that was observed before it was observed.

(c) Now we introduce the sample posterior distribution, denoted by q_s



|y_i



, which is a way, after (posterior to) observing y_i, for adjusting or updating the prior distribution

 



q (the beliefs about  prior to experimentation or prior to survey) to q_s



|y_i



. Accordingly, all parametric statistical inferences about the parameters of interest, say

 

 , are derived from the posterior sample distributions so it represents a complete solution to the inference problem.

The sample posterior distribution can be viewed as a way of combining the prior information q

 

 and the sample information f_s



y|



, in other words, q_s

 

|x contains all of the information combining prior knowledge and observations. The sample posterior distribution of  given y_i, denoted by q_s

 

 y_i is defined by:







 



  

 



  

 



  



  

 



  





 



 

 

 

 

| s Pr

, | ,

| s Pr

| | ,

|

| ,

|

 

 

i y m

y f y i q

E y m

y f y E

q

y m

y f q y

m y m y q

i s

i i p i

i p i s

i p i i p

i s

i s i

s i s i s

x

(7)

Note that:

(i) If the sampling design is noninformative, that is, Pr



is|y_i,



Pr



is|



for all y_i, then the sample joint pdf of y_i and  is the same as the population joint pdf of y_i and , the Bayes sample predictive pdf of y_i posterior distribution is the same as the Bayes population predictive pdf of y_i, and the sample posterior distribution of  given y_i is the same as the population posterior distribution of  given y_i. So that data collection method (sampling design) does not influence Bayesian inference.

(ii) The Bayes sample predictive pdf of y_i,m_s

 

y_i , represents the normalizing constant in Bayes theorem, so that





  

 



i s

i s i

s

y m

y f q y

q  |   | (8)

is called normalized sample posterior distribution, while



| i



  

 s i |



s y q f y

q  (9)

(7)

(iii) The sample posterior distribution (the distribution of  after the sample is drawn) can be viewed as a way of combining the prior information q

 

 (which reflects the subjective belief of  before the sample is drawn), the sampling design,



s| ,



Pri y_i , and the population information f_p



y_i |



. In other words,

 

i

s y

q  contains all of the information combining prior knowledge, sampling design and observations.

(iv) With fixed fp



yi|



and q

 

 , the sample posterior distribution is completely

determined by E_p



_i|y_i,



. For more information on this issue, see Pfeffermann, Krieger and Rinott (1998), and Eideh (2010). Hence, sample posterior distribution combine: population distribution fp



yi |



, prior distribution q

 

 and



i| i,



p y

E .

(v) Furthermore,









 

  

2 2



1 1

2 1 2

1

| | |

|

 

  



i s

i s i

s i s

y f q

y f q q q y q

y q

 (10)

That is, the sample posterior odds are equal to prior odds, q

   

1 q2 , multiplied

by the sample likelihood ratio,











2



1 2

1

| | |

|

  



i s

i s i s

i s

y f

y f y L

y L

 (11)

4. Sample Posterior Likelihood and Bayesian Inference for Superpopulation Parameter

Data collected by sample surveys, are used extensively to make inferences on assumed population models. Often, survey design features (clustering, stratification, unequal probability selection, etc.) are ignored and the sample data are then analyzed using classical methods based on simple random sampling. In this section we account for informative sampling design in the analysis of survey data using Bayesian inference.

If y_s 



y₁,y_n



 are independent and identical random variables from f_s



y_i |



, then we can write the sample joint conditional pdf of y_s 



y₁,y_n



 given , as:





 









 n

i

i s s

s s

s L f y

f

1

|

| y  

y (12)

Thus sample joint pdf of y_s 



y₁,y_n



 and  is:



_s 

  

 _s

 

_s

s q L

(8)

Pak.j.stat.oper.res. Vol.XIII No.2 2017 pp327-353 334

If  is a continuous random variable, then Bayes sample predictive pdf of







 _n

s y1,y

y

is:

 









 

 



 m , d q L  d

m_s y_s _s y_s _s y_s (14)

Hence the conditional sample pdf of _sy_s or the sample posterior likelihood:









 



 



 

s s

s s s s

L q

m L q L q

y y

y y y

  

| |

(15)

Now we are in a position to deal with Bayesian inference, via the sample posterior distribution and the sample posterior likelihood. This provides the following representation of the sample posterior pdf of |y_s:

Sample Posterior information = Prior information + Sampling design information + Sampling information

And in terms of pdfs:

Sample Posterior pdf = Prior pdf+ Sample Likelihood function = Prior pdf+ sampling design+ population pdf

Therefore the sample posterior pdf of |y_s summarizes the total information, after viewing the sample data, collected under informative sampling design, and provides a basis for sample posterior or Bayesian inferences concerning the population parameters of interest.

The generalized maximum likelihood estimator (MLE) ˆ_g  of   is the value of

 that maximizes the sample posterior likelihood function, L_s



|y_s



. That is, ˆ_g  is the most likely value of , given the prior and the sample data y_s 



y₁,y_n



. Obviouslyˆ_g has the interpretation of being the "most likely" value of  given the

prior and the sample data y_s 



y₁,y_n



.

From Bayesian viewpoint, estimating the population parameter  is obtained by choosing a decision function 

 

y_s , which depends on a loss function L





 

y_s ,



_{, whose} conditional expectation under the sample posterior distribution is minimum. That is,

 





 





 













 

|

, argmin

 

 

  

d q

L L E

s s s

s

y y

y y y

(9)

If L





 

y_s ,









 

y_s 



2 (squared error loss function), then it is easy to show that, the sample Bayes estimate of  is given by:

 

y_s _B E_s



y_s



 ˆ ₍₁₇₎

and the sample Bayes estimate of  

 

  is given by:

 



s



s

B E   y

ˆ ₍₁₈₎

Moreover, the 100



1



% highest sample posterior density (HSPD) credible set for

 

 , is the subset C of  of the form:



  



 q  k 



C  : _s |y_s  (19)

where y_s 



y₁,y_n



 , and k

 

 is the largest constant such that:



C s



1

P y (20)

5. Application 1 – Normal Superpopulation Model

According to the definition of sample posterior distribution, to consider the Bayesian inference, we need to specify: f_p



y_i|



and q

 

 , and E_p



_i|y_i,



.

From now on, we denote the parameter index the population distribution by _p, and the parameter index the sample distribution by _s. Consider the following population model:



2



, ~

| _p _p _p 

i N

y   (21)

be independent random variables, i1,...,N , where 2 0_{is known,}i1,...,N. Let







 _n

s y1,y

y be the sample data. Suppose that

 

i i

 

i

p y ay

E  exp ₁

(22)

Using equation (1), it is easy to verify that the sample model is:



2



, ~

| _s _s _s 

i N

y   (23)

where _s _p a₁2 _s



_p,a₁,2



is a linear function of_p. So in order to find the Bayes estimator of _p, we (for simplicity and to makes life easy) choose a prior distribution for _s and then derive the Bayes estimator for _p, via the relationship between p and s. We consider different cases based on prior distribution.

(10)

Pak.j.stat.oper.res. Vol.XIII No.2 2017 pp327-353 336

(a) The sample posterior distribution of _sy_s is given by:

   

 

 



 ₂ ₂ ₂ 2 2 ₂

2 2

, ~

 

  

  

n n

y n

N s

s

s y (24)

where  



n_

i i

s n y

y

1 1

.

so that





2 2

 

 

n y n

Es s s s

  

 y (25)

and









2

2 2

2 2 2

  

 

  

 

  

n n n

V_s _s y_s (26)

Now, for given



a₁,2,2,



and using _s _p a₁2, we have









2 1 2 2

2 2

2 1



 

 



a n

y n

a E

E

s s s

s s p s

 

 

  

 y y

(27)

and













2 2

2 2 2 1

 

  

n V

a V

V

s s s

s s

s s p s

   

   

y

y y

(28)

Furthermore, the sample posterior distribution of p ys is given by:

   

 

 

 

 2 ₂ 2 2 ₂

1 2 2

2 2

, ~

 

   

  

n a

n y n

N s

s

p y (29)

Hence the sample and population posterior pdfs of _p y_s belong to the same family of normal distributions, but the mean under the sample posterior pdf is different from the mean under the population posterior pdf of p ys. Furthermore if a1 0, that is the

sampling design is ignorable, then the sample and population posterior pdfs are similar.

(b) For given



a₁,2,2,



, the Bayes estimate of _p under the sample posterior pdf is:













 

2

1

2 1 1

2 1 2 2

2 2

ˆ

1 1

ˆ

 

 

  



 

   

a srs

a y a

y

a n

y n E

pB

s s

p s pB

 

       

 

 



 y

(11)

where

 

2 2

2

 

 

 

n n

(31)

and

 





_s

pB srs   y

ˆ   1 (32)

is the Bayes estimate of _p_p under noninformative sampling design. Note that:

(1) If a1 0(that is, sampling design is noninformative), then the Bayes estimate of 



p

 based on the population model and sample model are coincide.

(2) If a1 0 and yi 0, then





i

y a i i

p y e

E  _|  1 _{is an increasing function of}

i y , so that larger values are more likely to be in the sample than smaller values. In this case, ˆ_pB

 

srs overestimate _p. The Bayes estimate ˆ_pB ˆ_pB

 

srs a₁2

adjusts ˆ_pB

 

srs by the positive quantitya₁2. (3) If a₁ 0 and y_i 0, then





ayi

i i

p y e

E  _|  1 _{is a decreasing function of}

i y , so that smaller values are more likely to be in the sample than larger values. In this case, ˆ_pB

 

srs underestimate_p_p. The Bayes estimate ˆ_pB ˆ_pB

 

srs a₁2

adjusts ˆ_pB

 

srs by the negative quantity a₁2.

(4) ˆ_pB a₁2 ˆ_sB is a weighted average of  (prior mean) and y_s (the maximum likelihood estimator of _s ), with weights  and 1, respectively.

(5) ˆ can be written as: _pB









2

1 2 2

2 2

ˆ _ _

 

 

 a

n n y

n s

pB 

 



 (33)

so that,









2 1

2 1 2 2

2 2

lim ˆ

lim



   

 

  

a y

a n

n y

n s

s n

pB n

 

   

 

 

 



  

 ₍₃₄₎

Hence, for large sample size n, the Bayes estimate of_p_p, shrinkage to y_s a₁2. That is, as the sample size n increases, the influence of the prior distribution on posterior inferences decreases.

(12)

Pak.j.stat.oper.res. Vol.XIII No.2 2017 pp327-353 338

(c) It is easy to verify that for given



a₁,2,2,



, the generalized MLE of _p is given by:









2

1 2 1

1

ˆ

 



  



a y

y

s s s pg

   

   

(35)

which is the sample Bayes estimate of _p_p.

(d) Furthermore, the100



1



% highest sample posterior density (HSPD) credible set for _p, is given by:









_

  

  

 

  

 2 ₂ 2 2 ₂

2 1

1

 

  



 

n z

a y

C _s (36)

where is the 100



1 2



% percentile of the standard normal distribution.

Note that if a1 0_{(that sampling design is noninformative), then the} 100



1



%

highest sample posterior density (HSPD) credible set for , is given by:



 

_

  

  

 

 

 1 ₂ ₂ 2 2 ₂

 

  

 _

n z

y

C _s (37)

which is the classical 100



1



% highest posterior density (HPD) credible set for _p.

Not that



q

 

_s ;_s 



is a conjugate family for



f_s



y_i |_s



;_s 



, because



s i



s y

q  | is in the class of



q

 

_s ;_s



.

Case 2: Assume that the prior distribution for _s _p a₁2 ~ N



a₁2,2



with

known



1



2

, , a

 . Under this case we have:





   

 

 

 1 ₂2 2 ₂ 2 ₂ 2 2 ₂

, ~

 

  



   

n n

y n a

N s

s

s y (38)







 





2

1

2 1

1

 

  



a E

a y E

s p p

s s

 



 

  

y y

(39)



s s



p



p s



s V

n

V y   y

 

 ₂ 2 2 ₂  

 

(40)















2



1 2

1  1  y a

a E

(13)

For given



a₁,2,2,



, the Bayes estimator of _s under the sample posterior pdf is:







 

2

1

ˆ _ _ _ _

_sB E_s _s y_s    y_s  a

(42)

where

   

 



 ₂ 2 ₂

 

 

n n

Hence, the Bayes estimator of _p is:











2



1 2 1

1

ˆ

 



  



a y

na y

y

s s s p

 

 

   

(43)

It is easy to verify that for given



a₁,2,2,



, the generalized MLE of _s_s is given by:



 

2

1

ˆ _ _ _ _

_sg    y_s  a

(44)

So that:











2



1 2 1

1

ˆ

 



  



a y

na y

y

s s s pg

 

 

   

(45)

The 100



1



% highest sample posterior density (HSPD) credible set for _s , is given by:



 

_

  

  

 

 



 2 ₂ 2 2 ₂

2 1

1

 

  

 

 

n z

a y

C_s _s

(46)

So that for _p _sa₁2 is:









_

  

  

 

 



 2 ₂ 2 2 ₂

2 1

1

 

  



 

n z

a y

C _s (47)

Not that



q

 

_s ;_s





f_s



y_i|_s



;_s



, because q_s



_s |y_i



is in the class of



q

 

_s ;_s



.

Case 3: Assume that the prior distribution for _p ~ N



,2



with known

 

,2 . Similar to Cases 1 and 2, we have:

   

 

 

 

 ₂ ₂ 1 2 2 ₂ 2 2 ₂

2 2

, ~

 

  



  



n n

na y n

N s

s p y

(14)

Pak.j.stat.oper.res. Vol.XIII No.2 2017 pp327-353 340

so that







p s



p



p s



p

s s s

p s

V na E

n na

n y n

n na y n E

y y

y

 

 

 

  

  

 

1

2 2

2 2 1 2

2 2 2

2 2

2 2 1 2

2

 

  

  

 

  



(49)

and













p



p s



s p s

V n

n n V

y y

  

   

2 2 2

2 2



 

  

 

(50)

which is similar to Case 2 (a).

(b) For given



a₁,2,2,



, the Bayes estimator of _p under the sample posterior pdf is:



















2



1

2 1

1

1 1

ˆ

 



  

 

a y

E

s

s s

p s pB

 

 

     

 y

(51)

where

   

 



 ₂ 2 ₂

 

 

n n

which is similar to Case 2 (b).

Note that, here, ˆ is a weighted average of (prior mean) and _pB



2



1

a

ys  (the

maximum likelihood estimator of _p_p), with weights and , respectively.



a₁,2,2,





2



1

ˆ _ _ _

_pg  y_s  y_s  na

(52)

which is the sample Bayes estimate of _p - similar to Case 2 (c).

(d) The 100



1











_

  

  

 

 



 2 ₂ 2 2 ₂

2 1

1

 

  



 

n z

a y

C s (53)

which is similar to Case 2 (d).



(15)

Note that if a₁ 0(that sampling design is noninformative), then the 100



1



% highest sample posterior density (HSPD) credible set for _p, is given by



 

_

  

  

 

 

 1 ₂ ₂ 2 2 ₂

 

  

 _

n z

y

C _s (54)

which is the 100



1



% highest posterior density (HPD) credible set for _p.

Hence, the prior distribution for _p ~ N



,2



with known

 

,2 is equivalent to the prior distribution _s _p a₁2 ~ N



a₁2,2



with known



,2,a₁



, in the sense that, they give the same predictor of _p .

Not that



q

 

_p ;_p





f_s



y_i |_s



;_s 



, because



p i



s y



q

 

_p ;_p



.

It should be noted that, the similarity between Case 2 and Case 3 is interpreted as

follows: since under Case 3, _p ~ N



,2



, therefore



2 2



1 2

1 ~ N  a ,

a p

s   

 , which is the prior distribution under Case 2. The

situation is different under Case 1, where_s _p a₁2 ~ N



,2



.

Case 4: Suppose that the prior distribution of _s is noninformative or improper:

 

_s   _s 

q 1 ,-  (55)

(a) Since y_s is a sufficient statistics for _s _p a₁2, therefore, the conditional sample pdf of _s y₁,...,y_n or the sample posterior likelihood is:



   





  

















 

 



2 2

2

s s

2 exp 2

1

| |

s s s

s s

s s s s s

s s

y n n

y L q

y q

L q q

 

 

 

  

 y y

(56)

Hence the posterior distribution of _s y_s is given by:

   

  

n y N _s s

s

2

,

~ 

y (57)

so that



s s



s

s y

E  y  (58)

and





n

V_s _s _s

2

 

(16)

Pak.j.stat.oper.res. Vol.XIII No.2 2017 pp327-353 342

We are interested in sample posterior distribution of p ys.

Since _p _s a₁2, therefore, sample posterior distribution of _p y_s is given by:

   

 

 

n a

y N _s s

p

2 2 1 ,

~  

y

(60)

so that









2

1 2 1



a E

a y E

s p p s s p s

  

  

y y

(61)

and



p s



p



p s



s V

n

V  y    y

2



(62)

Note that the sample and population posterior pdfs of p ys belong to the same family

of normal distributions, but the mean under the sample posterior pdf is different from the mean under the population posterior pdf of _p y_s. Furthermore if a1 0, that is the

sampling design is ignorable, then the sample and population pdfs are similar.

(b) For given



a₁,2



, the Bayes estimator of _p under the sample posterior pdf is





2

1

ˆ _

_pB E_s _py_s  y_s a

(63)

which is the same as the maximum likelihood estimator of _p under the sample model: y_i |_s _s ~ N



_s,2



where_p _sa₁2.



a₁,2



2 1

ˆ _

_sg  y_s a (64)

(d) The 100



1







_

  

  

 



n z a

y

C _s

2 2 2 1



 

(65)

Note that if a₁ 0(that sampling design is noninformative), then the 100



1



 

_

  

  

 

n z y

C _s

2 2





(66)

(17)

Not that



q

 

_s ;_s 





f_s



y_i |_s



;_s 



. Because



_s _i



s y



q

 

_s ;_s



.

6. Application 2 – Bernoulli Superpopulation Model

Let y_i |_p _p ~ Ber

 

_p are independent random variables, where _p

 

0,1 ,

N

i1,..., . Suppose that the sample inclusion probabilities have expectations:













 





1

0 exp exp 0 , , exp Pr | 1 1 0 0 0 1 0 1 0               i i i i i i p y if y if a a a a a y a a y s i y E    (67)

So that, a₀ ln₀ and a₁ ln



₁ ₀



. In this case ₀and₁ are known before sampling. We can show that the sample distribution of y_i is:





















  

1



, 0,1 1 1 1 Pr Pr 1 1 0 1 0 0 1 1                               i y s y s y p p p y p p p p i p i s i s y s i y f y s i y f i i i i                 (68)

Thus the distribution of y_i ,is is the same as the distribution of y_i ,iU, except that the population parameter _p changes to:





s



p

  

s p

p p

p

s _ _ _ _      

 

  

 

 1 0

0 1 1 , , 1 (69)

which is not a linear function of _p.

Notice that if a1 0, or equivalently 0 1, that is, the sampling design is ignorable

then the sample and population distributions are the same.

Case 1: Now suppose that the prior distribution of _p is _p ~ B



,



, where



,



are known. The conditional sample pdf of _p y_s or the sample posterior likelihood is given by:



_p _s

  

 _p _s



_s _p



_p

s q L

q  |y  y  ,

(70) where

 





 





















i s

s n ny

(18)

Pak.j.stat.oper.res. Vol.XIII No.2 2017 pp327-353 344

which cannot be written in the form of beta distribution.

Not that



q

 

_p ;_p



is not a conjugate family for



f_s



y_i |_p



;_p



, because



p i



s y

q  | is not in the class



q

 

_p ;_p



. Recall that











 1  1 1  0



_s  _p _p   _p is not a linear function of _p.

Case 2: To makes life easy, assume that the prior distribution of _s is _s ~B



,



. It is easy to verify that, the conditional sample pdf of s ys or the sample posterior likelihood

is given by:



_s _s





 

_s _s



_s _s



_s

s q L

q  |y  y  ,

(72)

where

 





 









1





1

1 ,

1

|

   



 





s

s n ny

s y

n s n

i

s i s s s

s s s

Beta

y f q

L q



 _

  

 



 y

(73)

Hence,









1





1

1 ,

1

|   s  nnys

s y

n s s

s s

Beta

q    

 

 y (74)

After some algebra, we can show that the sample posterior distribution of _sy_s is given by:



s s



s

s B ny nny

 y  ,

(75)

So that





n y n

y n n y n

y n E

s

s s

 

 

   

 



 



 



y

(76)

and







 





1

2 _ _ _

 

  

 

n n

y n n y n

V s s

s s s

  



 

y (77)

Not that



q

 

_s ;_s 





f_s



y_i |_s



;_s



. Because



s i



s y

q  | is in the class



q

 

_s ;_s



.

Hence, for given



,,₀,₁



, the Bayes estimate of s under the sample posterior

pdf is given by:





n y n

E s

s s s

sB _ _

   

 



ˆ y

(19)

But we are interested in the Bayes estimate of _p. For this, using the relationship between _s and _p, we get:























                   s s s sB sB sB pB y n n y n y n 1 0 0 1 0 0 ˆ 1 ˆ ˆ ˆ (79)

Notice that if ₀ ₁, that is, the sampling design is ignorable, then

 

srs n

y n

pB s

pB   

 ˆ _ ˆ     (80)

which is the classical Bayes estimate of _punder simple random sample.

The Bayes estimate ˆ can be written as: _pB

















                                        s s s s s pB y n n y n y y n n y n n 1 0 0 1 0 0 ˆ ₍₈₁₎

which is a linear combination of y_s- maximum likelihood estimator of _p, under the population model y_i |_p _p ~ Ber

 

_p , and



  



- mean of _s:B



,



.

Furthermore, we have:























s



s s s s s s s n pB n y y y y n n y n y y n n y n n                                              1 lim ˆ lim 1 0 0 1 0 0 1 0 0                    (82)

Hence, for large sample size n, the Bayes estimate of_p_p, shrinkage to:





ˆ

 

inf 1 1 0 0 p s s s y y y _       (83)

where ˆ_p

 

inf denotes the maximum likelihood estimate of _p under informative sampling design, treating _p as fixed unknown constant.

Notice that if ₀ ₁, that is, the sampling design is ignorable, then _pB _s

n  y

ˆ

lim .

(20)

Pak.j.stat.oper.res. Vol.XIII No.2 2017 pp327-353 346

design is informative, then for large sample size n, the Bayes estimate of _p becomes











₀y_s ₀y_s ₁ 1y_s



.

It is easy to verify that for given



,



, the generalized MLE of _s_s is given by:

 

 

   

  

 

    

 

 

   

 

  

 

 

n y

n n

n y n

s s sg

ˆ

(84)

which is the sample Bayes estimate of _s. Hence using the relationship between _s and _p, we have





















  

   

 

s s

s pg

y n n y

n

y n

1 0

0

ˆ

(85)

Notice that if ₀ ₁, that is, the sampling design is ignorable then, then

pg s

sg

n y

n _

 



ˆ _ ˆ

 

 

(86)

which the Bayes estimate of _p_p under noninformative sampling design.

The important feature of this formula is that, if _p is treated as fixed unknown constant (classical or frequentist statistics), that is,   0, and the sampling design is noninformative, then for large sample size the Bayes estimate of _p becomes y_s which is the MLE of _p.

7. Bayes Prediction of Finite Population Total

Sverchkov and Pfeffermann (2004) use sample and sample complement distributions for the prediction of finite population totals under informative sampling for single-stage sampling designs. Later Eideh and Nathan (2009) extend the theory to general linear functions of the population values and to two-stage informative cluster sampling. None of the above studies consider prediction of finite population total from Bayesian perspective. In this section we use sample posterior and sample complement posterior distributions, to predict the finite population total under single-stage informative sampling design. Let



 

 



 



s i

i s

i i c

i i s

i i N

i

i y y y y

y T

1

(87)

The available information for the prediction process is







 





y i s I i U N n



O _i,_i ,   _i,  , , , where I_i 1 for is and I_i 0 for is.

(21)



   

 



i c p i c p i p c y m y f q y

q  |   |

(88)

where, f_c



y_i|



is the sample complement distribution of y_i given _p _p, which is given by:













p i p p i p i i p p i c E y f y E y f         1 | | 1 | (89)

Also, we have:





























p i s p i i s p i p p i i p p i c w E y w E E y E y E        | 1 | 1 | 1 | 1 |       (90) and

 

  



  



        





  p p p p i c p p p p i c p i c y f q d y f q y m discrete is if | continuous is if |      (91)

Hence, q_c



_p<