ON ESTIMATION OF POPULATION MEAN USING INFORMATION ON AUXILIARY ATTRIBUTE


Rajesh Singh

Department of Statistics, BHU, Varanasi (U.P.), India

Abstract

We consider the problem of estimating the finite population mean when some information on an auxiliary attribute is available. We obtain the mean square error (MSE) equations for the proposed estimators. The proposed estimators are shown to be better than the estimators of Naik and Gupta (1996), Singh et al. (2008) and Abd-Elfattah et al. (2010). The results are illustrated numerically using an empirical population considered in the literature.

Keywords: SRSWOR, Attribute, Point bi-serial correlation, MSE, Efficiency.

1. Introduction

In survey sampling, the use of auxiliary information can increase the precision of an estimator when the study variable y is highly correlated with the auxiliary variable x. In several practical situations, however, no auxiliary variable is available; instead there exists an auxiliary attribute that is highly correlated with the study variable y, such as (i) use of drugs and gender, or (ii) amount of milk produced and a particular breed of cow.

Consider a sample of size $n$ drawn by simple random sampling without replacement (SRSWOR) from a population of size $N$. Let $y_i$ and $\phi_i$ denote the observations on variable $y$ and attribute $\phi$, respectively, for the $i$th unit ($i = 1, 2, \ldots, N$). It is assumed that attribute $\phi$ takes only the two values 0 and 1 according as

\[ \phi_i = \begin{cases} 1, & \text{if the $i$th unit of the population possesses attribute } \phi, \\ 0, & \text{otherwise.} \end{cases} \]

Let $A = \sum_{i=1}^{N} \phi_i$ and $a = \sum_{i=1}^{n} \phi_i$ denote the total number of units in the population and sample, respectively, possessing attribute $\phi$. Let $P = A/N$ and $p = a/n$ denote the proportion of units in the population and sample, respectively, possessing attribute $\phi$.

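Define

\[ e_y = \frac{\bar{y} - \bar{Y}}{\bar{Y}}, \qquad e_\phi = \frac{p - P}{P}, \]

such that

\[ E(e_i) = 0 \ (i = y, \phi), \qquad E(e_y^2) = f C_y^2, \qquad E(e_\phi^2) = f C_p^2, \qquad E(e_y e_\phi) = f \rho_{pb} C_y C_p, \]

where

\[ f = \frac{1}{n} - \frac{1}{N}, \qquad C_y^2 = \frac{S_y^2}{\bar{Y}^2}, \qquad C_p^2 = \frac{S_\phi^2}{P^2}, \]

and $\rho_{pb} = S_{y\phi}/(S_y S_\phi)$ is the point biserial correlation coefficient.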

Here,

\[ S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(y_i - \bar{Y}\right)^2, \qquad S_\phi^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\phi_i - P\right)^2, \qquad S_{y\phi} = \frac{1}{N-1}\left(\sum_{i=1}^{N} y_i \phi_i - N P \bar{Y}\right). \]
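The population quantities defined above can be computed directly when the full population is available. The following is a minimal sketch; the function name `population_summary` and the toy data are illustrative assumptions, not part of the original study.

```python
import math

def population_summary(y, phi):
    """Population quantities from the definitions above.

    y   : study-variable values for all N units
    phi : 0/1 indicators of the auxiliary attribute for the same units
    """
    N = len(y)
    Y_bar = sum(y) / N
    P = sum(phi) / N  # population proportion possessing the attribute
    S_y2 = sum((yi - Y_bar) ** 2 for yi in y) / (N - 1)
    S_phi2 = sum((ph - P) ** 2 for ph in phi) / (N - 1)
    S_yphi = (sum(yi * ph for yi, ph in zip(y, phi)) - N * P * Y_bar) / (N - 1)
    return {
        "Y_bar": Y_bar,
        "P": P,
        "S_y2": S_y2,
        "S_phi2": S_phi2,
        "S_yphi": S_yphi,
        "C_y2": S_y2 / Y_bar ** 2,   # squared coefficient of variation of y
        "C_p2": S_phi2 / P ** 2,     # squared coefficient of variation of phi
        "rho_pb": S_yphi / math.sqrt(S_y2 * S_phi2),  # point biserial correlation
    }

# Toy (hypothetical) population of four units.
summary = population_summary([1, 2, 3, 4], [0, 0, 1, 1])
```

The sampling fraction term $f = 1/n - 1/N$ depends on the sample size and is therefore left to the caller.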

In order to estimate the population mean $\bar{Y}$ of the study variable $y$, assuming knowledge of the population proportion $P$, Naik and Gupta (1996) defined the following ratio and product estimators:

\[ t_{NGR} = \bar{y}\left(\frac{P}{p}\right) \tag{1.1} \]

\[ t_{NGP} = \bar{y}\left(\frac{p}{P}\right) \tag{1.2} \]
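As an illustration of (1.1) and (1.2), the sketch below evaluates both estimators from a sample, assuming the population proportion $P$ is known. The function name and the sample values are hypothetical.

```python
def naik_gupta_estimators(y_sample, phi_sample, P):
    """Return (t_NGR, t_NGP) for one sample, given the known proportion P."""
    n = len(y_sample)
    y_bar = sum(y_sample) / n
    p = sum(phi_sample) / n  # sample proportion possessing the attribute
    if p == 0:
        # The ratio estimator divides by p, so it is undefined for such a sample.
        raise ValueError("sample proportion p is zero; ratio estimator undefined")
    t_ngr = y_bar * P / p    # ratio estimator (1.1)
    t_ngp = y_bar * p / P    # product estimator (1.2)
    return t_ngr, t_ngp

# Hypothetical sample of four units with P = 0.5 assumed known.
t_ngr, t_ngp = naik_gupta_estimators([10, 20, 30, 40], [1, 0, 1, 1], P=0.5)
```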

The mean square errors (MSE) of $t_{NGR}$ and $t_{NGP}$, up to the first order of approximation, are respectively

\[ \mathrm{MSE}(t_{NGR}) = f \bar{Y}^2\left(C_y^2 + C_p^2 - 2\rho_{pb} C_y C_p\right) \tag{1.3} \]

\[ \mathrm{MSE}(t_{NGP}) = f \bar{Y}^2\left(C_y^2 + C_p^2 + 2\rho_{pb} C_y C_p\right) \tag{1.4} \]
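The first-order expressions (1.3) and (1.4) are straightforward to evaluate. The sketch below uses hypothetical parameter values; it also illustrates the consequence of (1.3) that the ratio estimator has smaller MSE than $\bar{y}$ (whose variance is $f\bar{Y}^2C_y^2$) only when $\rho_{pb} > C_p/(2C_y)$.

```python
def mse_naik_gupta(f, Y_bar, C_y, C_p, rho_pb):
    """First-order MSEs of the ratio (1.3) and product (1.4) estimators."""
    base = f * Y_bar ** 2
    cross = 2 * rho_pb * C_y * C_p
    mse_ratio = base * (C_y ** 2 + C_p ** 2 - cross)
    mse_product = base * (C_y ** 2 + C_p ** 2 + cross)
    return mse_ratio, mse_product

# Hypothetical values, chosen only for illustration.
f, Y_bar, C_y, C_p, rho_pb = 0.05, 10.0, 0.5, 0.8, 0.6
mse_ratio, mse_product = mse_naik_gupta(f, Y_bar, C_y, C_p, rho_pb)
var_mean = f * Y_bar ** 2 * C_y ** 2

# Here rho_pb = 0.6 is below C_p / (2 C_y) = 0.8, so the ratio estimator
# is less efficient than the sample mean for these values.
ratio_beats_mean = mse_ratio < var_mean
```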

2. Other estimators

Singh et al. (2008) suggested the following ratio estimator:

\[ t_S = \left[\bar{y} + b_\phi (P - p)\right]\frac{m_1 P + m_2}{m_1 p + m_2} \tag{2.2} \]

where $m_1\ (\neq 0)$ and $m_2$ are either real numbers or functions of the parameters of the attribute, such as $C_p$, $\beta_2(\phi)$ and $\rho_{pb}$.

In Singh et al. (2008), the MSE of these ratio-type estimators was given by

\[ \mathrm{MSE}(t_S) = f\left[R^2 S_\phi^2 + S_y^2\left(1 - \rho_{pb}^2\right)\right] \tag{2.3} \]

Abd-Elfattah et al. (2010) proposed some ratio-type estimators. The minimum MSE attained in Abd-Elfattah et al. (2010) was

\[ \mathrm{MSE}(t_{Abd})_{\min} = f S_y^2\left(1 - \rho_{pb}^2\right) \tag{2.4} \]

The minimum MSE of $t_{Abd}$ is equal to the MSE of the regression estimator $t_{reg} = \bar{y} + \hat{\beta}(P - p)$, that is, $\mathrm{MSE}(t_{reg}) = f S_y^2\left(1 - \rho_{pb}^2\right)$.

Shabbir and Gupta (2007) considered the following estimator:

\[ t_{SG} = \left[d_1 \bar{y} + d_2 (P - p)\right]\left(\frac{P}{p}\right) \tag{2.5} \]

where $d_1$ and $d_2$ are constants whose sum is not necessarily equal to one.

The optimum MSE of $t_{SG}$ reported by Shabbir and Gupta (2007) is

\[ \mathrm{MSE}(t_{SG})_0 = \frac{f S_y^2\left(1 - \rho_{pb}^2\right)}{1 + f C_y^2\left(1 - \rho_{pb}^2\right)} \]

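Unfortunately, the expression obtained by Shabbir and Gupta (2007) is incorrect. The corrected MSE of $t_{SG}$ is given as

\[ \mathrm{MSE}(t_{SG})_{\min} = \bar{Y}^2 - \frac{\Delta_1 \Delta_5^2 + \Delta_3 \Delta_4^2 - 2\Delta_2 \Delta_4 \Delta_5}{\Delta_1 \Delta_3 - \Delta_2^2} \tag{2.6} \]

where

\[ \Delta_1 = \bar{Y}^2 + \bar{Y}^2 f\left(C_y^2 + 3C_p^2 - 4\rho_{pb} C_y C_p\right), \qquad \Delta_2 = P\bar{Y} f\left(2C_p^2 - \rho_{pb} C_y C_p\right), \qquad \Delta_3 = P^2 f C_p^2, \]

\[ \Delta_4 = \bar{Y}^2 + \bar{Y}^2 f\left(C_p^2 - \rho_{pb} C_y C_p\right), \qquad \Delta_5 = P\bar{Y} f C_p^2. \]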
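The structure of (2.6) is that of a two-variable quadratic minimization. As a sketch, assume (this is not spelled out in the text) that the first-order MSE of $t_{SG}$ is a quadratic in $d_1$ and $d_2$ whose five coefficients are the quantities listed after (2.6), written here as $\Delta_1, \ldots, \Delta_5$ in the order in which they are defined. The sign conventions below are likewise an assumption.

```latex
\begin{align*}
\mathrm{MSE}(t_{SG}) &\approx \bar{Y}^2 + d_1^2\Delta_1 + d_2^2\Delta_3
   + 2 d_1 d_2 \Delta_2 - 2 d_1 \Delta_4 - 2 d_2 \Delta_5, \\
\frac{\partial\,\mathrm{MSE}}{\partial d_1} = 0,\ 
\frac{\partial\,\mathrm{MSE}}{\partial d_2} = 0
 &\;\Longrightarrow\;
 d_1\Delta_1 + d_2\Delta_2 = \Delta_4, \qquad
 d_1\Delta_2 + d_2\Delta_3 = \Delta_5, \\
d_1^{*} &= \frac{\Delta_3\Delta_4 - \Delta_2\Delta_5}{\Delta_1\Delta_3 - \Delta_2^2},
\qquad
d_2^{*} = \frac{\Delta_1\Delta_5 - \Delta_2\Delta_4}{\Delta_1\Delta_3 - \Delta_2^2}, \\
\mathrm{MSE}(t_{SG})_{\min} &= \bar{Y}^2 - \left(d_1^{*}\Delta_4 + d_2^{*}\Delta_5\right)
 = \bar{Y}^2 - \frac{\Delta_1\Delta_5^2 + \Delta_3\Delta_4^2 - 2\Delta_2\Delta_4\Delta_5}
                    {\Delta_1\Delta_3 - \Delta_2^2}.
\end{align*}
```

The same argument, with different coefficients, underlies the minimum-MSE expressions of the estimators proposed in Sections 3 and 4.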

3. The proposed estimator

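We define a family of ratio estimators of the population mean $\bar{Y}$ as

\[ t_\alpha = \theta_1 \bar{y} + \theta_2 \bar{y}\left(\frac{m_1 P + m_2}{m_1 p + m_2}\right)^{\alpha} \tag{3.1} \]

where $m_1$ and $m_2$ are the same as defined in (2.2), $\alpha$ is a real constant, and $\theta_1$ and $\theta_2$ are real constants to be determined such that the MSE of $t_\alpha$ is minimum.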

Remark 1: Here we would like to mention that the choice of the estimator depends on the availability and values of the various parameter(s) used (for choice of the parameters

1


Expressing t_in terms of e’s we have















  Y 1e  1e

t ₁ _y ₂ (3.2)

where .

b ap

aP

  

Expanding the right-hand side of (3.2) and retaining terms up to the second power of the $e$'s, we have

\[ t_\alpha = \bar{Y}\left[\theta_1\left(1 + e_y\right) + \theta_2\left\{1 + e_y - \alpha\lambda e_\phi + \frac{\alpha(\alpha+1)}{2}\lambda^2 e_\phi^2 - \alpha\lambda e_y e_\phi\right\}\right] \tag{3.3} \]

Subtracting $\bar{Y}$ from both sides of (3.3) and then taking expectations, we get the bias of the estimator $t_\alpha$, up to the first order of approximation, as

\[ B(t_\alpha) = \bar{Y}\left[\theta_1 - 1 + \theta_2\left\{1 + f\left(\frac{\alpha(\alpha+1)}{2}\lambda^2 C_p^2 - \alpha\lambda\rho_{pb} C_y C_p\right)\right\}\right] \tag{3.4} \]

Subtracting $\bar{Y}$ from both sides of (3.3), squaring and then taking expectations, we get the MSE of the estimator $t_\alpha$, up to the first order of approximation, as

\[ \mathrm{MSE}(t_\alpha) = \bar{Y}^2\left[1 + \theta_1^2 A_1 + \theta_2^2 A_2 + 2\theta_1\theta_2 A_3 - 2\theta_2 A_4 - 2\theta_1\right] \tag{3.5} \]

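where

\[ A_1 = 1 + f C_y^2, \]

\[ A_2 = 1 + f\left[C_y^2 + \alpha^2\lambda^2 C_p^2 - 4\alpha\lambda\rho_{pb} C_y C_p + \alpha(\alpha+1)\lambda^2 C_p^2\right], \]

\[ A_3 = 1 + f\left[C_y^2 + \frac{\alpha(\alpha+1)}{2}\lambda^2 C_p^2 - 2\alpha\lambda\rho_{pb} C_y C_p\right], \]

\[ A_4 = 1 + f\left[\frac{\alpha(\alpha+1)}{2}\lambda^2 C_p^2 - \alpha\lambda\rho_{pb} C_y C_p\right]. \]

Minimizing (3.5) with respect to $\theta_1$ and $\theta_2$ gives the optimum values

\[ \theta_1^{*} = \frac{A_2 - A_3 A_4}{A_1 A_2 - A_3^2} \qquad \text{and} \qquad \theta_2^{*} = \frac{A_1 A_4 - A_3}{A_1 A_2 - A_3^2}. \]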

Substituting these optimum values of ₁*and *₂ in (3.5), we get the minimum MSE of



t as-

    

  

  

 

 ₂

3 2 1

4 3 2 4 1 2 2

min

A A A

A A 2 A A A 1 Y )

t (
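The optimisation leading to the minimum MSE of the Section 3 estimator can be checked numerically. The sketch below assumes the first-order coefficients $A_1, \ldots, A_4$ take the form obtained by expanding $(1 + \lambda e_\phi)^{-\alpha}$ to second order, writes the two weights as `theta1` and `theta2`, and uses hypothetical parameter values; all function names are illustrative.

```python
def a_coefficients(f, C_y, C_p, rho_pb, alpha=1.0, lam=1.0):
    """A1..A4 under the stated second-order expansion (assumed form)."""
    q = 0.5 * alpha * (alpha + 1) * lam ** 2 * C_p ** 2  # squared-error term
    r = alpha * lam * rho_pb * C_y * C_p                 # cross term
    A1 = 1 + f * C_y ** 2
    A2 = 1 + f * (C_y ** 2 + (alpha * lam) ** 2 * C_p ** 2 - 4 * r + 2 * q)
    A3 = 1 + f * (C_y ** 2 + q - 2 * r)
    A4 = 1 + f * (q - r)
    return A1, A2, A3, A4

def relative_mse(theta1, theta2, A):
    """MSE divided by Y_bar**2 for given weights, as a quadratic in the weights."""
    A1, A2, A3, A4 = A
    return (1 + theta1 ** 2 * A1 + theta2 ** 2 * A2
            + 2 * theta1 * theta2 * A3 - 2 * theta2 * A4 - 2 * theta1)

def optimum_weights(A):
    """Optimum weights and the minimum relative MSE."""
    A1, A2, A3, A4 = A
    det = A1 * A2 - A3 ** 2
    theta1 = (A2 - A3 * A4) / det
    theta2 = (A1 * A4 - A3) / det
    min_rel = 1 - (A2 - 2 * A3 * A4 + A1 * A4 ** 2) / det
    return theta1, theta2, min_rel

# Hypothetical values: f = 0.05, C_y = 0.5, C_p = 0.8, rho_pb = 0.6.
A = a_coefficients(0.05, 0.5, 0.8, 0.6)
theta1, theta2, min_rel = optimum_weights(A)
```

Setting `theta1 = 1` and `theta2 = 0` recovers the sample mean, for which the relative MSE reduces to $fC_y^2$; this is why the minimum can never exceed the variance of $\bar{y}$ under this approximation.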

(5)

4. Another estimator

Singh et al. (2007) suggested exponential ratio-type and exponential product-type estimators, respectively, as

\[ t_{SR} = \bar{y}\exp\left(\frac{P - p}{P + p}\right) \tag{4.1} \]

\[ t_{SP} = \bar{y}\exp\left(\frac{p - P}{p + P}\right) \tag{4.2} \]

MSE expressions for the estimators $t_{SR}$ and $t_{SP}$ are given, respectively, as

\[ \mathrm{MSE}(t_{SR}) = f\bar{Y}^2\left(C_y^2 + \frac{C_p^2}{4} - \rho_{pb} C_y C_p\right) \tag{4.3} \]

\[ \mathrm{MSE}(t_{SP}) = f\bar{Y}^2\left(C_y^2 + \frac{C_p^2}{4} + \rho_{pb} C_y C_p\right) \tag{4.4} \]
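A short sketch of the exponential estimators (4.1) and (4.2) and their first-order MSEs (4.3) and (4.4) follows. The function names and numerical inputs are hypothetical.

```python
import math

def exponential_estimators(y_bar, p, P):
    """Exponential ratio-type (4.1) and product-type (4.2) estimators."""
    t_sr = y_bar * math.exp((P - p) / (P + p))
    t_sp = y_bar * math.exp((p - P) / (p + P))
    return t_sr, t_sp

def exponential_mse(f, Y_bar, C_y, C_p, rho_pb):
    """First-order MSEs (4.3) and (4.4)."""
    base = f * Y_bar ** 2
    common = C_y ** 2 + C_p ** 2 / 4
    cross = rho_pb * C_y * C_p
    return base * (common - cross), base * (common + cross)

# Hypothetical sample summary and population parameters.
t_sr, t_sp = exponential_estimators(y_bar=25.0, p=0.75, P=0.5)
mse_sr, mse_sp = exponential_mse(0.05, 10.0, 0.5, 0.8, 0.6)
```

When $p = P$ both estimators reduce to $\bar{y}$, and for any sample their product equals $\bar{y}^2$ because the two exponents differ only in sign.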

Using (3.1) and the Singh et al. (2007) estimator, we define another family of estimators for the population mean $\bar{Y}$ as

\[ t_w = \left[w_1\bar{y} + w_2(P - p)\right]\left(\frac{aP + b}{ap + b}\right)^{\alpha}\exp\left\{\beta\,\frac{(aP + b) - (ap + b)}{(aP + b) + (ap + b)}\right\} \tag{4.5} \]

where $w_1$ and $w_2$ are constants whose sum is not necessarily equal to one.

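The bias and MSE expressions of $t_w$ are, respectively, given by

\[ \mathrm{Bias}(t_w) = (w_1 - 1)\bar{Y} + f\left[w_1\bar{Y}A C_p^2 + w_2 P B C_p^2 - w_1\bar{Y}B\rho_{pb} C_y C_p\right] \tag{4.6} \]

\[ \mathrm{MSE}(t_w) = (w_1 - 1)^2\bar{Y}^2 + w_1^2\left(m_1 + 2m_3\right) + w_2^2 m_2 + 2w_1 w_2\left(m_4 + m_5\right) - 2w_1 m_3 - 2w_2 m_5 \tag{4.7} \]

where, with $\lambda = aP/(aP + b)$,

\[ A = \frac{\left[\beta(\beta + 2) + 4\alpha(\alpha + 1) + 4\alpha\beta\right]\lambda^2}{8}, \qquad B = \frac{(2\alpha + \beta)\lambda}{2}, \]

\[ m_1 = \bar{Y}^2 f\left(C_y^2 + B^2 C_p^2 - 2B\rho_{pb} C_y C_p\right), \qquad m_2 = P^2 f C_p^2, \]

\[ m_3 = \bar{Y}^2 f\left(A C_p^2 - B\rho_{pb} C_y C_p\right), \qquad m_4 = \bar{Y}P f\left(B C_p^2 - \rho_{pb} C_y C_p\right), \qquad m_5 = \bar{Y}P f B C_p^2. \]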

Differentiating equation (4.7) with respect to $w_1$ and $w_2$ and then equating to zero, we get

\[ w_1^{*} = \frac{L_3 L_4 - L_2 L_5}{L_1 L_3 - L_2^2} \qquad \text{and} \qquad w_2^{*} = \frac{L_1 L_5 - L_2 L_4}{L_1 L_3 - L_2^2} \]

where

\[ L_1 = \bar{Y}^2 + m_1 + 2m_3, \qquad L_2 = m_4 + m_5, \qquad L_3 = m_2, \qquad L_4 = \bar{Y}^2 + m_3, \qquad L_5 = m_5. \]

Substituting these optimum values $w_1^{*}$ and $w_2^{*}$ in (4.7), we get the minimum MSE of $t_w$ as

\[ \mathrm{MSE}(t_w)_{\min} = \bar{Y}^2 - \frac{L_1 L_5^2 + L_3 L_4^2 - 2L_2 L_4 L_5}{L_1 L_3 - L_2^2} \]
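The minimum MSE of the Section 4 estimator can be evaluated step by step from the intermediate quantities. The sketch below is written under assumptions: it takes $\lambda = aP/(aP+b)$ as an input `lam`, uses the second-order expansion of the power and exponential factors to form `A` and `B`, writes everything in terms of $P$, $C_p$ and $\rho_{pb}$, and uses hypothetical numerical values. The function names are illustrative.

```python
def t_w_terms(f, Y_bar, P, C_y, C_p, rho_pb, alpha, beta, lam=1.0):
    """Intermediate quantities A, B and L1..L5 (assumed first-order forms)."""
    A = (beta * (beta + 2) + 4 * alpha * (alpha + 1) + 4 * alpha * beta) * lam ** 2 / 8
    B = (2 * alpha + beta) * lam / 2
    r = rho_pb * C_y * C_p
    m1 = Y_bar ** 2 * f * (C_y ** 2 + B ** 2 * C_p ** 2 - 2 * B * r)
    m2 = P ** 2 * f * C_p ** 2
    m3 = Y_bar ** 2 * f * (A * C_p ** 2 - B * r)
    m4 = Y_bar * P * f * (B * C_p ** 2 - r)
    m5 = Y_bar * P * f * B * C_p ** 2
    L = (Y_bar ** 2 + m1 + 2 * m3, m4 + m5, m2, Y_bar ** 2 + m3, m5)
    return A, B, L

def t_w_mse(w1, w2, L, Y_bar):
    """MSE as a quadratic in (w1, w2), equivalent to expanding (w1 - 1)**2."""
    L1, L2, L3, L4, L5 = L
    return (Y_bar ** 2 + w1 ** 2 * L1 + w2 ** 2 * L3
            + 2 * w1 * w2 * L2 - 2 * w1 * L4 - 2 * w2 * L5)

def t_w_optimum(L, Y_bar):
    """Optimum weights and the minimum MSE."""
    L1, L2, L3, L4, L5 = L
    det = L1 * L3 - L2 ** 2
    w1 = (L3 * L4 - L2 * L5) / det
    w2 = (L1 * L5 - L2 * L4) / det
    min_mse = Y_bar ** 2 - (L1 * L5 ** 2 + L3 * L4 ** 2 - 2 * L2 * L4 * L5) / det
    return w1, w2, min_mse

# Hypothetical values; alpha = 1, beta = 0, lam = 1 gives A = B = 1.
f, Y_bar, P, C_y, C_p, rho_pb = 0.05, 10.0, 0.4, 0.5, 0.8, 0.6
A, B, L = t_w_terms(f, Y_bar, P, C_y, C_p, rho_pb, alpha=1, beta=0)
w1, w2, min_mse = t_w_optimum(L, Y_bar)
```

With $\alpha = 1$, $\beta = 0$, $a = 1$ and $b = 0$, the estimator in (4.5) has the same form as (2.5), which is one way to sanity-check an implementation against the Shabbir and Gupta (2007) case.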

5. Efficiency comparison:

First, we compare the efficiency of the proposed estimator $t_\alpha$ with the usual estimator and then with the regression estimator.

The variance of the usual estimator $\bar{y}$ is given by

\[ V(\bar{y}) = f\bar{Y}^2 C_y^2 \tag{5.1} \]

We have $\mathrm{MSE}(t_\alpha)_{\min} \le V(\bar{y})$ if

\[ \bar{Y}^2\left[1 - \frac{A_2 - 2A_3 A_4 + A_1 A_4^2}{A_1 A_2 - A_3^2}\right] \le f\bar{Y}^2 C_y^2 \tag{5.2} \]

On solving, we observe that the above condition always holds true. Therefore, the proposed estimator $t_\alpha$ under the optimum condition performs better than the usual estimator.

Similarly, it can be shown that

\[ \mathrm{MSE}(t_\alpha)_{\min} \le \mathrm{MSE}(t_{reg}) = \mathrm{MSE}(t_{Abd})_{\min} \]

if

\[ \bar{Y}^2\left[1 - \frac{A_2 - 2A_3 A_4 + A_1 A_4^2}{A_1 A_2 - A_3^2}\right] \le f\bar{Y}^2 C_y^2\left(1 - \rho_{pb}^2\right) \tag{5.3} \]

Next, we compare the efficiency of the proposed estimator $t_w$ with the usual estimator and then with the regression estimator.

\[ \mathrm{MSE}(t_w)_{\min} \le V(\bar{y}) \]

if

\[ \bar{Y}^2 - \frac{L_1 L_5^2 + L_3 L_4^2 - 2L_2 L_4 L_5}{L_1 L_3 - L_2^2} \le f\bar{Y}^2 C_y^2 \tag{5.4} \]

On simplification, we observe that the above condition is always true. Therefore, the proposed estimator $t_w$, at its minimum MSE, performs better than the usual estimator in all situations.

Similarly, it can be shown that

\[ \mathrm{MSE}(t_w)_{\min} \le \mathrm{MSE}(t_{reg}) = \mathrm{MSE}(t_{Abd})_{\min} \]

if

\[ \bar{Y}^2 - \frac{L_1 L_5^2 + L_3 L_4^2 - 2L_2 L_4 L_5}{L_1 L_3 - L_2^2} \le f\bar{Y}^2 C_y^2\left(1 - \rho_{pb}^2\right) \tag{5.5} \]

This is also true for all values of $\alpha$ (1, 0, −1) and $\beta$ (1, 0, −1).

Finally, we compare the efficiency of the proposed estimator $t_w$ with that of the estimator $t_\alpha$:

\[ \mathrm{MSE}(t_w)_{\min} \le \mathrm{MSE}(t_\alpha)_{\min} \]

or if

\[ \bar{Y}^2 - \frac{L_1 L_5^2 + L_3 L_4^2 - 2L_2 L_4 L_5}{L_1 L_3 - L_2^2} \le \bar{Y}^2\left[1 - \frac{A_2 - 2A_3 A_4 + A_1 A_4^2}{A_1 A_2 - A_3^2}\right] \tag{5.6} \]

The condition depends upon the choice of $\alpha$ and $\beta$.

6. Empirical study

We have used the data given in Sukhatme and Sukhatme (1970, p. 256), where

$y$: number of villages in the circle, and

$\phi$: a circle consisting of more than five villages.

The following table shows the percent relative efficiencies (PREs) of the different estimators with respect to the usual estimator.

$n$    $N$    $\bar{Y}$    $P$    $\rho_{pb}$    $C_y$    $C_p$

Table 1: PRE of different estimators with respect to usual estimator

Estimator                                PRE
$\bar{y}$                                100
$t_{NGR}$                                12.648
$t_{RER}$                                60.603
$t_{S(opt)}$                             170.488
$(t_{SG})_{\min}$                        172.120
$(t_\alpha)_{\min}$                      173.132
$t_w$, $\alpha = 1$, $\beta = 0$         172.120
$t_w$, $\alpha = 0$, $\beta = 1$         187.804
$t_w$, $\alpha = 1$, $\beta = 1$         392.62
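For reference, a percent relative efficiency with respect to the usual estimator is the ratio of $V(\bar{y})$ to the MSE of the estimator, multiplied by 100. The sketch below uses hypothetical variance and MSE values, not the values behind Table 1.

```python
def percent_relative_efficiency(var_mean, mse_estimator):
    """PRE of an estimator with respect to the sample mean."""
    if mse_estimator <= 0:
        raise ValueError("MSE must be positive")
    return 100.0 * var_mean / mse_estimator

# Hypothetical values: V(y_bar) = 1.25 and an estimator with MSE = 0.8.
pre_better = percent_relative_efficiency(1.25, 0.8)   # above 100: more efficient
pre_worse = percent_relative_efficiency(1.25, 2.05)   # below 100: less efficient
```

A PRE below 100, such as the value reported for $t_{NGR}$ in Table 1, indicates an estimator that is less efficient than $\bar{y}$ for that population.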

Conclusion

From Table 1, one can see that the proposed estimator tα under optimum condition performs better than the Shabbir and Gupta (2007) estimator, Singh et al. (2008) estimator and usual estimator. Also, the performance of the second proposed estimator tw depends upon choice of α and β. For α = 1, β = 1, it attains maximum efficiency.

References

1. Abd-Elfattah, A.M., El-Sherpieny, E.A., Mohamed, S.M., Abdou, O.F. (2010). Improvement in estimating the population mean in simple random sampling using information on auxiliary attribute. Applied Mathematics and Computation, doi:10.1016/j.amc.2009.12.041.

2. Naik, V.D., Gupta, P.C. (1996). A note on estimation of mean with known population of an auxiliary character. Journal of Indian Society of Agricultural Statistics 48(2) 151–158.

3. Shabbir, J., Gupta, S. (2007). On estimating the finite population mean with known population proportion of an auxiliary variable. Pakistan Journal of Statistics 23 (1) 1–9.

5. Singh, R., Chauhan, P., Sawan, N., Smarandache, F. (2008). Ratio estimators in simple random sampling using information on auxiliary attribute. Pakistan Journal of Statistics and Operations Research 4 (1) 47–53.

6. Singh, R., Kumar, M., Smarandache, F. (2008). Almost unbiased estimator for estimating population mean using known value of some population parameter(s). Pakistan Journal of Statistics and Operations Research 4 (2) 63–76.

7. Singh, R. and Kumar, M. (2009). A note on transformations on auxiliary variable in survey sampling. Model Assisted Statistics and Applications (Accepted).