Using Information on Auxiliary Attribute
Rajesh Singh
Department of Statistics BHU, Varanasi (U.P.), India [email protected]
Abstract
We consider the problem of estimating the finite population mean when some information on auxiliary attribute is available. We obtain the mean square error (MSE) equation for the proposed estimators. It has been shown that the proposed estimator is better than Naik and Gupta (1996), Singh et al. (2008), Abd-Elfattah (2010) estimators. The results have been illustrated numerically by taking some empirical population considered in the literature.
Keywords: SRSWOR, Attribute, Point bi-serial correlation, MSE, Efficiency.
1. Introduction
In survey sampling the use of auxiliary information can increase the precision of an estimator when study variable y is highly correlated with the auxiliary variable x. But in several practical situations, instead of existence of auxiliary variables there exists some auxiliary attributes, which are highly correlated with study variable y, such as (i) use of drugs and gender (ii) amount of milk produced and a particular breed of cow.
Consider a sample of size n drawn by simple random sampling without replacement (SRSWOR) from a population of size N. Let yi and i denote the observations on
variable y and respectively for the ith unit (i=1,2,3....N.). It is assumed that attribute takes only the two values 0 and 1 according as
= 1, if ith unit of the population possesses attribute = 0, if otherwise.
Let
N 1 i
i
A and
n 1 i
i
a denote the total number of units in the population and
sample possessing attribute respectively, N A p and
n a
p denote the proportion of
units in the population and sample, respectively, possessing attribute.
Define,
Y Y y
ey
,P P p e
e 0,
iy,
E i
e fC ,Where
N 1 n 1
f ,
Y S C
2 2 y 2
y ,
P S C
2 2 p 2 p
and
S S
S
y y
pb is the point biserial correlation coefficient.
Here,
N
1 i
2 i 2
y y Y
1 N
1
S ,
N
1 i
2 i 2
P 1
N 1
S and
N
1 i
i i
y y NPY
1 N
1 S
In order to have an estimate of the population mean Y of the study variable y, assuming the knowledge of the population proportion p, Naik and Gupta (1996) defined following ratio and product estimators
p P y
tNGR (1.1)
P p y
tNGP (1.2)
The mean square error (MSE) of tNGR and tNGP up to the first order of approximation, respectively, are
2 pb y p
p 2 y 2
NGR fY C C 2 C C
t
MSE (1.3)
2 pb y p
p 2 y 2
NGP fY C C 2 C C
t
MSE (1.4)
2. Other estimators
Singh et al. (2008)suggested the following ratio estimator
1 2
2 1
S m P m
m p m
p P b y
t
(2.2)
where m1 (≠0) and m2 are either real numbers or the functions of the parameters of the
attribute such as Cp,2
andpb.In Singh et al. (2008), MSE equation of these ratio- type estimators were given by
2
pb 2
y 2
S f RS S 1
t
MSE (2.3)
Abd-Elfattah et al. (2010) proposed some ratio type estimators. The minimum MSE attained in Abd-Elfattah et al. (2010) was
2
pb 2
y Abd
min t fS 1
MSE (2.4)
The minimum MSE of tAbd is equal to the MSE of regression estimator
) p P ( ˆ y
treg .
(MSE(treg)f
S2y
1pb2
).Shabbir and Gupta (2007) considered following estimator
p P P p d d y
tSG 1 2 (2.5)
where d and 1 d are constants and whose sum is not necessarily equal to one. 2
The optimum MSE reported by Shabbir and Gupta (2007) of tSGis
2
pb 2
y 2 pb 2
y 0
SG
1 fC 1
1 fS t
MSE
Unfortunately the expression obtained by Shabbir and Gupta (2007) is incorrect. The corrected MSE of tSGis given as-
2 2 3 1
5 4 2 2
4 3 2 5 1 2 min
SG
2 Y
t MSE
(2.6) where,
C 3C 4 C C
, fY
Y2 2 2y 2x y x
1
2 XYf
2C2x CyCx
,, fC X2 2x
3
4 Y2f
C2x CyCx
Y2, 5 XYfC2x.3. The proposed estimator
We define a family of ratio estimators of population mean Y as
(3.1) m
p m
m P m y y t
2 1
2 1
2 1
where m and 1 m are same as defined in (2.2) and 2 1 and 2 are real constants to be determined such that the MSE of t is minimum.
Remark 1: Here we would like to mention that the choice of the estimator depends on the availability and values of the various parameter(s) used (for choice of the parameters
1
Expressing tin terms of e’s we have
Y 1e 1e
t 1 y 2 (3.2)
where .
b ap
aP
Expanding the right hand side of (3.2) and retaining terms up to second power of e’s, we have
e e e e
2 1 e
1 e
1 Y
t 1 y 2 2 2 y y (3.3)
Subtracting Y from both side of (3.3) and then taking expectations, we get the bias of the estimator tup to the first order of approximation, as
1 2 2 2C2p CyCp
2 1 f
1 Y
t
B (3.4)
Subtracting Y from both side of (3.3), squaring and then taking expectations, we get MSE of the estimator tup to the first order of approximation, as
t Y
A A 2 A 2 A 2 1
MSE 212 122 2 12 3 2 4 1 (3.5)
where
, fC 1 A1 2y
C C 4 C 1 C
,f 1
A2 2y22 2p yCp 2 2p
, C C 2
C C 2
1 f
1
A3 2 2p 2y y p
, C C C
2 1 f
1
A4 2 2p y p
2 3 2 1
4 3 2 * 1
A A A
A A A
and .
A A A
A A A
2 3 2 1
3 4 1 * 2
Substituting these optimum values of 1*and *2 in (3.5), we get the minimum MSE of
t as-
2
3 2 1
4 3 2 4 1 2 2
min
A A A
A A 2 A A A 1 Y )
t (
4. Another estimator
Singh et al. (2007) suggested exponential ratio type and exponential product type
estimators, respectively, as
p P
p P exp y tSR
(4.1)
P p
P p exp y tSP
(4.2)
MSE expressions for the estimatorstSR and tSP are given, respectively, as
y p
2 p 2 y 2
SR C C
4 C C Y f t
MSE
(4.3)
y p
2 p 2 y 2
SP C C
4 C C Y f t
MSE
(4.4)
Using (3.1) and Singh et al. (2007) estimator, we define another family of estimators for population mean Y as
b ap b aP
b ap b aP exp b
ap b aP p P w y w
tw 1 2
(4.5) where w and 1 w are constants and whose sum is not necessarily equal to one. 2
The Bias and MSE expressions of tw are respectively, given by
2 1 y x
x 2 1
1
p w 1Y f w YA w PBC w YB C C
t
Bias
(4.6)
1
2 2 12
1 3
22 2p) w 1 Y w m 2m w m
t (
MSE
4 5
1 3 2 52
1w m m 2w m 2w m
w
2
(4.7) where,
4 1 2 4
, 8A
2
,
2
B
C B C 2B C C
, fY
m1 2 2y 2 x2 y x m2 X2f
C2x ,
AC 2B C C
, fY
m3 2 2x y x m4 YXf
BC2x CyCx
,
BC
. fY X
Differentiating equation (4.7) with respect to w1 and w2 and than equating to zero we get
2
2 2 3 15 2 4 3 * 1
L L L
L L L L w
and
2
2 2 3 14 2 5 1 * 2
L L L
L L L L w
where
Y m 2m
,L1 2 1 3 L2
m4m5
, L3m2,
m Y
,L4 3 2 L5
m5
.Substituting these optimum values of w and 1* w in (4.7), we get the minimum MSE of *2
p
t as-
2 2 3 1
5 4 2 2
4 3 2 5 1 2 min
p
L L L
L L L 2 L L L L Y t
MSE
5. Efficiency comparison:
First, we compare the efficiency of proposed estimator twith usual estimator and than with regression estimator.
The variance of the usual estimator y is given by
2 y
fC ) y (
V (5.1)
t V
y MSE min 2 y 1 2 2
3 2 1
4 3 2 4 1
2 Y f C
A A A
A A 2 A A A
1
(5.2)
On solving, we observe that above condition always holds true. Therefore, proposed estimator tunder optimum condition performs better than usual estimator.
Similarly, it can be shown that
t min MSE
reg MSEmin
tAbd
MSE
If,
1 . Cf Y A
A A
A A 2 A A A 1
Y 2 1 2y 2
2 3 2 1
4 3 2 4 1 2 2
(5.3)
Next, we compare the efficiency of proposed estimator t with usual estimator and than p with regression estimator.
t V
y MSEmin
p
If,
. C Y f L
L L
L L L 2 L L L L
Y 1 2 2y
2 2 3 1
5 4 2 2
4 3 2
5 1
2
(5.4)
On simplification, we observe that above condition is always true. Therefore proposed estimator
min
tw performs better than usual estimator in all situations.Similarly it can be shown that
min
Abd
min
p MSE reg MSE t
t
MSE
If,
2 2y 1 2 2
2 3 1
5 4 2 2
4 3 2
5 1
2 Y f C 1
L L L
L L L 2 L L L L
Y
(5.5)
This is also true for all values of (1,0,1.)and (1,0,1.)
Finally we have compared the efficiency of proposed estimator twwith the estimator t p
minmin
p MSE t
t
MSE
Or if,
2 3 2 1
4 3 2 4 1 2 2
2 2 3 1
5 4 2 2
4 3 2
5 1 2
A A A
A A 2 A A A 1 Y L
L L
L L L 2 L L L L Y
(5.6) The conditions depends upon choice of and .
6. Empirical study
We have used the data given in Sukhatme and Sukhatme ((1970) p. 256). Where,
Y : Number of villages in the circle and
: represent A circle consisting more than five villages.
The following Table shows percent relative efficiencies (PRE’s) of different estimator’s with respect to usual estimator.
n N Y P pb Cy C p
Table 1: PRE of different estimators with respect to usual estimator
Estimator PRE
y
100
NGR
t
12.648
RER
t 60.603
) opt ( s
t
170.488
min SG)
t (
172.120
. min
) t (
173.132
w
t
1,0
1 , 0
1 , 1
172.120
187.804
392.62
Conclusion
From Table 1, one can see that the proposed estimator tα under optimum condition performs better than the Shabbir and Gupta (2007) estimator, Singh et al. (2008) estimator and usual estimator. Also, the performance of the second proposed estimator tw depends upon choice of α and β. For α = 1, β = 1, it attains maximum efficiency.
References
1. Abd-Elfattah, A.M. El-Sherpieny, E.A. Mohamed, S.M. Abdou, O.F. (2010). Improvement in estimating the population mean in simple random sampling using information on auxiliary attribute. Applied Mathematics Computation, doi:10.1016/j.amc.2009.12.041.
2. Naik, V.D., Gupta, P.C. (1996). A note on estimation of mean with known population of an auxiliary character. Journal of Indian Society of Agricultural Statistics 48(2) 151–158.
3. Shabbir, J., Gupta, S. (2007). On estimating the finite population mean with known population proportion of an auxiliary variable. Pakistan Journal of Statistics 23 (1) 1–9.
5. Singh, R. Chauhan, P. Sawan, N. Smarandache, F. (2008). Ratio estimators in simple random sampling using information on auxiliary attribute. Pakistan journal of statistics and operations research 4 (1) 47–53
6. Singh, R., Kumar, M. and Smarandache, F. (2008). Almost unbiased estimator for estimating population mean using known value of some population parameter(s). Pakistan journal of statistics and operations research 4(2) pp63-76.
7. Singh, R. and Kumar, M. (2009). A note on transformations on auxiliary variable in survey sampling. Model Assisted Statistics and Applications (Accepted).