Some generalized classes of double sampling regression type estimators using auxiliary information


Original Research


Shashi Bhushan¹*, R. Karan Singh² and Arvind Pandey¹

¹Department of Statistics, Pachhunga University College, Mizoram University, Aizawl 796 001, India

²Department of Statistics, University of Lucknow, Lucknow 226 007, India

Received 17 December 2010 | Revised 17 January 2011 | Accepted 27 January 2011

ABSTRACT

In the present study, a double sampling regression type estimator representing a class of estimators is proposed, and its bias and mean square error (MSE) are obtained. A more generalized class of double sampling regression type estimators, utilizing the auxiliary information available at the first phase in the form of the mean and variance of the auxiliary variable, is also proposed, and its bias and MSE are obtained. The concluding remarks show that the proposed classes of estimators are better than the double sampling estimator based on an earlier proposition. An empirical study is included for illustration.


Key words: Auxiliary information; bias; double sampling regression type estimator; mean square error.


Mathematics Subject Classification 2000: Primary 62D05 Secondary 62F10

Corresponding author: S. Bhushan Phone.

E-mail: [email protected]

INTRODUCTION

When supplementary information on the auxiliary variable is not known, the double sampling ratio and regression strategies are well known. Many biased double sampling ratio type and regression type transformed estimators, as well as biased double sampling estimators obtained through a parametric linear combination of the ratio or regression estimator and the usual unbiased estimator, are available for estimating the population mean.¹⁻⁷ The use of the population mean and variance of the auxiliary variable for increasing the efficiency of the sampling strategy has been discussed recently by Singh,⁸ Bhushan¹ and Bhushan et al.⁹ among others.

Let y be the characteristic under study and x be the auxiliary variable. For a finite population of size N, we denote by

$Y_i$: the observation on the i-th unit of the population for the characteristic y under study (i = 1, 2, ..., N),

$X_i$: the observation on the i-th unit of the population for the auxiliary characteristic x (i = 1, 2, ..., N),

January-March, 2011 ISSN (print) 0975-6175 ISSN (online) 2229-6026

$$\bar{Y}=\frac{1}{N}\sum_{i=1}^{N}Y_i,\qquad \bar{X}=\frac{1}{N}\sum_{i=1}^{N}X_i,$$

$$S_y^2=\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-\bar{Y})^2,\qquad S_x^2=\frac{1}{N-1}\sum_{i=1}^{N}(X_i-\bar{X})^2,\qquad \sigma_x^2=\frac{1}{N}\sum_{i=1}^{N}(X_i-\bar{X})^2,$$

$$S_{xy}=\frac{1}{N-1}\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})=\rho S_xS_y,\qquad \beta=\frac{S_{xy}}{S_x^2}=\rho\frac{S_y}{S_x}$$

(where ρ is the population correlation coefficient between x and y) and

$\mu_{pq}=\frac{1}{N}\sum_{i=1}^{N}(X_i-\bar{X})^p(Y_i-\bar{Y})^q$: the (p,q)th product moment about the mean between x and y.

If the information about the mean and variance of the auxiliary variable is not known, then we resort to double sampling. Let the auxiliary characteristic x be observed on a large preliminary simple random sample of size n' drawn without replacement from a population of size N in the first phase. Also, let the characteristic of interest y and the auxiliary characteristic x be observed on the second phase sample of size n drawn from the first phase sample by simple random sampling without replacement. Let

$$\bar{x}'=\frac{1}{n'}\sum_{i=1}^{n'}x_i\qquad\text{and}\qquad \hat{\sigma}_x^{\prime 2}=\frac{1}{n'}\sum_{i=1}^{n'}(x_i-\bar{x}')^2$$

be the sample mean and sample variance of the auxiliary characteristic x based on the first phase sample of size n'. Also, based on the second phase sub-sample of size n, let

$$\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i\qquad\text{and}\qquad \bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i$$

be the sample means of the auxiliary characteristic x and the characteristic y under study, respectively,

$$s_x^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\qquad\text{and}\qquad s_y^2=\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2$$

be the sample variances of characteristic x and characteristic y, respectively,

$$s_{xy}=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$$

be the sample covariance between x and y, and

$$b=\frac{s_{xy}}{s_x^2}$$

be the sample regression coefficient of y on x.

A PROPOSED DOUBLE SAMPLING ESTIMATOR

A proposed double sampling estimator of the population mean is

$$\hat{\bar{Y}}_{\theta D}=\bar{y}\left[1+\theta\left(\frac{\hat{\sigma}_x^{2}}{\hat{\sigma}_x^{\prime 2}}-1\right)\right]+b(\bar{x}'-\bar{x}) \tag{1}$$

where θ is the characterizing scalar to be determined suitably and $\hat{\sigma}_x^{2}$ denotes the second phase counterpart of $\hat{\sigma}_x^{\prime 2}$. Note that for θ = 0 the proposed estimator reduces to the usual double sampling linear regression estimator

$$\bar{y}_{lrD}=\bar{y}+b(\bar{x}'-\bar{x}). \tag{2}$$

Let

$$\bar{y}=\bar{Y}+e_0,\quad \bar{x}=\bar{X}+e_1,\quad \bar{x}'=\bar{X}+e_1',\quad s_{xy}=S_{xy}+e_2,\quad s_x^2=S_x^2+e_3,\quad \hat{\sigma}_x^{2}=\sigma_x^2+e_4,\quad \hat{\sigma}_x^{\prime 2}=\sigma_x^2+e_4'$$

with

$$E(e_0)=E(e_1)=E(e_1')=E(e_2)=E(e_3)=E(e_4)=E(e_4')=0. \tag{3}$$
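The estimators above can be sketched numerically. The following is a minimal illustration, not the authors' implementation. It assumes that estimator (1) has the form $\bar{y}[1+\theta(w-1)]+b(\bar{x}'-\bar{x})$ with $w=\hat{\sigma}_x^{2}/\hat{\sigma}_x^{\prime 2}$, and that the second phase variance estimate uses divisor n. The function name and the toy data are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def double_sampling_estimators(x_first, x_second, y_second, theta=0.0):
    """Return (regression estimator (2), theta-class estimator (1)).

    x_first  : x values observed on the first phase sample (size n')
    x_second : x values observed on the second phase sub-sample (size n)
    y_second : y values observed on the second phase sub-sample (size n)
    """
    n = len(x_second)
    x_bar_first = mean(x_first)
    x_bar = mean(x_second)
    y_bar = mean(y_second)

    # Variance estimates of x: divisor n' at the first phase; divisor n at
    # the second phase is an assumption made for this sketch.
    var_first = sum((x - x_bar_first) ** 2 for x in x_first) / len(x_first)
    var_second = sum((x - x_bar) ** 2 for x in x_second) / n

    # Second phase covariance, variance (divisor n - 1) and regression slope.
    s_xy = sum((x - x_bar) * (y - y_bar)
               for x, y in zip(x_second, y_second)) / (n - 1)
    s_x2 = sum((x - x_bar) ** 2 for x in x_second) / (n - 1)
    b = s_xy / s_x2

    y_lr = y_bar + b * (x_bar_first - x_bar)
    w = var_second / var_first
    y_theta = y_bar * (1.0 + theta * (w - 1.0)) + b * (x_bar_first - x_bar)
    return y_lr, y_theta


# Toy data (hypothetical): the second phase units are a subset of the first.
x_first = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x_second = [1.0, 3.0, 5.0]
y_second = [2.0, 6.0, 10.0]

y_lr, y_theta0 = double_sampling_estimators(x_first, x_second, y_second, 0.0)
_, y_theta_half = double_sampling_estimators(x_first, x_second, y_second, 0.5)
```

With θ = 0 the two returned values coincide, which mirrors the reduction of (1) to (2) noted above.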

Then by using (1) and retaining terms up to the second order in the e's, we have

$$\hat{\bar{Y}}_{\theta D}=\bar{Y}+e_0+\frac{\theta\bar{Y}}{\sigma_x^2}\left[e_4-e_4'+\frac{e_0e_4}{\bar{Y}}-\frac{e_0e_4'}{\bar{Y}}-\frac{e_4e_4'}{\sigma_x^2}+\frac{e_4^{\prime 2}}{\sigma_x^2}\right]+\beta\left[e_1'-e_1+\frac{e_1'e_2}{S_{xy}}-\frac{e_1e_2}{S_{xy}}-\frac{e_1'e_3}{S_x^2}+\frac{e_1e_3}{S_x^2}\right]. \tag{4}$$

Using the results given in Sukhatme and Sukhatme,⁴ Bhushan¹ and Bhushan et al.,⁹ the error terms in (3) have zero expectation and

$$\begin{aligned}
E(e_0^2)&=\gamma_nS_y^2, & E(e_1^2)&=\gamma_nS_x^2, & E(e_0e_1)&=\gamma_nS_{xy},\\
E(e_4^2)&=\gamma_n(\mu_{40}-\mu_{20}^2), & E(e_1e_4)&=\gamma_n\mu_{30}, & E(e_0e_4)&=\gamma_n\mu_{21},\\
E(e_1e_2)&=\gamma_n\mu_{21}, & E(e_1e_3)&=\gamma_n\mu_{30}, & E(e_0e_1')&=\gamma_{n'}S_{xy},\\
E(e_1^{\prime 2})&=E(e_1e_1')=\gamma_{n'}S_x^2, & E(e_4^{\prime 2})&=E(e_4e_4')=\gamma_{n'}(\mu_{40}-\mu_{20}^2), & E(e_0e_4')&=\gamma_{n'}\mu_{21},\\
E(e_1'e_2)&=\gamma_{n'}\mu_{21}, & E(e_1'e_3)&=\gamma_{n'}\mu_{30}, & E(e_1e_4')&=E(e_1'e_4)=E(e_1'e_4')=\gamma_{n'}\mu_{30},
\end{aligned} \tag{5}$$

where $\gamma_n=(N-n)/(Nn)$ and $\gamma_{n'}=(N-n')/(Nn')$.

Using (4) and (5) it can be seen that

$$\mathrm{Bias}(\hat{\bar{Y}}_{\theta D})=E(\hat{\bar{Y}}_{\theta D})-\bar{Y}=(\gamma_n-\gamma_{n'})\left[\frac{\theta\mu_{21}}{\sigma_x^2}-\beta\left(\frac{\mu_{21}}{S_{xy}}-\frac{\mu_{30}}{S_x^2}\right)\right] \tag{6}$$

showing that $\hat{\bar{Y}}_{\theta D}$ is a biased estimator of the population mean. Using (4) and neglecting terms of $e_i$'s having powers greater than two, we get the MSE given by

$$\mathrm{MSE}(\hat{\bar{Y}}_{\theta D})=(\gamma_n-\gamma_{n'})\left[(1-\rho^2)S_y^2+\frac{\theta^2\bar{Y}^2}{\sigma_x^4}(\mu_{40}-\mu_{20}^2)+\frac{2\theta\bar{Y}}{\sigma_x^2}\mu_{21}-\frac{2\beta\theta\bar{Y}}{\sigma_x^2}\mu_{30}\right]+\gamma_{n'}S_y^2 \tag{7}$$

which is minimum for the optimum value of θ given by

$$\theta_{opt}=\frac{(\beta\mu_{30}-\mu_{21})\,\mu_{20}}{\bar{Y}(\mu_{40}-\mu_{20}^2)} \tag{8}$$

and the minimum mean square error of $\hat{\bar{Y}}_{\theta D}$ is given by

$$\mathrm{MSE}(\hat{\bar{Y}}_{\theta D})_{min}=(\gamma_n-\gamma_{n'})\left[(1-\rho^2)S_y^2-\frac{(\beta\mu_{30}-\mu_{21})^2}{\mu_{20}^2(\beta_2-1)}\right]+\gamma_{n'}S_y^2 \tag{9}$$

where $\beta_2=\mu_{40}/\mu_{20}^2$, so that $\mu_{20}^2(\beta_2-1)=\mu_{40}-\mu_{20}^2$.

A MORE GENERALIZED CLASS OF DOUBLE SAMPLING ESTIMATORS

A more generalized estimator of the population mean is proposed to be

$$\hat{\bar{Y}}_{gd}=\bar{y}\,g(w)+b(\bar{x}'-\bar{x}) \tag{10}$$

where $w=\hat{\sigma}_x^{2}/\hat{\sigma}_x^{\prime 2}$; g(w) is a function of w such that g(w) = 1 at w = 1, satisfying the following conditions:

1. Whatever be the sample chosen, w assumes values in a bounded closed interval I of the real line containing the point unity.

2. Within the interval I the function g(w) is continuous and bounded.

3. The first, second and third derivatives of g(w) exist and are continuous and bounded in I.

Under these conditions, g(w) can be expanded about the point w = 1 in a third order Taylor series, which gives the representation (11).
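To make the class concrete, the sketch below lists three candidate functions g(w) with g(1) = 1 and a common first derivative θ at w = 1. These particular functions are illustrative assumptions and are not taken from the paper. Here w is assumed to be the ratio of the second phase to the first phase variance estimate of x, and all names and numbers are hypothetical.

```python
import math


def make_class_members(theta):
    """Candidate g(w) functions, each with g(1) = 1 and g'(1) = theta."""
    return {
        "linear": lambda w: 1.0 + theta * (w - 1.0),
        "power": lambda w: w ** theta,
        "exponential": lambda w: math.exp(theta * (w - 1.0)),
    }


def derivative_at_one(g, step=1e-6):
    """Central-difference approximation of g'(1)."""
    return (g(1.0 + step) - g(1.0 - step)) / (2.0 * step)


def generalized_estimator(g, y_bar, b, x_bar_first, x_bar, w):
    """Estimator of the form y_bar * g(w) + b * (x_bar_first - x_bar)."""
    return y_bar * g(w) + b * (x_bar_first - x_bar)


theta = 0.5
members = make_class_members(theta)

# Hypothetical summary statistics from a two-phase sample.
estimates = {
    name: generalized_estimator(g, y_bar=6.0, b=2.0,
                                x_bar_first=3.5, x_bar=3.0, w=0.9)
    for name, g in members.items()
}
slopes = {name: derivative_at_one(g) for name, g in members.items()}
```

The three members give different point estimates for the same sample, yet they share g'(1), which is the only feature of g that enters the first-order MSE of the class.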

$$\hat{\bar{Y}}_{gd}=\bar{y}\left[g(1)+(w-1)g'(1)+\frac{(w-1)^2}{2!}g''(1)+\frac{(w-1)^3}{3!}g'''(w^*)\right]+b(\bar{x}'-\bar{x}) \tag{11}$$

where $w^*=1+\psi(w-1)$, $0<\psi<1$, and ψ may depend on w; $g'(1)$, $g''(1)$ and $g'''(w^*)$ denote the first, second and third order derivatives of g(w) at the points w = 1, 1 and $w^*$, respectively.

Using the notations given in (3), we have $\hat{\sigma}_x^{2}=\sigma_x^2+e_4$ and $\hat{\sigma}_x^{\prime 2}=\sigma_x^2+e_4'$, so that $w=(\sigma_x^2+e_4)/(\sigma_x^2+e_4')$.

Putting these values in (11) and neglecting the terms of $e_i$'s (i = 0, 1, 2, 3, 4) and $e_i'$'s (i = 1, 4) having powers greater than two, we get

$$\begin{aligned}
\hat{\bar{Y}}_{gd}={}&\bar{Y}+e_0+\frac{\bar{Y}g'(1)}{\sigma_x^2}\left(e_4-e_4'-\frac{e_4e_4'}{\sigma_x^2}+\frac{e_4^{\prime 2}}{\sigma_x^2}\right)+\frac{g'(1)}{\sigma_x^2}\left(e_0e_4-e_0e_4'\right)\\
&+\frac{\bar{Y}g''(1)}{2\sigma_x^4}\left(e_4^2+e_4^{\prime 2}-2e_4e_4'\right)+\beta\left[e_1'-e_1+\frac{e_1'e_2}{S_{xy}}-\frac{e_1e_2}{S_{xy}}-\frac{e_1'e_3}{S_x^2}+\frac{e_1e_3}{S_x^2}\right].
\end{aligned} \tag{12}$$

Taking expectation, we get

$$\mathrm{Bias}(\hat{\bar{Y}}_{gd})=(\gamma_n-\gamma_{n'})\left[\frac{g'(1)}{\sigma_x^2}\mu_{21}+\frac{\bar{Y}g''(1)}{2\sigma_x^4}(\mu_{40}-\mu_{20}^2)+\beta\left(\frac{\mu_{30}}{S_x^2}-\frac{\mu_{21}}{S_{xy}}\right)\right] \tag{13}$$

showing that $\hat{\bar{Y}}_{gd}$ is a biased estimator of the population mean. Also, the mean square error of the estimator is given by

$$\mathrm{MSE}(\hat{\bar{Y}}_{gd})=\gamma_{n'}S_y^2+(\gamma_n-\gamma_{n'})\left[(1-\rho^2)S_y^2+\frac{\{\bar{Y}g'(1)\}^2}{\sigma_x^4}(\mu_{40}-\mu_{20}^2)+\frac{2\bar{Y}g'(1)}{\sigma_x^2}\mu_{21}-\frac{2\beta\bar{Y}g'(1)}{\sigma_x^2}\mu_{30}\right]. \tag{14}$$

(14) is minimized when

$$g'(1)=g'(1)_{opt}=\frac{(\beta\mu_{30}-\mu_{21})\,\mu_{20}}{\bar{Y}(\mu_{40}-\mu_{20}^2)}=\theta_{opt} \tag{15}$$

and the minimum mean square error of $\hat{\bar{Y}}_{gd}$ is the same as that of $\hat{\bar{Y}}_{\theta D}$ given by (9).

CONCLUDING REMARKS

The proposed generalized classes of estimators $\hat{\bar{Y}}_{\theta D}$ and $\hat{\bar{Y}}_{gd}$ are biased, and their biases are given by (6) and (13), respectively. The minimum mean square errors of the proposed generalized classes are equal and are given by (9). Therefore, the proposed generalized classes of estimators $\hat{\bar{Y}}_{\theta D}$ and $\hat{\bar{Y}}_{gd}$ are preferred to the usual linear regression estimator, ratio estimator, mean per unit estimator and product estimator in the sense of smaller mean square error. In the class of estimators $\hat{\bar{Y}}_{gd}$, there exists a subclass of optimum estimators satisfying (15) such that every member of the subclass attains the same minimum mean square error given by (9). For example, for [...] the estimator [...] belongs to the sub-class of estimators attaining the minimum mean square error given by (9). Further, a double sampling ratio estimator based on Singh⁸ is

$$\bar{y}_{sd}=\bar{y}\,\frac{\bar{x}'+\sigma_x}{\bar{x}+\sigma_x}$$

having MSE given by

$$\mathrm{MSE}(\bar{y}_{sd})=(\gamma_n-\gamma_{n'})\left[S_y^2+R^2\delta^2S_x^2-2R\delta S_{xy}\right]+\gamma_{n'}S_y^2$$

where $R=\bar{Y}/\bar{X}$ and $\delta=\bar{X}/(\bar{X}+\sigma_x)$, such that

$$\mathrm{MSE}(\bar{y}_{sd})-\mathrm{MSE}(\hat{\bar{Y}}_{\theta D})_{min}=(\gamma_n-\gamma_{n'})\left[(\rho S_y-R\delta S_x)^2+\frac{(\beta\mu_{30}-\mu_{21})^2}{\mu_{20}^2(\beta_2-1)}\right]\geq 0$$

showing that the proposed classes of estimators are better than the double sampling estimator based on Singh.⁸ Also, the parameters involved in $g'(1)_{opt}$ may be estimated by the corresponding sample values in order to get a class of estimators depending upon the estimated optimum value.

EMPIRICAL STUDY

The gain in precision of the proposed estimator(s) over the usual double sampling linear regression estimator is studied for thirty-two populations as provided in Bhushan¹ and Bhushan et al.⁹ The table given below provides the gain in precision when the proposed estimator(s) are used in place of the double sampling linear regression estimator.

Population number    Gain (YθD / Ygd)    Population number    Gain (YθD / Ygd)
1                    2.58583             17                   300.001
2                    9.08774             18                   232.916
3                    0.224016            19                   10.8319
4                    1.36995             20                   11.1082
5                    6.46651             21                   0.0658345
6                    25.9388             22                   5.58773
7                    1.04419             23                   13.1132
8                    0.035816            24                   4.36055
9                    6.54509             25                   5.7562
10                   65.492              26                   2.78404
11                   1.09776             27                   0.0496923
12                   2.58583             28                   0.323048
13                   0.0787519           29                   0.586345
14                   4.43361             30                   11.9976
15                   68.5157             31                   5.47894
16                   10.3939             32                   2.16561

Table 1. Gain in precision of the proposed estimators over the linear regression estimator.
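A comparison of this kind can be sketched from population values as follows. This is only an illustration under stated assumptions: the first-order MSE of the θ-class is taken in the form $(\gamma_n-\gamma_{n'})[(1-\rho^2)S_y^2+a^2(\mu_{40}-\mu_{20}^2)+2a\mu_{21}-2\beta a\mu_{30}]+\gamma_{n'}S_y^2$ with $a=\theta\bar{Y}/\mu_{20}$, the gain is assumed to be the percent reduction in MSE relative to the minimum MSE (the paper does not spell out its definition), and the small population below is hypothetical rather than one of the thirty-two populations.

```python
def product_moment(xs, ys, p, q):
    """Population product moment mu_pq about the means (divisor N)."""
    size = len(xs)
    x_mean = sum(xs) / size
    y_mean = sum(ys) / size
    return sum((x - x_mean) ** p * (y - y_mean) ** q
               for x, y in zip(xs, ys)) / size


def population_summary(xs, ys, n_first, n_second):
    """Collect the population quantities used by the MSE expressions."""
    size = len(xs)
    scale = size / (size - 1)  # converts divisor N into divisor N - 1
    mu20 = product_moment(xs, ys, 2, 0)
    s_x2 = scale * mu20
    s_y2 = scale * product_moment(xs, ys, 0, 2)
    s_xy = scale * product_moment(xs, ys, 1, 1)
    return {
        "y_mean": sum(ys) / size,
        "mu20": mu20,
        "mu21": product_moment(xs, ys, 2, 1),
        "mu30": product_moment(xs, ys, 3, 0),
        "mu40": product_moment(xs, ys, 4, 0),
        "s_y2": s_y2,
        "beta": s_xy / s_x2,
        "rho2": s_xy ** 2 / (s_x2 * s_y2),
        "gamma_n": (size - n_second) / (size * n_second),
        "gamma_n1": (size - n_first) / (size * n_first),
    }


def theta_opt(s):
    """Optimum theta: (beta*mu30 - mu21) * mu20 / (Ybar * (mu40 - mu20^2))."""
    return ((s["beta"] * s["mu30"] - s["mu21"]) * s["mu20"]
            / (s["y_mean"] * (s["mu40"] - s["mu20"] ** 2)))


def mse_theta(s, theta):
    """First-order MSE of the theta-class; theta = 0 gives the regression estimator."""
    a = theta * s["y_mean"] / s["mu20"]
    bracket = ((1.0 - s["rho2"]) * s["s_y2"]
               + a * a * (s["mu40"] - s["mu20"] ** 2)
               + 2.0 * a * s["mu21"] - 2.0 * s["beta"] * a * s["mu30"])
    return (s["gamma_n"] - s["gamma_n1"]) * bracket + s["gamma_n1"] * s["s_y2"]


def mse_min(s):
    """Minimum first-order MSE attained at theta_opt."""
    gain_term = ((s["beta"] * s["mu30"] - s["mu21"]) ** 2
                 / (s["mu40"] - s["mu20"] ** 2))
    bracket = (1.0 - s["rho2"]) * s["s_y2"] - gain_term
    return (s["gamma_n"] - s["gamma_n1"]) * bracket + s["gamma_n1"] * s["s_y2"]


# Hypothetical population of N = 10 units with n' = 6 and n = 3.
X = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 9.0, 15.0]
Y = [3.0, 4.5, 5.0, 6.0, 7.5, 8.0, 9.5, 14.0, 17.0, 40.0]
summary = population_summary(X, Y, n_first=6, n_second=3)

mse_lr = mse_theta(summary, 0.0)
mse_best = mse_min(summary)
# Assumed gain definition: percent reduction relative to the minimum MSE.
percent_gain = 100.0 * (mse_lr - mse_best) / mse_best
```

Evaluating `mse_theta` at `theta_opt(summary)` reproduces `mse_min(summary)` up to rounding, which gives a quick consistency check between the optimum value and the minimum MSE.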

REFERENCES

1. Bhushan S (2007). Some Improved Sampling Strategies in Finite Population. Ph.D. thesis, University of Lucknow, Lucknow, U.P., India.

2. Singh D & Chaudhary FS (2002). Theory and Analysis of Sampling Survey Design. New Age Pvt. Ltd., New Delhi.

3. Mukhopadhyaya P (1998). Theory and Methods of Survey Sampling. Prentice-Hall India, New Delhi.

4. Sukhatme PV & Sukhatme BV (1997). Sampling Theory of Surveys with Applications. Piyush Publications, Delhi.

5. Kumar R (1995). Some Contributions to Survey Sampling. Unpublished thesis submitted and accepted by Lucknow University, U.P., India.

6. Murthy MN (1967). Sampling Theory and Methods. Statistical Publishing Society, Calcutta.

7. Cochran WG (1977). Sampling Techniques, 3rd edn. John Wiley and Sons, New York.

8. Singh GN (2003). On the improvement of product method of estimation in sample surveys. J Ind Soc Agr Stat, 56, 267-275.

9. Bhushan S, Mishra P & Singh RK (2010). A class of regression type estimators using mean and variance of auxiliary variable (under communication).