
Optimal Allocation in Stratified Randomized Response Model


Academic year: 2020


Sat Gupta

Department of Mathematical Sciences, University of North Carolina at Greensboro, 383 Bryan Building Greensboro, NC 27402, USA

Abstract

A Warner (1965) randomized response model based on stratification is used to determine the allocation of samples. Both linear and log-linear cost functions are discussed under single and double stratification. It is observed that a log-linear cost function gives better allocations.

Key Words: Randomized response, linear and log-linear functions, cost function, stratification.

Corresponding address: Department of Statistics, Quaid-i-Azam University, Islamabad 45320, Pakistan

Introduction

Various randomized response techniques have been developed to obtain truthful answers since the pioneering work of Warner (1965). Significant contributions are due to Greenberg et al. (1969), Moors (1971), Mangat and Singh (1990), Mangat (1994) and Singh et al. (2000). Hong et al. (1994) used proportional allocation to obtain responses more accurately. Recently, Kim and Warde (2003) discussed Warner's model and used optimum allocation in stratification. Optimal allocation undoubtedly gives better results than proportional allocation, as also discussed by Cochran (1977). Kim and Warde (2003) focused mainly on obtaining truthful responses; they did not obtain the allocation of samples under a cost function. Khare (1987) discussed the allocation of samples in the presence of non-response. In this paper our emphasis is on the allocation of samples in relation to the cost function in the randomized response model. We use both linear and log-linear cost functions, and the study is also extended to double stratification.

Warner’s Stratified Model

Let $y_h$ and $x_h$ ($h = 1, 2, \ldots, L$) be the response and auxiliary variables of stratum $h$, where $L$ is the number of strata. A population of size $N = \sum_{h=1}^{L} N_h$ is divided into various strata according to some particular characteristic such as age, height or marital status, or some other suitable characteristic. A sample of size $n_h$ is obtained from each stratum, with $n = \sum_{h=1}^{L} n_h$, to measure the response $y$. Each selected respondent from each stratum is instructed to use the randomized device before giving a yes or no response. We assume that the number of observations in each stratum is known. The Warner (1965) randomized device presents the sensitive statement with probability $P_h$ and its negation with probability $(1 - P_h)$. Assuming that the responses are truthful, the probability of a yes answer is given by

$$G_h = P_h\pi_h + (1 - P_h)(1 - \pi_h), \qquad h = 1, 2, \ldots, L,$$

where $G_h$ is the proportion of yes answers in stratum $h$, $P_h$ is the probability that a respondent in the sample of stratum $h$ has a sensitive question card and $\pi_h$ is the proportion of respondents with the sensitive trait in stratum $h$. The maximum likelihood estimate of $\pi_h$ is given by

$$\hat{\pi}_h = \frac{\hat{G}_h - (1 - P_h)}{2P_h - 1}, \qquad P_h \neq 0.5,$$

where $\hat{G}_h$ is the proportion of yes answers in the sample of stratum $h$. Since $n_h\hat{G}_h$ follows the binomial distribution $B(n_h, G_h)$ and selection is made independently in different strata, the maximum likelihood estimate of $\pi$ is

$$\hat{\pi} = \sum_{h=1}^{L} W_h\hat{\pi}_h,$$

where $W_h = N_h/N$ is the stratum weight. The estimator $\hat{\pi}$ is unbiased for $\pi$, i.e.

$$E(\hat{\pi}) = E\left[\sum_{h=1}^{L} W_h\hat{\pi}_h\right] = \sum_{h=1}^{L} W_h E(\hat{\pi}_h) = \sum_{h=1}^{L} W_h\pi_h = \pi,$$

and its variance is given by

$$\mathrm{Var}(\hat{\pi}) = \mathrm{Var}\left[\sum_{h=1}^{L} W_h\hat{\pi}_h\right] = \sum_{h=1}^{L} W_h^2\,\mathrm{Var}(\hat{\pi}_h),$$

$$\mathrm{Var}(\hat{\pi}) = \sum_{h=1}^{L}\frac{W_h^2}{n_h}\left[\pi_h(1 - \pi_h) + \frac{P_h(1 - P_h)}{(2P_h - 1)^2}\right]. \tag{2.1}$$
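The estimator and the variance (2.1) can be illustrated with a short numerical sketch. It assumes NumPy is available; the function names and the two-stratum input values are hypothetical choices for illustration and are not part of the paper.

```python
import numpy as np

def warner_stratified_estimate(G_hat, P, W):
    """Stratified Warner estimate: sum over h of W_h * pi_hat_h."""
    G_hat, P, W = (np.asarray(a, dtype=float) for a in (G_hat, P, W))
    if np.any(np.isclose(P, 0.5)):
        raise ValueError("P_h must differ from 0.5")
    pi_hat_h = (G_hat - (1.0 - P)) / (2.0 * P - 1.0)
    return float(np.sum(W * pi_hat_h))

def warner_stratified_variance(pi, P, W, n):
    """Variance (2.1) for stratum sample sizes n_h."""
    pi, P, W, n = (np.asarray(a, dtype=float) for a in (pi, P, W, n))
    per_unit = pi * (1.0 - pi) + P * (1.0 - P) / (2.0 * P - 1.0) ** 2
    return float(np.sum(W ** 2 * per_unit / n))

# Hypothetical two-stratum inputs, chosen only for illustration.
pi = np.array([0.4, 0.6])
P = np.array([0.6, 0.7])
W = np.array([0.3, 0.7])

# Exact yes-probabilities G_h; feeding them back recovers pi = sum W_h * pi_h.
G = P * pi + (1.0 - P) * (1.0 - pi)
estimate = warner_stratified_estimate(G, P, W)
variance = warner_stratified_variance(pi, P, W, n=[9, 7])
```

Using the exact $G_h$ in place of sample proportions is only a consistency check: it shows that the estimator inverts the relation between $G_h$ and $\pi_h$.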

Cost Functions

We discuss the linear and log-linear cost functions in stratified randomized response sampling.

Linear Cost Function

Define:

$$C = c_0 + \sum_{h=1}^{L} C_h n_h^{\delta}, \tag{3.1}$$

where $C$ is the total cost, $c_0$ is the fixed cost, $C_h$ is the cost per unit in stratum $h$ and $\delta$ is a constant. We select a sample of size $n_h$ to minimize $V(\hat{\pi}) = V$ (say) for a specified cost, or to minimize the cost for a specified variance.

To obtain $n_h$, we define a function $\psi$ by using equations (2.1) and (3.1) as

$$\psi = \sum_{h=1}^{L}\frac{W_h^2}{n_h}\left[\pi_h(1 - \pi_h) + \frac{P_h(1 - P_h)}{(2P_h - 1)^2}\right] + \lambda\left[\sum_{h=1}^{L} C_h n_h^{\delta} - (C - c_0)\right], \tag{3.2}$$

where $\lambda$ is the Lagrangian multiplier.

Setting $\partial\psi/\partial n_h = 0$ in (3.2) and solving for $n_h$, we get

$$n_h = \left[\frac{W_h^2\left\{\pi_h(1 - \pi_h) + \dfrac{P_h(1 - P_h)}{(2P_h - 1)^2}\right\}}{\delta\lambda C_h}\right]^{1/(\delta+1)}. \tag{3.3}$$

Summing (3.3) over $h$ and using $n = \sum_{h=1}^{L} n_h$ to eliminate $\lambda$, we get

$$n_h = n\,\frac{\left[W_h^2\left\{\pi_h(1 - \pi_h) + \dfrac{P_h(1 - P_h)}{(2P_h - 1)^2}\right\}\Big/ C_h\right]^{1/(\delta+1)}}{\sum_{h=1}^{L}\left[W_h^2\left\{\pi_h(1 - \pi_h) + \dfrac{P_h(1 - P_h)}{(2P_h - 1)^2}\right\}\Big/ C_h\right]^{1/(\delta+1)}}. \tag{3.4}$$

Case 1:

If cost is fixed then by (3.1) and (3.4), we have

$$n = \frac{(C - c_0)^{1/\delta}\,\sum_{h=1}^{L}\left[W_h^2\left\{\pi_h(1 - \pi_h) + \dfrac{P_h(1 - P_h)}{(2P_h - 1)^2}\right\}\Big/ C_h\right]^{1/(\delta+1)}}{\left[\sum_{h=1}^{L}\left[\left(W_h^2\left\{\pi_h(1 - \pi_h) + \dfrac{P_h(1 - P_h)}{(2P_h - 1)^2}\right\}\right)^{\delta} C_h\right]^{1/(\delta+1)}\right]^{1/\delta}}. \tag{3.5}$$
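The allocation (3.4) together with the fixed-cost total (3.5) can be sketched numerically as follows. This is a minimal illustration assuming NumPy; the function name `linear_cost_allocation` and the argument `budget` (standing for $C - c_0$) are hypothetical, and the inputs are the first group of Table 1 with $\delta = 1$.

```python
import numpy as np

def linear_cost_allocation(pi, P, W, C, budget, delta):
    """Total n from (3.5) and stratum sizes n_h from (3.4); budget = C - c0."""
    pi, P, W, C = (np.asarray(a, dtype=float) for a in (pi, P, W, C))
    # a_h = W_h^2 * {pi_h(1 - pi_h) + P_h(1 - P_h) / (2 P_h - 1)^2}
    a = W ** 2 * (pi * (1 - pi) + P * (1 - P) / (2 * P - 1) ** 2)
    k = (a / C) ** (1.0 / (delta + 1.0))      # n_h is proportional to k_h
    # C_h * k_h^delta equals [(a_h)^delta * C_h]^(1/(delta+1)), as in (3.5)
    n = budget ** (1.0 / delta) * k.sum() / np.sum(C * k ** delta) ** (1.0 / delta)
    return float(n), n * k / k.sum()

delta = 1
C_h = np.array([4.0, 9.0])
n, n_h = linear_cost_allocation(
    pi=[0.4, 0.6], P=[0.6, 0.7], W=[0.3, 0.7], C=C_h, budget=95, delta=delta)

# The allocation spends exactly the variable budget C - c0.
spent = float(np.sum(C_h * n_h ** delta))
```

With these inputs the sketch gives $n$ close to 15.4, with $n_1$ near 8.6 and $n_2$ near 6.7, in line with case (ii) of Table 2 for $(C - c_0) = 95$.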

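Case 2:

If variance is fixed then by using (2.1) and (3.4), we have

$$n = \left[\sum_{h=1}^{L}\left[\left(W_h^2\left\{\pi_h(1 - \pi_h) + \dfrac{P_h(1 - P_h)}{(2P_h - 1)^2}\right\}\right)^{\delta} C_h\right]^{1/(\delta+1)}\right] \times \left[\sum_{h=1}^{L}\left[W_h^2\left\{\pi_h(1 - \pi_h) + \dfrac{P_h(1 - P_h)}{(2P_h - 1)^2}\right\}\Big/ C_h\right]^{1/(\delta+1)}\right]\Big/ V. \tag{3.6}$$

Using (3.6), we get the optimum variance

$$V(\hat{\pi})_{opt} = \left[\sum_{h=1}^{L}\left[\left(W_h^2\left\{\pi_h(1 - \pi_h) + \dfrac{P_h(1 - P_h)}{(2P_h - 1)^2}\right\}\right)^{\delta} C_h\right]^{1/(\delta+1)}\right] \times \left[\sum_{h=1}^{L}\left[W_h^2\left\{\pi_h(1 - \pi_h) + \dfrac{P_h(1 - P_h)}{(2P_h - 1)^2}\right\}\Big/ C_h\right]^{1/(\delta+1)}\right]\Big/ n. \tag{3.7}$$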
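A quick numerical check of the fixed-variance case: the sketch below computes $n$ from (3.6), allocates it with (3.4) and confirms that the variance (2.1) equals the target $V$. It assumes NumPy; the function name is hypothetical and the inputs are the first group of Table 1 with $\delta = 1$ and $V = 1$.

```python
import numpy as np

def linear_cost_fixed_variance(pi, P, W, C, V, delta):
    """Total n from (3.6), stratum sizes from (3.4), achieved variance (2.1)."""
    pi, P, W, C = (np.asarray(a, dtype=float) for a in (pi, P, W, C))
    a = W ** 2 * (pi * (1 - pi) + P * (1 - P) / (2 * P - 1) ** 2)
    e = 1.0 / (delta + 1.0)
    n = np.sum((a ** delta * C) ** e) * np.sum((a / C) ** e) / V   # (3.6)
    k = (a / C) ** e
    n_h = n * k / k.sum()                                          # (3.4)
    achieved = float(np.sum(a / n_h))                              # (2.1)
    return float(n), n_h, achieved

n, n_h, achieved = linear_cost_fixed_variance(
    pi=[0.4, 0.6], P=[0.6, 0.7], W=[0.3, 0.7], C=[4, 9], V=1.0, delta=1)
```

Here $n$ comes out near 2.74, and the achieved variance matches $V$ up to floating-point error, which is what (3.6) is constructed to guarantee.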

Log-linear Cost Function

Define:

$$C = c_0 + \sum_{h=1}^{L} C_h\log n_h. \tag{3.8}$$

By (2.1) and (3.8), we have

$$\psi = \sum_{h=1}^{L}\frac{W_h^2}{n_h}\left\{\pi_h(1 - \pi_h) + \frac{P_h(1 - P_h)}{(2P_h - 1)^2}\right\} + \lambda\left[\sum_{h=1}^{L} C_h\log n_h - (C - c_0)\right]. \tag{3.9}$$

Solving (3.9) for $n_h$, we get

$$n_h = n\,\frac{W_h^2\left\{\pi_h(1 - \pi_h) + \dfrac{P_h(1 - P_h)}{(2P_h - 1)^2}\right\}\Big/ C_h}{\sum_{h=1}^{L} W_h^2\left\{\pi_h(1 - \pi_h) + \dfrac{P_h(1 - P_h)}{(2P_h - 1)^2}\right\}\Big/ C_h}. \tag{3.10}$$

Case 1:

If cost is fixed then by (3.8) and (3.10), we have

$$n = \exp\left[\frac{(C - c_0) - \sum_{h=1}^{L} C_h\log\left(W_h^2\left\{\pi_h(1 - \pi_h) + \dfrac{P_h(1 - P_h)}{(2P_h - 1)^2}\right\}\Big/ C_h\right)}{\sum_{h=1}^{L} C_h}\right] \times \sum_{h=1}^{L} W_h^2\left\{\pi_h(1 - \pi_h) + \frac{P_h(1 - P_h)}{(2P_h - 1)^2}\right\}\Big/ C_h. \tag{3.11}$$

Case 2:

If variance is fixed, then by (2.1) and (3.10), we have

$$n = \left[\sum_{h=1}^{L} C_h\right]\left[\sum_{h=1}^{L} W_h^2\left\{\pi_h(1 - \pi_h) + \frac{P_h(1 - P_h)}{(2P_h - 1)^2}\right\}\Big/ C_h\right]\Big/ V. \tag{3.12}$$

From (3.12), the optimum variance is

$$V(\hat{\pi})_{opt} = \left[\sum_{h=1}^{L} C_h\right]\left[\sum_{h=1}^{L} W_h^2\left\{\pi_h(1 - \pi_h) + \frac{P_h(1 - P_h)}{(2P_h - 1)^2}\right\}\Big/ C_h\right]\Big/ n. \tag{3.13}$$
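The log-linear allocation can be sketched in the same way. The code below is a minimal illustration assuming NumPy; the function name `loglinear_allocation` and the argument `budget` (standing for $C - c_0$) are hypothetical, and the inputs are the first group of Table 1. It writes (3.11) in terms of the shares $n_h/n$ from (3.10), which is algebraically the same expression.

```python
import math
import numpy as np

def loglinear_allocation(pi, P, W, C, budget):
    """Total n from (3.11) and stratum sizes from (3.10); budget = C - c0."""
    pi, P, W, C = (np.asarray(a, dtype=float) for a in (pi, P, W, C))
    a = W ** 2 * (pi * (1 - pi) + P * (1 - P) / (2 * P - 1) ** 2)
    share = (a / C) / np.sum(a / C)           # n_h / n, from (3.10)
    # Solve budget = sum_h C_h * log(n * share_h) for n.
    log_n = (budget - np.sum(C * np.log(share))) / C.sum()
    n = math.exp(log_n)
    return n, n * share

C_h = np.array([4.0, 9.0])
n, n_h = loglinear_allocation(
    pi=[0.4, 0.6], P=[0.6, 0.7], W=[0.3, 0.7], C=C_h, budget=95)

# The allocation spends exactly the variable budget under the log cost.
spent = float(np.sum(C_h * np.log(n_h)))
```

With these inputs $n$ comes out close to 3396, in line with case (iv) of Table 2 for $(C - c_0) = 95$; the much larger sample reflects how slowly a logarithmic cost grows with $n_h$.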

Numerical Illustration

The following data are used to determine the sample size and expected cost. Substituting $\delta = 2$, $1$ and $1/2$ in (3.1) gives the linear cost functions (i), (ii) and (iii); (iv) is the log-linear cost function (3.8):

(i) $C = c_0 + \sum_{h=1}^{L} C_h n_h^2$, (ii) $C = c_0 + \sum_{h=1}^{L} C_h n_h$, (iii) $C = c_0 + \sum_{h=1}^{L} C_h\sqrt{n_h}$ and (iv) $C = c_0 + \sum_{h=1}^{L} C_h\log(n_h)$.

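Table 1: Artificial data for two groups

| Group | Stratum | $\pi_h$ | $P_h$ | $W_h$ | $C_h$ |
|---|---|---|---|---|---|
| 1 | 1 | 0.4 | 0.6 | 0.3 | 4 |
| 1 | 2 | 0.6 | 0.7 | 0.7 | 9 |
| 2 | 1 | 0.2 | 0.6 | 0.3 | 9 |
| 2 | 2 | 0.4 | 0.7 | 0.7 | 4 |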

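Table 2: Fixed cost

Columns 2 to 4 are for $(C - c_0) = 95$ units; columns 5 to 7 are for $(C - c_0) = 100$ units.

| Cases | $n$ | $n_1$ | $n_2$ | $n$ | $n_1$ | $n_2$ |
|---|---|---|---|---|---|---|
| (i) | 6 | 3 | 3 | 6 | 3 | 3 |
| (ii) | 15 | 9 | 7 | 17 | 6 | 11 |
| (iii) | 115 | 67 | 48 | 142 | 45 | 96 |
| (iv) | 3396 | 2120 | 1276 | 6332 | 1549 | 4783 |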

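Table 3: Fixed variance

Columns 2 to 5 are for $V = 1$; columns 6 to 9 are for $V = 0.175$. "E. cost" denotes the expected cost.

| Cases | $n$ | $n_1$ | $n_2$ | E. cost | $n$ | $n_1$ | $n_2$ | E. cost |
|---|---|---|---|---|---|---|---|---|
| (i) | 3 | 2 | 1 | 22 | 15 | 6 | 9 | 662 |
| (ii) | 3 | 2 | 1 | 17 | 16 | 6 | 10 | 90 |
| (iii) | 3 | 2 | 1 | 14 | 16 | 5 | 11 | 34 |
| (iv) | 3 | 2 | 1 | 3 | 3 | 1 | 2 | 2 |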

From Tables 2 and 3, it is observed that the log-linear cost function performs better than the linear cost functions: it allows a larger sample for a fixed cost and gives a lower expected cost for a fixed variance. The expected cost decreases as $\pi_h$, $P_h$ and $W_h$ increase.

Double Stratification

To estimate the proportion of yes answers from the response $y$, it is reasonable to stratify the population on the basis of an auxiliary variable $x$. When such information on $x$ is lacking, or the stratum weights $W_h$ are unknown, we use the double sampling technique (see Cochran, 1977; Rao, 1973).


(ii) The collected samples are arranged into $L$ strata on the basis of $x$. Let $n'_h$ be the number of units falling in stratum $h$, i.e. $n' = \sum_{h=1}^{L} n'_h$.

(iii) A sub-sample of size $n_h\ (= \nu_h n'_h)$, $0 < \nu_h \le 1$, where $\nu_h$ is known for each stratum, is selected with replacement, and the information on $y_h$ in stratum $h$ is collected from respondents using the randomized device to estimate the proportion $\pi$. Also $n'_h/n' = w_h$ is an unbiased estimate of $W_h = N_h/N$.

As $\hat{\pi}$ is an unbiased estimate of $\pi$, its variance, ignoring the finite population correction (fpc), is given by

$$\mathrm{Var}(\hat{\pi}) = \frac{1}{n'}\left[\pi(1 - \pi) + \frac{P(1 - P)}{(2P - 1)^2}\right] + \sum_{h=1}^{L}\frac{W_h}{n'}\left[\pi_h(1 - \pi_h) + \frac{P_h(1 - P_h)}{(2P_h - 1)^2}\right]\left(\frac{1}{\nu_h} - 1\right), \tag{5.1}$$

where $P = \sum_{h=1}^{L} W_h P_h$ and $\pi = \sum_{h=1}^{L} W_h\pi_h$.

Linear Cost Function

Define:

$$C = c_0 + c'(n')^{\delta} + \sum_{h=1}^{L} C_h n_h, \tag{5.2}$$

where $c'$ is the cost of classification per unit, $c_0$ is the fixed cost and $C_h$ is the cost of measuring a unit in stratum $h$.

Since $n_h$ is a random variable, the expected cost is

$$E(C) = C^{*} = c_0 + c'(n')^{\delta} + n'\sum_{h=1}^{L} C_h\nu_h W_h. \tag{5.3}$$

Here we choose $n'$ and $\nu_h$ to minimize $\mathrm{Var}(\hat{\pi}) = V$ (say) for a specified expected cost, or to minimize the expected cost for a specified variance (see Cochran, 1977).

By (5.1) and (5.3), we define a function $\psi$ with a Lagrangian multiplier $\lambda$ as

$$\psi = \frac{1}{n'}\left[\pi(1 - \pi) + \frac{P(1 - P)}{(2P - 1)^2}\right] + \sum_{h=1}^{L}\frac{W_h}{n'}\left[\pi_h(1 - \pi_h) + \frac{P_h(1 - P_h)}{(2P_h - 1)^2}\right]\left(\frac{1}{\nu_h} - 1\right) + \lambda\left[c'(n')^{\delta} + n'\sum_{h=1}^{L} C_h\nu_h W_h - (C^{*} - c_0)\right], \tag{5.4}$$

which can be written as

$$\psi = \frac{\pi_B^2}{n'} + \frac{1}{n'}\sum_{h=1}^{L}\frac{W_h}{\nu_h}\left[\pi_h(1 - \pi_h) + \frac{P_h(1 - P_h)}{(2P_h - 1)^2}\right] + \lambda\left[c'(n')^{\delta} + n'\sum_{h=1}^{L} C_h\nu_h W_h - (C^{*} - c_0)\right], \tag{5.5}$$

where

$$\pi_B^2 = \left[\pi(1 - \pi) + \frac{P(1 - P)}{(2P - 1)^2}\right] - \sum_{h=1}^{L} W_h\left[\pi_h(1 - \pi_h) + \frac{P_h(1 - P_h)}{(2P_h - 1)^2}\right].$$

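Setting the partial derivatives of (5.5) with respect to $n'$ and $\nu_h$ equal to zero and solving, we get

$$n' = \left[\frac{\pi_B^2}{\delta\lambda c'}\right]^{1/(\delta+1)} \tag{5.6}$$

and

$$\nu_h = \frac{\left[\pi_h(1 - \pi_h) + \dfrac{P_h(1 - P_h)}{(2P_h - 1)^2}\right]^{1/2}\left[\delta\lambda c'\right]^{1/(\delta+1)}}{(\pi_B^2)^{1/(\delta+1)}\,(\lambda C_h)^{1/2}}. \tag{5.7}$$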
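The following sketch checks numerically that the pair $(n', \nu_h)$ is a stationary point of the Lagrangian. It is an illustration under stated assumptions, not the authors' computation: it reads (5.5) as $\psi = \pi_B^2/n' + (1/n')\sum_h W_h a_h/\nu_h + \lambda[c'(n')^{\delta} + n'\sum_h C_h\nu_h W_h - (C^{*} - c_0)]$ with $a_h = \pi_h(1-\pi_h) + P_h(1-P_h)/(2P_h-1)^2$, and reads (5.6) and (5.7) as $n' = [\pi_B^2/(\delta\lambda c')]^{1/(\delta+1)}$ and $\nu_h = [a_h/(\lambda C_h)]^{1/2}/n'$. The function names and the values of `pi_B2`, `lam`, `c_prime` and `budget` are hypothetical.

```python
import numpy as np

def double_sampling_solution(a, C, c_prime, pi_B2, lam, delta):
    """First-phase size n' (5.6) and sub-sampling fractions nu_h (5.7)."""
    n_prime = (pi_B2 / (delta * lam * c_prime)) ** (1.0 / (delta + 1.0))
    nu = np.sqrt(a / (lam * C)) / n_prime
    return float(n_prime), nu

def psi(x, a, W, C, c_prime, pi_B2, lam, delta, budget):
    """Lagrangian (5.5) at x = (n', nu_1, ..., nu_L); budget = C* - c0."""
    n_prime, nu = x[0], x[1:]
    variance = (pi_B2 + np.sum(W * a / nu)) / n_prime
    cost = c_prime * n_prime ** delta + n_prime * np.sum(C * nu * W)
    return variance + lam * (cost - budget)

# Stratum inputs follow the first group of Table 1; the rest is hypothetical.
pi = np.array([0.4, 0.6])
P = np.array([0.6, 0.7])
W = np.array([0.3, 0.7])
C = np.array([4.0, 9.0])
a = pi * (1 - pi) + P * (1 - P) / (2 * P - 1) ** 2
c_prime, pi_B2, lam, delta, budget = 0.1, 0.5, 0.01, 1, 40.0

n_prime, nu = double_sampling_solution(a, C, c_prime, pi_B2, lam, delta)

# Central finite differences: the gradient of psi should vanish at the solution.
x0 = np.concatenate(([n_prime], nu))
args = (a, W, C, c_prime, pi_B2, lam, delta, budget)
eps = 1e-6
grad = np.array([(psi(x0 + eps * e, *args) - psi(x0 - eps * e, *args)) / (2 * eps)
                 for e in np.eye(x0.size)])
```

In practice $\lambda$ is not chosen freely: it is determined by the fixed-cost or fixed-variance condition in the two cases that follow, and the resulting $\nu_h$ must lie in $(0, 1]$.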

Case 1:

If cost is fixed, then substituting $n'$ and $\nu_h$ from (5.6) and (5.7) in (5.3), we get

$$A_1(A_2)^{\delta/(\delta+1)} + \sqrt{A_2}\,A_3 - (C^{*} - c_0) = 0, \tag{5.8}$$

where

$$A_1 = \left[\left(\frac{\pi_B^2}{\delta}\right)^{\delta} c'\right]^{1/(\delta+1)}, \qquad A_2 = \frac{1}{\lambda} \qquad \text{and} \qquad A_3 = \sum_{h=1}^{L} W_h\sqrt{C_h}\left[\pi_h(1 - \pi_h) + \frac{P_h(1 - P_h)}{(2P_h - 1)^2}\right]^{1/2}.$$

Case 2:

If variance is fixed, then substituting the values of $n'$ and $\nu_h$ from (5.6) and (5.7) in (5.1), we get

$$V = \delta A_1\lambda^{1/(\delta+1)} + A_3\sqrt{\lambda}. \tag{5.9}$$

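Log-linear Cost Function

Define:

$$C^{*} = c_0 + c'\log n' + n'\sum_{h=1}^{L} C_h\nu_h W_h. \tag{5.10}$$

By (5.1) and (5.10), we have

$$\psi = \frac{1}{n'}\left[\pi(1 - \pi) + \frac{P(1 - P)}{(2P - 1)^2}\right] + \sum_{h=1}^{L}\frac{W_h}{n'}\left[\pi_h(1 - \pi_h) + \frac{P_h(1 - P_h)}{(2P_h - 1)^2}\right]\left(\frac{1}{\nu_h} - 1\right) + \lambda\left[c'\log n' + n'\sum_{h=1}^{L} C_h\nu_h W_h - (C^{*} - c_0)\right]. \tag{5.11}$$

Solving (5.11) for $n'$ and $\nu_h$, we have

$$n' = \pi_B^2 A_2 / c',$$

with $\nu_h$ obtained from $\partial\psi/\partial\nu_h = 0$ as in the linear case.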

Case 1:

If cost is fixed, then substituting $n'$ and $\nu_h$ in (5.10), we get

$$c'\log(\pi_B^2/c') + c'\log(A_2) + \sqrt{A_2}\,A_3 - (C^{*} - c_0) = 0. \tag{5.12}$$

Case 2:

If variance is fixed, then substituting $n'$ and $\nu_h$ in (5.1), we get

$$(c'/A_2) + (A_3/\sqrt{A_2}) - V = 0. \tag{5.13}$$

Conclusion

It is observed that, with a log-linear cost function, one can select more samples for a given fixed cost (Table 2), and the expected cost is much lower for a fixed variance (Table 3). It is therefore generally preferable to use the log-linear cost function for selecting the samples in a stratified randomized response survey.

Acknowledgement:

The first author wishes to acknowledge the facilities provided by the University of Southern Maine, USA, during his post-doctoral research work in 2003.

References

1. Cochran W. G., 1977, Sampling Techniques (3rd ed.), Wiley and Sons: New York.

2. Greenberg B. G., Abul-Ela Abdel-Latif A., Simmons W. R. and Horvitz D. G., 1969, The unrelated question RR model: theoretical framework, J. Amer. Statist. Assoc., 64, 520-539.

3. Hong K., Yum J. and Lee H., 1994, A stratified randomized response technique, Korean J. App. Statist., 7, 141-147.

4. Khare B. B., 1987, Allocation in stratified random sampling in presence of nonresponse, Metrika, XLV, 1-2, 2132-221.

5. Kim J.-M. and Warde W. D., 2003, A stratified Warner's randomized response model, J. Statist. Plann. Infer., 1-12 (to appear).

6. Mangat N. S., 1994, An improved randomized response strategy, J. Roy. Statist. Soc. B, 56(1), 93-95.

7. Mangat N. S. and Singh R., 1990, An alternative randomized response procedure, Biometrika, 77, 439-442.

8. Moors J. J. A., 1971, Optimization of the unrelated question randomized response model, J. Amer. Statist. Assoc., 66, 627-629.

9. Rao J. N. K., 1973, On double sampling for stratification and analytical surveys, Biometrika, 60, 125-133.

10. Singh S., Singh R. and Mangat N. S., 2000, Some alternative strategies to Moors' model in randomized response model, J. Statist. Plann. Infer., 83, 243-255.

11. Warner S. L., 1965, Randomized response: a survey technique for
