SOME SYSTEMATIC SAMPLING STRATEGIES USING MULTIPLE RANDOM STARTS

(1)

Multiple Random Starts

S. Sampath

Department of Statistics

University of Madras, Chennai 600005 India

[email protected]

S. Ammani

Department of Mathematics KRMM College, Chennai 600020 India

[email protected]

Abstract

This paper deals with some linear trend free sampling – estimating strategies. Two different systematic sampling schemes which are multiple random starts analogues of Balanced Systematic Sampling (BSS) due to Sethi (1965) and Modified Systematic Sampling (MSS) of Singh, Jindal, and Garg (1968) have been considered. Further, Yates corrected estimator has been developed for Linear Systematic Sampling (LSS) with multiple random starts. A detailed numerical comparative study has been carried out using appropriate super population models with the help of R package for statistical computing.

Keywords: Multiple random starts, Systematic sampling, Yates corrected estimator, Average variance.

AMS classification:DMS605.

1. Introduction

Consider a finite population ofN units. The values of the population units with respect to the characteristic under study will be denoted by Y₁,Y₂ ,…, YN.

Let n be the sample size and k a positive integer such that n N

k  . A Linear

systematic sample (LSS) is drawn by choosing at random an integer from 1to k, say r. Starting from the rth unit in the population, every th

k unit is selected until a sample of size n is obtained. This procedure is equivalent to dividing the population of N units into k mutually exclusive and exhaustive groups



S1,S2,...,Sk



of n units each and choosing one of them at random where the

units in the th

r group are given by



r r k r n k



S_r  ,  ,..., ( 1) , r1,2,...,k

(2)

probabilities are zero for several pairs of units which makes variance estimation difficult. To overcome this problem, several alternatives are discussed in the literature by many including Leu and Tsui (1996), Suresh Chandra, Sampath and Balasubramani (1992), Sampath and Uthayakumaran (1998), Chang and Huang (2000) and Kuo (2004). Although these new sampling methods solve some limitations of LSS, they loose the operational simplicity of LSS. Preserving the inherent characteristics of LSS, like operational simplicity, Tukey (1950) and later on Gautschi (1957) suggested LSS with multiple random starts as an alternative to LSS with single random start which is explained below.

In LSS with multiple random starts, instead of choosing only one number at random from 1 to k, a simple random sample of size s is drawn from the first k

elements and then every th

k element is chosen. Ifr1,r2,...,rs are the labels of the

s units sampled from the first k units, then the sample contains units with labels



1



, ,...,

, ₁ ₁

1 r k r n k

r    r₂,r₂ k,...,r₂ 



n1



k,…, r_s,r_s k,...,r_s 



n1



k.

It may be noted that ultimately the sample contains a total of ns units.

As competitors of LSS many systematic sampling schemes have been proposed in literature each one having an edge over LSS in some sense. Among them, two linear trend free sampling methods (methods in which the estimated value coincides with the population value) are the Balanced Systematic Sampling due to Sethi (1965) and Modified Systematic Sampling due to Singh, Jindal and Garg (1968). Even though the properties of these two schemes have been widely studied by several researchers, their performances have not been studied so far when multiple random starts are employed. In the following section, we consider two new systematic sampling schemes with multiple random starts, which inherit the characteristics of BSS and MSS. Their behaviour for populations with linear trend is also studied.

2. New Sampling Schemes

(i) Balanced Systematic Sampling with multiple random starts

Under the BSS, the population units are divided into

2 n

groups (assuming the

sample size n is even) of 2k units each and a pair of units equidistant from the end points are selected from each group. This method is explained in detail as follows:

A random number r is selected from1 tok and units with labels r and 2kr1

will be selected from first group and thereafter from the remaining 1 2 n

groups,

(3)

the BSS of size n corresponding to the random start r is given by the units with labels       _ _ _ _ _ _  1 2 ,..., 1 , 0 , 1 ) 1 ( 2 ,

2jk j k r j n

r sr

Now let us consider an analogue of BSS involving multiple random starts. When

a BSS with t random starts is desired, the population is divided into t n

2 groups of

tk

2 units where the labels of the elements of the th

i group,

t

n

i

2

,...,

2

,

1



, namely

i

G are as given in the following description.

Sampled units

1

G 1 2 ... 2tk

2

G (2tk1)



2tk2



... 4tk

i

G 2

 

i1tk1 2

 

i1tk2 ... 2itk

t n G 2 1 1 2

2  

     _ tk t n 2 1 2

2  

     _ tk t n ... tk t n       2 2

We choose randomly t numbers from 1 to tk. Corresponding to every random number drawn we take pairs of units equidistant from the group ends. Note that

there are t n

2 groups and hence a random number contributes t n

units to the

sample. Thus, the above method of selection leads to a sample of sizen.

Under balanced systematic sampling with multiple random starts the sample

mean Yˆ_BSSM _{is defined by} Yˆ_BSSM =

 





           _ t n j t i r tjk tk j

r_i Y _i

Y n 2 1 1 1 2 ) 1 ( 2 1 (1)

Theorem 1 The sample mean Yˆ_BSSM of the Balanced Systematic Sampling with multiple random starts is unbiased for the population mean.

Proof: The sample mean corresponding to random starts r1,r2,...,rt can be

written as YBSSM

ˆ ₍ _, _,_..., ₎ 2 1 r rt

r =

 





           _ t n j t i r tjk tk j

ri Y i

Y n 2 1 1 1 2 ) 1 ( 2 1

Taking expectations on both sides we get,

 

YBSSM

ˆ ₌







  nk i i i U T n 1 1 tk t

where



_ _

    t n j k j t r

i Yi

T 2

1

2 and



    t n j r tjk

i Y i

(4)

=





tk

i i Y nk 1

1

= Y

Therefore, the sample mean is unbiased for the population mean under BSSM.

RemarkIt can be easily shown that,

 

YBSSM

V ˆ = 2

) (t S tk

t tk

, (2)

where,





2

1 2

1 1





 

 tk

i

i Z

Z tk

S ,Z_i 



T_i T₂_tk__i_₁



,i1,2,...,2tk and





 tk

i i Z tk Z

1

The following theorem proves that the sample mean YBSSM

ˆ _{under balanced}

systematic sampling with multiple random starts coincides with the population mean when the population values satisfy the model

N u

u

Yu   , 1,2,..., (3)

Theorem 2 Under the model Y_u  u,u 1,2,...,N , the sample mean Yˆ_BSSM coincides with the population mean.

Proof In the presence of linear trend, the sample mean under balanced systematic sampling with multiple random starts, may be written as



t



BSSM r r r

Yˆ ₁, ₂,..., =

















  

 











 

 

  





... 1

2 1

2 2

1

1 t j k r t j k

r n

t n

j











 



r_t 2t



j1



k

 



+











 

 





t n

j

r tjk 2

1

1 1 2















2tjkr₂ 1

 

+… +



 



2tjkr_t 1





  

=

   

 

   

 

   



 _

          



 _

       

2 1 2 2 2 2

1 2 2 2 2

1 2 2 t

n t n

kt t

n t n

kt n

n n

 

= 

  



 



(5)

=





2 1

  N 

Hence the proof.

(ii) Modified Systematic Sampling with multiple random starts

This method of sampling developed by Singh, Jindal, Garg (1968) is another scheme meant for populations exhibiting linear trend. Under this method, a sample of size n is drawn by selecting a pair of units equidistant from both the ends of the population in a systematic manner. The details are furnished below.

As in the case of linear and balanced systematic sampling here also a random number r is selected from 1 to k. When the sample size n is even, the sample corresponding to the random start r(r 1,2,...,k) is given by the set of units with labels

   



 _ _ _ _ _ _

 1

2 ,..., 1 , 0 , 1

,N r jk j n

jk r sr

We shall describe MSS with multiple random starts as follows. If a MSS with t random starts is desired, we choose at random t numbers from1 to tk. Each

random number will contribute t n

units to the sample, thus leading to an ultimate

sample of size n. The sample corresponding to the random start r_i(i1,2,...,t) is

given by the set of units with labels

   



 _ _ _ _ _ _ _

t n j

k j t r N k j t

ri i

2 ,..., 2 , 1 , 1 ) 1 ( ,

) 1

( .

The sample mean YMSSM

ˆ _{of modified systematic sampling under multiple random}

starts, corresponding to the random starts r1,r2,...,rt, can be written as







       



 t

n

j t

i

k j t r N k j t r

MSSM Y_i Y _i

n Y

2

1 1

1 ) 1 ( )

1 ( 1

ˆ ₍₄₎

Theorem 3 The sample mean Yˆ_MSSM under modified systematic sampling with multiple random starts, for a choice of t random starts r₁,r₂,...,r_t, is unbiased for the population mean.

Proof Corresponding to a choice of t random starts r1,r2,...,rt the sample mean MSSM

Yˆ under modified systematic sampling with multiple random starts can be

written as











       



 t

n

j t

i

k j t r N k j t r t

MSSM Y_i Y _i

n r r r Y

2

1 1

1 ) 1 ( )

1 ( 2

1

1 ,..., ,

ˆ

(6)

 

YMSSM

E ˆ =









  

tk

i

i U

T n 1

1

where,



_{ }

  

  t

n

j

k j t r

i Y_i

T 2

1

1 and UiYNr_itj1k

=





tk

i i Y nk 1

1

= Y Hence the proof.

Theorem 4 When the population units are modelled according to N

u u

Yu   , 1,2,..., , the sample mean YMSSM

ˆ _{under modified systematic}

sampling with multiple random starts coincides with the population mean.

Proof Choose t random numbers r1,r2,...,rt from 1 to tk. The sample mean MSSM

Yˆ in the presence of linear trend can be expressed as



t



MSSM r r r

Yˆ 1, 2,..., =











   

  t

n

j

k j t r n

2

1

1 1

1





+











rt



j1



k



+…+











  r_t t j1k





+











   

t n

j

k j t r N 2

1

1 1 1 +

+



N r₂ t



j1



k1



+…+



N r_t t



j1



k1





  

= n 1

  

 _ _ _

t t n t n tN t

n t t n t

2 2

 

 

= 

  



 



2 1 N 

 (5)

Hence it can be concluded that the sample mean coincides with the population mean under MSSM in the presence of linear trend.

3. Yates Corrected Estimator

(7)

When the th

r group S_r is drawn as sample, the first and last units in the sample are corrected by the weights



₁ and



₂ respectively and the corresponding sample mean is taken as an estimator for the population mean where the weights

1



and



₂ are selected so that the sample mean coincides with the population mean in the presence of linear trend. That is, the corrected mean

   

 

 







    

1

2

) 1 ( 2 ) 1 ( 1

1 n

j

k n r k j r r

c Y Y Y

n

y



is equated to the population mean Y after substituting Y_u  u,

u



1

,

2

,...,

N

and on comparing corresponding coefficients



₁ and



₂ are obtained. We shall now proceed to extend this for LSS with multiple random starts.

For LSS with t random numbers r1,r2,...,rt , we take



t



c

r

Y

ˆ

₁

,

₂

,...,

=   





 

  











s

v n

j r j k

s

v rv v

Y

tn

1

2 1

1 1

1



+ 



 



  

s

v rv n k

Y

1 1

2



_

 

(6)

as an estimator, where the weights



₁ and



₂ are given respectively to the first and last elements of the groups corresponding to the random starts selected. The remaining units are given unit weights. The weights



₁ and



₂ are chosen in such a manner that the estimator coincides with the population mean when the population units are modelled by (3). The following theorem gives the expressions for the weights



₁ and



₂ occurring in expression (6).

Theorem 5 In the presence of linear trend, the weights



₁ and



₂ in the Yates corrected estimator Yc



r,r ,...,rs



ˆ 2

1 defined by (6) are given by

1



=





 



_

  

 



     

k n s

n k s r r

r

n _s

1 2

... 2

2

1 1 2

and



 





_

  

 



     



k n s

r r

r k

ns _s

1 2

... 2

1

1 1 2

2



Proof Under the model Yu  u,u 1,2,...,N, the expression for the corrected estimator with random starts r1,r2,...,rtcan be expressed as





sn r r r

Yˆ_c ₁, ₂,..., _s  1

 



 



1 1





r



 +



 r₂



...



r_s





+





r



j



k



n

j

1 1

1

2

  











+











r₂ 



j1



k



…+









(8)

















r₂  n1k



+…+



 



r_s 



n1



k





   = sn 1   









      

 ₂ n 2 s r₁ r₂ ... r_s

1   













_       2 1 2 1 2 n n sk k n s     

On equating the above expression to the population mean namely,





2 1

  N

 , we get



  









  



    s r r r s n

sn 2 ...

1

2 1 2

1   













    _ _   2 1 2 1 2 n n sk k n s      =





2 1   N 

Comparing the coefficients of



and  we get,

2 1



 = 2 and















 



2 1 2 1 2 1 2 ... 1 2 2 1 2 1     

 _ _ _ _ _ _ _ _ _ n n N

sk k n s n r r r

sn s   

Solving for



₁and



₂, we get



 







             k n s r r r k ns _s 1 2 ... 2 1

1 1 2

2

 (7)

and

2 1 2



  =





 



_             k n s n k s r r r n _s 1 2 ... 2 2

1 1 2 ₍₈₎

It may be noted that the weights do not require any prior knowledge of



&  .

4. Average Variance

(9)

the survey variables are assumed to be the realized values of random variables having a known probability distribution which involves both known and unknown parametric values. In this approach called the ‘super population model approach’ one tries to identify a sampling-estimating strategy that minimizes the average mean square error or the variance where averaging is done with respect to the probability distribution of the underlying variables, namely, Y₁,Y₂,...,Y_N. In order to compare the efficiency of the sampling strategies proposed and discussed in detail in the previous sections we need to find the average variance of the estimators under the super population model suitable for populations with linear trend. The model is described as follows:

n Y Y

Y₁, ₂,..., are random variables modeled by

u

u u e

Y    ,

where E_M

 

e_u 0,V_M

 

e_u 2ugand cov_M



e_u,e_v



0 uv (9)

Under the assumed model,

 



N u

u Y N Y

1

=









 

N

u

u e u

N 1

1 _ _

=





_

  



 _  _





N

u u e N

N N

N 2 1

1

1 _ _

=









 

 N

u u e nk

1

2 1

 

(i) Average Variance under LSS with multiple random starts

Denoting the Yates estimator described in Section 3 by YLSSM

ˆ _{for the case of}

linear systematic sampling with multiple random starts, we express its average variance under the super population model given in (9) as

 





2

ˆ

ˆ _E _E_Y _Y

Y V

E_M _LSSM  _M _LSSM 





  



 

 

     



  



  



  

g t g

g t

k

r t k

r r

k

r r

r n

t t

k

t t

... ...

1

2 1 2 1 1

1 2

1 1

2 2

2

1 2 1 1

 













 

g t

g g

k n r k

n r k n

r₁ 1 ₂ 1 ... 1

2

2         

(10)

+





g





g



_t





 

g



n j k j r k j r k j

r₁ 1 ₂ 1 ... 1

1 2         



  + g N u u N



1

2 2 1 _ - _



  



   g t g g r r r tnN ... 2 2 1 1 2 







g

k n

r1 1

2  









 



  g k n

r₂ 1 …+











g





t n k

r 1









    1 2 1 1 n j g k j

r 



r2 



j1



k



g++…+







        g

t j k

r 1 (9)

(ii) Average Variance under BSS with multiple random starts

The average variance of the estimator Yˆ_BSSM under balanced systematic sampling with multiple random under the super population model given in (9) is



_ˆ





_ˆ



2

Y Y E E Y V

EM BSSM  M BSSM 

=











           

 



          t n j g t tk r t tk r r tk r r k j t r n t

tk _t _t

2 1 1 2 2 1 1 2 1 1 1 2 1

1 2 1 1











2 2 1



...

g k j t

r



r_t 2t



j1



k



g 



 

 

g   



g _

r tjk r

tjk 1 2 1

2 ₁ ₂



2tjk r_t 1



g)

+ g

N

u u N



1

2 2 1 _ -







       t n j g k j t r nN 2 1 1 2 1 2 2











g

k j t

r₂ 2 1 + +



r_t 2t



j1



k

 

g  2tjkr₁ 1



g 





g

r

tjk 1

2  2  



 

          g t r tjk 1 2 (10)

In the particular case of t2, we obtain from (9),

 





           1 if 2 1 0 if ˆ 2 2 g N g Y V E_M _BSSM



(11)

(iii) Average Variance under MSS with multiple random starts

Under modified systematic sampling with multiple random starts the average variance of the estimator Yˆ_MSSM under the super population model given by (9) is given by



MSSM



MV Y

E ˆ =EME



Yˆ_MSSM Y



2



_MSSM





MV Y

E ˆ













  

 

    

    



  



 



  



  

t n

j

g t

tk

r t tk

r r

tk

r r

k j t r n

t tk

t t

2

1 1 1

1 2

1 1

2 2

1 ...

1

1 2 1 1









r₂ t j1k



g ...



r_t t



j1



k



g 



Nr₁t



j1



k1



g 







   



g 

k j t r

N ₂ 1 1 …+





g



t t j k

r

N   1 1

  

 

+





N

u

g u

N 1

2 2

1 _

-nN

2

2 t





g

n

j

k j t

r1 1

2

1

 





+



r₂ t



j1



k



g

+…+



r_t t



j1



k



g+



Nr₁t



j1



k1



g 







   



g 

k j t r

N ₂ 1 1 …+





g



t t j k

r

N   1 1

    

(12)

As in the case of BSSM, we observe from (12) that wheng 0,t 2,

In the particular case of t2, we obtain from (11),

 





      

 

 

1 if 2

1

0 if

ˆ

2 2

g N

g

Y V EM BSSM



(13)

Hence we may conclude that for the case of two random starts, average variances of the sample mean under BSSM and MSSM are both equal when

0



g and g 1.

5. Comparative Study

(12)

perform with the same efficiency when g 0 for all choices of the number of

random starts, their performance cannot be mathematically studied in view of their complex nature for other cases of

g

. Hence a numerical study has been carried out to compare their performances for different choices ofN,n,g,and t.

The results of the study are given in Table 1. The entries in the table give the values of ratios of average of variance of Yates corrected estimator under LSS with multiple random starts to the average variance of sample mean under BSSM as well as MSSM. The contents of the table clearly support the use of BSSM and MSSM instead of Yates corrected estimator under LSSM for all choices of g and

sample sizes.

Table 1: Ratios of the average variance of Yates Corrected Estimator (YLSSM

ˆ _{to the average variances of the estimators}

BSSM

Y

ˆ

and YMSSM

ˆ

g t

 

BSSM M

LSSM M

Y

V

E

Y

V

E

ˆ

 



MSSM



M

LSSM M

Y

V

E

Y

V

E

ˆ

N=120

n=24

N=180

n=36

N=240

n=60

N=120

n=24

N=180

n=36

N=240

n=60 0

2 4.01 4.01 3.00 4.01 4.01 3.00

3 4.00 4.01 3.00 4.00 4.01 3.00

4 4.00 4.00 3.00 4.00 4.01 3.00

1

2 4.01 4.01 3.00 4.01 4.01 3.00

3 3.99 4.00 3.00 4.00 4.01 3.00

4 3.96 3.99 3.00 4.00 4.00 3.00

2

2 4.01 4.01 3.01 4.01 4.01 3.01

3 3.96 3.99 3.00 4.00 4.01 3.01

4 3.87 3.95 3.00 4.00 4.00 3.00

3

2 4.01 4.02 3.01 4.01 4.02 3.01

3 3.92 3.98 3.00 4.01 4.01 3.01

4 3.71 3.88 3.00 4.00 4.00 3.01

6. Conclusion

Thus in this paper the following three sampling-estimating strategies involving multiple random starts have been compared for populations possessing linear trend.

Strategy – 1 Balanced systematic sampling with multiple random starts – Mean estimator

Strategy – 2 Modified systematic sampling with multiple random starts – Mean estimator

(13)

The following are the findings of the study.

(i) The multiple random start systematic sampling schemes, namely BSSM and MSSM estimate the population mean without any error in the presence of linear trend.

(ii) In the case of BSSM and MSSM, the average variance of the estimator coincides for all choices of random starts and sample sizes forg  0.

(iii) The average variances of the estimators under BSSM and MSSM are equal when we use two random starts for all choices of gand sample sizes.

(iv) It was also observed that BSSM and MSSM were more efficient when compared to LSSM for all choices of g and sample sizes.

(v) Regarding the number of random starts to be used, it is recommended to use two random starts instead of more number of random starts since the numerical study indicates that average variances tend to increase as the number of random starts is increased. This pattern was observed for both BSSM and MSSM. While this is observed in the case of BSSM and MSSM, the average variance under LSSM (with Yates corrected estimator) tends to decrease as the number of random starts increase, for the cases g  2 and g 3.

Acknowledgement

The authors wish to thank the referees for their constructive suggestions which have helped in improving the organization of the paper to a great extent.

References

1. C.H. Leu, and K.W. Tsui, (1996). New partially systematic sampling,

Statistica Sinica, 6:617-630.

2. D. Singh, K.K. Jindal, J.N. Garg, (1968). On modified systematic sampling,

Biometrika55: 541-546.

3. F. Yates, (1948). Systematic Sampling, Phil. Trans. Roy. Soc., London, A241: 345-377.

4. H.J. Chang, K.C. Huang, (2000). Remainder linear systematic sampling,

Sankhya B, 62:376-384.

5. J.W. Tukey, (1950). Some sampling simplified, Jour. Amer. Stat. Assoc.,

45:501-519.

6. K.C. Huang, (2004). Mixed random systematic sampling desgins, Metrika,

59:1-11.

7. S. Sampath, and N. Uthayakumaran, (1998). Markov systematic sampling,

(14)

8. K. Suresh Chandra, S. Sampath, G.K. Balasubramani, (1992). Markov sampling for finite populations,Biometrika, 79:210-213.

9. V.K. Sethi, (1965). On optimum pairing of units, Sankhya B, 27: 315-320. 10. W. Gautschi, (1957). Some remarks on systematic sampling, Ann. Math.