Design- and Model-Based Variance Estimation
$$\sum_{k \in s} w_k X_k = \sum_{k=1}^{N} X_k$$
by minimizing the distance function
$$\sum_{k \in s} a_k (w_k - a_k)^2 / Q_k, \quad \text{with } Q_k > 0,$$
subject to the above CE. By the same approach one may also derive $t_{GR}$ as a calibration estimator by modifying $t_{RHC}$.
7.5 SYSTEMATIC SAMPLING
Next we consider variance estimation in systematic sampling, which poses a special problem for unbiased variance estimation. A necessary and sufficient condition for the existence of a p-unbiased estimator of a quadratic form with at least one product term $X_i X_j$ is that the corresponding pair of units $(i, j)$ has a positive inclusion probability $\pi_{ij}$. But systematic sampling is a form of cluster sampling in which the population is divided into a number of disjoint clusters, one of which is selected with a given probability. Thus a pair of units belonging to different clusters has zero probability of appearing together in a sample. Hence the problem of p-unbiased estimation of the variance. Let us turn to it.
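The obstruction can be made concrete with a small enumeration. The sketch below is only illustrative: it assumes the usual linear arrangement in which the cluster with start $r$ holds units $r, r+K, r+2K, \ldots$ (0-based labels), and the function name and the toy sizes $N = 12$, $n = 3$ are hypothetical choices, not taken from the text.

```python
from itertools import combinations

def pairwise_inclusion(N, n):
    """Return {(i, j): pi_ij} for equal-probability linear systematic sampling.

    Assumes N/n is an integer K and that cluster r holds the units
    r, r + K, ..., r + (n - 1) * K (0-based labels).
    """
    if N % n != 0:
        raise ValueError("this sketch assumes N/n is an integer")
    K = N // n
    clusters = [set(range(r, N, K)) for r in range(K)]
    pi = {}
    for i, j in combinations(range(N), 2):
        # Each cluster is selected with probability 1/K, so pi_ij is the
        # share of clusters containing both units.
        hits = sum(1 for c in clusters if i in c and j in c)
        pi[(i, j)] = hits / K
    return pi

pi = pairwise_inclusion(N=12, n=3)
zero_pairs = [pair for pair, p in pi.items() if p == 0.0]
print(f"{len(zero_pairs)} of {len(pi)} pairs have pi_ij = 0")
```

Every pair of units lying in different clusters shows up in `zero_pairs`, which is exactly why no p-unbiased estimator of a quadratic form involving those product terms can exist under this design.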
Let us consider the simplest case of linear systematic sampling with equal probabilities, where in choosing a sample of size $n$ from the population of $N$ units it is supposed that $N/n$ is an integer $K$. Then the population is divided into $K$ mutually exclusive clusters of $n$ units each, and one of them is selected at random, that is, with probability $1/K$. If the $i$th cluster is selected, then one takes $\bar{y}_i$, the mean of the $n$ units of the $i$th cluster, $i = 1, \ldots, K$, as the unbiased estimator for the population mean $\bar{Y}$. Then,
$$V(\bar{y}_i) = \frac{1}{K} \sum_{i=1}^{K} (\bar{y}_i - \bar{Y})^2 = \frac{S^2}{n} \left[ 1 + (n-1)\rho \right],$$
writing
$$S^2 = \frac{1}{nK} \sum_{i=1}^{K} \sum_{j=1}^{n} (Y_{ij} - \bar{Y})^2,$$
$Y_{ij}$ = the value of $y$ for the $j$th member of the $i$th cluster, and
$$\rho = \frac{1}{Kn(n-1)S^2} \sum_{i=1}^{K} \sum_{j \neq j'} (Y_{ij} - \bar{Y})(Y_{ij'} - \bar{Y}).$$
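As a numerical check of the variance identity above, the following sketch computes $V(\bar{y}_i)$ both directly from the $K$ cluster means and through $S^2$ and $\rho$. It assumes that the $i$th cluster consists of units $i, i+K, \ldots$ (0-based); the function name and the toy data are hypothetical.

```python
def systematic_variance(y, n):
    """Return (V_direct, V_formula, rho) for linear systematic sampling.

    y : population values in their given order, with len(y) = N = n * K
    n : sample size (n >= 2); the values must not all be equal
    """
    N = len(y)
    if N % n != 0 or n < 2:
        raise ValueError("this sketch assumes N/n is an integer and n >= 2")
    K = N // n
    # Y[i][j] is the value for the j-th member of the i-th cluster.
    Y = [[y[i + j * K] for j in range(n)] for i in range(K)]
    Ybar = sum(y) / N
    cluster_means = [sum(row) / n for row in Y]

    # Direct form: average squared deviation of the K cluster means.
    V_direct = sum((m - Ybar) ** 2 for m in cluster_means) / K

    # S^2 with divisor nK, and the intraclass correlation rho.
    S2 = sum((v - Ybar) ** 2 for row in Y for v in row) / (n * K)
    cross = sum(
        (row[j] - Ybar) * (row[jp] - Ybar)
        for row in Y
        for j in range(n)
        for jp in range(n)
        if j != jp
    )
    rho = cross / (K * n * (n - 1) * S2)
    V_formula = S2 / n * (1 + (n - 1) * rho)
    return V_direct, V_formula, rho

V_direct, V_formula, rho = systematic_variance(
    [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8], n=3
)
print(V_direct, V_formula, rho)
```

The two variance values agree up to floating-point rounding, which confirms that the expression in $S^2$ and $\rho$ is an algebraic rewriting of the direct form rather than an approximation.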
For the reasons mentioned above, one cannot have a p-unbiased estimator for $V(\bar{y}_i)$ under the sampling scheme described above. However, there are several approaches to bypass this problem.
One procedure is to postulate a model characterizing the nature of the $Y_{ij}$ values when they are arranged in $K$ clusters as described above, and then to work out an estimator $v$ based on the sample such that $E_m(v)$ equals $E_m V(\bar{y}_i)$, which therefore becomes a DM approach (cf. SÄRNDAL, 1981).
Second, the $N$ elements are arranged in order, and a number $r$ is found such that $n/r$ is an integer $m$. Then $Kr = L$ clusters are formed, and an SRSWOR of $r$ clusters is chosen. Each of these $L$ clusters has $m$ units, so a sample of the required size $n = mr$ is realized. This is distinct from the original systematic sampling. To distinguish between the two, they are respectively called single-start and multiple-start systematic sampling schemes. For the latter, one may regard the sample as $r$ different systematic samples, each of size $m$, and the sample mean of each provides an unbiased estimator for the population mean. Denoting them by $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_r$, one may use $\bar{\bar{y}} = \frac{1}{r} \sum_{i=1}^{r} \bar{y}_i$ as an unbiased estimator for $\bar{Y}$ and
$$\frac{1}{r(r-1)} \sum_{i=1}^{r} (\bar{y}_i - \bar{\bar{y}})^2$$
as an unbiased estimator for $V_p(\bar{\bar{y}})$. Two variations of this procedure are (a) to choose by the SRSWOR method 2 or more clusters out of the $K$ original clusters or (b) to divide the chosen cluster into a number of subsamples, and in either case to obtain several unbiased estimators for $\bar{Y}$ and from them get an unbiased estimator of the variance of the pooled mean of these unbiased estimators.
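The multiple-start selection described above might be sketched as follows. This is a minimal illustration rather than a prescribed procedure: it assumes that each of the $L = Kr$ clusters is formed by taking every $L$th unit from its start, and the function name and the `rng` argument are hypothetical.

```python
import random

def multiple_start_sample(y, n, r, rng=None):
    """Draw a multiple-start systematic sample of total size n using r starts.

    Returns (chosen_starts, replicate_means, pooled_mean).
    Assumes N/n = K and n/r = m are integers.
    """
    rng = rng or random.Random()
    N = len(y)
    if N % n != 0 or n % r != 0:
        raise ValueError("this sketch assumes N/n and n/r are integers")
    K = N // n
    m = n // r
    L = K * r  # number of clusters, each holding m units
    # Cluster with start c holds units c, c + L, ..., c + (m - 1) * L.
    clusters = [[y[c + t * L] for t in range(m)] for c in range(L)]
    # SRSWOR of r clusters out of the L clusters.
    chosen = rng.sample(range(L), r)
    replicate_means = [sum(clusters[c]) / m for c in chosen]
    pooled_mean = sum(replicate_means) / r
    return chosen, replicate_means, pooled_mean

chosen, means, pooled = multiple_start_sample(
    list(range(24)), n=6, r=2, rng=random.Random(1)
)
print(chosen, means, pooled)
```

Because the clusters are of equal size, the pooled mean is simply the mean of the $n = mr$ sampled values, and the $r$ replicate means are the quantities from which a variance estimator is then built.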
A third approach is to first choose a systematic sample from the population and supplement it with an additional SRSWOR or another systematic sample from the remainder of the population. A variation of this is given by SINGH and SINGH (1977), who first make a random start out of all the $N$ units arranged in a certain order, select a few successive units, and then follow up by choosing later units at a constant interval in a circular order until the required effective sample size is realized. They call it new systematic sampling, derive certain conditions on its applicability, show that $\pi_{ij} > 0$ for every $i, j$ for this scheme, and hence derive a Yates–Grundy-type variance estimator.
COCHRAN's (1977) standard text gives several estimators following the first, model-based approach. GAUTSCHI (1957), TORNQVIST (1963), and KOOP (1971) applied the second approach. HEILBRON (1978) also gives model-based optimal estimators of Var(systematic sample mean) as the conditional expectations of this variance given a systematic sample, under various models postulated on the observations arranged in certain orders.
ZINGER (1980) and WU (1984) follow the third approach, taking a weighted combination of the unbiased estimators of $\bar{Y}$ based on the two samples and choosing the weights, keeping in mind the twin requirements of resulting efficiency and nonnegativity of the variance estimators. For a review one may refer to BELLHOUSE (1988) and IACHAN (1982).
Finally, we present below a number of estimators for $V(\bar{y}_i)$ based on the single-start simple linear systematic sample, as given by WOLTER (1984).
We first introduce the following notation. For the $i$th ($i = 1, \ldots, K$) systematic sample, supposed to have been chosen and containing $n$ units, let $Y_{ij}$ be the sample values, $j = 1, \ldots, n$. Then,
$$\bar{y}_i = \frac{1}{n} \sum_{j=1}^{n} Y_{ij}.$$
Let further
Then WOLTER (1984) proposed the following estimators for $V(\bar{y}_i)$.
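For a multiple-start systematic sample with $r$ starts, let $\bar{y}_\alpha$ denote the sample mean based on the $\alpha$th replicate and
$$\bar{y} = \frac{1}{r} \sum_{\alpha=1}^{r} \bar{y}_\alpha.$$
Then for $V(\bar{y})$ the estimator is taken as
$$v_7 = \frac{1-f}{r(r-1)} \sum_{\alpha=1}^{r} (\bar{y}_\alpha - \bar{y})^2.$$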
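The behavior of $v_7$ can be checked exactly on a toy population by enumerating every possible choice of $r$ starts. The sketch below rests on two assumptions that this passage does not state: $f$ is taken to be the sampling fraction $n/N$, which equals $r/L$ when the $r$ replicates are an SRSWOR from $L$ equal-sized clusters, and the cluster means used are made-up numbers. The function names are hypothetical.

```python
from itertools import combinations

def v7(replicate_means, f):
    """(1 - f) / (r (r - 1)) times the sum of squared deviations of the means."""
    r = len(replicate_means)
    ybar = sum(replicate_means) / r
    return (1 - f) / (r * (r - 1)) * sum((m - ybar) ** 2 for m in replicate_means)

def compare_with_true_variance(cluster_means, r):
    """Return (true design variance of the pooled mean, design expectation of v7).

    Enumerates all SRSWOR samples of r clusters out of L = len(cluster_means).
    """
    L = len(cluster_means)
    f = r / L  # assumed sampling fraction n/N for equal-sized clusters
    samples = list(combinations(cluster_means, r))
    pooled = [sum(s) / r for s in samples]
    centre = sum(pooled) / len(pooled)
    true_var = sum((p - centre) ** 2 for p in pooled) / len(pooled)
    expected_v7 = sum(v7(s, f) for s in samples) / len(samples)
    return true_var, expected_v7

true_var, expected_v7 = compare_with_true_variance(
    [2.0, 5.0, 3.0, 8.0, 6.0, 4.0], r=2
)
print(true_var, expected_v7)
```

Under these assumptions the two printed values coincide up to rounding, so the factor $1 - f$ is what makes the replicate-based estimator design-unbiased when the starts are drawn without replacement.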
This is also applicable if the $i$th systematic sample is split up into $r$ random subsamples (cf. KOOP, 1971). Writing
$$\hat{\rho}_K = \frac{1}{(n-1)s^2} \sum_{j=2}^{n} (Y_{ij} - \bar{y}_i)(Y_{i,j-1} - \bar{y}_i),$$
another estimator for $V(\bar{y}_i)$ is
$$v_8 = \frac{1}{(n-1)s^2} \sum_{j=2}^{n} (Y_{ij} - \bar{y}_i)(Y_{i,j-1} - \bar{y}_i).$$
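The serial correlation $\hat{\rho}_K$ defined above can be computed from a single systematic sample as sketched below. Since $s^2$ is not defined in this passage, the sketch assumes the usual sample variance with divisor $n - 1$; that choice and the function name are assumptions made for illustration only.

```python
def rho_hat(sample):
    """Lag-one serial correlation of a systematic sample taken in its drawn order.

    Assumes s^2 = sum((Y_ij - ybar_i)^2) / (n - 1), which is not stated in the
    text, and that the sample values are not all equal.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two sample values")
    ybar = sum(sample) / n
    s2 = sum((v - ybar) ** 2 for v in sample) / (n - 1)
    # Sum over j = 2, ..., n of products of successive deviations.
    lagged = sum(
        (sample[j] - ybar) * (sample[j - 1] - ybar) for j in range(1, n)
    )
    return lagged / ((n - 1) * s2)

print(rho_hat([1, 2, 3, 4, 5]))
```

For a steadily increasing sample such as the one shown, successive deviations tend to share the same sign, so the statistic comes out positive.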
WOLTER (1984) examined the relative performance of these estimators by considering $B_m(v) = E_m[E_p(v) - V(\bar{y})]$ and $B_m(v)/E_m V(\bar{y}_i)$ for $v = v_i$, $i = 1, \ldots, 8$, under several models usually postulated in the context of systematic sampling. He also examined how good these estimators are in providing confidence intervals for $\bar{Y}$. His recommendations favor $v_2$ and $v_3$ and, to some extent, $v_8$.
The general varying probability systematic sampling is known as circular systematic sampling (CSS) with probabilities proportional to sizes (PPS). Following MURTHY (1967), we may describe it as follows. Suppose positive integers $X_i$ ($i = 1, \ldots, N$) with a total $X$ are available as size measures, and a sample of $n$ units is required to be drawn from $U = (1, \ldots, N)$. Then a number $K$ is fixed as the integer nearest to $X/n$. A random positive integer $R$ is chosen between 1 and $X$. Then, let
$$a_r = (R + rK) \bmod X, \quad r = 0, \ldots, n-1,$$
and
$$C_0 = 0, \quad C_i = \sum_{j=1}^{i} X_j, \quad i = 1, \ldots, N.$$
Then a CSSPPS sample $s$ is formed of the units $i$ for which
$$C_{i-1} < a_r \le C_i \quad \text{for } r = 0, 1, \ldots, n-1,$$
and the unit $N$ if $a_r = 0$.
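The selection rule just described can be sketched in a few lines. This is an illustrative reading, not a reference implementation: the function name, the optional `R` argument (included so that a draw can be reproduced), and the use of Python's `round` for "the integer nearest to $X/n$" are assumptions. Note that `round` sends exact halves to the even integer, a tie rule the text does not specify.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def css_pps_sample(sizes, n, R=None, rng=None):
    """Circular systematic PPS selection; returns 1-based unit labels.

    sizes : positive integer size measures X_1, ..., X_N
    n     : number of draws
    R     : random start in 1..X; drawn uniformly at random if omitted
    """
    rng = rng or random.Random()
    N = len(sizes)
    X = sum(sizes)
    K = round(X / n)  # integer nearest to X/n (tie rule is an assumption)
    if R is None:
        R = rng.randint(1, X)  # both endpoints included
    C = [0] + list(accumulate(sizes))  # C[0] = 0, C[i] = X_1 + ... + X_i
    sample = []
    for r in range(n):
        a = (R + r * K) % X
        if a == 0:
            unit = N  # the rule for a_r = 0
        else:
            # Smallest i with C[i] >= a, hence C[i-1] < a <= C[i].
            unit = bisect_left(C, a)
        sample.append(unit)
    return sample

print(css_pps_sample([3, 1, 4, 2], n=2, R=4))
```

With sizes $(3, 1, 4, 2)$, $n = 2$ and $R = 4$, we get $K = 5$, $a_0 = 4$ and $a_1 = 9$, which fall in the cumulative intervals of units 2 and 4. The returned list keeps the draws in order and does not remove a unit that is hit more than once.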