SYSTEMATIC SAMPLING - Design- and Model-Based Variance Estimation



$$\sum_{k \in s} w_k X_k = \sum_{k=1}^{N} X_k$$

by minimizing the distance function

$$\sum_{k \in s} \frac{a_k (w_k - a_k)^2}{Q_k}, \quad \text{with } Q_k > 0,$$

subject to the above CE. By the same approach one may derive $t_{GR}$ as a calibration estimator by modifying $t_{RHC}$ as well.

7.5 SYSTEMATIC SAMPLING

Next we consider variance estimation in systematic sampling, where unbiased variance estimation poses a special problem. A necessary and sufficient condition for the existence of a p-unbiased estimator for a quadratic form with at least one product term $X_i X_j$ is that the corresponding pair of units $(i, j)$ has a positive inclusion probability $\pi_{ij}$. But systematic sampling is a cluster sampling in which the population is divided into a number of disjoint clusters, one of which is selected with a given probability. Thus a pair of units belonging to different clusters has zero probability of appearing together in a sample, and hence the problem of p-unbiased estimation of the variance. Let us turn to it.
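The zero joint inclusion probabilities can be seen in a small enumeration. The sketch below is illustrative only: the function name, the 0-based unit labels, and the assumption that cluster $c$ consists of units $c, c+K, c+2K, \ldots$ are not taken from the text.

```python
from itertools import combinations

def joint_inclusion_probabilities(N, n):
    """Return {(i, j): pi_ij} for single-start linear systematic sampling.

    Assumes K = N / n is an integer and that cluster c (c = 0..K-1)
    consists of units c, c + K, c + 2K, ... (0-based labels).
    """
    if N % n != 0:
        raise ValueError("N must be a multiple of n")
    K = N // n
    clusters = [list(range(c, N, K)) for c in range(K)]
    pi = {pair: 0.0 for pair in combinations(range(N), 2)}
    for cluster in clusters:
        for pair in combinations(cluster, 2):
            pi[pair] += 1.0 / K  # each cluster is drawn with probability 1/K
    return pi

pi = joint_inclusion_probabilities(N=6, n=2)
zero_pairs = [pair for pair, p in pi.items() if p == 0.0]
print(len(zero_pairs), "of", len(pi), "pairs never appear together")
```

With $N = 6$ and $n = 2$ only the 3 within-cluster pairs have $\pi_{ij} > 0$; the remaining 12 of the 15 pairs have $\pi_{ij} = 0$, so product terms for those pairs cannot be estimated p-unbiasedly.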

Let us consider the simplest case of linear systematic sampling with equal probabilities, where in choosing a sample of size $n$ from the population of $N$ units it is supposed that $N/n$ is an integer $K$. Then the population is divided into $K$ mutually exclusive clusters of $n$ units each and one of them is selected at random, that is, with probability $1/K$. If the $i$th cluster is selected, then one takes $\bar{y}_i$, the mean of the $n$ units of the $i$th cluster, $i = 1, \ldots, K$, as the unbiased estimator for the population mean $\bar{Y}$. Then,

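$$V(\bar{y}_i) = \frac{1}{K} \sum_{i=1}^{K} (\bar{y}_i - \bar{Y})^2 = \frac{S^2}{n} \left[ 1 + (n-1)\rho \right]$$

writing $S^2 = \frac{1}{nK} \sum_{i=1}^{K} \sum_{j=1}^{n} (Y_{ij} - \bar{Y})^2$, $Y_{ij}$ = the value of $y$ for the $j$th member of the $i$th cluster, and

$$\rho = \frac{1}{K n (n-1) S^2} \sum_{i=1}^{K} \sum_{j \neq j'} (Y_{ij} - \bar{Y})(Y_{ij'} - \bar{Y}).$$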
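The decomposition of $V(\bar{y}_i)$ in terms of $S^2$ and $\rho$ can be checked numerically. The following is a minimal sketch on a made-up population; the function name, the data, and the convention that the $i$th cluster holds units $i, i+K, i+2K, \ldots$ of the listing are assumptions made for illustration.

```python
def systematic_variance_terms(y, n):
    """Return (V, S2, rho) for the K = N/n systematic samples of size n.

    y is the population in listing order; cluster i is y[i::K]
    (an assumed labelling, 0-based).
    """
    N = len(y)
    if n < 2 or N % n != 0:
        raise ValueError("need n >= 2 and N a multiple of n")
    K = N // n
    clusters = [y[i::K] for i in range(K)]      # i-th systematic sample
    Ybar = sum(y) / N
    V = sum((sum(c) / n - Ybar) ** 2 for c in clusters) / K
    S2 = sum((v - Ybar) ** 2 for v in y) / N    # divisor nK, as in the text
    cross = 0.0
    for c in clusters:
        dev = [v - Ybar for v in c]
        # sum over j != j' equals (sum of deviations)^2 minus sum of squares
        cross += sum(dev) ** 2 - sum(d * d for d in dev)
    rho = cross / (K * n * (n - 1) * S2)
    return V, S2, rho

# Hypothetical population of N = 12 values, sampled with n = 4 (K = 3).
y = [3.0, 7.0, 1.0, 8.0, 2.0, 9.0, 4.0, 6.0, 5.0, 10.0, 0.0, 11.0]
V, S2, rho = systematic_variance_terms(y, n=4)
identity_holds = abs(V - S2 / 4 * (1 + 3 * rho)) < 1e-12
print(identity_holds)
```

For this population $\rho$ is negative, so the systematic sample mean has a smaller variance than $S^2/n$.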

For the reasons mentioned above one cannot have a p-unbiased estimator for $V(\bar{y}_i)$ under the sampling scheme described above. However, there are several approaches to bypass this problem.

One procedure is to postulate a model characterizing the nature of the $y_{ij}$ values when they are arranged in $K$ clusters as narrated above and then work out an estimator based on the sample, say $v$, such that $E_m(v)$ equals $E_m V(\bar{y}_i)$, which therefore becomes a DM approach (cf. SÄRNDAL, 1981).

Second, the $N$ elements are arranged in order and a number $r$ is found such that $n/r$ is an integer $m$. Then $Kr = L$ clusters are formed, and an SRSWOR of $r$ clusters is chosen. Each of these $L$ clusters has $m$ units, and so a required sample of size $n = mr$ is realized. This is distinct from the original systematic sampling. To distinguish between the two they are respectively called single-start and multiple-start systematic sampling schemes. For the latter, one may suppose to have drawn $r$ different systematic samples each of size $m$, and the sample mean of each provides an unbiased estimator for the population mean. Denoting them by $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_r$, one may use $\bar{\bar{y}} = \frac{1}{r} \sum_{i=1}^{r} \bar{y}_i$ as an unbiased estimator for $\bar{Y}$ and

$$\frac{1}{r(r-1)} \sum_{i=1}^{r} (\bar{y}_i - \bar{\bar{y}})^2$$

as an unbiased estimator for $V_p(\bar{\bar{y}})$. Two variations of this procedure are (a) to choose by the SRSWOR method 2 or more clusters out of the $K$ original clusters or (b) to divide the chosen cluster into a number of subsamples, and in either case obtain several unbiased estimators for $\bar{Y}$ and from them get an unbiased estimator of the variance of the pooled mean of these unbiased estimators.
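The multiple-start scheme may be sketched as follows. This is only an illustration under stated assumptions: the function name, the seed, the population values, and the rule that the cluster with start $s$ consists of units $s, s+L, s+2L, \ldots$ (0-based) are hypothetical choices, not details given in the text.

```python
import random

def multiple_start_means(y, n, r, rng):
    """Return the r replicate means of a multiple-start systematic sample.

    Assumes N/n = K and n/r = m are integers, so that L = K * r clusters
    of m units each are obtained by taking every L-th unit of the listing.
    """
    N = len(y)
    if N % n != 0 or n % r != 0:
        raise ValueError("need N/n and n/r to be integers")
    L = (N // n) * r                      # number of clusters, L = K r
    starts = rng.sample(range(L), r)      # SRSWOR of r of the L clusters
    return [sum(y[s::L]) / len(y[s::L]) for s in starts]

rng = random.Random(0)                    # seed fixed only for reproducibility
y = [float(v) for v in range(1, 61)]      # hypothetical population, N = 60
rep_means = multiple_start_means(y, n=12, r=3, rng=rng)
pooled_mean = sum(rep_means) / len(rep_means)
```

Here $K = 5$, $m = 4$ and $L = 15$, so each replicate is a systematic sample of 4 units with sampling interval 15, and `pooled_mean` plays the role of $\bar{\bar{y}}$.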

A third approach is to first choose a systematic sample from the population and supplement it with an additional SRSWOR or another systematic sample from the remainder of the population. A variation of this is given by SINGH and SINGH (1977), who first make a random start out of all the $N$ units arranged in a certain order, select a few successive units, and then follow up by choosing later units at a constant interval in a circular order until a required effective sample size is realized. They call it new systematic sampling, derive certain conditions on its applicability, show that $\pi_{ij} > 0$ for every $i, j$ for this scheme, and hence derive a Yates–Grundy-type variance estimator.

COCHRAN's (1977) standard text gives several estimators following the first, model-based approach. GAUTSCHI (1957), TORNQVIST (1963), and KOOP (1971) applied the second approach. HEILBRON (1978) also gives model-based optimal estimators of Var(systematic sample mean) as the conditional expectations of this variance given a systematic sample under various models postulated on the observations arranged in certain orders.

ZINGER (1980) and WU (1984) follow the third approach, taking a weighted combination of the unbiased estimators of $\bar{Y}$ based on the two samples and choosing the weights keeping in mind the twin requirements of resulting efficiency and non-negativity of the variance estimators. For a review one may refer to BELLHOUSE (1988) and IACHAN (1982).

Finally, we present below a number of estimators for $V(\bar{y}_i)$ based on the single-start simple linear systematic sample as given by WOLTER (1984).

We consider first the following notation. For the $i$th ($i = 1, \ldots, K$) systematic sample supposed to have been chosen, containing $n$ units, let $Y_{ij}$ be the sample values, $j = 1, \ldots, n$. Then,

$$\bar{y}_i = \frac{1}{n} \sum_{j=1}^{n} Y_{ij}.$$

Let further

Then WOLTER (1984) proposed the following estimators for $V(\bar{y}_i)$.

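For a multiple-start systematic sample with $r$ starts, let $\bar{y}_\alpha$ denote the sample mean based on the $\alpha$th replicate and

$$\bar{y} = \frac{1}{r} \sum_{\alpha=1}^{r} \bar{y}_\alpha.$$

Then for $V(\bar{y})$ the estimator is taken as

$$v_7 = \frac{1-f}{r(r-1)} \sum_{\alpha=1}^{r} (\bar{y}_\alpha - \bar{y})^2.$$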
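As a rough illustration, $v_7$ can be computed directly from the replicate means. In the sketch below the function name and the numbers are hypothetical, and $f$ is taken to be the sampling fraction $n/N$, which is an assumption because the excerpt uses $f$ without showing its definition.

```python
def v7(rep_means, f):
    """v7 = (1 - f) / (r (r - 1)) * sum over alpha of (ybar_alpha - ybar)^2.

    f is assumed to be the sampling fraction n / N.
    """
    r = len(rep_means)
    if r < 2:
        raise ValueError("at least two replicates are needed")
    ybar = sum(rep_means) / r
    return (1.0 - f) * sum((m - ybar) ** 2 for m in rep_means) / (r * (r - 1))

# Three hypothetical replicate means and a sampling fraction of 0.2.
estimate = v7([10.0, 12.0, 14.0], f=0.2)
```

Setting `f=0` drops the finite population correction and leaves only the between-replicate term.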

This is also applicable if the ith systematic sample is split up into r random subsamples (cf. KOOP, 1971). Writing

$$\hat{\rho}_K = \frac{1}{(n-1)s^2} \sum_{j=2}^{n} (Y_{ij} - \bar{y}_i)(Y_{i,j-1} - \bar{y}_i),$$

another estimator for $V(\bar{y}_i)$ is

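$$v_8 = \begin{cases} (1-f)\,\dfrac{s^2}{n} \left[ 1 + \dfrac{2}{\ln \hat{\rho}_K} + \dfrac{2}{\hat{\rho}_K^{-1} - 1} \right] & \text{if } \hat{\rho}_K > 0, \\[2ex] (1-f)\,\dfrac{s^2}{n} & \text{if } \hat{\rho}_K \le 0. \end{cases}$$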
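The lag-one correlation $\hat{\rho}_K$ defined above can be computed from a single systematic sample kept in its listing order. The sketch below is illustrative: the function name is hypothetical, and it assumes $s^2 = \frac{1}{n-1} \sum_{j=1}^{n} (Y_{ij} - \bar{y}_i)^2$, since the excerpt uses $s^2$ without showing its definition.

```python
def rho_hat(sample):
    """Lag-one correlation of a systematic sample taken in listing order.

    Assumes s^2 = sum_j (Y_ij - ybar_i)^2 / (n - 1); this definition is
    an assumption, not shown in the excerpt.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two sample values")
    ybar = sum(sample) / n
    dev = [v - ybar for v in sample]
    s2 = sum(d * d for d in dev) / (n - 1)
    if s2 == 0.0:
        raise ValueError("rho_hat is undefined for a constant sample")
    lagged = sum(dev[j] * dev[j - 1] for j in range(1, n))
    return lagged / ((n - 1) * s2)

# A hypothetical sample with a linear trend along the listing order.
value = rho_hat([1.0, 2.0, 3.0, 4.0, 5.0])
```

A sample with a steady trend gives a positive value, whereas a sample that alternates around its mean gives a negative one.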

WOLTER (1984) examined the relative performances of these estimators considering $B_m(v) = E_m[E_p(v) - V(\bar{y})]$ and $B_m(v)/E_m V(\bar{y}_i)$ for $v$ as $v_i$, $i = 1, \ldots, 8$, for several models usually postulated in the context of systematic sampling. He also examined how good these are in providing confidence intervals for $\bar{Y}$. His recommendations favor $v_2$ and $v_3$ and, to some extent, $v_8$.

The general varying probability systematic sampling is known as circular systematic sampling (CSS) with probabilities proportional to sizes (PPS). From MURTHY (1967) we may describe it as follows. Suppose positive integers $X_i$ ($i = 1, \ldots, N$) with a total $X$ are available as size measures and a sample of $n$ units is required to be drawn from $U = (1, \ldots, N)$. Then a number $K$ is fixed as the integer nearest to $X/n$. A random positive integer $R$ is chosen between 1 and $X$. Then, let

$$a_r = (R + rK) \bmod X, \quad r = 0, \ldots, n-1,$$

and

$$C_0 = 0, \quad C_i = \sum_{j=1}^{i} X_j, \quad i = 1, \ldots, N.$$

Then a CSSPPS sample $s$ is formed of the units $i$ for which $C_{i-1} < a_r \le C_i$ for $r = 0, 1, \ldots, n-1$, and the unit $N$ if $a_r = 0$.
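The CSSPPS selection rule can be sketched for a given random start. This is a minimal illustration, not a definitive implementation: the function name and the size measures are hypothetical, and rounding ties upward when $X/n$ falls exactly halfway between two integers is an assumption, since the text only says "nearest".

```python
from bisect import bisect_left
from itertools import accumulate

def css_pps_sample(sizes, n, R):
    """Circular systematic PPS selection for a given random start R.

    sizes are positive integers X_1..X_N with total X; R is an integer
    in 1..X. Returns 1-based unit labels, one per a_r (repeats possible).
    """
    X = sum(sizes)
    N = len(sizes)
    if not 1 <= R <= X:
        raise ValueError("R must lie between 1 and X")
    K = (2 * X + n) // (2 * n)        # integer nearest to X/n; ties round up (assumed)
    C = list(accumulate(sizes))       # C_1, ..., C_N; C_0 = 0 is implicit
    units = []
    for r in range(n):
        a = (R + r * K) % X
        if a == 0:
            units.append(N)           # a_r = 0 selects unit N
        else:
            # first i with a <= C_i, which is the i with C_{i-1} < a <= C_i
            units.append(bisect_left(C, a) + 1)
    return units

selected = css_pps_sample([2, 5, 3, 6, 4], n=2, R=7)
print(selected)  # prints [2, 5]
```

In this example $X = 20$, $K = 10$, and the start $R = 7$ gives $a_0 = 7$ and $a_1 = 17$, which fall in the cumulative intervals of units 2 and 5. In practice $R$ would be drawn at random, for instance with `random.randint(1, X)`.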

In document 2. Survey Sampling Theory and Methods (Page 196-200)