Design-Based Estimation and Inference
2. Multistage sampling within selected PSUs results in a single ultimate cluster of observations for that PSU. Variance estimation methods
3.6.3 Replication Methods for Variance Estimation
$$CI(q) = \hat{q} \pm t_{1-\alpha/2,\,df} \cdot \sqrt{\mathrm{var}_{JRR}(\hat{q})}$$

e.g.,

$$0.4737 \pm t_{1-\alpha/2,\,4} \cdot 0.008786 = 0.4737 \pm 2.7764 \cdot (0.008786) = (0.4493,\ 0.4981)$$
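The interval arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch that takes the point estimate, standard error, and critical value directly from the worked example; the variable names are illustrative only.

```python
# Values taken from the worked example in the text.
estimate = 0.4737      # full sample weighted estimate
std_error = 0.008786   # replicated standard error of the estimate
t_crit = 2.7764        # t quantile (0.975) with 4 degrees of freedom

half_width = t_crit * std_error
ci_lower = estimate - half_width
ci_upper = estimate + half_width

print(f"({ci_lower:.4f}, {ci_upper:.4f})")
```

Rounded to four decimals, the limits agree with the interval reported in the example.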
Note that this variance estimation technique requires only the full sample weight and the R sets of replicate weights to perform the appropriate variance estimation. This is what enables survey organizations concerned about protecting respondent confidentiality to produce public-use complex sample survey data sets that do not include stratum or cluster codes (which might be used, in theory, to identify a respondent) and include only the required weight variables for estimation of population parameters and replicated variance estimation. In contrast, stratum and cluster codes would be required to estimate variances using Taylor series linearization.
3.6.3.2 Balanced Repeated Replication
The BRR method of variance estimation is a “half-sample” method that was developed specifically for estimating sampling variances under two-PSU-per-stratum sample designs. The 2005–2006 NHANES, NCS-R, and 2006 HRS data sets used in the example analyses described in this text employ such a sampling error calculation model (see Section 4.3), with two PSUs (clusters) per stratum. The evolution of the BRR method began with the concept of forming replicates by choosing one-half of the sample. For a complex sample design with h = 1, …, H strata and exactly a_h = 2 PSUs per stratum, a half-sample replicate could be formed by choosing one of the two PSUs from each stratum (e.g., in Figure 3.4, choose PSU 1 in Strata 1, 2, and 3 and PSU 2 in Stratum 4). This choice of one PSU per stratum automatically defines a half-sample complement (e.g., PSU 2 in Strata 1, 2, and 3 and PSU 1 in Stratum 4). For H strata and 2 PSUs per stratum, there are 2^H possible different half-samples that could be formed: a total of 2^4 = 16 for the simple sample in Figure 3.4, and 2^42 ≈ 4.398 × 10^12 for the H = 42 strata in the NCS-R design.
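To make the counting concrete, the following sketch enumerates every possible half-sample for a small design and forms the complement of one of them. The function names `half_samples` and `complement` are hypothetical helpers introduced here for illustration, with PSUs labeled 1 and 2 within each stratum.

```python
from itertools import product

def half_samples(num_strata):
    """Return an iterator over all half-samples: one PSU label (1 or 2) per stratum."""
    return product((1, 2), repeat=num_strata)

def complement(half_sample):
    """Swap PSU 1 and PSU 2 in every stratum."""
    return tuple(3 - psu for psu in half_sample)

all_h4 = list(half_samples(4))
print(len(all_h4))                 # number of half-samples for H = 4
print(complement((1, 1, 1, 2)))    # complement of the Figure 3.4 example
print(2 ** 42)                     # count for the H = 42 NCS-R strata
```

Enumeration is feasible only for very small H, which is one reason a structured choice of replicates is needed in practice.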
Since a complex sample design with H strata and 2 PSUs per stratum provides only H degrees of freedom for variance estimation, the formation of R > H half-sample replicates will not yield additional gains in efficiency for a replicated half-sample variance estimate.
BRR variance estimation proceeds using the following five steps.
3.6.3.2.1 BRR Step 1
So what is the optimal procedure for selecting which half-sample replicates to employ in the variance estimation? McCarthy (1969) introduced the concept of balanced repeated replication, in which individual replicates are formed according to a pattern of “+” and “–” symbols found in the rows of a Hadamard matrix. The optimal efficiency of the BRR method for variance estimation based on half-samples is due to its “balancing,” that is, the complete algebraic cancellation of unwanted between-stratum cross-product terms, such as $(y_{h1} - y_{h2}) \cdot (y_{g1} - y_{g2})$, that enter the half-sample variance computation formula. Readers interested in a more complete mathematical development of this idea are referred to Wolter (2007).
Fortunately, contemporary software for survey data analysis makes it easy to apply the BRR variance estimation method. Using only the sampling error stratum and cluster codes (see Section 4.3.1) provided with the survey data set, the software will invoke the correct form of the Hadamard matrix to construct the sample replicates. Figure 3.6 illustrates the 4 × 4 Hadamard matrix that can be used to define BRR replicates for the four-strata sample design in Figure 3.4. Each row of the matrix defines one BRR replicate. For BRR replicate 1, the “+” sign in the columns for Strata 1, 2, and 3 indicates that the first PSU in each of these strata is assigned to the replicate. The “–” sign in the Stratum 4 column indicates that the second PSU is to be included in replicate 1. Likewise, BRR replicate 2 will include PSU 1 from Stratum 1 and PSU 2 from Strata 2, 3, and 4.
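The mapping from matrix rows to retained PSUs can be sketched as follows. The matrix is transcribed from Figure 3.6 with +1 for “+” and -1 for “–”; the helper names are hypothetical. The balance check assumes the usual condition that, for every pair of strata, the products of the signs sum to zero across replicates, which is what makes the between-stratum cross-product terms cancel.

```python
# Rows = BRR replicates, columns = strata (transcribed from Figure 3.6).
HADAMARD_4 = [
    [+1, +1, +1, -1],
    [+1, -1, -1, -1],
    [-1, -1, +1, -1],
    [-1, +1, -1, -1],
]

def psus_for_replicate(row):
    """Map one matrix row to the PSU retained in each stratum (+ -> PSU 1, - -> PSU 2)."""
    return [1 if sign > 0 else 2 for sign in row]

def is_balanced(matrix):
    """Check that the sign columns of every pair of strata have a zero inner product."""
    columns = list(zip(*matrix))
    n = len(columns)
    return all(
        sum(a * b for a, b in zip(columns[h], columns[g])) == 0
        for h in range(n) for g in range(h + 1, n)
    )

for r, row in enumerate(HADAMARD_4, start=1):
    print(f"replicate {r}: retained PSUs by stratum {psus_for_replicate(row)}")
print("balanced:", is_balanced(HADAMARD_4))
```

Survey software performs this step internally; the sketch only mirrors the description in the text for the four-strata example.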
Figure 3.7 illustrates the form of the first BRR replicate for the Figure 3.4 data set.
Apart from the trivial orders 1 and 2, Hadamard matrices are defined only for dimensions that are multiples of four. Whenever the number of primary strata defined for a complex sample design is a multiple of four, exactly H BRR replicates are defined according to the patterns of “+/–” indicators in the rows and columns of the H × H Hadamard matrix. The corresponding BRR variance estimates are said to be fully balanced. If the number of primary strata in the complex sample design is not a multiple of four, the Hadamard matrix of dimension equal to the next multiple of four greater than H is used. For example, a Hadamard matrix of dimension 44 × 44 is used to define the 44 half-sample replicates for the NCS-R, which has H = 42 strata for variance estimation. In such cases, the corresponding BRR variance estimates are said to be partially balanced.
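The rule for the number of replicates can be written as a short calculation. This is a sketch of the rule as stated in the text, with hypothetical function names; individual software packages may apply their own conventions when choosing the matrix dimension.

```python
def brr_replicate_count(num_strata):
    """Smallest multiple of four that is greater than or equal to num_strata."""
    return 4 * ((num_strata + 3) // 4)

def is_fully_balanced(num_strata):
    """Full balance requires the number of strata to be a multiple of four."""
    return num_strata % 4 == 0

for h in (4, 42):
    label = "fully" if is_fully_balanced(h) else "partially"
    print(f"H = {h}: {brr_replicate_count(h)} replicates, {label} balanced")
```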
BRR          Stratum (h)
Replicate    1    2    3    4
1            +    +    +    –
2            +    –    –    –
3            –    –    +    –
4            –    +    –    –
Figure 3.6
Hadamard matrix used to define BRR replicates for an H = 4 strata design.
© 2010 by Taylor and Francis Group, LLC
80 Applied Survey Data Analysis
3.6.3.2.2 BRR Step 2
A new replicate weight is then created for each of the r = 1, …, R BRR half-sample replicates created in Step 1. Replicate weight values for cases in the complement half-sample PSUs are assigned a value of “0” or “missing.” The replicate weight values for the cases in the PSUs retained in the half-sample are formed by multiplying the full sample analysis weights by a factor of 2.
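The weight construction in this step can be sketched as a small function. The function name, the tuple layout, and the toy weights below are all assumptions made for illustration (the weights are not taken from the text); the sign convention follows Step 1, with a positive sign retaining PSU 1 and a negative sign retaining PSU 2.

```python
def brr_replicate_weights(cases, sign_row):
    """Return one replicate weight per case.

    cases: list of (stratum, psu, full_sample_weight), with stratum and psu 1-based.
    sign_row: list of +1/-1 signs, one per stratum, for a single replicate.
    """
    weights = []
    for stratum, psu, weight in cases:
        retained_psu = 1 if sign_row[stratum - 1] > 0 else 2
        # Retained cases get twice the full sample weight; complement cases get zero.
        weights.append(2 * weight if psu == retained_psu else 0.0)
    return weights

# Hypothetical full sample weights for a two-stratum toy design.
cases = [
    (1, 1, 1.0), (1, 1, 2.0), (1, 2, 1.0), (1, 2, 2.0),
    (2, 1, 1.0), (2, 1, 2.0), (2, 2, 2.0), (2, 2, 2.0),
]
sign_row = [+1, -1]  # keep PSU 1 in stratum 1 and PSU 2 in stratum 2
rep_weights = brr_replicate_weights(cases, sign_row)
print(rep_weights)
```

Zero is used here for complement cases; coding them as missing instead would give the same weighted estimates in software that drops missing weights.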
3.6.3.2.3 BRR Step 3
Following the same procedure outlined for JRR and using the replicate weights developed for each BRR replicate sample, the weighted replicate estimates of the population statistic are computed. The full sample estimate of the population statistic is also computed:
Stratum   PSU (Cluster)   Case   y_i    w_i,rep
1         1               1      .58    1 × 2
1         1               2      .48    2 × 2
1         2               1      .      .
1         2               2      .      .
2         1               1      .39    1 × 2
2         1               2      .46    2 × 2
2         2               1      .      .
2         2               2      .      .
3         1               1      .39    1 × 2
3         1               2      .47    2 × 2
3         2               1      .      .
3         2               2      .      .
4         1               1      .      .
4         1               2      .      .
4         2               1      .47    2 × 2
4         2               2      .50    2 × 2
Figure 3.7
Illustration of BRR replicate 1 for the example data set.
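As a check on Figure 3.7, the replicate 1 estimate can be computed directly from the retained cases. This sketch assumes the statistic of interest is a weighted mean and reads each replicate weight entry as the full sample weight multiplied by 2; the variable names are illustrative.

```python
# (y_i, replicate weight) pairs for the retained cases in Figure 3.7.
# Cases in the complement PSUs have zero replicate weight and drop out.
replicate_1 = [
    (0.58, 1 * 2), (0.48, 2 * 2),   # stratum 1, PSU 1
    (0.39, 1 * 2), (0.46, 2 * 2),   # stratum 2, PSU 1
    (0.39, 1 * 2), (0.47, 2 * 2),   # stratum 3, PSU 1
    (0.47, 2 * 2), (0.50, 2 * 2),   # stratum 4, PSU 2
]

weighted_total = sum(y * w for y, w in replicate_1)
weight_sum = sum(w for _, w in replicate_1)
q_hat_1 = weighted_total / weight_sum

print(round(q_hat_1, 4))
```

The same calculation, repeated with each replicate weight in turn, yields the full set of replicate estimates needed for the variance computation.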
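For a weighted mean, the replicate and full sample estimates take the form

$$\hat{q}_r = \bar{y}_r = \frac{\sum_i w_{i,rep}\, y_i}{\sum_i w_{i,rep}}, \qquad \hat{q} = \bar{y}_w = \frac{\sum_i w_i\, y_i}{\sum_i w_i}$$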
3.6.3.2.4 BRR Step 4
The BRR estimate of sampling variance of the sample estimate is computed using one of several simple formulas. Here we illustrate the computation using one of the more common half-sample variance estimation formulae:
$$\mathrm{var}_{BRR}(\hat{q}) = \mathrm{var}_{BRR}(\bar{y}_w) = \frac{1}{R} \sum_{r=1}^{R} (\hat{q}_r - \hat{q})^2$$
Several software packages such as WesVar PC permit users to choose alternative half-sample variance estimation formulae including a method proposed by Fay (Judkins, 1990). Interested users are referred to Rust (1985) or Wolter (2007) for more information on alternative half-sample variance estimation formulae.
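A minimal sketch of the variance computation follows, assuming the common half-sample form in which the squared deviations of the R replicate estimates from the full sample estimate are averaged. The function name and the numeric estimates are hypothetical and chosen only to keep the arithmetic easy to follow; alternatives such as Fay's method are not covered.

```python
import math

def brr_variance(replicate_estimates, full_sample_estimate):
    """Mean squared deviation of the replicate estimates from the full sample estimate."""
    num_reps = len(replicate_estimates)
    squared_devs = ((q_r - full_sample_estimate) ** 2 for q_r in replicate_estimates)
    return sum(squared_devs) / num_reps

# Hypothetical estimates (not from the text).
q_hat = 0.50
q_hat_reps = [0.40, 0.60, 0.50, 0.50]

variance = brr_variance(q_hat_reps, q_hat)
std_error = math.sqrt(variance)
print(variance, std_error)
```

The square root of the result is the standard error that enters the confidence interval in Step 5.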
3.6.3.2.5 BRR Step 5
A 100(1 – α)% confidence interval for the population parameter is then constructed (recall that in the case of BRR, df = H):

$$CI(q) = \hat{q} \pm t_{1-\alpha/2,\,df} \cdot \sqrt{\mathrm{var}_{BRR}(\hat{q})}$$