Preparation for Complex Sample Survey Data Analysis
4.3 Understanding and Checking the Sampling Error Calculation Model
4.3.2 Building the NCS-R Sampling Error Calculation Model
We now consider the sample design for the National Comorbidity Survey Replication (NCS-R) as an illustration of the primary concepts and procedures for constructing a sampling error calculation model for a complex sample survey data set. Table 4.2 presents a side-by-side comparison of the features of the original NCS-R complex sample design and the procedures employed to create the corresponding sampling error calculation model. Interested readers can refer to Kessler et al. (2004) for more details on the original sample design for the NCS-R.

* The SDA system is available on the Web site of the University of Michigan Inter-University Consortium for Political and Social Research (ICPSR) and is produced by the Computer-Assisted Survey Methods Program at the University of California–Berkeley. Visit http://www.icpsr.umich.edu for more details.
Note from Table 4.2 that in the NCS-R sampling error calculation model the assumption of with-replacement sampling of ultimate clusters described in Chapter 3 is employed to address the analytic complexities associated with the multistage sampling and the without-replacement selection of NCS-R PSUs. The assumption that ultimate clusters are selected with replacement within primary stage strata ignores the finite population correction factor (see Section 2.4.2) and therefore results in a slight overestimation of true sampling variances for survey estimates.
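The effect of dropping the finite population correction can be sketched with a small calculation. The Python example below is a minimal illustration, not the NCS-R procedure itself: it applies the usual with-replacement ultimate cluster formula for an estimated total, and optionally multiplies each stratum's contribution by (1 - f_h). The function name, the cluster totals, and the sampling fractions are all hypothetical, and the single-stage correction is a simplifying assumption, since the exact multistage without-replacement variance is more involved.

```python
def wr_variance_of_total(cluster_totals, sampling_fractions=None):
    """Ultimate cluster variance estimate for an estimated total.

    cluster_totals: dict mapping a sampling error stratum code to the list
        of weighted totals for its sampling error clusters (hypothetical data).
    sampling_fractions: optional dict of first-stage sampling fractions f_h;
        when given, each stratum term is multiplied by (1 - f_h).
    """
    variance = 0.0
    for stratum, totals in cluster_totals.items():
        a = len(totals)
        if a < 2:
            raise ValueError(f"stratum {stratum} needs at least two clusters")
        mean = sum(totals) / a
        sum_sq = sum((z - mean) ** 2 for z in totals)
        fpc = 1.0
        if sampling_fractions is not None:
            fpc = 1.0 - sampling_fractions[stratum]
        variance += fpc * a / (a - 1) * sum_sq
    return variance


# Hypothetical weighted cluster totals for two sampling error strata.
example_totals = {1: [120.0, 100.0], 2: [80.0, 90.0]}
# Hypothetical first-stage sampling fractions.
example_fractions = {1: 0.10, 2: 0.05}

v_swr = wr_variance_of_total(example_totals)                     # 500.0
v_fpc = wr_variance_of_total(example_totals, example_fractions)  # smaller
print(v_swr, v_fpc)
```

Because every (1 - f_h) factor lies between 0 and 1, the with-replacement value can never fall below the corrected one, which is the sense in which ignoring the correction is conservative.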
Table 4.2
Original Sample Design and Associated Sampling Error Calculation Model for the NCS-R

Original sample design: The sample is selected in multiple stages.
Sampling error calculation model: The concept of ultimate clusters is employed (see Chapter 3). Under the assumption that PSUs (ultimate clusters) are sampled with replacement, only PSU-level statistics (totals, means) are needed to compute estimates of sampling variance.

Original sample design: Primary stage units (PSUs), secondary stage units (SSUs), and third stage units are selected without replacement (WOR).
Sampling error calculation model: The ultimate clusters are assumed to be sampled with replacement (SWR) at the primary stage. Finite population corrections are ignored, and simpler SWR variance formulas may be used for variance estimation.

Original sample design: Sixteen of the primary stage strata are self-representing (SR) and contain a single PSU. True sampling begins with the selection of SSUs within the SR PSU.
Sampling error calculation model: Random groups of PSUs are formed for sampling error calculation. Each SR PSU becomes a sampling error stratum. Within the SR stratum, SSUs are randomly assigned to a pair of sampling error clusters.

Original sample design: A total of 46 of the primary stage strata are nonself-representing (NSR). A single PSU is selected from each NSR stratum.
Sampling error calculation model: Collapsed strata are formed for sampling error calculation. Two similar NSR design strata (e.g., Strata A and B) are collapsed to form one sampling error computation stratum. The Stratum A PSU is the first sampling error cluster in the stratum, and the Stratum B PSU forms the second sampling error cluster.

© 2010 by Taylor and Francis Group, LLC
Applied Survey Data Analysis

To introduce several other features of the NCS-R sampling error calculation model, Table 4.3 illustrates the assignment of the sampling error stratum and cluster codes for six of the NCS-R sample design strata. In self-representing (SR) design strata 15 and 16, the “area segments” constitute the first actual stage of noncertainty sample selection; hence, they are the ultimate cluster units within these two strata. To build the sampling error calculation model within each of these two SR design strata, the random groups method is used to assign the area segment units to two sampling error clusters. This is done to simplify the calculations required for variance estimation. As illustrated, NCS-R nonself-representing (NSR) strata 17–20 each contain a single PSU selection. The single PSU selected from each of these NSR strata constitutes an ultimate cluster selection. Because a minimum of two sampling error clusters per stratum is required for variance estimation, pairs of NSR design strata (e.g., 17 and 18, 19 and 20) are collapsed to create single sampling error strata with two sampling error computation units (PSUs) each.
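The two coding rules just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the alternating split of segments mirrors the pattern shown in the Table 4.3 illustration (a random shuffle of the segments before splitting would serve the same purpose), and the NSR design strata are assumed to arrive already ordered so that neighboring strata are similar.

```python
def sr_codes(design_stratum, segment_ids):
    """Split the area segments of one SR design stratum into two
    sampling error clusters by alternating through the sorted segments.

    Returns a dict: segment id -> (sampling error stratum, cluster).
    """
    return {
        seg: (design_stratum, 1 if position % 2 == 0 else 2)
        for position, seg in enumerate(sorted(segment_ids))
    }


def nsr_codes(design_strata_psus, first_se_stratum):
    """Collapse consecutive pairs of NSR design strata.

    design_strata_psus: list of (design stratum, PSU id) tuples, one PSU
        per design stratum, ordered so that neighbors are similar.
    first_se_stratum: code given to the first collapsed stratum.
    Returns a dict: PSU id -> (sampling error stratum, cluster).
    """
    if len(design_strata_psus) % 2 != 0:
        raise ValueError("an even number of NSR design strata is required")
    codes = {}
    for start in range(0, len(design_strata_psus), 2):
        se_stratum = first_se_stratum + start // 2
        pair = design_strata_psus[start:start + 2]
        for cluster, (_, psu) in enumerate(pair, start=1):
            codes[psu] = (se_stratum, cluster)
    return codes


sr_15 = sr_codes(15, range(1, 13))
nsr = nsr_codes([(17, 1701), (18, 1801), (19, 1901), (20, 2001)], 17)
print(sorted(seg for seg, code in sr_15.items() if code == (15, 1)))
print(nsr)
```

With these inputs, segments 1, 3, 5, 7, 9, and 11 of stratum 15 fall in sampling error cluster 1, and PSUs 1701 and 1801 become clusters 1 and 2 of sampling error stratum 17, matching the code assignments illustrated in Table 4.3.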
Randomly grouping PSUs to form a sampling error cluster does not bias the estimates of standard errors that will be computed under the sampling error calculation model. However, forming random clusters by combining units does forfeit degrees of freedom, so the “variance of the variance estimate” may increase slightly.
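The loss of degrees of freedom can be made concrete with a short sketch. It assumes the common rule of thumb that the design degrees of freedom equal the number of sampling error clusters minus the number of sampling error strata; the function name and the single-stratum example are hypothetical.

```python
def design_degrees_of_freedom(codes):
    """Approximate design degrees of freedom as
    (number of distinct clusters) - (number of distinct strata).

    codes: iterable of (sampling error stratum, sampling error cluster).
    """
    unique_clusters = set(codes)
    strata = {stratum for stratum, _ in unique_clusters}
    return len(unique_clusters) - len(strata)


# One SR stratum with 12 area segments treated as 12 separate clusters...
ungrouped = [(15, segment) for segment in range(1, 13)]
# ...versus the same segments grouped into two sampling error clusters.
grouped = [(15, 1 if segment % 2 == 1 else 2) for segment in range(1, 13)]

df_ungrouped = design_degrees_of_freedom(ungrouped)  # 11
df_grouped = design_degrees_of_freedom(grouped)      # 1
print(df_ungrouped, df_grouped)
```

Under this rule the stratum contributes 11 degrees of freedom when each segment is its own cluster but only 1 after grouping, which is why the variance estimate itself becomes less stable.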
Table 4.3
Illustration of NCS-R Sampling Error Code Assignments

       Original Sample Design                 Sampling Error Calculation Model
       Stratum   PSU(a)                       Stratum   Cluster
SR     15        1 2 3 4 5 6 7 8 9 10 11 12   15        1 = {1, 3, 5, 7, 9, 11}
                                                        2 = {2, 4, 6, 8, 10, 12}
       16        1 2 3 4 5 6 7 8 9 10 11 12   16        1 = {1, 3, 5, 7, 9, 11}
                                                        2 = {2, 4, 6, 8, 10, 12}
       . . .
NSR    17        1701                         17        1 = 1701
       18        1801                                   2 = 1801
       19        1901                         18        1 = 1901
       20        2001                                   2 = 2001

(a) Recall from Section 2.8 and Table 4.2 that in self-representing (SR) strata, sampling begins with the selection of the smaller area segment units. Hence, in the NCS-R, the sampled units (coded 1–12) in each SR stratum (serving as its own PSU) are actually secondary sampling units. We include them in the PSU column because this was the first stage of noncertainty sampling in the SR strata.
Unlike the SR PSUs, which serve as both strata and PSUs (each SR stratum is a PSU, that is, they are one and the same), the NSR strata can include multiple PSUs. In the NCS-R, one PSU was randomly selected from each NSR stratum (e.g., PSU 1701). NSR strata were then collapsed to form sampling error strata with two PSUs each to facilitate variance estimation.

If the collapsed stratum technique is used to develop the sampling error calculation model, slight overestimation of standard errors occurs because the collapsed strata ignore the true differences in the design strata that are collapsed to form the single sampling error calculation stratum. The following section provides interested readers with more detail on the combined strata, random groups, and collapsed strata techniques used in building sampling error calculation models for complex sample survey data.
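A small enumeration shows where the overestimation comes from. The sketch below is a toy example, not NCS-R data: each of two hypothetical design strata has two equally likely candidate values for its single estimated PSU total, the selections are assumed independent, and the two selected PSUs are treated as the two clusters of one collapsed sampling error stratum.

```python
from itertools import product
from statistics import mean, pvariance


def collapsed_pair_estimate(z_a, z_b):
    """With-replacement variance formula for a collapsed stratum with
    two clusters: 2 * sum of squared deviations from the pair mean,
    which simplifies to (z_a - z_b) ** 2."""
    z_bar = (z_a + z_b) / 2
    return 2 * ((z_a - z_bar) ** 2 + (z_b - z_bar) ** 2)


# Hypothetical candidate estimated totals, one PSU drawn per design stratum.
candidates_a = [20.0, 28.0]
candidates_b = [40.0, 56.0]

# True variance of the estimated total: the two strata vary independently.
true_variance = pvariance(candidates_a) + pvariance(candidates_b)

# Average of the collapsed stratum estimate over all equally likely samples.
expected_estimate = mean(
    [collapsed_pair_estimate(a, b) for a, b in product(candidates_a, candidates_b)]
)

bias = expected_estimate - true_variance
stratum_gap_squared = (mean(candidates_a) - mean(candidates_b)) ** 2
print(true_variance, expected_estimate, bias, stratum_gap_squared)
```

In this toy setting the excess of the expected estimate over the true variance equals the squared difference between the two stratum expectations, so collapsing strata that are truly similar keeps the overestimation small.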
4.3.3 Combining Strata, Randomly Grouping