Preparation for Complex Sample Survey Data Analysis
4.2 Analysis Weights: Review by the Data User
Experience as survey statisticians and consultants has taught us that many survey data analysts struggle with the correct use of sampling weights in survey estimation and inference. For whatever reason, many otherwise sophisticated data users wish to place a “black box” around the process of weight development and the application of weights in their analysis.
As described in Section 2.7, the final analysis weights provided in survey data sets are generally the product of a sample selection weight, $w_{sel}$, a nonresponse adjustment factor, $w_{nr}$, and a poststratification factor, $w_{ps}$: $w_{final,i} = w_{sel,i} \cdot w_{nr,i} \cdot w_{ps,i}$. For the reasons just outlined, the data producer is responsible for developing individual weights for each sample case and linking the final analysis weight variable to each observational unit in the survey data file.
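As a purely hypothetical illustration of this product (the three factor values below are invented for the example and are not taken from any of the surveys discussed here), consider a case selected with probability 1/1,000, a nonresponse adjustment of 1.25, and a poststratification factor of 0.96:

```latex
w_{final,i} = w_{sel,i} \cdot w_{nr,i} \cdot w_{ps,i}
            = 1{,}000 \times 1.25 \times 0.96
            = 1{,}200
```

Under these assumed values, the case would represent about 1,200 members of the target population.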
The analysis weight assigned to each respondent case is a measure of the number of population members represented by that sample case or, alternatively, the relative share of the population that the case represents. When weights are applied in the statistical analysis of survey data, weighted calculations simply expand each sample case’s contribution to reflect its representative share of the target population. Because the process of weight development has been discussed extensively in Section 2.7, the aim of this section is to remove any remaining mystique surrounding the analytic weights that are provided with a survey data set.
Although data analysts will not typically be responsible for the actual weight calculations, they should familiarize themselves with the analysis weight variables and how weighted analysis may influence estimation and inference from their survey data. Key steps in this process of verification and familiarization include the following:
• Verifying the variable name for the appropriate weight for the intended analysis.
• Checking and reviewing the scaling and general distribution of the weights.
• Evaluating the impact of the weights on key survey statistics.
The subsequent sections describe these three activities in more detail.
4.2.1 Identification of the Correct Weight Variables for the Analysis
The data user will need to refer to the survey documentation (technical report, codebook) to identify the correct variable name for the analysis weight. Unfortunately, there are no standard naming conventions for weight variables, and we recommend great caution in this step as a result. A number of years ago, a student mistakenly chose a variable labeled WEIGHT and produced a wonderful paper based on the NHANES data in which each respondent’s data was weighted by his or her body weight in kilograms. The correct analysis weight variable in the student’s data file was stored under a different, less obvious variable label.
Depending on the variables to be analyzed, there may be more than one weight variable provided with the survey data set. The 2006 Health and Retirement Study (HRS) data set includes one weight variable (KWGTHH) for the analysis of financial unit (single adult or couple) variables (e.g., home value or total net worth) and a separate weight variable (KWGTR) for individual-level analysis of variables (e.g., health status or earnings from a job). The 2005–2006 NHANES documentation instructs analysts to use the weight variable WTINT2YR for analyses of the medical history interview variables and another weight variable (WTMEC2YR) for analyses of data collected from the medical examination phase of the study. The larger sample of the National Comorbidity Survey Replication (NCS-R) Part I mental health screening data (n = 9,282) is to be analyzed using one weight variable (NCSRWTSH), while another weight variable (NCSRWTLG) is the correct weight for analyses involving variables measured for only the in-depth Part II subsample (n = 5,692).
Some public-use data sets may contain a large set of weight variables known as replicate weights. For example, the 1999–2000 NHANES public-use data file includes the replicate weight variables WTMREP01, WTMREP02, …, WTMREP52. As mentioned in Chapter 3, replicate weights are used in combination with software that employs a replicated method of variance estimation, such as balanced repeated replication (BRR) or jackknife repeated replication (JRR). When a public-use data set includes replicate weights, design variables for variance estimation (stratum and cluster codes) will generally not be included (see Section 4.3), and the survey analyst needs to use the program syntax to specify the replicated variance estimation approach (e.g., BRR, JRR, BRR-Fay) and identify the sequence of variables that contain the replicate weight values (see Appendix A for more details on these software options).
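The following is a minimal Stata sketch of how replicate weights might be declared. It runs on synthetic data, and every variable name in it (y, wt, rw1 through rw4) is hypothetical; the four crude half-sample replicate weights are constructed only so that the syntax can be executed and do not represent a properly balanced BRR design. With a real public-use file, the weight names and the replication method must be taken from the survey documentation.

```stata
* Synthetic stand-in for a survey file; all names below are hypothetical
clear
set seed 20102
set obs 200
generate double y  = rnormal(50, 10)
generate double wt = 100

* Crude half-sample replicate weights, for syntax illustration only
generate double rw1 = wt * 2 * (mod(_n, 2) == 1)
generate double rw2 = wt * 2 * (mod(_n, 2) == 0)
generate double rw3 = wt * 2 * (mod(ceil(_n / 2), 2) == 1)
generate double rw4 = wt * 2 * (mod(ceil(_n / 2), 2) == 0)

* Declare the full-sample weight, the replicate weights, and the method
svyset [pweight = wt], brrweight(rw1-rw4) vce(brr)

* Point estimate uses wt; the standard error uses the replicate weights
svy: mean y
```

For jackknife replicate weights, the corresponding svyset options would be jkrweight() together with vce(jackknife), and Fay's adjustment to BRR is requested with the fay() option.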
© 2010 by Taylor and Francis Group, LLC
94 Applied Survey Data Analysis
4.2.2 Determining the Distribution and Scaling of the Weight Variables
In everyday practice, it is always surprising to learn that an analyst who is struggling with the weighted analysis of survey data has never actually looked at the distribution of the weight variable. This is a critical step in preparing for analysis. Assessing a simple univariate distribution of the analysis weight variable provides information on (1) the scaling of the weights; (2) the variability and skew in the distribution of weights across sample cases; (3) extreme weight values; and (4) (possibly) missing data on the analysis weight.
Scaling of the weights is important for interpreting estimates of totals and in older versions of software may affect variance estimates. The variance and distribution of the weights may influence the precision loss for sample estimates (see Section 2.7). Extreme weights, especially when combined with outlier values for variables of interest, may produce instability in estimates and standard errors for complete sample or subclass estimates. Missing data or zero (0) values on weight variables may indicate an error in building the data set or a special feature of the data set. For example, 2005–2006 NHANES cases that completed the medical history interview but did not participate in the mobile examination center (MEC) phase of the study will have a positive, nonzero weight value for WTINT2YR but will have a zero value for WTMEC2YR (see Table 4.1).
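The four checks listed above can be sketched in Stata as follows. This is a minimal illustration on synthetic data: the weight variable wt is hypothetical, and the zero and missing values are planted deliberately so that the last two commands have something to find. In practice, the weight variable named in the survey documentation would be used in its place.

```stata
* Synthetic stand-in for a survey file; wt is a hypothetical weight variable
clear
set seed 20100
set obs 1000
generate double wt = exp(rnormal(8, 0.7))

* Plant some zero and missing weights for the illustration
replace wt = 0 in 1/25
replace wt = . in 26/30

* Checks (1)-(3): scale, spread, skew, and extreme values
summarize wt, detail
display "Sum of weights: " %15.0fc r(sum)

* Check (4): cases with missing or zero weights
count if missing(wt)
count if wt == 0
```

A sum of weights close to the sample size points to normalized weights, while a sum in the millions points to population scale weights.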
Table 4.1
Descriptive Statistics for the Sampling Weights in the Data Sets Analyzed in This Book
          NCS-R                   2005–2006 NHANES(c)           2006 HRS
          NCSRWTLG    NCSRWTSH    WTMEC2YR       WTINT2YR       KWGTR         KWGTHH
n         5,692       9,282       5,563          5,563          18,467        18,467
Sum       5,692       9,282       217,700,496    217,761,911    75,540,674    82,249,285
Mean      1.00(a)     1.00(a)     39,133.65      39,144.69      4,144.73      4,453.85
SD        0.96        0.52        31,965.69      30,461.53      2,973.48      3,002.06
Min       0.11        0.17        0(b)           1,339.05       0(b)          0(b)
Max       10.10       7.14        156,152.20     152,162.40     16,532        15,691
Pctls.
1%        0.24        0.36        0              2,922.37       0             0
5%        0.32        0.49        2,939.33       4,981.73       0             1,029
25%       0.46        0.69        14,461.86      16,485.70      2,085         2,287
50%       0.64        0.87        27,825.71      28,040.22      3,575         3,755
75%       1.08        1.16        63,171.48      62,731.71      5,075         5,419
95%       2.95        1.85        100,391.70     96,707.20      10,226        10,847
99%       4.71        3.17        116,640.90     113,196.20     12,951        14,126
(a) Suggests that the sampling weights have been normalized to sum to the sample size.
(b) Cases with weights of zero will be dropped from analyses and usually correspond to individuals who were not eligible to be in a particular sample.
(c) 2005–2006 NHANES adults.
Table 4.1 provides simple distributional summaries of the analysis weight variables for the NCS-R, 2006 HRS, and 2005–2006 NHANES data sets.
Inspection of these weight distributions quickly identifies that the scale of the weight values is quite different from one study to the next. For example, the sum of the NCS-R Part I weights is $\sum_i NCSRWTSH_i = 9{,}282$, while the sum of the 2006 HRS individual weights is $\sum_i KWGTR_i = 75{,}540{,}674$.
With the exception of weighted estimates of population totals, weighted estimation of population parameters and standard errors should be invariant to a linear scaling of the weight values, that is, $w_{scale,i} = k \cdot w_{final,i}$, where k is an arbitrary constant. That is, the data producer may choose to multiply or divide the weight values by any constant and, with the exception of estimates of population totals, weighted estimates of population parameters and their standard errors should not change.
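The invariance can be seen directly for a weighted mean. The sketch below assumes the usual ratio form of the weighted mean and a positive constant k; the symbols $\bar{y}_w$ and $\hat{Y}$ are notation introduced here for the illustration.

```latex
\bar{y}_{w,scale}
  = \frac{\sum_i k \, w_{final,i} \, y_i}{\sum_i k \, w_{final,i}}
  = \frac{\sum_i w_{final,i} \, y_i}{\sum_i w_{final,i}}
  = \bar{y}_w ,
\qquad
\hat{Y}_{scale} = \sum_i k \, w_{final,i} \, y_i = k \, \hat{Y}
```

The constant k cancels from the numerator and denominator of the mean but carries through to the estimated total, which is why totals are the exception.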
For many surveys such as the 2005–2006 NHANES and the 2006 HRS, the individual case weights will be population scale weights, and the expected value for the sum of the weights will be the population size:
$$E\left(\sum_{i=1}^{n} w_i\right) = N$$
For other survey data sets, a normalized version of the overall sampling weight is provided with the survey data. To “normalize” the final overall sampling weights, data producers divide the final population scale weight for each sample respondent by the mean final weight for the entire sample:
$$w_{norm,i} = w_i \Big/ \left(\sum_i w_i / n\right) = w_i / \bar{w}$$
Many public-use data sets such as the NCS-R will have normalized weights available as the final overall sampling weights. The resulting normalized weights will have a mean value of $\bar{w}_{norm} = 1.0$, and the normalized weights for all sample cases should add up to the sample size:
$$\sum_i w_{norm,i} = n$$
Normalizing analysis weight values is a practice that has its roots in the past when computer programs for the analysis of survey data often misinterpreted the “sample size” for weighted estimates of variances and covariances required in computations of standard errors, confidence intervals, or test statistics. As illustrated in Section 3.5.2, the degrees of freedom for variance estimation in analyses of complex sample survey data are determined by the sampling features (stratification, clustering) and not the nominal sample size. Also, some data analysts feel more comfortable with weighted frequency counts that closely approximate the nominal sample sizes for the survey. However, there is a false security in assuming that a weighted frequency count of $\sum w_i = 1{,}000$ corresponds to an effective sample size of $n_{eff} = 1{,}000$. As discussed in Section 2.7, the effective sample size for 1,000 nominal cases will be determined in part by the weighting loss, $L_w$, that arises due to variability in the weights and the correlation of the weights with the values of the survey variables of interest. Fortunately, normalizing weights is not necessary when analysts use computer software capable of incorporating any available complex design information for a sample into analyses of the survey data.
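A minimal Stata sketch of the normalization, again on synthetic data with a hypothetical weight variable wt, is given below. The last step uses a Kish-type approximation, n / (1 + cv²) with cv the coefficient of variation of the weights, as a rough indication of the loss due to weight variability alone; this formula is an assumption of the sketch, and it ignores clustering and any correlation between the weights and the survey variables.

```stata
* Synthetic stand-in: wt is a hypothetical population-scale weight
clear
set seed 20101
set obs 1000
generate double wt = exp(rnormal(8, 0.7))

* Normalize: divide each weight by the mean weight for the sample
summarize wt, meanonly
generate double wt_norm = wt / r(mean)

* The normalized weights have mean 1 and sum to the nominal sample size
summarize wt_norm
scalar sum_norm = r(sum)

* Kish-type approximation to the effective sample size (weights only)
scalar n_eff = r(N) / (1 + (r(sd) / r(mean))^2)

display "Sum of normalized weights:         " %9.2f sum_norm
display "Approximate effective sample size: " %9.1f n_eff
```

Although the normalized weights sum to 1,000 here, the approximate effective sample size is smaller, which is the false security described above.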
4.2.3 Weighting Applications: Sensitivity of Survey Estimates to the Weights
A third step that we recommend survey analysts consider the first time that they work with a new survey data set is to conduct a simple investigation of how the application of the analysis weights affects the estimates and standard errors for several key parameters of interest.
To illustrate this step, we consider data from the NCS-R data set, where the documentation indicates that the overall sampling weight to be used for the subsample of respondents responding to both Part I and Part II of the NCS-R survey (n = 5,692) is NCSRWTLG. A univariate analysis of these sampling weights in Stata reveals a mean of 1.00, a standard deviation of 0.96, a minimum of 0.11, and a maximum of 10.10 (see Table 4.1). These values indicate that the weights have been normalized and have moderate variance. In addition, we note that some sampling weight values are below 0.50. Many standard statistical software procedures will round noninteger weights and set the weight to 0 if the normalized weight is less than 0.5, excluding such cases from certain analyses. This is an important consideration that underscores the need to use specialized software that incorporates the overall sampling weights correctly (and does not round them).
We first consider unweighted estimation of the proportions of NCS-R Part II respondents with lifetime diagnoses of either major depressive episode (MDE), measured by a binary indicator equal to 1 or 0, or alcohol dependence (ALD; also a binary indicator), in Stata:
mean mde ald if ncsrwtlg != .
Variable    Mean
MDE         0.3155
ALD         0.0778
Note that we explicitly limit the unweighted analysis to Part II respondents (who have a nonmissing value on the Part II sampling weight variable NCSRWTLG). The unweighted estimate of the MDE proportion is 0.316, suggesting that almost 32% of the NCS-R population has had a lifetime diagnosis of MDE. The unweighted estimate of the ALD proportion is 0.078, suggesting that almost 8% of the NCS-R population has a lifetime diagnosis of alcohol dependence.
We then request weighted estimates of these proportions in Stata, first identifying the analysis weight to Stata with the svyset command and then requesting weighted estimates by using the svy: mean command:
svyset [pweight = ncsrwtlg]
svy: mean mde ald
Variable    Mean
MDE         0.1918
ALD         0.0541
The weighted estimates of population prevalence of MDE and ALD are 0.192 and 0.054, respectively. The unweighted estimates for MDE and ALD therefore have a positive bias (there would be a big difference in reporting a population estimate of 32% for lifetime MDE versus an estimate of 19%).
In this simple example, the weighted estimates differ significantly from the unweighted means of the sample observations. This is not always the case. Depending on the sample design and the nonresponse factors that contributed to the computation of individual weight values, weighted and unweighted estimates may or may not show significant differences. When this simple comparison of weighted and unweighted estimates of key population parameters shows a significant difference, the survey analyst should aim to understand why this difference occurs. Specifically, what are the factors contributing to the case-specific weights that would cause the weighted population estimates to differ from an unweighted analysis of the nominal set of sample observations?
Consider the NCS-R example. We know from the Chapter 1 description that, according to the survey protocol, all Part I respondents reporting symptoms of a mental health disorder and a random subsample of symptom-free Part I respondents continued on to complete the Part II in-depth interview. Therefore, the unweighted Part II sample contains an “enriched” sample of persons who qualify for one or more mental health diagnoses. As a consequence, when the corrective population weight is applied to the Part II data, the unbiased weighted estimate of the true population value is substantially lower than the simple unweighted estimate. Likewise, a similar comparison of estimates of the prevalence of physical function limitations using 2005–2006 NHANES data would yield weighted population estimates that are lower than the simple unweighted prevalence estimates for the observed cases. The explanation for that difference lies in the fact that persons who self-report a disability are oversampled for inclusion in the NHANES, and the application of the weights adjusts for this initial oversampling.
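The mechanism can be reproduced in a small simulation. The Stata sketch below is a hypothetical illustration, not the NCS-R design: the variable names, the 20% symptom rate, and the 25% subsampling rate for symptom-free cases are all invented. It retains every symptomatic case and a random quarter of the rest, then compares the unweighted and weighted proportions.

```stata
* Synthetic illustration; all names and rates below are hypothetical
clear
set seed 20103
set obs 20000

* Screening phase: about 20% of cases report symptoms
generate byte symptom = runiform() < 0.20

* All symptomatic cases continue; 25% of symptom-free cases are subsampled
generate double psel = cond(symptom == 1, 1, 0.25)
keep if runiform() < psel

* Weight = inverse of the subsampling probability
generate double wt = 1 / psel

* Unweighted proportion: inflated by the enriched sample (roughly 0.50)
mean symptom
scalar p_unw = _b[symptom]

* Weighted proportion: close to the screening-phase rate (roughly 0.20)
svyset [pweight = wt]
svy: mean symptom
scalar p_wtd = _b[symptom]

display "Unweighted: " %5.3f p_unw "   Weighted: " %5.3f p_wtd
```

Because symptomatic cases make up about half of the retained sample but only about a fifth of the screened cases, the unweighted proportion overstates prevalence, and the weights restore each group to its original share.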
Repeating this exercise for a number of key variables should provide the user with confidence that he or she understands both how and why the application of the survey weights will influence estimation and inference for the population parameters to be estimated from the sample survey data. We note that these examples were designed only to illustrate the calculation of weighted sample estimates; standard errors for the weighted estimates were not appropriately estimated to incorporate complex design features of the NCS-R sample. Chapter 5 considers estimation of descriptive statistics and their standard errors in more detail.
4.3 Understanding and Checking the