
G. SAMPLING ERROR ESTIMATION

1. Background

The CTS Household Survey sample design is complex and therefore requires specialized techniques for estimating sampling variances. Standard statistical packages, such as SAS and SPSS, compute variances using formulas that assume the data are a simple random sample from an infinite population. Although the simple random sample variance may approximate the sampling variance in some surveys, it is likely to substantially underestimate the sampling variance for a design as complex as the CTS’s. Departures from a simple random

The sampling variance is a measure of the variation of an estimator attributable to having sampled a portion of the full population of interest using a specific probability-based sampling design. The sampling variance represents the average squared difference of the estimates from their expected value over all possible samples of the same size drawn with the same sampling design. The classical population variance is a measure of the variation among the observations in the population, whereas a sampling variance is a measure of the variation of the estimate of a population parameter (for example, a population mean or proportion) over repeated samples. The population variance differs from the sampling variance in that the population variance is a constant, independent of any sampling issues, whereas the sampling variance decreases as the sample size increases. The sampling variance is zero when the full population is observed, as in a census.

Based on the sampling variance, a series of measures of reliability can be computed for a parameter estimate or statistic. The standard error is the square root of the sampling variance. Over repeated samples of the same size and using the same sampling design, we expect that the true value of the statistic would differ from the sample estimate by less than twice the standard error in approximately 95 percent of the samples. The degree of approximation depends on the distributional characteristics of the underlying observations. The relative standard error is the standard error divided by the sample estimate and is usually presented as a percentage. In general, an estimate of a population parameter with a relative standard error of 50 percent or more is considered unreliable and is not reported. Furthermore, an estimate with a relative standard error of greater than 30 percent may be reported but also may be identified as potentially unreliable.
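The reliability measures described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example values are hypothetical, and treating a relative standard error of exactly 50 percent as "not reported" is an assumption about how the threshold is applied.

```python
import math

def reliability_measures(estimate, sampling_variance):
    """Return (standard error, relative standard error in percent,
    approximate 95 percent interval) for one estimate."""
    se = math.sqrt(sampling_variance)          # standard error
    rse_pct = 100.0 * se / abs(estimate)       # relative standard error, percent
    interval = (estimate - 2.0 * se, estimate + 2.0 * se)  # +/- 2 standard errors
    return se, rse_pct, interval

def reliability_flag(rse_pct):
    """Classify an estimate using the 30 and 50 percent rules of thumb.
    Assumption: an RSE of 50 percent or more is treated as not reportable."""
    if rse_pct >= 50.0:
        return "not reported"
    if rse_pct > 30.0:
        return "potentially unreliable"
    return "report"

# Hypothetical example: an estimated proportion of 0.20 with sampling variance 0.0004
se, rse, interval = reliability_measures(0.20, 0.0004)
print(se, rse, interval, reliability_flag(rse))
```

The sampling variance passed to the function would need to be a design-based estimate for the measures to be meaningful for the CTS data.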

For the CTS Household Survey, the sampling variance estimate, called the design-based sampling variance, is a function of the sampling design and the population parameter being

which are derived from the sampling design, with adjustments to compensate for nonresponse and for ratio-adjusting the sampling totals to external totals (for example, to data on population totals by age and race/ethnicity generated by the Bureau of the Census from the Current Population Survey).

For combined national estimates at the person level, the average design effect over a representative set of variables is 2.6. With 59,956 observations, the Household Survey has the equivalent precision of a simple random sample with a size of about 22,675. Note that the design effect is generally lower for subclasses of the population because there is less clustering of observations.
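The relationship between the design effect and the equivalent simple random sample size is a single division, sketched below. The function names are hypothetical. Note that dividing 59,956 by the rounded design effect of 2.6 gives about 23,060, while the reported equivalent size of about 22,675 corresponds to a design effect of roughly 2.64; the difference presumably reflects rounding of the published design effect, although the source does not say so.

```python
def effective_sample_size(n, design_effect):
    """Size of the simple random sample with the same precision."""
    return n / design_effect

def implied_design_effect(n, n_effective):
    """Design effect implied by an actual and an effective sample size."""
    return n / n_effective

n_persons = 59956

# Using the rounded design effect quoted in the text
print(round(effective_sample_size(n_persons, 2.6)))

# Design effect implied by the quoted equivalent sample size of 22,675
print(round(implied_design_effect(n_persons, 22675), 3))
```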

The data files for the CTS Household Survey contain a set of fully adjusted sampling weights and information on analysis parameters (that is, stratification and analysis clusters) necessary for the estimation of the sampling variance for a statistic. Because of the stratification and unequal sampling rates, it is necessary to account for the sampling weights and the sampling design features in order to compute unbiased estimates of population parameters and their associated sampling variances. The estimation of the sampling variance requires the use of special survey data analysis software or specially developed programs designed to accommodate the population parameter being estimated and the sampling design. The CTS Household Survey Public Use File for Round Two (Technical Publication No. 25, www.hschange.org) contains tables of standard errors for various types of estimates and provides a link to ICPSR.

correlation and regression coefficients. In general, the variances of nonlinear statistics cannot be expressed in a closed form. Woodruff (1971) suggested a procedure in which a nonlinear estimator is linearized by a Taylor series approximation. The sampling variance equation is then used on this linear form (called a linearized variate) to produce a variance approximation for the original nonlinear estimator.
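As an illustration of the linearization idea, the following sketch estimates the variance of a weighted ratio (a weighted mean is the special case in which the denominator variable is 1 for every record). It is a simplified version under stated assumptions: primary sampling units (PSUs) are treated as if sampled with replacement within strata, no finite population correction is applied, and the function name and record layout are hypothetical. It is not the SUDAAN algorithm.

```python
from collections import defaultdict

def linearized_ratio_variance(records):
    """records: iterable of (stratum, psu, weight, y, x) tuples.
    Returns (ratio estimate, Taylor-linearized variance estimate)."""
    records = list(records)
    y_total = sum(w * y for _, _, w, y, _ in records)
    x_total = sum(w * x for _, _, w, _, x in records)
    ratio = y_total / x_total

    # First-order Taylor expansion of R = Y/X gives the linearized
    # variate z = (y - R*x) / X; accumulate its weighted total per PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y, x in records:
        psu_totals[stratum][psu] += w * (y - ratio * x) / x_total

    # Apply the usual between-PSU variance formula for a total to z.
    variance = 0.0
    for stratum, psus in psu_totals.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least two PSUs")
        mean_h = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in psus.values())
    return ratio, variance

# Hypothetical data: one stratum, two PSUs, equal weights, x = 1 (a mean)
sample = [("A", 1, 1.0, 1.0, 1.0), ("A", 2, 1.0, 3.0, 1.0)]
print(linearized_ratio_variance(sample))
```

In this toy case the result matches the familiar simple random sample variance of a mean, which is a useful sanity check before applying the approach to clustered, weighted data.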

Most common statistical estimates and analytic tools (such as percentages, percentiles, and linear and logistic regression) can be implemented using Taylor series approximation methods. Survey data software, such as SUDAAN (Shah et al. 1997), uses the Taylor series linearization procedure and can handle the multistage CTS Household Survey design, joint inclusion probabilities, and the stratification and clustering components of variance.

Other software packages also use Taylor series approximations (for example, Stata, SAS SurveySelect, and PC-CARP), but they do not account for the survey design as completely as SUDAAN does. A major advantage of SUDAAN is that it can account for how sites were selected for the Household Survey: at a high sampling rate, with unequal selection probabilities, and without replacement. The SUDAAN estimation algorithm incorporates a finite population correction factor. Failure to account for the finite population correction causes an overestimate of the variance for national estimates based on the site sample.
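The effect of omitting the finite population correction can be seen in the simplest setting, an equal-probability sample drawn without replacement. The sketch below is only that textbook case, with a hypothetical function name and made-up values; the unequal-probability correction applied to the CTS site sample is more involved and is not reproduced here.

```python
import statistics

def srs_mean_variance(values, population_size=None):
    """Estimated variance of a sample mean under simple random sampling.
    If population_size is given, apply the finite population
    correction (1 - n/N); otherwise assume an infinite population."""
    n = len(values)
    variance = statistics.variance(values) / n   # s^2 / n
    if population_size is not None:
        variance *= 1.0 - n / population_size    # finite population correction
    return variance

values = [1, 2, 3, 4, 5]                      # hypothetical sample
print(srs_mean_variance(values))              # ignores the correction
print(srs_mean_variance(values, 10))          # half the population sampled
```

With half of the population sampled, ignoring the correction doubles the estimated variance, which is the kind of overestimate described above.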

The alternative to using the Taylor series approximations is to use a replication technique, such as balanced repeated replication, the jackknife, or bootstrapping. WESVAR uses replication techniques to estimate sampling errors, but the current version does not allow for the incorporation of the finite population correction for unequal probability sampling.
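A minimal sketch of one replication technique, the stratified delete-one-PSU jackknife, is given below. It is an illustration under stated assumptions rather than the WESVAR implementation: the function names and record layout are hypothetical, every stratum must contain at least two PSUs, and no finite population correction is applied.

```python
def weighted_mean(records):
    """Weighted mean of y for records shaped (stratum, psu, weight, y)."""
    return sum(w * y for _, _, w, y in records) / sum(w for _, _, w, _ in records)

def jackknife_variance(records, estimator):
    """Delete-one-PSU jackknife for a stratified design.
    Returns (full-sample estimate, jackknife variance estimate)."""
    records = list(records)
    full_estimate = estimator(records)

    psus_by_stratum = {}
    for stratum, psu, *_ in records:
        psus_by_stratum.setdefault(stratum, set()).add(psu)

    variance = 0.0
    for stratum, psus in psus_by_stratum.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least two PSUs")
        scale = n_h / (n_h - 1)
        for dropped in psus:
            # Replicate: drop one PSU and reweight the rest of its stratum.
            replicate = []
            for s, p, w, y in records:
                if s == stratum:
                    if p == dropped:
                        continue
                    w = w * scale
                replicate.append((s, p, w, y))
            variance += (n_h - 1) / n_h * (estimator(replicate) - full_estimate) ** 2
    return full_estimate, variance

# Hypothetical data: one stratum, two PSUs, equal weights
sample = [("A", 1, 1.0, 1.0), ("A", 2, 1.0, 3.0)]
print(jackknife_variance(sample, weighted_mean))
```

Because the estimator is passed in as a function, the same replication loop can be reused for other statistics, which is the main practical appeal of replication methods over deriving a linearized variate for each estimator.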
