Part 1: Sampling Errors
8.5 Weighting adjustment
8.5.1 The basic method
The population total of a survey variable $y$ is estimated by $\sum_k w_k y_k$, where the sum is across respondents. The basic idea is that each responding unit ‘represents’ $w_k$ population units. The weight may be expressed as $w_k = w_{sk} w_{nrk}$, where $w_{sk}$ is the sampling weight and $w_{nrk}$ the nonresponse weight. Various methods may be used to construct the weights. In
practice a single set of weights will usually be used for all survey variables. This is desirable not only for simplicity of computation but also to ensure that arithmetic relationships between variables (for example, total capital expenditure is the sum of the components of capital expenditure) are preserved in the estimates. For this reason, weighting is the standard procedure used to adjust for unit nonresponse (which applies to all variables in a uniform way) but is usually unsuitable for item nonresponse, since different weights would be needed for variables whose values are missing for different units.
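The weighted estimator above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `estimate_total` and all data values are hypothetical, and the weights are taken as given rather than constructed. The example also shows why a single set of weights preserves arithmetic relationships between variables.

```python
def estimate_total(y, sampling_weights, nonresponse_weights):
    """Weighted estimate of a population total, summed over respondents."""
    if not (len(y) == len(sampling_weights) == len(nonresponse_weights)):
        raise ValueError("all inputs need one entry per respondent")
    # Final weight for respondent k: w_k = w_sk * w_nrk.
    weights = [ws * wn for ws, wn in zip(sampling_weights, nonresponse_weights)]
    return sum(w * yk for w, yk in zip(weights, y))


# Hypothetical respondent data: two components of capital expenditure.
capex_buildings = [10.0, 20.0, 30.0]
capex_equipment = [1.0, 2.0, 3.0]
capex_total = [a + b for a, b in zip(capex_buildings, capex_equipment)]

# Hypothetical sampling and nonresponse weights, shared by all variables.
w_s = [5.0, 5.0, 10.0]
w_nr = [1.25, 1.25, 2.0]

est_buildings = estimate_total(capex_buildings, w_s, w_nr)
est_equipment = estimate_total(capex_equipment, w_s, w_nr)
est_total = estimate_total(capex_total, w_s, w_nr)
```

Because the same weights are applied to every variable, the estimated components add up to the estimated total. That would not generally hold if each variable had its own weights.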
8.5.2 Use of auxiliary information
In order to reduce nonresponse bias it is necessary to use auxiliary information about units which are not respondents. Two broad kinds of information may be used. First, certain information may be available on nonrespondents in the sample but not for other population units. One example arises in a monthly business survey when the sample consists of the same
businesses each month. In this case information may be available on sample businesses in February, say, which may be used to weight for nonresponse in March. Such weighting is called sample-based weighting. Quantitative information on nonrespondents, such as reported values from the previous month in a monthly survey, is more likely to be used for imputation than for weighting. Categorical information, such as an industrial classification, might be used to define response homogeneity groups within which the nonresponse weights may be determined by the inverse response rates.
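A minimal sketch of sample-based weighting within response homogeneity groups follows. It assumes the response rate in each group is the unweighted proportion of sampled units that responded; a rate weighted by the sampling weights is another possible choice. The function name, group labels and data are hypothetical.

```python
from collections import defaultdict


def rhg_nonresponse_weights(groups, responded):
    """Nonresponse weight per sampled unit: the inverse response rate of
    the unit's group. Nonrespondents get None."""
    n_sampled = defaultdict(int)
    n_responded = defaultdict(int)
    for g, r in zip(groups, responded):
        n_sampled[g] += 1
        if r:
            n_responded[g] += 1

    # A group with no respondents cannot be represented by reweighting.
    empty = [g for g in n_sampled if n_responded[g] == 0]
    if empty:
        raise ValueError(f"no respondents in groups: {empty}")

    return [n_sampled[g] / n_responded[g] if r else None
            for g, r in zip(groups, responded)]


# Hypothetical sample of 7 businesses classified by industry.
groups = ["manufacturing"] * 4 + ["retail"] * 3
responded = [True, True, False, True, True, False, False]

w_nr = rhg_nonresponse_weights(groups, responded)
```

In this example the three manufacturing respondents each receive weight 4/3 and the single retail respondent receives weight 3, so the nonresponse weights of the respondents sum to the full sample size of 7.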
The second broad kind of information is that available on the whole population, most obviously information recorded on the business register. Weighting methods based on such information are called population-based weighting. The following two sections concern different methods of such population-based weighting.
8.5.3 Poststratification
This method is applicable when a classification of businesses is available which was not used for sampling. The classification partitions businesses into ‘poststrata’ $g$, where the number of businesses $N_g$ within poststratum $g$ is known. An example arises when the classification of businesses by industry or size is updated and considered to be more accurate than the original classification used for sampling (Hidiroglou et al., 1995). The poststratified estimator of a total takes the weighted form $\sum_k w_{sk} w_{nrk} y_k$ in section 8.5.1, where the nonresponse weight for all units in poststratum $g$ is $w_{nrk} = N_g / \hat{N}_g$, and $\hat{N}_g$ is obtained by summing the sample weights $w_{sk}$ across responding units in poststratum $g$.
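The poststratified estimator can be sketched as follows. The function name, poststratum labels and all numbers are hypothetical. The sketch assumes every poststratum with a known population count contains at least one respondent.

```python
def poststratified_total(y, sampling_weights, poststrata, population_counts):
    """Poststratified estimate of a total from respondent data.

    population_counts maps each poststratum g to its known size N_g.
    """
    # N_hat_g: sum of sampling weights over respondents in poststratum g.
    n_hat = {}
    for w, g in zip(sampling_weights, poststrata):
        n_hat[g] = n_hat.get(g, 0.0) + w

    if set(n_hat) != set(population_counts):
        raise ValueError("poststrata in the data and in the counts differ")

    total = 0.0
    for yk, w, g in zip(y, sampling_weights, poststrata):
        w_nr = population_counts[g] / n_hat[g]  # nonresponse weight N_g / N_hat_g
        total += w * w_nr * yk
    return total


# Hypothetical respondents: turnover, sampling weight and size class.
y = [100.0, 120.0, 40.0, 60.0, 50.0]
w_s = [10.0, 10.0, 20.0, 20.0, 20.0]
strata = ["large", "large", "small", "small", "small"]
N = {"large": 30, "small": 120}  # known register counts (assumed)

estimate = poststratified_total(y, w_s, strata, N)
```

Here the respondents' sampling weights sum to 20 in the ‘large’ poststratum and 60 in the ‘small’ one, so the nonresponse weights are 30/20 = 1.5 and 120/60 = 2 respectively.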
8.5.4 Regression estimation and calibration
Poststratification is a special case of regression estimation which itself is a special case of calibration estimation (Deville & Särndal, 1992; Lundström, 1997). Methods of ratio estimation used widely for business surveys are also special cases.
The simplest approach to handling unit nonresponse in these methods is to treat the respondents as the achieved sample with inclusion probabilities proportional to the sample inclusion probabilities. If the regression relationship between the survey variable and the auxiliary variables is the same for respondents and nonrespondents, the corresponding regression (or calibration) estimator will remove bias due to nonresponse (Hidiroglou et al., 1995, p.491). This is essentially the missing at random condition referred to earlier. Under departures from this assumption, regression estimation may still be useful for reducing nonresponse bias. A more complex approach involves first adjusting the sample inclusion probabilities by estimated nonresponse probabilities. Bethlehem (1988) argues that this adjustment may be expected to reduce bias.
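As an illustration of the simplest approach, the sketch below applies ratio estimation, which the text names as a special case, to the respondents treated as the achieved sample. It assumes a single auxiliary variable x with a known population total. The function name and all data are hypothetical. The adjustment removes nonresponse bias only under the condition stated above, not in general.

```python
def ratio_calibrated_weights(sampling_weights, x, x_population_total):
    """Scale respondent weights so that the weighted total of the
    auxiliary variable x equals its known population total."""
    x_hat = sum(w * xk for w, xk in zip(sampling_weights, x))
    if x_hat <= 0:
        raise ValueError("weighted auxiliary total must be positive")
    factor = x_population_total / x_hat
    return [w * factor for w in sampling_weights]


# Hypothetical respondents: register employment (x) and survey turnover (y).
x = [50.0, 80.0, 20.0, 30.0]
y = [500.0, 900.0, 150.0, 350.0]
w_s = [10.0, 10.0, 25.0, 25.0]
X_TOTAL = 3000.0  # known register total of x (assumed)

w_cal = ratio_calibrated_weights(w_s, x, X_TOTAL)

# Calibration property: the adjusted weights reproduce the known total of x.
x_check = sum(w * xk for w, xk in zip(w_cal, x))

# Ratio estimate of the total of y.
y_estimate = sum(w * yk for w, yk in zip(w_cal, y))
```

With several auxiliary variables, the single scaling factor would be replaced by a regression-type adjustment that solves a small linear system. The principle is the same: the final weights reproduce the known auxiliary totals.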
8.5.5 Weighting and nonresponse errors
Weighting may be expected to affect both the bias and the variance arising from nonresponse. The aim is to remove nonresponse bias although, in practice, this is unlikely to be fully
achieved. A comparison of alternative weighted estimators provides some idea of how bias may vary according to different assumptions. These assumptions will be of the form ‘missing at random given measured auxiliary variables’. These auxiliary variables might, for example, be those used to define response homogeneity groups in the sample, or to define poststrata for population weighting. A comparison of weighted estimators therefore represents a sensitivity analysis with respect to a limited set of assumptions.
Weighting will also generally affect the variance of the total survey errors in two ways. First, poststratification and more generally calibration weighting can act to reduce the variance if the auxiliary variables used help to predict the survey variables within strata. Second, variability in the weights can inflate the variance and this variance inflation tends to increase as the amount of auxiliary information increases (Nascimento Silva & Skinner, 1997).
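A rough feel for the second effect can be had from Kish's well-known approximation to the variance inflation caused by unequal weights, n Σw² / (Σw)², which equals one plus the squared coefficient of variation of the weights. This approximation is not taken from the source, and it ignores any variance reduction from the auxiliary information. It is offered only as a simple diagnostic, with hypothetical weights.

```python
def kish_weight_inflation(weights):
    """Kish's approximate variance inflation factor due to unequal
    weights: n * sum(w^2) / (sum(w))^2. Equals 1 for equal weights."""
    n = len(weights)
    if n == 0:
        raise ValueError("need at least one weight")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return n * sum(w * w for w in weights) / (total * total)


equal = kish_weight_inflation([2.0, 2.0, 2.0, 2.0])
unequal = kish_weight_inflation([1.0, 1.0, 1.0, 5.0])
```

Equal weights give a factor of exactly 1, while the more variable second set of weights gives a factor above 1.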
8.5.6 Variance estimation
There exist a number of variance estimators for use in the presence of nonresponse. The simplest approach is to treat the nonresponse weights as fixed quantities, so that variation between the weights inflates the variance. This approach fails to allow for the reduction of variance achieved by population weighting. This variance reduction is allowed for by standard variance estimators for calibration estimation (for example Deville & Särndal, 1992). Further complications arise if sample-based weighting is also involved: the variance estimators required then include components at both the sample level and the respondent level (Särndal & Swensson, 1987; Lundström, 1997). All of these estimators effectively make a missing at random assumption and thus do not allow for the possibility of informative nonresponse. See the chapter on model assumption errors (section 9.7) for further discussion of this case.