2.7 Statistical considerations

In this chapter we have examined the reasons for applying weighting and the methods of doing it. Broadly, weighting is aimed at improving the accuracy of sample-based estimates.

The following chapter is devoted to the estimation of the effect of both sampling and weighting on the precision of estimates. But to round off this discussion of the processes of weighting it may be appropriate to summarise (even at the risk of some repetition) the effects of the weighting itself. The price of increased accuracy is (normally) a loss in precision. This loss may be small and affordable, and the net effect can (and should) be an improvement in the overall quality of the information yielded by the sample. But if the weighting is not carried out with some discrimination and with care the result will not necessarily be an improvement.

Weighting involves a calculable increase in the variance of estimates and thus a widening of confidence limits and a loss of discrimination in significance testing. The loss of precision is related to the range of weights applied. The greater the range of weights, the greater the loss.

For any sample-based estimate, a measure of the overall loss of precision can be obtained by calculating the ‘effective sample size’. This is, as it were, the size of the ‘pure’ sample which would have the same degree of precision as the weighted actual sample. It is this figure, rather than the unweighted sample size, that should be used in significance tests and in estimating confidence limits. Where a subset of the sample is chosen the same formula is applied within that subset. If the subset corresponds to one cell in a cell weighting matrix, for example, all the respondents will have the same weight so the weighting will have no effect on the effective sample size.
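The formula itself is not given in this section. As a minimal sketch, assuming the widely used Kish approximation (the squared sum of the weights divided by the sum of the squared weights), the effective sample size and its use in a confidence interval might be computed as follows. The function names and the sample of 1,000 respondents are hypothetical, chosen only for illustration.

```python
import math

def effective_sample_size(weights):
    """Kish approximation (assumed here): (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq

def proportion_half_width(p, n, z=1.96):
    """Approximate 95% confidence half-width for a proportion p on a base of n."""
    return z * math.sqrt(p * (1.0 - p) / n)

# Hypothetical sample: 600 respondents weighted 0.5 and 400 weighted 1.75.
weights = [0.5] * 600 + [1.75] * 400
n_actual = len(weights)
n_eff = effective_sample_size(weights)

# A subset that is a single weighting cell: every weight is identical,
# so the effective sample size equals the actual size of the subset.
cell = [1.75] * 400
n_eff_cell = effective_sample_size(cell)

print("actual n:", n_actual, "effective n:", round(n_eff, 1))
print("single cell, effective n:", round(n_eff_cell, 1))
print("half-width using actual n:   ", proportion_half_width(0.5, n_actual))
print("half-width using effective n:", proportion_half_width(0.5, n_eff))
```

The wider half-width obtained from the effective sample size is the one that should be reported, and the single-cell case shows why equal weights within a subset cost nothing in precision.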

The weighting effect

The ratio of actual sample size to effective sample size is known as the ‘design effect’. Design effects are typically (but not always) greater than 1.0 (i.e. the results are less precise than the unweighted sample size suggests). However, the design effect is not a constant for the whole survey. It can vary from question to question, and even between answer categories within a question, as well as between subgroups of the sample. Where weighting has been employed, the contribution of the weighting process to the overall design effect can readily be calculated: the ‘weighting effect’ is a component of the design effect that can be estimated independently. Unlike the design effect, the weighting effect remains constant for all questions within any one subsample. Fortunately the calculation is a simple one, involving just the sum of the weights and the sum of their squares.
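The calculation is described here only in words. Written out, and assuming the standard Kish-style form, it might look like the following. The symbols are introduced for illustration and are not taken from the text: n is the number of respondents in the subsample, w_i is the weight of respondent i, and n_eff is the effective sample size attributable to weighting alone.

```latex
\[
\text{weighting effect}
  \;=\; \frac{n \sum_{i=1}^{n} w_i^{2}}{\left(\sum_{i=1}^{n} w_i\right)^{2}},
\qquad
n_{\text{eff}}
  \;=\; \frac{n}{\text{weighting effect}}
  \;=\; \frac{\left(\sum_{i=1}^{n} w_i\right)^{2}}{\sum_{i=1}^{n} w_i^{2}} .
\]
```

Because the expression depends only on the weights, and not on the answers to any question, it takes the same value for every question asked of the same subsample.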

Calculation of the weighting effect will give exactly the same result whether the weights are calculated to average 1.0 or are scaled to represent a population total or sum to some other arbitrary total. The scaling of weights has no effect on the statistics. However, some statistical packages do not deal correctly with weighted data when performing significance tests and may treat the sum of the weights as though it were the (unweighted) sample size. If the sum of the weights is greater than the effective sample size such packages will overstate the significance of results. If their sum is smaller, the significance will be understated. The usual, but unsatisfactory, response in these cases is ‘run significance tests using unweighted data’. As weighting was presumably done to redress biases or imbalances, the difference being tested for may be greater or smaller in the unweighted data than in the final weighted data and the weighted difference is presumably the better estimate. As will be demonstrated in Chapter 4 the most commonly used significance tests can, with care, be performed satisfactorily with weighted data. However, for potential problems with statistical software see section 3.3.8.
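The point about scaling can be checked numerically. The sketch below assumes the Kish-style form of the weighting effect (n times the sum of squared weights, divided by the squared sum of weights). The weights, the grossing factor of 25,000 and the variable names are hypothetical.

```python
def weighting_effect(weights):
    """Assumed form: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return n * total_sq / (total * total)

# Hypothetical weights that average 1.0 over 1,000 respondents.
base = [0.5] * 600 + [1.75] * 400
# The same weights grossed up to a hypothetical population of 25 million.
grossed = [w * 25_000 for w in base]
# The same weights scaled down so that they sum to 500.
halved = [w * 0.5 for w in base]

n_eff = len(base) / weighting_effect(base)

for label, ws in [("average 1.0", base), ("grossed up", grossed), ("halved", halved)]:
    naive_n = sum(ws)  # what a package uses if it treats the sum of weights as n
    if naive_n > n_eff:
        verdict = "overstates significance"
    elif naive_n < n_eff:
        verdict = "understates significance"
    else:
        verdict = "matches effective n"
    print(label, "| weighting effect:", weighting_effect(ws),
          "| naive n:", naive_n, "|", verdict)
```

The weighting effect is the same on all three scalings, while the base that a naive package would use changes with the scaling, which is why such a package can err in either direction.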

The picture is more complicated where the sample has been stratified and/or clustered. Stratification may benefit some estimates. Clustering erodes the effective sample size. Calculation of the combined effects of stratification, clustering, non-response weighting, etc. is a complex issue, discussed in some detail in Chapter 3, but the more complex the sample, by and large, the greater will be the design effect. However, it is worth repeating that the combined effect of all these design parameters is not constant across the survey but can vary from question to question.

Whatever method of weighting is applied it is essential to check what effect the weighting process has had. Design effects and weighting effects should be calculated for some of the principal measures. Methods of assessing the design effect are discussed in Chapter 3.

Even simple checks such as the variance of the weights (within strata if appropriate) will give an indication of the likely loss of precision caused by the weighting process. In a multi-stage weighting procedure this should be done at each stage so that the contribution of each stage to the variance of weights can be assessed. This should be a part of the output of any weighting software.
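A stage-by-stage check of this kind could be sketched as follows. The two stages and their adjustment factors are hypothetical, and the link between the variance of the weights and the weighting effect (one plus the relative variance, using the population variance) is assumed from the Kish-style formula rather than taken from the text.

```python
from statistics import mean, pvariance

def relative_variance(weights):
    """Population variance of the weights divided by the square of their mean."""
    m = mean(weights)
    return pvariance(weights) / (m * m)

# Hypothetical two-stage weighting for six respondents:
# a design weight followed by a non-response adjustment factor.
stages = [
    ("design", [1.0, 1.0, 2.0, 2.0, 1.0, 2.0]),
    ("non-response", [1.0, 1.5, 1.0, 1.5, 1.0, 1.0]),
]

cumulative = [1.0] * 6
report = []
for stage_name, factors in stages:
    # The weight after each stage is the product of all factors applied so far.
    cumulative = [c * f for c, f in zip(cumulative, factors)]
    rv = relative_variance(cumulative)
    report.append((stage_name, rv, 1.0 + rv))

for stage_name, rv, effect in report:
    print(f"after {stage_name}: relative variance {rv:.4f}, "
          f"implied weighting effect {effect:.4f}")
```

Printing the figure after every stage shows how much each stage adds to the spread of the weights. For a stratified design the same function could be applied to the weights within each stratum separately.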

And finally, how much has the weighting actually changed the original estimates? How different are the weighted from the unweighted estimates? If the weighting is so complex that it broadens the confidence intervals of our estimates out of all proportion to the changes it imposes on them, then it may be positively unhelpful. An estimate of 50% ± 2% may be of more use than a probably more correct estimate of 49% ± 4%.