
3 Statistical effects of sampling and weighting

3.3 Weighting effect and calibrated sample size

3.3.7 Utility of weighting

Having chosen and implemented a weighting procedure for a given survey, it is reasonable to ask whether the weighting is ‘useful’ or not. In other words, is there any ‘gain’ for survey estimates when the weighting is introduced? Furthermore, it would be even more beneficial to have not just a simple ‘yes/no’ answer but a mathematical tool that would measure how useful the weighting is, a measure of the utility of weighting. It is clear that the utility should be measured for each estimate individually, similarly to the design effect, because different estimates are affected by weighting in different ways. In this subsection, we introduce one such measure of utility which is easy to compute and which could be used in practice.

There are two main factors that should, in our opinion, be reflected in the definition of any measure of weighting utility. The first factor is the difference between an unweighted survey estimate and the same estimate after weighting. The utility should then increase or decrease with this difference: the bigger the difference, the greater the utility of weighting is presumed to be. It may happen, of course, that the difference is high simply because the weighting is ‘wrong’ for this estimate. However, the assumption here is that the weighting was done in the best possible way so that weighted estimates are assumed to be more accurate than unweighted ones. In fact they represent our best estimates, based on the sample data.

These arguments can be illustrated by a simple example. Assume, for instance, that we have a disproportionately stratified random sample. In this sample, the stratification weights will have a big impact on estimates that vary widely between strata. For these estimates, therefore, weighting is very beneficial because unweighted estimates could be very wrong. If, however, a variable is constant or almost constant across all strata, its estimates will not be strongly affected by weighting and so the corresponding weighting utility ought to be low.

The second factor to be included in any measure of weighting utility is related to the difference between the final confidence limits and the confidence limits from an unweighted sample. In other words, one should assess how much ‘wider’ a confidence interval of an estimate becomes once weighting has been taken into account. The most natural thing to consider would be, perhaps, the difference between the standard errors (the final error less the unweighted-sample error). However, a standard error reflects all sampling and weighting effects, not just the effects of weighting. Therefore, if we are interested solely in weighting, it is more appropriate to talk in terms of calibrated errors (see section 3.3.2 for details) rather than standard errors (although for a simple random sample, the calibrated error is in fact the same as the standard error). In other words, we consider the difference between the final calibrated error and the calibrated error that would come from an unweighted sample. This factor, therefore, works in the opposite direction: if the difference between calibrated errors is increasing, the utility of weighting should go down and if the difference is decreasing, the utility should go up.

Based on these arguments, the authors propose the following measure of utility of weighting. Let $x_1, \ldots, x_n$ be observations of variable $x$ in a sample with $n$ respondents. Denote by $\bar{x}$ the final, weighted estimate of the mean value of $x$:

$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},$$

where $w_i$ is the weight of respondent $i$. On the other hand, denote by $\bar{x}_{\mathrm{unw}}$ the original unweighted estimate of the mean value:

$$\bar{x}_{\mathrm{unw}} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

Then the utility of weighting for estimate $\bar{x}$ (notation $U(\bar{x})$) is defined as the ratio of the difference between estimates to the difference between calibrated errors:

$$U(\bar{x}) = \frac{|\bar{x} - \bar{x}_{\mathrm{unw}}|}{|\mathrm{c.e.}(\bar{x}) - \mathrm{c.e.}(\bar{x}_{\mathrm{unw}})|}. \tag{3.27}$$

Recall that the calibrated error is the simple random sample error times the square root of the weighting effect. Therefore, the calibrated error of the unweighted sample estimate $\bar{x}_{\mathrm{unw}}$ is simply the standard error $S/\sqrt{n}$ (because the weighting effect is 1.0), where $S$ is the standard deviation of variable $x$. On the other hand, the final calibrated error of $\bar{x}$ is defined by formula (3.24) so that the utility can be expressed as

$$U(\bar{x}) = \frac{\sqrt{n}\,|\bar{x} - \bar{x}_{\mathrm{unw}}|}{S\,(\sqrt{\mathrm{WE}} - 1)},$$

where WE is the weighting effect.
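The expression above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the helper takes $S$ to be the unweighted sample standard deviation with divisor $n - 1$, and it computes the weighting effect as $n \sum w_i^2 / (\sum w_i)^2$. Neither choice is confirmed by this subsection, so the definitions from section 3.3.2 and formula (3.24) should be substituted if they differ.

```python
import math


def weighting_utility(x_bar, x_bar_unw, s, n, we):
    """Utility U = sqrt(n) * |x_bar - x_bar_unw| / (S * (sqrt(WE) - 1))."""
    if s <= 0 or we <= 1.0:
        # The ratio is undefined when S = 0 or when there is no weighting effect.
        raise ValueError("need s > 0 and we > 1.0")
    return math.sqrt(n) * abs(x_bar - x_bar_unw) / (s * (math.sqrt(we) - 1.0))


def utility_from_sample(x, w):
    """Compute U for a mean directly from observations x and weights w."""
    n = len(x)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)  # weighted mean
    x_bar_unw = sum(x) / n                                 # unweighted mean
    # ASSUMPTION: S is the unweighted sample standard deviation (divisor n - 1).
    s = math.sqrt(sum((xi - x_bar_unw) ** 2 for xi in x) / (n - 1))
    # ASSUMPTION: WE = n * sum(w^2) / (sum(w))^2; replace with the definition
    # from section 3.3.2 if it differs.
    we = n * sum(wi ** 2 for wi in w) / sum(w) ** 2
    return weighting_utility(x_bar, x_bar_unw, s, n, we)
```

Passing the summary quantities explicitly through `weighting_utility` keeps the formula itself separate from the assumptions about how $S$ and WE are obtained.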

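In the case of proportion estimates, the standard deviation $S$ is computed as $\sqrt{\hat{p}(1 - \hat{p})}$ so that in this case we obtain

$$U(\hat{p}) = \frac{|\hat{p} - \hat{p}_{\mathrm{unw}}|}{|\mathrm{c.e.}(\hat{p}) - \mathrm{c.e.}(\hat{p}_{\mathrm{unw}})|} = \frac{\sqrt{n}\,|\hat{p} - \hat{p}_{\mathrm{unw}}|}{\sqrt{\hat{p}(1 - \hat{p})}\,(\sqrt{\mathrm{WE}} - 1)}. \tag{3.28}$$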
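Formula (3.28) needs only four numbers, so it is easy to evaluate directly. The following is a minimal sketch; the function name `proportion_utility` is hypothetical and the inputs are chosen purely for illustration.

```python
import math


def proportion_utility(p_hat, p_hat_unw, n, we):
    """Utility of weighting for a proportion, following formula (3.28)."""
    if not 0.0 < p_hat < 1.0:
        raise ValueError("p_hat must lie strictly between 0 and 1")
    if we <= 1.0:
        raise ValueError("we must exceed 1.0, otherwise the denominator vanishes")
    s = math.sqrt(p_hat * (1.0 - p_hat))  # standard deviation of a 0/1 variable
    return math.sqrt(n) * abs(p_hat - p_hat_unw) / (s * (math.sqrt(we) - 1.0))


# Weighted estimate 0.10, unweighted estimate 0.05, n = 100, WE = 2.0
print(round(proportion_utility(0.10, 0.05, n=100, we=2.0), 3))  # 4.024
```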

The utility of weighting depends, like the standard error, on the square root of the sample size: the higher n, the higher the utility (which makes sense because in this case the confidence interval becomes narrower).

It is also clear that if utilities of weighting for several estimates from the same sample and over the same subset are to be compared, the sample size $n$ and weighting effect WE will be constant. Therefore, the relative difference between utilities in this case will depend only on the term $|\bar{x} - \bar{x}_{\mathrm{unw}}|/S$ (or $|\hat{p} - \hat{p}_{\mathrm{unw}}|/\sqrt{\hat{p}(1 - \hat{p})}$ for proportions).

What are ‘high’ and ‘low’ values for the utility? There is no ‘mathematical’ answer to this question, and an interpretation of utility values depends on the particular sample design. One fairly obvious criterion would be that the change in calibrated errors (see section 3.3.2) should not exceed the change in estimates so that utility values less than 1.0 should be considered as very ‘low’. In other words, the conclusion in the case of a utility value less than 1.0 would be that the weighting produces more loss than gain, and if it happens for many estimates, the whole weighting procedure should perhaps be revised.

To illustrate this, consider, for example, the case when the weighting effect is 2.0 and the sample contains 100 respondents. Table 3.7 gives utility values for several selected values of a proportion estimate $\hat{p}$ and the difference $|\hat{p} - \hat{p}_{\mathrm{unw}}|$.


Table 3.7. Utility of weighting for n = 100 and WE = 2.0

                                    |p̂ − p̂unw|
p̂       0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09    0.10
0.01   2.426   4.853   7.279   9.706  12.132  14.558  16.985  19.411  21.837  24.264
0.02   1.724   3.449   5.173   6.898   8.622  10.347  12.071  13.796  15.520  17.244
0.03   1.415   2.830   4.246   5.661   7.076   8.491   9.907  11.322  12.737  14.152
0.04   1.232   2.464   3.696   4.928   6.160   7.392   8.624   9.856  11.088  12.320
0.05   1.108   2.215   3.323   4.431   5.539   6.646   7.754   8.862   9.969  11.077
0.06   1.017   2.033   3.050   4.066   5.083   6.099   7.116   8.133   9.149  10.166
0.07   0.946   1.892   2.839   3.785   4.731   5.677   6.623   7.570   8.516   9.462
0.08   0.890   1.780   2.670   3.560   4.449   5.339   6.229   7.119   8.009   8.899
0.09   0.844   1.687   2.531   3.374   4.218   5.062   5.905   6.749   7.592   8.436
0.10   0.805   1.609   2.414   3.219   4.024   4.828   5.633   6.438   7.243   8.047
0.11   0.772   1.543   2.315   3.086   3.858   4.630   5.401   6.173   6.944   7.716
0.12   0.743   1.486   2.229   2.972   3.715   4.458   5.200   5.943   6.686   7.429
0.13   0.718   1.436   2.154   2.871   3.589   4.307   5.025   5.743   6.461   7.179
0.14   0.696   1.392   2.087   2.783   3.479   4.175   4.870   5.566   6.262   6.958
0.15   0.676   1.352   2.028   2.704   3.381   4.057   4.733   5.409   6.085   6.761
0.20   0.604   1.207   1.811   2.414   3.018   3.621   4.225   4.828   5.432   6.036
0.25   0.558   1.115   1.673   2.230   2.788   3.345   3.903   4.460   5.018   5.575
0.30   0.527   1.054   1.580   2.107   2.634   3.161   3.688   4.215   4.741   5.268
0.35   0.506   1.012   1.518   2.025   2.531   3.037   3.543   4.049   4.555   5.062
0.40   0.493   0.986   1.478   1.971   2.464   2.957   3.450   3.942   4.435   4.928
0.45   0.485   0.971   1.456   1.941   2.426   2.912   3.397   3.882   4.367   4.853
0.50   0.483   0.966   1.449   1.931   2.414   2.897   3.380   3.863   4.346   4.828

For other values of $n$ and WE, multiply the figures in Table 3.7 by $0.0414\sqrt{n}/(\sqrt{\mathrm{WE}} - 1)$.
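As a quick check on where the constant 0.0414 comes from, and assuming nothing beyond formula (3.28), the rescaling factor is the ratio of the $n$- and WE-dependent part of the utility to its value at the tabulated settings $n = 100$ and WE $= 2.0$:

```latex
\[
\frac{\sqrt{n}\,/\,(\sqrt{\mathrm{WE}} - 1)}{\sqrt{100}\,/\,(\sqrt{2} - 1)}
  = \frac{\sqrt{2} - 1}{10} \cdot \frac{\sqrt{n}}{\sqrt{\mathrm{WE}} - 1}
  \approx \frac{0.0414\,\sqrt{n}}{\sqrt{\mathrm{WE}} - 1},
\qquad
\frac{\sqrt{2} - 1}{10} \approx 0.04142 .
\]
```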

Thus if the sample size were 1000 and the weighting effect the same, all the above utility figures would be increased by a factor of over three ($\sqrt{10}$). It must be borne in mind, of course, that where the weighting effect is close to 1.0 the changes in estimates induced by the weighting will probably be small.

It turns out that there is a close relationship between utility of weighting and calibrated confidence limits. The word ‘calibrated’ in this context means that the standard error is replaced by the calibrated error. For example, the 95% calibrated confidence limits for $\bar{x}$ are defined as $\bar{x} \pm 1.96\,\mathrm{c.e.}(\bar{x})$. These calibrated limits can be compared with the original ‘unweighted sample’ confidence limits $\bar{x}_{\mathrm{unw}} \pm 1.96\,\mathrm{c.e.}(\bar{x}_{\mathrm{unw}})$, and this comparison is related to the utility of weighting. More precisely, denote, for simplicity, by $e$ the calibrated error of $\bar{x}$ and by $e_{\mathrm{unw}}$ the calibrated error of $\bar{x}_{\mathrm{unw}}$ (so that $U(\bar{x}) = |\bar{x} - \bar{x}_{\mathrm{unw}}|/(e - e_{\mathrm{unw}})$). Then we have the following result (see Appendix E for a proof).

Proposition 3.2 Let $k$ be a positive real number. Using the notations above, the statement $U(\bar{x}) \le k$ is equivalent to the statement that the interval $[\bar{x} - ke,\ \bar{x} + ke]$ contains the interval $[\bar{x}_{\mathrm{unw}} - k e_{\mathrm{unw}},\ \bar{x}_{\mathrm{unw}} + k e_{\mathrm{unw}}]$.

In particular, the equality $U(\bar{x}) = k$ implies that either the left ends or the right ends of the two intervals are the same.
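The equivalence in Proposition 3.2 can be checked numerically on a single example. The sketch below is illustrative only: the function names and the input figures are hypothetical, and it assumes that the weighted calibrated error exceeds the unweighted one (weighting effect above 1.0), so that the denominator is positive.

```python
def utility(x_bar, x_bar_unw, e, e_unw):
    """U = |x_bar - x_bar_unw| / (e - e_unw); assumes e > e_unw."""
    return abs(x_bar - x_bar_unw) / (e - e_unw)


def weighted_interval_contains_unweighted(x_bar, x_bar_unw, e, e_unw, k):
    """True if [x_bar - k*e, x_bar + k*e] contains the unweighted interval."""
    left_ok = x_bar - k * e <= x_bar_unw - k * e_unw
    right_ok = x_bar_unw + k * e_unw <= x_bar + k * e
    return left_ok and right_ok


# Hypothetical figures: estimates 0.30 and 0.25, calibrated errors 0.06 and 0.04.
x_bar, x_bar_unw, e, e_unw = 0.30, 0.25, 0.06, 0.04
u = utility(x_bar, x_bar_unw, e, e_unw)  # about 2.5

for k in (1.0, 1.96, 2.4, 2.6, 3.0):
    contained = weighted_interval_contains_unweighted(x_bar, x_bar_unw, e, e_unw, k)
    # Proposition 3.2: U <= k exactly when the weighted interval contains the other.
    print(k, u <= k, contained)
```

In this example the utility is about 2.5, so the containment fails at $k = 1.96$ and holds at $k = 3.0$. Values of $k$ equal to the utility itself are avoided because floating-point rounding makes the boundary case unreliable.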

This result allows us to say, for example, that utility values less than 1.96 can still be considered as too low: a value less than 1.96 means that the 95% ‘calibrated’ confidence limits for $\bar{x}$ will contain the ‘original’ 95% confidence limits for $\bar{x}_{\mathrm{unw}}$. This means in fact that our ‘confidence’ in the estimate has decreased so that the weighting for this estimate was not ‘useful’.

But even 1.96 is only a lower bound for ‘low’ values and it is up to the user to decide which utility values are ‘low’ for a particular sample. The critical utility value is equal to the z-score associated with the confidence interval the user has in mind. Besides, the utility function is only an estimate and subject to its own sampling error.

Finally, we indicate several situations where calculation of utility of weighting could be beneficial.

• As was discussed earlier, the utility of weighting can be used to identify variables for which weighting does not have any gain. If the sample size is small, there could be many estimates with a low utility of weighting (because standard errors are relatively high). If this is the case, it would be worthwhile to revise the weighting procedure to improve the overall utility of weighting.

• Formulae (3.27) and (3.28) can be applied to find out which estimates are most affected by weighting. This could produce a useful insight into the sample design and perhaps even indicate how the weighting procedure could be improved.

• ‘Weighting performance’ can be compared for various groups of variables by computing the average utility of weighting for each group. For instance, the overall average weighting utility can be compared with the average utility among key variables. If the latter figure is much lower than the overall average utility of weighting, it is a good reason for concern about sample design. If there are expectations that one group of variables should be affected by weighting more than another group, this kind of analysis could be especially beneficial.

• Average weighting utility figures (either across all variables or across groups of variables) can be compared with similar figures from another sample of a similar size and design.
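The group comparisons listed above amount to simple averaging once a utility has been computed for each estimate. A minimal sketch follows. The function names, the estimate names and the utility values are all hypothetical and serve only as illustration, and the default threshold of 1.96 is the 95% lower bound discussed earlier, which the user may wish to change.

```python
from statistics import mean


def average_utility_by_group(utilities, groups):
    """Average the per-estimate utilities within each named group of estimates."""
    return {
        group: mean([utilities[name] for name in names])
        for group, names in groups.items()
    }


def low_utility_estimates(utilities, threshold=1.96):
    """Names of estimates whose utility falls below the chosen threshold."""
    return sorted(name for name, u in utilities.items() if u < threshold)


# Hypothetical utilities, for illustration only.
utilities = {"age_mean": 4.0, "income_mean": 3.0, "owns_car": 1.0, "reads_daily": 2.0}
groups = {"all": list(utilities), "key": ["owns_car", "reads_daily"]}

averages = average_utility_by_group(utilities, groups)
flagged = low_utility_estimates(utilities)
print(averages)
print(flagged)
```

Here the ‘key’ group averages 1.5 against an overall average of 2.5, the kind of gap that the third point above treats as a reason for concern.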

An example of low utility of weighting is given by Ward et al. [59] in their examination of the Politz–Simmons ‘not-at-home’ correction method (see [45]), where they found that a severe erosion of precision through the use of weights with high variance was accompanied by only small changes in most estimates.


In document Statistics for Real-Life Sample Surveys (Page 132-137)