3 Statistical effects of sampling and weighting
3.2 Design effect and effective sample size
Design effect (notation DE) is defined as the ratio of the variance of an estimate derived from a survey to the variance of an estimate of the same measure based on a simple random sample of the same size. The actual variance is therefore the variance of a simple random sample estimate multiplied by the design effect. Hence, the standard error of an estimate is the standard error of a similar estimate from a simple random sample of the same size multiplied by the square root of the design effect. Effective sample size (notation $n_e$) is the original unweighted sample size $n$ divided by the design effect. In other words, it is the size of a ‘simple random’ sample that would yield an estimate with equivalent variance.
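The three definitions can be put side by side in a few lines of Python. This is a minimal sketch: the function name `design_effect_summary` and the illustrative numbers (a proportion of 0.3 from 1000 interviews, with the design assumed to double the simple-random-sampling variance) are hypothetical and chosen only to show the arithmetic.

```python
import math


def design_effect_summary(var_actual: float, var_srs: float, n: int) -> dict:
    """Design effect, standard-error inflation factor and effective sample size.

    var_actual -- variance of the estimate under the actual survey design
    var_srs    -- variance of the same estimate under simple random sampling
                  of the same unweighted size n
    """
    if var_srs <= 0:
        # e.g. a proportion estimate of exactly 0 or 1: DE is not defined
        raise ValueError("design effect is undefined when the SRS variance is zero")
    de = var_actual / var_srs                 # DE = V / V_ran
    return {
        "design_effect": de,
        "se_factor": math.sqrt(de),           # s.e. = s.e.(SRS) * sqrt(DE)
        "effective_n": n / de,                # n_e = n / DE
    }


# Hypothetical illustration: a proportion of 0.3 estimated from 1000 interviews,
# with the design assumed to double the simple-random-sampling variance.
p, n = 0.3, 1000
var_srs = p * (1 - p) / n
summary = design_effect_summary(2 * var_srs, var_srs, n)
print(summary["design_effect"], summary["effective_n"])   # 2.0 500.0
```

A design effect of 2 thus halves the effective sample size and inflates the standard error by a factor of about 1.41.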
These definitions may seem a trifle artificial but there is a simple and natural logic behind them. To illustrate this, denote by $x$ a variable we measure and by $\bar{x}$ an estimate of the mean value of $x$. The letters $V$ and $V_{\mathrm{ran}}$ denote the actual variance and the variance
for simple random sampling, respectively. It is well known that for a simple random sample there is the following relationship:
$$V_{\mathrm{ran}}(\bar{x}) = \frac{V(x)}{n}.$$
Therefore, the idea for a general case is to replace $n$ by something else so that the same formula can be applied. It is then natural to call this substitute for $n$ an ‘effective sample size’. It can easily be seen that the definitions above agree with this idea. Indeed, by definition $\mathrm{DE} = V(\bar{x})/V_{\mathrm{ran}}(\bar{x})$ and, from the relationship above, $V(x) = n V_{\mathrm{ran}}(\bar{x})$, so that
$$V(\bar{x}) = \frac{V(x)}{V(x)/V(\bar{x})} = \frac{V(x)}{n V_{\mathrm{ran}}(\bar{x})/V(\bar{x})} = \frac{V(x)}{n/\mathrm{DE}} = \frac{V(x)}{n_e}.$$
Summarised below are the most important general properties of the design effect.
• A design effect applies to a specific attribute among a specific subset of the sample, not to the survey, to a question within the survey, or to the sample as a whole. It is customary to carry out design effect calculations for selected key indicators to give an indication of the general level of design effect that should be factored into any interpretation of the results.
• If the design effect is less than one, the effective sample size will be greater than the initial sample size, which means that simple random sampling would require a greater sample size to achieve the same precision. This seldom happens in practice, and typical values of the design effect for a complex survey are from 1.5 to 3.5.
• For an estimate of a proportion, the design effect is symmetric, that is, it will be the same as for the estimate’s complement. If the proportion estimate is zero or one, neither the design effect nor the effective sample size is defined (because both variances in the design effect definition will then be zero).
• Clustering and weighting practically always increase the design effect, while stratification, if used intelligently, may decrease it.
• Design effect does not depend on the scale of the weighting factors, so it stays the same when all weights are multiplied or divided by a constant.
• Design effect depends much more on the complexity of the sample design than on the complexity of an estimate.
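The scale-invariance property, and the tendency of weighting to push the design effect above one, can be checked numerically with Kish's well-known approximation for the design effect due to unequal weighting alone, $n\sum w^2/(\sum w)^2$. That approximation is not introduced in this section and is used here only as an illustrative assumption; the function name and the weights are hypothetical.

```python
def kish_weighting_effect(weights: list[float]) -> float:
    """Kish's approximate design effect from unequal weighting alone:
    n * sum(w^2) / (sum(w))^2. Clustering and stratification are ignored."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


weights = [0.5, 1.0, 1.0, 2.0, 3.5]            # hypothetical weights
base = kish_weighting_effect(weights)
scaled = kish_weighting_effect([10 * w for w in weights])

print(round(base, 4), round(scaled, 4))        # same value: the scale drops out
print(kish_weighting_effect([1.0] * 5))        # equal weights give exactly 1.0
```

Multiplying every weight by a constant multiplies numerator and denominator by its square, so the ratio is unchanged; by the Cauchy-Schwarz inequality the quantity is also never below one.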
When two population groups are combined, the combined design effect can be greater or less than those of the individual groups. When the two group estimates are similar, disproportionate sampling will usually markedly increase the design effect for the combined group. If the estimates are different, an optimum sampling allocation (which minimises the overall variance) for the two groups will usually help to reduce the combined design effect. If the allocation is far from optimum, the combined effect can again go up. For instance, if we have two equal population groups within which the incidence of an attribute is 50% and 1%, respectively, then sampling 100 of 1000 respondents from the first group and 900 from the second will produce a high combined design effect. To minimise the combined effect, clearly more respondents should be allocated to the first group, because for that proportion the first group contributes more to the combined variance than the second group. It must be remembered, however, that an allocation that is optimal for one estimate may be less beneficial, or even harmful, for other estimates in the same survey.
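The two-group example can be checked with a short calculation. This sketch assumes independent simple random sampling within each group, the variance $p(1-p)/n$ without finite-population correction, population shares of 0.5 each and equal interview costs. For comparison it also computes the equal-cost minimum-variance (Neyman) allocation, $n_s \propto \lambda_s\sqrt{p_s(1-p_s)}$, which is an addition for illustration; the function name is hypothetical.

```python
import math


def combined_design_effect(shares, props, sizes):
    """Design effect of the combined proportion sum(lambda_s * p_s) when each
    group is an independent simple random sample of the given size."""
    p = sum(lam * q for lam, q in zip(shares, props))
    var_strat = sum(lam * lam * q * (1 - q) / n
                    for lam, q, n in zip(shares, props, sizes))
    var_srs = p * (1 - p) / sum(sizes)        # SRS of the same total size
    return var_strat / var_srs


shares = [0.5, 0.5]        # two equal population groups
props = [0.50, 0.01]       # incidence of the attribute in each group
total = 1000

# Allocation from the text: 100 interviews in group 1, 900 in group 2.
de_text = combined_design_effect(shares, props, [100, 900])

# Equal-cost minimum-variance (Neyman) allocation: n_s proportional to
# lambda_s * sqrt(p_s * (1 - p_s)).
s = [lam * math.sqrt(q * (1 - q)) for lam, q in zip(shares, props)]
neyman = [total * x / sum(s) for x in s]
de_neyman = combined_design_effect(shares, props, neyman)

print(f"100/900 split: DE = {de_text:.2f}")                            # 3.30
print(f"{neyman[0]:.0f}/{neyman[1]:.0f} split: DE = {de_neyman:.2f}")  # 834/166, 0.47
```

Under these assumptions the 100/900 split gives a combined design effect of about 3.3, while moving most interviews to the high-variance group brings it below one.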
Notice that a small design effect does not necessarily mean that the standard error will be small. For instance, for an estimate that is a ratio of ratios the standard error may already be high under simple random sampling. Even a ‘good’ design effect of, say, 1.1 will then increase that already high standard error further (by a factor of $\sqrt{1.1} \approx 1.05$).
3.2.1 Effects of stratification and clustering
Stratification normally has a beneficial influence on the design effect when it is used to ensure that the strata are sampled in proportion to their populations and there is a difference between strata in the incidence of the attribute being measured. Where strata differ in this way or in their (internal) variability or in the cost of fieldwork it is possible to optimise the allocation of interviews between strata, for instance to obtain the greatest overall precision within a given budget.
One example of an allocation with a minimum variance within a given budget is given by Cochran [9], theorem 5.6. The cost $C$ is supposed to be a linear function: $C = c_0 + \sum_s c_s n_s$, where $n_s$ is the sample size in stratum $s$, $c_0$ a fixed overhead cost and $c_s$ the per-interview marginal cost in stratum $s$. The result is that, in the case of simple random sampling in each stratum, the variance is a minimum for a specified cost (and the cost is a minimum for a specified variance) if $n_s$ is proportional to $\lambda_s\sqrt{V_s/c_s}$, where $\lambda_s$ is the population proportion of stratum $s$ and $V_s$ is the variance of the variable in stratum $s$. An analysis of the proof of this result in [9] shows, however, that the proof remains the same even in the general case if we introduce the effective sample size and multiply the strata variances by the corresponding design effects. We summarise this in the following proposition.
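Because the result fixes $n_s$ only up to a constant, it may help to see that constant written out. Substituting $n_s = k\lambda_s\sqrt{V_s/c_s}$ into the linear cost function gives the following; this is a direct consequence of the formulas above rather than a statement quoted from [9].

```latex
n_s = k\,\lambda_s\sqrt{V_s/c_s}
\;\Rightarrow\;
C - c_0 = \sum_s c_s n_s = k\sum_s \lambda_s\sqrt{V_s c_s}
\;\Rightarrow\;
n_s = \frac{(C - c_0)\,\lambda_s\sqrt{V_s/c_s}}{\sum_t \lambda_t\sqrt{V_t c_t}} .
```

Strata with a larger population share or a larger variance receive more interviews, and expensive strata receive fewer, in proportion to $1/\sqrt{c_s}$.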
Proposition 3.1 Let $\bar{x}_s$ be a mean estimate in stratum $s$ and $\bar{x} = \sum_s \lambda_s \bar{x}_s$ be the total mean estimate, where $\lambda_s$ is the population proportion of stratum $s$. Let $V_s$ be the variance of $x$ in stratum $s$, $\mathrm{DE}_s$ the design effect for $\bar{x}_s$ and $C$ a linear cost function: $C = c_0 + \sum_s c_s n_s$. Then, if the sample size $n_s$ in stratum $s$ is proportional to $\lambda_s\sqrt{V_s \mathrm{DE}_s/c_s}$, the variance is a minimum for a specified cost and the cost is a minimum for a specified variance.
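At the planning stage the proposition can be turned into concrete stratum sizes for a given budget. The Python sketch below does this; the function name `optimal_allocation` and all input values (three strata with their shares, anticipated variances, expected design effects and per-interview costs) are hypothetical planning guesses, not figures from the text.

```python
import math


def optimal_allocation(budget, fixed_cost, shares, variances, costs, deffs=None):
    """Stratum sample sizes following Proposition 3.1:
    n_s proportional to lambda_s * sqrt(V_s * DE_s / c_s), scaled so that
    c_0 + sum(c_s * n_s) equals the budget. With every DE_s = 1 this is the
    simple-random-sampling allocation of Cochran's theorem 5.6."""
    if deffs is None:
        deffs = [1.0] * len(shares)
    raw = [lam * math.sqrt(v * d / c)
           for lam, v, d, c in zip(shares, variances, deffs, costs)]
    # Scale the proportional sizes so the variable cost uses the whole budget.
    k = (budget - fixed_cost) / sum(c * r for c, r in zip(costs, raw))
    return [k * r for r in raw]


# Hypothetical planning inputs for three strata (anticipated, not observed).
shares = [0.5, 0.3, 0.2]          # lambda_s, population proportions
variances = [0.25, 0.16, 0.09]    # V_s, e.g. p(1 - p) for p = 0.5, 0.2, 0.1
deffs = [1.5, 1.2, 2.0]           # DE_s, expected within-stratum design effects
costs = [20.0, 30.0, 60.0]        # c_s, marginal cost per interview
budget, fixed_cost = 30000.0, 5000.0

sizes = optimal_allocation(budget, fixed_cost, shares, variances, costs, deffs)
spent = fixed_cost + sum(c * n for c, n in zip(costs, sizes))
print([round(n) for n in sizes], round(spent, 2))
```

The sizes are left unrounded; in practice they would be rounded to whole interviews and checked against any minimum stratum size, which moves the total cost slightly away from the budget.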
Disproportionate stratification
The variance of x and the design effects are, of course, not known before a survey is conducted. Therefore, expected values must be used when trying to optimise the sample size at the planning stage.
Where sampling is not proportionate to population, and is not disproportionally allocated to maximise the benefits in terms of the known or anticipated distribution of the measured attributes across the strata, then it is likely to be disadvantageous in terms of the precision of total-sample estimates. This is not necessarily a bad thing: it depends on the priorities. Surveys are generally carried out with a range of
objectives in view and some of these may benefit from a particular stratification plan while others may not. This can be illustrated by a simple and extreme hypothetical example.
Example 3.1 Suppose we conduct a survey to determine what proportion of the adult population in Australia has tertiary qualifications, and how different the Australian Capital Territory (ACT) is in this respect from the rest of the country. We take a pure and perfectly executed simple random sample of 1000 interviews spread across the nation, 16 (1.6%) of which are in the ACT (i.e. in due proportion to population). This gives us a reasonable estimate (say 29% incidence with a standard error of 1.4%) for total Australia, but our estimate for the ACT by itself, whatever it might reasonably be, is obviously very wobbly and for practical purposes virtually useless. The relative standard error of the estimated difference between the ACT and the rest is so high as to make the second objective unachievable.
If the dominant objective of the survey had been to measure the difference between the ACT and the rest of Australia (with the same budget), we might have taken two similar simple random samples of 500 interviews in the ACT and 500 in the rest of Australia. This would now give us two estimates of comparable precision for the ACT and the rest of Australia. Assume these to be 47.1% (s.e. 2.23%) for ACT and 28.7% (s.e. 2.02%) for the rest of Australia: because these are true (or more correctly fictitious!) simple random samples, the design effect of each is 1.0. We can now be confident that we have a reasonably precise measure of the difference (18.4 percentage points), which has a standard error of $\sqrt{2.23^2 + 2.02^2} = 3.01\%$. Disproportionate sampling has helped us to maximise the sensitivity of the comparison. (If we had anticipated the higher incidence in ACT we could have optimised the division of the 1000 interviews between it and the rest of Australia by allocating the numbers such that we got two estimates with equal standard errors, but the extra precision of the ‘standard error of the difference’ would have been very small.)
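The standard errors quoted in the example can be reproduced from the two fictitious estimates. This sketch assumes independent simple random samples and the variance $p(1-p)/n$ without finite-population correction; the helper name `srs_se` is hypothetical. As an extra comparison, not part of the example, it also computes the split of 1000 interviews that minimises the variance of the difference, $n_s \propto \sqrt{p_s(1-p_s)}$, which follows from minimising $a/n_1 + b/(1000 - n_1)$.

```python
import math


def srs_se(p: float, n: int) -> float:
    """Standard error of a proportion from a simple random sample of size n
    (no finite-population correction)."""
    return math.sqrt(p * (1 - p) / n)


p_act, p_rest = 0.471, 0.287     # estimates quoted in Example 3.1
se_act = srs_se(p_act, 500)
se_rest = srs_se(p_rest, 500)
# Independent samples: the variances of the two estimates add.
se_diff = math.sqrt(se_act**2 + se_rest**2)
print(f"ACT {se_act:.2%}, rest {se_rest:.2%}, difference {se_diff:.2%}")

# Comparison only: the split of 1000 interviews minimising the variance of
# the difference, n_s proportional to sqrt(p_s * (1 - p_s)).
a, b = p_act * (1 - p_act), p_rest * (1 - p_rest)
n_act = 1000 * math.sqrt(a) / (math.sqrt(a) + math.sqrt(b))
se_diff_opt = math.sqrt(a / n_act + b / (1000 - n_act))
print(f"split {n_act:.0f}/{1000 - n_act:.0f}: difference {se_diff_opt:.2%}")
```

Both allocations give a standard error of the difference of about 3.01%, which illustrates how little is gained by fine-tuning the split.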
However, a national estimate synthesised from the two samples must be dominated by the ‘rest of Australia’ component, which makes up 98.4% of the final weighted estimate, at the expense of the ACT component, which makes up 1.6% of the estimate. This means that the precision of the national estimate is governed by the precision of the ‘rest of Australia’ estimate, with the 500 ACT interviews contributing little influence, much less than their raw proportion and their share of the overall cost. In fact, the national estimate is $0.016 \cdot 47.1\% + 0.984 \cdot 28.7\% = 29.0\%$ with standard error $\sqrt{0.016^2 \cdot 2.23^2 + 0.984^2 \cdot 2.02^2} = 1.99\%$, which is only slightly less than the standard error of the dominant ‘rest of Australia’ component. The standard error for simple random sampling is
$$\sqrt{\frac{0.29 \cdot 0.71}{1000}} \cdot 100\% = 1.43\%.$$
Therefore, the design effect is $1.99^2/1.43^2 \approx 1.93$ (using the unrounded standard errors), so that the national estimate is only as precise as one from a simple random sample of $1000/1.93 \approx 518$ respondents (the effective sample size). Combining strata has therefore increased the design effect for the combined estimate. The disproportionate number of interviews in the ACT contributes very little to the precision of the estimate, but in this case the precision of the national estimate was legitimately subordinated to the precision of the difference. In the end it all comes down to priorities.
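The national figures can be recomputed in the same way. This sketch assumes the population shares 0.016 and 0.984 used in the example, independent simple random samples of 500 in each stratum, and the variance $p(1-p)/n$ without finite-population correction. Carrying unrounded intermediate values through gives a design effect of about 1.93 and an effective sample size of about 519, so results obtained by dividing by a rounded design effect differ slightly.

```python
import math

lam_act, lam_rest = 0.016, 0.984     # population shares used in Example 3.1
p_act, p_rest = 0.471, 0.287         # stratum estimates (n = 500 each)
n_act = n_rest = 500

var_act = p_act * (1 - p_act) / n_act
var_rest = p_rest * (1 - p_rest) / n_rest

# Stratified (population-weighted) national estimate and its variance.
p_nat = lam_act * p_act + lam_rest * p_rest
var_nat = lam_act**2 * var_act + lam_rest**2 * var_rest

# Variance of a simple random sample of the same total size.
n_total = n_act + n_rest
var_srs = p_nat * (1 - p_nat) / n_total

design_effect = var_nat / var_srs
effective_n = n_total / design_effect
print(f"estimate {p_nat:.1%}, s.e. {math.sqrt(var_nat):.2%}, "
      f"SRS s.e. {math.sqrt(var_srs):.2%}, DE {design_effect:.2f}, "
      f"n_e {effective_n:.0f}")
```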
An extreme case
As a demonstration of the lengths to which this can be taken, consider the extreme case where the attribute being measured is ‘membership of stratum s’. The incidence of this attribute is, of course, 100% within stratum s and zero outside it. The variance of each of these components is zero and therefore the variance of the combined estimate is zero. As the estimate is entirely dependent on the estimate of the proportion of the population that is in stratum s, and as this is fixed by the sample design, any number of samples drawn using this design will yield exactly the same estimate. The design effect is therefore zero.
This is an example of an attribute that a survey is not designed to measure but that is nevertheless likely to be reported in the course of analysis and illustrates the extreme effect that stratification can have. It points to the frequently overlooked fact that any estimate of an attribute that is defined wholly in terms of strata, whether these are strata proper or post-stratification groupings, must have zero variance and hence, of course, zero standard error.
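The extreme case is easy to confirm numerically. This sketch assumes independent simple random sampling within two strata (stratum $s$ and everything else); the shares, the sample sizes and the helper name `stratified_variance` are hypothetical.

```python
def stratified_variance(shares, props, sizes):
    """Variance of sum(lambda_s * p_s) with independent SRS in each stratum."""
    return sum(lam * lam * p * (1 - p) / n for lam, p, n in zip(shares, props, sizes))


shares = [0.2, 0.8]           # hypothetical population shares: stratum s, the rest
sizes = [300, 700]            # hypothetical interviews per stratum
membership = [1.0, 0.0]       # attribute 'member of stratum s': 100% inside, 0% outside

var_strat = stratified_variance(shares, membership, sizes)
p = sum(lam * q for lam, q in zip(shares, membership))    # equals the share of s
var_srs = p * (1 - p) / sum(sizes)                        # nonzero under SRS
print(var_strat, var_strat / var_srs)                     # 0.0 0.0
```

Every stratum term contains $p_s(1-p_s) = 0$, so the stratified variance and hence the design effect are exactly zero, whatever shares and sample sizes are chosen.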
There is, in fact, a general formula that allows us to calculate the total design effect from strata design effects (see Appendix E). The formula can be applied not only for stratification but, more generally, when several independent estimates are combined into one estimate. For simplicity, we consider only proportion estimates.