The electoral register as a sampling frame




Kate Foster

1. Introduction

The Postcode Address File (PAF) and the electoral register (ER) are the most complete and accessible national frames of residential addresses in Great Britain and both are extensively used for drawing random samples for general population surveys. Although an ideal sampling frame would cover the whole population of interest it is well known that both of these frames are, in practice, incomplete. Among survey practitioners there is considerable interest both in monitoring changes in the coverage of the PAF and the ER and also in defining the characteristics of those who are omitted, since this may be a source of bias in survey results.

The coverage of both sampling frames is currently being assessed by Social Survey Division (SSD) using the sample of households selected for the Electoral Register Check, which was carried out by SSD in conjunction with the Census Validation Survey. This paper reports on the coverage of the 1991 electoral register as a sampling frame of the private household population and updates a similar analysis presented as part of the report on electoral registration in 1981.1 The coverage of the PAF will be dealt with in a later paper.

Although there is general interest in monitoring changes over time in the coverage of sampling frames, there was particular concern that the electoral register’s coverage might have suffered over recent years because some individuals wanted to avoid registration for the community charge and hence did not register as electors. The check on the 1991 register reported in this paper showed that 94.7% of households were in addresses listed on the register and that 95.4% of the usually resident adult population were in listed addresses. These results indicate a slight deterioration in the coverage of the frame since 1981 when 96.4% of households and 96.5% of adults were in listed addresses.

The register’s coverage was lower among single adult households, those in privately rented accommodation, those in London, especially Inner London, and, elsewhere, in non-metropolitan areas of Great Britain. Coverage among adults was lower for individuals who had moved in the previous 12 months, those in the 20-29 age range, and among non-White ethnic groups.

An assessment of the deterioration of the register as a frame of households over time gave similar results to the 1981 study, suggesting that coverage might decrease by around 1% over the year in which the register was available for use.


The use of the electoral register as a sampling frame

The electoral register is compiled as a list of all people eligible to vote in the United Kingdom; this includes citizens of the Commonwealth and the Irish Republic as well as of Great Britain and Northern Ireland who are aged 18 or over or who will become 18 during the life of the register. The register is compiled on October 10th each year, and is in force from February 16th of the following year for a period of one year. Because of the time required to bring together in one place the different parts of the new register, it is normally available for sampling purposes roughly from April of the year in which it comes into force through to March of the following year (i.e. from April 1991 to March 1992 for the 1991 register).

The register is known to be an incomplete list even of electors, and it will obviously not list adults who are not eligible to vote. The 1991 Electoral Register Check2 showed that, for Great Britain as a whole, 7.1% of eligible people who were recorded in the Census were not included on the register, but that the non-registration rate varied by the individual’s age, length of residence at that address, ethnic origin and region of residence.

Although the register is primarily a list of adults, it is preferable to use it as a sampling frame of addresses because the coverage of addresses is known to be more complete than the coverage of named electors. This is primarily because an address is listed so long as at least one elector is registered there, but coverage of addresses may also be improved because of the practice in some areas of including the same information as on the previous year’s register if no form has been returned by the household.


The method of assessing coverage

The usual method of assessing the coverage of a sampling frame is to identify a representative sample of the target population drawn from an independent source and to check whether the sample members are covered by the frame. In 1991, as in 1981, a suitable sample for an assessment of the electoral register as a sampling frame was provided by the sample for the Electoral Register Check (ERC). This survey was carried out alongside the Quality Check element of the Census Validation Survey (CVS) which used a sample of private households drawn from census records.

The sample design for the CVS was a multi-stage probability sample to select 6,000 households in Great Britain that had returned a census form. The sampled households contained about 11,300 usually resident adults aged 17 or over on 15 February 1991. Visitors staying at the sample of addresses on census night (21 April 1991) were excluded from the analysis as they were also listed on the census form for their usual residence. Since the CVS over-sampled areas where enumeration was expected to be difficult, the achieved samples of households and of adults were weighted in the analysis to reflect their different probabilities of selection. The tables in this paper give weighted bases only.

On the Electoral Register Check, interviewers transcribed census information onto the questionnaire before checking the entry for the household on the electoral register. This information was therefore available for all cases regardless of their response to the ERC interview. Further items of information relating to informants’ eligibility for inclusion on the register were collected in the interview, including the previous address of adults who had moved since the qualifying date in October. The results in this paper are based on households which returned a census form, although the sample of adults also includes any people in that sample of households who were identified by the CVS as not having been enumerated on the Census. The under-count for the Census is estimated to be 2% of the resident population (around one million people), about one fifth of whom were identified by the CVS as being missed from enumerated households. Thus about 1.6% of the resident population were missed both in the Census and in this element of the CVS. Insofar as there was deliberate evasion of both the Census and the CVS, it is likely that the ERC sample will tend to underestimate the level of non-registration of individuals on the electoral register. It is, however, probable that this under-enumeration has much less effect on the register’s coverage of addresses than of named individuals.
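Because the CVS over-sampled areas where enumeration was expected to be difficult, each case must be weighted by the inverse of its selection probability before any coverage rate is computed. A minimal sketch of such a design-weighted proportion follows; the selection probabilities and outcomes are invented purely for illustration.

```python
# Design-weighted estimate of a coverage rate: each sampled household
# is weighted by the inverse of its selection probability, so that
# over-sampled strata do not dominate the estimate.
# The selection probabilities and outcomes below are illustrative only.

def weighted_proportion(outcomes, sel_probs):
    """outcomes: 1 if the household is at a listed address, else 0.
    sel_probs: the probability with which each household was selected."""
    weights = [1.0 / p for p in sel_probs]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)

# Two hypothetical strata: a hard-to-enumerate area over-sampled at
# probability 0.02 and an ordinary area sampled at probability 0.01.
outcomes  = [1, 0, 1, 1] + [1, 1, 1, 0]
sel_probs = [0.02] * 4 + [0.01] * 4

print(round(weighted_proportion(outcomes, sel_probs), 3))  # 0.75
```

The unweighted proportion here would be 6/8 = 0.75 only by coincidence of the chosen numbers; with unequal outcomes across strata the weighting matters, which is why the tables in this paper give weighted bases only.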


The coverage of households and of adults

Although the electoral register is mainly used as a sampling frame of addresses, surveys are generally concerned with households or adults. The adequacy of the electoral register as a sampling frame is therefore assessed by the proportion of households and of adults that are included in listed addresses. Both coverage rates are shown in Table 1.

The study showed that 94.7% of all private households that were occupied on census night were at addresses listed on the electoral register. This coverage rate for households compares with a figure of 96.4% for the 1981 register, so there is evidence of a slight deterioration in the register as a sampling frame for households.

The coverage rate for adults is based on those who were defined as being usually resident at the ERC sample of addresses. Some 95.4% of this sample of adults were in addresses listed on the register. Most of the sample of adults (88.0%) were themselves listed on the register at their April address and a further 7.4% lived in addresses that were listed even though they themselves were not. The difference between the coverage rates for adults and households reflects the variation in rates of coverage by household size, as shown in Table 2.
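The figures in Table 1 are internally consistent, as a quick recomputation of the percentages from the published weighted bases confirms:

```python
# Recompute the Table 1 percentages from the weighted bases to check
# that the published figures are internally consistent.
adults_total = 9907
households_total = 5133

listed_person  = 8720   # adults themselves on the register
listed_address = 736    # adults not listed, but their address is
adults_covered = listed_person + listed_address   # 9456
households_covered = 4862

print(round(100 * listed_person / adults_total, 1))           # 88.0
print(round(100 * adults_covered / adults_total, 1))          # 95.4
print(round(100 * households_covered / households_total, 1))  # 94.7
```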

Table 1 The registration of adults and their addresses in April 1991

                                          Households            Adults
                                          Percentage  Number    Percentage  Number
Person on the register at April address   na          na        88.0        8720
Person not on the register but April
address is                                na          na        7.4         736
Total in addresses on the register        94.7        4862      95.4        9456
Address not on the register               5.3         271       4.6         451
Total (weighted bases)                    100         5133      100         9907*

* The sample base for adults is all aged 17 years or over on 15.2.91 who were usually resident on census night in the sampled households.


Kate Foster The electoral register as a sampling frame

Table 2 Households and adults on the register by the number of usually resident adults in the household

Number of adults usually      Households                     Adults
resident in the household     Proportion in    Base=100%     Proportion in    Base=100%
                              addresses on                   addresses on
                              the register                   the register
One                           92.2%            1552*         93.2%            1527
Two                           95.5%            2698          95.5%            5396
Three                         97.4%            615           97.4%            1844
Four or more                  95.7%            268           95.3%            1140
All households/adults         94.7%            5133          95.4%            9907*

* Includes a small number of households with no usually resident adults under census definitions.


Variation in the coverage of households and adults

We now look at variation in the register’s coverage by selected characteristics of households and of individuals. The conclusions reached are broadly similar to those reported in the 1981 assessment of the electoral register but some new analyses are also presented.

Characteristics of the household

The register’s coverage of households was lowest (92.2%) among those comprising only one adult. Coverage increased with household size up to 97.4% for households with three usually resident adults but was slightly lower (95.7%) for households with four or more adults (Table 2). An improvement in coverage with increasing household size is to be expected since households comprising more adults will generally contain more electors and hence there is a greater chance that at least one elector is listed. The fact that this improvement in coverage did not extend to the largest households, comprising four or more adults, suggests that such households may differ from smaller ones in other significant respects.

Table 3 Households and adults on the register by region and country

Region/country               Households                     Adults
                             Proportion in    Base=100%     Proportion in    Base=100%
                             addresses on                   addresses on
                             the register                   the register
North                        95.1%            292           96.2%            537
Yorkshire & Humberside       97.3%            483           97.3%            939
North West                   96.9%            563           97.2%            1045
East Midlands                93.5%            379           95.2%            744
West Midlands                96.5%            450           97.1%            889
East Anglia                  96.3%            191           96.3%            387
South East (exc London)      93.9%            965           94.5%            1886
South West                   93.6%            457           94.8%            909
London                       91.2%            622           91.8%            1171
  Inner London               87.3%            234           87.9%            426
  Outer London               93.7%            388           94.0%            745
Regions exc. London
  Metropolitan               97.1%            1012          97.6%            1930
  Non-Metropolitan           94.4%            3047          95.2%            5939
England                      94.7%            4402          95.3%            8505
Wales                        93.0%            280           94.9%            534
England & Wales              94.6%            4682          95.3%            9039
Scotland                     96.3%            451           97.1%            868

Table 4 Households and adults in addresses on the register by housing tenure

Housing tenure               Households                     Adults
                             Proportion in    Base=100%     Proportion in    Base=100%
                             addresses on                   addresses on
                             the register                   the register
Owned outright               97.9%            1214          98.5%            2234
Buying with a mortgage       95.7%            2168          96.3%            4710
Local authority rented       96.7%            1051          97.1%            1793
Other rented                 82.5%            670           82.8%            1144
All households/adults        94.7%            5133*         95.4%            9907*

* Includes a few cases where housing tenure was not known.

The register’s coverage of households and adults varied according to region of residence, as shown in Table 3 which also gives summaries by country. At the national level the electoral register gave slightly better coverage of households, and also of adults, in Scotland than in England and Wales; 96.3% of households in Scotland were in addresses listed on the register compared with 94.6% in England and Wales.

Within England, London had the lowest coverage rate but this was markedly worse for households in Inner London (87.3%) than in Outer London (93.7%). Outside London, coverage tended to be better in metropolitan areas: 97.1% of households in metropolitan areas (excluding London) were in listed addresses compared with 94.4% of households in non-metropolitan areas.

Table 4 shows coverage by housing tenure group. The strongest pattern to emerge is the lower rate of coverage for households in the privately rented sector, which includes those renting accommodation with their job or business as well as those living in furnished or unfurnished rented accommodation. Around 83% of households and of adults in this tenure group were in addresses listed on the register compared with more than 95% for each of the other major tenure groups – those who owned their accommodation outright, those buying on a mortgage and those living in local authority rented accommodation.

Table 5 Adults whose April address was on the register by how recently they had moved to the address

Whether had moved in 12 months     Adults on the      Adults whose April     Base=100%
before census                      register at        address was on
                                   April address      the register
Had moved in previous 12 months    31.7%              76.1%                  993
Of whom:
  Had moved in 6 months since
  qualifying date                  5.6%               69.8%                  415
  Had moved in 6 months before
  qualifying date                  81.9%              93.0%                  356
Had not moved in previous
12 months                          94.3%              97.6%                  8914


Kate Foster The electoral register as a sampling frame

Table 6 Adults whose April address was on the register by age

Age            Adults on the      Adults whose April     Base=100%
               register at        address was on
               April address      the register
17             71.1%              95.3%                  154
18-19          81.6%              94.5%                  346
20-24          67.7%              89.2%                  964
25-29          75.9%              91.7%                  1007
30-49          89.5%              95.2%                  3503
50 and over    96.1%              98.3%                  3930
All adults     88.0%              95.4%                  9907*

* Includes a few individuals whose age was not known

Characteristics of individuals

We now turn to variation in the register’s coverage of adults by selected characteristics of the individuals involved. The tables also show, for reference, the proportion of adults in the different categories who were themselves listed on the register.

Table 5 looks at the likelihood of adults being listed on the register, or of living in an address that was listed, by whether and when they had moved in the previous year. As would be expected, only a very small proportion (5.6%) of those adults who had moved in the 6 months since the qualifying date for the register, that is between October 10th and April 21st, were themselves listed on the register. Those who had moved in the six months before the qualifying date were also less likely than non-movers to be listed; 81.9% were on the register compared with 94.3% of non-movers.

With respect to the use of the register as a sampling frame, about three quarters (76.1%) of those who had moved in the previous year, and 69.8% of those who had moved in the six months since the qualifying date, were in addresses that were listed on the register compared with 97.6% of non-movers. There are a variety of reasons why the April addresses of movers might not have been listed: the addresses may not have existed or may have been unoccupied at the qualifying date or they may have been occupied by people who were either ineligible for inclusion on the register or were eligible but not listed.

As found in previous checks, adults under the age of 30 were not only less likely than older people to be listed themselves on the register but were also less likely to live at addresses that were listed. Table 6 shows that the proportion of individuals who were themselves listed was lowest for the 20-24 age group (67.7%), rising to

Table 7 Adults whose address was on the register by age and whether they had moved in the previous year

               Proportion    Adults whose April address          Base=100%
Age            of movers     was on the register
                             Not moved   Moved    All      Not moved   Moved   All
17             10%           98.4        (10)     95.3     139         (15)    154
18-19          16%           98.6        72.7     94.5     291         55      346
20-24          28%           94.3        76.1     89.2     694         270     964
25-29          23%           95.7        77.7     91.7     781         227     1007
30-49          9%            97.0        75.3     95.2     3205        298     3503
50 and over    3%            98.9        77.7     98.3     3803        127     3930
All adults     10%           97.6        76.1     95.4     8914*       993*    9907*


Kate Foster The electoral register as a sampling frame

75.9% among the 25-29 age group, but most of those who were themselves not listed lived in addresses that were listed: 89.2% and 91.7% respectively of adults in these two age groups lived in addresses that were listed on the register. Although a relatively small proportion (71.1%) of 17 year olds were themselves listed on the register, presumably because this was the first year in which they were eligible for inclusion, they were no less likely than all adults to live at addresses that were listed.

Table 7 explores whether the lower coverage of younger age groups is related to their greater mobility. The second column of the table gives the proportion of adults in each age group who had moved in the previous 12 months and shows clearly that adults in the 20-24 age group (28%) and those aged 25-29 (23%) were most likely to have moved; 10% of all adults had moved in that period. As we have already seen, the coverage rate for those who had moved in the 12 months before the census was much lower than for non-movers (76.1% compared with 97.6%) and Table 7 shows the coverage rates by age for these two groups. There was little variation with age in the coverage of movers while coverage rates for non-movers were only slightly lower for adults in their twenties. Thus most of the under-representation of adults in their twenties can be explained by the group’s higher mobility although non-movers in these age groups were also less likely than non-movers in other age groups to be in listed addresses.
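The all-adults coverage rate is simply the base-weighted average of the mover and non-mover rates in Table 7, which makes the contribution of mobility to the overall figure explicit:

```python
# The all-adults coverage rate decomposes as a weighted average of the
# mover and non-mover rates, with weights given by the Table 7 bases.
movers, non_movers = 993, 8914
cov_movers, cov_non_movers = 76.1, 97.6   # per cent, from Table 7

total = movers + non_movers               # 9907 adults in all
overall = (movers * cov_movers + non_movers * cov_non_movers) / total
print(round(overall, 1))  # 95.4, matching the published all-adults rate
```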

Finally, Table 8 looks at variation in the electoral register’s coverage of adults according to their ethnic group. Eligibility for inclusion on the register as an elector is defined by an individual’s citizenship rather than their ethnic group but it is, of course, likely that there is an association between these two attributes. The census question gave a choice of nine ethnic groups which, because of the relatively small number of ethnic minority households in the survey sample, have been grouped into four categories. These are White, Black (including Black-Caribbean and Black-African), Indian (including Pakistani and Bangladeshi), and Other (including Chinese). Those classified as White by ethnic group were the most likely to be living at addresses listed on the register but there was little difference in the coverage of those classified as Black, Indian, or Other. Some 95.9% of White adults were in addresses that were listed compared with 84%-88% of the other groups.


Movement and deterioration of the frame

The register will deteriorate as a frame of household addresses to the extent that new addresses become occupied over time; such addresses may either be new buildings or existing addresses which were unoccupied at the time that the list was drawn up. The coverage rate may also be affected if some addresses that were occupied at the time that the list was drawn up become unoccupied over time, and hence ineligible for inclusion in the sampling frame. This will result in an increase in the proportion of “deadwood” among listed addresses but such changes will only cause a deterioration in the coverage of the frame if listed addresses are more likely than unlisted addresses to be affected.

The main effect over time on the register’s coverage of occupied addresses is, therefore, the extent to which newly constructed and previously vacant addresses are (re-)occupied. It was not possible to pursue analyses of this type using the ERC data-set since that survey did not collect information on the history of the addresses occupied by recent movers. Some relevant information is available from the Department of the Environment (DOE) which collects data on house construction although not on the proportion of dwelling units which are temporarily vacated or re-occupied over a given period. The DOE data show that house construction resulted in a 0.9% increase in the dwelling stock over the year in which the 1991 register was in use as a sampling frame (April 1991 to March 1992).

Table 8 Adults whose April address was on the register by ethnic group

Ethnic group    Adults on the      Adults whose April     Base=100%
                register at        address was on
                April address      the register
White           88.8%              95.9%                  9380
Black           69.6%              88.3%                  170
Indian          79.9%              87.1%                  274
Other           64.8%              84.2%                  82


Deterioration in the coverage of individuals

The ERC data does, however, enable us to make a rough estimate of the deterioration over time in the frame’s coverage of adults since it includes information on the previous address and date of move of adults who had moved into their census address during the previous year. In general it would be expected that the movement of individuals during the lifetime of the register would have a much greater effect on the accuracy of the register (i.e. whether adults are listed at their latest address) than on its coverage (i.e. whether they are living at an address which is listed on the register at all).

The deterioration in the register’s coverage of adults over time was estimated with reference to those adults who had moved between the qualifying date for the register (in October 1990) and the date of the census (in April 1991). The frame’s coverage of adults will deteriorate to the extent that these adults moved from listed addresses to unlisted addresses, but this will be offset to the extent that adults moved from unlisted to listed addresses. An estimate of the net percentage change in coverage can be obtained by expressing these two groups as a percentage of all adults in the sample. There were some problems in carrying out this analysis on the 1991 ERC database due to the large number of cases in which either the date of moving or the October address was missing. Analysis of those cases for which complete information was available gave similar results to those obtained in the 1981 study, which suggested a deterioration of about 0.4% in the frame’s coverage of adults over a six month period. If movement is assumed to have continued at the same rate over the period in which the register was available for use, then this would imply a deterioration in the coverage of adults of about 1% over the 12 month period from April 1991.
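The net-change calculation can be sketched as follows. The counts of movers in each direction are invented for illustration, chosen only so that the six-month figure is of the order reported (about 0.4%):

```python
# Net deterioration in the frame's coverage of adults: movers from
# listed to unlisted addresses reduce coverage, movers in the other
# direction offset it. The counts below are purely illustrative.
all_adults         = 9907
listed_to_unlisted = 60   # moved from a listed to an unlisted address
unlisted_to_listed = 20   # moved from an unlisted to a listed address

net_change = 100 * (listed_to_unlisted - unlisted_to_listed) / all_adults
print(round(net_change, 1))  # 0.4 (per cent, over the six months)
```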

References

1. Todd J and Butcher B. Electoral registration in 1981. OPCS, 1982.


The use of substitution in sampling

Dave Elliot



On a number of occasions recently, the issue has arisen of whether and how to use substitution to make up for a shortfall in sample numbers, from whatever cause. Many people in SSD have a knee-jerk reaction at the mere suggestion, believing it to be inextricably associated with quota sampling and massaging response rates and therefore not the sort of method that a good survey organisation should ever contemplate using. In this paper I take a different view – that substitution for non-respondents, when used with proper controls, may sometimes be a useful addition to the survey sampler’s toolkit. However, on other occasions, especially its superficially more innocuous use in replacing ineligible units, the method may result in significant biases.

By substitution I mean the replacement of some specific unit in the set sample which fails to yield a usable response with another unit from the population. I shall illustrate this issue with four recent examples before moving on to generalities.


Four examples

2.1 Survey of the homeless

The first comes (indirectly) from the planned survey of Psychiatric Morbidity, as part of which OPCS plan to include the homeless. This mobile group is particularly problematic to sample for a number of obvious reasons and the planned design draws heavily on the lessons learnt in a pioneering survey of single homeless people undertaken by SCPR1. In discussing the details of the methods used in sampling in short-stay hostels, Lynn describes how the establishments were sampled and then a random sample of beds was used in the selected hostels. Substitution was used at both stages – hostels that declined to co-operate were substituted (twice in some cases) and sampled beds that were unoccupied on the date of the survey were substituted. Occupants of sampled beds who refused the interview or who could not be located were, however, not substituted. Likewise respondents who were not eligible for the survey were not substituted.

In justification of this procedure, Lynn writes:

“These strict probability sampling methods were deemed necessary in order to ensure the accuracy of the survey results. Allowing for refusals or non-contacts would have biased the sample towards more co-operative and more available respondents.”

In all cases the substitutes were randomly selected using methods similar to those used in selecting the initial sample.

Another part of the sample consisted of users of day centres. In this case people entering the centres were selected using a constant interval, ineligibles and non-respondents were noted but disregarded and the sampling was continued until the set sample size was achieved. Despite the statement that “No substitutes were selected to replace refusals or people who were screened out”, the procedure described can be interpreted as substitution by another name.

2.2 Sampling institutions

A second example concerns some advice I gave on sampling for a planned survey of children. The sample design has three stages: children within schools within a stratified sample of local authority areas. The plan is to select just one secondary and a number of primary schools in the selected areas and then seek the co-operation of the schools in selecting and interviewing children. If a secondary school declines to take part in the survey, I advised selecting a substitute but the project officer was not happy with this advice, believing the substitution method to be fundamentally flawed.

2.3 Sequential sampling within institutions

My third example concerns a design we suggested in response to an invitation to tender for a survey of fees paid to private residential homes. The specification suggested a design in which 5 eligible residents were selected from each of a sample of institutions. Since eligibility could not always be easily determined prior to selection, the suggested method was to sample residents one at a time, determine their eligibility and continue sampling until the target number was achieved. This is an example of “sequential sampling” and mirrors closely the method of sampling visitors to day centres described in the first example.

It was particularly problematic in this case because the primary aim of the survey was to produce grossed estimates of total expenditure and there was no reliable independent measure of the size of the eligible population. Consequently it was essential to know and control the selection probabilities in order to gross up the survey means. With the method suggested, these probabilities could not be determined nor even estimated without bias, which in turn would bias the grossed estimates. It may be useful to run through the argument for this assertion in a simple case before moving on to a more general discussion of the effects of substitution in different situations.

Suppose we need a sample of 5 eligible residents from a home with 10 residents, exactly 5 of whom are eligible. We are aiming to produce an estimate of the probability of sampling any eligible resident. The true value of this is 1, as we are taking a sample of 5 eligible residents from a population of only 5. So using the sequential sampling method suggested, each eligible resident must eventually be selected.

The sample could be achieved in a number of ways, the two most extreme of which are that the first 5 residents selected are all eligible, or that the first 5 selected are all ineligible. In the first case we would estimate the selection probability as only 1/2, as we would assume that the non-selected residents are similar to those selected and are also eligible. In the second case we should estimate the selection probability as 1, as we will have actually selected all 10 residents and will know that only 5 are eligible, and that all of these are bound to be chosen in the sample. Obviously in no case should we estimate a probability greater than 1. Thus averaging over all the possible sequences would produce a mean value less than one and thus the estimated probability is biased downwards in this case.
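The downward bias can be checked exactly for this example. If k is the draw on which the fifth eligible resident appears, the estimated selection probability is k/10, and enumerating the distribution of k gives an expectation below the true value of 1. A short sketch of that enumeration:

```python
from math import comb

# Exact distribution of k, the draw on which the 5th eligible resident
# appears when sampling without replacement from 5 eligible and 5
# ineligible residents. P(k = j) = C(j-1, 4) / C(10, 5).
N, eligible, target = 10, 5, 5

expected_p_hat = 0.0
for j in range(target, N + 1):
    p_kj = comb(j - 1, target - 1) / comb(N, eligible)
    expected_p_hat += p_kj * (j / N)   # estimated probability is k/N

print(round(expected_p_hat, 3))  # 0.917, below the true value of 1
```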

The effect of this underestimation of the selection probabilities is that population totals will inevitably be overestimated (as they are obtained by dividing the sample totals by the erroneous probabilities). The extent of the bias depends on several factors.

i. The average number of eligible residents per institution – as this increases, the bias reduces. In this case we knew that the average number of residential places per home was just 17 but with a wide variation around this figure.

ii. The ineligibility rate. With zero ineligibility there is no bias. As the ineligibility rate increases, so does the bias. A previous feasibility study had suggested between 5% and 15% ineligibility overall (although this estimate was based on a purposive sample and is therefore not reliable) but ineligibility rates in different homes will inevitably vary greatly around the overall value.

iii. The eligible sample size per home. This bias problem is particularly serious for small sample sizes.

iv. Any positive correlation between survey variables (average fees) and the number of eligible residents will tend to increase the bias. A negative correlation will reduce it.
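Bias figures of this kind can be computed exactly. If k is the draw on which the 5th eligible unit appears, the naive grossed estimate of the eligible population is 5N/k, and its expectation exceeds the true count. The following sketch (one plausible way of deriving such figures, not necessarily the method used for Table 1) reproduces the first rows of Table 1:

```python
from math import comb

def percent_bias(N, inelig_rate, n_sample=5):
    """Exact % bias of the naive grossed estimate n_sample*N/k of the
    eligible population, where k is the draw on which the n_sample-th
    eligible unit appears under sequential sampling without replacement."""
    E = round(N * (1 - inelig_rate / 100))   # eligible units in the home
    estimate = 0.0
    for j in range(n_sample, N - E + n_sample + 1):
        # n_sample-th eligible appears at draw j: n_sample-1 eligibles
        # among the first j-1 draws, then an eligible at draw j.
        p = (comb(E, n_sample - 1) * comb(N - E, j - n_sample)
             / comb(N, j - 1)
             * (E - (n_sample - 1)) / (N - j + 1))
        estimate += p * (n_sample * N / j)
    return 100 * (estimate - E) / E

print(round(percent_bias(10, 10), 1))  # 1.9, as in Table 1 below
print(round(percent_bias(10, 20), 1))  # 3.8
```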

Table 1 below shows the % bias in estimates of the total eligible population for varying population sizes and ineligibility rates for a fixed eligible sample size of 5 residents. In the absence of any correlation (see (iv) above) the same bias will occur in all survey estimates.

Table 1 Percentage bias in eligible population estimates by population size and ineligibility rate

Population in home    Ineligibility rate (%)    % Bias
10                    10                        1.9
20                    10                        1.8
50                    10                        1.7
10                    20                        3.8
50                    20                        3.6
10                    30                        6.0
50                    30                        5.6
10                    40                        8.3
50                    40                        7.7
10                    50                        10.8
50                    50                        10.0
50                    60                        12.5
50                    80                        18.3

2.4 PAF used address procedure

Substitution is currently used within the PAF sampling system developed in OPCS. PAF addresses that have been selected for one OPCS survey are normally tagged and excluded from reselection in any other OPCS survey for a fixed time period (currently three years for most surveys). The way this is implemented is that such addresses are left on the file and so are liable to be reselected on another survey. When this occurs they are immediately substituted by a neighbouring address. The rationale is that as the first sample which marked them as a used address was random, the substitutes that happen to have such addresses as neighbours can also be regarded as a random sample of the population. In fact insofar as the ordering of addresses within the PAF places addresses with similar characteristics close together, systematic sampling will act as a kind of implicit stratification and we could expect some efficiency gains as a consequence. This stratification effect will be preserved by substituting a neighbouring address rather than simply boosting the initial sample size to compensate for these special “non-respondents”.
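The tag-and-substitute rule can be sketched as follows; a simple ordered list stands in for the PAF, and the tagged addresses are invented for illustration. Taking the nearest untagged neighbour keeps the substitute within the same run of similar addresses, which is what preserves the implicit stratification.

```python
# Systematic sampling from an ordered address list, substituting a
# neighbouring address whenever a selected address is tagged as having
# been used by another survey. The list and tags are illustrative only.

def systematic_sample(addresses, interval, start, tagged):
    sample = []
    for i in range(start, len(addresses), interval):
        j = i
        # If the selected address is tagged, walk to the nearest
        # untagged neighbour (next address, previous address, ...).
        for offset in (0, 1, -1, 2, -2):
            j = i + offset
            if 0 <= j < len(addresses) and addresses[j] not in tagged:
                break
        sample.append(addresses[j])
    return sample

addresses = [f"addr-{n:03d}" for n in range(20)]
tagged = {"addr-005", "addr-010"}
print(systematic_sample(addresses, interval=5, start=0, tagged=tagged))
```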


Dave Elliot The use of substitution in sampling


3. Substituting for non-respondents

Substitution is being used or considered for use in two quite different contexts in these four examples: replacing initial non-respondents (both refusals and non-contacts) and replacing selected units which are later discovered to have been ineligible for the survey. In both cases the main aim is identical – to recover the number of cases that have been lost from the sample and hence boost precision. However the effect is different in the two cases.

Under a simple model for non-response, all members of the population will either respond or fail to respond to a survey and different samples will pick up these two groups in different proportions by chance. If the mean for respondents differs from that for non-respondents, the normal survey estimate, excluding the non-respondents, will be biased. If we substitute the initial non-respondents with a further random sample from the population, the mean of the combined sample of the two groups of respondents will be biased to exactly the same extent as the mean of the first group of respondents, but the sample size will undoubtedly be larger and so the estimate will be more precise. Clearly we could extend the substitution procedure by continuing to select and approach people until we achieve a set target of interviews. So long as the substitutes are randomly selected, the procedure clearly does not affect the bias in either direction.
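This can be illustrated with a small simulation (the population values, response propensities and sample sizes below are invented for the illustration):

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population: 70% would respond (values around 10),
# 30% would refuse (values around 20); the true mean is 13.
population = ([(rng.gauss(10, 2), True) for _ in range(7000)]
              + [(rng.gauss(20, 2), False) for _ in range(3000)])

def respondent_mean(substitute):
    """Mean over respondents from one simulated survey with a set sample
    of 50; with substitution, draws continue until 50 responses are achieved."""
    values, draws = [], 0
    while True:
        value, responds = rng.choice(population)
        draws += 1
        if responds:
            values.append(value)
        if substitute and len(values) == 50:
            return statistics.mean(values)
        if not substitute and draws == 50:
            return statistics.mean(values)

plain = [respondent_mean(False) for _ in range(4000)]
subst = [respondent_mean(True) for _ in range(4000)]
# Both designs estimate the respondent mean (about 10), not the true mean
# of 13 -- the bias is untouched -- but substitution is more precise.
print(statistics.mean(plain), statistics.mean(subst))
print(statistics.stdev(plain), statistics.stdev(subst))
```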

Moving now to a slightly more realistic model of the non-response mechanism, suppose that the tendency to respond to the survey is different amongst different groups of people in the population and that once again the means of respondents and non-respondents differ within the groups. Then on average any random sample will select people from these groups in the proportion that they occur in the population and survey estimates will again be biased. If the initial sample is selected with equal probability, then the remaining population will have exactly the same means as the full population and so an additional sample taken to substitute for the initial non-respondents will not affect the bias of sample estimates.

If no substitutions are made and post-stratification by these groups is used to reduce the non-response bias, the effect is artificially to boost the size of the groups with the lowest response rates by giving them larger weights. This will often reduce but not eliminate the bias. An alternative, which can be used if the groups can be identified on the sampling frame, would be to boost the size of these groups directly by substituting the non-respondents. In this case the effect on the bias is identical to that of post-stratification.
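The equivalence can be shown with a worked example; all figures here are invented:

```python
# Two identifiable groups with equal population shares; group B has a
# lower response rate and a different respondent mean.
share = {"A": 0.5, "B": 0.5}
resp_mean = {"A": 10.0, "B": 20.0}   # mean among respondents in each group
responses = {"A": 45, "B": 25}       # achieved from a set sample of 50 + 50

# Unweighted respondent mean under-represents group B.
unweighted = (sum(responses[g] * resp_mean[g] for g in responses)
              / sum(responses.values()))          # about 13.6

# Post-stratification: weight each group back to its population share.
post_stratified = sum(share[g] * resp_mean[g] for g in share)

# Substitution within groups: top each group back up to 50 with further
# randomly selected respondents, who in expectation share the same
# respondent mean as the group's initial respondents.
filled = {"A": 50, "B": 50}
substituted = (sum(filled[g] * resp_mean[g] for g in filled)
               / sum(filled.values()))

print(round(unweighted, 1), post_stratified, substituted)  # 13.6 15.0 15.0
```

Both corrections move the estimate to 15.0, the respondent means weighted by population shares; any remaining bias (here, from respondent and non-respondent means differing within groups) is the same for both.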

A problem with the approach occurs when the units are not being selected with equal probability – the most likely situation occurs when one is selecting aggregates such as institutions, where these are often selected with probability proportional to size. In this case the residual population of institutions, having selected a sample, will have a different mean from the total population. The only way to deal satisfactorily with this situation in general is to take a larger sample than is needed initially and hold part of it in reserve to be used as substitutes. In most cases, this should be the preferred method of implementing substitution even if the units are being selected with equal probabilities.

The design in example 2.1 seems inconsistent, as substitution was allowed for non-cooperating hostels but not for non-responding individuals, and the basis of Lynn's argument against substitution of individuals is unclear. However the apparent inconsistency might be due to concerns about the effect on interviewer motivation and response rates if substitution of individuals had been allowed. This is discussed further in section 5, below.


4. Substituting for ineligibles

In the example in 2.3 above, substitution for ineligibles would have made the estimation of selection probabilities, and hence of any survey estimates, particularly problematic. As the discussion above makes clear, the bias is likely to be most serious when ineligibility rates are high and target sample sizes are small. However it does not disappear entirely in other cases, whereas the most straightforward alternative to substitution, boosting the initial sample size in line with overall expected ineligibility, is unbiased. The bias arises because of the necessity of estimating the different selection probabilities. Consequently substitution for ineligibles will only be unproblematic when the units involved are being selected with equal probabilities at that particular stage in the sampling, or when the true ineligibility rate for the sampling unit is known. Although this may be true at the final stage of some multi-stage samples, the widespread (and highly desirable) use of pps sampling means that such examples will be rare and that consequently substitution for ineligibles cannot be recommended in anything other than simple random samples.




5. Other considerations

Bias and precision should never, of course, be the sole criteria in determining sampling procedures. We should also consider their impact on OPCS staff and particularly on interviewers. Substitution or something very like it is widely used in quota and other non-random sampling methods and we must beware of giving interviewers (or anyone else involved with the survey) the impression that any informant is as good as the one we initially selected.

There are two separate risks involved here. First, that interviewers will try less hard to secure a high response rate if they know that a substitute will always be provided – the discussion in Section 3 assumes that the methods we use to produce a high response rate will continue to be used on both the initial set sample members and the substitutes. If the result of permitting substitution is a reduction in response rates then we may be more prey to non-response biases and the argument used above on the absence of any change in non-response bias will fail. Secondly, there is a risk that if we permit substitution in some cases, interviewers may make their own non-random substitutions in other cases to boost their apparent response rates.

There is also a third, rather less tangible, risk: by introducing a method which interviewers may associate with lower quality research, they might start to feel less confident of our commitment to quality methods, which could in turn affect their motivation to maintain high standards.



6. Conclusions

Substitution of non-respondents with randomly selected alternatives, while not in any way reducing non-response bias, in principle does not increase it either. Its use would increase the sample size more efficiently than boosting the set sample, since it would fix the final sample size. However the argument of the last section, on the potential psychological impact on interviewers of permitting some substitution when none has been allowed before, I believe sways the argument against its widespread introduction in SSD.

However in those situations where its use does not impinge on interviewers, for example in replacing non-cooperating institutions, it appears to offer some advantages. This is especially true when the substitute can be selected from the same population group as the initial non-respondent, when its effect is akin to post-stratification, i.e. it may reduce non-response bias. Substitution of ineligible units, although not affecting interviewers in any obvious way, may introduce biases in certain cases and should in general be avoided.


1. Lynn, P. Survey of single homeless people, Technical Report. SCPR.


Characteristics of non-responding households on the Family Expenditure Survey

Sarah Cheesbrough

Response rates on the FES are normally in the range of 68-72%. These figures are lower than on most SSD surveys due to the demand placed on respondents to complete a long questionnaire on income and expenditure and then keep a detailed diary of expenditure for the two weeks following the interview. Non-response is a problem in all sample surveys of the general population; on the FES the rather high rate of non-response means that there is a danger of under-representing important groups of spenders. Researchers need to seek ways of improving response and, at the same time, to develop methods of compensating for non-response after the event. It is the latter approach which is the subject of this paper.

Methods of weighting the data to compensate for non-response are being investigated. Every 10 years the comparison of FES data with Census records provides the most accurate analysis of the characteristics of non-respondents. The variables compared could then provide a source to re-weight for non-response. This and other methods of re-weighting survey data are described and discussed in a recent monograph in SSD's New Methodology Series2. One drawback of using Census data is the time lag of up to 10 years between the Census measures and the survey data that are being re-weighted. This project was set up to consider a method of collecting information on non-responding households alongside the main survey. The emphasis of this exercise was to evaluate the feasibility of interviewers gathering information direct from non-responders on the doorstep.

Other studies of non-respondents have looked at the use of a ‘basic question’ that is asked of non-responding households in order to identify a key characteristic of these households that is relevant to the subject of the survey3. For the FES a pool of basic questions was considered in terms of their importance for re-weighting variables and how practical it would be to collect this information without affecting the main survey response rate or antagonising members of the public.

1. Method

The study of non-responding households was carried out for one fieldwork month (January 1993) in conjunction with the normal FES quota. A very short Response Characteristics Questionnaire (RCQ) was completed for every household, whatever the outcome code. On return to the office the RCQs were keyed into a Blaise CADI program. This process allowed the comparison of responding and non-responding households to be completed separately from the normal keying and editing timetable of the FES.

The RCQ covered basic information about household members such as age and sex, relationships within the household and working status. Additionally there were questions on the type and tenure of the accommodation and on car ownership. A second section required the interviewer to record his/her observations of the ethnic group and main language spoken by the household. Finally, there were three questions recording the interviewers' impressions of the household, covering any ill health in the household that might affect response, the wealth of the household and any other relevant information.

Interviewers were briefed on various methods of introducing the RCQ; a flexible approach in gaining the information was the key to not damaging normal FES response rates.



2. Results

2.1 Reception by informants

Completed RCQs were available for a total of 772 eligible households. Some forms were not returned in time for analysis but the outcome codes for the RCQ exercise were in exact proportion to the figures for the total of 819 households that were eligible in January. The January 1993 FES response rate of 73.3% was below that of the previous January (74.6%), when an increased incentive payment for co-operating households had just been introduced; but it was above that of any of the previous six calendar months and slightly above the monthly average for the whole of 1992 (72.8%). It seems safe to conclude, then, that the exercise did not damage the FES response rate.

There was also no evidence, from interviewers’ comments or complaints to OPCS from members of the public, that people were antagonised by the small amount of extra probing for information from non-responders that the exercise required.


Sarah Cheesbrough Non-responding households on the FES

Table 1 Main source of information for non-responding households

Main source of answers       Refusal before/     Refusal at    Non-contacts   Total
                             during interview    diary stage
Member of household          105                 14            0              119
Neighbour etc                7                   0             4              11
Interviewer observation      58                  0             8              66
Total number of households   170                 14            12             196

2.2 Introducing the Response Characteristics Questionnaire


Once it was clear that at least one member of the household had refused to co-operate with the survey there were two distinct methods used by interviewers to collect the information. This varied both according to the interviewer and the type of household. The first type of introduction briefly explained the exercise to the household member…

“People refuse for all types of reasons and we are interested in seeing whether we are losing similar groups of the population, so if you could just spare me a few moments I’d be very grateful if I could just ask..”

Alternatively it was often more appropriate to use indirect methods to obtain the answers to the RCQ questions. An interviewer reported..

“I never asked the questions as questions but tried to ask them as part of general conversation. Someone who is telling you about how disgraceful this big brother attitude is is hardly going to turn round and tell you how many bedrooms they have.”

Interviewers found it easier to gain co-operation where either only one member of the household had refused to participate or where it was only the income section of the survey that the household objected to.

Although interviewers were given the options of using the questionnaire or a small prompt card on the doorstep, the majority found it easier to memorise the questions so that there were no physical interruptions to the primary task of converting the household to a response.

2.3 Source of information

Interviewers were asked to report on the methods they had used to collect the information on non-respondents. In table 1 the methods used are shown against the type of non-response at the household. In the few cases of non-contact, interviewers still sometimes managed to obtain some information from neighbours. Encouragingly, 62% of households who refused to participate in any part of the survey did give basic information for the RCQ.

2.4 The quality of information

It was a particular concern of the project to evaluate whether the information obtained about non-responding households was of a high enough quality to compare with main FES data. The refusing cases were examined and results are reported according to the method by which the information was obtained.

Questions directed to a member of the household

In general, if the co-operation of a member of the household was gained, the information was very accurate. As shown in table 2, basic demographic information was readily given, whilst information about the accommodation and vehicles was harder to obtain.

Table 2 Proportions of refusing households where no information available

Question                                                % of households

Sex of HOH                                              1
Age of HOH (no exact age or band)                       8
Marital status of HOH                                   1
Working status of HOH                                   4
Sex of other household members                          1
Age of other household members (no exact age or band)   8
Marital status of other household members               2
Working status of other household members               5
Number of bedrooms in accommodation                     15
Household tenure                                        16
Car or van available                                    16
Age of vehicle                                          19



Interviewers' observations

Two questions on the RCQ required the interviewer to observe the ethnic group and the main language spoken by members of the household. This did not present any problems for interviewers, but obviously the results are the opinion of the interviewer rather than the household members themselves.


Interviewers’ impressions

The interviewers were asked to give their impressions of the health and wealth of the household. Although the questions were nearly always answered many interviewers commented on how their experience had shown how misleading initial impressions could be.


3. Comparison of responding and non-responding households

In the following analysis only non-responding households who refused directly to the interviewer, either before or during any interviewing or later at the diary stage, are included.

3.1 Information about individuals

Previous studies comparing responding and non-responding households have matched addresses selected for the FES sample to Census records4. The sample size for the response characteristics exercise would be too small for any comparable significance tests of differences between the responding and non-responding groups.

However, using the variables found to be significant in the 1981 Census comparison as a basis, some distributions for particular questions were compared. With the emphasis on household non-response, analysis concentrated on information about the head of household (HOH).

Age of HOH

The 1981 Census comparison found that response declined with increasing age of HOH: young adults in general might be hard to contact or reluctant to co-operate but response was high where a young adult was actually HOH.

For this study in households where the age of the HOH was established, it was apparent that a larger proportion of non-respondents fell into older age brackets. Overall the mean age of HOH for non-responding households was 54 years old (n=127) compared to 50 years old (n=566) for responding households.

Figure 1 shows the distribution of the age of the HOH in the responding and non-responding households. The graph shows that age groups more likely to have dependent children form a greater proportion of responding households whilst the non-responding group contains a larger proportion of households with an older HOH.

Table 3 Economic activity of HOH

                        Non-responding   Responding   1991 FES
                        households       households
                        %                %            %
Economically active     60               63           62
of which..
  Employed              45               48           48
  Self-employed         11               7            9
  Unemployed            5                7            5
Economically inactive   40               37           38
Total                   100              100          100
Base = 100%             169              566          7056

The 1981 Census comparison found a positive association between households with dependent children and survey response. The results from the RCQ confirmed this finding. Whilst 33% (n=566) of responding households contained one or two adults with at least one child under 16 years old, this was the case for only 19% (n=185) of non-responding households.

Employment status

A larger proportion of heads of household in the non-responding group were economically inactive. Interviewers were very successful in determining the employment status of those who were working. Many reported that the nature of the non-respondent's job was often mentioned in any explanation for refusal. If a person was not working it was more difficult to clarify whether they were economically active or not.

Not surprisingly, the non-responding group contained a larger proportion of self-employed people. In table 3 results are shown beside those for the FES in 19915.

The lower level of economic activity for non-responding households is consistent with a higher proportion of HOHs that were of retirement age. The lower proportion of unemployed HOHs in the non-responding group could also be a result of the higher average age of the group. However, there were an additional 8 non-responding households where it was not clear whether the HOH was unemployed or economically inactive.

3.2 Information about the household

Accommodation type

Interviewers were able to observe the type of accommodation for all refusing households and then ask some non-respondents how many bedrooms there were within the household. With this small sample there were no clear differences between the groups.

Table 4 Tenure of responding and non-responding households

                                     Non-responding   Responding
                                     %                %
Owned, including with a mortgage     61               67
Rented from a Local Authority,
New town, Housing Association etc.   27               23
Rented from private landlord         12               9
Total                                100              100



Table 5 Number of vehicles available to household

                           Non-responding   Responding   1991 FES
                           %                %            %
No car or van              36               30           32
One car or van             40               50           45
Two or more cars or vans   23               19           23
Total                      100              100          100
Base                       156              566          7056


Interviewers were successful in ascertaining tenure at 84% of refusing households. A larger proportion of responding than non-responding households in January were owner occupiers.


Car ownership

The 1981 Census comparison found that response was lowest amongst multi-car households, possibly reflecting non-response among those with high income. Information was available for nearly 85% of refusing households. However interviewers felt this was the most unreliable question unless it could be asked directly of household members. Table 5 shows the RCQ figures beside FES 1991 results.

Non-responding households do appear more likely to have two or more cars available and this group also tend to be in a higher income bracket. Most notably, 80% of HOHs from this group are over 40, 28% of HOHs are self-employed and 87% of households are owner-occupiers. At the other end of the scale the higher proportion of non-responding households without a car reflects the larger proportion of elderly people in this group.

Recording the age of the car is not normal practice for the FES. For this study interviewers were asked to collect this additional detail. 45% of responding households had cars which were less than 5 years old compared to 41% of non-responding households. However there was more variation in age of car for the non-responding groups which seemed to reflect the proportions of types of household in the group; whilst pensioner households tended to have older cars, the higher income refusing households tended to have very recently registered cars.

Ethnic group and main language of household

Interviewers were required to record their impressions of the ethnic group and main language of the household. Non-responding households consisted of a slightly higher proportion of ethnic minority households (5% compared to 4% responding, n=181 and 565 respectively). Two of the non-responding households had no members who spoke English, compared to only two of all the responding households. Information on this group of non-responders was more limited than average and inconclusive.

Health and wealth of household

As mentioned earlier, interviewers often found ill health discussed when reasons were given for refusing the survey. Interviewers noted that at 25% (n=172) of non-responding households there was some or much ill health, compared to 17% (n=564) of responding households. However, this information was very closely related to the fact that non-respondents included many elderly households.

A scale of wealth was used to rank the household approximately from 1 as the very poorest to 6 as the very richest household. Although interviewers were instructed to record their initial impressions of all households before any interviewing, RCQs were, in general, completed after the main interview and could well be coloured by more detailed knowledge that would not be possible with non-respondents. This information has therefore not been used for comparisons. In future tests, the use of income bands on a show card for use on the doorstep could be considered.





4. Conclusions

4.1 Feasibility of the fieldwork

Despite some initial reservations, the fieldwork was very successful. The interviewers’ reports made it clear that any exercise of this nature should use a very small number of key questions that are easy to memorise. This allows the interviewer to adapt to the situation on the doorstep and prevents distraction from the main task of persuading the household to co-operate with the survey.

4.2 Distinguishing variables

Questions about the household

Where the interviewer obtained the information from a household member the quality of the information was very high. Interviewers often commented that obtaining ‘household grid’ details forms a natural part of their doorstep introduction. The information on age, household composition and occupation in particular distinguished between responders and non-responders. Questions on accommodation and car ownership were also successfully completed and provided useful information on non-responders’ characteristics.

Interviewers’ observations

Recording the ethnic group of the household presented no problems and there appeared to be some slight difference in response rate that could interact with other variables.

Interviewers’ impressions

Long term ill health in the household was relatively easy to record but was highly correlated with the age of the head of household. Overall impression of wealth presented the greatest difficulty to interviewers and did not clearly distinguish between responders and non-responders.

Future work using records from the 1991 Census should provide accurate information on the variables which distinguish between responding and non-responding households.

4.3 Proposals for future work

The following recommendations are made:-

Partial interviews

The FES should consider accepting information from partial interviews with:-

(a) the HOH and partner from the household, when non-dependent children or other household members refuse to co-operate;

(b) elderly people who consent to the interview but fail to complete the diary.

Basic questions

In a future repeat of this exercise the following basic questions should be retained:

Household level

- type of accommodation

- number of rooms (if converted property)

- number of bedrooms

- tenure

- number of cars
- ethnic group
- main language of household

Individual level

- age
- sex
- marital status
- relationship to HOH
- employment status
- personal outcome on FES

Basic questions would be useful in targeting:-

(a) elderly households put off the full FES by its length;

(b) higher-income households, often with HOH in late middle age, who have reservations about the financial nature of the survey or object to the invasion of privacy;

(c) self-employed persons in all age groups who object to questions that probe for details of financial arrangements.

People who refuse for more general reasons, such as dislike of all surveys or the government, may also give some valuable information.

Implementation using computer assisted interviewing

The practicalities of collecting non-response information must be considered in the context of the transferring of the FES to computer assisted interviewing (CAI) in 1994.

Currently, on CAI trials, the interviewers record household and personal outcome records in a section of the interview program known as the administration block. This is separate from the main interview and is always completed when the interviewer is at home. If a household has refused, the interviewer is required to give any reasons for refusal at both the household and individual level. This provides a greater level of detail than is currently recorded on the calls and outcome records for the paper survey. A non-response exercise carried out using CAI could include some basic questions, built into the administration block, that would appear if a refusal code is used.


Sarah Cheesbrough Non-responding households on the FES

1. This article is based on a paper sent to the Central Statistical Office in March 1993. It forms part of ongoing investigations into the use of re-weighting techniques to compensate for unit non-response on the Family Expenditure Survey (FES).

2. Elliot, D. (1991). Weighting expenditure and income estimates from the UK Family Expenditure Survey to compensate for non-response. Survey Methodology Bulletin, No. 28, pp. 45-54.

3. Kersten, H.M.P. & Bethlehem, J.G. (1984). Exploring and reducing non-response bias by asking the basic question. Statistical Journal of the UN ECE, 2, pp. 369-380.

4. Redpath, R. (1986). Family Expenditure Survey: a second study of differential response, comparing Census characteristics of FES respondents and non-respondents. Statistical News, Vol. 72, pp. 13-16.

5. Central Statistical Office (1992). Family Spending, a report on the 1991 Family Expenditure Survey. London: HMSO.



The use of standardisation in survey analysis

Kate Foster



1. Introduction

Survey analysts are often interested in comparing the rate for some event or characteristic across different subgroups of a population or for the same population over time. Comparison of the overall rates or proportions is not a problem if the populations are similar with respect to factors associated with the measure concerned, such as age, sex or marital status. When this is not the case, a direct comparison of overall rates may be misleading.

One commonly used solution is to present three-way (or more) tables to control for other confounding variables which are associated with the measure of interest and also with the main independent variable. An example would be to take account of age when looking at the relationship between cigarette smoking and social class by tabulating prevalence of cigarette smoking by social class for a number of different age groups. The resulting tables may, however, be difficult to interpret and suffer from small cell sizes. In addition, they do not provide a single summary measure which is suitable for comparison between groups.

A more statistically sophisticated solution is to model the data which, for categorical survey data, would normally involve the use of log-linear modelling. However, this approach may not be the most appropriate where the requirement is to produce simple summary measures across a large number of analyses that are suitable for tabulation and can be readily interpreted.

An alternative approach to the problem which provides output in recognisable tabular format is standardisation. The technique allows direct comparison between rates or proportions measured for populations which differ in a characteristic which is known to affect the rate being measured by, in effect, holding the confounding variable constant. In the example mentioned above, it would provide a measure of cigarette smoking prevalence for each social class group having adjusted for the effects of age.

This paper gives some background to the use of standardisation in Social Survey Division (SSD) and presents the results of recent work on the estimation of standard errors for age-standardised ratios.


2. Methods of standardisation

Standardisation has most commonly been used within SSD in relation to health indicators, which often show a strong relationship with age. The technique provides a way of comparing health indicators between different subgroups of the sample after making allowance for differences in age structure between the groups and provides a single numerical summary of the age-specific rates for each subgroup of interest. There are two commonly-used ways of deriving a summary of age-specific rates, known as direct and indirect standardisation. These methods are illustrated below with some comments about their limitations and advantages. Examples and commentary can also be found in Marsh (1988)1 and Fleiss (1981)2.

Standardisation, by whichever method, is not a substitute for comparison of the age-specific rates for the subgroups of interest. Even when the technique is used it is advisable to look at the relevant three-way table and, in particular, at whether the relationship between the health measure and the characteristic of interest varies with age. If there are interactions in the data, for example where the percentages of people with the characteristic of interest in two subgroups are lower for some age bands but higher for others, then standardisation will tend to mask these differences. In these circumstances the results of the standardisation may be misleading and should be treated with caution.

2.1 Direct standardisation

Direct standardisation is widely used in medical statistics and the output is normally a rate (proportion) for each subgroup. The method applies the observed age-specific rates for each subgroup to a standard population distribution, often that of the total sample; the standardised rate for a subgroup is then obtained by summing the resulting values across all strata (age groups). This is given by the equation


Standardised rate for subgroup j = Σi rij wi

where rij is the observed rate (proportion) for the cell defined by the ith stratum and jth subgroup, and wi is the number of cases in the ith stratum (age group) as a proportion of the total sample.
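The calculation behind the equation is just a weighted sum. As a minimal sketch, with made-up rates and weights rather than survey data:

```python
# Direct standardisation for one subgroup j: the weighted sum of its
# age-specific rates r_ij, with weights w_i given by the standard
# population's stratum proportions.

def direct_standardised_rate(rates, weights):
    """Return sum over strata i of r_ij * w_i for one subgroup j."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to 1")
    return sum(r * w for r, w in zip(rates, weights))

# Three age bands with hypothetical rates (%) and standard weights.
rates = [20.0, 40.0, 60.0]
weights = [0.5, 0.3, 0.2]
print(direct_standardised_rate(rates, weights))  # 20*0.5 + 40*0.3 + 60*0.2, i.e. about 34.0
```

The function name and the check on the weights are illustrative choices, not part of the paper's notation.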

The use of direct standardisation is illustrated by the example in Table 1. The example uses data from the 1991/92 General Household Survey on the proportion of men and women in each of three marital status groups with a long-standing illness or disability, which is an indicator of chronic sickness. For simplicity the example uses only three age bands, but the technique can readily be applied to more strata. The age-standardised proportion of chronically sick in each marital status category is the proportion which would result if a standard population (given here by the total sample of men or women) were to experience the age-specific rates observed for that subgroup.

Table 1 Example of direct standardisation: reported longstanding illness by marital status, age and sex

Marital status subgroups (j): M/C = married or cohabiting; S = single; W/D/S = widowed, divorced or separated. The standard population is the total sample of men or women.

Age group        Percentage reporting           Proportion    Expected rate (rij x wi)
{Strata (i)}     longstanding illness (rij)     in stratum
                 M/C      S       W/D/S         (wi)          M/C      S       W/D/S

Men
16-44            23.5     21.0    32.1          0.52          (12.2)   (10.9)  (16.7)
45-64            41.7     47.3    43.0          0.30          (12.5)   (14.2)  (12.9)
65 or over       60.1     57.1    66.3          0.18          (10.8)   (10.3)  (11.9)
All men          37.0     25.2    51.6          1.00           35.5     35.4    41.5

Women
16-44            21.8     24.8    30.0          0.50          (10.9)   (12.4)  (15.0)
45-64            38.8     43.0    50.7          0.28          (10.9)   (12.0)  (14.2)
65 or over       55.2     64.0    61.1          0.22          (12.1)   (14.1)  (13.4)
All women        32.4     30.0    52.3          1.00           33.9     38.5    42.6

In the 'All men' and 'All women' rows the first three columns give the observed rates for each marital status group and the last three give the directly age-standardised rates, which are the column sums of the expected rates above.
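The standardised figures in Table 1 can be reproduced directly from the age-specific rates. A short sketch using the men's rates from the table:

```python
# Reproduce the directly standardised rates for men in Table 1.
# Rows: age bands 16-44, 45-64, 65 or over.
# Columns: married/cohabiting, single, widowed/divorced/separated.
rates = [
    [23.5, 21.0, 32.1],   # 16-44
    [41.7, 47.3, 43.0],   # 45-64
    [60.1, 57.1, 66.3],   # 65 or over
]
# Standard population proportions (total male sample) by stratum.
weights = [0.52, 0.30, 0.18]

# For each marital status group j, sum r_ij * w_i over strata i.
standardised = [
    round(sum(w * row[j] for w, row in zip(weights, rates)), 1)
    for j in range(3)
]
print(standardised)  # [35.5, 35.4, 41.5], matching Table 1
```

The rounding to one decimal place mirrors the presentation in the table; intermediate expected rates such as 23.5 x 0.52 = 12.2 correspond to the bracketed cells.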

From the observed percentages in Table 1 we see that men and women who were widowed, divorced or separated were much more likely than those in other groups to have reported a long-standing illness. Also, among men only, the observed rate was lower for those who were single than for the married or cohabiting group. These results may be misleading since there is a strong association between marital status and age as well as between the incidence of chronic sickness and age: single people are on average younger than others, while those who are widowed, divorced or separated are on average older than the married or cohabiting group.

The direct standardised rates for the marital status groups are shown in the right hand part of Table 1. As would be expected, once age has been taken into account there was less variation between the subgroups in the percentage chronically sick. The directly standardised rates were, however, still higher for men and women who were widowed, divorced or separated, indicating that this group had higher rates of chronic sickness even after allowing for their age distribution. There is also some evidence that married or cohabiting women reported lower rates of chronic sickness than would be expected on the basis of their age distribution. These results can be seen to be consistent with the observed age-specific rates for the subgroups. For most of the age-sex strata, observed rates of long-standing illness were higher among informants who were widowed, divorced or separated, and married women in each age band had lower observed rates of chronic sickness than single women.

Although direct standardisation is initially attractive because the resulting statistic is a rate (proportion), the method has more rigorous data demands than does indirect standardisation. The major requirement is that the age-specific rate for the measure under investigation must be known for each population subgroup being considered. In most survey contexts this level of detail is available in the data but the sample size for each cell in the cross-tabulation may be too small to give reliable measures and hence the resulting standardised rates may be unstable.

The other requirement in order to calculate a directly standardised rate is that the age structure of the standard population is known. In cross-sectional surveys the age distribution for the total sample (of men