Part 1: Sampling Errors
6.3 Detecting measurement error
6.3.1 Comparison at aggregate level with external data sources
Survey estimates may be compared with aggregate figures from another source, such as another survey, an administrative source or trade organisation data. Such a comparison may reveal bias from measurement error, although it may be difficult to disentangle measurement error bias from nonresponse bias and it may be difficult to determine to what extent the difference between estimates is attributable to error in the survey of interest or error in the other data source.
Case Study 1. Comparison of mail survey with interview survey
In the 1980s Statistics Sweden conducted an annual survey on forestry (logging) among private owners (as opposed to large corporations, the government or the Church). The private owners make up about 50% of all forestry in Sweden. This survey was done by a conventional mail questionnaire design and involved a sample of 7,000 such owners (owning less than 1,000 hectares each). The aim was to estimate at the national level, among other quantities, the total volumes (in million cubic meters) logged by final felling (that is a whole area is cut down), thinning (selected trees only) and miscellaneous felling (in ditches, under power lines etc). Because of concerns about quality, it was decided in 1988 to divide the survey into two parts on an experimental basis: a mail questionnaire was distributed to about 4,500 owners while about 2,500 owners were included in an interview survey, about 100 local forestry experts performing the interviews. The results are given in the following table.
                    π-weighted estimate of
                    proportion of owners         Estimated volume
                    doing activity               (million cubic meters)
                    Mail       Interview         Mail       Interview
Final felling       20%        21%               17.6       19.0
Thinning            32%        39%                9.7       11.3
Miscellaneous       18%        38%                1.9        3.7
Total logging                                    29.2       34.0
The estimated volume for the mail survey tends to be less than for the interview survey, especially for the miscellaneous category. This may be explained by the much greater number of zeros (owners not undertaking the activity) in the mail survey, especially for the miscellaneous category. Many of these zeros represent either measurement error (the failure to report actual activity) or item nonresponse (a blank return where an actual return may be difficult). Final felling is easy to identify and quantify (for example, a lot of paperwork is involved in getting a permit), while thinning and particularly miscellaneous logging are harder to identify, quantify and remember. It was concluded that the quality of the results from the mail questionnaire was unacceptable and the survey was changed to an interviewer mode from 1989.
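The size of the gap between the two modes can be made explicit with a little arithmetic. The following Python sketch uses the volume estimates from the table above and expresses the shortfall of each mail estimate as a fraction of the interview estimate. Treating the interview estimate as the benchmark is an assumption made only for illustration, and the function name is hypothetical.

```python
# Volume estimates (million cubic meters) copied from the table above:
# category -> (mail estimate, interview estimate)
volumes = {
    "Final felling": (17.6, 19.0),
    "Thinning": (9.7, 11.3),
    "Miscellaneous": (1.9, 3.7),
    "Total logging": (29.2, 34.0),
}


def mail_shortfall(mail, interview):
    """Shortfall of the mail estimate as a fraction of the interview estimate."""
    return (interview - mail) / interview


shortfalls = {cat: mail_shortfall(m, i) for cat, (m, i) in volumes.items()}

for cat, s in shortfalls.items():
    print(f"{cat:15s} {s:6.1%}")
```

On these figures the shortfall is about 7% for final felling, about 14% for thinning and about 49% for miscellaneous felling, which matches the ordering described in the text.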
6.3.2 Comparison at unit level with external data sources
A more useful comparison is possible if the respondent records can be matched to records from another source such as a tax register, containing related variables. Such comparisons might only be made with a subset of sample records, for example the responses of just the businesses in the completely enumerated stratum might be compared with information in publicly available annual reports. Gross errors might be detected in values which do not follow the normal relationship with variables in the external source. Differences in definitions between the two sources, in particular differences in reference periods, will often complicate such comparisons, however. It may also be that the external source, for example an audited set of company accounts, only becomes available after the survey estimates have been published, so that measurement error estimates can only be made retrospectively.
Case Study 2. Comparison of questionnaire responses with values on VAT register
The survey on ‘domestic trade in the service sector’ at Statistics Sweden aims to estimate quarterly turnover by industry (4-digit NACE) in the service sector. A probability sample of legal units is drawn from the Business Register (BR) and a questionnaire is mailed to these units.
In 1997 a study was made to find out whether the mail questionnaire could be replaced by data taken directly from the VAT register. Such a shift would reduce costs considerably, for Statistics Sweden as well as for respondents, and at the same time make it possible to shift from a sample of about 4,500 to a total enumeration of about 110,000 enterprises.
Two estimates of turnover by 4-digit NACE were compared. The first was a π-weighted estimate from the original survey observations. The second was a modified estimate, with the questionnaire observations replaced by the corresponding VAT observations (except in the take-all strata).
Differences between the estimates were reasonably small in most NACE groups compared with the random variation in the survey. However, in some NACE groups the differences were much larger than one would expect from the random variation. For 114 legal units the π-weighted difference between questionnaire and VAT data exceeded 50 million SEK. About one third (37) of these were selected for a telephone interview to find out the reasons for the discrepancies. For practical reasons the interviews had to be done during the holiday season in the summer, and only 21 interviews were completed. Nevertheless, a lot was learned from these interviews:
1) In 10 cases (legal units) the large discrepancies were due to the choice of unit. These legal units turned out to be part of multi-legal unit enterprises. The turnover in the sample cases may be reported to the VAT register from another legal unit within the same multi-unit enterprise, and this VAT-reporting unit may even be an out-of-scope unit, for instance a manufacturing unit. In some cases the selected unit reported zero turnover while the corresponding VAT turnover was substantial. In some cases it was agreed with the respondent that the questionnaire turnover was the correct figure, while in other cases the VAT turnover turned out to be correct.
2) In 3 cases the respondents had by mistake given the wrong numbers (turnover) on the questionnaire. This was corrected during the discussion, making questionnaire and VAT data coincide.
3) Two cases were due to data entry errors made by Statistics Sweden but not detected by editing.
4) Two cases were due to errors in NACE classification in the BR. The respondents had reported ‘Manufacturing’ instead of the service sector code found in the BR. These units had been classified as over-coverage in the survey and given a turnover value of zero.
5) In one case, a wholesale trade agent (NACE = 51.1) had included as turnover the whole traded turnover instead of only its own turnover as requested in the questionnaire.
6) Two cases were traced to misunderstanding of the questionnaire.
7) One case was due to reference period problems. This enterprise was involved in a 6-month project. The VAT payments were divided into six equal monthly sums while actual payment took place on one or two occasions. It so happened that the ‘questionnaire-turnover’ was attributed to a different quarter from the one in the study, while the VAT data seemed to be very consistent from month to month.
It is clear that such comparisons with external sources can reveal many sources of error in addition to measurement error. In particular the most striking additional type of error in this study consists of frame errors arising from problems in delineating units. Such comparisons may also suggest methods for improving quality. This study suggests, for example, that VAT data may be useful for editing. A large difference between questionnaire responses and VAT turnover would be a good reason for a telephone contact.
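The screening rule used in the case study, and its suggested use in editing, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the unit identifiers and all figures are invented, and the weighted difference is taken here as the design weight multiplied by the difference between questionnaire and VAT turnover. Only the 50 million SEK threshold comes from the study.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    unit_id: str         # hypothetical identifier
    weight: float        # design weight (inverse of the inclusion probability)
    q_turnover: float    # questionnaire turnover, million SEK
    vat_turnover: float  # VAT register turnover, million SEK


def weighted_difference(u: Unit) -> float:
    """Weighted difference between questionnaire and VAT turnover."""
    return u.weight * (u.q_turnover - u.vat_turnover)


def flag_for_followup(units, threshold=50.0):
    """Return ids of units whose absolute weighted difference exceeds the threshold."""
    return [u.unit_id for u in units if abs(weighted_difference(u)) > threshold]


# Invented example records, not data from the study.
sample = [
    Unit("A", 25.0, 12.0, 11.5),   # 25 * 0.5  = 12.5  -> not flagged
    Unit("B", 40.0, 0.0, 3.0),     # 40 * -3.0 = -120  -> flagged
    Unit("C", 1.0, 900.0, 830.0),  # 1 * 70.0  = 70    -> flagged
]

flagged = flag_for_followup(sample)
print(flagged)
```

Unit "B" mimics the situation in item 1 of the list, where a sampled unit reports zero turnover while the VAT register shows substantial turnover. A flag of this kind indicates only that a telephone contact may be worthwhile, not which of the two sources is wrong.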
6.3.3 Internal comparison and editing
A simpler approach is to examine the internal consistency of the values reported in the survey as part of the usual editing process (Hidiroglou & Berthelot, 1986; Pierzchala, 1990; Granquist & Kovar, 1997). Thus, one may check accounting identities, for example where components sum to a total, and inequalities, for example that some variables are positive. Comparisons may be made with values reported in previous surveys by the same respondent. For example, a variable whose month-to-month variation does not normally exceed 5%, but which suddenly changes by 1000%, is a likely case of gross measurement error. See chapter 7, on processing errors, for further discussion.
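The three kinds of check mentioned above can be sketched as a small Python function. The record layout (a dictionary with "components" and "total"), the rounding tolerance and the threshold for an implausible change are all assumptions made for illustration. In practice such limits would be set from the historical behaviour of each variable.

```python
def edit_failures(record, previous=None, max_rel_change=0.5, tol=0.5):
    """Return a list of descriptions of the edit checks that a record fails.

    record and previous are dicts with keys "components" (list of numbers)
    and "total" (number). This layout is hypothetical.
    """
    failures = []

    # Accounting identity: components should sum to the reported total,
    # up to a small rounding tolerance.
    if abs(sum(record["components"]) - record["total"]) > tol:
        failures.append("components do not sum to total")

    # Inequality: here, no value may be negative.
    if record["total"] < 0 or any(v < 0 for v in record["components"]):
        failures.append("negative value")

    # Comparison with the value reported by the same respondent last period.
    if previous is not None and previous["total"] > 0:
        rel_change = abs(record["total"] - previous["total"]) / previous["total"]
        if rel_change > max_rel_change:
            failures.append("implausible change from previous period")

    return failures


last_month = {"components": [60.0, 40.0], "total": 100.0}
this_month = {"components": [600.0, 500.0], "total": 1100.0}  # a 1000% increase
unbalanced = {"components": [60.0, 45.0], "total": 100.0}

print(edit_failures(this_month, last_month))
print(edit_failures(unbalanced))
```

The first record is internally consistent and fails only the comparison with the previous period, which shows why historical checks complement checks on accounting identities.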
6.3.4 Follow-up
When edit constraints are failed, there are generally two options. First, the reported values may be modified so that they do obey the constraints, for example following the procedure of Fellegi and Holt (1976). Second, the respondent may be followed up in order to clarify the reason for the failed edit constraint and hence to establish, if necessary, a value with reduced measurement error. Such follow-up may be expected to provide more information about the nature and size of the measurement error. It may be selective, that is only values considered likely to have a non-negligible effect on the statistical estimates might be followed up.
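Selective follow-up requires some way of ranking failed records by their likely effect on the estimates. The Python sketch below shows one possible score, the weighted absolute difference between the reported value and an anticipated value (for example the previous period's value), expressed as a share of the estimated total. The score, the cutoff and all figures are assumptions for illustration and are not taken from the references cited above.

```python
def followup_score(weight, reported, anticipated, estimated_total):
    """Potential effect of a suspicious value, as a share of the estimated total."""
    return weight * abs(reported - anticipated) / estimated_total


def select_for_followup(failed, estimated_total, cutoff=0.01):
    """Return ids of failed records scoring above the cutoff, largest score first.

    failed is a list of (unit_id, weight, reported, anticipated) tuples.
    """
    scored = [
        (followup_score(w, y, a, estimated_total), uid)
        for uid, w, y, a in failed
    ]
    return [uid for score, uid in sorted(scored, reverse=True) if score > cutoff]


# Invented records that have failed an edit constraint.
failed = [
    ("U1", 10.0, 500.0, 50.0),  # score 10 * 450 / 100000 = 0.045
    ("U2", 2.0, 52.0, 50.0),    # score  2 *   2 / 100000 = 0.00004
    ("U3", 30.0, 80.0, 20.0),   # score 30 *  60 / 100000 = 0.018
]

selected = select_for_followup(failed, estimated_total=100000.0)
print(selected)
```

With these invented figures only U1 and U3 would be followed up. U2 fails an edit but could change the estimated total by only a negligible amount.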
Follow-up can range from a simple telephone call to check a single value through to a more detailed reinterview, aimed at establishing the sources of information used as well as the respondents’ understanding of questions and instructions. Dippo, Chun & Sander (1995, p.295) refer to this as a response analysis survey. Such a survey may reveal measurement errors directly, for example through misunderstandings displayed, or may suggest subgroups for which the quality of the data may be worst. For example, respondents might be asked whether their responses were based on memory or involved reference to appropriate information sources. The proportion of respondents using memory might be taken as an indicator of poor data quality and might be compared between different subgroups of businesses.
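The indicator described above, the proportion of respondents answering from memory, is simple to tabulate by subgroup. The Python sketch below assumes that each response analysis record is reduced to a pair consisting of a subgroup label and a flag for whether memory was used. The subgroup labels and the data are invented.

```python
from collections import defaultdict


def memory_share_by_group(responses):
    """Proportion of respondents in each subgroup who answered from memory.

    responses is an iterable of (group, used_memory) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [memory count, total count]
    for group, used_memory in responses:
        counts[group][0] += int(used_memory)
        counts[group][1] += 1
    return {group: m / n for group, (m, n) in counts.items()}


# Invented response analysis records.
responses = [
    ("small", True), ("small", True), ("small", False), ("small", True),
    ("large", False), ("large", True), ("large", False), ("large", False),
]

shares = memory_share_by_group(responses)
print(shares)
```

A higher share in one subgroup would suggest, but not prove, poorer data quality there. With subgroups as small as in this toy example the difference would of course be well within sampling variation.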
Reinterviews appear to be relatively uncommon in European business surveys. An illustration of response variability is provided by a study of Friberg (1992) in which reinterviews arose by accident! He reports on a Statistics Sweden survey on environmental investments and costs in Sweden. A reminder was distributed at some point to those enterprises that had not yet responded. Five enterprises among those receiving the reminder had in fact sent in their questionnaires just one or two days before. It so happened in those five cases that a different person at the enterprise from the one who had already responded (and then possibly gone on holiday - this happened in the summer) filled in the questionnaire. This made it possible for Statistics Sweden to compare the two versions from each of the five enterprises. Very large differences were found between the responses of the pairs of respondents from each of the five enterprises. This seems to reflect the large degree of error in measuring a variable such as environmental investment, which is difficult to define and quantify.
6.3.5 Embedded experiments and observational data
Randomised experimental designs may, in principle, be used to detect measurement error bias by comparing alternative measuring instruments (Biemer & Fecso, 1995, p.268). For example, different form designs or different modes (for example mail versus telephone) might be assigned randomly between different respondents. See Case Study 1 in Section 6.3.1 for an example.
Randomised assignment may often be difficult to implement in practice. For example, although an agency may request that a form be answered by a particular category of staff, it may be difficult in practice to enforce this. It might therefore be difficult to implement a randomised experiment comparing the effect of using, for example, management versus clerical staff as respondents. It may, however, be possible to record observational data on the category of staff responding in an ongoing survey. The fact that the allocation of staff is not experimentally assigned makes the interpretation of differences in the survey outcomes between different categories of staff more difficult, because of potential confounding with other variables, but not impossible (Biemer & Fecso, 1995, p. 269).