Part 1: Sampling Errors
6.1 Nature of measurement error
6.1.1 True values
Measurement error is defined relative to the value of a given variable (that is, a question) reported by a given respondent. The basic assumption is that there exists a true value of this variable for this unit, so that there is no ambiguity in the definition of the variable. Given this assumption, the measurement error is defined as the difference between the reported value and the true value. This is not, of course, an operational definition. Even if it is accepted that there is no ambiguity in the definition of the true value, there may be no operational way for an agency to obtain the true value with certainty. Instead, various indirect methods may be used to detect measurement errors, as described in this chapter.
6.1.2 Sources of measurement error
In this report measurement errors will be equated with ‘response errors’, that is, errors arising because the respondent fails for some reason to provide the true value desired. Errors on the part of the data collection agency, for example falsely transcribing values from questionnaires or misrecording values reported by telephone, will be treated as processing errors (see chapter 7). Errors in auxiliary variables recorded on a business register will, furthermore, be treated as frame errors (see chapter 5). These register errors may be attributable simply to out-of-date information on register variables but may also arise for reasons similar to those behind response errors, that is, because a business fails for some reason to provide the true value of the variable required.
Response errors may arise from three sources.
True value unknown or difficult to obtain
Sometimes the business may keep information according to different definitions. For example, many businesses maintain accounts according to their own financial years, and it may be difficult to report values for a different time period, such as the calendar year, requested by the agency. In such circumstances the business may report the value of the variable according to the closest definition available, for example the business’s financial year.
Sometimes the business may not keep the information required, for example both the ‘value’ and ‘quantity’ of gas or electricity purchased, as asked in ONS’s Annual Business Inquiry. Alternatively, the business may be unwilling to go to the effort required to retrieve the information. In such cases the value may be guessed or the question left blank. The occurrence of such measurement errors may therefore be indicated by high rates of item nonresponse on a question.
Such errors may have a particular effect on ‘other’ categories. For example, the ONS’s ABI requires that expenditures in different areas should sum to the total expenditure reported. One of the last expenditure questions is for ‘other services purchased’. It is possible that this is used as a ‘balancing box’, according to which businesses simply work out what expenditure for the year has not already been accounted for.
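The ‘balancing box’ behaviour can be sketched with a small numerical illustration. This is only a sketch: the expenditure headings and all figures are hypothetical, and it assumes that the total is reported correctly and that ‘other’ is obtained purely by subtraction. Under those assumptions any error in an itemised heading is passed straight into the ‘other’ category with the opposite sign.

```python
def balancing_other(total, itemised):
    """Residual a business would enter under 'other' if it simply
    subtracts the itemised headings from the reported total."""
    return total - sum(itemised.values())

# Hypothetical true expenditure (thousands of pounds).
true_items = {"energy": 40, "materials": 300, "rent": 60}
true_other = 25
total = sum(true_items.values()) + true_other

# Suppose energy is guessed and overstated by 15.
reported_items = dict(true_items, energy=55)
reported_other = balancing_other(total, reported_items)

# The overstatement of energy reappears as an understatement of 'other'.
error_in_other = reported_other - true_other
print(reported_other, error_in_other)
```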
Misunderstanding of question or other slips
Instructions on questionnaires may be misunderstood or simply not read. A common example of an error is the reporting of a value in the wrong units. For example, a question may ask for a value to be reported in units of thousands of pounds. A true value of £2,488,500 should therefore be reported as 2,489. A business may, however, erroneously report the figure as 2,488,500. Some forms include boxes within which digits should be recorded for scanning and businesses may complete these wrongly, for example writing ‘NIL’ through the boxes. The questions themselves may also be fundamentally misunderstood. For example, a construction firm might record the value of ‘retail turnover’ on the ABI as the firm’s expenditure on construction of retail outlets, whereas the true value should be zero.
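The unit error described above can be illustrated with a short sketch. The conversion follows the example in the text; the ratio check against a reference value (for instance a previous return from the same business) is an assumption made for illustration and is not a rule taken from this report. The function names, the reference figure of 2,400 and the tolerance are all hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def to_thousands(value_in_pounds):
    """Express a value in pounds as whole thousands of pounds,
    rounding halves upwards."""
    thousands = Decimal(value_in_pounds) / Decimal(1000)
    return int(thousands.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

def looks_like_unit_error(reported, reference, factor=1000, tolerance=0.5):
    """Flag a reported value that is roughly `factor` times a reference
    value. Hypothetical check: `reference` might be last year's return."""
    if reference <= 0:
        return False
    ratio = reported / reference
    return factor * (1 - tolerance) <= ratio <= factor * (1 + tolerance)

true_value = 2_488_500                       # pounds
correct_report = to_thousands(true_value)    # value in thousands of pounds
wrong_report = true_value                    # reported in single pounds

previous_return = 2_400                      # hypothetical, in thousands
print(correct_report)
print(looks_like_unit_error(correct_report, previous_return))
print(looks_like_unit_error(wrong_report, previous_return))
```

A check of this kind can only suggest that a value deserves inspection; a genuine thousand-fold change, although unlikely, would be flagged in the same way.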
Errors in information used by the respondent
Finally, it is possible that the information used by the respondent, for example from a business information system, is itself subject to error.
6.1.3 Types and models of measurement error
Four kinds of measurement error may be distinguished.
Continuous variables: major occasional errors
Examples of major occasional errors are the occasional reporting of values in the wrong units (for example in single currency units rather than 1000 currency units) or the occasional recording of expenditure under the wrong heading (so that expenditure under one heading is greatly reduced and expenditure under another heading is greatly increased). These errors will often be identifiable under close inspection as outliers (Lee, 1995). These are outliers which arise from error rather than outliers which are unusual but correct. If possible they should be detected and treated as part of the editing process (see section 6.3.3).
A stochastic model for such error in a measured variable Y would be that Y equals the true value with probability 1-ε and is drawn from a very different distribution with probability ε, where ε is a small number, for example 0.01.
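The following is a minimal simulation sketch of this contamination model. The text does not specify either distribution, so the choices here are assumptions for illustration: true values are drawn from a lognormal distribution, and an erroneous report is drawn from a lognormal distribution centred on 1000 times the true value, mimicking a unit error. The function name and parameter values are hypothetical.

```python
import math
import random

def reported_value(true_value, eps, rng):
    """Return the true value with probability 1 - eps; otherwise draw
    from a very different distribution (assumed here: lognormal centred
    on 1000 times the true value)."""
    if rng.random() < eps:
        return rng.lognormvariate(math.log(true_value) + math.log(1000), 0.5)
    return true_value

rng = random.Random(2024)
eps = 0.01

# Assumed distribution of true values (always positive).
true_values = [rng.lognormvariate(5.0, 1.0) for _ in range(10_000)]
reported = [reported_value(y, eps, rng) for y in true_values]

# Share of reports in error, and the effect on the estimated total.
n_errors = sum(1 for y, Y in zip(true_values, reported) if Y != y)
error_rate = n_errors / len(true_values)
inflation = sum(reported) / sum(true_values)

print(f"share of reports in error: {error_rate:.4f}")
print(f"reported total / true total: {inflation:.2f}")
```

Under these assumptions a small value of ε is enough to distort the total substantially, which is one reason such errors are worth detecting during editing.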
Continuous variables: misreporting of zeros
A specific instance of major error is the misreporting of zeros. One example is the setting above where expenditure is recorded under the wrong heading so that expenditure under the correct heading may be erroneously zero whereas expenditure under another heading may be erroneously non-zero. Such errors may cancel out under aggregation of headings.
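A small numerical illustration of this cancellation, using hypothetical headings and figures: the whole of one heading is recorded under another, so each heading is in error but the aggregate is not.

```python
# Hypothetical true and reported expenditure by heading.
true_spend = {"materials": 120, "services": 80}
reported_spend = {"materials": 0, "services": 200}  # materials misrecorded under services

# Error under each heading, and the error in the aggregate.
errors = {k: reported_spend[k] - true_spend[k] for k in true_spend}
total_error = sum(errors.values())

print(errors)
print(total_error)
```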
Other erroneous reports of zero may arise when information is unavailable or difficult to obtain, so that a question is left blank and then imputed as zero. In this case, measurement error is closely related to item nonresponse (see Case Study 1 in Section 6.3.1).
Continuous variables: other error
Guessing of values and errors due to minor differences in reference periods might be expected not to lead to major errors but rather to errors which might be represented by the ‘classical error model’
Y = y + e (6.1)
where Y is the reported value, y the true value and e is the measurement error drawn from a continuous probability distribution. Sometimes the distribution of the errors might reasonably be supposed to be centred about zero, for example under honest guessing by an experienced reporter, so that the measurement error may be viewed as approximately unbiased. Sometimes, bias may be expected.
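The distinction between unbiased and biased measurement error under model (6.1) can be sketched by simulation. The normal error distribution, the uniform distribution of true values and all parameter values are assumptions chosen for illustration; the model itself only requires e to come from a continuous distribution.

```python
import random
import statistics

rng = random.Random(0)

# Assumed true values y for 20,000 hypothetical respondents.
true_values = [rng.uniform(100, 1000) for _ in range(20_000)]

def report(y, bias=0.0, sd=20.0):
    """Classical error model Y = y + e, with e assumed normal with
    mean `bias` and standard deviation `sd`."""
    return y + rng.gauss(bias, sd)

unbiased_reports = [report(y) for y in true_values]
biased_reports = [report(y, bias=-15.0) for y in true_values]

# Mean measurement error (Y - y) under each scenario.
mean_error_unbiased = statistics.fmean(
    [Y - y for Y, y in zip(unbiased_reports, true_values)]
)
mean_error_biased = statistics.fmean(
    [Y - y for Y, y in zip(biased_reports, true_values)]
)

print(f"mean error, errors centred on zero: {mean_error_unbiased:.2f}")
print(f"mean error, errors centred on -15:  {mean_error_biased:.2f}")
```

When the errors are centred on zero they largely average out in a mean or total, although they still add variance; when they are centred away from zero the estimate is shifted by approximately the mean of e.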
Categorical variables: misclassification
Measurement error in categorical variables involves misclassification. The basic model in this case involves a misclassification matrix with elements q_ij, the probability of classifying category i as category j. The diagonal elements of this matrix should be close to one and the off-diagonal elements small.
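A minimal sketch of the misclassification model follows. The three categories, the matrix entries and the true category shares are all hypothetical. Each row i of the matrix holds the probabilities q_ij of reporting each category j given true category i, so each row sums to one, and the expected share reported in category j is the sum over i of the true share of category i multiplied by q_ij.

```python
import random

categories = ["A", "B", "C"]          # hypothetical categories

# Hypothetical misclassification matrix: Q[i][j] is the probability of
# classifying true category i as category j. Each row sums to one.
Q = [
    [0.95, 0.03, 0.02],
    [0.04, 0.94, 0.02],
    [0.01, 0.04, 0.95],
]

true_shares = [0.5, 0.3, 0.2]         # hypothetical true distribution

# Expected reported share of category j: sum over i of p_i * q_ij.
n = len(categories)
expected_reported = [
    sum(true_shares[i] * Q[i][j] for i in range(n)) for j in range(n)
]

def reported_category(true_index, rng):
    """Draw a reported category given the true one, using row true_index of Q."""
    return rng.choices(categories, weights=Q[true_index])[0]

rng = random.Random(1)
sample = [reported_category(0, rng) for _ in range(1000)]
share_correct = sample.count("A") / len(sample)

print([round(s, 3) for s in expected_reported])
print(f"share of true 'A' units reported as 'A': {share_correct:.3f}")
```

Even with diagonal elements close to one, the reported distribution differs slightly from the true one, because units move between categories at unequal rates.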