Types of nonresponse - Sampling Errors - Model Quality Report in Business Statistics

Part 1: Sampling Errors

8.2 Types of nonresponse

8.2.1 Patterns of missing data

Unit nonresponse arises when a unit fails to provide any data for a given round of a survey. There are two broad reasons for such nonresponse:

(i) noncontact – the form may not reach an appropriate respondent for various reasons, for example change of address, failure of the postal system, failure to forward from within the business;

(ii) refusal – the form does reach an appropriate respondent but the respondent does not return the form.

Unit nonresponders may be classified into two types according to the information available about the unit to the agency:

(i) units which have never previously responded – these consist primarily of smaller units which are sampled afresh at each survey occasion, or of units newly recruited to the sample in rotating schemes; for such units the only information available may be that recorded on the frame;

(ii) units which have previously responded (wave nonresponse) – these usually consist either of completely enumerated units, which are sampled on every occasion, or of larger units which are sampled over several occasions in a rotation design. Patterns of nonresponse over the rounds of the survey might be denoted XXOXOOXX, for example, where X denotes response, O denotes nonresponse and the most recent round of the survey is on the right.
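As an illustration, the distinction between the two types of unit nonresponder can be read directly from a response history written in the X/O notation. The following is a minimal Python sketch; the function names and the returned labels are assumptions made for this example and do not come from the report.

```python
def classify_history(pattern: str) -> str:
    """Classify a unit from its response history.

    pattern holds one character per survey round, oldest first:
    'X' = response, 'O' = nonresponse. The last character is the
    most recent round.
    """
    if not pattern or set(pattern) - {"X", "O"}:
        raise ValueError("pattern must be a non-empty string of 'X' and 'O'")
    if pattern[-1] == "X":
        return "respondent"
    # Nonresponder in the current round: has the unit ever responded before?
    if "X" in pattern[:-1]:
        return "wave nonresponse"
    return "never responded"


def rounds_since_last_response(pattern: str):
    """Rounds elapsed since the most recent response, or None if there is none."""
    last = pattern.rfind("X")
    return None if last == -1 else len(pattern) - 1 - last


print(classify_history("XXOXOO"))            # a unit with earlier responses
print(rounds_since_last_response("XXOXOO"))  # how stale its last return is
print(classify_history("OOO"))               # only frame information available
```

The second function indicates how old the most recent return is, which may matter if earlier responses are to be used as auxiliary information.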

Item nonresponse arises when a form is returned from the unit but responses to some questions are missing. Such missing data may arise, for example, because questions were overlooked or because the information required to answer the question was not available to the respondent. A particular problem in business surveys is the separation of item nonresponse from zeros. Respondents will often leave blank answers to questions about amounts, for example the value of production in a certain category, when the answer is zero.
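The practical consequence of the ambiguity between blanks and zeros can be seen by computing a mean under both readings of a blank. The sketch below uses hypothetical values, with None standing for a blank answer; the function name is an assumption made for this example.

```python
def mean_under_two_readings(values):
    """Return (mean if blanks are zeros, mean if blanks are missing, n_blank).

    values is a list of reported amounts in which None marks a blank answer.
    """
    reported = [v for v in values if v is not None]
    if not reported:
        raise ValueError("at least one reported value is required")
    n_blank = len(values) - len(reported)
    mean_blank_as_zero = sum(reported) / len(values)       # blanks count as 0
    mean_blank_as_missing = sum(reported) / len(reported)  # blanks are dropped
    return mean_blank_as_zero, mean_blank_as_missing, n_blank


# Hypothetical value of production in one category for five returned forms.
as_zero, as_missing, n_blank = mean_under_two_readings(
    [120.0, None, 0.0, 80.0, None]
)
print(as_zero, as_missing, n_blank)
```

The gap between the two means shows how much an estimate may depend on the editing rule adopted for blanks.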

8.2.2 Missing data mechanisms

In order to assess the errors which may arise from nonresponse it is necessary to establish a statistical framework within which the mechanism of nonresponse may be considered. Formally, nonresponse may be represented by 0-1 response indicator variables of the form

$$
R = \begin{cases} 1 & \text{if value is recorded (response)} \\ 0 & \text{if value is missing (nonresponse)} \end{cases}
$$

Unit nonresponse may be represented by a series of indicator variables $R_k$, defined for each unit $k$ in the sample. This definition may be extended in various ways. To allow for repeated rounds of a survey, one may define variables $R_{tk}$ for occasions $t$ and units $k$. Item nonresponse may be represented by a series of response indicators, one for each variable for which missing values may occur. There are a number of alternative statistical frameworks within which the nonresponse mechanism may be represented. See Lessler & Kalsbeek (1992, Chapter 7) for a literature review.
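The three kinds of indicator can be illustrated with a small Python sketch. The data layout (a dictionary of returned forms, with None for a missing form or a blank answer), the variable names and the unit labels are all hypothetical choices made for this example.

```python
# Hypothetical returns for three sampled units. A unit whose form was not
# returned is stored as None; a blank answer on a returned form is None.
forms = {
    "A": {"turnover": 510.0, "employment": 12},
    "B": None,                                  # unit nonresponse
    "C": {"turnover": None, "employment": 40},  # item nonresponse on turnover
}
variables = ["turnover", "employment"]


def unit_indicator(form):
    """Unit-level indicator: 1 if the unit returned a form, 0 otherwise."""
    return 0 if form is None else 1


def item_indicators(form, variables):
    """One 0-1 indicator per variable; all 0 when no form was returned."""
    if form is None:
        return {v: 0 for v in variables}
    return {v: 0 if form.get(v) is None else 1 for v in variables}


R_unit = {k: unit_indicator(f) for k, f in forms.items()}
R_item = {k: item_indicators(f, variables) for k, f in forms.items()}

# Indicators over eight rounds for a single unit, oldest round first,
# derived from a history in which X is response and O is nonresponse.
R_t = [1 if c == "X" else 0 for c in "XXOXOOXX"]

print(R_unit)
print(R_item["C"])
print(R_t)
```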

The deterministic approach assumes that response indicator variables $R_k$ are defined for all units $k$ in the population and that their values are fixed. Thus, in the case of unit nonresponse, it is supposed that the population is divided into two ‘strata’: the respondents who always respond and the nonrespondents who never respond. The nature of the errors arising from nonresponse will depend on how well the estimation methods used to handle nonresponse compensate for differences between these two strata.
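In the simplest case, where the population mean is estimated by the unadjusted mean of the respondent stratum, the error can be written down exactly: it equals the nonrespondent share of the population multiplied by the difference between the respondent and nonrespondent means. The Python sketch below checks this decomposition on hypothetical values; the function name and the data are assumptions made for this example, and the case of an adjusted estimator is not covered.

```python
def respondent_mean_bias(y, r):
    """Return (bias, decomposition) for the unadjusted respondent mean.

    y: values for every unit in the population.
    r: fixed 0-1 response indicators (deterministic view), one per unit.
    bias          = respondent mean - population mean
    decomposition = nonrespondent share * (respondent mean - nonrespondent mean)
    """
    resp = [yk for yk, rk in zip(y, r) if rk == 1]
    nonresp = [yk for yk, rk in zip(y, r) if rk == 0]
    if not resp:
        raise ValueError("the respondent stratum is empty")
    pop_mean = sum(y) / len(y)
    resp_mean = sum(resp) / len(resp)
    bias = resp_mean - pop_mean
    if nonresp:
        nonresp_mean = sum(nonresp) / len(nonresp)
        decomposition = (len(nonresp) / len(y)) * (resp_mean - nonresp_mean)
    else:
        decomposition = 0.0  # no nonrespondents, hence no nonresponse error
    return bias, decomposition


# Hypothetical population of five businesses; the two largest never respond.
bias, decomposition = respondent_mean_bias(
    y=[10.0, 20.0, 30.0, 100.0, 140.0],
    r=[1, 1, 1, 0, 0],
)
print(bias, decomposition)
```

The two returned numbers agree, so the error vanishes only if there are no nonrespondents or if the two strata have equal means.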

The stochastic approach treats the response indicator variables $R_k$ as outcomes of random variables. A number of different stochastic frameworks are possible. In the case of unit nonresponse, one approach is to treat the set of respondents (those sample units for which $R_k = 1$) as a random subsample of the selected sample obtained through a process analogous to two-phase sampling (Särndal & Swensson, 1987). The nature of errors arising from nonresponse then depends on assumptions about how the subsampling occurs.
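Under the two-phase view, one natural estimator of a total weights each respondent by the inverse of its sampling inclusion probability and the inverse of its response probability. The Python sketch below assumes, purely for illustration, that the response probabilities are known and that units respond independently; in practice these probabilities are unknown and must be modelled or estimated. The symbols d (design weight) and theta (response probability) and all the data are hypothetical.

```python
from itertools import product


def two_phase_estimate(y, d, theta, r):
    """Estimate of the total from respondents only.

    y: survey values, d: design weights, theta: assumed response
    probabilities, r: observed 0-1 response indicators.
    """
    return sum(
        dk * yk / tk for yk, dk, tk, rk in zip(y, d, theta, r) if rk == 1
    )


def expected_estimate(y, d, theta):
    """Exact expectation of the estimate over all response outcomes,
    assuming units respond independently with probabilities theta."""
    total = 0.0
    for r in product([0, 1], repeat=len(y)):
        prob = 1.0
        for rk, tk in zip(r, theta):
            prob *= tk if rk == 1 else 1.0 - tk
        total += prob * two_phase_estimate(y, d, theta, r)
    return total


# Hypothetical selected sample of three units.
y = [50.0, 200.0, 900.0]
d = [10.0, 10.0, 2.0]
theta = [0.5, 0.8, 0.9]

full_response_estimate = sum(dk * yk for dk, yk in zip(d, y))
print(full_response_estimate, expected_estimate(y, d, theta))
```

Averaged over the response mechanism, the weighted estimate reproduces the estimate that full response would have given, which is the sense in which correct assumptions about the subsampling remove nonresponse error.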

In the remainder of this report a stochastic approach is adopted, corresponding to modern statistical modelling. Both the response indicators $R_k$ and the survey variables $y_k$ are conceived of as outcomes of random variables and assumptions about the missing data mechanism are represented through assumptions about the joint distribution of the $R_k$ and the $y_k$. This approach is particularly flexible for handling different kinds of nonresponse, for example both unit nonresponse and item nonresponse, and for extending to an integrated framework which allows for both nonresponse and measurement errors.

The above framework is very general and in order to make useful progress in assessing nonresponse errors or in adjusting for nonresponse it is necessary to make more specific assumptions about the nature of the missing data mechanisms. Three terms will be useful for describing such mechanisms.

Missingness is said to occur completely at random if $R_k$ is stochastically independent of the relevant survey variables. For example, if unit nonresponse in a survey of production is being considered, this condition would imply that businesses with low levels of production would be as likely to respond as businesses with high levels of production. This condition is a very strong one and may arise only rarely in practice.
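The effect of this condition on a simple respondent mean can be sketched numerically. The Python example below uses the ratio of expectations, sum(p*y) / sum(p), as a large-sample approximation to the expected respondent mean; this approximation, the function name, the production values and the response probabilities are all assumptions made for illustration.

```python
def approx_respondent_mean(y, p):
    """Large-sample approximation to the expected respondent mean when
    unit k responds with probability p[k]: sum(p*y) / sum(p)."""
    return sum(pk * yk for pk, yk in zip(p, y)) / sum(p)


# Hypothetical production values for five businesses.
production = [5.0, 20.0, 60.0, 300.0, 1200.0]
population_mean = sum(production) / len(production)

# Completely at random: the same response probability for every business.
mcar_mean = approx_respondent_mean(production, [0.7] * len(production))

# Not completely at random: response probability rises with production.
dependent_mean = approx_respondent_mean(production, [0.3, 0.4, 0.6, 0.8, 0.9])

print(population_mean, mcar_mean, dependent_mean)
```

With a common response probability the approximate respondent mean equals the population mean; once the probability varies with production, it no longer does.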

Missingness is said to occur at random given an auxiliary variable (or variables) $x_k$ if $R_k$ is conditionally independent of relevant survey variables given the values of $x_k$. Suppose, for example, that $x_k$ is a measure of size, such as employment or turnover, available on the frame. In a survey of production, nonresponse would occur at random given the size variable if nonresponse is unrelated to production amongst firms of any given size. The distribution of nonresponse could vary, however, between firms of different sizes. This assumption is generally less stringent than the assumption that data are missing completely at random. It also underlies many adjustment methods, which aim to make the assumption plausible through a judicious choice of measured auxiliary variables.
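One adjustment method that relies on this assumption is a weighting-class adjustment, in which respondent means are computed within size classes and combined using the class sizes. The Python sketch below is illustrative only: the two size classes, the production values and the response probabilities are hypothetical, and expected respondent means are approximated by sum(p*y) / sum(p).

```python
from collections import defaultdict


def unadjusted_mean(units):
    """Approximate expected respondent mean ignoring size class."""
    return sum(p * y for _, y, p in units) / sum(p for _, _, p in units)


def weighting_class_mean(units):
    """Approximate expected value of a weighting-class adjusted mean.

    units: list of (size_class, y, p) with p the response probability.
    Each class's respondent mean is approximated by sum(p*y) / sum(p)
    and the class means are combined using the class counts.
    """
    by_class = defaultdict(list)
    for size_class, y, p in units:
        by_class[size_class].append((y, p))
    total = 0.0
    for members in by_class.values():
        class_mean = sum(p * y for y, p in members) / sum(p for _, p in members)
        total += len(members) * class_mean
    return total / len(units)


# Response probability differs between size classes but, within a class,
# does not depend on production (missing at random given size).
units = [
    ("small", 10.0, 0.4), ("small", 20.0, 0.4), ("small", 30.0, 0.4),
    ("large", 200.0, 0.9), ("large", 300.0, 0.9), ("large", 400.0, 0.9),
]
population_mean = sum(y for _, y, _ in units) / len(units)
print(population_mean, unadjusted_mean(units), weighting_class_mean(units))
```

In this example the unadjusted mean is pulled towards the large firms, which respond more often, while the adjusted mean recovers the population mean because response is unrelated to production within each class.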

A missing data mechanism which does not occur at random given available auxiliary variables is said to be informative or non-ignorable in relation to the relevant survey variables. Consider, for example, item nonresponse on a complex variable, for which the higher the value of the variable, the more work will tend to be required of a business of a given size to retrieve the information. In such circumstances, it may be that even after controlling for measurable factors, such as size of the business, the rate of item nonresponse tends to increase as the value of the variable increases. Item nonresponse on this variable would therefore be informative in relation to this variable.
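A small simulation can illustrate why adjustment by size class does not remove the error in this situation. In the Python sketch below, the probability of answering the item falls as the value of the variable rises within each size class. The class boundaries, the uniform distribution of values, the linear response model and the seed are all hypothetical choices made for this example.

```python
import random


def simulate_informative(n_per_class=20000, seed=1):
    """Return (true mean, weighting-class adjusted mean) under an
    informative item nonresponse mechanism."""
    rng = random.Random(seed)
    # Hypothetical size classes: range of the survey variable y in each.
    classes = {"small": (0.0, 100.0), "large": (100.0, 1000.0)}
    true_total = 0.0
    adjusted_total = 0.0
    for lo, hi in classes.values():
        y = [rng.uniform(lo, hi) for _ in range(n_per_class)]
        # Response probability falls from 0.9 to 0.3 as y rises in the class.
        p = [0.9 - 0.6 * (yk - lo) / (hi - lo) for yk in y]
        resp = [yk for yk, pk in zip(y, p) if rng.random() < pk]
        true_total += sum(y)
        # Weighting-class adjustment: class respondent mean times class size.
        adjusted_total += n_per_class * sum(resp) / len(resp)
    n = n_per_class * len(classes)
    return true_total / n, adjusted_total / n


true_mean, adjusted_mean = simulate_informative()
print(true_mean, adjusted_mean)
```

Because high values are under-represented among respondents within every size class, the adjusted mean remains below the true mean even though size has been controlled for.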

In document Model Quality Report in Business Statistics (Page 130-132)