Errors in Surveys
Errors in surveys: Overview
• Models of survey error: What, why, different perspectives
• Total survey error and mean square error
• Views of the components of error
• Types of error
What are models of survey error?
• Models of survey error are statements in mathematical notation about the factors that influence a survey answer
• All models make assumptions and simplifications
• Models from different traditions use different formulations and have different conceptualizations of validity and reliability
Why models of survey error?
• Descriptive
  • Identify and label components of error
  • Indicate where in the process of survey measurement errors originate
• Theoretical and statistical
  • Imply conditions or designs under which errors can be estimated
  • Make assumptions about error explicit
Why models of survey error (continued)
When we have designs that allow us to measure the impact of errors on survey estimates, we can:
• Explore ways of reducing errors and test success of results
• Evaluate trade-offs between errors and costs
Three questions about models of error (Groves, 1991, pp. 3-5)
• What is the statistic of interest?
  • Will constant errors affect the statistic of interest?
  • Errors can affect means and regression coefficients differently
Three questions about models of error (continued)
• Which features of data collection are fixed and which are variable?
  • Affects which components can be estimated and the interpretation of observations
  • Example: A sampling statistician is concerned with variation over samples from the same population with the same design. An analyst may be concerned with the values of a specific sample.
  • Example: Psychometric notions of reliability consider response error over repeated applications of the same item to the same sample, not over different samples.
• What assumptions are made about persons not measured or the properties of observational errors? Are errors eliminated by model assumptions?
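The point that constant errors can affect means and regression coefficients differently can be illustrated with a small sketch. All numbers here (the simulated data, the intercept and slope, and the constant error of 3.0) are hypothetical and chosen only for illustration; they do not come from Groves (1991).

```python
import random
import statistics

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

random.seed(0)

# Hypothetical "true" data: y depends linearly on x plus random noise.
x = [random.uniform(0, 10) for _ in range(200)]
true_y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

# Hypothetical constant error: every respondent over-reports by the same amount.
CONSTANT_ERROR = 3.0
reported_y = [yi + CONSTANT_ERROR for yi in true_y]

mean_shift = statistics.fmean(reported_y) - statistics.fmean(true_y)
slope_true = ols_slope(x, true_y)
slope_reported = ols_slope(x, reported_y)

print(f"shift in mean:  {mean_shift:.3f}")
print(f"slope (true):   {slope_true:.3f}")
print(f"slope (report): {slope_reported:.3f}")
```

In this sketch the mean moves by the full constant error while the slope is unchanged, so whether a constant error matters depends on the statistic of interest.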
• Disciplines
  • Sampling statistics
    • Population means and totals
    • Total survey error
  • Psychometrics
  • Econometrics
• Orientation
  • Reduce or measure
  • Collect or analyze
  • Describe or model
Example: Sampling distribution (Kish 1965, p. 12)
$Y$ is the population quantity estimated by $y$ under a set of essential survey conditions.
The diagram shows the sampling distribution of $y$. Each "." represents a specific sample estimate, $y$.
If there is bias in the estimator of $Y$ provided by the sample design (sampling bias), $E(y)$ will differ from $Y$. We will ignore the possibility of sampling bias.
We will consider only the total distance between $E(y)$ and $Y_{\text{true}}$, and we will think of that distance as total bias.
Example: Sampling distribution (Kish 1965) (continued)
Mean square error
$$\mathrm{MSE}(y) = \mathrm{Var}(y) + [E(y) - Y]^2$$
The variable component can be generalized to include variable errors in addition to sampling error.
We cannot usually estimate the bias component of MSE because we do not have information about $Y_{\text{true}}$, which we can think of here as equivalent to $Y$.
Sometimes we can ask about something for which there is an external criterion that can be used as an estimate of $Y_{\text{true}}$, even if it is not a perfect estimate.
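As a brief aside, the decomposition of MSE into variance and squared bias follows from adding and subtracting $E(y)$ inside the squared deviation. The derivation below is a standard sketch that uses the same symbols as the formula on this slide:

```latex
\begin{aligned}
\mathrm{MSE}(y) &= E\big[(y - Y)^2\big] \\
                &= E\big[(y - E(y) + E(y) - Y)^2\big] \\
                &= E\big[(y - E(y))^2\big]
                   + 2\,[E(y) - Y]\,E\big[y - E(y)\big]
                   + [E(y) - Y]^2 \\
                &= \mathrm{Var}(y) + [E(y) - Y]^2
\end{aligned}
```

The cross term vanishes because $E[y - E(y)] = 0$.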
The mean square error of an estimator includes both variable error and bias:
$$\mathrm{MSE} = \text{variance} + \text{bias}^2$$
• Deviations are taken from the population value, $Y_{\text{true}}$.
• Variable errors are those that vary over replications.
  • For example, we imagine conducting replications of the survey under the same essential survey conditions, drawing different respondents each time.
  • Each trial gives a different estimate, and the distribution of estimates is the sampling distribution of the statistic.
• Bias includes sources of error that are constant over repeated replications of the survey design; that is, $Y$ may not be equal to $Y_{\text{true}}$.
• Models differ in what is considered fixed and what is considered variable.
• Which components of error can be estimated depends on the survey design.
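The idea of replications under the same essential survey conditions can be sketched as a small simulation. Everything here is an assumption made for illustration: the population, the sample size, the number of replications, and a constant error of 2.0 added to every response. None of these values come from the source.

```python
import random
import statistics

random.seed(42)

# Hypothetical population of true values.
population = [random.gauss(50.0, 10.0) for _ in range(10_000)]
true_mean = statistics.fmean(population)

CONSTANT_ERROR = 2.0   # hypothetical error that is the same in every replication
SAMPLE_SIZE = 100
REPLICATIONS = 2_000

# Each replication draws different respondents under the same design.
estimates = []
for _ in range(REPLICATIONS):
    sample = random.sample(population, SAMPLE_SIZE)
    reported = [value + CONSTANT_ERROR for value in sample]
    estimates.append(statistics.fmean(reported))

expected_estimate = statistics.fmean(estimates)
variance = statistics.pvariance(estimates)      # variable error over replications
bias = expected_estimate - true_mean            # error constant over replications
mse = statistics.fmean([(e - true_mean) ** 2 for e in estimates])

print(f"variance          = {variance:.3f}")
print(f"bias^2            = {bias ** 2:.3f}")
print(f"variance + bias^2 = {variance + bias ** 2:.3f}")
print(f"MSE               = {mse:.3f}")
```

In this sketch the variance shrinks as the sample size grows, while the squared bias stays roughly constant, which is why the two components are treated separately.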
Mean square error: Variability and bias
• Consider two estimators with different properties
  • One has high variance and low bias
  • One has low variance and high bias
• What is the total MSE of each?
Mean square error: Variability and bias
(Biemer and Lyberg 2003)
Mean square error (continued): Variance and bias (Biemer and Lyberg 2003)
Left target: low bias, high variance. Right target: high bias, low variance.

| Hit | Left: distance from hit to center of hits | Left: squared distance from center | Right: distance from hit to center of hits | Right: squared distance from center |
|---|---|---|---|---|
| 1 | (2.2 - 0.15) = 2.05 | 4.20 | (3.1 - 4.5) = -1.40 | 1.96 |
| 2 | (-3.6 - 0.15) = -3.75 | 14.06 | (3.7 - 4.5) = -0.80 | 0.64 |
| 3 | -4.65 | 21.62 | 0.80 | 0.64 |
| 4 | 6.65 | 44.22 | 0.40 | 0.16 |
| 5 | 4.95 | 24.50 | 1.60 | 2.56 |
| 6 | -7.35 | 54.02 | -0.10 | 0.01 |
| 7 | -4.05 | 16.40 | -1.70 | 2.89 |
| 8 | 5.15 | 26.52 | 1.60 | 2.56 |
| 9 | -1.95 | 3.80 | 0.00 | 0.00 |
| 10 | 2.95 | 8.70 | -0.40 | 0.16 |
| Avg. | 0.0 | 21.81 (= variance) | 0.0 | 1.17 (= variance) |

Left target: Bias = (0.15 - 0.0) = 0.15; Bias² = 0.023; MSE = Bias² + Variance = 21.8
Right target: Bias = (4.5 - 0.0) = 4.5; Bias² = 20.25; MSE = Bias² + Variance = 21.4
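The arithmetic in the target example can be checked with a short script. This is a sketch that assumes the bull's-eye sits at 0.0 and takes the variance as the average squared distance from the center of the hits, as the table does; the function name `mse_components` is introduced here for illustration.

```python
import statistics

TARGET = 0.0  # position of the bull's-eye, as implied by the bias calculation

# Center of hits and distances from each hit to that center, from the table.
LEFT_CENTER = 0.15
LEFT_DISTANCES = [2.05, -3.75, -4.65, 6.65, 4.95, -7.35, -4.05, 5.15, -1.95, 2.95]

RIGHT_CENTER = 4.5
RIGHT_DISTANCES = [-1.40, -0.80, 0.80, 0.40, 1.60, -0.10, -1.70, 1.60, 0.00, -0.40]

def mse_components(center, distances, target=TARGET):
    """Return (variance, squared bias, MSE) for one set of hits."""
    variance = statistics.fmean([d ** 2 for d in distances])
    bias_squared = (center - target) ** 2
    return variance, bias_squared, variance + bias_squared

left_var, left_bias2, left_mse = mse_components(LEFT_CENTER, LEFT_DISTANCES)
right_var, right_bias2, right_mse = mse_components(RIGHT_CENTER, RIGHT_DISTANCES)

print(f"left:  variance={left_var:.2f}  bias^2={left_bias2:.3f}  MSE={left_mse:.1f}")
print(f"right: variance={right_var:.2f}  bias^2={right_bias2:.2f}  MSE={right_mse:.1f}")
```

Recomputing from the printed distances gives a right-target variance of about 1.16 rather than the printed 1.17, presumably because the printed distances are rounded. The two MSE values still come out at 21.8 and 21.4, so the two targets have nearly the same total error despite very different mixes of variance and bias.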
Example: Contribution of bias to total error (Kish 1965, p. 513)
Homeowners interviewed about home value
• $y$ = mean in sample = 9,200
• standard deviation = 5,700
• $Y_{\text{true}}$ = appraiser estimate = 8,880
• If the appraiser estimate is the true mean, bias = 320 = 3.5% of mean
• Without bias (values in thousands), $\mathrm{RMSE} = \sqrt{5.7^2/n}$
• With bias, $\mathrm{RMSE} = \sqrt{5.7^2/n + 0.32^2}$
Example: Contribution of bias to total error (continued)
We can see the relative contributions of variable error and bias to total error by examining RMSE under different sample sizes:

| | n = 100 | n = 1,000 | n = 10,000 |
|---|---|---|---|
| RMSE without bias, $\sqrt{5.7^2/n}$ | .57 | .18 | .06 |
| RMSE with bias, $\sqrt{5.7^2/n + 0.32^2}$ | .65 | .37 | .32 |
| $5.7^2 / (5.7^2/n + 0.32^2)$ * | 76 | 240 | 308 |

* Number of observations from an unbiased design that would yield the same total error as n observations from a design with a mean bias of .32; a measure of the effect of the bias on the total error for different sample sizes.
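The RMSE figures in the Kish example can be reproduced with a few lines of code. This is a minimal sketch that works in thousands (standard deviation 5.7, bias 0.32), as the example does; the function names are introduced here for illustration.

```python
import math

SD = 5.7     # standard deviation of reported home value, in thousands
BIAS = 0.32  # sample mean minus appraiser estimate, in thousands

def rmse_unbiased(n):
    """RMSE of the sample mean when only variable error is present."""
    return math.sqrt(SD ** 2 / n)

def rmse_biased(n):
    """RMSE of the sample mean when a constant bias is also present."""
    return math.sqrt(SD ** 2 / n + BIAS ** 2)

def equivalent_n(n):
    """Sample size of an unbiased design with the same total error."""
    return SD ** 2 / (SD ** 2 / n + BIAS ** 2)

rows = [(n, rmse_unbiased(n), rmse_biased(n), equivalent_n(n))
        for n in (100, 1_000, 10_000)]

for n, unbiased, biased, n_equiv in rows:
    print(f"n={n:>6}  RMSE without bias={unbiased:.2f}  "
          f"RMSE with bias={biased:.2f}  equivalent n={n_equiv:.1f}")
```

The computed values agree with the table up to rounding (for n = 1,000 the equivalent sample size is about 240.9, printed as 240). As n grows, the RMSE with bias levels off near 0.32, so beyond a few hundred cases additional interviews buy very little reduction in total error.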
Sampling variance as a component of total MSE,
Comparing MSE across designs (Biemer and Lyberg 2003, pp. 60-61)
Comparing MSE across designs – bias
(continued)
• Design A - face-to-face
• Frame bias: Area frame sample, so all houses will be listed and coverage error should be low
• Response bias: Highest response rate, nonresponse bias probably lowest
• Measurement bias: Likely to be lowest
• Design B - telephone interviewing
• Frame bias: RDD frame omits those without phones
• Response bias: Moderately low response rate, nonresponse bias probably intermediate
• Measurement bias: Likely to be larger than A
• Design C - self-administered by mail
• Frame bias: Telephone directory-type listing for obtaining addresses omits both nontelephone and unlisted households
• Response bias: Lowest response rate, nonresponse bias probably highest
Comparing MSE across designs - variance
(continued)
• Design A - face-to-face
• Measurement variance: Largest because of presence of interviewer
• Sampling variance: largest because budget allows smallest sample size
• Design B - telephone interviewing
• Measurement variance: Intermediate because uses interviewer, but interviewer not in the room
• Sampling variance: intermediate because cost of interviews allows intermediate sample size
• Design C - self-administered by mail
• Measurement variance: Smallest because of absence of interviewer
• Sampling variance: smallest because budget allows largest sample size
Views of the components of error
• Error and the steps in a survey (Groves et al. 2004)
• Error in the survey process (Groves et al. 2004)
• Sampling and five major sources of nonsampling error (Biemer and Lyberg 2003)
• Total survey error perspective
Error and quality in the steps of a survey
Major sources of error (Biemer and Lyberg 2003, p. 39)
• Sampling error
• Specification error
  • Concepts
  • Objectives
  • Data elements
• Frame error
  • Omissions
  • Erroneous inclusions
  • Duplications
Major sources of error (continued)
• Nonresponse error
  • Whole unit
  • Within unit
  • Item
  • Incomplete information from open questions
• Measurement error
  • Information systems or records consulted by respondents
  • Setting
  • Mode of data collection
  • Respondent
  • Interviewer
  • Instrument
Major sources of error (continued)
• Processing error
  • Editing
  • Data entry
  • Weighting
  • Tabulation
Total survey error: Types of error (based on Groves 1989)
• Errors of nonobservation
  • Coverage
  • Sampling
  • Nonresponse
• Errors of observation
  • Questionnaire, mode
  • Interviewer
  • Respondent
• Errors of processing
  • Editing
  • Coding
  • Imputation
Total survey error: Types of error (Groves 1989)
Methods of measuring errors
| Error source | Bias | Variance |
|---|---|---|
| Coverage | Universe statistics | Replication of frame construction |
| Sampling | Frame statistics | Randomized, repeated selections |
| Nonresponse | Sample statistics | Randomized, multiple recruiters |
| Measurement | Records, true values on respondents | Interpenetration of multiple measurers |
| Processing | Pre-post comparison | Interpenetration of processors |