New Measures of Question Difficulty - Ability, Motivation, and Difficulty


6.2.5 New Measures of Question Difficulty

The analysis in the preceding subsections suggests that respondent characteristics such as education and political informedness affect “no opinion” response behavior. These characteristics, however, do not fully explain the variance in “no opinion” response behavior, which suggests that it may be important to include difficulty measures in the analysis. Because difficulty measures were not collected with the original NES surveys, third-party difficulty ratings are analyzed. The results from this analysis suggest that there is not a very close relationship between third-party difficulty ratings and “no opinion” response behavior. However, the fact that difficulty ratings varied significantly across subjects, even after controlling for objectively measurable question characteristics, suggests that objective measures of difficulty may not be appropriate. Instead, it may be useful to obtain subjective measures of difficulty. With this in mind, attention is now turned to the 1998–1999 Multi-Investigator Study (MI1998) survey, in which respondent-reported difficulty measures and response times were collected. These two measures provide alternative ways of capturing respondent perceptions of question difficulty. Assuming that “no opinion” responding is an indicator of shirking, combining these new measures of question difficulty with respondent characteristics should provide a better understanding of the causes of shirking.

completely separate sample from those who answer the survey (third-party measure) or a subset of the survey participants might be asked to provide difficulty ratings in addition to answering opinion questions. There is a clear trade-off between the cost of asking all respondents to provide difficulty ratings on each opinion question and the power of the data provided by respondent-reported difficulty measures.

29. Of course, this is based on the assumption that respondents are able to assess the difficulty of questions. In Chapter 7, respondents’ relatively high levels of variance in difficulty ratings are used to support the idea that respondents are actually capable of assessing difficulty.

Difficulty Measures in the 1998–1999 Multi-Investigator Study

The 1998–1999 Multi-Investigator Study (MI1998) attempted to collect data about how people form their opinions.30 As part of this effort, a series of six policy questions concerning affirmative action, welfare, and immigrant rights were paired with questions asking respondents to indicate how difficult these questions were to answer. For example, after answering the question “When it comes to setting aside a certain number of government construction contracts for businesses that are owned and operated by minorities, are you for or against this?” respondents were asked “How hard was it for you to make up your mind on that last question—not hard at all, not very hard, somewhat hard, or very hard?”

Table 6.11 indicates the distribution of difficulty ratings for these six policy questions. Respondents who answered “don’t know” or who refused to provide a response to the opinion question were not asked for their difficulty ratings; thus, their data are coded as missing. It appears that most respondents did not find it very difficult to answer these questions.31 Even if respondents who said “don’t know” on the opinion question have their difficulty ratings coded as very hard, the percent of very hard ratings would, on average, be less than ten percent. At the other end of the spectrum, approximately fifty percent of respondents thought the questions were not hard at all. In fact, assuming that missing difficulty ratings should be set at the maximum, the average difficulty ratings for the questions were all around 1.8 (Standard Deviation = 1.0), corresponding to a difficulty of just below not very hard.32

30. See Appendix A.3 for a further discussion of the 1998–1999 Multi-Investigator survey. This survey was designed such that each of the questions examined in this section was administered to only two-thirds of the sample.

Table 6.11: Tabulation of Difficulty Ratings by Question

                           Percent    Percent    Percent    Percent
                           Not hard   Not very   Somewhat   Very       Percent   Sample
Question                   at all     hard       hard       hard       Missing   Size
Set-Asides                 49.51      20.17      22.99      4.65       2.68      709
Flexible Admit. Stand.     51.06      22.57      15.94      7.05       3.38      709
Workfare                   53.97      17.56      18.41      8.36       1.70      706
Welfare Limits             50.99      18.84      22.24      7.22       0.71      706
Welfare for Legal Immig.   52.85      21.70      19.05      5.70       0.70      719
Edu. for Illegal Immig.    57.02      21.00      14.88      6.40       0.70      719
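The averages quoted in the text can be checked directly from the percentages in Table 6.11. The following is a minimal sketch, not the original analysis code: it assumes ratings are coded 1 (not hard at all) to 4 (very hard), that missing ratings are recoded to 4, and that the standard deviation is the population form computed from the grouped percentages. The names `TABLE_6_11` and `mean_and_sd` are illustrative only.

```python
import math

# Percentages copied from Table 6.11, in the order:
# (not hard at all, not very hard, somewhat hard, very hard, missing)
TABLE_6_11 = {
    "Set-Asides": (49.51, 20.17, 22.99, 4.65, 2.68),
    "Flexible Admit. Stand.": (51.06, 22.57, 15.94, 7.05, 3.38),
    "Workfare": (53.97, 17.56, 18.41, 8.36, 1.70),
    "Welfare Limits": (50.99, 18.84, 22.24, 7.22, 0.71),
    "Welfare for Legal Immig.": (52.85, 21.70, 19.05, 5.70, 0.70),
    "Edu. for Illegal Immig.": (57.02, 21.00, 14.88, 6.40, 0.70),
}

CODES = (1, 2, 3, 4)  # 1 = not hard at all ... 4 = very hard


def mean_and_sd(percents):
    """Mean and SD of the 1-4 rating, with missing recoded to 4 (assumption)."""
    total = sum(percents)  # guards against rounding in the published table
    weights = [
        percents[0] / total,
        percents[1] / total,
        percents[2] / total,
        (percents[3] + percents[4]) / total,  # very hard + missing
    ]
    mean = sum(c * w for c, w in zip(CODES, weights))
    var = sum(w * (c - mean) ** 2 for c, w in zip(CODES, weights))
    return mean, math.sqrt(var)


summary = {q: mean_and_sd(p) for q, p in TABLE_6_11.items()}

# Upper bound on "very hard" if every missing rating were very hard.
upper_very_hard = {q: p[3] + p[4] for q, p in TABLE_6_11.items()}
avg_upper_very_hard = sum(upper_very_hard.values()) / len(upper_very_hard)

for question, (mean, sd) in summary.items():
    print(f"{question:26s} mean={mean:.2f} sd={sd:.2f}")
print(f"average upper bound on very hard: {avg_upper_very_hard:.2f}%")
```

Under these assumptions, every question's mean falls between roughly 1.7 and 1.9 with a standard deviation close to 1.0, and the average upper bound on very hard ratings stays below ten percent, consistent with the figures reported in the text.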

Although overall averages are very similar across questions, it is possible that individual respondents find some questions easier to answer than others. To determine if respondents’ difficulty perceptions vary, difficulty ratings provided by each respondent were compared across the questions.33 In total, 145 respondents provided the same difficulty rating for all four questions, 187 respondents provided ratings that were within one point of each other (e.g., rated each question as either not hard at all or not very hard), 303 respondents provided difficulty ratings that differed by two points (e.g., not hard at all and somewhat hard), and 432 respondents rated at least one of the questions as not hard at all and another as very hard. This suggests that most respondents’ difficulty perceptions vary across questions even though average difficulty ratings were very similar.

31. It is important to note that this analysis assumes that the difficulty questions provide a good way of assessing how difficult respondents perceive each question to be. Evidence explored below concerning individual-level variance in these difficulty ratings suggests that respondents do not usually provide the same difficulty rating for all six questions. While this is not concrete evidence that these difficulty measures are meaningful, it does indicate that respondents are considering these questions differently.

32. For this analysis, difficulty ratings are coded from 1, not hard at all, to 4, very hard. See Table A.5 for a summary of the difficulty ratings for these questions.

33. As discussed in A.3, questions were randomized across the full sample, thus, each respondent
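The within-respondent comparison described above amounts to taking, for each respondent, the difference between the highest and lowest rating given. A minimal sketch follows; the respondent records are hypothetical (not MI1998 data), the function name `rating_spread` is illustrative, and ignoring missing ratings is an assumption, since the text does not say how they were handled in this comparison.

```python
from collections import Counter


def rating_spread(ratings):
    """Highest minus lowest rating among the questions a respondent rated.

    Ratings are coded 1 (not hard at all) to 4 (very hard); None marks a
    missing rating and is ignored here (an assumption).
    """
    answered = [r for r in ratings if r is not None]
    if not answered:
        return None
    return max(answered) - min(answered)


# Hypothetical respondents, four rated questions each.
respondents = {
    "r1": [1, 1, 1, 1],     # same rating throughout -> spread 0
    "r2": [1, 2, 1, 2],     # within one point       -> spread 1
    "r3": [1, 3, 2, 1],     # differ by two points   -> spread 2
    "r4": [4, 1, 2, 1],     # not hard at all and very hard -> spread 3
    "r5": [2, None, 2, 3],  # one missing rating     -> spread 1
}
spread_counts = Counter(rating_spread(r) for r in respondents.values())

# Counts reported in the text, keyed by spread.
reported = {0: 145, 1: 187, 2: 303, 3: 432}
total = sum(reported.values())
share_varying = 1 - reported[0] / total
print(total, round(share_varying, 3))
```

The four reported counts sum to 1,067 respondents, of whom about 86 percent gave ratings that differed across questions.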

Response Times in the 1998–1999 Multi-Investigator Study

The 1998–1999 Multi-Investigator Study also collected information about how long respondents took to answer the six policy questions about affirmative action, welfare, and immigrant rights mentioned above.34 The active response times (length of time between when the interviewer finishes asking the question and the respondent begins answering) for these questions ranged between zero seconds and over two minutes.35

Most respondents were able to answer the questions in less than twenty seconds, but there are a few outliers who took significantly longer to answer questions.36 Huckfeldt, Sprague & Levine (2000) suggest that many problems may occur in the coding of activated timers and recommend that response times of zero seconds and response times that were more than three standard deviations above the sample mean be coded as missing.37
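The trimming rule just described can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the mean and standard deviation are computed over all recorded times (zeros included) using the sample standard deviation, and the function name `trim_response_times` and the example data are hypothetical.

```python
import statistics


def trim_response_times(times, n_sd=3.0):
    """Recode zero and extreme response times as missing (None).

    A time is treated as extreme when it exceeds the sample mean by more
    than n_sd sample standard deviations. The mean and SD are computed
    over all recorded (non-missing) times, zeros included (assumption).
    """
    recorded = [t for t in times if t is not None]
    cutoff = statistics.mean(recorded) + n_sd * statistics.stdev(recorded)
    return [
        None if t is None or t == 0 or t > cutoff else t
        for t in times
    ]


# Hypothetical timers in seconds: one zero, one very slow response.
raw = [0.0] + [4.0, 5.0, 6.0] * 6 + [120.0]
trimmed = trim_response_times(raw)
print(trimmed.count(None), "of", len(raw), "times coded as missing")
```

In this example only the zero and the 120-second response are recoded; the remaining eighteen times are left unchanged.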

34. There are two main types of response timers: active and latent. Active response timers measure the time between when the interviewer finishes asking a question and when the respondent begins answering it. Latent response timers measure the time between the beginning of one question and the beginning of the next. Thus, latent response timer measures include not only how long the respondent takes to answer but also the time interviewers take to read the question and to provide explanations, possibly introducing confounding effects from interviewers. However, as discussed in Mulligan et al. (2003), latent and active response timers are highly correlated.

35. Response times of zero seconds suggest that participants provided responses before the interviewer finished asking the question.

36. Over ninety-five percent of respondents answered each of these questions in less than twenty seconds.

37. Only a few response times of zero seconds were recorded for these questions: 4 for the question about set-asides, 2 for the flexible admission question, 2 for the workfare question, and 2 for the welfare limits question; the response times for the welfare for legal immigrants and education for illegal immigrants questions were all positive. Slightly more response times were recorded that were more than three standard deviations above the sample mean: 18 for the question about set-asides, 21 for the flexible admission question, 12 for the workfare question, 6 for the welfare limits question, and 15 for each of the two questions related to immigrants. Trimming these respondents from the dataset amounts to losing less than three percent of the response times for each question. Analysis that included these responses but recoded zeros as 0.5 provided similar results.

Figure 6.6 contains histograms of the trimmed response times, in hundredths of seconds, for answers to each of these questions. The first thing to note is that, despite the trimming procedure, response times are highly skewed to the right. After several different transformations of response times were considered, the distribution of the log of response times appears to most closely resemble a normal distribution.38 Thus, subsequent analysis will incorporate the log of the trimmed response times.
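The effect of the log transformation on a right-skewed variable can be illustrated with a simple skewness comparison. The sketch below uses simulated timers drawn from a lognormal distribution, not the MI1998 data, so the log is approximately normal by construction; it only shows the kind of check that could be used to compare transformations. The distribution parameters and the helper `skewness` are assumptions made for illustration.

```python
import math
import random


def skewness(values):
    """Moment-based skewness: third central moment over SD cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(0)

# Simulated response times in hundredths of a second (NOT MI1998 data).
simulated = [random.lognormvariate(6.0, 0.6) for _ in range(2000)]

raw_skew = skewness(simulated)
log_skew = skewness([math.log(t) for t in simulated])
print(f"skewness of raw times: {raw_skew:.2f}")
print(f"skewness of log times: {log_skew:.2f}")
```

A skewness near zero after transformation is consistent with an approximately symmetric distribution, whereas a clearly positive value indicates the long right tail described in the text.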

[Figure 6.6 image not reproduced: six density histograms of trimmed response times, one per question (Flex. Admin., Welfare for Legal Immig., Edu for Illegal Immig., Set-Asides, Workfare, Welfare Limits); x-axis: trimmed timer, 0 to 2500 hundredths of a second; y-axis: density, 0 to .003.]

Figure 6.6: Histogram of Response Times for Questions in MI1998

Since response times will be used in some of the following analysis as a measure of how difficult respondents found a question to answer, Figure 6.7 presents the distribution of response times by respondent-reported difficulty for the question about government set-asides for minority-owned businesses.39 This figure suggests that response times and reported question difficulties are positively correlated. The same relationship is evident for each of the other five public policy questions under consideration. In fact, simple correlations between response times and reported question difficulty ratings were between 0.22 and 0.39. Hence, it appears reasonable to include response times together with the difficulty measures in the analysis of shirking as identified by “no opinion” response behavior.

[Figure 6.7 image not reproduced: four density histograms of the Set-Asides timer, one panel per difficulty rating (Not hard at all, Not very hard, Somewhat hard, Very hard); x-axis: Set-Asides timer, 0 to 6000; y-axis: density, 0 to .002.]

Figure 6.7: Histogram of Response Times by Difficulty

38. Transformations considered include: the cubic, identity, square, square-root, natural log, inverse, inverse of the square, inverse of the cubic, and inverse of the square-root.

39. Respondents who take longer to answer the question are assumed to perceive the question as more difficult. For a similar interpretation of response timers, see Albertson, Brehm & Alvarez (Forthcoming, 2004).
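A simple correlation of the kind reported for response times and difficulty ratings can be computed as a Pearson coefficient. The sketch below is illustrative only: the paired values are hypothetical, the four-point rating is treated as an interval-level score (an assumption), and the text does not state whether raw or logged times were used, so raw times are shown.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical respondents: difficulty rating (1-4) and trimmed response
# time in hundredths of a second. These are NOT MI1998 values.
difficulty = [1, 1, 2, 2, 3, 3, 4, 4]
response_time = [250, 400, 300, 520, 380, 700, 450, 900]

r = pearson_r(difficulty, response_time)
print(f"r = {r:.2f}")
```

A positive coefficient indicates that respondents who report a question as harder also tend to take longer to answer it, which is the pattern described for all six questions.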

6.2.6 Incorporating New Difficulty Measures into the Anal-

In document The shirking model--a theory of how people answer survey questions (Page 89-94)