BEHAVIOR CODING


Behavior coding concerns the systematic assignment of codes to the overt behavior of interviewer and respondent in survey interviews. The method was developed by Charles Cannell and his colleagues at the University of Michigan in the 1970s. Behavior coding is a major tool used to evaluate interviewer performance and questionnaire design. Behavior coding is sometimes referred to as “interaction analysis,” although interaction analysis is usually more specifically used in the sense of applying behavior coding to study the course of the interaction between interviewer and respondent.

The three main uses of behavior coding are (1) evaluating interviewer performance, (2) pretesting questionnaires, and (3) studying the course of the interaction between interviewer and respondent.

Evaluating Interviewer Performance

The use of behavior coding to evaluate interviewer performance primarily concerns how the interviewer reads scripted questions from the questionnaire. Typical codes include “Reads question correctly,” “Reads question with minor change,” “Reads question with major change,” “Question incorrectly skipped,” and “Suggestive probe.” Usually the number of different codes for the purpose of evaluating interviewer performance ranges from five to 15.

Evaluating interviewer performance is usually part of the main field work. To this end, the interviews from the actual survey are audio-recorded. A sufficiently large sample of interviews is drawn from each interviewer (preferably 20 or more per interviewer) and subjected to behavioral coding. Results may be in the form of “Interviewer X reads 17% of the questions with major change.” These results are used to give the interviewer feedback, retrain him or her, or even withdraw him or her from the study.
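As a minimal sketch of how such per-interviewer results could be tabulated, the following Python example computes the share of each code among an interviewer’s coded questions. The record layout, the code labels, and the function name `code_rates` are illustrative assumptions, not part of any standard coding scheme.

```python
from collections import Counter, defaultdict

# Hypothetical coded records: (interviewer_id, question_id, interviewer_code).
coded_questions = [
    ("X", "Q1", "correct"),
    ("X", "Q2", "major_change"),
    ("X", "Q3", "correct"),
    ("X", "Q4", "minor_change"),
    ("Y", "Q1", "correct"),
    ("Y", "Q2", "correct"),
]


def code_rates(records):
    """Return {interviewer: {code: share of that interviewer's coded questions}}."""
    counts = defaultdict(Counter)
    for interviewer, _question, code in records:
        counts[interviewer][code] += 1
    rates = {}
    for interviewer, counter in counts.items():
        total = sum(counter.values())
        rates[interviewer] = {code: n / total for code, n in counter.items()}
    return rates


rates = code_rates(coded_questions)
for interviewer, shares in sorted(rates.items()):
    for code, share in sorted(shares.items()):
        print(f"Interviewer {interviewer}: {code} on {share:.0%} of coded questions")
```

In this made-up data set, interviewer X reads one of four questions with a major change, which would be reported as 25%.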

Pretesting Questionnaires

If a particular question is often read incorrectly, this may be due to interviewer error, but it may also be a result of the wording of the question itself. Perhaps the question has a complex formulation or contains words that are easily misunderstood by the respondent. To prevent such misunderstandings, the interviewer may deliberately change the formulation of the question.

To gain more insight into the quality of the questions, the behavior of the respondent should be coded too. Typical codes for respondent behavior include “Asks repetition of the question,” “Asks for clarification,” “Provides uncodeable response” (e.g., “I watch television most of the days,” instead of an exact number), or “Expresses doubt” (e.g., “About six I think, I’m not sure”). Most behavior coding studies use codes both for the respondent and the interviewer. The number of different codes may range between 10 and 20.

Unlike evaluating interviewer performance, pretesting questionnaires by means of behavioral coding requires a pilot study conducted prior to the main data collection. Such a pilot study should reflect the main study as closely as possible with respect to interviewers and respondents. At least 50 interviews are necessary, and even more if particular questions are asked less often because of skip patterns.
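The following Python sketch illustrates one way pilot data of this kind might be summarized per question. It reports how often each question was actually asked (which matters when skip patterns reduce the count) and the share of administrations with at least one problem code. The code labels, the record layout, and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical labels for respondent behaviors that signal a question problem.
PROBLEM_CODES = {
    "asks_repetition",
    "asks_clarification",
    "uncodeable_response",
    "expresses_doubt",
}

# Hypothetical pilot data: (interview_id, question_id, respondent codes).
# Q2 is asked less often than Q1 because of a skip pattern.
pilot = [
    (1, "Q1", ["adequate_answer"]),
    (1, "Q2", ["asks_clarification", "adequate_answer"]),
    (2, "Q1", ["adequate_answer"]),
    (3, "Q1", ["expresses_doubt"]),
    (3, "Q2", ["uncodeable_response"]),
    (4, "Q1", ["adequate_answer"]),
]


def question_problem_rates(administrations, problem_codes=PROBLEM_CODES):
    """Return {question: (times asked, share of administrations with a problem)}."""
    asked = Counter()
    problems = Counter()
    for _interview, question, codes in administrations:
        asked[question] += 1
        if problem_codes.intersection(codes):
            problems[question] += 1
    return {q: (asked[q], problems[q] / asked[q]) for q in asked}


summary = question_problem_rates(pilot)
for question, (n_asked, rate) in sorted(summary.items()):
    print(f"{question}: asked {n_asked} times, problem rate {rate:.0%}")
```

Reporting the number of administrations next to the rate makes it visible when a rate rests on very few cases.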

Compared to other methods of pretesting questionnaires, such as cognitive interviewing or focus groups, pretesting by means of behavior coding is relatively expensive. Moreover, it primarily points to problems rather than causes of problems. However, the results of behavior coding are more trustworthy, because the data are collected in a situation that mirrors the data collection of the main study. Moreover, problems that appear in the actual behavior of interviewer and respondent are real problems, whereas in other cases, for example in cognitive interviewing, respondents may report pseudo-problems with a question just to please the interviewer.

Interviewer–Respondent Interaction

If one codes both the behavior of interviewer and respondent and takes the order of the coded utterances into account, it becomes possible to study the course of the interaction. For example, one may observe from a pretesting study that a particular question yields a disproportionately high number of suggestive probes from the interviewer. Such an observation does not yield much insight into the causes of this high number. However, if one has ordered sequences of codes available, one may observe that these suggestive probes almost invariably occur after an uncodeable response to that question. After studying the type of uncodeable response and the available response alternatives in more detail, the researcher may decide to adjust the formulation of the response alternatives in order to decrease the number of uncodeable responses, which in turn should decrease the number of suggestive probes.

In contrast, if the researcher merely looked at the sheer number of suggestive probes, he or she might have decided to adjust the interviewer training and warn the interviewers not to be suggestive, especially when asking the offending question. This may help a bit, but does not take away the cause of the problem.
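The kind of sequence analysis described in this example can be sketched in a few lines of Python. The sketch counts which code immediately precedes each suggestive probe in ordered code sequences for one question. The code labels (prefixed `I:` for interviewer and `R:` for respondent) and the data are invented for illustration.

```python
from collections import Counter

# Hypothetical ordered code sequences, one list per administration of a question.
sequences = [
    ["I:reads_question", "R:uncodeable_response", "I:suggestive_probe", "R:adequate_answer"],
    ["I:reads_question", "R:adequate_answer"],
    ["I:reads_question", "R:uncodeable_response", "I:suggestive_probe",
     "R:uncodeable_response", "I:suggestive_probe", "R:adequate_answer"],
    ["I:reads_question", "R:asks_clarification", "I:suggestive_probe", "R:adequate_answer"],
]


def preceding_codes(code_sequences, target):
    """Count the code that immediately precedes each occurrence of `target`."""
    preceding = Counter()
    for sequence in code_sequences:
        # Pair every code with the one that follows it in the same sequence.
        for previous, current in zip(sequence, sequence[1:]):
            if current == target:
                preceding[previous] += 1
    return preceding


before_probe = preceding_codes(sequences, "I:suggestive_probe")
for code, count in before_probe.most_common():
    print(f"{code} precedes a suggestive probe {count} time(s)")
```

A simple frequency count of suggestive probes would give four for this question; the transition count additionally shows that three of the four follow an uncodeable response.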

As the previous example shows, interviewer–respondent interaction studies are focused on causes of particular behavior, that is, the preceding behavior of the other person. Because the researcher does not want to overlook particular causes, each and every utterance in the interaction is usually coded and described with some code. Hence, the number of different codes used in these studies can be quite high and exceeds 100 in some studies.

Behavior Coding Procedures

Recording Procedures

In a few cases, interviews are coded “live” (during the interview itself), sometimes by an observer, sometimes even by the interviewer herself. A main reason for live coding is that one does not need permission of the respondent to audio-record the interview. Another advantage is that results are quickly available, which can be especially useful in case of pretesting questionnaires.

In most studies, however, the interview is first audio-recorded. More recently, in the case of computer-assisted interviewing, the interview is recorded by the computer or laptop itself, thus eliminating the need for a separate tape recorder. Coding audio-recorded interviews is much more reliable than live coding, because the coder can listen repeatedly to ambiguous fragments.

If interviews are audio-recorded, they are sometimes first transcribed before coding. Transcripts yield more details than the codes alone. For example, if a particular question is often coded as “Read with major change,” the availability of transcripts allows the researcher to look at the kind of mistakes made by the interviewer. Transcripts also make semi-automatic coding possible; a computer program can decide, for example, whether or not questions are read exactly as worded.
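A minimal sketch of such a semi-automatic check, using only the Python standard library, is shown below. It compares the scripted wording with the transcribed reading after ignoring case and punctuation. The exact-match test corresponds to “read exactly as worded”; the split between minor and major change uses a word-level similarity score with an arbitrary cutoff of 0.8, which is an assumption made for illustration and would still need review by a human coder.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and keep only word tokens (case and punctuation ignored)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def code_question_reading(scripted, transcribed, minor_threshold=0.8):
    """Assign a provisional reading code; `minor_threshold` is a hypothetical cutoff."""
    script_words = normalize(scripted)
    spoken_words = normalize(transcribed)
    if script_words == spoken_words:
        return "reads_correctly"
    # Word-level similarity between the scripted and the spoken question.
    similarity = SequenceMatcher(None, script_words, spoken_words).ratio()
    return "minor_change" if similarity >= minor_threshold else "major_change"


scripted = "How many hours of television do you watch on an average day?"
readings = [
    "How many hours of television do you watch on an average day",
    "How many hours of TV do you watch on an average day?",
    "Do you watch a lot of television?",
]
provisional_codes = [code_question_reading(scripted, r) for r in readings]
for reading, code in zip(readings, provisional_codes):
    print(f"{code}: {reading}")
```

Only the exact-match decision is mechanical; the similarity cutoff is a convenience for sorting transcripts into candidates for closer inspection.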

Full Versus Selective Coding

In interviewer-monitoring studies, it may be sufficient to code the utterances of the interviewer only; moreover, the researcher may confine himself to particular interviewer utterances, like question reading, probing, or providing clarification. Other types of utterances (for example, repeating the respondent’s answer) are neglected. In pretesting studies, it is sometimes decided to code only behavior of the respondent. Also, in interaction studies, the researcher may use a form of such “selective” coding, neglecting all utterances after the answer of the respondent (e.g., if the respondent continues to elucidate the answer, this would not be coded). Alternatively, each and every utterance is coded. Especially in the case of interaction studies, this is the most common strategy.

All these procedural decisions have time and cost implications. Selective live coding is the fastest and cheapest, while full audio-recorded coding using transcriptions is the most tedious and costly but also yields the most information.

Wil Dijkstra

See also Cognitive Interviewing; Interviewer Monitoring; Questionnaire Design

Further Readings

Cannell, C. F., Lawson, S. A., & Hausser, D. L. (1975). A technique for evaluating interviewer performance: A manual for coding and analyzing interviewer behavior from tape recordings of household interviews. Ann Arbor: University of Michigan, Survey Research Center of the Institute for Social Research.

Fowler, F. J., & Cannell, C. F. (1996). Using behavioral coding to identify cognitive problems with survey questions. In N. Schwarz & S. Sudman (Eds.), Answering questions: Methodology for determining cognitive and communicative processes in survey research (pp. 15–36). San Francisco: Jossey-Bass.

Ongena, Y. P., & Dijkstra, W. (2006). Methods of behavior coding of survey interviews. Journal of Official Statistics, 22(3), 419–451.

BENEFICENCE

The National Research Act (Public Law 93-348) of 1974 created the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, which, among other duties, was charged with the responsibility of identifying, articulating, and fully explaining those basic ethical principles that should underlie the conduct of biomedical and behavioral research involving human subjects throughout the United States. The commission’s findings have been detailed in a 1979 document typically referred to as “The Belmont Report” in recognition of the Smithsonian Institution satellite site where it was drafted, the Belmont Conference Center in Elkridge, Maryland. The Belmont Report identified three basic ethical principles for the conduct of research, and one of these is beneficence. (The other identified principles are justice and respect for persons.) The Belmont Report clearly states that the principle of beneficence has its roots in the long-standing ethical guidelines of the medical profession’s Hippocratic Oath generally and, in particular, its maxims instructing physicians to “never do harm” while acting “according to [one’s] ability and [one’s] judgment.”

From these ideas, three more fully articulated notions have been derived. First is the principle that researchers are obligated, not merely encouraged or expected, to take all reasonable steps to avoid inflicting foreseeable harm upon research participants. Second is that researchers are obligated to work toward maximizing the benefits that research subjects might experience from participation in a research program. This does not mean that it is required that a research program provide direct benefits to its research subjects, however. Similarly, investigators are obligated to attempt to maximize anticipated longer-term benefits that society or people in general might realize as a consequence of the study. Finally, beneficence incorporates the idea that exposing research participants to risk is justifiable. The reality that research is a human enterprise, one that relies upon the individual abilities and judgments of researchers acting within the frameworks of existing knowledge and cultural norms, is recognized. As such, it is ethically acceptable and permissible for research to possess or encompass potential for a protocol or well-meaning actions taken by an investigator to result in harm to participants; typically some level of risk is appropriate, and it is a judgment call as to what that risk level can and should be. To summarize, beneficence represents the process of balancing the trade-off between the potential benefits and the justifiable risk of potential harms associated with participation in research, and it is manifest in investigator efforts to minimize risks while maximizing potential benefits to the individual participant and/or society as a whole.

The term risk refers to both the likelihood of some type of harm being experienced by one or more research participants and the extent or severity of that harm in the event that harm is experienced. Therefore, assessments of the risks associated with a research project may take account of the combined probabilities and magnitudes of potential harms that might accrue to research participants. Furthermore, though one proclivity may be to think of harm as physical insults (such as pain, discomfort, injury, or toxic effects of drugs or other substances), the nature of potential harms can be wide and varied. Indeed, while the potential for physical harms typically is virtually nonexistent in survey research, other categories of potential harms frequently are relevant. These other categories include:

• Psychological and emotional harms (e.g., depression, anxiety, confusion, stress, guilt, embarrassment, or loss of self-esteem)

• Social or political harms (e.g., “labeling,” stigmatization, loss of status, or discrimination in employment)

• Economic harms (e.g., incurring actual financial cost from participation), and

• Infringements of privacy or breaches of confidentiality (which, in turn, may result in psychological, emotional, social, political, or economic harms)

It is the principle of beneficence, along with the principles of justice and respect for human subjects, that stands as the foundation upon which the government-mandated rules for the conduct of research (Title 45, Part 46, Subpart A of the Code of Federal Regulations) have been created under the auspices of the U.S. Department of Health and Human Services, Office for Human Research Protections.

Jonathan E. Brill

See also Confidentiality; Ethical Principles

Further Readings

U.S. Office of Human Research Protections: http://www.hhs.gov/ohrp/belmontArchive.html

U.S. Office of Human Subjects Research: http://ohsr.od.nih.gov/guidelines/belmont.html

BIAS

Bias is a constant, systematic form or source of error, as opposed to variance, which is random, variable error. The nature and the extent of bias in survey measures is one of the most daunting problems that survey researchers face. How to quantify the presence of bias and how to reduce its occurrence are ever-present challenges in survey research. Bias can exist in myriad ways in survey statistics. In some cases its effect is so small as to render it ignorable. In other cases it is nonignorable and it can, and does, render survey statistics wholly invalid.

Overview

Survey researchers often rely upon estimates of population statistics of interest derived from sampling the relevant population and gathering data from that sample. To the extent the sample statistic differs from the true value of the population statistic, that difference is the error associated with the sample statistic. If the error of the sample statistic is systematic (that is, the errors from repeated samples using the same survey design do not balance each other out), the sample statistic is said to be biased. Bias is the difference between the average, or expected value, of the sample estimates and the target population’s true value for the relevant statistic. If the sample statistic derived from an estimator is more often larger, in repeated samplings, than the target population’s true value, then the sample statistic exhibits a positive bias. If the majority of the sample statistics from an estimator are smaller, in repeated samplings, than the target population’s true value, then the sample statistic shows a negative bias.

Bias of a survey estimate differs from the error of a survey estimate because the bias of an estimate relates to the systematic and constant error the estimate exhibits in repeated samplings. In other words, simply drawing another sample using the same sample design does not attenuate the bias of the survey estimate. The error of a survey estimate, in contrast, can change when another sample is drawn, because it varies from sample to sample.

Graphically, this can be represented by a bull’s-eye in which the center of the bull’s-eye is the true value of the relevant population statistic and the shots at the target represent the sample estimates of that population statistic. Each shot at the target represents an estimate of the true population value from a sample using the same survey design. For any given sample, the difference between the sample estimate (a shot at the target) and the true value of the population (the bull’s-eye) is the error of the sample estimate.

Multiple shots at the target are derived from repeated samplings using the same survey design. In each sample, if the estimator of the population statistic generates estimates (or hits on the bull’s-eye) that are consistently off center of the target in a systematic way, then the sample statistic is biased.

Figure 1 illustrates estimates of the true value of the population statistic (the center of the bull’s-eye), all of which are systematically to the upper right of the true value. The difference between any one of these estimates and the true value of the population statistic (the center of the bull’s-eye) is the error of the estimate. The difference between the average value of these estimates and the center of the target (the true value of the population statistic) is the bias of the sample statistic.

Contrasting Figure 1 to a figure that illustrates an unbiased sample statistic, Figure 2 shows hits to the target that center around the true value, even though no sample estimate actually hits the true value.

Unlike Figure 1, however, the sample estimates in Figure 2 are not systematically off center. Put another way, the average, or expected value, of the sample estimates is equal to the true value of the population statistic, indicating an unbiased estimator of the population statistic. This is an unbiased estimator even though none of the estimates from repeated samplings hits the center of the bull’s-eye. In other words, there is error associated with every sample estimate, but not bias.
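The distinction between error and bias can also be illustrated with a small simulation. The Python sketch below draws repeated samples under two designs from an invented population: one samples from the full population, the other from a flawed frame that leaves out all units with values of 40 or less. The population, the cutoff, the sample size, and the number of repetitions are all assumptions chosen only for illustration.

```python
import random
from statistics import fmean

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 values and its true mean (the bull's-eye).
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = fmean(population)

# Hypothetical flawed design: the frame omits every unit with a value of 40 or less.
flawed_frame = [y for y in population if y > 40]


def repeated_estimates(frame, sample_size, repetitions):
    """Sample means from repeated simple random samples of one frame."""
    return [fmean(rng.sample(frame, sample_size)) for _ in range(repetitions)]


full_design = repeated_estimates(population, 100, 1000)
flawed_design = repeated_estimates(flawed_frame, 100, 1000)

# Error: one estimate minus the true value; it differs from sample to sample.
first_errors = [round(estimate - true_mean, 2) for estimate in full_design[:3]]

# Bias: the average of the estimates minus the true value.
bias_full = fmean(full_design) - true_mean
bias_flawed = fmean(flawed_design) - true_mean

print("errors of the first three full-design samples:", first_errors)
print(f"average error, full design:   {bias_full:+.2f}")
print(f"average error, flawed design: {bias_flawed:+.2f}")
```

Every single estimate misses the true mean under both designs, but only under the flawed design do the misses fail to average out, which is the pattern shown in Figure 1 as opposed to Figure 2.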

Figure 1 Example of a biased sample statistic

Figure 2 Example of an unbiased sample statistic


Bias can be classified into two broad categories: (1) the bias related to the sampling process, and (2) the bias related to the data collection process. In the former case, if the survey design requires a sample to be taken from the target population, shortcomings in the sample design can lead to different forms of bias. Biases related to the sampling design are (a) estimation (or sampling) bias, (b) coverage bias, and (c) nonresponse bias. All of these are related to external validity.

Bias related to the data collection process is measurement bias and is related to construct validity. Measurement bias can be due to (a) data collection shortcomings dealing with the respondent, (b) the questionnaire, (c) the interviewer, (d) the mode of data collection, or (e) a combination of any of these.

To gauge the size of the bias, survey researchers sometimes refer to the relative bias of an estimator. The relative bias for an estimator is the bias as a proportion of the total population estimate.
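One common way to write this, offered here as a sketch rather than as the entry’s own formula: let $\theta$ denote the true value of the population statistic and $\hat{\theta}$ its estimator, and assume that the denominator is the true population value (the sentence above does not spell out the denominator, so this reading is an assumption):

$$\mathrm{RelBias}[\hat{\theta}] = \frac{E[\hat{\theta}] - \theta}{\theta}.$$

For example, if the expected value of the estimator were 52 and the true value were 50, the bias would be 2 and the relative bias would be 2/50 = 0.04, or 4%.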

Estimation Bias

Estimation bias, or sampling bias, is the difference between the expected value, or mean of the sampling distribution, of an estimator and the true value of the population statistic. More specifically, if $\theta$ is the population statistic of interest and $\hat{\theta}$ is the estimator of that statistic that is used to derive the sample estimate of the population statistic, the bias of $\hat{\theta}$ is defined as:

$$\mathrm{Bias}[\hat{\theta}] = E[\hat{\theta}] - \theta.$$

The estimation bias of the estimator is the difference between the expected value of that statistic and the true value. If the expected value of the estimator, $\hat{\theta}$, is equal to the true value, then the estimator is unbiased.
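A standard textbook illustration of this definition, not taken from the entry itself, is the estimator of a variance. Assuming independent draws from a distribution with variance $\sigma^2$, dividing the sum of squared deviations by $n$ gives an estimator with expected value $(n-1)\sigma^2/n$, so its bias is $-\sigma^2/n$; dividing by $n-1$ gives an unbiased estimator. The Python sketch below approximates both expected values by simulation; the distribution, sample size, and number of repetitions are arbitrary choices.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is reproducible

SIGMA = 2.0
TRUE_VARIANCE = SIGMA ** 2   # the population value, theta
N = 5                        # size of each sample
REPETITIONS = 20_000         # number of repeated samples


def variance_estimate(values, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    sample_mean = sum(values) / len(values)
    return sum((v - sample_mean) ** 2 for v in values) / divisor


estimates_n = []
estimates_n_minus_1 = []
for _ in range(REPETITIONS):
    sample = [rng.gauss(0.0, SIGMA) for _ in range(N)]
    estimates_n.append(variance_estimate(sample, N))
    estimates_n_minus_1.append(variance_estimate(sample, N - 1))

# Simulated bias: average of the estimates minus the true value.
bias_divisor_n = sum(estimates_n) / REPETITIONS - TRUE_VARIANCE
bias_divisor_n_minus_1 = sum(estimates_n_minus_1) / REPETITIONS - TRUE_VARIANCE
theoretical_bias_divisor_n = -TRUE_VARIANCE / N

print(f"simulated bias, divisor n:     {bias_divisor_n:+.3f}")
print(f"theoretical bias, divisor n:   {theoretical_bias_divisor_n:+.3f}")
print(f"simulated bias, divisor n - 1: {bias_divisor_n_minus_1:+.3f}")
```

The simulated averages only approximate the expected values, so small deviations from the theoretical figures are expected.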

Estimation bias is different from estimation, or sampling, error in that sampling error is the difference between a sample estimate and the true value of the population statistic based on one sampling of the sample frame. If a different sample were taken, using the same sample design, the sampling error would likely be different for a given sample statistic. However, the estimation bias of the sample statistic would still be the same, even in repeated samples.

Often, a desirable property of an estimator is that it is unbiased, but this must be weighed against other desirable properties that a survey researcher may want

In document Encyclopedia of Survey Research Methods_Lavrakas_2008.pdf (Page 94-101)