ADDITIONAL ISSUES TO CONSIDER WHEN DESIGNING AN EXPERIMENT

Experimental Studies Barbara M Wildemuth and Leo L Cao

In addition to selecting a particular experimental design for your study, and making every effort to avoid threats to internal and external validity, there are three more issues that you need to consider when designing an experiment: whether to conduct the experiment in the lab or in the field, whether to use a within-subjects or between-subjects design, and the ethical issues that may arise in your interactions with your study participants.

Experimental Setting: Lab or Field?

Conducting your experiment in the lab gives you the most control over extraneous variables. For example, you know that every participant used the same computer with the same network connections under the same conditions. While this level of control is a core strength of experimental designs, it may limit the external validity of the study’s findings, as noted previously. Conducting your experiment in the field may increase its external validity. For example, participants are using the computer and network connections that they actually have available to them, and the conditions are more realistic. However, it is often the case that a field experiment provides so little control that you can no longer consider it an experiment. You will need to carefully consider the goals of your study to decide whether the lab or the field is a more appropriate setting for your study. You will want to pay special attention to how the two examples discussed later in this chapter handled this decision because one (Yu & Roh, 2002) was conducted in the lab and the other (Churkovich & Oughtred, 2002) was conducted in the field.

Within- versus Between-subjects Designs

One of the basic questions you face in setting up your experimental design is whether you are using independent groups (i.e., a participant is in only one group and experiences only one intervention) or overlapping groups (i.e., each participant experiences multiple interventions). The first of these cases is called a between-subjects design because the comparisons made during data analysis are comparisons between subjects. The second of these cases is called a within-subjects design because the comparisons made during data analysis are comparisons within subjects (i.e., between each individual’s outcomes with one intervention and the same individual’s outcomes with another intervention).

There are some situations that definitely call for a between-subjects design. For instance, if the interaction with the first intervention would have a very strong impact on the interaction with the second intervention, you can only avoid this threat to the internal validity of your study by using a between-subjects design. This type of situation often arises when you are investigating the effects of different instructional approaches, as in the Churkovich and Oughtred (2002) study discussed later. If your study procedures are quite demanding or take a long time to complete, you should also plan to use a between-subjects design to minimize the burden on each participant. For example, if you want your participants to complete six to eight searches with a novel search engine, filling out a brief questionnaire after each search, it may be asking too much of them to then repeat the entire process with a second search engine. The biggest disadvantage of using a between-subjects design is that it requires you to recruit more subjects. Thus it is more costly, and a within-subjects design might therefore be preferred.

A within-subjects design is more efficient in terms of the use of your subjects because each person will contribute two or more data points to your analysis. It also allows you to ask your participants for a direct comparison of the interventions (e.g., systems) under investigation. In addition, it minimizes the variation in individual characteristics, making your statistical analysis more powerful. If you use this design, you need to be aware of potential learning effects from the specific order of the tasks that subjects complete or the interventions they experience. You can control for these effects by counterbalancing the order in which they’re completed. For example, if you have group 1 and group 2 with interventions A and B, you will run intervention A, then B, for group 1, and intervention B, then A, for group 2 (this example is depicted here):

Group 1: R XA O XB O

Group 2: R XB O XA O

The Latin square experimental design is a formalized method for arranging the order of the tasks and/or interventions for even more complex situations (Cotton, 1993).
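As a rough illustration of how a Latin square extends the two-group counterbalancing above to more interventions, the following sketch builds a simple cyclic square in Python. The function names and participant labels are hypothetical, and the cyclic construction is only one of several possible Latin squares: it places every intervention in every serial position equally often, but it does not balance which intervention immediately precedes which.

```python
def latin_square(conditions):
    """Build a cyclic Latin square: row i is the condition list rotated left by i.

    Every condition appears exactly once in each row (one participant's
    sequence) and exactly once in each column (one serial position).
    """
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(participant_ids, conditions):
    """Cycle through the rows of the square, one row per participant."""
    square = latin_square(conditions)
    return {pid: square[i % len(square)] for i, pid in enumerate(participant_ids)}


# Hypothetical example: six participants, three interventions.
# The square for A, B, C is: A B C / B C A / C A B.
orders = assign_orders(["P1", "P2", "P3", "P4", "P5", "P6"], ["A", "B", "C"])
for pid, order in orders.items():
    print(pid, " then ".join(order))
```

With two interventions the square reduces to the A-then-B and B-then-A arrangement shown for groups 1 and 2. Recruiting participants in multiples of the number of interventions keeps every order equally represented.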

Ethical Issues in Experiments

Experiments, like other forms of research, require the ethical treatment of your study participants. In addition, there are two aspects of experiments that are unique and warrant some mention here. The first is that the experimental setting sometimes imposes demands on the subjects that they don’t experience in other activities. Your research procedures may bring on excessive fatigue (if very demanding) or excessive boredom, especially if the session is long. As you plan your research procedures, consider their effects on your participants. Second, the use of a control group implies that some of your subjects will not experience the planned intervention. In cases where your research hypothesis is that the intervention will have significant advantages, withholding it from a portion of your subjects may be unethical. Thus you need to plan for ways to provide access to the intervention for the control group (usually after the experiment is completed; thus their access is just delayed, not denied). As with all kinds of research, it is your responsibility to ensure that the burdens of research participation are not unreasonable and are fairly distributed, and that the benefits accrue to all your participants in an equitable way.

EXAMPLES

The two examples discussed here vary in multiple ways. The first (Yu & Roh, 2002) uses a posttest-only control group design and the second (Churkovich & Oughtred, 2002) uses a pretest-posttest control group design. The first is a within-subjects design and the second is a between-subjects design. The first is a laboratory study and the second is a field experiment. Thus a lot can be learned from looking at them side by side.

Example 1: Users’ Preferences and Attitudes toward Menu Designs

To investigate the effects of menu design on the performance and attitudes of users of a shopping Web site, Yu and Roh (2002) used a posttest-only control group design. The posttest in this study consisted of observations of the speed with which the study participants could complete assigned searching and browsing tasks and the participants’ perceptions of the appeal of the Web site and whether they were disoriented when using it. Even though the measurements of time were actually collected during the interactions, they’re still considered part of the posttest.

There was no true control group, but each of three types of menu designs was compared with the others: a simple hierarchical menu design, a menu design supporting both global and local navigation, and a pull-down menu design. The selection of the posttest-only control group design was a sound one. There was no interest in examining how the participants changed during their interactions with the Web sites, so there was no need for a pretest. This experimental design is often used to compare users’ reactions to different system designs.

The authors used a within-subjects design, with each of the 17 participants interacting with each of the three menu designs. These three iterations of the research procedures occurred over three separate sessions, each a week apart. The authors chose to space out the research sessions to avoid memory effects (the same browsing and searching tasks were used each time). Unfortunately, this decision did result in some attrition from the sample. Of the original 21 subjects recruited for the study, 4 (almost 20%) were not included in the final analysis because they missed at least one of the sessions. An alternative would have been to develop three parallel sets of browsing and searching tasks and administer all of them in a single session, counterbalancing the design so that all tasks were performed on all three systems. The research procedures that were used took about a half hour to complete; if they had been combined into one session, participant fatigue might have been an issue. If a study is to be conducted over repeated sessions, it would also be appropriate to recruit a larger sample than needed to ensure that the final sample size is large enough.

With a within-subjects design, the order in which the interventions are presented can become an issue. The authors handled this issue through random assignment. At the first session, each participant was randomly assigned to a particular menu design (represented by a prototype Web site). At the second session, each participant was randomly assigned to one of the two Web sites he or she had not yet seen. At the third session, the participant interacted with the remaining Web site. Through this careful randomization (an alternative to the counterbalancing discussed earlier), the authors could be sure that any effects on the dependent variables were not due to the order in which people interacted with the three menu designs.

In this study, each participant completed 10 searching tasks and 5 browsing tasks with each of the three menu designs. The order in which these tasks were completed could also have an effect on the outcomes (though it’s less likely than the potential for effects from the order in which the systems were presented). To guard against this possible threat, “the task cards were shuffled beforehand to ensure that the sequence of tasks was random” (Yu & Roh, 2002, p. 930). I would not agree that shuffling cards ensures random ordering; the authors could have used a random number generator to select the task order for each participant for each session. Nevertheless, the precautions they took were reasonable, given the low potential risk to the validity of the study results.
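The random number generator alternative could look something like the following minimal sketch, which is an illustration rather than anything Yu and Roh did. It assumes Python's standard random module; the task labels (S1 to S10, B1 to B5), the participant labels, and the seed value are all hypothetical.

```python
import random

# Hypothetical labels for the 10 searching and 5 browsing tasks.
TASKS = [f"S{i}" for i in range(1, 11)] + [f"B{i}" for i in range(1, 6)]


def random_task_orders(participant_ids, n_sessions, tasks, seed=None):
    """Draw an independent random task order for every participant and session.

    Fixing the seed makes the schedule reproducible, so the orders can be
    generated before data collection and documented in the study report.
    """
    rng = random.Random(seed)
    schedule = {}
    for pid in participant_ids:
        for session in range(1, n_sessions + 1):
            # Sampling all tasks without replacement yields a random permutation.
            schedule[(pid, session)] = rng.sample(tasks, k=len(tasks))
    return schedule


# 17 participants and 3 sessions, as in the study; seed chosen arbitrarily.
participants = [f"P{i:02d}" for i in range(1, 18)]
schedule = random_task_orders(participants, n_sessions=3, tasks=TASKS, seed=2002)
print(schedule[("P01", 1)])
```

Generating the orders in advance also leaves a record that can be checked later, which a shuffled deck of cards does not.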

In summary, this example of a posttest-only control group design is typical of many studies comparing two or more alternative information systems. It was conducted in a lab, and adequate control was exerted over the procedures so that most threats to the study’s validity were addressed. While it is somewhat unusual to spread the data collection over three sessions, the authors provided their reasons for this decision. The within-subjects design was very efficient, allowing them to discover differences between the menu designs while using only 17 subjects.

114 APPLICATIONS OF SOCIAL RESEARCH METHODS

Example 2: Comparison of Bibliographic Instruction Methods

Churkovich and Oughtred (2002) were interested in the question of whether online tutorials would be as effective as face-to-face bibliographic instruction for first-year college students. They explicitly posed the question, Could first-year students be trained successfully by an online tutorial without the presence of a librarian? Their actual research design reveals that their research question was a bit more complex than this. They compared three conditions: traditional face-to-face bibliographic instruction, independent instruction through two modules of an online tutorial, and mediated (i.e., librarian-assisted) instruction through two modules of the online tutorial.

To investigate this question, they used a pretest-posttest control group design. This design allowed the researchers to gauge the increase in students’ knowledge and changes in their attitudes due to the instruction received. The pretest consisted of 14 multiple-choice questions covering three types of basic information skills: identifying and searching for citations, keyword searching, and knowledge of library resources. In addition, the students responded to two attitudinal questions concerning their comfort with asking for library assistance and the importance of library skills for college students. The same questions were administered on the posttest, in addition to a question concerning their confidence as a result of the tutorial. Given that the pretest and the posttest were administered within about one hour of each other, it’s very possible that the pretest may have interacted with the posttest and with the students’ experiences of the instructional intervention. However, since the point of the study was to compare the three instructional approaches and all the students received the same pretest, this type of interaction should not have affected the outcomes of the study.

The students signed up for an instruction session, then the groups were randomly assigned to one of the three instructional approaches. Sixty students in three groups received the face-to-face instruction (this group would be considered the control group), 45 students in two groups received the mediated online tutorial, and 68 students in four groups received the online tutorial worked independently. If the groups had not been randomly assigned to the three types of instruction, this would have been a quasi-experiment. Even the approach used, though random, does not eliminate the potential for bias due to initial differences between the groups. Ideally, the students would have been individually assigned (randomly) to a particular intervention. But that is not the way students’ lives work; they want control over when they will receive this type of instruction. Thus they were given the freedom to sign up for a group, and random assignment was at the group level. By using a pretest, the researchers were able to verify that there were no significant differences between the groups prior to the instruction being offered.
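Group-level random assignment of this kind can be sketched as follows. This is an illustration only: the source does not describe the mechanics the researchers used, and the sketch assumes that the number of groups per condition (three, two, and four) was fixed in advance. The group labels, condition names, and seed are hypothetical.

```python
import random


def assign_groups(group_ids, quotas, seed=None):
    """Randomly assign intact groups to conditions, given a quota per condition.

    `quotas` maps each condition name to the number of groups it should get;
    the quotas must add up to the number of groups.
    """
    if sum(quotas.values()) != len(group_ids):
        raise ValueError("quotas must sum to the number of groups")
    rng = random.Random(seed)
    # Put the groups in random order, then deal them out condition by condition.
    shuffled = rng.sample(group_ids, k=len(group_ids))
    assignment = {}
    start = 0
    for condition, count in quotas.items():
        for gid in shuffled[start:start + count]:
            assignment[gid] = condition
        start += count
    return assignment


# Nine hypothetical sign-up groups; quotas follow the group counts reported above.
groups = [f"G{i}" for i in range(1, 10)]
quotas = {"face_to_face": 3, "mediated_tutorial": 2, "independent_tutorial": 4}
assignment = assign_groups(groups, quotas, seed=33)
for gid in groups:
    print(gid, assignment[gid])
```

Because whole groups rather than individuals are randomized, only nine units are being assigned, which is why a pretest check for initial differences between the groups remains important.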

This study used a between-subjects design. Each participant received only one of the instructional interventions. Because the interventions were covering the same material, it is clear that it would be inappropriate to have each student repeat the instruction three times. Thus a between-subjects design was the only feasible option.

In summary, this is a typical example of a practice-based field experiment intended to compare three different forms of bibliographic instruction. Churkovich and Oughtred’s (2002) primary research question was whether the online tutorial was effective enough to replace face-to-face instruction. Their interest in the changes brought about by the instruction indicated that a pretest-posttest design of some type would be appropriate. Likewise, a between-subjects design was the obvious choice for comparing instructional interventions. While they could not randomly assign individual subjects to particular interventions, they did randomly assign intact groups to the three forms of instruction, thus finding an appropriate compromise between control and feasibility.

CONCLUSION

The principles of experimentation are relatively simple: randomly assign the subjects to two or more groups and measure the outcomes resulting from the intervention experienced by each group. But once you move beyond this basic characterization of experiments, you will find that there are a number of decisions to make about your experimental design. It is crucial that you understand your research question thoroughly and state it clearly. Then you will be able to identify which independent variables should be manipulated and which dependent variables (i.e., outcomes) should be measured. A dose of common sense, some reading of prior research related to your question, and a bit of sketching and analyzing of the Xs and Os will go a long way toward a good experimental design for your study.

WORKS CITED

Bernard, H. R. (2000). Social Research Methods: Qualitative and Quantitative Approaches. Thousand Oaks, CA: Sage.

Campbell, D., & Stanley, J. (1963). Experimental and Quasi-experimental Designs for Research. Chicago: Rand-McNally.

Churkovich, M., & Oughtred, C. (2002). Can an online tutorial pass the test for library instruction? An evaluation and comparison of library skills instruction methods for first year students at Deakin University. Australian Academic and Research Libraries, 33(1), 25–38.

Cotton, J. W. (1993). Latin square designs. In L. K. Edwards (Ed.), Applied Analysis of Variance in Behavioral Science (pp. 147–196). New York: Marcel Dekker.

Haas, D. F., & Kraft, D. H. (1984). Experimental and quasi-experimental designs for research in information science. Information Processing and Management, 20(1–2), 229–237.

Montessori, M. (1964). The Montessori Method. A. E. George (Trans.). New York: Schocken. (Original work published 1909)

Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, Design, and Analysis: An Integrated Approach. Hillsdale, NJ: Erlbaum.

Piper, A. I. (1998). Conducting social science laboratory experiments on the World Wide Web. Library and Information Science Research, 20(1), 5–21.

Yu, B.-M., & Roh, S.-Z. (2002). The effects of menu design on information seeking performance and user’s attitude on the World Wide Web. Journal of the American Society for Information

13

Sampling for Extensive Studies