The Delphi Process - CONDUCTING A DELPHI PROCESS TO IDEN-

CHAPTER 3 CONDUCTING A DELPHI PROCESS TO IDEN-

3.2 The Delphi Process

A Delphi process is a structured process for collecting information and reach- ing consensus in a group of experts [82]. The process recognizes that expert judgment is necessary to draw conclusions in the absence of full scientific knowledge. The method avoids relying on the opinion of a single expert or merely averaging the opinions of multiple experts. Instead, experts share ob- servations (so that each can make a more informed decision) in a structured way, to prevent a few panelists from having excessive influence, as can oc- cur in round-table discussions [83]. In addition, experts remain anonymous during the process, so that they are influenced by the logic of the arguments rather than by the reputations of other experts.

For each of the three computing subjects, we used the Delphi process to identify key topics in introductory courses that are both important and difficult for students to learn. Specifically, we sought a set of key topics such that if a student fails to demonstrate a conceptual understanding of these topics, then we could be confident that the student had not mastered the course content. These key topics would set the scope of each CI.

Because the quality of the Delphi process depends on the experts, we selected three panels of experts who not only had taught the material fre-

quently, but who had published textbooks or rigorous pedagogical articles on these subjects. Within each panel, we strove to achieve diversity in race, gender, geography, and type of institution (community college, four-year college, university, industry, etc.). Clayton [84] recommends a panel size of 15 to 30 experts. Our panel sizes were 21 experts for discrete math, 20 experts for programming fundamentals, and 20 experts for digital logic. We executed our Delphi process through four phases and used on-line surveys to conduct the process.

Phase 1. Concept Identification: We asked each expert to list 10 to 15 concepts that they considered to be both important and difficult in a first course in their subject. Using principles from grounded theory [85], we constructed topic lists to include all concepts identified by more than one expert. For each subject, two or three of the authors coded the experts’ responses independently to create a compiled topic list. The authors then met together to reconcile their different list into one master list. These reconciled lists of topics were used for subsequent phases.

Phase 2. Initial Rating: We asked the experts to rate each topic from the reconciled lists on a scale from 1 to 10 on each of three metrics: importance, difficulty, and expected mastery (the degree to which a student would be expected to master the topic in an introductory course). The third metric was included because some concepts identified in Phase 1 might be introduced only superficially in a first course on the subject but would be treated in more depth in later courses; these concepts would probably be inappropriate to include in a concept inventory. We found that expected mastery was strongly correlated (r ≥ 0.81) with importance for all three subjects. Thus, we dropped the expected mastery metric from Phase 3 and Phase 4.

Phase 3. Negotiation: We calculated the averages and inner quartile ranges (middle 50% of responses) of the Phase 2 ratings. We provided this information to the experts, and we asked them to rate each topic on the importance and difficulty metrics again. In addition, when experts chose a Phase 3 rating outside the Phase 2 inner quartiles range, they were asked to provide a convincing justification for why the Phase 2 range was not appro- priate. These justifications also indicate why certain topics are contentious, important, or difficult.

the importance and difficulty of each topic, in light of the average ratings, inner quartiles ranges, and anonymized justifications from Phase 3. We used the ratings from Phase 4 to produce the final ratings.

During each phase, we tracked the averages, standard deviations, and inner quartile ranges of the ratings. The averages were defined to be the relative importance and difficulty of the topics. Given this definition, we expected that the average should drastically change only in the presence of an expert’s compelling argument against the previous phase’s average ratings.

We defined consensus in terms of the standard deviations of responses, because unanimous consensus would result in a standard deviation of zero while increased variability in the rankings would increase the standard deviation. Inner quartile ranges, rather than standard deviations, were given to the Delphi experts because they provide round numbers and less confusion. Because the aim of the Delphi process is to achieve consensus, we expected to see standard deviations decrease from each phase to the next.

Post Delphi. Analysis of Results: From the importance and difficulty ratings, we computed a single metric with which to rank the topics. The final importance and difficulty ratings were used as the coordinates to calculate the Euclidean distance of each topic from the maximum ratings (importance 10, difficulty 10), the L2 norm. (We found that both the sum and product of the two metrics provided almost identical rankings.) In the tables at the end of this chapter, we highlight the top N topics by this ranking, selecting an N close to 10 such that there is a noticeable separation between the two groups of topics.

After completing the Delphi process, we analyzed the Delphi experts’ comments to find trends and themes about why the topics were rated as important and difficult and to learn why certain topics did not achieve strong consensus. We used a grounded theory-based approach of independent and then joint analysis to discover the themes among the comments.

In document The development of a digital logic concept inventory (Page 68-70)