Evaluative Feedback

Bases for Evaluation

There are two general bases for evaluating student learning: norm-referenced evaluation and criterion-referenced evaluation. Norm-referenced evaluation is familiar as the "bell-shaped curve." It is designed to rate a student's performance in relation to the performance of the other students. Students are rank ordered, and grade cutoffs are based on how well the normative group did as a whole. Often the normative group to which a student is compared is his or her own class, although it may be an aggregate of several groups of students who have completed the same task. The individual student is judged in terms of a relative standard; her or his grade may reflect that he or she did better than 80 % of the students in the normative group, but it does not indicate whether 40 %, 60 %, or 80 % -- or any other percent -- of the test questions were answered correctly. Norm-referenced evaluation tends to be criticized for unduly punishing moderate- and high-ability students in high-ability classes and unduly rewarding moderate- and low-ability students in low-ability classes. It is most defensible when the normative group is very large and varied, so that a representative distribution of students is likely.

Criterion-referenced evaluation is based on absolute, objective performance standards or criteria. Its intent is to indicate whether or not a student has mastered a behavior specified in a formal instructional objective. All students have the opportunity of doing well -- or of failing to do well. The key to effective criterion-referenced evaluation is to be sure the measurement of achievement is both reliable and valid. When teachers are required to translate criterion-referenced evaluation systems into a graduated scale of grades, they must specify criteria for different levels of mastery.
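The difference between the two bases can be made concrete with a little arithmetic. The following is a minimal sketch in Python, not a prescribed grading method: the function names, the two sets of class scores, and the mastery cutoffs are all hypothetical, chosen only to show that the same raw score can earn very different norm-referenced standings while its criterion-referenced result stays fixed.

```python
def percentile_rank(score, group_scores):
    """Norm-referenced view: percent of the normative group scoring below `score`."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)


def percent_correct(score, total_items):
    """Criterion-referenced view: percent of test items answered correctly."""
    return 100 * score / total_items


def mastery_grade(percent, cutoffs):
    """Map percent correct to a grade using (minimum percent, grade) pairs
    listed from highest to lowest. The teacher must supply the cutoffs."""
    for minimum, grade in cutoffs:
        if percent >= minimum:
            return grade
    return "F"


# Hypothetical data: the same raw score of 24 out of 40 in two different classes.
low_scoring_class = [10, 12, 15, 18, 20, 21, 22, 23, 24, 30]
high_scoring_class = [24, 28, 30, 31, 33, 34, 35, 36, 38, 39]
# Hypothetical mastery cutoffs; a real scale would come from the course objectives.
cutoffs = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]

print(percentile_rank(24, low_scoring_class))            # 80.0
print(percentile_rank(24, high_scoring_class))           # 0.0
print(mastery_grade(percent_correct(24, 40), cutoffs))   # D
```

In this sketch the student outperforms 80 % of one class and none of the other, yet in both cases answered 60 % of the items correctly. Which of those facts the grade reports depends entirely on the basis the teacher has chosen.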

Although the distinction between norm-referenced and criterion-referenced evaluation seems to be straightforward, it is common for teachers to assign grades without a clear picture of what they communicate. While they may not subscribe to the idea of grading on a norm-referenced curve, they may also feel uncomfortable when there are "too many" high grades or "too many" low grades, assuming that tests must be too easy (or hard) or subjective grading standards too lenient (or stringent) when that occurs. In addition, they feel they have a sense of which students need to be challenged to work harder and which need to be reinforced for working hard, and they use grades as a means of doing so. Thus, judgments of "effort" or "improvement" are considered in modifying the norm- or criterion-referenced evaluations before they are communicated to students as grades.

Such hybridizing, of course, serves to muddy the ability of anyone -- students, parents, potential employers, teachers at the next level, and so forth -- to interpret what a particular grade means. If Tika "tried hard" but did not master any of the course objectives, will the next teacher know that's what his "C" means? If Dalia mastered every objective but skipped class a lot, how will anyone know that her "C" means something entirely different? Similarly, if Fernando did worse than 97 % of her classmates but showed improvement, should her grade be raised at least to a "D" to encourage her? Meanwhile, should Brad, who ranked dead-center in the class but could have tried harder, have his grade lowered to a "D" to tell him his work is below par for his potential? If so, how are we going to communicate what messages these grades really carry?

A Brief History of Grades

Milton, Pollio and Eison (1986) present an interesting chronicle of the history of grades. While the emphasis of their book is on college grades, the trends they illustrate have been characteristic across educational levels and provide some insight into the quandaries associated with evaluative feedback as a part of instructional assessment.

Chapter Eight -94

The first grades were recorded in this country in 1783 at Yale, where four descriptive adjectives were used: Optime, Second Optime, Inferiores, and Pejores. These terms translate roughly into the designations of an earlier English system which evaluated students as Honor Men, Pass Men, Charity Passes, and Unmentionables. The standard by which students were classified into these ranks is not clear, but it appears that they were intended as designations of academic mastery. In the early 1800s, however, the College of William and Mary reflected a different perspective on evaluation criteria in sending all parents of students a report in which their student's name appeared in one of four lists related primarily to their perceived industriousness (this was obviously before the days of academic privacy laws!):

1. The first in their respective classes, orderly and attentive and have made the most flattering improvement.

2. Orderly, correct and attentive and their improvement has been respectable.

3. They have made very little improvement and as we apprehend from want of diligence.

4. They have learnt little or nothing and we believe on account of escapade and idleness. (Milton, Pollio & Eison, 1986, p. 4)

By the 1830s numerical scales became popular. Some schools used a 4-point scale, some a 9-point scale, some a 20-point scale, and some a 100-point scale. In 1850 the University of Michigan adopted a pass/fail system; however, by 1860 a "conditioned" level had been added and in 1864 a 100-point scale was incorporated, with a minimum of 50 required for a pass. Meanwhile, other schools which were using three-level evaluations (Passed, Passed With Distinction, and Failed) added plus and minus signs so that students who "Passed With Distinction" could be distinguished from those who merely "Passed With Distinction --." There appeared to be an ongoing inclination toward making finer and finer distinctions among students' relative degrees of success.

At the end of the nineteenth century, the 100-point scale had become quite popular, with the numerical scores translated into letter symbols to separate students into five achievement groups. Shortly after the turn of the century, the curve came into being at the University of Missouri, as a response to an uproar over a professor who had failed an entire class. The top 3 % of students in a class were thenceforth to be labeled excellent (A), the next 22 % judged superior (B), the middle 50 % to be assessed as medium (C), the next 22 % rated inferior (D), and the bottom 3 % to fail (F). By the end of World War I the curve had caught on, coupled with an era of "objectivity" in testing -- true/false and multiple-choice tests were the hot trends in a new climate of "scientific" evaluation.
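The arithmetic of the curve described above can be sketched in a few lines of Python. This is an illustration of the 3 / 22 / 50 / 22 / 3 split only, not a reconstruction of how grades were actually computed at the time; the function name, the tie-breaking rule, and the sample class are assumptions made for the example.

```python
from collections import Counter

# Cumulative percent of the class at which each grade band ends, taken from
# the 3 / 22 / 50 / 22 / 3 split described in the text.
CURVE = [(3, "A"), (25, "B"), (75, "C"), (97, "D"), (100, "F")]


def grade_on_curve(scores):
    """Assign letter grades by rank alone. `scores` maps student name to score.

    Ties are broken by input order, which is an assumption of this sketch.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    grades = {}
    for position, student in enumerate(ranked):
        for cumulative_percent, letter in CURVE:
            # Integer comparison avoids floating-point rounding at the cutoffs.
            if position * 100 < cumulative_percent * n:
                grades[student] = letter
                break
    return grades


# Hypothetical class of 100 students, every one of whom scored 90 or better.
strong_class = {f"student_{i}": 90 + i / 10 for i in range(100)}
print(Counter(grade_on_curve(strong_class).values()))
```

For a class of 100 the sketch yields 3 A's, 22 B's, 50 C's, 22 D's, and 3 F's no matter how high or low the scores themselves are, which is precisely the property for which norm-referenced grading is criticized.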

Norm-referenced curves, using a 5-point A - F scale, remained the predominant grading philosophy until the 1960s, when a wave of educational humanism led to adoptions of pass/fail and self-referenced evaluations. This, in turn, led to criticisms of grade inflation and -- once again -- a reactionary trend toward 13-point scales incorporating a full range of plus and minus designations on top of the traditional A - F scale. School faculties spent a great deal of time discussing whether and how plus and minus grades should be calculated into grade point averages, and whether honors or advanced placement classes should count differently than other classes. For example, in one of the authors' first year of teaching high school, the big decision of the year -- made after excruciating deliberation -- was to award an extra honor point for each grade earned in an honors-level class. Thus, an "A" would be calculated as five points rather than four, a "B" as four points rather than three, and so forth. The theory was that this would allow honors teachers to separate the most-honorable students from the merely-honorable and barely-honorable students, and to try to motivate honors students with grades without the danger that some special education student would end up as valedictorian (we swear this was the precise rationale for the system!).
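The effect of the extra honor point on a grade point average is easy to work through. The sketch below assumes the conventional 4-point scale (A = 4 through F = 0) and applies the bonus to every grade earned in an honors class, reading "each grade earned" literally; whether a failing honors grade also received the point is not stated in the account above, so that detail, like the function name and the sample transcript, is an assumption.

```python
# Conventional 4-point scale; the text confirms only A = 4 and B = 3.
BASE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
HONORS_BONUS = 1  # the "extra honor point" per honors-level grade


def grade_point_average(courses, weighted=True):
    """`courses` is a list of (letter_grade, is_honors) pairs."""
    total = 0
    for letter, is_honors in courses:
        points = BASE_POINTS[letter]
        if weighted and is_honors:
            points += HONORS_BONUS
        total += points
    return total / len(courses)


# Hypothetical transcript: two honors classes and two regular classes.
transcript = [("A", True), ("B", True), ("A", False), ("C", False)]
print(grade_point_average(transcript, weighted=False))  # 3.25
print(grade_point_average(transcript, weighted=True))   # 3.75
```

Under this scheme a student with a "B" in an honors class earns the same four points as a student with an "A" in a regular class, so the resulting average no longer says which kind of performance produced it.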

In the end, it becomes apparent that interpreting the messages communicated by grades is a complex process. Milton and his colleagues report the findings of one of their own studies in which experienced faculty members were asked: "Imagine that an intelligent well-informed adult (not connected to higher education) asks you: 'Student X received a B in your course. What does that B mean?'" More than 70 % of the respondents gave straightforward responses to the question without equivocating. Later in the questionnaire the same faculty were asked: "Imagine that your son or daughter is in college. A final grade of C is received in a very important course. How do you interpret this grade to yourself? that is, what does it tell you about your child?" Only 14 % of the respondents said, in this case, that the grade meant "average." The rest were uncertain and wanted to know more specific details about the grade. The moral, we think, is that teachers may well have a clear idea of what their grading systems communicate but that does not mean that shared meaning is inevitable.

Descriptive Feedback

Feedback To Students. As we have seen, evaluative feedback -- which is communicated in the form of some sort of grading system -- is likely to require a descriptive explanation so that its intended meaning can be interpreted. Without descriptive feedback, a student will not know why a paper earned a "C" rather than an "A" and will be left guessing as to how to improve on the next paper. Without descriptive feedback, a parent will not know whether her child is being evaluated on a norm-referenced or criterion-referenced basis, or what kind of hybridization entered into the final grade. In addition, there are many instances throughout a course of study when formative feedback is appropriate, in which case a clear description of what the student has mastered and what remains to be mastered is essential, along with some helpful direction in correcting problems.

Many teachers, the authors included, have expressed frustration at having spent hours writing descriptive comments on student papers only to have many students check the grade and toss the paper in the waste basket by the door. Often this is because students see the assignment of a grade as a summative exercise, and do not perceive the comments on one paper as formative feedback for the next paper. For this reason, it is advisable to give students opportunities to obtain descriptive feedback while they are still completing a particular assignment, with no evaluation attached. Comments on draft copies of assignments or during the developmental stages of projects are more likely to be perceived as immediately applicable.

Providing descriptive feedback can be time-consuming, although teachers should remember that it does not always mean taking home twice as many stacks of papers so that each can be read twice. Sometimes problems that many students are having can be assessed by simply moving around the classroom as students work, or by looking at a sample of eight or ten students' in-progress work. These problems can then be brought to the attention of the class as a whole, as the subject of corrective instruction lessons. Often students can give one another descriptive feedback by working in dyads or small groups, or teachers can pair students who are having problems with those who have mastered a task.

Some teachers program a number of precoded comments into a computer so that they can generate personalized feedback for each student by drawing from the coded menu. This allows them to return fully developed explanations of what the student might do to improve her or his performance without having to write the same comments over and over on various papers. Some very individualized notes might still be helpful, but any of us who have written comments in student lab notebooks, on critique forms for class presentations, on essays, or in letters to parents which accompany report cards know that progress toward common objectives usually elicits a relatively predictable need for advice.
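A coded comment menu of this kind needs very little programming. The following Python sketch shows one way it might look; the codes, the wording of the comments, and the function name are all hypothetical, and a teacher would substitute comments tied to his or her own objectives.

```python
# Hypothetical comment bank: each short code stands for a fully written comment.
COMMENT_BANK = {
    "THESIS": "State your main claim in one sentence near the start of the paper.",
    "EVIDENCE": "Support each main point with a specific example or source.",
    "ORG": "Keep one main idea per paragraph and add transitions between paragraphs.",
}


def build_feedback(student_name, codes, personal_note=""):
    """Assemble feedback from coded comments plus an optional individual note."""
    lines = [f"Feedback for {student_name}:"]
    for code in codes:
        # An unknown code raises KeyError, so typing mistakes surface immediately.
        lines.append(f"- {COMMENT_BANK[code]}")
    if personal_note:
        lines.append(f"- {personal_note}")
    return "\n".join(lines)


print(build_feedback("Dalia", ["THESIS", "ORG"],
                     "Your opening example was vivid; build on it."))
```

The teacher records only the short codes while reading each paper, and the optional note leaves room for the individualized remarks that a menu cannot anticipate.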

Many times it is helpful to separate the descriptive and evaluative components of feedback on graded work. For example, scheduling student conferences a day or so after a set of papers or tests has been returned will usually result in a calmer, more objective discussion than will "buttonhole" conferences on the way out of class -- initiated by the teacher or the student -- while emotions over a disappointing grade are running high. It is logistically difficult in most classes to talk individually with all students after every assessment opportunity. In elementary and secondary classes, the ability to schedule conferences outside of class time is usually limited; however, opportunities for individual discussions can often be found when the class as a whole is involved in an activity that demands minimal teacher supervision. The students' attitude toward such discussions will be far more positive if they are not reserved only for bad news!

Feedback from Students. Descriptive feedback can also be directed from the student to the teacher. This kind of feedback allows teachers to make changes in classroom atmosphere, instructional strategies, and so forth based on student input. Research has shown that students are appropriate sources of information about student-instructor relationships, the workload and assignments, what they are learning in the course, the perceived fairness of grading, and the instructor's ability to communicate clearly. Sometimes there is truly nothing the teacher can do to accommodate a student's wishes, but responding to the concern with an expression of empathy and an explanation of why an idea cannot be incorporated in the classroom shows that the feedback is being considered seriously and is likely to result in affective payoffs. Many times student feedback does suggest things a teacher can do (or do more of) to better accommodate the needs and preferences of the particular class. When that is the case, the instructional process is likely to be enhanced.

Feedback from students can be solicited formally or informally. Feedback forms can be devised for periodic use, or students can simply be requested to "write down what you liked most about this unit and what you would have liked to be done differently." One way to do this is a Start-Stop-Continue sheet. Have students fold a piece of paper into thirds and write the words “stop,” “start,” and “continue,” one in each section of the page. Then have your students write down (anonymously) things that they would like you to stop doing, things they would like you to start doing, and things they would like you to continue doing. Other teachers place a feedback item at the end of each test so students can "grade" the test. Some develop a routine in which students can drop off a note in a designated place at the end of any day or class period to request content or process clarification that can be made at the start of the next day or class, to comment on anything they liked or didn't like that day, or just to tell the teacher something they want to share in private. This technique usually takes some prompting to get started; making a point of responding to the feedback and reinforcing students for providing it helps.

The information from formative evaluations of student progress toward mastering objectives also serves as feedback to the teacher. A formative "test" that is not graded will provide information on where corrective instruction is needed, as well as telling students how they are doing. Similarly, reviewing any student work while it is in progress not only provides an opportunity to give students descriptive feedback but also gives the teacher an indication of how things are going. Students can also be asked to describe how they think they are doing, rather than the teacher initiating the descriptive feedback. Their perceptions can be an enlightening means of assessing how they have decoded the teacher's directions or advice.

If you have your students write an evaluation of the class or complete the Start-Stop-Continue exercise, you must debrief them once you have examined what they have written. Students want to know that their teachers are taking their opinions and ideas seriously. If your students want you to stop giving homework, for example, that is an unrealistic expectation, and it calls for an explanation of why the homework is important. If you cannot stop or start something that your students would like you to, explain to them why you cannot do so. Just be careful to avoid the infamous “Because I’m the teacher and you’re the student!”