Assessment is the process of measuring the extent to which a student or group of students has met the learning outcomes of a particular course. This is often done by a lecturer or teacher critically evaluating students' work such as essays, presentations, reports, examination scripts or, in this case, source code.
There are two main approaches to assessment: criterion referenced assessment and norm referenced assessment (Brown, 1997). Criterion referenced assessment focuses on measuring whether or not students have met pre-specified criteria, and is used to determine how well a student has performed against a static objective rather than in comparison to another student. Norm referenced assessment is designed to permit comparisons between students and to allow rankings to be generated. This thesis is concerned primarily with criterion referenced assessment as this is the most commonly used in Higher Education in the United Kingdom.
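The difference between the two approaches can be illustrated with a minimal sketch. The function names, the pass mark and the example marks are hypothetical and chosen purely for illustration; they are not taken from Brown (1997).

```python
def criterion_referenced(marks, pass_mark):
    """Judge each student against a fixed criterion, independently of the cohort."""
    return {student: mark >= pass_mark for student, mark in marks.items()}


def norm_referenced(marks):
    """Rank students against each other; rank 1 is the highest mark, ties share a rank."""
    distinct_marks = sorted(set(marks.values()), reverse=True)
    rank_of = {mark: position for position, mark in enumerate(distinct_marks, start=1)}
    return {student: rank_of[mark] for student, mark in marks.items()}


# Hypothetical cohort and pass mark (assumptions for illustration only).
marks = {"ann": 72, "bob": 38, "cat": 55}
print(criterion_referenced(marks, pass_mark=40))
print(norm_referenced(marks))
```

Under the criterion referenced function, a student's result is unaffected by the marks of others, whereas under the norm referenced function the same mark may yield a different rank in a different cohort.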
The functions of assessment tasks are often distinguished as being either formative or summative (DES/WO, 1988). This report briefly discusses these different types of assessment and how they relate to learning programming.
2.3.1 Formative and Summative Assessment
The purpose of formative assessment is to allow “positive achievements of students to be recognised” (DES/WO, 1988) and to highlight where improvements can be made. It is useful for giving learners a chance to improve before attempting an assessment that contributes to their final qualification.
Formative assessment is designed to take place at regular intervals throughout the course. For an assessment to qualify as formative, the feedback derived from it must contain information that enables students to improve their performance (Wiliam and Black, 1996). Feedback from formative assessment should be used to highlight problems in students' learning so that remedial action can be taken (Harlen and James, 1997). If this feedback is gathered immediately before a lecture via, for example, a class questionnaire or online test, it is sometimes known as just-in-time teaching (JiTT) (Novak et al., 1999). Just-in-time teaching refers to a process whereby the lecturer uses formative feedback on how a cohort has understood some element of a course in order to guide or modify the content or pace of future lectures (Bailey and Forbes, 2005).
Summative assessment is the type of assessment used to measure students’ learning so that students as well as stakeholders (e.g. funding bodies, parents and the institution) can record and compare achievement in an objective way.
Summative assessment results are often in the form of a grade or percentage and contribute to the students’ final qualification results. More often than not, summative assessment does not contribute significantly to learning (Knight, 2002) and instead simply acts as a measurement of achievement.
Harlen and James argue that the boundary between formative and summative assessment is often blurred and that there is a definite need to ensure the distinction is maintained. The distinction between formative and summative types of assessment is essentially one of timing and purpose (Harlen and James, 1997). Formative assessment is designed to be regular and to contribute to the students’ learning, whereas summative assessment is designed to be a measure or summation of the students’ achievement at a certain point.
Furthermore, the two types of assessment are perceived differently. There is the perception that formative assessment should be a dialogue between tutor and student (Knight, 2002), in which there is an opportunity to clarify and negotiate meanings and concepts relating to the assessed work. In contrast, summative assessment represents a judgement, in which there is an imbalance of power between the assessor and the assessed (Higgins et al., 2001; Knight, 2002). As a result, it is perceived less as a dialogue and more as a unidirectional communication from the tutor to the student. Wiliam and Black disagree and suggest that all assessment can in fact serve a summative purpose as long as it leads to interpretable evidence of student performance being generated. It is the additional quality of generating feedback which can be used to improve student performance in some way that makes an assessment capable of serving a formative purpose (Wiliam and Black, 1996).
Unfortunately, the process that has been adopted for assessment has become one that is expensive for both tutors and students (Knight, 2002). Students invest significant time and emotion in their work and tutors invest ever more time in marking it. Time pressures often encourage surface approaches to learning, as it is often quicker to rote learn than it is to develop a deeper understanding of the topic (Knight, 2002). Recent studies suggest that assessment is becoming ever more central to education, insofar as changing the methods of assessment is the best way of changing the way students learn (Brown, 1997). This is incongruous with the purpose of assessment, which is to measure learning outcomes. The change in student learning should originate from the other direction. That is, to change the way students learn, the learning outcomes should be changed first and then the assessment. Assessment should not be used as the primary driver of teaching. It should only be used to generate feedback and to measure whether students have met the learning outcomes.
2.3.2 Peer Assessment
Peer review or peer assessment (Dochy et al., 1999) is a technique familiar to most people within academia. It is the way we encourage good scholarship and expand the human body of knowledge (Gehringer et al., 2006). In a learning environment, peer assessment activities are operated on a compressed scale where each student occupies both the role of an author and a reviewer.
The idea here is to increase the amount of feedback circulated between students. The amount of feedback that can be delivered by other students is significantly higher than the amount that can feasibly be delivered by the relatively few teaching staff (Gehringer et al., 2006). Peer feedback has further benefits, one of the most important of which is comprehension. Students, when talking to one another, use familiar vocabulary and are less likely to use language that is not mutually understood, whereas lecturers and academics often use a very specialised vocabulary that can exclude students from understanding the feedback (Carless, 2006). This means that the feedback exchanged in peer assessment is likely to be better comprehended by the students involved (Sitthiworachart and Joy, 2008).
Another benefit of these activities, besides the increased amount of feedback being circulated, is that students are able to access skills that relate to the higher levels of Bloom’s revised taxonomy such as analyse and evaluate (Carlson and Berry, 2007; Gehringer et al., 2006). The skills developed in peer review activities include critical analysis, ability to diagnose misconceptions, general evaluation skills and communication of suggestions for improvement (Gehringer et al., 2006), all of which are valuable to student learning.
Whilst peer assessment may seem like the ‘silver bullet’ of assessment and feedback for students, there are significant criticisms of it as a technique. One of the most important is that peer review at undergraduate level can be an example of the ‘blind leading the blind’ (Carlson and Berry, 2007). This suggests that students who have misconceptions relating to the work propagate these misconceptions to other students and therefore damage others’ learning. Another criticism is that students show bias during the peer marking process. They will often be more generous to their colleagues and sometimes do not take the assessment process seriously. Whilst some students accept peer feedback as being valuable, some of the more cynical complain that they are ‘paying’ to be taught by experienced lecturers and want their feedback to come from them. This complaint alludes to the conception that Higher Education is becoming more and more consumer driven (Rowe and Wood, 2007; Dochy and McDowell, 1997).
Within the context of programming, peer assessment fits particularly well.
An example of a similar technique being used in industry comes from the agile methods of software development, which use pair programming to increase the accuracy of the source code developed. This technique involves two programmers sharing one computer and having to negotiate and discuss the source code as it is written. One of the more important benefits of both pair programming and peer assessment is that they encourage programmers to make the source code they write easily comprehensible, particularly as another programmer is going to have to understand the source code and give feedback on it immediately.
Tools to support peer assessment in programming courses have been developed and are sometimes used to assess learning outcomes in a summative way. The majority of these tools allow students to complete an online proforma for one of their peers. Various mechanisms have been used to ensure that the feedback delivered in a peer assessment situation is fair, including taking the standard deviation of particular students' marks and attaching a summative weighting to the accuracy of peer marking (Sitthiworachart and Joy, 2008). That is, the student doing the marking is assessed on how appropriate their marks are. In most uses of peer assessment, students are given some form of rubric to support them (Carlson and Berry, 2007).
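One way such a fairness mechanism might work is sketched below. This is an illustrative assumption and not the scheme used by Sitthiworachart and Joy (2008) or by any particular tool: the function names, the data layout, the use of the mean peer mark as a consensus and the accuracy formula are all hypothetical. The sketch reports the spread of the marks each author received and scores each marker by how far, on average, their marks fall from the consensus.

```python
from collections import defaultdict
from statistics import mean, pstdev


def consensus(peer_marks):
    """Return {author: (mean, standard deviation)} of the marks each author received.

    peer_marks maps (marker, author) pairs to the mark awarded (hypothetical layout).
    """
    received = defaultdict(list)
    for (_marker, author), mark in peer_marks.items():
        received[author].append(mark)
    return {author: (mean(marks), pstdev(marks)) for author, marks in received.items()}


def marker_accuracy(peer_marks, max_mark=100):
    """Score each marker from 0 to 1; 1 means every mark matched the consensus.

    Assumption: the consensus is the mean of all peer marks for a submission,
    including the marker's own mark.
    """
    stats = consensus(peer_marks)
    errors = defaultdict(list)
    for (marker, author), mark in peer_marks.items():
        consensus_mark, _spread = stats[author]
        errors[marker].append(abs(mark - consensus_mark))
    return {marker: 1 - mean(errs) / max_mark for marker, errs in errors.items()}


# Hypothetical marks out of 100: each student marks the other two.
peer_marks = {
    ("ann", "bob"): 60, ("cat", "bob"): 70,
    ("bob", "ann"): 80, ("cat", "ann"): 80,
    ("ann", "cat"): 50, ("bob", "cat"): 90,
}
print(consensus(peer_marks))
print(marker_accuracy(peer_marks))
```

A large standard deviation for a submission indicates that its markers disagreed and that the work may need moderation by a tutor, while the accuracy score could be weighted into the marker's own summative grade.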