The technical and social dimensions of developing assessment systems: emergent tensions
23.2 Differentiation within a common examining system
2.4.2 Referencing systems
Criterion-referencing with grade-related criteria was originally an objective for the GCSE (Joseph, 1984a, 1984b) but remains an elusive goal (as it still was in 2008). Alongside an educational argument, concerns for comparability of examination grades between different syllabuses and between different examining groups created support for criterion-referencing with grade-related criteria being applied to the GCSE (Orr and Nuttall, 1983). The interplay between the proponents and antagonists of criterion-referencing linked to grade criteria for the GCSE exemplifies the technical and social dimensions of developing national assessment systems. Of relevance here is a discussion of the referencing systems that awarders use to decide the relative merits of, and thus the grades awarded to, students’ examined work. The referencing systems’ terms are often confused (AEB, 1995). Conventionally, criterion-referenced tests are meant to measure the degree of competence attained by a particular student on a profile of attainment (Glaser, 1963). In such tests the assessment domain is specified in detail and interpretations of the student’s performance are made against a profile of possible attainment for the assessment domain. As stated in Glaser’s seminal paper on criterion-referenced testing:
Measures which assess student achievement in terms of criterion standards thus provide information as to the degree of competence attained by a particular student which is independent of reference to the performance of others. (Glaser, 1963, p. 520)
Criterion-referencing assumes a conventional numerical scoring process at the level of individual questions. However, the questions are selected so as to be representative of the assessed domain. In that way the score obtained by a student can be interpreted as that student’s expected attainment on the entire domain (hence the need to use a well-defined domain in criterion-referencing). A major intention of conventional criterion-referencing is to provide formative information. In summative forms it is intended to provide users of assessment information with an understanding of what students know and can do.
In its original sense, norm-referencing means standardizing, i.e. identifying each student’s test or examination score within the distribution of attainment of the student’s peers, as in the intelligence tests discussed earlier in this chapter. Conventionally, and in contrast to criterion-referenced tests, norm-referenced tests do not specify the assessment domain in detail. The questions in norm-referenced assessments are not representative of the assessment domain as a whole, although they are assumed to be predictive measures across domains. The rank ordering provided by norm-referenced testing ‘only indicates an individual’s success in relation to their peers and not in terms of the knowledge, skills and understanding achieved by that individual’ (Murphy et al., 1996, p. 62).
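The contrast between the two conventional referencing systems can be illustrated with a minimal sketch. It is not drawn from any of the cited sources: the function names, the ten-item test and the five-student cohort are hypothetical, chosen only to show that the same raw score supports two different kinds of interpretation.

```python
from statistics import mean, pstdev


def domain_score(item_scores):
    """Criterion-referenced reading: the proportion of a representative
    sample of items answered correctly, taken as an estimate of the
    student's attainment on the whole (well-defined) domain."""
    return sum(item_scores) / len(item_scores)


def standardized_score(raw, cohort):
    """Norm-referenced reading: the raw score expressed as a distance
    from the cohort mean, in units of the cohort standard deviation."""
    return (raw - mean(cohort)) / pstdev(cohort)


def percentile_rank(raw, cohort):
    """Norm-referenced reading: percentage of the cohort scoring below raw."""
    below = sum(1 for score in cohort if score < raw)
    return 100.0 * below / len(cohort)


# Hypothetical data (assumptions for illustration only).
items = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]   # one student's item scores, 1 = correct
cohort_totals = [2, 4, 6, 8, 10]         # raw totals of the student's peers
raw_total = sum(items)                   # 8

print(domain_score(items))                                    # 0.8
print(round(standardized_score(raw_total, cohort_totals), 2)) # 0.71
print(percentile_rank(raw_total, cohort_totals))              # 60.0
```

Under these assumed data the first figure says something about what the student can do on the domain, independently of anyone else; the second and third say only where the student stands relative to peers.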
The terms criterion- and norm-referencing are often assigned meanings other than their conventional ones. With the introduction of the National Curriculum in England and Wales the alternative meaning of criterion as standard came into use. Brief verbal statements acting as standards in terms of particular competencies were developed and called statements of attainment, and applied across the 5-16 curriculum at Key Stages defined by age (discussed later in the chapter) (DES, 1989). This approach is often termed ‘strong criterion-referencing’ because of the strength of the descriptive inferences about students’ attainments that it claims to make possible (Cresswell and Houston, 1991). When this approach is used, verbal descriptions replace numerical scores. Concerns about the precision of meaning of such verbal descriptions and the accuracy of their application have been voiced (Sadler, 1987; Ruddock et al., 1993; Wolf, 1993), particularly in the context of the assessment of the National Curriculum in England and Wales. Furthermore, unlike its conventional counterpart, strong criterion-referencing does not control the representativeness of the questions so that a student’s performance on them may be interpreted as the student’s expected attainment on the whole assessment domain.
As Christie and Forrest (1981) have argued, national examinations usually have a clearly discernible assessment domain and, in the GCSE question setting process, effort is made to ensure that the examination as a whole represents it effectively. Assessment grids giving a breakdown of the rationale used for sampling the syllabus in the construction of the papers are often provided by examining groups in their syllabus regulations (an assessment grid is shown in Appendix 1). GCSE syllabuses do not define assessment domains with the precision required by conventional criterion-referencing. The question papers are related to broad areas of knowledge, so that there is potential for considerable variability in relation to the constructs being assessed year on year and between examination consortia. In this sense the construction of the examinations is based on strong criterion-referencing. Examining groups provide brief descriptions of performance by grade, and what distinguishes them, with their syllabuses. However, in awarding grades the GCSE places an emphasis on numerical scoring of questions, uses aggregation of component numerical scores and rank-orders students’ total numerical scores for a particular syllabus. An approximately normal distribution of total numerical scores is expected and proportions of students falling into numerical score ranges are also anticipated to approximate those of previous years. In terms of an emphasis on quantification and a normal distribution of testing outcomes the norm-referenced traditions of the psychometric paradigm are evident in these aspects of the GCSE.
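The numerical side of this process (aggregation of component scores, rank-ordering of totals, and comparison of the proportions falling into score ranges) can be sketched as follows. This is an illustration under stated assumptions, not a description of any examining group's procedure: the component weights, candidate marks and range boundaries are all hypothetical, and the awarders' judgement is not modelled.

```python
def aggregate(component_marks, weights):
    """Weighted sum of one candidate's component marks."""
    return sum(mark * weight for mark, weight in zip(component_marks, weights))


def rank_order(totals):
    """Candidate identifiers sorted from highest to lowest total."""
    return sorted(totals, key=totals.get, reverse=True)


def proportion_by_range(totals, lower_bounds):
    """Share of candidates whose total falls in each score range.

    lower_bounds maps a range label to its minimum total; each candidate
    is counted in the range with the highest minimum that they reach.
    """
    ordered = sorted(lower_bounds.items(), key=lambda kv: kv[1], reverse=True)
    counts = {label: 0 for label, _ in ordered}
    for total in totals.values():
        for label, minimum in ordered:
            if total >= minimum:
                counts[label] += 1
                break
    return {label: count / len(totals) for label, count in counts.items()}


# Hypothetical data (assumptions for illustration only).
weights = (2, 1)                                   # two components, assumed weights
marks = {"a": (30, 20), "b": (20, 10), "c": (10, 10), "d": (25, 15)}
totals = {cand: aggregate(m, weights) for cand, m in marks.items()}
ranges = {"upper": 60, "middle": 40, "lower": 0}   # assumed score ranges

print(totals)                               # {'a': 80, 'b': 50, 'c': 30, 'd': 65}
print(rank_order(totals))                   # ['a', 'd', 'b', 'c']
print(proportion_by_range(totals, ranges))  # {'upper': 0.5, 'middle': 0.25, 'lower': 0.25}
```

The resulting proportions are the kind of figure that would be compared with those of previous years; the sketch deliberately stops short of converting score ranges into grades.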
Grades are not, however, arbitrarily assigned to numerical mark ranges in the GCSE. Grade awarders bring value judgments to the grade awarding process. These value judgments are supported by reasons (Fogelin, 1967; Beardsley, 1981) based upon the use of tacit standards held by them as a ‘guild of professionals’ (Sadler, 1985, 1987, 1989). In this evaluative process, marks are considered in terms of how they represent the guild’s (ibid.) view of the value of particular grades in that assessment/subject domain. Of relevance here is the notion that the GCSE cannot be regarded as norm-referenced simply because it has aspects that emphasize rank-ordering of numerical scores with anticipated approximately normal distributions. It is because of these aspects that national examinations such as the GCSE are often erroneously said to be norm-referenced.
The nature of the assessment referencing system for GCSE might have been different if the Schools Council and the government of the 1980s had had their way. Prompted by a concern that grades in national 16+ examinations were not comparable across different syllabuses, the Schools Council in 1979 strongly recommended the development of nationally agreed definitions of standards of work in these examinations. By 1980 the government was instructing the examining groups to begin work on defining national 16+ examination grades in terms of performance:
Consideration should also be given to the possibility of incorporating (in the national criteria) some elements of criterion-referencing of grades, or some grades in the 7-point scale. This might help certificates to be more informative for users about the things candidates [students] have shown they can do and go some way to free the award of grades from statistical norms of quality or performance change over time.
(Department of Education and Science, DES, 1980)
In the conventional sense a grade criterion is an attribute to be assessed (Christie and Forrest, 1981). The Schools Council (1979) and the DES paper of 1980 regarded it as a standard, based on a view that grade criteria should be written statements that prescribe the level of attainment required to justify the award of a particular grade (Murphy, 1986).
Murphy (ibid.), commenting on the developments subsequent to the DES (1980) document quoted above, argued that the challenge of grade criteria was thereafter avoided. First it was avoided by the Joint Council for National Criteria, a body set up by the GCE and CSE examining groups/boards. This Council redefined the task that they had been given by the Department by distinguishing between ‘criterion-related grading’ and ‘grade descriptions’ (Joint Council for National Criteria, 1981). Under the former, students would be required to demonstrate predetermined levels of competence in specified aspects of the subject in order to be awarded a particular grade. This was the intention in the Scottish 16+ common examination/certification system examined for the first time in 1986 and is referred to later in this section. Grade descriptions, by contrast, would describe the attainment likely to be shown by students awarded particular grades in a subject (ibid.). Gipps (1990) argues that this redefinition was prompted by the Joint Council’s concern about the technical problems associated with a national grade-related examination system. Whether or not this was true, the DES also came to accept the more limited aim of producing grade descriptions:
Grade descriptions, as outlined above, may prove to be a step towards a longer term goal. The Secretaries of State have asked the boards [GCSE examining groups] to set themselves the objective of making the award of all grades conditional on evidence of attainment in specific aspects of a subject.
(DES, 1982)
As Murphy (1986) notes, nothing more was heard of grade criteria for a couple of years until, in a bold last-ditch attempt, the Education Secretary, Sir Keith Joseph, tried again to inject them into the final stage of the development of the new examination because he saw this as improving standards in schools. He viewed the use of grade criteria in criterion-referenced assessment as being supportive of positive achievement, a central tenet of the government’s proposals for the new examination. Examination grades equated to absolute standards of competence, skill and understanding for the attainment of students of different abilities would facilitate teachers and students working towards new targets (Joseph, 1984a, 1984b). Scotland was ahead of England and Wales in this respect. Changes in the 16+ examining system took place in Scotland for a limited number of subjects with the first criterion-referenced examinations taking place in 1986. The intention was to provide more useful information about students' achievements for both students and their teachers. However, in attempting to identify the content and skills that students could be assessed on and the different degrees of mastery that might be demonstrated, the complexity of the resulting system made it unworkable, prompting a radical simplification.
Popham (1987) warns of this scenario.
Given the Scottish ‘experience’ of grade criteria, it is hardly surprising that Sir Keith Joseph’s enthusiasm for their speedy introduction for the GCSE was not shared by those charged with their development, the Secondary Examinations Council (SEC):
The rigorous specification of full criterion-referencing for assessment in the ... [new common examination] would result in very tightly defined syllabuses and patterns of assessment which would not allow the flexibility of approach that characterizes education in this country.
Nevertheless Council agreed with the DES that a move towards a greater degree of explicitness was desirable ...
(Secondary Examinations Council, 1984, p. 2)
In the run-up to the examination’s implementation, DES publications became vaguer about when grade criteria would emerge:
... the proposed examination will be designed, not for any particular proportion of the ability range, but for all candidates [all students taking the examination], whatever their ability relative to other candidates, who are able to reach the standards required for the award of particular grades. Grade criteria are being developed for this purpose and will be incorporated into the subject criteria and syllabuses as soon as practicable.
(DES, 1985)
The Secondary Examinations Council (SEC) circulated draft grade criteria for consultation prior to the implementation of the examination, but the time frame for comment and subsequent redrafting was very short.
Thus, as the GCSE first became a reality for teaching purposes in 1986 and subsequently for examination in 1988, grade criteria were still in development. Syllabuses were linked to grade descriptions for individual subjects. The grade descriptions describe a representative attainment worthy of the grade, rather than the attainment of every student awarded the grade, and cannot, for the reasons discussed above, be used as criteria for judging the attainment of all students. They merely ‘convey the flavour’ of a grade (Wilmut and Rose, 1989).
Criterion-referencing with grade criteria, prompted by various groups’ aspirations to:
provide more valid and useful information about students’ attainments;
facilitate the setting of learning targets;
provide a means of ensuring greater comparability of grading standards between different syllabuses and assessment domains/subjects by various examining groups,
remains an elusive goal for the GCSE. These aspirations, emanating largely from the educational agenda concerned with construct validity and the social justice of assessment, raised challenges for the system’s technical dimension that could not be met. Consequently the technical dimension dominated emerging practice and ensured that the issue of central concern to the current research, comparability of grading standards, was, and continues to be, problematic and a significant concern in assessment discourse and debate.