Controlling for ability - Comparability and Examination Performance: Technical and Social Appro

This treatment, like the previous one, is based upon a view of ability as innate general intelligence which is predictive of achievement; an achievement is viewed as being predictive of subsequent achievements. This approach claims to take groups of students with the same distributions of ability, and assumes that all other student, school, examination, syllabus and social variables are identical for these groups. The approach not only assumes that general intelligence predicts subject performance but that when differences in the spread of the examination population’s general intelligence are taken into account, any remaining differences between examination grade distributions in different examinations are indicative of a lack of comparability between the examinations themselves (Bardell, Forrest and Shoesmith, 1978; Forrest and Shoesmith, 1985).

One could argue that controlling for ability is less sophisticated than using the ‘same student’ approach on the basis that it only identifies two variables as being significant, general ability and achievement. However, the ‘same student’ approach’s claim to control for many variables is, as discussed, questionable. Taking students with the same distributions of innate intelligence rather than taking the same students is only more sophisticated from a psychometric perspective. Nevertheless, this treatment was most frequently used in the 1970s by Schools Council researchers (Willmott, 1980). This method controlled for intelligence using a reference ability test such as the NFER’s scholastic Aptitude Test 100. The use of such a test provided information about an examination population’s spread of general intelligence rather than about specific skills and types of knowledge (Bardell, Forrest and Shoesmith, 1978).

For example, the 1968 CSE Monitoring Experiment (Nuttall, 1971) involved samples of students who sat CSE or GCE ‘O’ level in summer 1968 being additionally tested with the NFER’s scholastic Aptitude Test 100 in the preceding February and March. The GCE rather than the CSE findings are used to illustrate Nuttall’s work because, in terms of performance in national 16+ examinations, the GCE examination population is more like the GCSE populations used in the current research, i.e. it involved the top 20% of the entire 16+ population. For each student in Nuttall’s sample, the total score on the NFER’s Aptitude Test 100 and the grades achieved in as many of the ten subjects (art, biology, chemistry, English language, English literature, French, geography, history, mathematics, physics) as were taken constituted the raw data. The subject grades at this time were integers, with a smaller integer indicating a better performance. When the average test score for students was plotted against their average grade in each subject, it was found that groups of students with the highest test scores tended to be those with the smallest average grades. Some sort of difference between the subjects existed, and the extent of this difference was investigated using the same regression method as used by Nuttall (1971) in his 1968 CSE Monitoring Experiment.

A standard measure of the ten subjects was first obtained, defined as the average of the average grades awarded in each GCE subject and the average of all the Aptitude Test scores. This measure provided the average relationship between the GCE grades and the Test scores across all ten GCE subjects. It could be used to predict the average GCE grade that would be expected for any given Test score if grades in each GCE subject were awarded using the same standards. For example, the average Test score of the chemistry students in the sample for GCE board 2 was 55.0 (3.5 points better than the average Test score of the complete sample). The regression method predicted that the corresponding average chemistry grade for an average Test score of 55.0 should be 5.11, on the assumption that grades in chemistry were awarded on the same standard as grades in the other nine subjects. As the mean grade actually awarded in chemistry was 5.44, chemistry was identified as being severely graded by 0.33 grades (5.44 - 5.11 = 0.33). This process was repeated for each subject in turn and the results are shown in Table 3.3.
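The chemistry calculation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text reports the predicted grade (5.11) for a Test score of 55.0 but not the regression coefficients, so the slope used here is a hypothetical value, and the intercept is derived solely so that the line passes through the point given in the text. The names `StandardLine` and `severity` are likewise invented for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardLine:
    """Average relationship between Test score and GCE grade across all subjects."""

    intercept: float
    slope: float  # negative: higher Test scores go with numerically smaller (better) grades

    def predicted_grade(self, mean_test_score: float) -> float:
        # Grade expected if the subject were graded to the common standard.
        return self.intercept + self.slope * mean_test_score


def severity(actual_mean_grade: float, predicted_mean_grade: float) -> float:
    """Positive result = severe grading (awarded grades numerically larger than predicted)."""
    return actual_mean_grade - predicted_mean_grade


# HYPOTHETICAL slope: not reported in the text, chosen purely for illustration.
HYPOTHETICAL_SLOPE = -0.10
# Intercept fixed so that the line reproduces the text's prediction of 5.11 at a score of 55.0.
line = StandardLine(intercept=5.11 - HYPOTHETICAL_SLOPE * 55.0, slope=HYPOTHETICAL_SLOPE)

# Figures from the text: chemistry, GCE board 2.
chem_predicted = line.predicted_grade(55.0)
chem_severity = severity(5.44, chem_predicted)

print(f"predicted {chem_predicted:.2f}, severity {chem_severity:+.2f}")
```

Whatever the true coefficients were, the severity estimate depends only on the gap between the awarded and the predicted mean grade, which is why the hypothetical slope does not affect the chemistry result.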

Table 3.3 Sample estimates of mean grade severity in GCE board 2: regression method (Nuttall et al., 1974)

| Subject | Estimate of severity |
|---|---|
| Art | -0.49 |
| Biology | -0.14 |
| Chemistry | 0.33 |
| English language | -0.49 |
| English literature | -0.24 |
| French | 0.25 |
| Geography | 0.09 |
| History | 0.16 |
| Mathematics | 0.12 |
| Physics | 0.37 |

Positive values indicate a tendency towards severity of grading, while negative values indicate a tendency towards leniency of grading.
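As a reading aid, the sign convention can be applied mechanically to the Table 3.3 estimates. The short sketch below uses only the values reported in the table; the variable names are illustrative, and treating any positive value as "severe" simply follows the convention stated above rather than any significance test.

```python
# Severity estimates as reported in Table 3.3 (GCE board 2).
severity_by_subject = {
    "Art": -0.49,
    "Biology": -0.14,
    "Chemistry": 0.33,
    "English language": -0.49,
    "English literature": -0.24,
    "French": 0.25,
    "Geography": 0.09,
    "History": 0.16,
    "Mathematics": 0.12,
    "Physics": 0.37,
}

# Rank subjects from most severely to most leniently graded.
ranked = sorted(severity_by_subject.items(), key=lambda item: item[1], reverse=True)

# Apply the sign convention: positive = severe, negative = lenient.
severe = [subject for subject, value in ranked if value > 0]
lenient = [subject for subject, value in ranked if value < 0]

for subject, value in ranked:
    label = "severe" if value > 0 else "lenient"
    print(f"{subject:<20}{value:+.2f}  {label}")
```

On these figures the six positive estimates belong to physics, chemistry, French, history, mathematics and geography, while art, English language, English literature and biology fall on the lenient side.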

Physics and chemistry are interpreted as being more severely graded than biology, and this trend was replicated across all of the GCE boards included in the study (ibid.). Nuttall et al. (ibid.) identified bias in the reference test (the NFER Aptitude Test 100 used in the study) rather than differences between the subjects as explaining the lack of comparability of grading standards between the subjects. He argued that the nature of the items in the Test 100 was such that those students entered for mathematics or for science subjects would obtain significantly higher scores on the Test than students in other subjects simply by virtue of their having followed mathematically orientated courses (ibid.). In other words, Nuttall claimed that the Test scores for the different groups of students might not be directly comparable. In this sense he challenged the idea that ‘intelligence’ existed or could be measured, arguing that the reference test itself was just another knowledge and skills test.

This highlights a fundamental problem in using such a reference test in comparing examination grading standards. There is a theoretical incompatibility between the reference test, which assumes norm-referencing against general intelligence, and examinations that seek to measure developed knowledge and skills related to specific subjects and involve strong criterion-referencing. The reference test assumes a psychometric view of achievement, whereas GCSE examinations, with their strong criterion-referencing in syllabus and examination paper construction, reflect an educational assessment perspective with many achievements being attained by all students. Using ‘ability’ measures to investigate examination comparability implies that examinations are inappropriate for the uses made of them. Nuttall and Willmott implied as much in 1972 with their call for a ‘single general intelligence test’ in place of public examinations.

**Controlling for students’ attainment relevant to the different examinations being compared**

In document Comparability and Examination Performance: Technical and Social Approaches to Its Study (Page 75-78)