Part III New Analysis
5. Agreement across different methods
5.2. National A-level data from 2006
5.2.3. Results from A-level analysis
Table 23: Estimates of relative difficulty of 33 A-level subjects from seven different methods
Difficulty estimates from seven different methods for each of the 33 A-level subjects are shown in Table 23. Correlations among them are shown in Table 24, and the corresponding matrix of scatterplots of the estimates in Figure 9.
Table 24: Correlations among difficulty estimates from different methods
                  SPA (unweighted)  SPA (weighted)  Kelly  Reference test  Value-added  Multilevel model
Rasch             0.940             0.910           0.972  0.754           0.902        0.951
SPA (unweighted)                    0.951           0.979  0.899           0.963        0.982
SPA (weighted)                                      0.955  0.856           0.923        0.953
Kelly                                                      0.817           0.936        0.981
Reference test                                                             0.939        0.884
Value-added                                                                             0.949
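Each entry in Table 24 is a correlation between two vectors of 33 subject difficulty estimates. The sketch below shows how such an upper-triangular table can be computed. It assumes Pearson correlation (the text does not say which coefficient was used), and the estimates for four subjects are invented for illustration, not taken from Table 23.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def correlation_table(estimates):
    """Correlations for every pair of methods (upper triangle only).

    estimates maps a method name to a list of subject difficulties,
    with the subjects in the same order for every method.
    """
    return {(m1, m2): pearson(estimates[m1], estimates[m2])
            for m1, m2 in combinations(estimates, 2)}

# Hypothetical difficulty estimates (grade units) for four subjects;
# these numbers are invented for illustration only.
estimates = {
    "Rasch": [-1.2, -0.3, 0.4, 1.1],
    "SPA (unweighted)": [-0.9, -0.2, 0.3, 0.8],
    "Kelly": [-0.8, -0.3, 0.2, 0.9],
}

for (m1, m2), r in correlation_table(estimates).items():
    print(f"{m1} vs {m2}: {r:.3f}")
```

Because a correlation is unaffected by shifting or rescaling either set of estimates, two methods can correlate very highly while still disagreeing about the size of the gaps between subjects.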
Correlations among the different methods are generally high, with most values above 0.9. One method, the reference test, stands out as having lower correlations with the others. This may be partly a result of the need to choose a specific intercept for this method: its subject differences are not constant, but depend on the particular value of average GCSE score at which subjects are compared.
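To see why the choice of GCSE value matters, suppose (as an illustrative assumption) that the reference-test approach fits a separate straight line predicting A-level grade from average GCSE for each subject and reads off the predicted grade at one chosen GCSE score. If two subjects' lines have different slopes, the gap between them changes with that score. The intercepts and slopes below, and the helper names, are hypothetical.

```python
def predicted_grade(line, gcse):
    """Predicted A-level grade from a subject's (intercept, slope) regression line."""
    intercept, slope = line
    return intercept + slope * gcse

def difficulty_gap(line_x, line_y, gcse):
    """Predicted grade in subject X minus that in subject Y at one GCSE value.

    A negative gap means X yields lower grades at that GCSE, i.e. looks harder.
    """
    return predicted_grade(line_x, gcse) - predicted_grade(line_y, gcse)

# Hypothetical regression lines (intercept, slope) for two subjects.
subject_x = (-6.0, 1.4)
subject_y = (-3.0, 1.0)

for gcse in (5.5, 6.1, 7.0):
    print(f"GCSE {gcse}: gap = {difficulty_gap(subject_x, subject_y, gcse):+.2f}")
```

Only when the slopes are equal is the gap the same at every GCSE value. Here the gap shrinks as the chosen GCSE value rises, and it would reverse sign beyond the point where the two lines cross (7.5 in this toy example).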
It is clear that the unweighted Subject Pairs Analysis (SPA) agrees better with the other methods than the weighted version does. This is consistent with a theoretical argument that the unweighted version is preferable as a way of comparing subjects (e.g. Nuttall et al., 1974). Similarly, the Value-added model with all variables included agrees better with the other estimates than the less well specified model. Hence, in choosing a representative of each method, we have preferred the unweighted version of SPA and the Value-added model with all variables included.
Figure 9: Matrix of scatterplots of difficulty estimates from the seven methods: Rasch overall (grade units), SPA (unweighted), SPA (weighted), Kelly, Reference test, Value-added (mean GCSE) and Value-added (all vars)
Estimates of subject difficulties from the five preferred methods are shown graphically in Figure 10. With the exception of a small number of subjects where the methods agree less well (Further Maths, Film Studies and Media Studies), the estimates from the five methods all lie within half a grade of each other, and within a third of a grade for the majority of subjects. This compares with a spread of nearly two grades across subjects, averaged across methods. Overall, the average inter-method difference is about 20% of the average inter-subject difference.
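The text does not define exactly how the two "average differences" are measured. One plausible reading, used here purely as an assumption, compares the mean absolute difference between methods' estimates for the same subject with the mean absolute difference between subjects' cross-method averages. The three-subject table and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

def mean_abs_pairwise_diff(values):
    """Average absolute difference over all pairs in a sequence of numbers."""
    return mean(abs(a - b) for a, b in combinations(values, 2))

def method_to_subject_ratio(table):
    """Ratio of inter-method to inter-subject disagreement (assumed definition).

    table maps each subject to its difficulty estimates, one per method.
    """
    # How far the methods disagree about the same subject, averaged over subjects.
    inter_method = mean(mean_abs_pairwise_diff(v) for v in table.values())
    # How far subjects differ, using each subject's average over methods.
    inter_subject = mean_abs_pairwise_diff([mean(v) for v in table.values()])
    return inter_method / inter_subject

# Hypothetical estimates (grade units) from three methods for three subjects.
table = {
    "Subject A": [-1.0, -0.8, -0.9],
    "Subject B": [0.0, 0.2, 0.1],
    "Subject C": [1.0, 0.9, 0.8],
}
print(f"{method_to_subject_ratio(table):.0%}")
```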
The case of Further Maths presents something of a problem, with a difference of over one-and-a-half grades between two of the methods (Rasch and Reference Test).
Further Maths is an unusual A-level, with 58% of its 6500 candidates being awarded the top grade, A. Even more extraordinary is the fact that two thirds of those who get an A in Further Maths also get As in all their other A-levels. Overall, therefore, 39% of the candidates who take Further Maths gain no grade other than A in any A-level. This situation inevitably makes it quite difficult to compare the difficulty of Further Maths with that of other subjects, since any statistical comparison must be based on differences in the grades achieved in different subjects; for a proportion not much less than half, there simply are no differences in the grades gained. In measurement terms, the grading of Further Maths suffers from a substantial ceiling effect.
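The 39% figure follows from the two proportions above, and corresponds to roughly 2,500 of the 6500 candidates (approximate, since the 58% and the two thirds are themselves rounded):

```latex
0.58 \times \tfrac{2}{3} \approx 0.387 \approx 39\%,
\qquad 0.387 \times 6500 \approx 2500 \text{ candidates}
```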
Figure 10: Relative difficulty estimates of A-level subjects from five different methods, ranked by average difficulty across all methods
Axis: relative difficulty (A-level grade units). Series: Rasch overall (grade units), SPA (unweighted), Kelly, Reference test, Multilevel model, and the average difficulty from all 5 methods. Subjects in plotted order: Film Std, Photography, Media, Sociology, Fine Art, Drama, Art Desg, DT Prodn, Bus Std, English, Engl Lang, RE, Law, Engl Lit, PE, Geography, Psych Sci, Psychology, Politics, Economics, Spanish, History, IT, Maths, Computing, Music, German, French, Maths Fur, Biology, Chemistry, Physics, Gen Std.
It may be helpful to consider how the different methods deal with this difficulty. For methods such as SPA and Kelly, the results of these candidates with straight As are taken as evidence of the equivalence of the subjects they have taken. Both these methods effectively calculate an average of the differences between the grades achieved in one subject and another. If 39% of the candidates generate a difference of zero, then even if there are large differences in grades for the other 61%, the zeros will pull the average towards zero. The ceiling on the recognition that can be awarded to the highest levels of performance means that achievements that may be far from equivalent are treated as equivalent in the calculation. A similar problem arises for the Reference Test and Value-Added methods, since the coding of ‘A=5’ (or any other points score) assigns a single value to the top grade, treating all performance awarded that grade as equivalent.
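The dilution can be seen in a deliberately simplified sketch of the averaging step that SPA-style comparisons rely on. It is not a full implementation of either SPA or Kelly. It assumes the ‘A=5’ points coding mentioned above and uses 100 hypothetical candidates: 39 with straight As and 61 who score one grade lower in Further Maths than in another subject.

```python
from statistics import mean

def mean_grade_difference(grade_pairs):
    """Average of (grade in subject X - grade in subject Y) over candidates
    who took both subjects: the basic quantity behind SPA-style comparisons."""
    return mean(gx - gy for gx, gy in grade_pairs)

# Points coding assumed here: A=5, B=4, C=3, ... (illustrative scale).
# 39 hypothetical straight-A candidates: the ceiling records no difference.
straight_a = [(5, 5)] * 39
# 61 hypothetical candidates one grade lower in Further Maths than elsewhere.
others = [(4, 5)] * 61

without_ceiling_group = mean_grade_difference(others)
with_ceiling_group = mean_grade_difference(straight_a + others)
print(without_ceiling_group, with_ceiling_group)
```

Adding the straight-A group shrinks the estimated gap from one grade to 0.61 of a grade, even though nothing in the data shows that those 39 candidates found the two subjects equally demanding.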
By contrast, in the Rasch model the achievements of those who attain the top grade are not taken as indicating that their ability is at that level, but as providing evidence of ability at least at the level of the grade threshold. In fact, the model cannot estimate the ability of anyone who has achieved an ‘extreme score’ (i.e. all A grades), and these candidates are effectively ignored in calculating the difficulties of different subjects. Only those who have failed to achieve the top grade in one or more subjects contribute information about relative difficulties. Hence the Rasch model is not limited by the ceiling on performance in the way that all the other methods are. This may be one reason why the Rasch model gives a wider range of subject difficulties than the other models.
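The Rasch analysis here works with grade thresholds, but the reason an extreme score carries no information can be shown with a simpler dichotomous Rasch model in which each subject is scored only as "top grade" or "not top grade". That simplification, the subject difficulties and the response patterns are all assumptions for illustration. For a straight-A record the derivative of the log-likelihood with respect to ability stays positive, so the maximum-likelihood ability estimate runs off to infinity.

```python
from math import exp

def p_success(theta, b):
    """Dichotomous Rasch model: probability that a candidate of ability theta
    gains the top grade in a subject of difficulty b."""
    return 1.0 / (1.0 + exp(-(theta - b)))

def score_function(theta, responses, difficulties):
    """Derivative of the log-likelihood with respect to ability theta."""
    return sum(x - p_success(theta, b) for x, b in zip(responses, difficulties))

def is_extreme(responses):
    """All successes or (in this yes/no simplification) all failures:
    no finite maximum-likelihood ability exists."""
    return all(responses) or not any(responses)

difficulties = [0.5, 1.0, 1.5]   # hypothetical subject difficulties
straight_a = [1, 1, 1]           # top grade in every subject
mixed = [1, 0, 1]                # missed the top grade in one subject

# The score function stays positive as theta grows: the likelihood keeps rising.
for theta in (0.0, 5.0, 10.0):
    print(theta, score_function(theta, straight_a, difficulties) > 0)

# Only non-extreme candidates carry information about relative difficulty.
candidates = [straight_a, mixed, [0, 0, 0]]
informative = [r for r in candidates if not is_extreme(r)]
print(informative)
```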
In the case of the Reference Test method the problem is different again. The effect of the ceiling on both Further Maths grade and average GCSE score can be seen in Figure 11, which shows scores on both these variables for a sample of the national entry, together with the fitted regression line segment. The ceiling effect depresses the correlation between average GCSE and Further Maths grade (0.43) and so flattens the regression line. The average GCSE value of 6.1 that we used to estimate the likely A-level grade, while above the A-level population mean, is well below the range in which most Further Maths candidates lie. This fact, combined with the flat regression line, leads to an expected grade that is higher than is really representative of the difficulty of the subject as a whole. Hence the rather deflated estimate of the difficulty of Further Maths from this method.
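The flattening can be reproduced with a small deterministic toy example. It assumes a hypothetical latent A-level performance that rises linearly with average GCSE, records it on a points scale capped at A=5, and compares least-squares fits with and without the cap. The numbers are invented and chosen only so that the evaluation point, 6.1, sits at the bottom of the candidates' GCSE range, as described above for Further Maths.

```python
def ols(xs, ys):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

TOP_GRADE = 5  # A coded as 5 (assumed points scale)

# Hypothetical candidates with average GCSE from 6.0 to 8.0, and an invented
# latent A-level performance rising 3 points per GCSE point.
gcse = [6.0 + 0.1 * i for i in range(21)]
latent = [5 + 3 * (x - 7) for x in gcse]
observed = [min(y, TOP_GRADE) for y in latent]  # the ceiling at grade A

latent_slope, latent_icpt = ols(gcse, latent)
obs_slope, obs_icpt = ols(gcse, observed)

x0 = 6.1  # GCSE value at which the expected grade is read off
latent_pred = latent_icpt + latent_slope * x0
obs_pred = obs_icpt + obs_slope * x0
print(f"slope: latent {latent_slope:.2f}, with ceiling {obs_slope:.2f}")
print(f"expected grade at {x0}: latent {latent_pred:.2f}, with ceiling {obs_pred:.2f}")
```

In this toy case the cap halves the slope (from 3 to 1.5) and raises the expected grade at 6.1, so the subject would appear easier than the uncapped relationship implies.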
Figure 11: Scatterplot of Further Maths grade and average GCSE (horizontal axis: average GCSE grade; vertical axis: Further Maths A-level grade)
Although Further Maths is a rather extreme case, a similar ceiling effect occurs in many other subjects at A-level. Overall, almost 10% of all A-level candidates achieve straight A grades in all their subjects. The percentages of candidates in each A-level subject who gained an A grade in 2006 are shown in Table 25.
Table 25: Percentages of candidates gaining grade A in each A-level subject
Subject      % A      Subject      % A
Maths Fur    58%      Music        21%
Maths        44%      Psych Sci    21%
German       37%      Sociology    21%
Spanish      37%      Law          20%
Economics    36%      Drama        19%
French       35%      Psychology   18%
Politics     32%      DT Prodn     17%
Chemistry    32%      Bus Std      17%
Art Desg     31%      Computing    16%
Fine Art     31%      Film Std     16%
Physics      30%      English      15%
RE           27%      PE           14%
Engl Lit     27%      Engl Lang    14%
Geography    26%      Media        13%
Biology      26%      Gen Std      12%
History      24%      IT            8%
Photography  23%