
4.9 Summary and ways forward

4.9.2 Reflection: ways forward and a shift in my theoretical position

This Chapter and Chapter 3 have consistently shown that, technically, investigating comparability is hugely problematic. Despite the numerous technical treatments that can be used, it is simply not possible to control for the varied influences on examination performance that can affect the validity of performance comparisons. In my view, this is because an examination performance cannot be treated as an entity divorced from what I see as the ‘unquantifiable’ forces that shape it, student motivation being one example. As I have interpreted my technical findings, I have been increasingly drawn to consider the social aspects of the assessment process and to reflect on my own experiences as a GCSE examiner and as a teacher entering students for GCSE examinations. The rest of this section follows this line of thinking and these recollections.

In my technical investigation I attempted to increase the validity of comparing students’ science performances by taking populations consisting of students entered for the same tier of available grades in all three science subjects. The accuracy with which teachers can predict students’ national examination grades has been well investigated (Murphy, 1979, 1981; Petch, 1964; Sowell, 1970), and studies consistently show a reasonably high level of agreement between teachers’ predictions and actual awarded grades, although arguably this could be self-fulfilling. Rather less well researched is the accuracy with which teachers enter students for appropriate tiers in differentiated examinations (Good and Cresswell, 1988d). Good and Cresswell (1988d) report that there may be a considerable number of inappropriate entries to GCSE examinations that use differentiated papers, and Gillborn and Youdell (1998) show that this is particularly true for Black students. Stobart, Elwood and Quinlan (1992) identified a tendency for girls to be entered for tiers in mathematics GCSE examinations that were not commensurate with their mathematical ability. Intermediate rather than higher tiers tended to be allocated as a ‘safety first’ approach, reinforcing a perception that girls were less confident of succeeding than boys.

From the literature and my experiences as a teacher and GCSE examiner I know that inappropriate tier entries occur. I have worked in schools where teachers ‘play safe’ and tend to enter students for tiers in which grade C is the top available grade, fearing that entry to a higher tier risks students succumbing to the ‘floor’ effect and failing to achieve a grade at all. I accept the possibility that some students in my populations may have been entered for inappropriate tiers of examination papers. For example, in one particular subject a student may be capable of achieving grade A but be entered for a tier that gives access only up to grade C, while in other subjects this may not apply. Comparing such performances might lead one to conclude that the student is capable of only grade C in one subject but of a higher grade in another. This could apply to an unknown number of students, and it calls into question the validity of comparing students’ performances even when they are selected, as mine were, from the same tier of subject examination papers.

If I am to understand the notion of ‘gradeness’ better, and so continue my exploration of examination comparability, the way forward seems to lie not in a technical approach but in a social one. My technical findings, despite my caveats about their validity, have illuminated ‘gradeness’, and I could use them as a resource for my continuing exploration, as the following examples show. The significant positive correlation between physics and chemistry, the least positive correlation between biology and physics for both the WJEC and SEG datasets, and the consistently most positive correlation of mathematics with physics could all be explored for associations with assessment artefact issues. Another example comes from my finding that there appeared to be a change in the severity of grading of the Triple Award science subjects from 1995, when the WJEC and SEG examining groups first set examinations on the national curriculum. Simultaneously, physics became significantly ‘easier’ and chemistry significantly ‘harder’, and this was associated with significant changes in the cognitive skill demands of the WJEC examination papers for the individual science subjects. Additionally, the 1995 correlation coefficient values for the biology, chemistry and physics pairings were more dissimilar from each other, and from the values for the 1993 and 1994 populations. The standard deviation values for the three science subjects were also more similar in the 1995 examinations than in those of 1993 and 1994. All of this suggests that it would be useful to explore the nature of the actions that caused these changes. They could be due to changes in assessment artefacts. For example, do the syllabus contents reflect changes in how physics and chemistry were variously defined, and can any such changes be aligned with socio-economic and political pressures, as could be claimed for physics if it were being made ‘easier’ to stem the decline in the number of students studying it post-16?
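For readers less familiar with these statistics, the pairwise correlation coefficients and per-subject standard deviations compared across the 1993, 1994 and 1995 populations can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis actually used in this research: it assumes grades are converted to points on a hypothetical A* = 8 to U = 0 scale, that each subject's grade list is ordered so that the same position refers to the same student, and that the population (rather than sample) standard deviation is wanted. The function names and the toy cohort are invented for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

# Hypothetical grade-to-point scale (A* = 8 ... G = 1, U = 0); an assumption for
# illustration, not necessarily the scale used in this research.
GRADE_POINTS = {"A*": 8, "A": 7, "B": 6, "C": 5, "D": 4, "E": 3, "F": 2, "G": 1, "U": 0}


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length point lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def summarise_cohort(cohort):
    """cohort maps subject -> grades for the same students, in the same order.

    Returns (population standard deviation per subject, correlation per subject pair).
    """
    points = {subject: [GRADE_POINTS[g] for g in grades] for subject, grades in cohort.items()}
    sds = {subject: pstdev(p) for subject, p in points.items()}
    correlations = {
        (a, b): pearson_r(points[a], points[b])
        for a, b in combinations(sorted(points), 2)
    }
    return sds, correlations


# Toy cohort of four students (invented data, not the WJEC or SEG datasets).
toy = {
    "biology": ["A", "B", "C", "D"],
    "chemistry": ["A", "B", "C", "C"],
    "physics": ["A", "B", "B", "C"],
}
sds, rs = summarise_cohort(toy)
for pair, r in rs.items():
    print(pair, round(r, 3))
```

Running the same summary separately on each examination year's population would give the kind of year-on-year comparison of coefficients and standard deviations described above. Pearson's r treats grade points as interval data; a rank-based coefficient such as Spearman's would be an alternative if that assumption is doubted.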

I choose not to focus primarily on assessment artefacts, or on examining group policies and actions, for associations with my technical findings, because the literature and my teaching and examining experiences have made me more interested in the assessment process as it is played out in schools. The genesis of this research lay in anecdotal evidence from teachers: their concerns about differences in the severity of grading of biology, chemistry and physics. As an ex-teacher I remain in contact with many teaching colleagues who would be willing to give me their time so that I could extend my investigation of comparability. Engaging with teachers would enable me to explore their beliefs and practices in relation to assessment: whether and how they mediate assessment, and how this relates to comparability. For example, I could explore my finding that mathematics is more positively correlated with physics than with chemistry or biology by obtaining teachers’ views on the mathematical demands of physics and on how these views play out in their actions in relation to students’ assessment. I could explore what teachers consider important in allocating their students to specific tiers of GCSE entry and, in turn, how this affects students’ performances. Teachers’ views on the impact of examining new syllabuses in 1995 could illuminate my findings of a significant change in the associated examination cognitive skill demands and the simultaneous changes in the severity of grading of physics and chemistry. My kappa findings indicate a fair measure of ‘subject gradeness’: there is fair agreement in students obtaining the same grades in the different science subjects, and this is most likely for physics and chemistry. I could extend my exploration of ‘subject gradeness’ by identifying whether teachers support it as a notion, the ways in which they understand it and how it plays out in their practice in relation to their students’ assessment. In doing this, I would also be exploring teachers’ views of the relative difficulty of biology, chemistry and physics and relating these to my severity of grading findings.

Furthermore, my technical findings have shown that different meanings can be attributed to comparability. What is used as a descriptor of comparability can vary according to the statistical treatment used and the validity that different people attribute to that treatment. For example, subject pair analysis is accepted as a measure of comparability by the examining groups, whereas the percentages of achieved grades for sub-groups, widely used by AQA, are not an acceptable measure of comparability for some researchers, such as Gorard (Gorard et al., 1999). I could explore teachers’ understanding of comparability and seek evidence of how this relates to their practice and affects their students. By engaging with teachers I can extend my understanding not just of comparability but of the social nature of the assessment process itself.
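The kappa statistic and subject pair analysis mentioned above can be made concrete with a short sketch. It assumes unweighted Cohen's kappa computed on the grades the same students obtained in two subjects and, for subject pair analysis, the mean grade-point difference between two subjects for students who took both; whether these match the exact variants used in this research is an assumption. The grade-point scale, function names and toy data are hypothetical. On the widely used Landis and Koch benchmarks, kappa values between 0.21 and 0.40 are described as ‘fair’, which is presumably the sense of ‘fair agreement’ above.

```python
from collections import Counter

# Hypothetical grade-to-point scale (A* = 8 ... G = 1, U = 0); an assumption for
# illustration, not necessarily the scale used in this research.
GRADE_POINTS = {"A*": 8, "A": 7, "B": 6, "C": 5, "D": 4, "E": 3, "F": 2, "G": 1, "U": 0}


def cohens_kappa(grades_x, grades_y):
    """Unweighted Cohen's kappa for the grades the same students got in two subjects."""
    if len(grades_x) != len(grades_y) or not grades_x:
        raise ValueError("need two non-empty grade lists of equal length")
    n = len(grades_x)
    # Observed agreement: share of students awarded the same grade in both subjects.
    p_observed = sum(gx == gy for gx, gy in zip(grades_x, grades_y)) / n
    # Agreement expected by chance, from each subject's marginal grade distribution.
    counts_x, counts_y = Counter(grades_x), Counter(grades_y)
    p_chance = sum(counts_x[g] * counts_y[g] for g in counts_x) / (n * n)
    if p_chance == 1:
        return 1.0  # degenerate case: every student got one identical grade in both subjects
    return (p_observed - p_chance) / (1 - p_chance)


def subject_pair_difference(grades_x, grades_y):
    """Mean grade-point difference (subject X minus subject Y) over students taking both.

    A positive value is conventionally read as X being graded more leniently than Y,
    assuming the same students are of comparable attainment in both subjects.
    """
    if len(grades_x) != len(grades_y) or not grades_x:
        raise ValueError("need two non-empty grade lists of equal length")
    diffs = [GRADE_POINTS[gx] - GRADE_POINTS[gy] for gx, gy in zip(grades_x, grades_y)]
    return sum(diffs) / len(diffs)


# Invented grades for six students (not data from this research).
physics = ["A", "B", "B", "C", "C", "D"]
chemistry = ["A", "B", "C", "C", "C", "D"]
print(round(cohens_kappa(physics, chemistry), 3))
print(round(subject_pair_difference(physics, chemistry), 3))
```

Unweighted kappa credits only exact grade matches; a weighted kappa, which gives partial credit to adjacent grades, would be an alternative design choice for ordered grades.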

It was at this point that I came to revise my theoretical position. Reflecting on my time as a teacher, I recalled what it was like to teach: to function within the classroom, within a school. I did not recall my preparation and entry of students for GCSE science examinations being consciously influenced by ‘this’ school policy or ‘that’ out-of-school issue. Rather, I recalled a myriad of interactions with different types of staff, parents and students, and my teaching and assessment practices emerging from this amorphous milieu. I therefore looked at other theoretical positions suited to exploring teachers’ beliefs and actions in their classrooms as a way of extending my understanding of examination comparability. The next Chapter describes this journey towards my adopting a sociocultural approach for the qualitative dimension of my research.

CHAPTER 5 The qualitative investigation: theoretical position and research design

In document Comparability and Examination Performance: Technical and Social Approaches to Its Study (Page 174-178)