Chapter 3: Methodology
3.4. Data collection
3.4.1. Data from the BAWE corpus
The BAWE corpus comprises a set of texts written for assessment purposes by both undergraduate and postgraduate students on taught programmes within the UK higher education system (Alsop and Nesi, 2009, p. 71). The majority of texts within the BAWE corpus are written by ‘native speaker’ students; however, all texts included were required, irrespective of the first language of the contributor, to be assessed as ‘proficient’ by academics from the relevant discipline (ibid.). The rationale behind the construction of the BAWE corpus was to provide a resource for research into features of successful discipline-specific student writing, providing ‘strong quantitative insights into student writers’ use of grammar, lexis, and discourse patterns across disciplines’ (Nesi et al., 2004, p. 443, in Alsop and Nesi, 2009, p. 72), for the purpose of informing academic writing tuition (pp. 71-72). Prior large-scale studies of writing in a university context such as the PERC Corpus of
Professional English, the TOEFL 2000 Spoken and Written Academic Language Corpus, and the International Corpus of Learner English, had focused, in the case of the first two listed, on ‘published or publicly accessible texts’ or, in the case of the last, on ‘learner essays on general academic topics … designed primarily to monitor non-native-speaker errors and the processes of language acquisition, rather than the development of academic literacy skills and disciplinary knowledge’ (ibid.). For this reason, the BAWE corpus serves to fill a gap in terms of corpus resources, its aim being ‘to enable the identification and description of student writing genres across disciplines and at different stages of academic development’ (p. 72). Before the completion of the Michigan Corpus of Upper-level Student Papers (MICUSP, 2009), it was ‘the only formally planned and archived corpus of its kind’ (p. 72).
Texts for the BAWE corpus were collected between 2005 and 2007, primarily from Warwick University, Reading University, and Oxford Brookes University, with some collected from Coventry in the later stage of the collection process (p. 73). A ‘matrix’ consisting of four disciplinary categories (Arts and Humanities, Life Sciences, Physical Sciences, and Social Sciences) and four levels of study (first to third year undergraduate, and fourth year for one-year postgraduate master’s programmes) was used to structure the corpus, organising it into ‘sixteen cells of approximately equal size’ for the sixteen individual disciplines included (ibid.). Disciplinary categories were chosen to enable easy comparison with important corpora of spoken academic English, the Michigan Corpus of Spoken Academic English (MICASE) and the British Academic Spoken English (BASE) corpus, which use very similar categories (ibid.). Assignments, excluding master’s theses, with both a formative and summative purpose which had received a mark of at least sixty per cent from their source department were collected (p. 74). Each text included in the corpus was assigned a ‘genre family’ label from the categories of case study, critique, design specification, empathy writing, essay, exercise, explanation, literature survey, methodology recount, narrative recount, problem question, proposal, and research report (Nesi, 2008). Information included regarding the first language, number of years of UK secondary education, and whether the assignment was categorised as ‘merit’ (equivalent to upper second class) or ‘distinction’ (equivalent
to first class) can be used by researchers accessing the BAWE to create a sub-corpus fitting their particular requirements (ibid.).
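The kind of sub-corpus selection described above can be illustrated with a minimal sketch. It assumes the corpus metadata has already been loaded into simple records; the `TextRecord` class, its field names, and the `select_subcorpus` helper are hypothetical and used only for illustration, and they do not reproduce the actual BAWE metadata format.

```python
from dataclasses import dataclass


@dataclass
class TextRecord:
    """Hypothetical metadata record for one corpus text (not the real BAWE schema)."""
    text_id: str
    discipline: str
    level: int           # 1-3 = undergraduate year, 4 = taught master's
    grade_band: str      # "merit" or "distinction"
    first_language: str


def select_subcorpus(records, discipline, level,
                     grade_bands=("merit", "distinction")):
    """Return the records that match a discipline, a level of study and a grade band."""
    return [
        r for r in records
        if r.discipline == discipline
        and r.level == level
        and r.grade_band in grade_bands
    ]


# Illustrative records only; the IDs and values are invented for the example.
records = [
    TextRecord("t001", "History", 3, "merit", "English"),
    TextRecord("t002", "History", 1, "distinction", "English"),
    TextRecord("t003", "Politics", 3, "distinction", "German"),
]

history_year3 = select_subcorpus(records, "History", 3)
```

Here `history_year3` would hold only the third-year History record; further criteria, such as first language, could be added to the filter in the same way.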
3.4.2. Data from my own institution
Texts from third-year students in the relevant disciplines at my own institution were collected in order to construct two disciplinary sub-corpora equivalent to those from BAWE. As discussed above, my aim was to collect texts from students who were performing relatively highly by their final year of study. I achieved this: the large majority of texts collected had grades of at least 60%, with a very large proportion of these in the ‘high 60s’ and a significant minority in the 70s. This means that, in being composed of ‘good’ student work, the sub-corpora from my institution are equivalent to those from BAWE.
Collecting data from my institution was both challenging and time-consuming. I decided to start this process relatively early because I anticipated difficulties, having learned from experience as an EAP tutor of attempting to communicate with academics in departments needing EAP support and of seeking out example student writing to better inform the development of teaching materials. Academics, although usually positive about and supportive of what I was trying to do, very often proved unreliable in following up on promises of cooperation or responding in a timely fashion to emails. I had also found it very difficult to obtain even single examples of writing from students, so I expected that it would be a considerable challenge to collect student work on the scale that I needed.
The challenges in collecting such data are described with reference to the BAWE project by Alsop and Nesi (2009), who say of the pilot corpus for this project that it ‘illustrated the difficulty of collecting a representative selection of work from a shifting student population, who produced varying amounts of writing at various stages of the year, and who had relatively little incentive to cooperate with our research agenda’ (pp. 72-73). They describe an evolution in the publicity strategies
they employed over the course of collection of texts for the BAWE (pp. 77-80) and also the compromises that had to be made due to failure to achieve original targets in some areas, which included the need to go beyond one institution to collect enough texts for certain disciplines (p. 74) and the need to ease restrictions on how many texts a single student could submit (pp. 75-76).
My data collection process spanned just over a year, taking in the finishing third-year cohorts of 2011 and 2012, and, due to very little success in 2011, I changed my strategy in 2012. To find students from PIR, I initially emailed a senior academic with whom I had communicated reasonably regularly in relation to an ESAP course I ran for international master’s students. The email was forwarded to administrative staff to disseminate to students and I heard nothing more. Alsop and Nesi (2009, p. 77) state that ‘[a]n e-mail with departmental endorsement was assumed to hold more weight than an e-mail directly from [the BAWE team]’. I would agree but go further in arguing that unless students are communicated with directly by an academic whom they respect, ideally in person rather than via e-mail, they are unlikely to respond. I also collected texts from Marketing students in the School of Management, which I will not be using for this project, and this proved considerably easier because of a supportive academic who selected and communicated with students directly. Through my contact in PIR, I obtained one student volunteer, who was introduced to me directly at an orientation event I attended as part of my job.
For History, I initially managed to find two participants quite quickly through a personal contact who was studying as an adult student in the department. I then got in touch with an academic within the department who had been recommended as likely to be supportive. Over the course of three months, I exchanged emails with him, met him for coffee to explain my project, and attended one of his tutorials to meet four of his students, all of whom agreed to take part. I followed up on this meeting by emailing those four students on three separate occasions, but ultimately just one of them participated in the project.
I did not initially offer any kind of payment for participation in the project. However, having collected texts from only two students in History and one in Politics in almost a year, I decided to offer book vouchers as payment for participation. I also changed tack and contacted the student executives of both the History and PIR societies on campus. Both these changes in approach proved fruitful. All students I emailed responded and then, through some of these students, I was able to get recommendations for others who they thought would be suitable. I sent three waves of emails to all students I had contact details for. The gap between spring and summer term seemed to be the most effective time for getting responses; these then tailed off through May and early June during the examination period, with a few final students responding in the period directly after exams. Overall, roughly half of those I emailed participated and donated texts.
Collecting data from each student involved setting up a face-to-face meeting for an interview, and students generally emailed me their essays or downloaded them from Turnitin on my office computer on the day of the interview. Collecting all related information, such as grades and course names, was not always easy and often involved follow-up emails, not all of which received a response. There are two students for whom I do not have specific essay grades, but I am confident that both are ‘2:1/1’ students; both were recommended to me by academics as ‘good students’.
Having collected the texts from students at my own institution in Microsoft Word format, I then converted them to plain text files and ‘cleaned’ them for use in the sub-corpora. Cleaning involved placing chevrons around Harvard references, finding and deleting footnote numbers within the texts, and deleting titles, footnotes and references/bibliographies.
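The cleaning steps described above could be approximated programmatically. The following is a minimal sketch under stated assumptions and not a record of the procedure actually used: it assumes that the title occupies the first line of the plain text file, that the reference list begins at a line consisting only of a heading such as ‘References’ or ‘Bibliography’, that Harvard references are parenthetical and contain a four-digit year, and that footnote numbers appear as digits attached directly to punctuation. The function name `clean_essay` and the regular expressions are illustrative, and the removal of footnote text itself is not covered.

```python
import re

# Assumed heading that opens the reference list (on a line of its own).
REFERENCE_HEADING = re.compile(
    r"^\s*(references|bibliography|reference list|works cited)\s*$",
    re.IGNORECASE | re.MULTILINE,
)
# Heuristic: one to three digits attached directly to punctuation, e.g. "text).3 Next".
FOOTNOTE_NUMBER = re.compile(r"(?<=[A-Za-z)][.,;:!?])\d{1,3}(?=\s|$)")
# Heuristic: a parenthetical containing a four-digit year, e.g. "(Nesi, 2008)".
HARVARD_CITATION = re.compile(r"\([^()]*\b(?:19|20)\d{2}[a-z]?\b[^()]*\)")


def clean_essay(text):
    """Apply the assumed cleaning steps to one essay held as a plain text string."""
    # 1. Cut everything from the reference-list heading onwards.
    heading = REFERENCE_HEADING.search(text)
    if heading:
        text = text[:heading.start()]
    # 2. Drop the title, assumed to be the first line.
    lines = text.strip().splitlines()
    body = "\n".join(lines[1:])
    # 3. Delete footnote numbers attached to punctuation.
    body = FOOTNOTE_NUMBER.sub("", body)
    # 4. Place chevrons around parenthetical Harvard references.
    body = HARVARD_CITATION.sub(lambda m: "<" + m.group(0) + ">", body)
    return body.strip()


sample = (
    "My Essay Title\n"
    "Power is contested (Lukes, 2005, p. 12).3 It shifts.\n"
    "References\n"
    "Lukes, S. (2005) Power."
)
cleaned = clean_essay(sample)
```

Heuristics of this kind will miss some cases, for example narrative citations such as ‘Lukes (2005) argues’ with the author outside the parentheses, so any automated output would still need to be checked by hand.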