
3 Grammar and errors

3.2 Reactions to learner errors

Language learners may produce systematic errors in grammar, some of which can be explained by first-language influence. However, the order of acquiring grammar is not dictated by the order in which grammatical structures are presented to learners or how frequent they are (Schachter 1998, 558; see also Ortega 2009, 51-52). It can be difficult to identify the source of a particular error (Ellis 2002b, 27), and learners’ ultimate understanding of concepts in a language may not match that of native speakers of the language. Furthermore, it can be very difficult to determine “what an error is an error of” (Gass and Selinker 2001, 82). Error analysis research, popular mainly in the 20th century, has shown that learners of dissimilar language backgrounds tend to make similar errors in learning a particular second language (e.g. H. D. Brown 2007a, 257; see also Ellis and Barkhuizen 2005; Saville-Troike 2012). This indicates that there are some universal tendencies in second language learning (see Section 4.4), but the learner’s L1 strongly influences second language acquisition as well. One of the problems with applying error analysis is the fact that the data do not include the items that learners consciously or subconsciously avoid (e.g. H. D. Brown 2007a, 259; see also Sections 2.3.2 and 4.3); it is difficult to say much about items that are not used at all.

This section on errors is divided into two parts. Section 3.2.1 discusses the nature of errors and their origin from the learners’ perspective, while Section 3.2.2 focuses on the position of errors in teaching and testing and examines inter-rater differences. For an extensive discussion on whether errors in grammar should be corrected in classroom settings, see e.g. Ellis et al. (2008).

3.2.1 Errors in learning

As we saw in Section 3.1, learners often expect error correction (or negative feedback, a term used by e.g. Ortega 2009), but some teachers are unwilling to provide it. Nonetheless, errors are both natural and inevitable in any learning process (Ellis 2002b, 22). H. D. Brown (2007a, 257-259) differentiates between mistakes and errors, arguing that mistakes are random lapses or slips made by both native speakers and learners. They do not show any lack of command per se; rather, they are momentary breakdowns in language production, and people can notice these themselves and correct them (Gass and Selinker 2001, 78-79). Errors, by contrast, are the systematic display of “noticeable deviation from the adult grammar of a native speaker” (H. D. Brown 2007a, 258; see also Alanen 1997; Gass and Selinker 2001). They indicate that the learner has made assumptions about the second language that are unlike the practice of native speakers. However, it can be difficult to ascertain whether a particular unusual item is an error or a mistake, and often only a more systematic study of patterns can distinguish between the two. Today, there is growing awareness that errors are “a clue to the active learning process being made by the student as he or she tries out ways of communicating in the new language” (Yule 2014, 191). The way errors are understood in my study is discussed in Chapter 7.

Some scholars distinguish between overt and covert errors. Overt errors take place at the sentence level and result in ungrammatical sentences, whereas covert errors are discourse-level errors: they are grammatically correct but used in unsuitable contexts (H. D. Brown 2007a, 260; Ellis and Barkhuizen 2005, 56; cf. the difference between grammaticality and acceptability in Section 3.3). Furthermore, errors can be either global or local: global errors make communication difficult, while local errors cause only a minor disturbance in communication. In addition, an error can be the addition of an unnecessary element or the omission of a necessary element (H. D. Brown 2007a, 262-263; see also Ellis and Barkhuizen 2005).

If learners fail to recognise their own errors and do not take advantage of potential feedback (cf. Ellis et al. 2008), this can result in fossilisation (e.g. Finegan 2004, 561; Long 2003; Thornbury 1999). The term refers to the persistence of an erroneous feature in perhaps otherwise fluent use of the second language. This is readily witnessed in the case of learner accents as well as persistent grammatical or lexical errors. Items become fossilised if students do not learn to correct their errors through the feedback they receive, either because they are not given constructive feedback or because they fail to pay attention to it (Finegan 2004, 561).

Fossilisation is never global. Instead, it is always limited to some linguistic phenomena, for example inflectional morphology, and the learner can excel at another aspect of language, for example syntax (Lardiere 2013, 685-691). Fortunately, however, fossilisation is not “some sort of terminal illness”, and the learner can still make progress and ultimately acquire standard usage (H. D. Brown 2007a, 270; see also Ortega 2009). Thornbury (1999, 117) argues that some explicit attention to grammar (‘focus on form’) is necessary to ensure that fossilisation does not take place. For an overview of research on fossilisation, see Han (2011) and Long (2003).

3.2.2 Errors in teaching and testing

Teachers treat learners’ errors in grammar in different ways (for an overview, see e.g. Ellis and Barkhuizen 2005, 173-175; Thornbury 1999). Teachers vary in their leniency as well: for some raters, any error is an error, while others create scales that treat some errors as graver than others. Furthermore, the same error may be evaluated very differently depending on the context. This variation in teacher approach to errors is one of the research questions in my study (see Chapter 6). For various approaches to treating errors in teaching, see e.g. Brown and Larson-Hall (2012), Ellis (2006), Keck and Kim (2014) and Ortega (2009); for the debate on the benefits of corrective feedback in instruction, see Ellis et al. (2008); for a meta-analysis on the effect of corrective feedback, see Russell and Spada (2006); and for an overview of the history of research on assessing learner knowledge, see Norris and Ortega (2011) and Saville-Troike (2012).

Some studies have found that native and non-native raters of learner language may react differently to non-standard phenomena, so that non-native raters focus more on adhering to norms, while native raters are more concerned with comprehension. For example, Hyland and Anan (2006) explored different raters’ perceptions of error, using one Japanese EFL student’s writing as the source and asking three groups (native English teachers, Japanese EFL teachers and native English non-teachers) to identify and correct the errors in the text and to give their reasons for the corrections. The results indicate that native English speakers were more lenient than Japanese teachers in grading the errors (Hyland and Anan 2006, 512). The Japanese teachers were more likely to employ the criterion ‘infringement of rules’ (e.g. inappropriate application of a rule), while native English non-teachers tended to use the criterion ‘unintelligibility’ (e.g. ambiguity or flow hindrance); English teachers employed both criteria.

In the study, Japanese teachers found agreement and word form errors the most serious, while native English speakers felt that word order errors were the most serious (Hyland and Anan 2006, 513). The Japanese teachers also found many more non-target errors (i.e. errors that the researchers had not anticipated) than the native English speakers, and the Japanese teachers were less consistent in their grading than the native English speakers (Hyland and Anan 2006, 517). The non-target errors included acceptability violations that can be categorised as having either a stylistic (levels of formality), discourse (cohesion and organisation) or semantic (lack of clarity) focus (Hyland and Anan 2006, 514). The researchers argue that the Japanese teachers’ lack of exposure to different registers and their lack of confidence would explain why they take “a prescriptive attitude to correctness and a reluctance to accept non-standard forms” (Hyland and Anan 2006, 515). However, the researchers acknowledge that this may reflect cultural differences in teacher expectations and believe that teachers should be “more aware of the distinction between grammatical error and stylistic difference” and that more attempts at harmonisation need to be undertaken so that the same performance would not be assessed in varying ways (Hyland and Anan 2006, 518). While full agreement is impossible, rater training can minimise the effect of raters’ overall tendency to be either lenient or strict (Kondo-Brown 2002, 4). For a discussion on harmonisation and standard-setting, see Fulcher (2016).

Although Hyland and Anan (2006) discuss the differences between Japanese and native English teachers’ rating of errors, they only do so at the level of the number of errors spotted and do not discuss the nature of variation in the teachers’ acceptance of such errors. Similarly, while they comment on inter-group differences in marking criteria, they do not discuss intra-group variation. In contrast, my study provides an insight into the variation that exists within teachers’ assessment.

If different raters react to the severity of errors differently, so do learners and their teachers in different contexts. A study by Bardovi-Harlig and Dörnyei (1998) showed that EFL learners and teachers and ESL learners and teachers reacted differently to errors, depending on the nature of the error. In the EFL context, both learners and teachers found grammatical errors more serious than pragmatic errors, whereas the opposite pattern prevailed in the ESL context. Bardovi-Harlig and Dörnyei (1998, 247-252) argue that EFL learners and teachers focused more on structural accuracy, while ESL learners and teachers paid more attention to situation-specific appropriateness. However, both the learner and the teacher groups were successful at identifying errors; they simply rated their seriousness differently. Bardovi-Harlig and Dörnyei (1998) suggest that the difference may reflect ESL learners’ greater access to relevant, authentic input, as EFL learners may not have sufficient access to situations of authentic language use. Therefore, Bardovi-Harlig and Dörnyei (1998, 255) maintain that instruction should focus more on pragmatic awareness.

Many language tests focus on testing grammar (e.g. Thornbury 1999). What makes testing problematic is the fact that raters do not always behave according to expectations. Thus, “rating always contains a significant degree of chance” (McNamara 2000, 37). There are always borderline cases, and raters may disagree on whether a student’s production passes the crucial threshold or not; raters are not always self-consistent, either. Indeed, according to McNamara (2000, 38), “[r]esearchers have sometimes been dismayed to learn that there is as much variation among raters as there is variation between candidates”. There are ways to alleviate the effects of variation, such as rater training and moderation meetings, where disagreement is discussed, consensus is sought and attention is given to how specific criteria are to be interpreted. There are also statistical tests that can be used to analyse inter-rater reliability (e.g. Salkind 2006; 2008). Nonetheless, despite rigorous attempts at objectivity and harmonisation, intensive training and clear instructions, testers can still deviate from one another, for example in their level of leniency or their tendency to focus on particular aspects of the phenomenon being tested (e.g. Kondo-Brown 2002). This can also be influenced by the testing context, culture and goals. For a discussion on concepts related to harmonisation and standards-based assessment, see Fulcher (2016).
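
To illustrate what a statistical measure of inter-rater reliability can look like, the following is a minimal sketch of Cohen's kappa, one widely used statistic for agreement between two raters on categorical judgements. The choice of kappa is an assumption made here for illustration; the sources cited above are not being quoted as recommending it specifically. The function name and the two rating lists are hypothetical and do not come from any of the studies discussed.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who judged the same items.

    Illustrative helper (hypothetical name); labels can be any hashable values.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must judge the same non-empty set of items.")
    n = len(ratings_a)

    # Observed agreement: proportion of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement, estimated from each rater's own label frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both raters used one and the same label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented judgements of eight learner sentences by two raters.
rater_1 = ["error", "ok", "error", "ok", "error", "ok", "ok", "error"]
rater_2 = ["error", "ok", "ok", "ok", "error", "error", "ok", "error"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this invented example the raters agree on six of eight items, but because half of that agreement would be expected by chance given their label frequencies, kappa is considerably lower than the raw agreement rate. A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.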

Raters’ leniency or severity has been investigated by, for example, Huhta et al. (2014), who note that, inevitably, some raters are more lenient than others when rating students’ performance, and some raters may be inconsistent in their assessment. For example, Huhta et al. (2014, 312-33) found that the difference in rater severity was half a scale point on a six-point scale, and 1.5-2.5 levels on a 10-point scale. The researchers also found that raters used scale-external criteria in their assessment when they felt that further criteria were needed. Some raters had more aberrant ratings than others and some raters were biased for some task types or particular rating scales, although no clear pattern was apparent and the rating seemed quite idiosyncratic (Huhta et al. 2014, 315). However, because removing a rater who significantly differed from the others did not change the overall ratings, “there appears to be no reason to consider removing raters simply because of their severity or leniency” (Huhta et al. 2014, 313). Furthermore, differences in ratings may also be a sign of difficulty in interpreting the scale. Huhta et al. (2014, 319; see also Kondo-Brown 2002) call for even more systematic benchmarking across countries and languages and highlight “the importance of having multiple ratings of the learners’ performances, as an individual rater’s personal approach to rating a particular task (or scale) may bias the results”. For this reason, my test includes thirteen raters. The extent of variation among the teachers in my study is discussed in Chapters 7, 8 and 9.

Another study on rater bias by Kondo-Brown (2002) found that raters’ bias patterns were not uniform and that raters tended to treat some candidates and criteria more leniently than others despite the fact that they were often self-consistent and that their scores correlated with those of the other raters. The greatest difference in severity vs. leniency was found with the best-achieving and the low-achieving learners, and all raters had a unique bias pattern: one was harsher when rating vocabulary, another judged errors with mechanics more harshly and a third rated content errors more harshly (Kondo-Brown 2002, 22). The raters discussed discrepancies after the first rating round, and the differences became slightly smaller in the following rounds, which means that harmonisation attempts slightly decreased the gap between the lenient and strict raters (Kondo-Brown 2002, 18). However, the study only included three raters, which is too small a number for any generalisations, and the texts the students wrote were often very short, sometimes only one sentence. Nonetheless, Kondo-Brown (2002, 25) suggests that rater training would help raters become self-consistent, although it cannot fully eliminate inter-rater differences.

3.3 Grammaticality and acceptability