Assessment of learner production - Language Learning Tasks and Automatic Analysis of Learner La

Despite the controversy over the efficacy of oral or written output as part of the language acquisition process (Brown, 2007: pp. 293 and 297–299; Ellis, 2003: pp. 110–115), recent work has shown that learner production helps learners become conscious of their command of the language being acquired, which enables them to build up a coherent set of knowledge (Swain, 2005, 2000; Swain and Lapkin, 1995; de Bot, 1996). On the basis of this research, Brown argues that learner production in the target language can help learners realise “erroneous attempts to convey meaning” and, through that, recognise their linguistic shortcomings (2007: p. 298).

Brown (2007: pp. 255–257) asserts that language learning is “a process of the creative construction of a system in which learners are consciously testing hypotheses about the target language”. This inherently implies that the learner makes mistakes. However, the difficulty of learning a foreign language can be overcome by using a “concerted strategic approach”. This strategic approach includes assessment and a “trial and error” strategy (Brown, 2007: pp. 273–275, original italics). Learner production is corrected and evaluated in instructional contexts, and research shows that learners expect and wish to receive feedback (Brown, 2007: pp. 274–276; Chandler, 2003: p. 270), despite the controversy over the efficacy of feedback (Chandler, 2003; Truscott, 2004; Bitchener et al., 2005).³

A reasonable position in this respect is found in Brown (2007: p. 273):

Historically, error treatment in language classrooms has been a hot topic. [First] errors were viewed as phenomena to be avoided by overlearning, memorizing, and “getting it right” from the start. Then, some methods [...] took a laissez-faire approach to error [...]. CLT approaches, including task-based instruction, now tend to advocate an optimal balance between attention to form (and errors) and attention to meaning.

Following Brown, we assume that error correction is needed, as long as it is appropriate to the learner’s style and level. In this sense, appropriate types of feedback should include positive, neutral or negative feedback, and affective and cognitive feedback. Feedback is intended to help the learner extend knowledge that is presumably incomplete (more on this topic in Section 4.4).

The assessment of learner production can be achieved through summative assessment or formative assessment. According to Ellis (2003: p. 312), assessment in TBI must include both types of assessment. While formative assessment is expected to help learners progress in their acquisition of knowledge, summative assessment is expected to show them and the teacher or any other stakeholder how good they are with respect to certain communicative and linguistic abilities at a given point in time.

³ The research by Chandler (2003), Truscott (2004), and Bitchener et al. (2005) is carried out in

4.3.1 Summative assessment

Summative assessment must be associated with language tests (Ellis, 2003: Ch. 9; Bachman and Palmer, 1996: Ch. 2; Bachman, 1990). Ellis (2003: pp. 283–286) proposes to distinguish between two types of tests. System-referenced tests aim to inform about the learner’s language proficiency in general, while performance-referenced tests seek to inform about the learner’s ability to use the language in a specific context. Ellis also distinguishes between direct and indirect assessment. The former involves “the holistic measurement of language abilities involving some kind of task”, whereas the latter involves “measuring language proficiency analytically by means of tests of discrete points of language or of specific tests in a task” (Ibid.).

Since tasks per se do not provide a measure of the learner’s language ability, learner performance must be measured in some way. For this, Ellis describes three possible methods: The first is direct assessment of task outcomes, which is possible in closed tasks that result in a solution that is either right or wrong. For instance, if, as a result of a task, learners must grasp one particular object on a table that has more than one object on it, the result can be directly assessed. If the outcome is a communicative one (a piece of language), it might be more open, but if the response must convey some sort of message, this message must be included in it somehow.

Second, he suggests discourse analytic methods that are based on counts of specific linguistic features occurring in the discourse that results from performing the task. Such methods relate to the learner’s linguistic competence, measured in terms of complexity, accuracy and fluency. They can also relate to sociolinguistic competence (e.g., appropriate use of requesting strategies) or to discourse competence (e.g., use of cohesive discourse markers).
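To make the idea of feature counts concrete, the following is a minimal sketch, not Ellis’s procedure. The marker list, the function names and the use of a type-token ratio as a crude proxy for lexical variety are all assumptions introduced here for illustration; a real analysis would rely on proper linguistic annotation.

```python
import re
from collections import Counter

# Hypothetical, illustrative marker list; not taken from Ellis (2003).
DISCOURSE_MARKERS = ("however", "therefore", "moreover", "because", "finally")


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def feature_counts(text):
    """Count tokens, types and discourse markers in a learner text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    # Keep only the markers that actually occur in the text.
    markers = {m: counts[m] for m in DISCOURSE_MARKERS if counts[m] > 0}
    # Type-token ratio: distinct word forms divided by running words.
    ttr = len(counts) / len(tokens) if tokens else 0.0
    return {
        "tokens": len(tokens),
        "types": len(counts),
        "type_token_ratio": ttr,
        "discourse_markers": markers,
    }


sample = "I was late because the bus broke down. However, I finished the task."
print(feature_counts(sample))
```

Counts of this kind only become measures of competence once they are interpreted against the task and the learner’s level, which is why the choice of features matters more than the counting itself.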

Finally, there is the external ratings method, which involves an assessor observing the task and making a judgement. This method differs from direct assessment in that the judgement is more subjective, although efforts must be made to warrant reliability. Such a method requires assessment guidelines and possibly a checklist of competencies.

Assessment is an important and very complex process in language teaching and learning, and the measurement of test-task performances has to rely on methods that are valid and reliable (Ellis, 2003: pp. 283–286). Because of this, Ellis calls for complex, qualitative and multidimensional assessments. However, he admits that assessment must be practical and cost-effective, so assessment procedures must be designed according to working conditions and professional expertise.

4.3.1.1 A framework for the characterisation of test tasks

Bachman and Palmer (1996: Ch. 3) present a framework for the characterisation of tasks in tests, whose aim is threefold: (i) to describe the target language use domain (that is, the communicative setting in real life) that will be the basis for the pedagogical task; (ii) to describe test tasks as a means to ensure their comparability and assess their reliability; and (iii) to assess the authenticity of language test tasks (1996: p. 47).

develop test tasks in a principled and objective manner. On the one hand, it allows for the comparison of differences and similarities between the test tasks and the corresponding target language use settings in the real world. On the other hand, it can be used to devise new test tasks that differ from the existing catalogue in a complementary and pedagogically driven manner.

Bachman and Palmer’s task characteristics framework

Bachman and Palmer analyse test task characteristics in terms of five different features: setting, test rubric, input, expected response, and relationship between input and response.

As for the setting, it comprises physical characteristics such as temperature, seating conditions, lighting, and so on, as well as the participants other than the testee and the time of day at which the task is to be completed. The rubric includes the structure of the test (items/parts, salience of parts, sequence of parts), the characteristics of the instructions (e.g., language and channel in which they are given), the duration of the test and its items or parts, and, finally, the scoring method, which includes criteria for correctness and a procedure for scoring the response.

As for the input, Bachman and Palmer distinguish between format and language characteristics. The input is whatever information the learner is required to process in order to complete the task. The input’s format includes the channel (aural, visual or both) and the language (native or foreign). The input’s language is analysed in terms of language knowledge and topical knowledge. Language knowledge relates to linguistic aspects such as vocabulary, morphology, syntax, pragmatics (cohesion, rhetorical structure), dialect, register, or cultural references. Topical knowledge relates to the type of information that is part of the input: personal, cultural, academic, technical, and so on.

The characteristics of the expected response (as opposed to the actual response) are also analysed by format and language characteristics, exactly as the input is analysed. Moreover, Bachman and Palmer define three types of responses: selected responses (no language product required) and limited or extended production responses. Limited production responses consist of a single word, a phrase, or at most a full sentence. Extended production responses consist of a text that extends somewhere between two utterances and a full text.

The last aspect Bachman and Palmer (1996) propose is the relationship between input and response. According to them, this relationship can be measured in terms of reactivity, scope and directness. Reactivity relates to the extent to which the input or the response affects subsequent input and responses. In this sense, the relationship can be reciprocal, where there is immediate feedback, in the widest sense of the word, that favours interaction between the learner and the interlocutor, or non-reciprocal, where there is no feedback or interaction until the task is finished and evaluated.

As for the scope of the relationship, the authors relate it to the amount of input that must be processed for learners to respond as expected. The scope can be broad, where much input must be processed, or narrow, where the amount of input to be processed is minimal. Finally, the directness of the relationship is related to the degree to which the expected response can be based on information found in or inferable from the input, or whether the learner must rely on information in the context or in his or her own topical knowledge.
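The five features described above lend themselves to a structured record, which makes the comparison between a test task and its target language use task explicit. The following is a minimal sketch under stated assumptions: the class and field names, the free-text encoding of setting and rubric, and the labels DIRECT and INDIRECT for the two ends of the directness scale are choices made here for illustration, not a format defined by Bachman and Palmer.

```python
from dataclasses import dataclass, fields
from enum import Enum


class ResponseType(Enum):
    SELECTED = "selected"
    LIMITED_PRODUCTION = "limited production"
    EXTENDED_PRODUCTION = "extended production"


class Reactivity(Enum):
    RECIPROCAL = "reciprocal"
    NON_RECIPROCAL = "non-reciprocal"


class Scope(Enum):
    BROAD = "broad"
    NARROW = "narrow"


class Directness(Enum):
    # Assumed labels: response based on the input vs. on context or topical knowledge.
    DIRECT = "direct"
    INDIRECT = "indirect"


@dataclass(frozen=True)
class InputResponseRelationship:
    reactivity: Reactivity
    scope: Scope
    directness: Directness


@dataclass(frozen=True)
class TaskProfile:
    name: str
    setting: str                      # physical conditions, participants, time
    rubric: str                       # structure, instructions, duration, scoring
    input_channel: str                # "aural", "visual" or "both"
    expected_response: ResponseType
    relationship: InputResponseRelationship


def differing_features(a, b):
    """Return the names of the features on which two task profiles differ."""
    return [
        f.name
        for f in fields(a)
        if f.name != "name" and getattr(a, f.name) != getattr(b, f.name)
    ]


# Hypothetical example: a real-life task and a test task derived from it.
real_life = TaskProfile(
    name="ordering food in a cafe",
    setting="cafe, face to face with a waiter",
    rubric="none",
    input_channel="aural",
    expected_response=ResponseType.LIMITED_PRODUCTION,
    relationship=InputResponseRelationship(
        Reactivity.RECIPROCAL, Scope.NARROW, Directness.INDIRECT
    ),
)
test_task = TaskProfile(
    name="written menu task",
    setting="classroom, individual work",
    rubric="10 minutes, scored with a checklist",
    input_channel="visual",
    expected_response=ResponseType.LIMITED_PRODUCTION,
    relationship=InputResponseRelationship(
        Reactivity.NON_RECIPROCAL, Scope.NARROW, Directness.INDIRECT
    ),
)
print(differing_features(real_life, test_task))
```

Listing the features on which the two profiles differ is one simple way to reason about the authenticity of a test task: the fewer unmotivated differences, the closer the test task is to the real-life setting it stands for.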

4.3.2 Formative assessment

According to Ellis (2003: p. 312), formative assessment includes the kinds of testing instruments used in summative assessment, but, crucially, it also includes the kind of contextualised assessment that teachers can provide while the task is being done. Ellis distinguishes between planned and incidental formative assessment. Planned formative assessment requires the use of direct tests of the system-referenced and performance-referenced kinds and must be syllabus-driven. By contrast, incidental formative assessment is “the ad hoc assessment that teachers (and students) carry out as part of the process of performing a task that has been selected for instructional rather than assessment purposes” (Ellis, 2003: p. 314).

Incidental formative assessment is something that results from teacher-learner interaction during or after performing a given FL learning task. During the task the teacher (or peer) can provide online feedback by means of scaffolding strategies (see next section). After the task, the teacher and the learners can reflect on the aspects they noticed.

The kind of formative assessment that seems relevant for an ICALL setting is planned formative assessment. One obvious reason is that CALL is based on computer-learner interaction and is often done in contexts where the teacher does not intervene immediately. A second reason is that programming computers to provide feedback to learners critically requires us to plan what is expected from learners, how it will be assessed, and which aspects the learners’ attention should be drawn to.
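The three planning decisions just mentioned (what is expected, how it is assessed, and what the learner’s attention is drawn to) can be illustrated with a deliberately simple sketch. Everything in it is hypothetical: the names PlannedItem, normalise and give_feedback, the example item, and the use of normalised string matching as the assessment method, which stands in for the far richer linguistic analysis an actual ICALL system would need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedItem:
    prompt: str
    expected: tuple   # what is expected: acceptable target answers
    focus: str        # which aspect the learner's attention is drawn to


def normalise(text):
    """Lowercase, collapse whitespace and drop sentence-final punctuation."""
    return " ".join(text.lower().split()).rstrip(".!?")


def give_feedback(item, response):
    """How it is assessed: compare the response with the planned answers."""
    accepted = {normalise(answer) for answer in item.expected}
    if normalise(response) in accepted:
        return "Correct."
    return f"Not quite. Pay attention to: {item.focus}."


# Hypothetical item planned in advance by the materials designer.
item = PlannedItem(
    prompt="Complete the sentence: She ___ (go) to school.",
    expected=("She goes to school.",),
    focus="third-person singular -s",
)
print(give_feedback(item, "she goes  to school"))
print(give_feedback(item, "She go to school."))
```

The point of the sketch is that none of the feedback can be produced unless the expected answers and the focus of attention have been fixed beforehand, which is what makes the assessment planned rather than incidental.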

In document Language Learning Tasks and Automatic Analysis of Learner Language: Connecting FLTL and NLP design of ICALL materials supporting use in real-life instruction (Page 100-103)