KURD and CG for shallow semantic processing

6.3 A general architecture for the analysis of learner language

6.3.3 KURD and CG for shallow semantic processing

In this section we show how both KURD and CG can be used to analyse responses with a focus on activity-specific linguistic structures. In NLP this kind of task is often called shallow semantic analysis, which motivates the title of this section. However, we believe that the term semantic analysis (i) implies a much more complex task than the one we present in Chapter 8, and (ii) can lead non-NLP experts to expectations that do not match the real capabilities of NLP tools. For this reason we will tend to call it domain-specific information extraction or activity-specific learner response assessment.

6.3.3.1 CG-based shallow semantic processing

As for the task of annotating beyond the morphosyntactic level, CG can easily add new levels of information to one or all of the readings in a cohort. This is done by creating a rule file whose rules apply to the text to be analysed as a whole, and not sentence by sentence as is usually done. Then, using the ADD operator, one can process the analysed text to check for the presence or absence of the relevant linguistic structures.

Figure 6.9 exemplifies four rules that were included in one of the Information Extraction modules for the analysis of the Catalan version of an ICALL activity that is described later in Section 7.2.2.3. The rules correspond to a part of the response where the learner is expected to end an email with a complimentary close. The rules cover the five ways of expressing this in Catalan that are listed in (4) and that comply with the activity’s requirements – all of which correspond more or less to the English "Yours sincerely," or "Yours faithfully,".

(4) a. Atentament,
    b. Cordialment,
    c. Ben cordialment,
    d. Salutacions,
    e. Salutacions cordials,

ADD (@:ComplClose) TARGET (Adv) IF
    (0 ATENTAMENT OR CORDIALMENT) (1 COMMA);
ADD (@:ComplClose) TARGET (Adv) IF
    (-1 BEN) (0 CORDIALMENT) (1 COMMA);
ADD (@:ComplClose) TARGET (Nom) IF
    (0 Nom + SALUTACIONS) (1 COMMA);
ADD (@:ComplClose) TARGET (Nom) IF
    (0 Nom + SALUTACIONS) (1 CORDIALS) (2 COMMA);

Figure 6.9: CG rules for the analysis of the complimentary close in a formal letter in Catalan.
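To make the logic of the four rules more concrete, the following is a minimal Python sketch of the same checks. It is not CG syntax and not the implementation used in the project: the `Token` class, the function names, and the part-of-speech labels `Adj` and `Punct` are assumptions introduced for illustration, and the sketch assumes that the comma in the last rule directly follows cordials.

```python
from dataclasses import dataclass, field

COMPL_CLOSE = "@:ComplClose"  # tag name taken from the rules in Figure 6.9


@dataclass
class Token:
    """Hypothetical stand-in for one reading in a CG cohort."""
    form: str
    pos: str
    tags: set = field(default_factory=set)


def _form(tokens, i):
    """Lower-cased word form at position i, or None outside the text."""
    return tokens[i].form.lower() if 0 <= i < len(tokens) else None


def add_compl_close(tokens):
    """Add the ComplClose tag to tokens matching any of the four patterns."""
    for i, tok in enumerate(tokens):
        form = _form(tokens, i)
        prev = _form(tokens, i - 1)
        nxt = _form(tokens, i + 1)
        nxt2 = _form(tokens, i + 2)
        if tok.pos == "Adv":
            # Rule 1: "Atentament," or "Cordialment,"
            if form in {"atentament", "cordialment"} and nxt == ",":
                tok.tags.add(COMPL_CLOSE)
            # Rule 2: "Ben cordialment," (the tag goes on "cordialment")
            if prev == "ben" and form == "cordialment" and nxt == ",":
                tok.tags.add(COMPL_CLOSE)
        elif tok.pos == "Nom" and form == "salutacions":
            # Rule 3: "Salutacions,"
            if nxt == ",":
                tok.tags.add(COMPL_CLOSE)
            # Rule 4: "Salutacions cordials," (comma assumed at position 2)
            if nxt == "cordials" and nxt2 == ",":
                tok.tags.add(COMPL_CLOSE)
    return tokens
```

As in the CG rules, the tag is added only to the target token (the adverb or the noun), not to the surrounding context tokens.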

After this set of rules and other similar rules are applied to detect the relevant parts of the response, another CG-based module using a different set of rules checks the overall correctness of the response. This is described in Chapter 8, where we present the pedagogically oriented design and implementation of an NLP-based feedback generation module.

6.3.3.2 KURD-based shallow semantic processing

The Information Extraction module is implemented in a slightly different way in KURD. As described in Boullosa, Quixal, Schmidt, Esteban, and Gil (2005: pp. 32–34), the KURD formalism was enhanced during the ALLES project with a so-called “discourse” module. With its discourse module, KURD is capable of generating analysis nodes, e.g., feature bundles, at the sentence level – instead of associating them with word readings.

We will show how the rule for analysing part of the sentence in (5) would be implemented. This sentence is one of the possible responses to an activity in which learners are required to produce a satisfaction questionnaire – the activity is presented and worked out later on in Chapters 7 and 8.

(5) a. How satisfied are you with Stanley Broadband?

As shown in line 2 of Figure 6.10, the rule name is CustomerSatisf. This rule checks the response for the presence of the expected words that refer to the satisfaction of the customer. The rule is fairly simple. It checks for the sequence of words satisfied are you with, and it maps the code CustSatisf to all of them – line 9. In addition, it maps this information to the sentence node, identified by a special symbol ($-1) in line 10. The rule CustomerSatisf then tells the algorithm to go on to the block of rules corresponding to that part of the response in which Product is referred to – which we do not show.

Figure 6.10: KURD rules to process a part of a possible response to one of the ICALL activities presented and worked out later in Chapters 7 and 8.
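The behaviour described for the rule CustomerSatisf can be approximated in a few lines of Python. This is only an illustrative sketch and not KURD syntax: the dictionary-based token and sentence representations and the function name are assumptions, and storing the code under the attribute names `disc` (on tokens) and `RespOrder` (on the sentence node) is likewise assumed here for illustration.

```python
# Word sequence the rule looks for, as described in the text.
EXPECTED = ("satisfied", "are", "you", "with")


def apply_customer_satisf(tokens):
    """Mark the expected word sequence and record it on a sentence node.

    tokens: list of dicts, each with at least a 'form' key (hypothetical format).
    Returns a dict standing in for the sentence-level analysis node.
    """
    sentence = {"RespOrder": []}
    forms = [t["form"].lower() for t in tokens]
    n = len(EXPECTED)
    for start in range(len(forms) - n + 1):
        if tuple(forms[start:start + n]) == EXPECTED:
            # Map the code to every token in the matched sequence ...
            for t in tokens[start:start + n]:
                t["disc"] = "CustSatisf"
            # ... and also to the sentence node.
            sentence["RespOrder"].append("CustSatisf")
            break
    return sentence
```

Applied to the tokens of sentence (5), the sketch marks satisfied, are, you and with, and leaves the remaining tokens for later rules such as the one handling the product name.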

After the rules in the Information Extraction module are applied to process the sentence in (5), a set of response chunks is identified and the sentence can be passed on to the module that will check the correctness of the response. The completely analysed version of the sentence is shown in Figure 6.11. Note in particular the attribute RespOrder in line 2, which contains all the corresponding response elements.

In the other lines, which correspond to analysed tokens – lines 3 to 10 – each token is identified as a member of a response element in the disc attribute. The elements in the attribute disc are part of the connection between the linguistic analysis and the pedagogical objectives of the activities, which are to be matched with linguistically based information. The methodology that we propose to design and implement such rules is explained in Chapters 7 and 8.

Figure 6.11: Linguistic analysis for the sentence (5) including the response elements detected by the Information Extraction Module.
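The relation between the token-level disc attribute and the sentence-level RespOrder attribute can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the KURD discourse module: the dictionary format, the function names, and the label `Product` for the product name are hypothetical, and tokens whose labels are not discussed in the text are simply left unlabelled here.

```python
from itertools import groupby


def response_chunks(tokens):
    """Group consecutive tokens that share the same 'disc' label.

    Returns a list of (label, [word forms]) pairs in text order.
    Tokens without a 'disc' label are skipped.
    """
    chunks = []
    for label, group in groupby(tokens, key=lambda t: t.get("disc")):
        if label is not None:
            chunks.append((label, [t["form"] for t in group]))
    return chunks


def resp_order(tokens):
    """Ordered list of response elements, as it might appear on the sentence node."""
    return [label for label, _ in response_chunks(tokens)]


# Sentence (5) with illustrative labels; 'Product' is an assumed label.
analysed = [
    {"form": "How"},
    {"form": "satisfied", "disc": "CustSatisf"},
    {"form": "are", "disc": "CustSatisf"},
    {"form": "you", "disc": "CustSatisf"},
    {"form": "with", "disc": "CustSatisf"},
    {"form": "Stanley", "disc": "Product"},
    {"form": "Broadband", "disc": "Product"},
    {"form": "?"},
]
```

A later correctness-checking module could then work on the ordered list of response elements rather than on individual tokens.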

6.4 Chapter summary

In this chapter, we introduced ALLES, the research setting in which our methodological instruments for the design and implementation of ICALL materials are exemplified. This context arises from a multidisciplinary research project carried out by a team of experts in several domains, among them experts in FLTL and NLP. My role in the project was to design and develop pedagogically informed NLP strategies for the generation of automatic feedback. These strategies are based on surface shallow semantic processing techniques for the automatic evaluation of learner responses, and implement summative and formative assessment strategies.

We presented the pedagogical concept underlying the TBLT-driven materials that resulted from the initial design phase, and characterised them in terms of the framework of Estaire and Zanón (1994). This characterisation determines aspects of the topic, and the general linguistic and communicative goals of tasks. This approach requires the design of a learning sequence and of an overall strategy of assessment procedures. However, it characterises neither the contents expected in learner responses nor the specific criteria for correctness of each response item. An approach supplying the instruments for a principled and formal characterisation of these latter aspects is the purpose of our research in the following chapters.

This chapter also introduced the NLP tools that serve as the basis for the implementation of practical assessment functionalities for the materials developed within the ALLES project. The general architecture for the linguistic processing of text using finite-state automata and a mal-rule approach was instantiated in two different software solutions, used for different languages in the project. In Chapter 8 the different levels of information generated by such an architecture are strategically combined to respond to FLTL needs and assessment requirements.

Chapter 7 Designing ICALL tasks – Characterisation of pedagogical needs

This chapter introduces and exemplifies the frameworks that we propose to characterise tasks and learner responses during the design phase, as well as the relationships between them. This characterisation will be used to pedagogically motivate the requirements for the linguistic analysis and feedback generation modules of an ILTS.

The Task Analysis Framework (TAF) characterises activities from a general pedagogical and linguistic perspective in terms of learning goals, learning processes, and type of response required from the learner. The TAF serves two purposes: (i) to determine the degree of communicativeness of the FL learning activity; and (ii) to distinguish FL activities that are good candidates for being turned into ICALL activities from those that are not – mainly due to the expected outcome. The TAF is exemplified in the analysis of a set of learning materials.

The Response Interpretation Framework (RIF) characterises expected learner responses and their assessment criteria in detail. By applying the RIF to a particular task, a set of objective criteria for correctness can be produced, and a set of learner responses can be anticipated. To exemplify its use, we apply the RIF to four activities that are representative of the different kinds of activities that might be considered for NLP-based automatic assessment.

7.1 TAF: Task Analysis Framework
