• No results found

5.6 Evaluation

7.2.3 Choosing the Best Model

As Table 7.4 suggests, it is important to choose the optimal training set with respect to the quality of the input sentences. When no a priori information about this quality is available, how can one determine the optimal model?

Table 7.7 shows the changes in the number of articles in the system output. When running TrainGen on Drop30 and Drop70, there was an increase of 23.8% and 65.9% in the number of articles. These rates of increase were close to those obtained (24.4% and 66.0%) when running their respective optimal sets, TrainDrop30 and TrainDrop70. It appeared that TrainGen was able to provide a reasonable estimate of the number of articles that should be restored. When given new input sentences, one could use TrainGen to estimate the percentage of missing articles, then choose the most appropriate training set accordingly.

7.3

Summary

A maximum-entropy classifier was applied on the article generation task, using features drawn from a statistical natural language parser and WordNet. This feature set yielded state-of-the-art results.

The same model was applied on the article restoration task, where input sentences may have missing articles. In a departure from the previous chapters, the article in the input text is taken into account; that is, the difference between the target (or predicted) article and the source article (e.g., [a→null]) is explicitly modelled. An initial estimation of the quality of the input text helps choose the most appropriate insertion model to use. Our best results are 20.5% article error rate for sentences where 30% of the articles have been dropped, and 38.5% for those where 70% of the articles have been dropped.

From Chapter 5 till now, the error correction task is reduced to a classification task, by limiting our focus on one part-of-speech at a time. When the input sentence is of lower quality, multiple mistakes, possibly inter-dependent, may need to be considered and corrected simultaneously. For example, if a noun has the wrong number, then it is difficult to independently propose an appropriate article for that noun. In the next chapter, we will view the correction task as a sentence re-generation problem (see §2.4.1), which is more suitable for this type of sentence.

Chapter 8

Sentence Re-generation

The previous chapters have treated each class of non-native grammatical error — prepo- sitions, verbs, and articles, respectively — in isolation as a classification problem. This treatment sidesteps two potential issues. First, if an error interacts with another word (e.g., noun number and verb conjugation), the correction of either error cannot be determined independently. Second, if there are errors in other parts of the sentence, then the extracted features may also contain some of these errors, making the classification results less reliable. If the input sentence is of relatively high quality, these two potential problems may not be severe. If it is ridden with errors, however, then an approach that incorporates global considerations would be more appropriate. In this chapter, we will adopt such an approach, and propose a sentence re-generation method that belongs to the paradigm described in §2.4.1.

In the past few years, the Spoken Language Systems group has been developing a con- versational language learning system (Seneff et al., 2004), which engages students in a dialogue in order to help them learn a foreign language. An important component of such a system is to provide corrections of the students’ mistakes, both phonetic (Dong et al., 2004) and grammatical, the latter of which is our focus. For example, the student might say, “*What of time it arrive into Dallas?” The system would be expected to correct this to, “What time will it arrive in Dallas?” Given that the system’s natural language un- derstanding (NLU) component uses a probabilistic context-free grammar, a parsing-based approach might appear to be a natural solution.

However, as discussed in the literature review on parsing-based approaches (§2.3), the addition of mal-rules makes context-free grammars increasingly complicated, exponentially growing the number of ambiguous parses. We instead propose a two-step, generation-based framework. Given a possibly ill-formed input, the first step paraphrases the input into a word lattice, licensing all conceivable corrections; and the second step utilizes language models and parsing to select the best rephrasing.

The rest of the chapter is organized as follows. §8.1 identifies the types of errors we intend to handle and describes our two-step, generation-based framework for grammar correction. §8.3 presents some experiments on a corpus of flight queries. §8.4 describes an application of our approach on postediting MT output.

Part-of-speech Words Articles a, an, the

Modals, can, could, will, would, must, might, should Verb auxiliary be, have, do

Prepositions about, at, by, for, from, in, of, on, with, to Nouns flight, city, airline, friday, departure, ... Verbs like, want, go, leave, take, book, ...

Table 8.1: The parts-of-speech and the words that are involved in the experiments described in §8.1. The five most frequently occurring (in the test set) nouns and verbs are listed in their base forms. Other lists are exhaustive.

8.1

Correction Scope and Procedure

Motivated by analyses based on the Japanese Learners’ English corpus (see §3.1.2), we consider errors involving these four parts-of-speech:

• All articles and ten prepositions, listed in Table 8.1. • Noun number.

• Verb aspect, mode, and tense. The algorithm consists of two steps.

Overgenerate In the first step, an input sentence, which may contain errors, is reduced to a “canonical form” devoid of articles, prepositions, and auxiliaries. Furthermore, all nouns are reduced to their singular forms, and all verbs are reduced to their root forms. All of their alternative inflections are then inserted into the lattice in parallel. Insertions of articles, prepositions and auxiliaries are allowed at every position. This simple algorithm thus expands the sentence into a lattice of alternatives, as illustrated in Figure 8-1, for the reduced input sentence “I want flight Monday”.

This step may be viewed as a type of natural language generation (NLG) that is intermediate between conventional NLG from meaning representations, and NLG from keywords1. Our approach is most similar to (Uchimoto et al., 2002) in the sense of stripping

away and then regenerating the function words and inflectional endings.

Rerank In contrast to previous work on building lattices of paraphrases, such as (Barzilay & Lee, 2003), most paths in this lattice would yield ungrammatical sentences. Therefore, in the second step, a language model is used to score the various paths in the lattice. In the Nitrogen natural language generation system, an n-gram model is used to select the most fluent path through the word lattice (Langkilde & Knight, 1998); in other work, such as (Ratnaparkhi, 2000) and (Uchimoto et al., 2002), dependency models were used. An n-gram model will serve as our baseline; stochastic context-free grammar (CFG) language models will then be utilized to further re-rank the candidates proposed by the n-gram model. Three CFG models will be compared.

<aux> <article> <prep> <aux> <article> <prep> <aux> <article> <prep> <aux> <article> <prep> <aux> <article> <prep> <s> I wanting wanted wants want flight flights Monday Mondays </s>

Figure 8-1: Lattice of alternatives formed from the reduced input sentence, “I want flight Monday”. The lattice encodes many possible corrections, including different noun and verb inflections, and insertions of prepositions, articles, and auxiliaries. One appropriate correction is “I want a flight on Monday”.