3.3 Data-driven approaches

3.3.2 NLG based on language models

Given an alignment between data and text, one way of modelling the NLG process is to remain faithful to the division between strategic and tactical choices, using the statistical alignment to inform content selection, while deploying NLP techniques to acquire rules, templates or schemas (à la McKeown, 1985) to drive sentence planning and realisation.

Recall that the generative model of Liang et al. (2009) pairs data to text based on a sequential, Markov process, combining strategic choices (of DB records and fields) with tactical choices (of word sequences) into a single probabilistic model. In fact, Markov-based language modelling approaches continue to feature prominently in data-driven NLG. One of the earliest examples is Oh and Rudnicky (2002) in the context of a dialogue system in the travel domain, where the input takes the form of a dialogue act (e.g. a query that the system needs to make to obtain information about the user's travel plans) with the attributes to include (e.g. the departure city). Oh and Rudnicky's approach encompasses both content planning and realisation. It relies on dialogue corpora annotated with utterance classes, that is, the type of dialogue act that each utterance is intended to fulfil. On this basis, they construct separate n-gram language models for each utterance class, as well as for word classes that can appear in the input (for example, words corresponding to departure city). Content planning is handled by a model that predicts which attributes should be included in an utterance on the basis of recent dialogue history. Realisation is handled using a combination of templates and n-gram models. Thus, generation is conceived as a two-step process (planning followed by realisation).

The reliance on standard language models has one potential drawback, in that such models are founded on a local history assumption, limiting the extent to which prior selections can influence current choices. An alternative, discriminative model (known to the NLP community at least since Ratnaparkhi, 1996) is logistic regression (Maximum Entropy). The foundations for this approach in NLG can be found in Ratnaparkhi (2000), who focussed primarily on realisation (albeit combined with elements of sentence planning). He compared two stochastic NLG systems based on a maximum entropy learning framework to a baseline NLG system. The first of these (NLG2 in Ratnaparkhi's paper) uses a conditional language model that generates sentences in an incremental, left-to-right fashion, by predicting the best word given both the preceding history (as in standard n-gram models) and the semantic attributes that remain to be expressed. The second (NLG3) augments the model with syntactic dependency relations, performing generation by recursively predicting the left and right children of a given constituent. In an evaluation based on judgements of correctness, Ratnaparkhi found that the system augmented with dependencies was generally preferred.
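
To make the idea of class-specific n-gram realisation concrete, the following is a minimal sketch, not Oh and Rudnicky's actual system. It assumes a hypothetical toy corpus in which attribute values have already been replaced by slot tokens (such as `<depart_city>`), trains one bigram table per utterance class, samples a word sequence from the table for the requested class, and then fills the slots. Content planning (deciding which attributes to include) is not modelled here; the attributes are simply passed in. All names and data are illustrative assumptions.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus: (utterance class, delexicalised utterance).
# Attribute values are assumed to be replaced by slot tokens beforehand.
CORPUS = [
    ("query_depart_city", "where are you leaving from"),
    ("query_depart_city", "where are you flying from"),
    ("inform_flight", "there is a flight from <depart_city> to <arrive_city>"),
    ("inform_flight", "i found a flight from <depart_city> to <arrive_city>"),
]


def train_bigrams(corpus):
    """Build one bigram successor table per utterance class."""
    models = defaultdict(lambda: defaultdict(list))
    for utt_class, text in corpus:
        tokens = ["<s>"] + text.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            # Repeated successors are kept, so sampling follows corpus counts.
            models[utt_class][prev].append(nxt)
    return models


def generate(models, utt_class, attributes, rng, max_len=20):
    """Sample words from the class-specific bigram model, then fill slots."""
    words, prev = [], "<s>"
    for _ in range(max_len):
        nxt = rng.choice(models[utt_class][prev])
        if nxt == "</s>":
            break
        words.append(nxt)
        prev = nxt
    # Replace slot tokens with the attribute values supplied by the planner.
    return " ".join(attributes.get(w, w) for w in words)


models = train_bigrams(CORPUS)
utterance = generate(
    models,
    "inform_flight",
    {"<depart_city>": "Boston", "<arrive_city>": "Denver"},
    random.Random(0),
)
print(utterance)
```

Because each word depends only on its predecessor, the sketch also illustrates the local history assumption discussed above: nothing in the bigram table can record which attributes have already been expressed earlier in the utterance.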

In later work, Angeli et al. (2010) describe an approach to end-to-end NLG that maintains a separation between content selection, sentence planning and realisation, modelling each process as a sequence of decisions in a log-linear framework, where choices can be conditioned on arbitrarily long histories of previous decisions. This enables them to handle long-range dependencies, such as coherence relations, more flexibly (e.g., a model can incorporate the information that a weather report which describes wind speed should do so after mentioning wind direction; see Barzilay & Lapata, 2005, for similar insights based on global optimisation). The separation of tasks is maintained insofar as a different set of features can be used to inform decisions at each stage of the process. Sentence planning and realisation decisions are based on templates acquired from corpus texts: a template is selected based on its likelihood given the database fields selected during content selection.
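
As a rough illustration of a log-linear decision conditioned on the full history, the sketch below scores each candidate next field as exp(w · f(candidate, history)) and normalises over the candidates. It is not the feature set or training procedure of Angeli et al.; the field names, feature templates and hand-set weights are all hypothetical, chosen only to mirror the wind direction and wind speed example in the text.

```python
import math


def features(candidate, history):
    """Hypothetical indicator features over a candidate and the whole history."""
    feats = {f"field={candidate}": 1.0}
    if history:
        feats[f"prev={history[-1]}->{candidate}"] = 1.0
    # Long-range feature: inspects the entire history, not just the last choice.
    if candidate == "wind_speed" and "wind_dir" in history:
        feats["wind_speed_after_wind_dir"] = 1.0
    if candidate in history:
        feats["already_mentioned"] = 1.0
    return feats


# Hand-set weights for illustration only; a real system would learn them.
WEIGHTS = {
    "field=temperature": 0.5,
    "wind_speed_after_wind_dir": 2.0,
    "already_mentioned": -5.0,
    "field=STOP": -1.0,
}


def decision_distribution(candidates, history, weights):
    """Return P(candidate | history), proportional to exp(w . f)."""
    scores = {
        c: sum(weights.get(name, 0.0) * value
               for name, value in features(c, history).items())
        for c in candidates
    }
    z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / z for c, s in scores.items()}


candidates = ["temperature", "wind_dir", "wind_speed", "STOP"]
p_start = decision_distribution(candidates, [], WEIGHTS)
p_later = decision_distribution(candidates, ["temperature", "wind_dir"], WEIGHTS)
print(round(p_start["wind_speed"], 3), round(p_later["wind_speed"], 3))
```

With these assumed weights, the probability of selecting wind speed rises sharply once wind direction appears anywhere in the history, which is the kind of dependency a fixed-order n-gram model cannot express directly.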

Mairesse and Young (2014) describe a different approach, which also relies on alignments between database records and text, and seeks a global solution to generation, without a crisp distinction between strategic and tactical components. In this case, the basic representational framework is a tree of the sort shown in Figure 5. The root indicates a dialogue act type (in the example, the dialogue act seeks to inform). Leaves in the tree correspond to words or word sequences, while nonterminals are semantic stacks, that is, the pieces of input to which the words correspond. In this framework, content selection and realisation can be solved jointly by searching for the optimal stack sequence for a given dialogue act, and the optimal word sequence corresponding to that stack sequence. Mairesse and Young use a factored language model (FLM), which extends n-gram models by conditioning probabilities on different utterance contexts, rather than simply on word histories. Given an input dialogue act, generation works by applying a Viterbi search through the FLM at each of the following stages: (a) mandatory semantic stacks are identified for the dialogue act; (b) these are enriched with possible non-mandatory stacks (those which are not in boldface in Figure 5), usually corresponding to function words; (c) realisations are found for the stack sequence. The approach is also extended to deal with n-best realisations, as well as to handle variation, in the form of paraphrases for the same input.

Figure 5: Tree structure for a dialogue act, after Mairesse and Young (2014). Leaves correspond to word sequences. Non-terminal nodes are semantic attributes, shown at the bottom as semantic stacks. Stacks in bold represent mandatory content.
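
The realisation stage (c) can be sketched as a Viterbi search over candidate phrases for a fixed stack sequence. The code below is a simplified illustration under stated assumptions, not Mairesse and Young's model: a plain phrase-bigram table stands in for the factored language model, stages (a) and (b) are assumed to have already produced the stack sequence, and the stack labels, phrases and probabilities are invented for the example.

```python
import math

# Hypothetical candidate phrases per semantic stack, with P(phrase | stack).
EMISSION = {
    "inform(name)": {"The Golden Wok": 1.0},
    "inform": {"is a": 0.7, "serves": 0.3},
    "inform(food)": {"Chinese": 0.6, "Chinese food": 0.4},
    "inform(type)": {"restaurant": 1.0},
}
# Hypothetical P(phrase | previous phrase); unseen pairs fall back to FLOOR.
TRANSITION = {
    ("The Golden Wok", "is a"): 0.8,
    ("The Golden Wok", "serves"): 0.2,
    ("is a", "Chinese"): 0.9,
    ("is a", "Chinese food"): 0.1,
    ("serves", "Chinese food"): 0.9,
    ("serves", "Chinese"): 0.1,
    ("Chinese", "restaurant"): 0.9,
    ("Chinese food", "restaurant"): 0.1,
}
FLOOR = 1e-6


def viterbi(stacks):
    """Return (log-probability, phrases) of the best realisation of `stacks`."""
    # best[phrase] = (log-probability, best phrase sequence ending in phrase)
    best = {p: (math.log(pr), [p]) for p, pr in EMISSION[stacks[0]].items()}
    for stack in stacks[1:]:
        new_best = {}
        for phrase, emit in EMISSION[stack].items():
            # Keep only the best-scoring predecessor for each candidate phrase.
            new_best[phrase] = max(
                (score
                 + math.log(TRANSITION.get((prev, phrase), FLOOR))
                 + math.log(emit),
                 path + [phrase])
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())


log_prob, phrases = viterbi(
    ["inform(name)", "inform", "inform(food)", "inform(type)"]
)
print(" ".join(phrases))
```

Keeping the k best entries per phrase instead of a single maximum would be one way to obtain n-best realisations of the kind mentioned above, although the sketch does not implement this.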

In document Survey of the state of the art in natural language generation : core tasks, applications and evaluation (Page 31-33)