3.2 Incremental NLG and self-repair
3.2.2 Incrementality in conceptualization, formulation and articulation
One sub-field of Natural Language Generation (NLG) research studies not the automatic production of text or synthesized speech from non-linguistic data for the benefit of system users, but computational models of the cognitive processes that underlie human language production (McDonald, 1987). This sub-field treats incrementality as a key problem in generation.
As mentioned in connection with Thompson’s (1977) functional decomposition of the production system, psycholinguistically motivated NLG took on the task of implementing emerging psycholinguistic models of speech production. The task was formulated as the design of a multi-modular process that did not require complete input plans for sentences before beginning their production. To this end, a distinction between the different components of a generation system became important, as did the passing of incremental units between them. This approach was largely motivated by Kempen and Hoenkamp’s (1987) and Levelt’s (1989) separation of production into distinct conceptualization, formulation and articulation phases (see Figure 3.5), a psychological model which still has a bearing on NLG today.
Figure 3.5: Incremental production without and with inversion of order. From (Levelt, 1989, p. 25)
These systems were incremental insofar as transfer of information within the generator was piecemeal, but they were not always strictly word-by-word incremental in terms of their output. The general programme of research was that input from a conceptualization module to a grammar-based formulator could be partial, as could a formulator’s input to an articulator, so syntactic processes determining surface form elements like word order and inflection could begin before the entire input LF for a sentence had been received. This follows Wundt’s Principle that each processing component should be triggered into activity by a minimal amount of its characteristic input (Levelt, 1989, Chapter 1.2). Neumann and Finkler (1990) describe this kind of incremental generation as “immediate verbalization of the parts of a stepwise computed conceptual structure – often called message)” (ibid., p. 288).
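The triggering behaviour described above can be illustrated with a minimal sketch in Python, in which three chained generators stand in for the conceptualizer, formulator and articulator. The stage functions, the toy lexicon and the fragment labels are all hypothetical and do not reproduce any of the cited systems; the sketch only shows that each stage begins work as soon as one fragment of its characteristic input arrives.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical toy lexicon mapping concept fragments to words (an assumption).
LEXICON = {"FLIGHT": "CK-314", "STATE_DELAYED": "is delayed"}


def conceptualizer(fragments: Iterable[str], log: List[str]) -> Iterator[str]:
    """Emit conceptual fragments one at a time instead of a complete plan."""
    for fragment in fragments:
        log.append(f"conceptualize {fragment}")
        yield fragment


def formulator(fragments: Iterable[str], log: List[str]) -> Iterator[str]:
    """Lexicalise each fragment as soon as it arrives."""
    for fragment in fragments:
        log.append(f"formulate {fragment}")
        yield LEXICON[fragment]


def articulator(words: Iterable[str], log: List[str]) -> Iterator[str]:
    """Output each formulated unit immediately."""
    for word in words:
        log.append(f"articulate {word}")
        yield word


def run_pipeline(fragments: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Chain the three stages and return the output plus a processing log."""
    log: List[str] = []
    stages = articulator(formulator(conceptualizer(fragments, log), log), log)
    return list(stages), log
```

Because the stages are lazy generators, the log returned by `run_pipeline` shows the first fragment passing through all three stages before the second fragment is even conceptualized, which is the piecemeal transfer the paragraph describes.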
Kempen and Hoenkamp (1987) made the first detailed attempt at describing a generation implementation, introducing the Incremental Procedural Grammar (IPG) model. Schematically, IPG was driven by parallel processes whereby a team of syntactic modules worked together on small parts of a sentence under construction, with a stack object (onto which different constituents were loaded) as the sole communication channel, rather than the modules being controlled by a central constructing agent. The system was designed under a premise consistent with the emerging psychological models that tree formation was simultaneously conceptually and lexically guided (van Wijk and Kempen, 1987), and that production did not take place in a serial manner. IPG was implemented in LISP as a Dutch sentence generator, and was shown to be capable of generating elliptical answers to questions and also some basic self-repairs.
De Smedt (1990) took incrementality a stage further, and developed a fairly comprehensive computational model of incremental generation in which self-repair is incorporated explicitly. De Smedt developed the parallelism implicit in Kempen and Hoenkamp’s IPG model by implementing parallel processing within the formulation stage of generation, with a particular focus on incremental construction of syntactic structure in sentence generation. De Smedt proposed the Incremental Parallel Formulator (IPF), a module for grammatical encoding which could operate with input that underspecified sentences. In this case the input increments were abstract conceptual messages representing semantic conceptual relations, semantic role relations or lexical feature specifications.
The IPF operated in accordance with Kempen’s (1987) criteria for incremental generation: input from the conceptualizer should be fragmentary and not guaranteed to be sent in an order corresponding to a particular sentence’s surface linear left-right word order; as a consequence, generation should be able to proceed from the bottom of a syntactic structure upwards as well as from the top down. The IPF showed how language generation should also exploit variations in word order where necessary, while still observing linguistic restrictions.
The formulator constructed syntactic structures by applying unification operations on ‘syntactic segments’, the principal units of the unification-based, lexically driven formalism segment grammar (Kempen, 1987; De Smedt and Kempen, 1991). Segments were TAG-like structures with two nodes (the head and foot, labeled with grammatical categories such as NP and N and containing feature structures such as nominative (+)) and an arc representing a grammatical function; for example, an S-subject-NP segment represents a subject relation between a sentence and a noun phrase. Unification operated via the combination of two compatible segment nodes into a new segment that shared their syntactic features. In tree construction terms, this is the attachment of auxiliary trees to the currently derived tree, as in a TAG. The procedure differed from the early TAG formalisms, however, by making a sister-node attachment operation (furcation) available, which allowed for different parts of a structure to be worked on in parallel before unifying them.
The IPF had two internal components: the Grammatical Encoder, which generated f-structures (TAG-like tree structures representing functional and dominance relationships between constituents) to which segments were attached, which in turn were used to generate c-structures (data structures including features representing word order and other grammar specifications like case); and the downstream Phonological Encoder, which was responsible for executing the word ordering and correct inflection in accordance with the c-structure features. The grammatical encoding procedure could begin as soon as the first conceptual fragment entered the IPF, beginning with an empty segment SIGN. The formulator attempted to unify each lexical entry segment in the lexicon with the existing structure and, if successful, these unified structures could be stored in the Unification Space as candidates for sending on to the Phonological Encoder, allowing multiple structures to be worked on in parallel.
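As a minimal sketch of how two-node segments might be combined by unification, the Python code below models nodes as a category plus a flat feature dictionary. This is a simplification under stated assumptions and not De Smedt's implementation: the class names, the flat feature representation, the function names `concatenate` and `furcate`, and the NP-determiner-DET and S-object-NP segments are hypothetical; only the S-subject-NP segment is taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    category: str                                  # e.g. "S", "NP"
    features: dict = field(default_factory=dict)   # e.g. {"nominative": "+"}


@dataclass
class Segment:
    head: Node        # upper node of the segment
    function: str     # arc label, e.g. "subject"
    foot: Node        # lower node of the segment


def unify_nodes(a: Node, b: Node) -> Optional[Node]:
    """Merge two nodes, or return None if category or feature values clash."""
    if a.category != b.category:
        return None
    merged = dict(a.features)
    for name, value in b.features.items():
        if name in merged and merged[name] != value:
            return None
        merged[name] = value
    return Node(a.category, merged)


def concatenate(upper: Segment, lower: Segment) -> Optional[Tuple]:
    """Vertical attachment: unify the foot of `upper` with the head of `lower`."""
    shared = unify_nodes(upper.foot, lower.head)
    if shared is None:
        return None
    return (upper.head, upper.function, shared, lower.function, lower.foot)


def furcate(left: Segment, right: Segment) -> Optional[Tuple[Node, List]]:
    """Sister attachment: unify the two head nodes, keeping both branches."""
    shared = unify_nodes(left.head, right.head)
    if shared is None:
        return None
    return (shared, [(left.function, left.foot), (right.function, right.foot)])
```

In this sketch a failed unification simply returns `None`, so a caller can try every lexical segment against the existing structure and keep only the successful candidates.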
De Smedt (1991) introduced a development to the IPF to allow revisions of syntactic structures in the generation procedure, providing a computational explanation for overt and covert syntactic self-repair. This was achieved by making the unification procedure “non-destructive”, in the sense that the original configuration of two nodes was preserved after a unification operation, while operations on them with other syntactic constituents were still permitted as if they were unified as one structure. The accessibility of the component parts of unified structures meant that no undoing of unification had to be executed at any point. The computational overhead of this extra storage and search space was not discussed.
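One way to picture non-destructive unification is to record a bond between two untouched nodes and compute the merged view only on demand. The Python sketch below does this under strong simplifying assumptions: the `Node`, `Bond` and `UnificationSpace` classes, their methods and the flat feature dictionaries are hypothetical and are not taken from De Smedt (1991).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    category: str
    features: dict = field(default_factory=dict)


@dataclass
class Bond:
    """A 'virtual' unification: both nodes are kept, nothing is overwritten."""
    left: Node
    right: Node

    def merged_features(self) -> dict:
        # Computed on demand, so the original nodes never need restoring.
        return {**self.left.features, **self.right.features}


class UnificationSpace:
    def __init__(self) -> None:
        self.bonds: List[Bond] = []

    def bond(self, left: Node, right: Node) -> Optional[Bond]:
        """Record a bond if the nodes are compatible; otherwise return None."""
        if left.category != right.category:
            return None
        clash = any(
            name in right.features and right.features[name] != value
            for name, value in left.features.items()
        )
        if clash:
            return None
        new_bond = Bond(left, right)
        self.bonds.append(new_bond)
        return new_bond

    def release(self, bond: Bond) -> None:
        """Revise a structure by dropping the bond; no undo step is needed."""
        self.bonds.remove(bond)
```

The cost of this design is the one noted in the text: every bond and both of its nodes stay in memory, so storage and search grow with the number of candidate structures.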
De Smedt used the connectionist concept of activation in assigning real-number values to the bonds of the “virtually” unified segments denoting the probability of them eventually becoming properly unified, to differentiate between strong and weak candidate structures. The author also experimented with annealing, whereby node activations would be set to decay over time if not unified. If a steady equilibrium was reached with a frozen configuration of segments (a state of conformation) the strongest remaining structure could be passed to the Phonological Encoder. The simulation of speech errors was achieved through this time-constrained annealing process: if there was not a clearly strong enough candidate or conformation after a given amount of generation time, the “incorrect” segment could be passed on. A lexical selection error such as “The next speaker will be given by Jonathan Slocum” (ibid.) was characterized as the presence of equally viable alternatives in the Unification Space, and possible incorrect concatenations or furcations of segments. Additionally, annealing allowed a cognitively inspired implementation whereby experiments that allowed more or less time between inputs gave different surface results, as the competing segments could optimally reconfigure with more time, simulating speech errors under time pressure.
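The effect of time pressure on selection can be illustrated with a toy simulation. The update rule below (multiplicative decay plus a constant support term per step), the parameter values and the candidate labels are assumptions made purely for illustration; they are not De Smedt's actual activation dynamics.

```python
from typing import Dict, Tuple


def select_structure(
    candidates: Dict[str, Tuple[float, float]],
    time_budget: int,
    decay: float = 0.8,
) -> Tuple[str, Dict[str, float]]:
    """Pick the most active candidate after `time_budget` update steps.

    candidates maps a label to (initial_activation, support_per_step).
    Each step, every activation decays and then receives its support.
    """
    activation = {label: init for label, (init, _support) in candidates.items()}
    for _step in range(time_budget):
        for label, (_init, support) in candidates.items():
            activation[label] = activation[label] * decay + support
    winner = max(activation, key=activation.get)
    return winner, activation


# Hypothetical competition: the competitor starts out stronger, but the
# intended structure receives more support as processing continues.
CANDIDATES = {"competitor": (1.0, 0.05), "intended": (0.6, 0.2)}
```

With these assumed values, a budget of one step passes on the competitor, whereas a budget of five steps gives the intended structure time to overtake it, mirroring the observation that more time between inputs changes the surface result.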
Incremental conceptualization
While De Smedt’s work on the grammatical formulation side of generation was thorough, it did not address the nature of the conceptualizer that sent the input messages to it, as atomic messages were passed to the IPF incrementally ‘by hand’. Guhe and colleagues began to address this void in computational models of language production by developing the Incremental Conceptualizer (INC, Guhe and Habel, 2001; Guhe, 2007), the principle behind it being to incrementally and automatically create and send pre-verbal messages to the formulator in a cognitively motivated way. The generation task here began in a top-down manner, beginning with the incremental production of pre-verbal messages.
Guhe was interested in the idea of conceptual change in the input data and considered self-repairs from this perspective, distinguishing them from performance errors such as incorrect lexical access or misconception, which could be attributed to system malfunction. For testing, Guhe and Schilder (2002) and Guhe (2007) chose a dynamic domain of a simple airport scene which had a variety of live scenarios, in order to evoke change in the input concepts which could cause both overt and covert self-repairs such as those below:
(3.5) “CK-314. . . uh. . . is delayed” [covert]
(3.6) “CK-314 is on time. . . uh. . . is delayed” [overt, formulator occupied]
(3.7) “CK-314 is on time. . . uh. . . CK-314 is delayed” [overt, concept changed after formulation]
A simple version of self-monitoring (Levelt, 1989) was employed in INC’s error detection mechanism, whereby a parse of the output was compared with the planned utterance, a difference therein automatically stopping the current generation and triggering a marking of the part of the utterance to be repaired. A correction term was then generated (i.e. “uh” or “no”) and the content to be corrected (the information difference) was passed to the formulator. The incremental generation of concepts in the conceptualizer was triggered by atomic perceived entities (based on a dynamically changing virtual scene at the airport), simulating real-time processing. Given a changing environment, the generator would have to be able to adapt its output quickly; this is a classic use case of incremental generation and self-repair.
The semantic underspecification formalism CLLS (Constraint Language for Lambda Structures), a framework for the partial description of lambda structures, was used to incrementally compose the conceptual messages. The message generation procedure consisted of four operations: construction, selection, linearization, and PVM (pre-verbal message) generation, which all operated on the current conceptual representation (CCR), a hierarchical semantic network that represented the internal state of the conceptualizer. The CCR was first built up by the construction process through a concept matcher linked to a concept store. The construction algorithm worked recursively with the matcher until no more complex concepts could be constructed from simpler ones, at which point it waited for a newly perceived entity to handle. The selection process chose the concepts to be verbalized from the CCR, which were then linearized into an appropriate order (logically, not into final word order), and PVM-generation incrementally produced a pre-verbal message by taking the first element out of a traverse buffer (a sub-structure of the CCR), and passing that part of the PVM on to the formulator, continuing in an incremental fashion.
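The four operations can be sketched as a small Python loop over perceived entities. Everything concrete here is an assumption for illustration: the concept store contents, the priority order used for linearization, the function names, and the use of a set for the CCR (the text describes a hierarchical semantic network). The construction step is written as an iterative fixpoint loop rather than a recursion, and selection simply picks everything not yet sent.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Set

# Hypothetical concept store: each complex concept and the parts it requires.
CONCEPT_STORE = {
    "DELAYED_FLIGHT": {"FLIGHT", "DELAY"},
    "ANNOUNCEMENT": {"DELAYED_FLIGHT", "GATE"},
}
# Hypothetical logical order used for linearization (not surface word order).
PRIORITY = ["FLIGHT", "DELAY", "GATE", "DELAYED_FLIGHT", "ANNOUNCEMENT"]


def construct(ccr: Set[str], perceived: str) -> Set[str]:
    """Add a perceived entity, then build complex concepts until none match."""
    ccr = set(ccr) | {perceived}
    changed = True
    while changed:
        changed = False
        for concept, parts in CONCEPT_STORE.items():
            if concept not in ccr and parts <= ccr:
                ccr.add(concept)
                changed = True
    return ccr


def select(ccr: Set[str], already_sent: Set[str]) -> Set[str]:
    """Choose the concepts to verbalize: here, everything not yet sent."""
    return {concept for concept in ccr if concept not in already_sent}


def linearize(selected: Set[str]) -> Deque[str]:
    """Order the selected concepts and place them in the traverse buffer."""
    return deque(sorted(selected, key=PRIORITY.index))


def generate_pvm(buffer: Deque[str], already_sent: Set[str]) -> Iterator[str]:
    """Send increments one at a time from the front of the traverse buffer."""
    while buffer:
        increment = buffer.popleft()
        already_sent.add(increment)
        yield increment


def conceptualize(perceived_entities: Iterable[str]) -> List[str]:
    """Run the four operations once per perceived entity."""
    ccr: Set[str] = set()
    sent: Set[str] = set()
    increments: List[str] = []
    for entity in perceived_entities:
        ccr = construct(ccr, entity)
        increments.extend(generate_pvm(linearize(select(ccr, sent)), sent))
    return increments
```

Under these assumptions, perceiving FLIGHT and then DELAY sends FLIGHT first and only afterwards DELAY and the newly constructed DELAYED_FLIGHT, so increments reach the formulator before the scene has been fully observed.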
Repairs could be triggered because, as soon as an increment was sent to the formulator, it became inaccessible to the conceptualizer, so generating corrections was the only way to change information. Upon new information arriving which significantly changed a concept in the PVM being sent to the formulator, the difference between the planned and actual utterance content was computed and a correction increment was generated containing information about which concept to change, which concept the formulator should delete, and which information it should add. The formulator received this correction increment, and then made decisions about how the correction was to be treated in accordance with the modular division of labour postulated by De Smedt (1990). Guhe and Schilder showed the consistency of their CLLS correction algorithm with the parallelism constraint commonly attributed to verb-phrase ellipsis. Informally, the lambda structure in CLLS for a correction was structurally the same as for a coordination, so it could be added to the incremental preverbal message simply as another increment. This increment could then combine with the alternation by beta-reduction in the parallel correction structure to yield a message such as CK-314(λx.correction(on time(x), delayed(x))), a message invoking an overt repair in the formulator such as (3.6) above, and correction(CK-314(λx.on time(x)), CK-314(λx.delayed(x))), which would cause a more lengthy overt repair (see (3.7) above).
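A correction increment of the kind described above might be represented as follows. This Python sketch is hypothetical: the `CorrectionIncrement` class, its field names, the `make_correction` and `as_term` functions and the dictionary representation of concepts are assumptions, the predicate "on time" is spelled `on_time` so that it can serve as an identifier, and the terms are rendered as plain strings rather than as CLLS constraints.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CorrectionIncrement:
    concept: str   # which concept in the pre-verbal message is affected
    delete: str    # information the formulator should retract
    add: str       # information the formulator should add instead


def make_correction(sent: dict, updated: dict) -> Optional[CorrectionIncrement]:
    """Compare already-sent content with the updated concept.

    Sent increments cannot be edited, so a change is expressed as a new
    correction increment. Only the first differing concept is reported;
    None is returned when nothing has changed.
    """
    for concept, old_value in sent.items():
        new_value = updated.get(concept, old_value)
        if new_value != old_value:
            return CorrectionIncrement(concept, delete=old_value, add=new_value)
    return None


def as_term(entity: str, inc: CorrectionIncrement, wide_scope: bool) -> str:
    """Render the two message shapes quoted in the text as plain strings."""
    if wide_scope:
        # The correction scopes over two complete propositions.
        return (
            f"correction({entity}(λx.{inc.delete}(x)),"
            f"{entity}(λx.{inc.add}(x)))"
        )
    # The correction sits inside the predicate of a single proposition.
    return f"{entity}(λx.correction({inc.delete}(x), {inc.add}(x)))"
```

In this sketch the choice between the narrow and the wide form is left to the caller through `wide_scope`, since the text assigns the decision about how a correction is treated to the formulator.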
Guhe’s work showed how concepts could be monotonically constructed and trigger repair strategies in the formulator and also developed a rudimentary semantic representation for repair concepts. These achievements fell within the larger research programme of casting the generation of pre-verbal messages as an incremental procedure; however, the work avoids the impenetrable problem of the initial autonomous generation of a concept, starting instead from the input of live events which trigger the input of some predefined conceptual increments to the system. While it is possible to compare the verbal output of the system with that of human beings as shown in Guhe (2007), INC would be difficult to evaluate in terms of its individual contribution in a quantitative way, as pre-verbal message LFs (the input for tactical generation) do not have a widely agreed form (Belz et al., 2010, Section 3.2.1). This is not the case for measuring similarity of string outputs given a gold standard generation input or LF, as is the case for more common surface realisation tasks, for which a wide variety of metrics exist. Also, while the system operated in a dynamic and changing domain, it was not interactive with users: if operating in a dialogic extension of the domain, such as describing the moving scene to a partner who could query the descriptions, more interactive requirements would be put on the conceptualizer.