Generating Arguments in Natural Language

(1)

Generating Arguments in

Natural Language

Chris Reed

Submitted in fulfilment of the requirements

for the award of the degree of Doctor of Philosophy

University College London

(2)

ProQuest Number: U643524

All rights reserved

INFORMATION TO ALL USERS

The quality of this reproduction is dependent upon the quality of the copy submitted.

In the unlikely event that the author did not send a complete manuscript

and there are missing pages, these will be noted. Also, if material had to be removed,

a note will indicate the deletion.

uest.

ProQuest U643524

Published by ProQuest LLC(2016). Copyright of the Dissertation is held by the Author.

All rights reserved.

This work is protected against unauthorized copying under Title 17, United States Code.

Microform Edition © ProQuest LLC.

ProQuest LLC

789 East Eisenhower Parkway

P.O. Box 1346

(3)

(4)

GENERATING ARGUMENTS IN NATURAL LANGUAGE

ABSTRACT

Automated generation of persuasive arguments has a wide range of potential applications, but represents a major challenge to existing natural language generation (NLG) techniques. In this thesis, it is argued that existing approaches fall short in several fundamental ways, and that handling argumentation demands major extensions to the conventional NLG model. Five key extensions are discussed. First, a distinction between the logical and rhetorical components of a text is advocated which is reflected in a similar modularisation of the plaiming task. Second, the adoption of an advanced style of hierarchical planning is proposed which is shown to mirror the hierarchical structure of argument, to increase generative flexibility, and to reduce computational cost. Third, the insufficiencies of a coherence-relation account are enumerated, and employed to motivate a more abstract representation layer drawing on the structural theories developed in argumentation theory. Fourth, conventional models in NLG have represented informational content; more recently, the role of intentional content has been emphasised; here, the importance of the attentional state and its explicit manipulation is also incorporated in a uniform way. Fifth, it is demonstrated that the generation of cue phrases between argument components relies not upon relations holding between clauses, but upon relations between more abstract units of text, and that those cues must necessarily therefore be introduced at an earlier stage of the planning process.

An architecture is proposed which integrates these extensions and formalises components of accounts offered in argumentation theory. This formalisation is carried out through a characterisation of deductive, inductive and ‘fallacious’ argument operators, including Modus Ponens, Modus Tollens, Inductive Generalisation and Ignoratio Elenchi. These argument forms are operationalised (in much the same way as Rhetorical Structure Theory relations have been) as planning operators which employ basic notions not only of belief, but also of saliency. Through a careful analysis of this distinction, argument forms such as the enthymeme, and rhetorical devices such as informing the hearer of facts which he is known already to believe, are shown to be easily accounted for. The architecture is implemented in the Ji^etorica system, which encompasses layers of processing responsible for argument structuring and eloquence generation. H^etorica also employs a body of thirty heuristics, which uniformly represent a variety of the most common guidelines listed in rhetoric and oratory texts of classical, renaissance, Victorian and contemporary authors.

(5)

Table o f Contents

EXORDIUM___________________________________________________________________________ 8 THE PROBLEM SPACE________________________________________________________________ 12

2.1 La n g u a g e 12

2.1.1 Th e Dia l o g ic Sit u a t io n 16

2.1.2 Ag e n t Co m m u n ic a t io n 19

2.1.3 Pla n n in g 21

2.1.4 Na t u r a l La n g u a g e Gen er a tio n 24

2.2 Ar g u m e n t a t io n 27

2.2.1 Ar g u m e n t a t io n Th e o r y: An a l y s is Tec h n iq u esa n d Str u c tu r a l Mo d e l s 28

2.2.2 Ar g u m e n t As A So c ia l Ph e n o m en o n 31

2.2.3 Ap p l ie d Ar g u m e n t a t io n In Fo r m a l Re a s o n in g 33

2.2.4 Co m p u t a t io n a l Appr o a c h e st o Ar g u m en ta tio n 34

2.2.5 Ar g u m e n t a t io nin Mu l t i-Ag e n t Sy stem s 37

FRAMING A SOLUTION 41

3.1 DESIGN DECISIONS 41

3.1.1 Mo n o l o g u eo r Dia l o g u e? 41

3.1.2 Bla c k b o a r do r Pip e l in e? 51

3.1.3 Wh e r et o St o p? 54

3.1.4 RST OR... s o m e th in g e l s e ? 55

3.1.5 Fu n c t io nAUST o r Fo r m a l is t? 59

3.1.6 Be l ie fa n d Sa l ie n c y 60

3.2 Ar c h it e c t u r e 63

3.3 Wo r k e d Ex a m p l e 7 0

FOCUS AND ORDER 75

4.1 Ma in t a in in g F o c u s 75

4.1.1 A S Le v e l Pl a n n in g Ope r a to r s 79

4.1.2 A S Le v e l Pla n n in g 89

4.1.3 In t e n t io na n d At t e n t io n 91

4.1.4 Th e To pic St a c k 93

4.2 Or d er in gfo r Co h e r e n c y 97

4.2.1 He u r is t ic sf r o m Rh eto r ic 97

4.2.2 He u r is t ic sf r o m In t u it io n 99

4.2.3 Ha r da n d So f t Co n str a in ts 100

4.3 Or d e r in gfo r Pe r s u a s iv e Effe c t 100

4.3.1 Pe r s u a s iv ea n d In f e r e n t ia l St r e n g t h 101

4.3.2 He u r is t ic sf r o m Rh e t o r ic 102

4.3.3 He u r is t ic sf r o m Ps y c h o lo g y 104

4 .4 Pu llin git To g e t h e r 105

SURFACE FEATURES 111

5.1 Clues 111

5.1.1 INTERCLAUSE CUE PHRASE GENERATION 112

5.1.2 INTERARGUMENT CLUE GENERATION 118

5.1.3 Pu n c t u a t io na n dfo r m a t t in ga sclu es 122

5.2 EXPLICIT AND IMPLICIT SALIENCY 126

5.3 Rh eto r ic 129

5.4 Ta g g in g 132

5.5 Pu llin git To g e t h e r 135

SYSTEM OUTPUT 141

PERORATION 149

7.1 Ap pr a is a l 7.2 Fu tu r ew o r k 7.3 Co n t r ib u t io n s

APPENDICES

149 154 160

(6)

. GENERATING ARGUMENTS IN NATURAL LANGUAGE

Table o f Figures

Fig u r e 2.1 Th ef o u rb a s ica r g u m e n ts t r u c t u r e s, after (Fr e e m a n, 1 9 9 1 , p2 ) ... 2 9

Fig u r e 2 .2 To u l m ins c h e m aa n de x a m p l e... 30

Fig u r e 2 .3 Sa m p l esit u a t io nb e t w e e ns p e a k e r, S a n dh e a r e r, H ... 39

Fig u r e 2 .4 Ty p e so f Dia l o g u e, a d a p t e df r o m (Wa l to na n d Kr a b b e, 1 9 95, p6 6 ) ... 4 0 Fig u r e 3.1 Sa m p l ep r o c e s s, (a), p r o d u c t, (b), a n da n a l y s is, (c) o fm o n o l o g u e... 50

Fig u r e 3 .2 Pipe l in em o d e l sa n dt e r m in o l o g y... 53

Fig u r e 3 .3 De e pn e s t in go fb e l i e f s... 61

F ig u r e 3 .4 A r c h t i e c t u r e o f t h e ^r^ 'e k x rjc a. s y s t e m ...63

Fig u r e 3 .5 (a) Sa m p l e R ST t r e ea n d (b) e q u iv a l e n t SemNe tr e p r e s e n t a t io n... 64

Fig u r e 3 .6 AbN L P o pe r a t o rc h a r a c t e r is in g Mo d u s Po n e n s... 65

Fig u r e 3 .7 Sa m p l ea r g u m e n ts t r u c t u r e...7 0 Fig u r e 3 .8 Fir s t A S p l a nw ithp a r t ia lo r d e r... 71

Fig u r e 3 .9 Re s u l to fr e f in e m e n t (a) b e f o r e, a n d (b ) a ftc ro r d e r in g...7 2 Fig u r e 3 .1 0 Hig hl e v e lp r o c e s s in gc y c l e... 73

Fig u r e 4.1 Ma jo rt h eo r ie so ff o c u s... 78

Fig u r e 4 .2 Th en in ed e d u c t iv eo p e r a t o r sa tt h e A S l e v e l... 82

Fig u r e 4 .3 Sa m p l esc e n a r iof o rr e b u t t in ga n du n d e r c u t t in g...83

Fig u r e 4 .4 Th eo p e r a t o r s U C P a n d U C I... 84

Fig u r e 4 .5 Th eh ie r a r c h ic a ls t r u c t u r eo fa nin d u c t iv eg e n e r a l is a t io n... 85

Fig u r e 4 .6 Th eo p e r a t o r s IG a n d I S U P ... 86

Fig u r e 4 .7 Th e A P o p e r a t o r... 88

Fig u r e 4 .8 Th e IE o p e r a t o r...88

Fig u r e 4 .9 Th e M A K E _ S A L IE N T o p e r a t o r...89

FIGURE 4 .1 0 THE PU SH _T O P IC AND P O P_T O PIC OPERATORS... 93

Fig u r e 4 .1 1 Th elim it so fp a r t ia lo r d e r... 9 4 Fig u r e 4 .1 2 Th er u n t im es p e c ih c a t io nf o r M A K E _ S A L IE N T ... 95

Fig u r e 4 . 13 Th er u n t im es p e c ih c a t io nfo r P U S H _ T 0 P IC a n d P 0 P _ T 0 P I C ... 9 6 Fig u r e 4 .1 4 Th ed é f in it io no f H C R l : g r o u p in gt o p ic s... 98

Fig u r e 4 .1 5 Th ed e f in it io no f H C R 2: r e d u c eb r e a d t h... 9 9 FIGURE 4 .1 6 THE DEFINITION OF H C I l : REMINDING OF IMPLICATION...100

Fig u r e 4 .1 7 Th ed e f in it io no f H P R l : c l im a xo r d e r in g... 102

Fig u r e 4 .1 8 Th ed e f in it io no f H PR 2: c o n c l u s io nf ir s t... 103

Fig u r e 4 .1 9 Th ed e f in it io no f H PR 3: w e l lk n o w nf i r s t...103

Fig u r e 4 .2 0 Th ed e f in it io no f H PR 4: r e f u t a t io n sfir st... 104

FIGURE 4 .2 1 Th e DEFINITION OF H P P l: NO WEAK REFUTATIONS... 104

Fig u r e 4 .2 2 Th epr e f e r e n c eo r d e ra m o n g s th eu r ist ic sf r o mm o s tt ol e a s tp r e f e r r e d 105 Fig u r e 4 .2 3 Th ea r g u m e n ts t r u c t u r efo rt h ev e g e t a r ia n isma r g u m e n t... 107

Fig u r e 4 .2 4 Th ein it ia lk n o w l e d g eb a s ef o rt h ev e g e t a r ia n isma r g u m e n t...108

Fig u r e 4 .2 5 Co n t e x tp a r a m e t e r sf o rt h ev e g e t a r ia n is ma r g u m e n t...108

Fig u r e 4 .2 6 Med w ays t a t u so ft h ep l a n n in gpr o c e ssfo rt h ev e g e t a r ia n is ma r g u m e n t 109 Fig u r e 4 .2 7 Fi n a lp l a no fp r im it iv e sforv e g e t a r ia n is ma r g u m e n t...110

Fig u r e 5.1 Th e C L U E -M T h e u r is t ic...115

Fig u r e 5 .2 Ex a m p l e s 6 .1 0 a n d 6 .1 1 fr o m (Kn o t t, 1996, p1 0 3 ) ... 115

Fig u r e 5 .3 Th e C L U E -U C P a n d C L U E -U C I h e u r is t ic s... 118

Fig u r e 5 .4 Cl u ec a t e g o r ie s, a f t e r (Co h e n, 1 9 8 7 , P l 5 ) ... 119

Fig u r e 5 .5 Th e C L U E -P A R A L L E L h e u r is t ic... 119

Fig u r e 5 .6 Th e C L U E -IN F E R E N C E h e u r is t ic...120

Fig u r e 5 .7 Th e C LU E-D ETA EL h e u r is t ic... 121

Fig u r e 5 .8 Th e P U N C -B R E A K h e u r is t ic... 124

Fig u r e 5 .9 Th e P U N C -F N l h e u r is t ic...125

Fig u r e 5 .1 0 Th e P U N C -F N 2 h e u r is t ic... 126

Fig u r e 5.1 1 Th e E C -E P IST E M IC 2 h e u r is t ic...128

Fig u r e 5 .1 2 Th e E C -L E N G T H I h e u r is t ic...129

Fig u r e 5 .1 3 Th e R H E TO R IC -PE h e u r is t ic... 131

Fig u r e 5 .1 4 Th e E C -Q U A L IF IE R l a n d E C -Q U A L IF IE R 2 h e u r i s t i c s...132

Fig u r e 5 .1 5 Th e A F F E C T -U C h e u r is t ic...134

Fig u r e 5 .1 6 Th e A F F E C T -G O O D h e u r is t ic s... 134

(7)

Fig u r e 5 .1 8 Th e C L U E -D S in s t a n t ia t io nint h ev e g e t a r ia n is m a r g u m e n t...136

Fig u r e 5 .1 9 Th e P U N C -B R E A K h e u r is t icin s t a n t ia t e dint h ev e g e t a r ia n is ma r g u m e n t 137 Fig u r e 6.1 St r u c t u r eo ft h e ‘t o u r is tf a c il it ys ig n s’ a r g u m e n t... 142

Fig u r e 6 .2 Sy s t e m Bel ie fsa n dp a r a m e t e r sfo rt h e ‘t o u r is tfa c il it ys ig n s’ a r g u m e n t 142 Fig u r e 6 .3 Fin a l o u t p u tfo rt h e ‘t o u r istfa c il it ys ig n s’ a r g u m e n t...143

Fig u r e 6 .4 St r u c t u r eo ft h e ‘Cl a r e Sh o r t’ a r g u m e n t... 145

Fig u r e 6 .5 Sy s t e mb e l ie f sfo rt h e ‘Cl a r e Sh o r t’ a r g u m e n t... 145

Fig u r e 6 .6 Fin a l o u t p u t fo rt h e ‘Cl a r e Sh o r t’ a r g u m e n t...146

Fig u r e 6 .7 St r u c t u r eo ft h e ‘Ir ishc e n s u s’ a r g u m e n t...147

Fig u r e 6 .8 Sy s t e mb e l ie f sint h e ‘Ir ishc e n s u s’ a r g u m e n t... 147

Fig u r e 6 .9 Fin a l o u t p u tfo rt h e ‘Irishc e n s u s’ a r g u m e n t... 148

Fig u r e 7.1 Su m m a r yo fr e su l t sf r o m P R E S a sr a wd a t aa n dr a n k i n g s... 152

Fig u r e 7 .2 Su m m a r yo fr a wr e s u l t sf r o m P R E S ... 153

Fig u r e 7 .3 Su m m a r yo fr e su l t sf r o m P R E S a n a l y s e da sr a n k in g s... 153

Fig u r e 7 .4 Su m m a r yo fr e su l t so np r im a c y/r e c e n c ye ffec tsin P R E S ... 155

Fig u r e C .l i^EZOi^/oiPROCESsiNG fo rt h e ‘t o u r istfa c il it ys ig n s’ a r g u m e n t...173

Fig u r e C .2 f^ETOiÇ/cü-M iN us-EG p r o c e s s in gfo rt h e ‘t o u r is tfa c il it ys ig n s’ a r g u m e n t 176 Fig u r e C .3 p r o c e s s in gfo rt h e ‘Cl a r e Sh o r t’ a r g u m e n t...177

Fig u r e C .4 ! ^e205Ç/o i-m in u s-E G p r o c e s s in gfo rt h e ‘Cl a r e Sh o r t’ a r g u m e n t... 179

Fig u r e C .5 ! ^e205Ç/o ïp r o c e s s in gf o rt h e ‘Ir ishc e n s u s’ a r g u m e n t... 180

Fig u r e C .6 f^ŒntMÇ/ca-MiNus-EG p r o c e s s in gfo rt h e ‘Ir is hc e n s u s’ a r g u m e n t... 181

(8)

GENERATING ARGUMENTS IN NATURAL LANGUAGE

Acknowledgements

I gratefully acknowledge the financial support provided by EPSRC, and the additional contributions of the following bodies: AISB, Brunei University Department of IS & Computing, Durham University Department of Computer Science, UCAII, SSHRC, UCL Department of Computer Science, and the UCL Graduate School.

During the course of my doctoral studies, I have had the privilege to discuss preliminary (and often incoherent) ideas with eclectic, exciting, stimulating and exceedingly patient researchers. I would like to thank the following for ideas and comments which have contributed to the development of my work: Tony Blair, Robert Dale, Robin Fawcett, John Fox, Maria Fox, James Freeman, Roberto Garigliano, Michael Gilbert, Tom Gordon, David Green, Stephen Green, Leo Groarke, Mandy Haggith, Hans Hansen, Jaakko Hintikka, Graeme Hirst, Helmut Horacek, Ed Hovy, Xiaorong Huang, Kristiina Jokinen, Alistair Knott, Eric Krabbe, Daniel Marcu, Tim Norman, Franck Panaget, Cecile Paris, the late Elsa Pascual, Larry Powers, Ivan Rankin, Ehud Reiter, Francisca Snoeck Henkemans, Bart Verheij, Doug Walton, and John Woods. I would also like to thank my examiners, Donia Scott and Simon Parsons, for their diligent reading of my work and their useful comments.

I have also had the benefit of the support and encouragement of friends and family who put up with me going on and on about argumentation. Thanks to all, and particular thanks to Mum, Dad, John, Aspassia, Nancy and Jon.

For teaching me how to write, for long discussions, for giving me the freedom just to get on with it, for answering the phone and discussing papers in the wee small hours of a deadline day, for an excuse to visit Durham regularly, for occasional motivation boosts, for a scrupulous approach to research, for pointing out my mistakes and helping me towards solutions, for lending me good books that he will now finally get back, for the best vegetarian cooking, and for unbelievable patience, I would like to thank my supervisor, Derek Long. Without Derek, it would not have been nearly so much fun.

(9)

I

Exordium

Persuasive text has a variety and abundance of entomological proportion. Advertisments, editorials, academic papers, letters to the editor, parliamentary speeches, materials for education, religious pamphlets and more, all offer examples of text which aims to alter the beliefs of some audience, and might therefore be considered persuasive. Yet despite the enormously important role played by persuasion in natural communication, only a handful of models have been built to investigate the process by which such text is created. Furthermore, there is a strong trend amongst this work to consider only the logical structure of an argument, and to denigrate textual argument from the status of an elegant, complex interplay between linguistic, psychological and interpersonal factors, to that of little more than a set of propositions. The reasons for this approach are doubtless rooted in pragmatism, expediency, and a desire for simplicity, but the result is a reduction in flexibility and expressiveness to a level at which almost none of the phenomena explored by argumentation theorists from Aristotle onwards can be accounted for. To describe a praying mantis and a giant peacock moth alike, by enumerating legs, wings and antennae, is to miss the point rather.

(10)

I. EXORDIUM 9

representation than alternatives available in the pragmatics community as a whole. In addition to the propensity for argument to employ and manifest a high degree of structuring, it is also clear that most argument involves, specifically, a hierarchical organisation of components. This hierarchical nature can be exploited in the employment of a hierarchical generation technique: such a technique is not only available, but also has a good pedigree in artificial intelligence, and has been demonstrated to be highly suited to the natural language generation (NLG) field. The use of hierarchical planning in NLG is an established approach to which the current work subscribes, but with one important departure with respect to the style of hierarchical planning employed.

One final aspect motivating investigation of argument in particular is the potential utility of a system capable of generating NL argument. Persuasive argument is primarily concerned with shifting belief in a given audience (the picture is actually rather more complex than this, as explained below). Thus wherever it is necessary to alter the beliefs of a user of a system, the ability to generate persuasive argument is critical. Many examples are currently under investigation in NLG - health education materials aimed at inducing a disinclination to smoking, expert systems justifying their conclusions, decision support systems offering critiques of user decisions, etc. Many more can be conceived of - a service provider demonstrating its superiority, generating adverts tailored to a particular audience, creating political campaign materials, etc.

The current work, then, focuses on the genre of persuasive text, and in so doing, cuts through a range of important issues in NLG. One recent concern in NLG has been the role played by intentions in the generation process, contrasting with the primarily informational approach predominant until the early 90’s. The current work proposes an approach which not only integrates informational and intentional facets of generation, but also draws in the attentional component into a single framework, thus uniformly modelling each element of the triumvirate controlling discourse structure. Another current NLG issue addressed concerns the introduction and placement of cue phrases (or clues in the specific of argumentation). A major tenet of the current work is that clue introduction is often dependent upon very high level processing - specifically, that as a clue may function to relate two large segments of text (e.g. a section break), it is appropriate for a generation algorithm to introduce such a clue at the same level of abstraction as the units of text which it connects.

(11)

concentrated predominantly on the logical structure of argument, so there has been an implicit assumption in much relevant generation work that coherency will ensure persuasiveness. Yet psychological research has demonstrated that such a notion of idealised human rationality is mistaken: ordering of pro and con arguments, phrasing, and repetition have all been demonstrated to significantly impact the reception of a particular text. Work in argumentation theory and rhetoric contributes to an exploration of the distinction and its manifestation in text, and from there, its role in the generation process.

In addition to this dualistic underpinning explored primarily through the distinctions between logic and rhetoric, and coherency and persuasive effect, there are also a number of further claims of a more technical nature. These include an approach to the handling of the disjunctive constraints holding over the partial order of a plan, the development of a highly parsimonious representation language, the explicit computational characterisation of the relationship between linked and convergent argument support, and the design of a control structure for high-level processing which departs from the classical pipeline model.

One swathe of NLG issues which has been largely ignored in the current work is that of tactical generation - the problems of deciding which lexical, syntactic, and morphological constructions best convey the intended message. In this division of the generation task between tactical and strategic levels of processing, the work follows the prevailing view in NLG, and similarly has several precedents in its implementation of only the strategic level. Although following this route distances the work from real text, and as a result stores up problems for evaluation, there are a number of benefits, chief of which is the ability to focus - as discussed below, if the assumptions made about other, unimplemented, stages in the framework are realistic (e.g. by reference to existing research investigating those stages), then the decoupling is less worrisome, and provides an opportunity to concentrate solely on higher level functionality.

The theoretical claims that are laid out are further supported through their integration into the

iR^ietOTica ^ system which implements the upper (i.e. more abstract) layers of the generation framework. Each of the various facets of argument generation are formalised and implemented, resulting in a coherent system which not only demonstrates the validity of the theoretical approach, but also offers a means of evaluating the components of that approach empirically.

(12)

I. EXORDIUM 11

these focus on persuasive rather than deductive aspects of argument. Sixth, some few additional rules are suggested merely by intuition - these are primarily features which rhetoricians and psychologists have taken for granted, but which need to be made explicit in an operationalisation. Finally, there are also contributions from the study of a small corpus. This corpus is composed of several dozen arguments taken from a variety of sources, including letters to the editor of a newspaper, editorials and leader comments in magazines, advertisements, and a precis of a monograph in paleontology. In most cases, generalisations of structuring rules observed in this corpus are already covered by components of one of the first six categories, but several additional rules were also extracted. It is the identification, formalisation and integration of rules and heuristics from these various sources which forms the core of the current work.

The thesis is broadly structured in such a way that the two key threads - NLG and argumentation - are first presented and reviewed separately, then introduced to one another and gradually woven together into a coherent whole, from which it is demonstrated that a variety of benefits and avenues for future investigation can be derived. Chapter two performs the initial task of reviewing first NLG research, and then argumentation theory. Though primarily compendious and exegetical, the chapter also identifies problems in argumentation theory, misunderstandings and misconstructions of argument in the computational domain, and current issues in NLG with particular consideration for the generation of argument. Chapter three then offers solutions to some of these issues in the process of dissecting and motivating the major design decisions taken in building the

(13)

n

The Problem Space

The two threads from which the thesis is composed are here introduced and their history explored: firstly, the linguistic components, including a brief scene-setting summary on the theoretical background to many assumptions made in natural language generation (primarily on the nature of speech act theory), followed by a discussion of the dialogic situation and its components. Finally, the key milestones in the development of NLG theory are discussed, and a survey of more recent work presented. The second thread is then introduced with a survey of current work in argumentation theory (again motivated by a brief discussion of the roots of the work), summarising several major models of argument analysis including those founded on social and interpersonal considerations. This work in ‘informal logic’ is then contrasted with more conventional, formal approaches to argument. Such formalisation carries with it a range of problems which are then inherited by computational systems adopting the formal approach: the formalisations, systems and associated problems are explored in order to motivate the framework presented in the next chapter.

2.1 Language

Before the work of Austin, straightforward sentences had generally been regarded in the philosophy community as constative statements with which a truth value could be associated^. In (Austin, 1976), however, it is demonstrated that sentences such as I bet you sixpence it will rain tomorrow cannot be constative - it makes little sense to claim that the statement is true or false. Rather, the sentence itself is an act:

“... to utter the sentence is not to describe my doing of what I should be said in so uttering to be doing or to state that I am doing it: it is to do it.” (Austin, 1976, p6)

(14)

n. THE PROBLEM SPACE 13

is not simply a constative communication, it is also an act - one of stating belief (compare, for example the more explicit performative I state that I believe X).

Austin then goes on to explore the variety of ways in which such performatives can be ‘unhappy’ or infelicitous, i.e. the situations in which they are void or abused. He identifies six constraints on the successful execution of a performative, and each then forms the basis for a characterisation of a particular infelicity. A misapplication occurs in a situation where a conventional procedure is incorrectly instantiated by actors and objects (e.g. a layman declaring two people married); a related but rarer infelicity occurs where such a conventional procedure does not exist at all. Together, these two are termed misinvocations. In a situation where the conventional procedure is incorrectly or incompletely executed, the infelicity is termed misexecution (and, specifically, d ifla wrefers to incorrect and hitch to incomplete execution). Finally, an abuse results firom the insincerity of one or more of the actors, either in not possessing the thoughts and feelings demanded by the conventional procedure (e.g. betting without intending to pay) or in failing to meet constraints on subsequent conduct (e.g. failing to pay a gambling debt).

For the purposes of the current discussion, however, this taxonomy of the infelicities of speech acts is of less direct concern than the subsequent complimentary development of a theory of the necessary and sufficient conditions for the successful performance of speech acts, proposed in (Searle, 1969). Searle builds on Austin’s distinction between locutionary, illocutionary and perlocutionary acts, all of which are present in any utterance. The locutionary component is simply the production of noise which accords with the phonetic, morpohological and syntactic constraints of a language; the illocutionary component characterises the act performed in making the appropriate noise (as opposed to the performance of the act o f making that noise); the perlocutionary component is the effect that that noise may have - the act performed by making the noise. Austin’s example (Austin, 1976, p i 02) distinguishes a locutionary act He said 'Shoot her! ' fi-om a corresponding illocutionary act. He urged me to shoot her, and again from a perlocutionary act. He persuaded me to shoot her. Searle focuses upon the structure of illocutionary acts, and sets out precise prerequisites for their successful execution (Searle, 1969, pp66-7). These prerequisites make use of two types of constraint: those concerning the beliefs of the speaker, and those concerning the intentions of the speaker. For the latter, Searle draws upon Grice’s intention analysis, (Grice, 1957), where it is proposed that “a speaker S meant something by X where S intended the utterance of X to produce some effect in a hearer H by means of the recognition of this intention.” (paraphrased in (Searle, 1969, p43)). Searle develops this proposal in two respects: first by characterising the role played by conventions in determining meaning, and second by modifying the implicit assumption that the speaker intends some perlocutionary effect. This second development is in recognition of the subtle distinction between a hearer understanding a locution, and being persuaded/convinced/etc. The latter is the perlocutionary effect described in the Gricean analysis, the former the illocutionary effect employed in Searle’s definition (though of course the definition then requires some analysis of what it is to ‘understand’ a locution: Searle’s analysis employs both the

^ As Austin points out, earlier philosophy had not failed to recognise that some sentences are nonsensical (in a non- grammatical way) or involve clauses which are not descriptive, hence the restriction on the generalisation to

(15)

characterisation of conventions in determining meaning and the reflexive account of illocutionary effect).

The conditions which must be met for the successful carriage of various classes of illocutionary acts constrain four facets: the prepositional content (e.g. in a promise, that the utterance predicates some future act A of S), the necessary preparatory conditions (e.g. in a promise, that H

would prefer 5 ’s doing A to her not doing A, S believes H has this preference, and S would not normally perform A), the requisite sincerity (e.g. in a promise, that S intends to do A), and the ‘essential’ condition, or what the act constitutes (e.g. in a promise, that S intends that the utterance will place her under an obligation to perform A). The means by which uttering a statement meeting the prepositional content, preparatory and sincerity conditions constitutes the realisation of the essential condition are then defined in terms of the illocutionary effect mentioned above^.

The clarity of Searle’s characterisation of the constraints by which illocutionary acts are bound, and the lack of a formal analysis of the adequacy of his account, led to an attempt by Cohen and Perrault to provide such an analysis through a computational model of speech act arrangement (Cohen and Perrault, 1988). The first premise of their model is that language can be characterised as a goal directed behaviour (and indeed, a similar assumption underpins most research in computational approaches to the pragmatic aspects of natural language). Although a similar assumption is present in the work of both Austin and Searle, it is important to recognise a potential misunderstanding which may be the result either of the translation from Searle to computational work, or from the development of the computational theories over time. In these theories, the goal invariably involves the state of some hearer; in Austin and Searle (and more explicitly in the latter), this need not be the case, particularly as it could lead to the conflation of illocutionary and perlocutionary effect - the disentanglement of which was one o f Searle’s major contributions. Searle gives an example: “I may make a statement without caring whether my audience believes it or not but simply because I feel it my duty to make it.” (p46). Although this example could well be accounted for in terms o f goal fulfilment, it is rather different to the speech act goals explicated in (Cohen and Perrault, 1988).

Cohen and Perrault’s assumption of goal-directed behaviour permits their development of a plan-based approach, viewing individual illocutionary acts as plan operators with specific preconditions (i.e. those described by Searle under the heads ‘prepositional content’, ‘preparatory’, etc.), and

(16)

characteristic effects (i.e. the essential condition/. The planning itself employed by Cohen and Perrault is unremarkable (its predecessors and successors are discussed in more detail in §2.1.3) but its use in organising the pragmatic structure of text represents a crucial development in natural language generation (surveyed in §2.1.4) - despite the primary aim of the paper being one of substantiating claims in the philosophy of language, rather than introducing a new framework for the construction of applied text generation systems. It is interesting to note that speaker intentions at the level of individual illocutionary acts were recognised to play a key role; at the level of larger pieces of text, addressed in subsequent NLG work, this role was heavily underestimated until quite recently. The development of intention-laden approaches is traced in §2.1.4, and the discussion returns to the issue in chapters three and four.

An important development o f Grice’s formulation is presented in (Sperber and Wilson, 1986), which adduces the aforementioned results from Searle, in addition to alternative reformulations by Strawson, Schiffer, etc. (placing particular emphasis on (Strawson, 1964)). Sperber and Wilson focus upon the situated act of communication and its cognitive components. Their starting point is that communication is ostensive behaviour: “behaviour which makes manifest an intention to make something manifest” (p49), where manifestness is defined as “[being] capable of representing [a fact] mentally and accepting its representation as true or probably true” (p39)^. Facts are more or less manifest within an agent’s cognitive environment, a combination of his perceived environment and his cognitive abilities (i.e. of percepts and stored facts). Sperber and Wilson then define the situation of communication as one involving a mutual cognitive environment, the intersection of interlocutors’ individual cognitive environments, in which it is also manifest which agents share it. They also demonstrate (p41) that the notion of a mutual cognitive environment, unlike mutual knowledge or mutual belief, is tractable (the discussion returns in more detail to the problems of mutual belief and the notion of saliency - akin to that of manifestness - in §3.1.6).

One of the key claims in (Sperber and Wilson, 1986) concerns the informative intention of a speaker, and the means by which this can lead to a definition of relevance. Given the definitions of manifestness and the ostention of conununicative behaviour, Sperber and Wilson claim (p58) that a speaker has an informative intention in making an utterance to make (more) manifest a set of facts {/} (though an explicit individuation of {/} is not required). A speaker’s intention is thus to modify the

^ Note that Searle distinguishes the essential condition (p60) “S intends that the utterance T will place him under an obligation to do A” (emphasis added), from the essential rule (p63) ‘T counts as the undertaking of an obligation to do A”. The former constrains the application of the illocutionary act, the latter the linguistic devices used to convey the associated illocutionary force. This distinction seems not to have been recognised by Cohen and Perrault, with the result that they map from the essential nde (which they term the essential condition) to the effect of a planning operator (i.e. an illocutionary act). This seems to lose an important aspect of Searle’s account, in which mapping from the essential condition (i.e. S’s intention that an utterance should count, in the case of a promise, as an obligation to perform A) to the essential rule (i.e. the that the utterance actually does count as that obligation) is nontrivial. Indeed, the mapping requires invocation of the post-Oricean definition of illocutionary effect mentioned above: this invocation is termed by Cohen and Perrault the force condition, about which they state “we have chosen not to deal with the force condition until we have a better understanding of the plans for speech acts and how they can be recognized” (Cohen and Perrault, 1988, pl75). It is unclear to what extent this omission damages the account and those drawing upon it, but it is worrying that the effects of the plan operators - i.e. the illocutionary acts - are characterised in terms of what Searle intended to be viewed as constraints on linguistic realisation.

(17)

cognitive environment of the hearer (and thereby their mutual cognitive environment). On hearing an utterance, an interlocutor thus has a number of sets of information available: information that is already known (and manifest), information manifest in the perceptual environment, and information made manifest through ostensive communication. The first set, {C}, is ‘old’ information; the two others together present ‘new’ information, {F}. There are three potential relations between the two sets {C} and {?}: (i) {F} may be so different as to yield no contextual implications - no new deducible information based on the union o f {C} and {F} (e.g. reading an advanced academic paper in a technical field which is not one’s own); (ii) {F} may be so small or so trivial (with respect to {C}) as to allow no contextual implications (e.g. reading a glossary which serves only to equate terms with which one is already familiar); (iii) (F) may be sufficiently new and interesting - but not so unconnected with {C)

-as to allow a multiplicity o f contextual implications (e.g. reading a technical paper in one’s own field). In the third case only is the information in {F} relevant - the more contextual implications which are derivable, the more relevant the information^. Finally, Sperber and Wilson demonstrate that “every act of ostensive communication communicates the presumption of its own optimal relevance” (pl58), which they term the principle o f relevance - the ‘presumption of optimal relevance’ draws on two parts: first, that the set {/} which the speaker intends to make manifest is sufficiently relevant (using the above definition) to make it worth the while of the hearer to process the communication, and second, that the ostensive act is the most relevant that could have been used to communicate {/}’.

One o f the key developments in (Sperber and Wilson, 1986) is the emphasis placed upon the whole dialogic situation, rather than taking a more restricted, heavily speaker-centred view, as adopted by Searle, and developed explicitly in (Cohen and Perrault, 1988) as the “point of view principle”. The next section is devoted to characterising in more detail the various aspects from which this rich notion of the dialogic situation is comprised.

2.1.1 The Dialogic Situation

To engage in dialogue, a system needs to maintain a model of the user. This necessity arises from a number of demands placed upon such a system. In the first place, a basic understanding o f user utterances relies upon more than just an ability to analyse the communicated message: deictic reference, anaphora, ellipsis and indirect speech acts all require some understanding of the cognitive state of the user (in Sperber and Wilson terms, of the user’s cognitive environment). Secondly, most dialogue (task-oriented, information-giving, etc.) is cooperative in nature; to achieve this cooperation, a system must be able to infer the beliefs, goals, plans, and intentions of the user. A third requirement, extending a minimal level of cooperation, also relies upon competent user modelling: the ability to volunteer unsolicited information, as found in Jameson’s IMP system (Jameson, 1989). Fourthly, the generation of system utterances needs to be tailored to the user: ellipsis, for example, can be tested against a model of the user to determine whether or not it would be understood (using an anticipation feedback loop), as employed in the HAM-ANS system (Jameson and Wahlster, 1982). The same

process can also be used to determine appropriate ajfect (Hovy, 1986) - i.e. how to select appropriate ^ In fact, Sperber and Wilson’s characterisation is rather more subtle than this, in that it considers the relative weights of information in {C} and {F}: this refinement is not relevant to the current discussion.

(18)

lexicalisation to pander to a user’s particular bias, also employed in HAM-ANS (Morik, 1989). Fifthly, the ability of the user to understand various versions of a text should not be assumed: Paris’ TAILOR system (Paris, 1993), for example, distinguished novice and expert users in tailoring explanation construction. Finally, another imperfection in the user’s competence makes a further demand on a system’s user model: the ability to cope with user misconceptions(McCoy, 1986), (Calistri-Yeh, 1991). Together, these demands suggest that “user models constitute an indispensable prerequisite for any flexible dialog [system]’’(Wahlster and Kobsa, 1989, p5) (though Sparck Jones (1991) cautions against building models which are unnecessarily sophisticated).

Although the structure of user models exhibits great diversity, several features are common to a number of systems (Kass and Finin, 1988). The starting point for many user models is a set of default assumptions about a user. Such defaults are often arranged into sets associated with a particular user class, providing a stereotypical characterisation of that class of user. One of the earliest systems to adopt this approach was GRUNDY (Rich, 1989), which requested the user to give a short description of themselves, on the basis of which, GRUNDY would identify appropriate stereotypes. In extensions to the basic notion, individual users could belong to numerous stereotypical classes, with the latter arranged in specificity hierarchies to resolve conflicts (Finin, 1989). If appropriate, specific observations could then override the values determined by reference to the stereotype (thus demanding nonmonotonicity in the supporting reasoning mechanism).

The model itself is typically founded upon some notion of belief, frequently one derived from Hintikka’s (Hintikka, 1962) modal characterisation (though popular alternatives are surveyed in (Wahlster and Kobsa, 1989)). Furthermore, the beliefs of the user are rarely captured adequately by a static model: in order to handle the necessary dynamic nature, a truth maintenance system (either justification based (Doyle, 1979), as used in TRUMP (Bonarini, 1987) or assumption based (de Kleer,

1986) as used in GUMS (Finin, 1989)) is required.

Dynamic update of the user model during a dialogue involves a number of challenging tasks: key amongst these are belief ascription and plan ascription. Though clearly related, the former concentrates on inference drawn at the level of the individual sentence (considering presupposition, affect, etc.) (Jameson, 1989), whilst the latter focuses on higher level recognition of the broader goals and associated plans of the user (Carberry and Pope, 1993), (Goodman and Litman, 1992). A range of approaches have been proposed to the latter problem, including those based on defeasible reasoning (Konolige and Pollack, 1989), and relatedly, abduction (Appelt and Pollack, 1992); those concentrating on coping with ill-formed user input (Eller and Carberry, 1992); and those that recognise the need to constrain the inference process (Mayfield, 1992).

(19)

helping, action-seeking, information-seeking, information-probing, instructing and griping. Associated with each dialogue are a number of roles (to be adopted by the parties playing a particular game), a topic (the prepositional content of the game) and specific subgoals (which the parties undertake to fulfil according to their roles). In a style similar to the ‘adjancy pair’ theory of Schegloff and Sacks (1973), they identify canonical openings and closings to the various dialogue games - typically, for example, a game is initiated with a ‘proposal’ utterance followed by an ‘acceptance’ utterance.

Identification of these dialogue games. Levin and Moore suggest, facilitates an extension to the speech act account summarised above whereby understanding of indirect speech acts (i.e. those whose linguistic structure alone is insufficient to enable identification of illocutionary force, such as “Can you pass the salt?”) can be achieved by using knowledge of the current dialogue game to ‘disambiguate’. A given party attempts to understand an utterance of the other party by adopting the

meta-goal o f comprehension: “To comprehend an utterance, find some previously known goal of the speaker which this utterance can be seen as furthering” (Levin and Moore, 1977, p415) (the approach is thus similar in spirit to the Gricean notion of conversational postulates (Grice, 1975)). In their conclusion. Levin and Moore note that the dialogue games characterisation thus rests upon a bilateral analysis of action, in contrast to the unilateral actions embodied in speech acts; through such an analysis it then becomes possible to reduce the number and complexity of the speech acts involved.

The internal structure of the dialogue games themselves is quite unconstrained - the rules of exchange and of interaction between utterances are only very loosely specified. A more formal approach is provided in the logical accounts of the microstructure of dialogue offered in, for example, (Lorenz, 1982), which sets out a formal characterisation of the rights and duties incurred by interlocutors on a tum-by-tum basis. Concentrating on the structure of reasoned dialogue, Lorenz specifies restrictions on which interlocutor may put forward attacks or defences of a position at particular points in a dialogue. In particular, he claims that attacks are a right (i.e. either party may table an attack at any point) and the defences are a duty (and, specifically, that a defence must be provided in immediate response to an attack unless the defender has a counterattack available). Lorenz employs a game-theoretic foundation in his model, but other authors opt for a variety of alternative techniques: Heidrich (1982) uses a Montague grammar, Leopold-Wildburger (Leopold-Wildburger, 1982) decision theory, and Apostel (1982) action theory, for example. Apostel also makes the point that in addition to dialogue games, there is also a need for meta-games, “having as their object other games or leading to the modification o f the rules of the game” (Apostel, 1982, p i 09) - these are of particular importance in jurisprudential dialogue (Feteris, 1997).

The characterisation of the permissible moves by the interlocutors is also highly heterogeneous across the field: Hintikka (Hintikka and Hintikka, 1982), (Hintikka et a l, 1996), for example, advocates a primarily interrogative approach based upon question-and-answer exchanges, whereas Carlson’s model (Carlson, 1983), offers a rich model encompassing (amongst others) assertions, questions, acceptances, presuppostions and intonation.

(20)

Updated during a dialogue, and (iv) rules of interaction. His system, DL3, is an attempt to generalise over these logics, drawing in particular on his earlier work on DL and DL2, and on the related BQD system of Mackenzie(1979). Mackenzie’s work is of particular interest because it has been shown to be directly amenable to implementation; Pilkington et al. (1992) demonstrate how an implementation of BQD can form a core component in a computer aided learning system.

Indeed, both linguistic approaches to dialogue structure (such as those of (Schegloff and Sacks, 1973) and (Levin and Moore, 1977)), and dialogue logics in general are often specified to a sufficient level of detail to facilitate computational interpretation. Freeman and Farley (Farley and Freeman, 1995), (Freeman and Farley, 1994) offer a recent example of the latter; of the former, accounts of linguistic structure (particularly those based upon Rhetorical Structure Theory, discussed in more detail in §2.1.4 and §3.1.4) extended to integrate a model of exchange structure, have been demonstrated to be of use in natural language generation(Daradoumis, 1996), (Fawcett and Davies, 1992).

One of the most promising routes for computational application of research into dialogue structure, however, is the formalisation of dialogue logics - particularly the rich system developed in (Walton and Krabbe, 1995) - for the communication demanded by increasingly complex multi agent systems.

2.1.2 Agent Communication

Though there is continuing debate over the meaning and scope of the itim agent (Nwana, 1996) (the discussion returns to this point in §2.2.5), it seems clear that in any non-trivial multi-agent system there will need to be a means of communication between component agents, and indeed the capability for an agent to be able to communicate with its peers is often taken as a defining feature of agenthood (Wooldridge and Jennings, 1995) (and in some cases, the sole defining feature (Genesereth and Ketchpel, 1994)).

Though pre-dating much of the multi agent terminology. Smith’s Contract Net (Smith, 1980) represents one of the first examples of a system to employ a common communication language between (mostly) autonomous ‘nodes’ (in Smith’s terminology, the common intemode language). This language comprises around a dozen primitives, each with a specific structure, purpose and set of conditions: though not recognised at the time, the primitives thus bear a striking resemblance to both illocutionary speech acts and the locutions of dialogue logics mentioned by Girle (Girle, 1996). The primary content of these locutions is a simplistic contract, a notion which has recently undergone a revival and can be found in several contemporary systems - as the service level agreement in the ADEPT project (Jennings et a l, 1996), and explicitly as contracts in (Sierra et a l, 1997), (Sierra et a l,

1997a) (Reed, 1998), inter alia. It is these contracts which form the subject of negotiation, a concept examined in more detail with regard to its relation to argumentation in §2.2.5.

(21)

importantly, of the language in which agents (and the content of their messages) are written. Finin et a l

(1997) offer a seven point desiderata for an agent communication language: (i) declarative and simple locutions, (ii) distinction between the communicative ‘layer’ (at which the illocutionary acts are expressed) and the content layer (at which domain facts are expressed - in whichever language is indicated at the communicative layer), (iii) well defined semantics, (iv) efficient implementation, (v) compatibility with networking technology, (vi) ability to cope with dynamic, heterogeneity, (vii) support for reliability and security. KQML performs well under each head, though of particular importance is the recent formalisation of its semantics (Labrou and Finin, 1997).

One of the very few major alternatives to KQML is the speech act based communication component of Shoham’s Agent Oriented Programming (AOP) language, Agent-0 (Shoham, 1993). Agent-0 is closer to the dialogue logic and speech act account of communication than KQML, both because of the explicit operationalisation of illocutionary acts, and the role played by commitment: along with beliefs and obligations, commitments represent a primitive modality in AOP (in contrast to, for example, the belief-desire-intention account of (Rao and Georgeff, 1992)). The notion of commitment, as mentioned above, plays a key role in the definition of dialogue logics (and this point is carefully explicated in (Walton and Krabbe, 1995)). A key disadvantage with AOP however, is that unlike KQML, the communication language is tightly coupled to the design of the agents themselves. It is for this reason that the attempt to support communication between AOP and KQML - Agent-K - is coded as an agent in AOP which can interface to a KQML world (Davies and Edwards, 1994).

As multi agent systems become increasingly complex, so the demands placed upon the communication protocol between agents become ever greater. This complexity has led some authors to propose a more flexible ‘open’ approach to protocol design, whereby the agents themselves can negotiate the protocol to be employed (Vreeswijk, 1995).

Perhaps the highest possible level of complexity of agent communication would be represented by the use of natural language. As an agent-agent communication language, the approach would clearly introduce far more problems than it would solve (although a move in this direction seems to have been taken in the TRAINS project (Allen et a l, 1995)), but as a human-computer interface, designing the computer agent with natural language capabilities is highly desirable (see, for example, (Blandford, 1993)).

Determining an appropriate utterance at a given moment requires some internal processing within the agent (as Turner (1994) points out, given resource bounding of the kind discussed in (Bratman et a l, 1988), even determining what to include in a communication for it to be cooperative is a difficult task). When the utterance is to be expressed in natural language, that processing assumes an even more important role, since there will no longer be a one-to-one relationship between message content and message form. As suggested by (Searle, 1969), deciding upon the appropriate surface form of an utterance is a goal directed activity - planning, along the lines suggested in (Cohen and Perrault,

(22)

variety of approaches to plan-based communication, a brief survey of the wider planning literature is in order®.

2.1.3 Planning

The first three decades or so of planning research have a well rehearsed genealogy, a typical example of which is offered in (Chapman, 1987). The root is usually taken to lie with GPS (Newell and Simon, 1963) and the introduction of means-ends analysis, by which new steps are introduced to a plan to fulfil particular goals: this is step addition^. STRIPS (Fikes and Nilsson, 1971) characterises actions as plan operators with preconditions and postconditions - the latter are lists of what explicitly gets added to or deleted from the world as a result of executing the action. Sussman’s HACKER (Sussman, 1975) introduces in heuristic form many of the ideas which become formalised in later nonlinear planners, particularly the idea of promotion (constraining one operator to precede another). Promotion was first formalised in INTERPLAN (Tate, 1975), but the first truly nonlinear planner was NOAH (Sacerdoti, 1975) which in employing promotion and separation (as well as step addition) offers a straightforward solution to the Sussman anomaly (a scenario which was insoluble by linear planners such as HACKER without recourse to a ‘hack’). NONLIN (Tate, 1977) extends NOAH by adding backtracking to enable better coverage of the search space, and employing simple establishment to force codesignation between a variable and an atom. Stefik’s MOLGEN (Stefik, 1981) formalises constraints (of codesignation and ordering), and in a similarly ‘neat’ spirit. Chapman provides a precise characterisation of nonlinear planning which illustrates the restrictions necessary to ensure soundness and completeness (Chapman, 1987). The characterisation is summarised in the modal truth criterion which is implemented in the TWEAK algorithm (though more recently, it has been argued that the modal truth criterion does not in fact represent a necessary condition (Fox and Long, 1993)).

One of the problems with this tranche of planners is the inflexibility of the underlying representation language which remains virtually unchanged firom STRIPS to TWEAK. There are a number of key problems with STRIPS-based planning languages which have been addressed in more recent research. The first problem is of distinguishing primary effects firom side effects. As discussed in (Fink and Yang, 1997), the distinction can lead to significant computational savings by pruning the search space without compromising either soundness or completeness o f the resulting planning algorithm. One of the first planners to represent primary effects explicitly was Wilkins’ SIPE (Wilkins,

1988), where the distinction was employed in simplifying conflict resolution. The notion has also been developed in explicitly handling the links between one operator’s postconditions and another’s preconditions - ‘causal’ links - which may then be threatened by the existence o f other operators in the partial plan. Resolution of these threats (i.e. the formulation of safety conditions (McAllester and Rosenblitt, 1991)) then forms a key component of the functionality of the planning algorithm. Partial order, causal link (POCL) planners such as UCPOP (Penberthy and Weld, 1992) and SNLP (McAllester and Rosenblitt, 1991) have also been demonstrated to be sound and complete. Acknowledging the demands of recent natural language generation research, the notion of primary

The role of planning mentioned here is in producing an utterance; related work has examined the reverse relationship, i.e. the role of dialogic communication in ‘distributed’ planning, e.g. (Shadbolt, 1992).

(23)

effect has been associated with that of intention (a key component of the generation process, as discussed in the next section) in the development of the DPOCL planner (Young and Moore, 1994) (which although inspired by NLG work is nevertheless a domain independent planner: Young and Moore emphasise the point that discourse planning does not demand a domain specific planner (Young and Moore, 1994a)).

Several other fundamental restrictions on the expressiveness of the STRIPS language are addressed in UCPOP (Penberthy and Weld, 1992) through its adoption of Pednault’s ADL language (Pednault, 1989) which represents a STRIPS-like characterisation of actions in the situation calculus (and was originally formulated for linear planning, as demonstrated in PEDESTAL (McDermott, 1991)). Within operator descriptions, ADL - and therefore UCPOP - permits the representation of (i) conditional effects and (ii) universal quantification, and as Penberthy and Weld point out, although neither of these are individually novel, UCPOP represents the first planner to admit both into a nonlinear framework, and be demonstrably sound and complete. The use of universal quantification introduces another problem: what is the domain of that quantification? More specifically, is it reasonable to make the closed world assumption (Reiter, 1978), and take it that absent information is false? Etzioni et al. (1997) review literature making use of the open-world assumption (that information not explicitly represented is unknown), which has the undesirable effect that universal quantification is simply not possible (since a planner cannot guarantee it is aware of every element in the domain). They go on to propose an alternative approach based upon local closed-world information, whereby information in a limited, local domain is complete (e.g. I s - a can be guaranteed to return a complete list of files).

In order to cope with the complexity o f real world planning domains (exacerbated by the additional overheads of representing the various constraints associated with the codesignation and partial order of nonlinear planning), a technique is required for limiting the combinatorial explosion of the search tree. Sacerdoti proposed a formal characterisation of the intuitive concept of abstraction, whereby a system constructs a plan in an abstract, simplified description of the domain, before moving to a less abstract, more detailed domain description - the details in the latter are not considered until a plan is generated in the former, thus drastically reducing the options available to the planner, without impinging upon soundness or completeness. ABSTRIPS (Sacerdoti, 1974) represents the first abstraction based planner (implementing abstraction within a STRIPS planning framework); the unification of abstraction-based planning and nonlinear planning is formalised in the ABTWEAK system (Yang et a l, 1996).

(24)

to make; the OMP, in contrast, is weaker, and though holding subproblems invariant, does not prevent backtracking through the abstraction hierarchy (Smith and Peot, 1992). (Weaker again is ABTWEAK’s

monotonie protection (Yang et a l, 1996) which ensures that preconditions on an abstract operator are protected at lower levels in the abstraction hierarchy, i.e., they are guaranteed to have at least one establishment which is not necessarily clobbered).

Fox (1997) offers a clear classification of the various approaches to abstraction in hierarchical planning, based upon a tripartite division: Hierarchical Task Network (HTN) decomposition (in which abstraction is achieved by representing compound tasks), model-reduction (in which abstraction is performed by assigning criticality levels to literals), and operator decomposition (in which compound tasks are represented as abstract operators with associated pre- and postconditions). The HTN decomposition approach discussed in (Erol et a l, 1994) includes NOAH, NONLIN, 0-PLAN (Currie and Tate, 1991) and Erol’s own UMCP (Erol et a l, 1994a); model-reduction includes ABSTRIPS, ABTWEAK and ALPINE (Knoblock, 1994); and operator decomposition, DPOCL, UCPOP, SIPE and SNLP. Fox goes on to demonstrate that a concept of abstraction would benefit from aspects of both HTN and operator decomposition approaches, whereby a complete abstract plan undergoes refinement on completion and has a clear semantics*® (HTN decomposition) and manipulates abstract operators representing compound tasks (operator decomposition). Furthermore, the use o f goals rather than operators in the bodies of abstract operators is shown to increase the flexibility of the planner, by eschewing the rigid, prescriptive, ‘recipe’-like abstract operators whose bodies are composed of other operators (such as are used in NONLIN, 0-PLAN and SIPE, inter alia). Fox and Long (1995), (1996) formalise these desirable features in the AbNLP planner, which also provides a much richer characterisation of time (associating, for example, an operator with an inseparable pair of time points, the earlier representing the moment of application - at which point the preconditions must be true - and the latter, the moment at which effects become true).

Increasingly, planning research is also focusing on practical issues of efficiency. A good example is the work of Gerevini and Schubert (1996) on improving the search mechanism used in executing nondeterminism: the approach is a pragmatic one rather than, for example, the less well understood and computationally expensive approach of metaplanning (Stefik, 1981a). Another is the translation mechanism which can map from the rich but computationally expensive domain representation of UCPOP to the concise but impoverished language of an efficient language such as Graphplan (Gazen and Knoblock, 1997). Finally, the computational benefits afforded though adoption of a hierarchical planning approach are catalogued in (Knoblock et a l, 1991), (Bacchus and Yang,

1992), and of automatically generating abstraction hierarchies in (Knoblock, 1994).

The next section explores in more detail the demands made upon planning systems by the natural language generation domain, reviewing the reasons for the current affinity with POCL planners, and motivating the adoption of AbNLP in the current work.

(25)

2.1.4 Natural Language Generation

This section does not aim to review the field of natural language generation (NLG) as a whole. Such an objective would not only be more suited to a textbook, but would also demand inclusion of a range of issues which are not directly relevant to the current work: lexical and syntactic generation, multilingual generation, spoken dialogue, text summarisation, machine translation, natural language understanding, plan recognition, etc. Wider reviews of NLG encompassing these topics, and relating them to research on natural language processing in general can be found in (Cole et a l, 1995), (Dale and Reiter, 1999). Instead, the survey presented here concentrates specifically upon the generation of discourse structure, and the trends which have emerged on that topic over the past twenty years, omitting only review of research into focus and cue phrases which is provided in chapters four and five, respectively.

The ‘prehistory’ of discourse generation was characterised by ad hoc approaches which were difficult to generalise - a survey encompassing many such systems is presented in (Hovy, 1993). One of the first attempts at domain independence was McKeown’s TEXT system (McKeown, 1985). Though applied as a component of an interface to a naval database, the approach she advocated used generic schemas, stereotypical supra-sentence textual structures. Schemas are defined in terms of

rhetorical predicates drawn from stylistic analyses of relations between clauses of text (such as those of Grimes (Grimes, 1975) and (Williams, 1893)) - examples of these rhetorical predicates used by McKeown include Evidence, Amplification and Particular-illustration. Schema-based generation offers a straightforward, fast way of generating paragraph sized chunks of text, and as a result remains a popular method of generation - see for example, (Rambow and Korelsky, 1992) for an applied example of recent schema-based work, Mellish et a l's IDAS system employing schemas for content selection (Mellish et a i, 1996), (Reiter et a l, 1992), (Reiter et a l, 1995), Paris’s TAILOR system (Paris, 1993), Jones’s discourse component of the large scale language engineering project LOLITA (Jones, 1994), and (Jonsson, 1996) for an argument advocating the utility of the schema-based approach in general.

One of the key problems with the schema approach, however, is its inherent inflexibility. To better cope with novel, unexpected generation requirements, and to build larger, coherent pieces of text, a more dynamic approach to generation is required; such is the approach offered by planning. Early plan-based approaches to the generation task were predominantly concerned with generation at the utterance level (seminally, by Appelt (Appelt, 1985) and Cohen (Cohen and Perrault, 1988)). Soon after, a number of systems adopted the planning paradigm for discourse level generation, foremost amongst which was Hovy’s PAULINE, which also integrated stylistic concerns and a sensitivity to the hearer (Hovy, 1986a), (Hovy, 1990).