A Problem for RST: The Need for Multi-Level Discourse Analysis

Johanna D. Moore* University of Pittsburgh

Martha E. Pollack* University of Pittsburgh

Rhetorical Structure Theory (RST) (Mann and Thompson 1987) argues that in most coherent discourse, consecutive discourse elements are related by a small set of rhetorical relations. Moreover, RST suggests that the information conveyed in a discourse over and above what is conveyed in its component clauses can be derived from the rhetorical relation-based structure of the discourse. A large number of natural language generation systems rely on the rhetorical relations defined in RST to impose structure on multi-sentential text (Hovy 1991; Knott 1991; Moore and Paris 1989; Rosner and Stede 1992). In addition, many descriptive studies of discourse have employed RST (Fox 1987; Linden, Cumming, and Martin 1992; Matthiessen and Thompson 1988). However, recent work by Moore and Paris (1992) noted that RST cannot be used as the sole means of controlling discourse structure in an interactive dialogue system, because RST representations provide insufficient information to support the generation of appropriate responses to "follow-up questions." The basic problem is that an RST representation of a discourse does not fully specify the intentional structure (Grosz and Sidner 1986) of that discourse. Intentional structure is crucial for responding effectively to questions that address a previous utterance: without a record of what an utterance was intended to achieve, it is impossible to elaborate or clarify that utterance. 1

Further consideration has led us to conclude that the difficulty observed by Moore and Paris stems from a more fundamental problem with RST analyses. RST presumes that, in general, there will be a single, preferred rhetorical relation holding between consecutive discourse elements. In fact, as has been noted in other work on discourse structure (Grosz and Sidner 1986), discourse elements are related simultaneously on multiple levels. In this paper, we focus on two levels of analysis. The first involves the relation between the information conveyed in consecutive elements of a coherent discourse. Thus, for example, one utterance may describe an event that can be presumed to be the cause of another event described in the subsequent utterance. This causal relation is at what we will call the informational level.

The second level of relation results from the fact that discourses are produced to effect changes in the mental state of the discourse participants. In coherent discourse, a speaker is carrying out a consistent plan to achieve the intended changes, and consecutive discourse elements are related to one another by means of the ways in which they participate in that plan. Thus, one utterance may be intended to increase the likelihood that the hearer will come to believe the subsequent utterance: we might say that the first utterance is intended to provide evidence for the second. Such an evidence relation is at what we will call the intentional level.

* Department of Computer Science and Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15260. e-mail: jmoore,[email protected]

Computational Linguistics Volume 18, Number 4

RST acknowledges that there are two types of relations between discourse elements, distinguishing between subject matter and presentational relations. According to Mann and Thompson, "[s]ubject matter relations are those whose intended effect is that the [hearer] recognize the relation in question; presentational relations are those whose intended effect is to increase some inclination in the [hearer]" (Mann and Thompson 1987, p. 18). 2 Thus, subject matter relations are informational; presentational relations are intentional. However, RST analyses presume that, for any two consecutive elements of a coherent discourse, one rhetorical relation will be primary. This means that in an RST analysis of a discourse, consecutive elements will be related either by an informational relation or by an intentional relation, but not by both.

In this paper, we argue that a complete computational model of discourse structure cannot depend upon analyses in which the informational and intentional levels of relation are in competition. Rather, it is essential that a discourse model include both levels of analysis. We show that the assumption of a single rhetorical relation between consecutive discourse elements is one of the reasons that RST analyses are inherently ambiguous. 3 We also show that this same assumption underlies the problem observed by Moore and Paris. Finally, we point out that a straightforward approach to revising RST by modifying the definitions of the subject matter relations to indicate associated presentational analyses (or vice versa) cannot succeed. Such an approach presumes a one-to-one mapping between the ways in which information can be related and the ways in which intentions combine into a coherent plan to affect a hearer's mental state--and no such mapping exists. We thus conclude that in RST, and, indeed, in any viable theory of discourse structure, analyses at the informational and the intentional levels must coexist.

To illustrate the problem, consider the following example.

An Example

Example 1

(a) George Bush supports big business.
(b) He's sure to veto House Bill 1711.

A plausible RST analysis of (1) is that there is an EVIDENCE relation between utterance (b), the nucleus of the relation, and utterance (a), the satellite. This analysis is licensed by the definition of this relation (Mann and Thompson 1987, p. 10):

Relation name: EVIDENCE

Constraints on Nucleus: H might not believe Nucleus to a degree satisfactory to S.

Constraints on Satellite: H believes Satellite or will find it credible.

Constraints on Nucleus + Satellite combination: H's comprehending Satellite increases H's belief of Nucleus.

Effect: H's belief of Nucleus is increased.

2 Mann and Thompson analyzed primarily written texts, and so speak of the "writer" and "reader." For consistency with much of the rest of the literature on discourse structure, we use the terms "speaker" and "hearer" in this paper, but nothing in our argument depends on this fact.

However, an equally plausible analysis of this discourse is that utterance (b) is the nucleus of a VOLITIONAL CAUSE relation, as licensed by the definition (Mann and Thompson 1987, p. 58):

Relation name: VOLITIONAL-CAUSE

Constraints on Nucleus: presents a volitional action or else a situation that could have arisen from a volitional action.

Constraints on Satellite: none.

Constraints on Nucleus + Satellite combination: Satellite presents a situation that could have caused the agent of the volitional action in Nucleus to perform that action; without the presentation of Satellite, H might not regard the action as motivated or know the particular motivation; Nucleus is more central to S's purposes in putting forth the Nucleus-Satellite combination than Satellite is.

Effect: H recognizes the situation presented in Satellite as a cause for the volitional action presented in Nucleus.

It seems clear that Example 1 satisfies both the definition of EVIDENCE, a presentational relation, and VOLITIONAL CAUSE, a subject matter relation. In their formulation of RST, Mann and Thompson note that potential ambiguities such as this can arise in RST, but they argue that one analysis will be preferred, depending on the intent that the analyst ascribes to the speaker:

Imagine that a satellite provides evidence for a particular proposition expressed in its nucleus, and happens to do so by citing an attribute of some element expressed in the nucleus. Then ... the conditions for both EVIDENCE and ELABORATION are fulfilled. If the analyst sees the speaker's purpose as increasing the hearer's belief of the nuclear propositions, and not as getting the hearer to recognize the object:attribute relationship, then the only analysis is the one with the EVIDENCE relation (Mann and Thompson 1987, p. 30, emphasis ours).
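To make the ambiguity in Example 1 concrete, the two definitions can be encoded as lists of constraints and checked against an analyst's judgments. The following Python sketch is illustrative only: the constraint labels, the `judgments` dictionary, and the `licensed` helper are our own hypothetical simplifications and are not part of RST.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationDef:
    name: str
    kind: str           # "presentational" or "subject matter"
    constraints: tuple  # hypothetical labels standing in for the definition's constraints


EVIDENCE = RelationDef(
    "EVIDENCE",
    "presentational",
    (
        "h_might_not_believe_nucleus",
        "h_finds_satellite_credible",
        "satellite_increases_belief_in_nucleus",
    ),
)

VOLITIONAL_CAUSE = RelationDef(
    "VOLITIONAL-CAUSE",
    "subject matter",
    (
        "nucleus_is_volitional_action",
        "satellite_could_have_caused_action",
    ),
)

# Assumed analyst judgments for Example 1, with (b) as nucleus and (a) as satellite.
judgments = {
    "h_might_not_believe_nucleus": True,
    "h_finds_satellite_credible": True,
    "satellite_increases_belief_in_nucleus": True,
    "nucleus_is_volitional_action": True,
    "satellite_could_have_caused_action": True,
}


def licensed(relations, judgments):
    """Return the relations whose constraints are all judged to hold."""
    return [
        r for r in relations
        if all(judgments.get(c, False) for c in r.constraints)
    ]


analyses = licensed([EVIDENCE, VOLITIONAL_CAUSE], judgments)
for r in analyses:
    print(r.name, "-", r.kind)
```

Under these assumed judgments both definitions are satisfied at once, one presentational and one subject matter, which is the situation the single-relation assumption forces the analyst to resolve.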


2. The Argument from Interpretation

We begin by showing that in discourse interpretation, recognition may flow from the informational level to the intentional level or vice versa. In other words, a hearer may be able to determine what the speaker is trying to do because of what the hearer knows about the world or what she knows about what the speaker believes about the world. Alternatively, the hearer may be able to figure out what the speaker believes about the world by recognizing what the speaker is trying to do in the discourse. This point has previously been made by Grosz and Sidner (1986, pp. 188-190). 4

Returning to our initial example

Example 1

(a) George Bush supports big business.
(b) He's sure to veto House Bill 1711.

suppose that the hearer knows that House Bill 1711 places stringent environmental controls on manufacturing processes. 5 From this she can infer that supporting big business will cause one to oppose this bill. Then, because she knows that one way for the speaker to increase a hearer's belief in a proposition is to describe a plausible cause of that proposition, she can conclude that (a) is intended to increase her belief in (b), i.e., (a) is evidence for (b). The hearer reasons from informational coherence to intentional coherence.

Alternatively, suppose that the hearer has no idea what House Bill 1711 legislates. However, she is in a conversational situation in which she expects the speaker to support the claim that Bush will veto it. For instance, the speaker and hearer are arguing and the hearer has asserted that Bush will not veto any additional bills before the next election. Again using the knowledge that one way for the speaker to increase her belief in a proposition is to describe a plausible cause of that proposition, the hearer in this case can conclude that House Bill 1711 must be something that a big business supporter would oppose--in other words that (a) may be a cause of (b). Here the reasoning is from intentional coherence to informational coherence. Note that this situation illustrates how a discourse can convey more than the sum of its parts. The speaker not only conveys the propositional content of (a) and (b), but also the implication relation between (a) and (b): supporting big business entails opposition to House Bill 1711. 6

It is clear from this example that any interpretation system must be capable of recognizing both intentional and informational relations between discourse elements, and must be able to use relations recognized at either level to facilitate recognition at the other level. We are not claiming that interpretation always depends on the recognition of relations at both levels, but rather that there are obvious cases where it does. An interpretation system therefore needs the capability of maintaining both levels of relation.
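The two directions of reasoning about Example 1 can be sketched as a small procedure that keeps a slot for each level and lets whichever relation is recognized first license the other. This Python sketch is a toy under stated assumptions: the `Analysis` record, the `interpret` function, its two boolean inputs, and the label CAUSE for the informational relation are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Analysis:
    informational: Optional[str] = None
    intentional: Optional[str] = None


def interpret(knows_bill_content: bool, expects_support_for_b: bool) -> Analysis:
    """Toy interpretation of Example 1, keeping both levels of relation."""
    analysis = Analysis()

    # Direct recognition from what the hearer already knows or expects.
    if knows_bill_content:
        analysis.informational = "CAUSE"     # (a) is a plausible cause of (b)
    if expects_support_for_b:
        analysis.intentional = "EVIDENCE"    # (a) is meant to increase belief in (b)

    # Shared knowledge: describing a plausible cause of a proposition
    # is one way to increase a hearer's belief in that proposition.
    if analysis.informational == "CAUSE" and analysis.intentional is None:
        # Informational coherence -> intentional coherence.
        analysis.intentional = "EVIDENCE"
    elif analysis.intentional == "EVIDENCE" and analysis.informational is None:
        # Intentional coherence -> informational coherence
        # (a defeasible conclusion: (a) *may* be a cause of (b)).
        analysis.informational = "CAUSE"

    return analysis


print(interpret(knows_bill_content=True, expects_support_for_b=False))
print(interpret(knows_bill_content=False, expects_support_for_b=True))
```

Both calls arrive at the same pair of relations by different routes, which is why the record has to hold both levels rather than a single preferred relation.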

4 In Grosz and Sidner (1986), dominates and satisfaction-precedence are the intentional relations, while supports and generates are the informational relations.

5 The hearer also needs to believe that it is plausible the speaker holds the same belief (see Konolige and Pollack 1989).


3. The Argument from Generation

It is also crucial that a generation system have access to both the intentional and informational relations underlying the discourses it produces. For example, consider the following discourse:

S: (a) Come home by 5:00. (b) Then we can go to the hardware store before it closes.

H: (c) We don't need to go to the hardware store. (d) I borrowed a saw from Jane.

At the informational level, (a) specifies a CONDITION for doing (b): getting to the hardware store before it closes depends on H's coming home by 5:00. 7 How should S respond when H indicates in (c) and (d) that it is not necessary to go to the hardware store? This depends on what S's intentions are in uttering (a) and (b). In uttering (a), S may be trying to increase H's ability to perform the act described in (b): S believes that H does not realize that the hardware store closes early tonight. In this case, S may respond to H by saying:

S: (e) OK, I'll see you at the usual time then.

On the other hand, in (a) and (b), S may be trying to motivate H to come home early, say because S is planning a surprise party for H. Then she may respond to H with something like the following:

S: (f) Come home by 5:00 anyway. (g) Or else you'll get caught in the storm that's moving in.

What this example illustrates is that a generation system cannot rely only on informational level analyses of the discourse it produces. This is precisely the point that Moore and Paris have noted (1992). If the generation system is playing the role of S, then it needs a record of the intentions underlying utterances (a) and (b) in order to determine how to respond to (c) and (d). Of course, if the system can recover the intentional relations from the informational ones, then it will suffice for the system to record only the latter. However, as Moore and Paris have argued, such recovery is not possible because there is not a one-to-one mapping between intentional and informational relations.

The current example illustrates this last point. At the informational level, utterance (a) is a CONDITION for (b), but on one reading of the discourse there is an ENABLEMENT relation at the intentional level between (a) and (b), while on another reading there is a MOTIVATION relation. Moreover, the nucleus/satellite structure of the informational level relation is maintained only on one of these readings. Utterance (b) is the nucleus of the CONDITION relation, and, similarly, it is the nucleus of the ENABLEMENT relation on the first reading. However, on the second reading, it is utterance (a) that is the nucleus of the MOTIVATION relation.


Just as one cannot always recover intentional relations from informational ones, neither can one always recover informational relations from intentional ones. In the second reading of the current example, the intentional level MOTIVATION relation is realized first with a CONDITION relation between (a) and (b), and, later, with an OTHERWISE relation in (f) and (g).
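The lack of a one-to-one mapping can be checked mechanically by tabulating the relation pairs observed in this example. The Python sketch below is only an illustration: the helper names `images` and `is_recoverable` and the `responses` table are hypothetical, and the pairs listed are limited to those discussed for this one dialogue.

```python
# (informational relation, intentional relation) pairs observed in the example.
observed = [
    ("CONDITION", "ENABLEMENT"),  # (a)-(b), first reading
    ("CONDITION", "MOTIVATION"),  # (a)-(b), second reading
    ("OTHERWISE", "MOTIVATION"),  # (f)-(g), second reading
]


def images(pairs, source_index):
    """Map each relation at one level to the set of relations it co-occurs with at the other."""
    target_index = 1 - source_index
    table = {}
    for pair in pairs:
        table.setdefault(pair[source_index], set()).add(pair[target_index])
    return table


def is_recoverable(table):
    """True only if every relation determines a unique relation at the other level."""
    return all(len(targets) == 1 for targets in table.values())


info_to_intent = images(observed, 0)
intent_to_info = images(observed, 1)

# Hypothetical response policy for S after H's (c)-(d): the choice depends on
# the recorded intention, not on the informational CONDITION relation.
responses = {
    "ENABLEMENT": "(e) OK, I'll see you at the usual time then.",
    "MOTIVATION": "(f) Come home by 5:00 anyway. "
                  "(g) Or else you'll get caught in the storm that's moving in.",
}

print(info_to_intent)
print(intent_to_info)
print(is_recoverable(info_to_intent), is_recoverable(intent_to_info))
```

Neither table is single-valued, so a system that stores only one level cannot look up the other; it must record the intentional relation alongside the informational one in order to select between responses (e) and (f)-(g).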

4. Discussion

We have illustrated that natural language interpretation and natural language generation require discourse models that include both the informational and the intentional relations between consecutive discourse elements. RST includes relations of both types, but commits to discourse analyses in which a single relation holds between each pair of elements.

One might imagine modifying RST to include multi-relation definitions, i.e., definitions that ascribe both an intentional and an informational relation to consecutive discourse elements. Such an approach was suggested by Hovy (1991), who augmented rhetorical relation definitions to include a "results" field. Although Hovy did not cleanly separate intentional from informational level relations, a version of his approach might be developed in which definitions are given only for informational (or, alternatively, intentional) level relations, and the results field of each definition is used to specify an associated intentional (informational) relation. However, this approach cannot succeed, for several reasons.

First, as we have argued, there is not a fixed, one-to-one mapping between intentional and informational level relations. We showed, for example, that a CONDITION relation may hold at the informational level between consecutive discourse elements at the same time as either an ENABLEMENT or a MOTIVATION relation holds at the intentional level. Similarly, we illustrated that either a CONDITION or an OTHERWISE relation may hold at the informational level at the same time as a MOTIVATION relation holds at the intentional level.

Thus, an approach such as Hovy's that is based on multi-relation definitions will result in a proliferation of definitions. Indeed, there will be potentially n × m relations created from a theory that initially includes n informational relations and m intentional relations. Moreover, by combining informational and intentional relations into single definitions, one makes it difficult to perform the discourse analysis in a modular fashion. As we showed earlier, it is sometimes useful first to recognize a relation at one level, and to use this relation in recognizing the discourse relation at the other level.
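The growth in the number of definitions is easy to see with a small count. In the Python sketch below, the two relation lists are small illustrative subsets drawn from the relations mentioned in this paper, not a full inventory, and `definition_counts` is a hypothetical helper name.

```python
from itertools import product

# Illustrative subsets only; a full theory would have more relations at each level.
informational = ["CONDITION", "OTHERWISE", "VOLITIONAL-CAUSE"]
intentional = ["EVIDENCE", "ENABLEMENT", "MOTIVATION"]

# Multi-relation approach: one combined definition per (informational, intentional) pair.
combined = [f"{info}+{intent}" for info, intent in product(informational, intentional)]

# Two-level approach: each relation is defined once, at its own level.
separate = informational + intentional


def definition_counts(n: int, m: int) -> tuple:
    """Return (combined, separate) definition counts for n informational and m intentional relations."""
    return n * m, n + m


print(len(combined), len(separate))
print(definition_counts(10, 5))
```

With three relations at each level, the combined scheme already needs nine definitions against six, and the gap widens as either inventory grows.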

In addition, the multi-relation definition approach faces an even more severe challenge. In some discourses, the intentional structure is not merely a relabeling of the informational structure. A simple extension of our previous example illustrates the point:

S: (a) Come home by 5:00. (b) Then we can go to the hardware store before it closes. (c) That way we can finish the bookshelves tonight.


wishes H to perform (recall that S is planning a surprise party for H). This structure is illustrated below:

[Figure: intentional-level structure of the discourse; recoverable labels: motivation, b, c]

At the informational level, this discourse has a different structure. Finishing the bookshelves is the nuclear proposition. Coming home by 5:00 (a) is a condition on going to the hardware store (b), and together these are a condition on finishing the bookshelves (c):

[Figure: informational-level structure of the discourse; recoverable labels: condition, a, c]

The intentional and informational structures for this discourse are not isomorphic. Thus, they cannot be produced simultaneously by the application of multiple-relation definitions that assign two labels to consecutive discourse elements. The most obvious "fix" to RST will not work. RST's failure to adequately support multiple levels of analysis is a serious problem for the theory, both from a computational and a descriptive point of view.
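The mismatch between the two structures can be made explicit by comparing the groupings each one imposes on utterances (a), (b), and (c). In the Python sketch below, the informational tree follows the description given above. The intentional tree is an assumption on our part, following the second reading of the earlier example: (c) motivates (b), and these together motivate (a), the action S wishes H to perform. The `Node` class and the helper functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Node:
    relation: str
    nucleus: "Tree"
    satellite: "Tree"


Tree = Union[str, Node]  # a leaf is an utterance label such as "a"


def leaves(tree: Tree) -> frozenset:
    """Set of utterances covered by a (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaves(tree.nucleus) | leaves(tree.satellite)


def constituents(tree: Tree) -> set:
    """Utterance groupings induced by every internal node of the tree."""
    if isinstance(tree, str):
        return set()
    return {leaves(tree)} | constituents(tree.nucleus) | constituents(tree.satellite)


def head(tree: Tree) -> str:
    """Follow nucleus links down to the most central utterance."""
    while isinstance(tree, Node):
        tree = tree.nucleus
    return tree


# Informational level: (a) is a condition on (b); together they are a condition on (c).
informational = Node("CONDITION", nucleus="c",
                     satellite=Node("CONDITION", nucleus="b", satellite="a"))

# Intentional level (assumed): (c) motivates (b); together they motivate (a).
intentional = Node("MOTIVATION", nucleus="a",
                   satellite=Node("MOTIVATION", nucleus="b", satellite="c"))

print(head(informational), head(intentional))
print(constituents(informational) == constituents(intentional))
```

Under these assumptions the informational tree groups (a) with (b) and has (c) as its most central element, while the intentional tree groups (b) with (c) and has (a) as its most central element, so no pairwise relabeling maps one onto the other.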

Acknowledgments

We are grateful to Barbara Grosz, Kathy McCoy, Cécile Paris, Donia Scott, Karen Sparck Jones, and an anonymous reviewer for their comments on this research. Johanna Moore's work on this project is being supported by grants from the Office of Naval Research Cognitive and Neural Sciences Division and the National Science Foundation.

References

Appelt, Douglas E. (1985). Planning English Sentences. Cambridge University Press.

Fox, Barbara (1987). Discourse Structure and Anaphora: Written and Conversational English. Cambridge University Press.

Grosz, Barbara J., and Sidner, Candace L. (1986). "Attention, intention, and the structure of discourse." Computational Linguistics, 12(3):175-204.

Hovy, Eduard H. (1991). "Approaches to the planning of coherent text." In Natural Language Generation in Artificial Intelligence and Computational Linguistics, edited by Cécile Paris, William Swartout, and William Mann, 83-102. Kluwer Academic Publishers.

Intelligence, University of Edinburgh.

Konolige, Kurt, and Pollack, Martha E. (1989). "Ascribing plans to agents: Preliminary report." In Proceedings, Eleventh International Joint Conference on Artificial Intelligence. Detroit, MI, 924-930.

Vander Linden, Keith; Cumming, Susanna; and Martin, James (1992). "Using system networks to build rhetorical structures." In Proceedings, Sixth International Workshop on Natural Language Generation. Berlin Heidelberg, Germany, 183-198.

Mann, William C., and Thompson, Sandra A. (1987). "Rhetorical structure theory: A theory of text organization." USC/Information Sciences Institute Technical Report Number RS-87-190, Marina del Rey, CA.

Matthiessen, Christian, and Thompson, Sandra (1988). "The structure of discourse and 'subordination.'" In Clause Combining in Grammar and Discourse, edited by J. Benjamins. Amsterdam.

Moore, Johanna D., and Paris, Cécile L. (1992). "Planning text for advisory dialogues: Capturing intentional, rhetorical and attentional information." Technical Report 92-22, University of Pittsburgh Computer Science Department, Pittsburgh, PA.

Moore, Johanna D., and Paris, Cécile L. (1989). "Planning text for advisory dialogues." In Proceedings, Twenty-Seventh Annual Meeting of the Association for Computational Linguistics. Vancouver, B.C., 203-211.

Rosner, Dietmar, and Stede, Manfred (1992). "Customizing RST for the automatic production of technical manuals." In Proceedings, Sixth International Workshop on Natural Language Generation. Berlin Heidelberg, Germany, 199-215.