
Paraphrasing Questions Using Given and New Information 1

Kathleen R. McKeown

Computer Science Department, Columbia University, New York, NY 10027

The design and implementation of a paraphrase component for a natural language question-answering system (CO-OP) is presented. The component is used to produce a paraphrase of a user's question to the system, which is presented to the user before the question is evaluated and answered. A major point made is the role of given and new information in formulating a paraphrase that differs in a meaningful way from the user's question. A description is also given of the transformational grammar that is used by the paraphraser.

1. Introduction

In a natural language interface to a data base query system, a paraphraser can be used to ensure that the system has correctly understood the user. Such a paraphraser has been developed as part of the CO-OP system (Kaplan 1979). In CO-OP, an internal representation of the user's question is passed to the paraphraser, which then generates a new version of the question for the user. Upon seeing the paraphrase, the user has the option of rephrasing her/his question before the system attempts to answer it. Thus, if the question was not interpreted correctly, the error can be caught before a possibly lengthy search of the data base is initiated. Furthermore, the user is assured that the answer she/he receives is an answer to the question asked and not to a deviant version of it.

The idea of using a paraphraser in the above way is not new. To date, other systems have used canned templates to form paraphrases, filling in empty slots in the pattern with information from the user's question (Waltz 1978, Codd 1978). In CO-OP, a transformational grammar is used to generate the paraphrase from an internal representation of the question. Moreover, the CO-OP paraphraser generates a question that differs in a meaningful way from the original question. It makes use of a distinction between given and new information to indicate to the user the existential presuppositions made in her/his question.

1 This work was carried out in the Department of Computer and Information Science, The University of Pennsylvania. It was partially supported by an IBM Fellowship, and by NSF grants MCS 78-08401 and MCS 79-19171.

2. Overview of the CO-OP System

The CO-OP system is aimed at infrequent users of data base query systems. These casual users are likely to be unfamiliar with computer systems and unwilling to invest the time needed to learn a formal query language. Being able to converse naturally in English enables such persons to tap the information available in a data base.

In order to allow the question-answering process to proceed naturally, CO-OP follows some of the "co-operative principles" of conversation (Grice 1975). In particular, the system attempts to find meaningful answers to failed questions by addressing any incorrect assumptions the questioner may have made in her/his question. When the direct response to a question would be simply "no" or "none", CO-OP gives a more informative response by correcting the questioner's mistaken assumptions.

The false assumptions that CO-OP corrects are the existential presuppositions of the questions. 2 Since these presuppositions can be computed from the surface structure of the question, a large store of semantic knowledge for inferencing purposes is not needed. In fact, a lexicon and data base schema are the only items that contain domain-specific information. Consequently, the CO-OP system is a portable one; a change of data base requires that only these two knowledge sources be modified.

2 For example, in the question "Which users work on projects sponsored by NASA?", the speaker makes the existential presupposition that there are projects sponsored by NASA.

Copyright 1983 by the Association for Computational Linguistics. Permission to copy without fee all or part of this material is granted provided that the copies are not made for direct commercial advantage and the Journal reference and this copyright notice are included on the first page. To copy otherwise, or to republish, requires a fee and/or specific permission.

0362-613X/83/010001-10$03.00

3. The CO-OP Paraphraser

CO-OP's paraphraser provides the only means of error checking for the casual user. If the user is familiar with the system, she/he can ask to have the intermediate results printed, in which case the parser's output and the formal data base query will be shown. The naive user, however, is unlikely to understand these results. It is for this reason that the paraphraser was designed to respond in English.

The use of English to paraphrase queries creates several problems. The first is that natural language is inherently ambiguous. A paraphrase must clarify the system's interpretation of possibly ambiguous phrases in the user's question without introducing additional ambiguity.

One particular type of ambiguity that a paraphraser must clarify and avoid re-introducing is caused by the linear nature of sentences. A modifying relative clause, for example, frequently cannot be placed directly after the noun phrase it modifies. In such cases, the semantics of the sentence may indicate the correct choice of modified noun phrase, but occasionally the sentence may be genuinely ambiguous. For example, question (A) below has two interpretations, both equally plausible. The speaker could be referring to books dating from the '60s or to computers dating from the '60s.

(A) Which students read books on computers dating from the '60s?

A second problem in paraphrasing English queries is the possibility of generating the exact question that was originally asked. If a grammar were developed to simply generate English from an underlying representation of the question, this possibility could be realized. Instead, a method must be devised that can determine how the phrasing should differ from the original.

The CO-OP paraphraser addresses both the problem of ambiguity and the rephrasing of the question. It makes the system's interpretation of the question explicit by breaking down the clauses of the question and reordering them depending upon their function in the sentence. Thus, question (A) above will result in either paraphrase (B) or (C), reflecting the interpretation the system has chosen.

(B) Assuming that there are books on computers (those computers date from the '60s), which students read those books?

(C) Assuming that there are books on computers (those books date from the '60s), which students read those books?

The method adopted generates a paraphrase that differs from the original except in cases where no relative clauses or prepositional phrases were used. It was formulated on the basis of a distinction between given and new information and indicates to the user the presuppositions she/he has made in the question (in the "assuming that" clause), while focusing her/his attention on the attributes of the class she/he is interested in.

4. Linguistic Background

As mentioned earlier, the lexicon and the data base are the sole sources of world knowledge for CO-OP. While this design increases CO-OP's portability, it means that little semantic information is available for the paraphraser's use. Contextual information is also limited since no running history or context is maintained for a user session in the current version. The input the paraphraser receives from the parser is a syntactic parse tree of the question. Using this information, the paraphraser must construct a question that differs in phrasing from the original. The following question must therefore be addressed:

What reasons are there for choosing one syntactic form of expression over another?

Some linguists maintain that word order is affected by the functional roles that elements play within the sentence. 3 Terminology used to describe the types of roles that can occur varies widely. Some of the distinctions that have been described include given/new, topic/comment, theme/rheme, and presupposition/focus. Definitions of these terms, however, are not consistent. 4

3 Some other influences on syntactic expression are discussed in Morgan and Green 1973. They suggest that stylistic reasons, in addition to some of the functions discussed here, determine when different syntactic constructions are to be used. They point out, for example, that the passive voice is often used in academic prose to avoid identification of agent and to lend a scientific flavor to the text.

4 For example, see Prince 1979 for a discussion of various usages of "given/new".

Nevertheless, one influence on expression does appear to be the interaction of sentence content and the beliefs of the speaker concerning the knowledge of the listener. Some elements in the sentence function in conveying information the speaker assumes is present in the "consciousness" of the listener (Chafe 1976). This information is said to be contextually dependent, either by virtue of its presence in the preceding discourse or because it is part of the shared world knowledge of the dialog participants. In a question-answering system, shared world knowledge refers to information the speaker assumes is present in the data base. Information functioning in the role just described has been termed "given".

"New" labels all information in the sentence that is presented as not retrievable from context. In the declarative, elements functioning in asserting information that the listener is presumed not to know are called new. In the question, elements functioning in conveying what the speaker wants to know (i.e., what she/he doesn't know) represent information the speaker presumes the listener is not already aware of. Firbas 1974 identifies additional functions in the question. Of these, (ii) is used here to augment the interpretation of new information. He says (p. 31):

(i) it indicates the want of knowledge on the part of the inquirer and appeals to the informant to satisfy this want.

(ii) [a] it imparts knowledge to the informant in that it informs him what the inquirer is interested in (what is on her/his mind) and [b] from what particular angle the intimated want of knowledge is to be satisfied.

Although word order vis-a-vis these and related distinctions has been discussed in light of the declarative sentence, less has been said about the interrogative form. Halliday 1967 and Krizkova 5 are among the few to have analyzed the question. Despite the fact that they arrive at different conclusions, 6 the two follow similar lines of reasoning. Krizkova argues that both the wh-item of the wh-question and the finite verb (e.g., "do" or "be") of the yes/no question point to the new information to be disclosed in the response. These elements, she claims, are the only unknowns to the questioner. Halliday, in discussing the yes/no question, also argues that the finite verb is the only unknown. The polarity of the text is in question and the finite element indicates this.

In this paper the interpretation of the unknown elements in the question as defined by Krizkova and Halliday is followed. The wh-items, in defining the questioner's lack of knowledge, act as new information. Firbas's analysis of the functions in questions is used to further elucidate the role of new information in questions. The remaining elements are given information. They represent information assumed by the questioner to be true of the data base domain. This labeling of information within the question will allow the construction of a natural paraphrase, avoiding ambiguity.

5 Summary by Firbas 1974 of the untranslated article "The Interrogative Sentence and Some Problems of the So-called Functional Sentence Perspective (Contextual Organization of the Sentence)," NASA Rec. 4, 1968.

6 It should be noted that Halliday and Krizkova discuss the unknowns in the question in order to define the theme and rheme of a question. Although they agree about the unknowns for the questioner, they disagree about which elements function as theme and which function as rheme. A full discussion of their analysis and conclusions is given in McKeown 1979.

5. Formulation

Following the analysis described above, the CO-OP paraphraser breaks down questions into given and new information. More specifically, an input question is divided into three parts, of which (2) and (3) form the new information.

1. given information

2. lack of knowledge (ii[a] from Firbas above)

3. angle (ii[b] from Firbas above)

In terms of the question components, part (2) is indicated by the question with no subclauses 7 as it defines the lack of knowledge of the hearer. Part (3) is indicated by the direct and indirect modifiers of the interrogative words as they define the angle from which the question was asked. They identify the attributes of the missing information for the hearer. Part (1) is formed from the remaining clauses.

As an example, consider question (D):

(D) Which division of the computing facility works on projects using oceanography research?

Following the outline above, part (2) of the paraphrase will be the question minus the subclauses: "Which division works on projects?" Part (3), the modifiers of the interrogative words, will be "of the computing facility", which modifies "which division". 8 The remaining clause "projects using oceanography research" is considered given information. The three parts can then be assembled into a natural sequence:

(E) Assuming that there are projects using oceanography research, which division works on those projects? Look for a division of the computing facility. 9

Information belonging to each of the three categories occurred in question (D). If one of these types of information is missing, the question will be presented minus the initial or concluding clauses. Only part (2) of the paraphrase will invariably occur. Note that this means that if there are no clauses in the original question corresponding to parts (1) and (3) (i.e., the question contains no relative clauses, prepositional phrases, or adjectival phrases), the paraphrase may be the same as the original question.

7 Here, subclauses are defined as relative clauses, prepositional phrases, and adjectival phrases.

8 Note that this phrase also identifies a presupposition of the questioner. For the paraphrase, however, its function to precisely specify what the questioner is interested in (which is new information for the hearer) is of greater importance.

9 This example, as well as sample questions and paraphrases that follow, were taken from actual sessions with the paraphraser. Question (A) and its possible paraphrases (B) and (C) were not run on the system.

If more than one clause occurs in a particular category, the question will be further splintered. Additional given information is parenthesized following the "assuming that ..." clause. Example (F) below illustrates the paraphrase for a question containing several clauses of given information and no clauses defining specific attributes of the missing information. Clauses containing information characterized by category (3) will be presented as separate sentences following the stripped-down question. (G) below demonstrates a paraphrase containing more than one clause of this type of information.

(F) Q: Which users work on projects in oceanography that are sponsored by NASA?

P: Assuming that there are projects in oceanography (those projects are sponsored by NASA), which users work on those projects?

(G) Q: Which programmers in superdivision 5000 from the ASD group are advised by Thomas Wirth?

P: Which programmers are advised by Thomas Wirth? Look for programmers in superdivision 5000. The programmers must be from the ASD group.
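
The assembly of the three parts, including the multi-clause cases in (F) and (G), can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the CO-OP implementation: the function name `assemble_paraphrase` and its parameters are hypothetical, the clauses are passed in as ready-made strings, and the verb in "Assuming that there are" is fixed rather than made to agree with its subject.

```python
def assemble_paraphrase(stripped_question, given_clauses=(), angle_sentences=()):
    """Order the parts as: given information, stripped-down question, angle.

    All arguments are plain strings (hypothetical inputs); the paraphraser
    described in the text derives them from a parse tree instead.
    """
    given_clauses = list(given_clauses)
    if given_clauses:
        first, rest = given_clauses[0], given_clauses[1:]
        # The first given clause follows "Assuming that there are";
        # any further given clauses are parenthesized after it.
        intro = "Assuming that there are " + first
        intro += "".join(" (" + clause + ")" for clause in rest)
        sentence = intro + ", " + stripped_question
    else:
        # No given information: the question itself opens the paraphrase.
        sentence = stripped_question[0].upper() + stripped_question[1:]
    # Each angle clause is presented as a separate sentence after the question.
    return " ".join([sentence, *angle_sentences])


example_e = assemble_paraphrase(
    "which division works on those projects?",
    given_clauses=["projects using oceanography research"],
    angle_sentences=["Look for a division of the computing facility."],
)
example_f = assemble_paraphrase(
    "which users work on those projects?",
    given_clauses=["projects in oceanography",
                   "those projects are sponsored by NASA"],
)
example_g = assemble_paraphrase(
    "which programmers are advised by Thomas Wirth?",
    angle_sentences=[
        "Look for programmers in superdivision 5000.",
        "The programmers must be from the ASD group.",
    ],
)
print(example_e)
print(example_f)
print(example_g)
```

With these inputs the three calls reproduce the wording of paraphrases (E), (F), and (G). When neither given clauses nor angle sentences are supplied, the function returns the question itself, which mirrors the remark that the paraphrase may then equal the original.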

6. Implementation Overview

The paraphraser's first step in processing is to reform the parse tree it is given so that the main verb occurs as the root of the new tree. This is done to simplify the identification of given and new information in the parse. The tree is then divided into three separate trees reflecting the division of given and new information in the question. The design of the tree allows for a simple set of rules that flatten the tree. The final stage of processing in the paraphraser is translation. In the translation phase, labels in the parser's representation are translated into their corresponding words. During this process, necessary transformations of the grammar are performed upon the string.

6.1 The phrase structure tree

In its initial processing, the paraphraser transforms the parser's representation into one that is more convenient for generation purposes. The resultant structure is a tree that highlights certain syntactic features of the question. This initial processing gives the paraphraser some independence from the CO-OP system. Were the parser's representation changed or the component moved to a new system, only the initial processing phase would need to be modified.

The paraphraser's phrase structure tree uses the main verb of the question as the root node of the tree. The subject of the main verb is the root node of the left subtree, the object (if there is one) the root node of the right subtree. In the current system, the use of binary relations in the parser's representation 10 creates the illusion that every verb or preposition has a subject and object. The paraphraser's tree does allow for the representation of other constructions should the incoming language use them.

Note that the use of binary relations in the incoming parse tree to represent the verbs and prepositions of a sentence means that modifiers of verbs are represented as modifiers of their objects (and thus hang off the object in the paraphraser's reformed tree). While this is not the usual interpretation of questions using such constructions, it functions adequately for both CO-OP and the paraphraser as illustrated by a hypothetical paraphrase for such a question, shown below in (H):

(H) Q: Which programmers worked on oceanography projects in 1972?

P: Assuming that there were oceanography projects in 1972, which programmers worked on those projects?

Each of the paraphrase subtrees represents other clauses in the question. Both the subject and the object of the main verb will have a subtree for each other clause it participates in. If a noun in one of these clauses also participates in another clause in the sentence, it too will have subtrees.

As an example, consider the question: "Which active users advised by Thomas Wirth work on projects in area 3?" The phrase structure tree used in the paraphraser is shown in Figure 1. Since "work on" is identified as the main verb of the question by the parser, it will be the root node of the tree. "Users" is root of the left subtree, "projects" of the right. Each noun participates in one other clause and therefore has one subtree. Modifiers are closely bound to the noun they modify and are treated as properties of the noun (i.e., each node in the tree that is modified has a property called "modifiers" whose value is any adjectival or noun modifier). In Figure 1, modifiers are shown as part of the node label for clarity. Subtree nodes (the leaves of Figure 1) have three pieces of information associated with them:

• the relation between the node and its parent,

• the noun phrase the node represents, and

• an indication of whether the node functions as subject or object in the clause.

10 See Kaplan 1979 for a description of Meta Query Language, or MQL.


                     work on
                    /       \
          active users       projects
              /                   \
  advised by Thomas Wirth       in area 3
  object                        object

Figure 1.
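
One way to picture the tree of Figure 1 as a data structure is the Python sketch below. The class and field names (`PhraseTree`, `NounNode`, `ClauseNode`, `is_wh`, and so on) are hypothetical and chosen only for illustration; the text specifies what information each node carries, not how it was stored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClauseNode:
    """A subtree node: one further clause that a noun participates in."""
    arc_label: str   # relation between the node and its parent (verb or preposition)
    set_label: str   # noun phrase the node represents
    role: str        # "subject" or "object" within the clause
    clauses: List["ClauseNode"] = field(default_factory=list)


@dataclass
class NounNode:
    """Root of the left or right subtree (subject or object of the main verb)."""
    noun: str
    modifiers: List[str] = field(default_factory=list)  # adjectival or noun modifiers
    is_wh: bool = False                                  # True for the questioned noun
    clauses: List[ClauseNode] = field(default_factory=list)


@dataclass
class PhraseTree:
    verb: str                       # main verb, root of the tree
    subject: NounNode               # root of the left subtree
    obj: Optional[NounNode] = None  # root of the right subtree, if there is one


# "Which active users advised by Thomas Wirth work on projects in area 3?"
figure_1 = PhraseTree(
    verb="work on",
    subject=NounNode(
        "users", modifiers=["active"], is_wh=True,
        clauses=[ClauseNode("advised by", "Thomas Wirth", "object")],
    ),
    obj=NounNode(
        "projects",
        clauses=[ClauseNode("in", "area 3", "object")],
    ),
)
print(figure_1.subject.clauses[0])
```

Keeping modifiers as a property of the noun, rather than as separate nodes, follows the description above; the nested `clauses` list on `ClauseNode` allows for a noun inside a clause that itself takes part in a further clause.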

6.2 Dividing the tree

The constructed tree is computationally suited for the three-part paraphrase. The tree is flattened after it has been divided into subtrees containing given information and the two types of new information. The splitting of the tree is accomplished by first extracting the topmost smallest portion of the tree containing the wh-item. At the very least, this will include the root node plus the left and right subtree root nodes. This portion of the tree is the stripped-down question. The clauses that define the particular aspect from which the question is asked are found by searching the left and right subtrees for the wh-item or questioned noun. The subtree whose root node is the wh-item contains these clauses. Note that this may be the entire left or right subtree or may only be a subtree of one of these. The remainder of the tree represents given information. Figure 2 illustrates this division for the previous example.

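[Figure: the phrase structure tree for the previous example, divided into the three parts listed below]

Q: Which active users advised by Thomas Wirth work on projects in area 3?

P: Assuming that there are projects in area 3, which active users work on these projects? Look for users advised by Thomas Wirth.

Pt. 1 information (given): "Assuming that there are projects in area 3"

Pt. 2 information (new): "which active users work on these projects?"

Pt. 3 information (new): "Look for users advised by Thomas Wirth."

Figure 2.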
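
The division can be sketched as follows. This is a simplified illustration, not the CO-OP code: the names `Noun`, `Tree`, and `divide` are hypothetical, clauses are reduced to (arc-label, set-label) pairs, and the wh-item is assumed to be the subject or object of the main verb, whereas the text notes it may sit deeper in a subtree.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Clause = Tuple[str, str]  # (arc-label, set-label), e.g. ("in", "area 3")


@dataclass
class Noun:
    head: str
    is_wh: bool = False                       # the questioned noun
    clauses: List[Clause] = field(default_factory=list)


@dataclass
class Tree:
    verb: str
    subject: Noun
    obj: Optional[Noun] = None


def divide(tree):
    """Split the tree into (given, stripped-down question, angle)."""
    nouns = [n for n in (tree.subject, tree.obj) if n is not None]
    # Part 2: the root plus the left and right subtree root nodes.
    stripped = (tree.subject.head, tree.verb, tree.obj.head if tree.obj else None)
    # Part 3: clauses hanging off the wh-item define the angle.
    angle = [(n.head, *c) for n in nouns if n.is_wh for c in n.clauses]
    # Part 1: the remainder of the tree is given information.
    given = [(n.head, *c) for n in nouns if not n.is_wh for c in n.clauses]
    return given, stripped, angle


tree = Tree(
    verb="work on",
    subject=Noun("users", is_wh=True, clauses=[("advised by", "Thomas Wirth")]),
    obj=Noun("projects", clauses=[("in", "area 3")]),
)
given, stripped, angle = divide(tree)
print(given)     # part 1
print(stripped)  # part 2
print(angle)     # part 3
```

For the example question, part 1 holds the clause attached to "projects", part 2 holds the subject, verb, and object roots, and part 3 holds the clause attached to the questioned noun "users".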

6.3 Flattening

If the structure of the phrase structure tree is

  Tree:        Subtree:

    R             R'
   / \           /  \
  A   B        A'    B'

Figure 3.

with A the left subtree and B the right, then the following rules define the flattening process:

TREE → A R B

SUBTREE → R' A' B'

In other words, the top level of the tree (shown on the left in Figure 3) is linearized by an in-order traversal while each of its subtrees (shown on the right in Figure 3) is linearized by a pre-order traversal. In the example shown in Figure 2, part (2) of the tree corresponds to the top level of the tree and will undergo in-order linearization, and parts (1) and (3) are the subtrees, which will be linearized by a pre-order traversal. The use of two traversals to linearize the tree stems from the fact that different types of information are stored at nodes at different levels in the tree. As a node in a subtree has three pieces of information associated with it, one more rule is required to expand a node. A node consists of:

• arc-label

• set-label

• subject/object

where arc-label is the label of a binary relation in the input parse tree (i.e., a verb or preposition) and set-label is the label of a set in the input parse (i.e., noun phrase). The input parse is in MQL representation, which consists of sets and binary relations between them. Subject/object indicates whether the sub-node noun phrase functions as subject or object in the clause; it is used by the subject-aux transformation and does not apply to the expansion rule. In Figure 2, the leaves of the tree carry these three pieces of information. For example, the leftmost leaf has arc-label advised by, set-label Thomas Wirth, and is labeled as the object of the relation. The following rule expands a subtree node:

NODE → ARC-LABEL SET-LABEL

The tree of given information is flattened first. It is part of the left or right subtree of the phrase structure tree and therefore is flattened by a pre-order traversal. It is during the flattening stage that the words "Assuming that there [be] ..." are inserted to introduce the clause of given information. "Be" will agree with the subject of the clause. Following these rules, the tree of given information in Figure 2 would be flattened by a pre-order traversal yielding "projects in area #3" (R' A' arc-label set-label). After the "Assuming that" clause is inserted, this portion of the paraphrase is "Assuming that there be projects in area #3". If there is more than one clause, parentheses are inserted around the additional ones.

The tree representing the stripped-down question is flattened next, using the in-order traversal. Applying this process to Part (2) of the tree in Figure 2 yields the phrase "wh active users work on projects" (A R B). (In final processing stages, the correct demonstrative ("those" or "that") is selected to modify nouns already mentioned in the first part of the paraphrase.)

The tree that represents modifiers of the questioned noun is linearized to follow these phrases. A pre-order traversal of this portion of the tree in Figure 2 yields "users advised by Thomas Wirth" (R' A' arc-label set-label). Any modifiers of a noun (here, "active") are omitted in this part of the paraphrase if they have already been mentioned. The phrase "Look for" is inserted before the first clause of modifiers.
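
The three flattening rules (TREE, SUBTREE, and NODE) can be sketched in Python as below. The names `Node`, `Subtree`, `expand`, `flatten_subtree`, and `flatten_top` are hypothetical, and the sketch works on plain strings instead of the labeled categories the paraphraser retains; it omits the transformations and the insertion of demonstratives.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    arc_label: str        # verb or preposition linking the node to its parent
    set_label: str        # noun phrase
    role: str = "object"  # used by subject-aux inversion, not by expansion


@dataclass
class Subtree:
    root: str                                            # R'
    children: List[Node] = field(default_factory=list)  # A', B'


def expand(node):
    """NODE -> ARC-LABEL SET-LABEL"""
    return [node.arc_label, node.set_label]


def flatten_subtree(subtree):
    """SUBTREE -> R' A' B' (pre-order: root first, then its children)."""
    words = [subtree.root]
    for child in subtree.children:
        words.extend(expand(child))
    return " ".join(words)


def flatten_top(left, root, right=None):
    """TREE -> A R B (in-order: left root, main verb, right root)."""
    return " ".join(word for word in (left, root, right) if word)


given = flatten_subtree(Subtree("projects", [Node("in", "area 3")]))
question = flatten_top("wh active users", "work on", "projects")
angle = flatten_subtree(Subtree("users", [Node("advised by", "Thomas Wirth")]))
print("Assuming that there [be] " + given)
print(question)
print("Look for " + angle)
```

The three strings correspond to the given information, the stripped-down question before wh-fronting, and the modifiers of the questioned noun for the running example.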

Two transformations are applied during the flattening process. They are wh-fronting and subject-aux inversion. Other transformations are applied following the flattening process to produce sentences in final grammatical form.

6.4 Transformations

The grammar used in the paraphrase is a transformational one. In addition to the basic flattening rules described above, the following transformations are used:

wh-fronting

negation

do-support

subject-aux inversion

tense-placement

contraction

has-deletion

[Diagram: curved lines connect the transformations that are ordered with respect to one another]

The curved lines indicate the ordering restrictions. There are two connected groups of transformations. If wh-fronting applies, then so will do-support, subject-aux inversion, and tense-placement. The second group of transformations is invoked through the application of negation. It includes do-support, contraction, and tense-placement. Has-deletion is not affected by the absence or presence of other transformations. A description of the transformation rules follows. The rules used here are based on analyses described by Akmajian and Heny (1975) and by Culicover (1976).

The rule for wh-fronting is specified as follows, where SD stands for structural description and SC, structural changes. Each rule is followed by an example input string and the string after it has undergone the transformation. The full tree for the string is not shown, but the string is labeled by markers in the SD.

SD:   X   -   NP   -   Y
      1       2        3

SC:  2+1      0        3

Input to rule:

[1 programmers in division 5 past plur work on] [2 wh projects]?

Transformed input:

[2 wh projects] [1 programmers in division 5 past plur work on]?
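
The effect of the wh-fronting rule on a flattened string can be sketched in Python as follows. This is an illustrative simplification: the function name `wh_front` is hypothetical, the string is represented as a list of (category, words) pairs, and the category labels `TENSE` and `V` are assumptions, since the text only states that phrase markers are retained during flattening.

```python
def wh_front(constituents):
    """Apply SD: X - NP - Y, SC: 2+1 0 3 to a list of (category, words) pairs."""
    for i, (category, words) in enumerate(constituents):
        if category == "NP" and words.split()[:1] == ["wh"]:
            x, np, y = constituents[:i], constituents[i], constituents[i + 1:]
            # Term 2 (the wh-NP) is adjoined in front of term 1 (X), its old
            # position is left empty, and term 3 (Y) stays where it was.
            return [np] + x + y
    # Structural description not met: the string is returned unchanged.
    return list(constituents)


flattened = [
    ("NP", "programmers in division 5"),
    ("TENSE", "past plur"),
    ("V", "work on"),
    ("NP", "wh projects"),
]
fronted = wh_front(flattened)
print(" ".join(words for _, words in fronted) + "?")
```

Applied to the example input, the sketch moves "wh projects" to initial position and leaves the rest of the string in its original order, as in the transformed input shown for the rule.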

The first step in the implementation of wh-fronting is a search of the tree for the wh-item. A slightly different approach is used for paraphrasing than would be used if simply generating a question from the input parse. The difference occurs because in the original question the NP to be fronted may be the head noun of some relative clauses or prepositional phrases. If generating, these clauses would be fronted along with the head noun. Since the clauses of the original question are broken down for the paraphrase, it will never be the case when paraphrasing that the NP to be fronted also dominates relative clauses or prepositional phrases. For this reason, the applicability of wh-fronting is tested for, and the transformation is applied, during the flattening process of the stripped-down question. Note that the phrase markers (or categories) of each word are retained as the tree is flattened, and thus the SDs can be matched against both the tree and its linearized version. If wh-fronting applies, only one word need be moved to the initial position.
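The paraphrasing case can be illustrated with a small Python sketch that applies wh-fronting to an already flattened string whose constituents keep their phrase markers. The list-of-pairs representation, the marker names, and the function name `wh_front` are assumptions made for this illustration; they are not taken from the CO-OP implementation.

```python
from typing import List, Tuple

# A flattened constituent: (phrase marker, words). Hypothetical representation.
Constituent = Tuple[str, str]


def wh_front(flat: List[Constituent]) -> List[Constituent]:
    """Move the first NP whose first word is the wh-item to initial position."""
    for i, (marker, words) in enumerate(flat):
        if marker == "NP" and words.split()[0] == "wh":
            return [flat[i]] + flat[:i] + flat[i + 1:]
    # No wh-item found: the transformation does not apply.
    return list(flat)


question = [
    ("NP", "programmers in division 5"),
    ("TENSE-NUM", "past plur"),
    ("V", "work on"),
    ("NP", "wh projects"),
]
fronted = wh_front(question)
fronted_string = " ".join(words for _, words in fronted)
print(fronted_string)
```

Because the markers travel with the words, the same test could in principle be run against the tree before flattening, which is the property the paragraph above relies on.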

The paraphraser is capable of generating English from the input as well as paraphrasing (see Section 7). When generation is being done, the applicability of wh-fronting is tested for immediately before flattening. If the transformation applies, the tree is split. The subtree of which the wh-item is the root is flattened separately from the remainder of the tree and is attached in fronted position to the string resulting from flattening the other part.
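The generation case, in which the tree is split before flattening, might be sketched as follows. The `Node` class, the choice of the topmost NP containing the wh-item as the subtree to front, and all function names are assumptions for illustration only; the actual tree format used in CO-OP is not shown in this paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    marker: str
    word: Optional[str] = None          # set on leaves only
    children: List["Node"] = field(default_factory=list)


def flatten(node: Node) -> List[str]:
    """Left-to-right list of the words under a node."""
    if node.word is not None:
        return [node.word]
    words: List[str] = []
    for child in node.children:
        words += flatten(child)
    return words


def contains_wh(node: Node) -> bool:
    return node.word == "wh" or any(contains_wh(c) for c in node.children)


def split_wh(root: Node) -> Optional[Tuple[Node, Node]]:
    """Return (wh subtree, copy of the tree without it), or None if no wh-item."""
    for i, child in enumerate(root.children):
        if child.marker == "NP" and contains_wh(child):
            rest = Node(root.marker, root.word,
                        root.children[:i] + root.children[i + 1:])
            return child, rest
        found = split_wh(child)
        if found is not None:
            subtree, new_child = found
            rest = Node(root.marker, root.word,
                        root.children[:i] + [new_child] + root.children[i + 1:])
            return subtree, rest
    return None


def generate(root: Node) -> List[str]:
    """Flatten the wh subtree separately and attach it in fronted position."""
    found = split_wh(root)
    if found is None:
        return flatten(root)
    wh_subtree, rest = found
    return flatten(wh_subtree) + flatten(rest)


tree = Node("S", children=[
    Node("NP", children=[Node("N", "programmers")]),
    Node("VP", children=[
        Node("V", "work on"),
        Node("NP", children=[
            Node("DET", "wh"), Node("N", "projects"),
            Node("PP", children=[Node("P", "in"), Node("N", "oceanography")]),
        ]),
    ]),
])
generated = " ".join(generate(tree))
print(generated)
```

In this sketch the prepositional phrase attached to the head noun is fronted along with it, which is the behavior described for generation as opposed to paraphrasing.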

After wh-fronting has been applied, do-support is invoked. In CO-OP, the underlying representation of the question does not contain modals or auxiliary verbs. Thus, fronting the wh-item necessitates supplying an auxiliary. The following rule is used for do-support:

    SD: NP - NP - tense-num-V - X
        1    2    3             4
    SC: 1  2+do  3  4
    condition: 1 dominates wh

Input to rule:

    [1 wh projects] [2 programmers in division 5] [3 past plur work on]?

Transformed input:

    [1 wh projects] [2+do programmers in division 5 do] [3 past plur work on]?

Subject-aux inversion is activated immediately afterwards. Again, if wh-fronting is applied, subject-aux inversion will apply also. The rule is:

    SD: NP - NP - AUX - X
        1    2    3     4
    SC: 1  3+2  0  4
    condition: 1 dominates wh

Input to rule:

    [1 wh projects] [2 programmers in division 5] [3 do] [4 past plur work on]?

Transformed input:

    [1 wh projects] [3 do] [2 programmers in division 5] [4 past plur work on]?
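The two rules above can be traced on the running example with a short Python sketch. It assumes a flattened string of (marker, words) pairs; the marker names `TENSE-NUM-V` and `AUX` and the function names are hypothetical labels for this illustration, not the CO-OP data structures.

```python
from typing import List, Tuple

Term = Tuple[str, str]  # (phrase marker, words); hypothetical representation


def dominates_wh(term: Term) -> bool:
    """True if the term is an NP that contains the wh-item."""
    return term[0] == "NP" and "wh" in term[1].split()


def do_support(terms: List[Term]) -> List[Term]:
    """SD: NP - NP - tense-num-V - X  =>  SC: 1 2+do 3 4 (term 1 dominates wh)."""
    if (len(terms) >= 3 and dominates_wh(terms[0])
            and terms[1][0] == "NP" and terms[2][0] == "TENSE-NUM-V"):
        return terms[:2] + [("AUX", "do")] + terms[2:]
    return list(terms)


def subject_aux_inversion(terms: List[Term]) -> List[Term]:
    """SD: NP - NP - AUX - X  =>  SC: 1 3+2 0 4 (term 1 dominates wh)."""
    if (len(terms) >= 3 and dominates_wh(terms[0])
            and terms[1][0] == "NP" and terms[2][0] == "AUX"):
        return [terms[0], terms[2], terms[1]] + terms[3:]
    return list(terms)


wh_fronted = [
    ("NP", "wh projects"),
    ("NP", "programmers in division 5"),
    ("TENSE-NUM-V", "past plur work on"),
]
supported = do_support(wh_fronted)
inverted = subject_aux_inversion(supported)
inverted_string = " ".join(words for _, words in inverted)
print(inverted_string)
```

Each function returns its input unchanged when its structural description does not match, so the rules can be chained in the stated order without extra checks.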

Tense-placement follows subject-aux inversion. Tense, number, and negation (if present) are attributes of all verbs in the parser's representation. When an auxiliary is generated, the tense, number, and negation are moved from the verb to the auxiliary. Formally:

    SD: X - AUX - Y - tense-num(-no) - V - Z
        1   2     3   4                5   6
    SC: 1  2+4  3  0  5  6

Input to rule:

    [1 wh projects] [2 do] [3 programmers in division 5] [4 past plur] [5 work on]?

Transformed input:

    [1 wh projects] [2 do] [4 past plur] [3 programmers in division 5] [5 work on]?
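A minimal Python sketch of tense-placement on the same example follows. The (marker, words) pairs and the names `TENSE-NUM`, `AUX`, and `tense_placement` are assumptions made for illustration.

```python
from typing import List, Tuple

Term = Tuple[str, str]  # (phrase marker, words); hypothetical representation


def tense_placement(terms: List[Term]) -> List[Term]:
    """SD: X - AUX - Y - tense-num - V - Z  =>  SC: 1 2+4 3 0 5 6."""
    markers = [marker for marker, _ in terms]
    if "AUX" not in markers or "TENSE-NUM" not in markers:
        return list(terms)
    aux = markers.index("AUX")
    tense = markers.index("TENSE-NUM")
    if tense < aux:
        # The tense marker does not follow the auxiliary: rule does not apply.
        return list(terms)
    moved = terms[tense]
    rest = terms[:tense] + terms[tense + 1:]
    # Re-attach the tense and number markers directly after the auxiliary.
    return rest[:aux + 1] + [moved] + rest[aux + 1:]


inverted = [
    ("NP", "wh projects"),
    ("AUX", "do"),
    ("NP", "programmers in division 5"),
    ("TENSE-NUM", "past plur"),
    ("V", "work on"),
]
placed = tense_placement(inverted)
placed_string = " ".join(words for _, words in placed)
print(placed_string)
```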

Some transformational analyses propose that wh-fronting and subject-aux inversion apply to the relative clause as well as the question. In the CO-OP paraphraser, the head noun is properly positioned by the flattening process and wh-fronting need not be used. Subject-aux inversion, however, may be applicable. In cases where the head noun of the clause is not its subject, subject-aux inversion results in the proper order.

The rule for negation is tested during the translation phase of execution. It has been formalized as:

    SD: X - tense-num-V - NP - Y
        1   2             3    4
    SC: 1  2+no  3  4
    condition: 4 marked as negative

Input to rule:

    [1 wh students] [2 pres plur have] [3 advisors]?

    (advisors has property "neg")

Transformed input:

    [1 wh students] [2+no pres plur have no] [3 advisors]?

In the CO-OP representation, an indication of negation is carried on the object of a binary relation (see Kaplan 1979). When generating an English representation of the question, it is possible in some cases to express negation as modification of the noun (see question (H) below). In all cases, however, negation can be indicated as part of the verb (see version (I) of question (H)). Therefore, when the object is marked as negative, the paraphraser moves the negation to become part of the verbal element.

(H) Which students have no advisors?

(I) Which students don't have advisors?

In English, the negative marker is attached to the auxiliary of the verbal element and, therefore, as was the case for questions, an auxiliary must be generated. Do-support is used. The rule for do-support after negation differs from the one used after wh-fronting. They are presented this way for clarity, but could have been combined into one rule.

    SD: X - tense-num-V - no - Y
        1   2             3
    SC: 1  do+2  3

Input to rule:

    [1 wh students] [2 pres plur have] [3 no advisors]?

Transformed input:

    [1 wh students] [do+2 do pres plur have] [3 no advisors]?

Tense-placement, as described above, moves the tense, number, and negation from the verb to the auxiliary verb. The cycle of transformations invoked through application of negation is completed with the contraction transformation. The statement of the contraction transformation is:

    SD: X - do+tense-num - V - no - Y
        1   2              3   4    5
    SC: 1  #2+n't#  3  0  5

Input to rule:

    [1 wh students] [2 do pres plur] [3 have] [4 no] [5 advisors]?

Transformed input:

    [1 wh students] [#2+n't# #do+pres+plur+n't#] [3 have] [5 advisors]?

where # indicates that the result must be treated as a unit for further transformations. The morphology routines will combine the result to produce "don't".
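The whole negation cycle (negation, do-support, contraction, and the final morphological step) can be followed in the Python sketch below. The `Term` class, the marker names, and the one-entry `MORPHOLOGY` table are assumptions introduced for this example; the paper does not show the CO-OP morphology routines. Because do-support here inserts the auxiliary directly before the tense and number markers, a separate tense-placement step is not needed in the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Term:
    marker: str
    words: str
    neg: bool = False   # "neg" property carried on the object NP


def negation(terms: List[Term]) -> List[Term]:
    """Move the neg property from the object NP to a 'no' term after the verb."""
    out: List[Term] = []
    for t in terms:
        if t.marker == "NP" and t.neg and out and out[-1].marker == "V":
            out += [Term("NO", "no"), Term("NP", t.words)]
        else:
            out.append(t)
    return out


def do_support(terms: List[Term]) -> List[Term]:
    """Supply 'do' before the tense and number markers when 'no' is present."""
    markers = [t.marker for t in terms]
    if "NO" in markers and "TENSE-NUM" in markers and "AUX" not in markers:
        i = markers.index("TENSE-NUM")
        return terms[:i] + [Term("AUX", "do")] + terms[i:]
    return list(terms)


def contraction(terms: List[Term]) -> List[Term]:
    """SD: X - do+tense-num - V - no - Y  =>  SC: 1 #2+n't# 3 0 5."""
    out: List[Term] = []
    i = 0
    while i < len(terms):
        t = terms[i]
        if (t.marker == "AUX" and i + 3 < len(terms)
                and terms[i + 1].marker == "TENSE-NUM"
                and terms[i + 2].marker == "V"
                and terms[i + 3].marker == "NO"):
            parts = [t.words] + terms[i + 1].words.split() + ["n't"]
            out += [Term("AUX", "#" + "+".join(parts) + "#"), terms[i + 2]]
            i += 4   # skip AUX, TENSE-NUM, V, and the deleted 'no'
        else:
            out.append(t)
            i += 1
    return out


# Hypothetical stand-in for the morphology routines.
MORPHOLOGY = {"#do+pres+plur+n't#": "don't"}

question = [
    Term("NP", "wh students"),
    Term("TENSE-NUM", "pres plur"),
    Term("V", "have"),
    Term("NP", "advisors", neg=True),
]
contracted = contraction(do_support(negation(question)))
contracted_string = " ".join(t.words for t in contracted)
surface = " ".join(MORPHOLOGY.get(t.words, t.words) for t in contracted)
print(contracted_string)
print(surface)
```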

7. Other Features of the Paraphraser

The paraphraser is used for a second purpose in addition to paraphrasing. It can generate an English version of the parser's representation as well as paraphrase in the three-part form. This function uses the same procedures and grammar as the three-part paraphraser, but the tree is not split into three separate trees before being flattened.

In CO-OP, generation is used to produce alternative suggestions and corrective responses. A corrective response is used to correct the user's false presuppositions. When an existential presupposition encoded in the question is incorrect, the portion of MQL representing the failed presupposition (this is determined by CO-OP) is passed to the paraphraser, which generates the corrective response. For example, (K) below is a corrective response that could be generated by the paraphraser if (J) were asked:

(J) Which programmers in division 3 work on projects in oceanography?

(K) I don't know of any projects in oceanography.

Alternative suggestions are also used by the CO-OP system when the direct response to the user's question is negative. If an incorrect presupposition is removed from a question, the resulting question may no longer have a negative response.¹¹ In such cases, CO-OP suggests the wider class question to the user as a possible interest. CO-OP passes the MQL representing this question to the paraphraser, which generates the English for the suggestion. A sequence like (J), (K) above might be followed by the alternative suggestion (L):

(L) But you might be interested in programmers in division 3 that work on any projects.

For both types of responses, the paraphraser generates the response using the paraphrase functions with minor differences. The flattening process for generation differs from that used for paraphrases in that the tree is not divided into subtrees representing given and new information and, therefore, the tree is flattened as a whole. The transformational grammar also applies to the generation process, with the one difference being the point at which the applicability of wh-fronting is tested for (described in Section 6.4). Other than these changes and the use of different leading phrases (e.g., "But you might be interested in ..."), the generation process is the same as the paraphraser process. The generation function is general enough that it could be used for other types of responses in cases when something other than a direct response is needed.

8. Related Research

At the time of the CO-OP paraphraser implementation, two other main paraphrasers had been developed and implemented for data base question-answering systems:

• PLANES, Waltz et al. 1978;

• RENDEZVOUS Version 1, Codd 1978.

Both systems used templates to form the paraphrases. Templates are canned English phrases (or sentences) containing slots that may be filled with different words to produce a variety of full English phrases.

¹¹ See Kaplan 1979 for details on determining the most appropriate alternative suggestion.

The PLANES system generates the paraphrase from the formal data base query using templates. The process involves three specific actions. English words are substituted for any abbreviations or code names in the data base query, using a table look-up. A single appropriate paraphrase template is selected for use based on the query, and the slots in the template are then filled with words and phrases from the query. The major effort in designing this kind of system is in the formation, by hand, of templates suitable for the particular data base and for the types of questions that can be asked. An example of an English question and the PLANES paraphrase for it is shown below in (M):

(M) Q: How many flights did plane 3 make in Jan 73?

P: PLANES searches the MONTHLY FLIGHT and MAINTENANCE SUMMARIES and returns: The value of TOTAL FLIGHTS for plane SERIAL #3 during January 1973.

The RENDEZVOUS system also generates the paraphrase from the formal query using templates, although it is slightly more sophisticated than Waltz's. There are three parts to generation, and two types of templates are used. A header template corresponding to the type of query is chosen first. There are three types of queries in the system (FIND, EXIST, COUNT), of which FIND occurs most frequently. The header for FIND is PRINT THE ... EVERY ..., where the dots must be filled in. The second part of the paraphrase is the target list. It specifies the attributes requested by the user and is supplied by doing a table look-up on the attribute. The third part of the paraphrase is called the body. It is formed by extracting templates from tables, associated with particular items in the query, that specify restrictions on the requested values. An example of a query and the paraphrase generated by RENDEZVOUS is shown in (N) below.

(N) Q: I want to find certain projects. Pipes were sent to them in Feb. 1975.

P: Print the name of every project to which a shipment of a part named pipe was sent during February 1975.
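The three-part, template-based style of generation described above (header, target list, body) can be mimicked in a few lines of Python. Everything in this sketch is hypothetical: the query dictionary, the table contents, the attribute and relation names, and the exact wording of the header are invented to reproduce the surface form of (N) and do not describe the actual RENDEZVOUS tables.

```python
# Hypothetical header templates, one per query type.
HEADERS = {"FIND": "Print the {targets} of every {entity}"}

# Hypothetical table look-up from attribute codes to English words.
ATTRIBUTE_NAMES = {"PROJ.NAME": "name"}

# Hypothetical body templates, one per kind of restriction in the query.
BODY_TEMPLATES = {
    "SHIPPED_TO": "to which a shipment of a part named {part} was sent during {date}",
}


def paraphrase(query: dict) -> str:
    """Fill a header template, a target list, and body templates from a formal query."""
    targets = " and ".join(ATTRIBUTE_NAMES[a] for a in query["targets"])
    header = HEADERS[query["type"]].format(targets=targets, entity=query["entity"])
    body = " ".join(
        BODY_TEMPLATES[r["relation"]].format(**r["slots"])
        for r in query["restrictions"]
    )
    return f"{header} {body}."


query = {
    "type": "FIND",
    "targets": ["PROJ.NAME"],
    "entity": "project",
    "restrictions": [
        {"relation": "SHIPPED_TO", "slots": {"part": "pipe", "date": "February 1975"}},
    ],
}
result = paraphrase(query)
print(result)
```

A query with a relation or attribute missing from the tables raises a `KeyError` here, which reflects the point that every phrase in a template system has to be prepared by hand in advance.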

The goals of the RENDEZVOUS generation component are important ones. The generated English must be unambiguous, easy to understand, discriminating, and not misleading (Codd 1978). Instead of developing a general solution to achieve these goals, however, the research seems to be concentrated on particular examples which don't meet these criteria. This results in part from the use of templates. The templates must be constructed beforehand for a particular data base, and great care must be taken to choose phrases that can be easily patched together with a variety of other phrases. Unforeseen interaction between juxtaposed phrases is a problem that frequently arises. Such an approach necessitates looking at particular examples, instead of the general framework.

In both of these systems, the use of templates means that the major effort in developing the system must be done by hand in formatting the English phrases. All questions that will be asked must be anticipated ahead of time, and although the systems can be extended by adding new templates, undesirable interactions between new and old templates must be specifically avoided, and each new required addition does not ease the addition of subsequent templates. Note that this means coverage in a template system is also difficult to specify.

The use of a grammar in the CO-OP paraphraser makes it more flexible than these earlier paraphrasers:

• less work must be done by hand in formulating the system,

• interactions between templates are not a problem since the grammar determines how to combine words and phrases in an acceptable way, and

• the system is capable of handling new questions for which it has not been explicitly prepared, as long as they fall within the syntactic range of the system.

The paraphraser's ability to perform the generation task described in the previous section nicely illustrates its flexibility. Note furthermore that the CO-OP paraphraser specifically addresses the problems of disambiguating relative clause modification in a general way and of generating a paraphrase that differs from the original question on a theoretical basis, issues not addressed by either the PLANES or the RENDEZVOUS paraphraser.

9. Conclusions

The paraphraser described here is a syntactic one. While this work has examined the reasons for different forms of expression, additions must be made in the area of semantics. The substitution of synonyms, phrases, or idioms for portions or all of the question requires an examination of the effect of context on word meaning and of the intentions of the speaker on word or phrase choice. The lack of a rich semantic base and contextual information dictated the syntactic approach used here, but the paraphraser can be extended once a wider range of information becomes available.

When testing the implementation of the CO-OP system and extending its linguistic coverage, the paraphraser proved particularly helpful in debugging incorrect parses. It provided fast, easy-to-recognize notification when an incorrect interpretation had been made. This leads us to believe that the paraphrase would also prove helpful to actual users of the system were CO-OP to interpret a question differently than it was intended. Testing of this facility with a large number of actual users remains a topic for future work.

The CO-OP paraphraser has been designed to be domain-independent, and thus a change of the data base requires no change in the paraphraser. Paraphrasers that use the template form, however, will require such changes. This is because the templates or patterns, which constitute the type of question that can be asked, are necessarily dependent on the domain. Different sets of templates must be used for different data bases.

The CO-OP paraphraser also differs from other systems in that it generates the question using a transformational grammar of questions. It addresses two specific problems involved in generating paraphrases:

1. ambiguity in determining which noun phrases a relative clause modifies;

2. the production of a question that differs from the user's.

These goals have been achieved for questions using relative clauses through the application of a theory of given and new information to the generation process.

Acknowledgments

This work was partially supported by an IBM fellowship and NSF grant MC78-08401. I would like to thank Dr. Aravind K. Joshi and Dr. Bonnie Webber for their invaluable comments on the style and content of this paper.

References

Akmajian, A. and Heny, F. 1975 An Introduction to the Principles of Transformational Syntax. Academic Press, New York, New York.

Chafe, W.L. 1977 Givenness, Contrastiveness, Definiteness, Subjects, Topics, and Points of View. In Li, C.N., Ed., Subject and Topic. Academic Press, New York, New York.

Codd, E.F. et al. 1978 Rendezvous Version 1: An Experimental English-Language Query Formulation System for Casual Users of Relational Data Bases. IBM Research Report RJ2144, IBM Research Laboratory, San Jose, California.

Cullicover, P.W. 1976 Syntax. Academic Press, New York, New York.

Danes, F., Ed. 1974 Papers on Functional Sentence Perspective. Academia, Prague.

Firbas, Jan. 1966 On Defining the Theme in Functional Sentence Analysis. In Travaux Linguistiques de Prague 1. University of Alabama Press.

Firbas, Jan. 1974 Some Aspects of the Czechoslovak Approach to Problems of Functional Sentence Perspective. Papers on Functional Sentence Perspective. Academia, Prague.

Goldman, N. 1975 Conceptual Generation. In Schank, R.C., Ed., Conceptual Information Processing. North-Holland Publishing Co., Amsterdam.

Grice, H.P. 1975 Logic and Conversation. In Cole, P. and Morgan, J.L., Ed., Syntax and Semantics: Speech Acts, Vol. 3. Academic Press, New York, New York.

Halliday, M.A.K. 1967 Notes on Transitivity and Theme in English. Journal of Linguistics 3.

Heidorn, G. 1975 Augmented Phrase Structure Grammar. In TINLAP-1 Proceedings.

Joshi, A.K. 1979 Centered Logic: The Role of Entity Centered Sentence Representation in Natural Language Inferencing. In IJCAI Proceedings.

Kaplan, S.J. 1979 Cooperative Responses from a Portable Natural Language Data Base Query System. Ph.D. dissertation. University of Pennsylvania, Philadelphia, Pennsylvania.

McDonald, D.D. 1978 Subsequent Reference: Syntactic and Rhetorical Constraints. In TINLAP-2 Proceedings.

McKeown, K. 1979 Paraphrasing Using Given and New Information in a Question-answering System. Master's thesis. University of Pennsylvania, Philadelphia, Pennsylvania.

Morgan, J.L. and Green, G.M. 1977 Pragmatics and Reading Comprehension. University of Illinois.

Prince, E. 1979 On the Given/New Distinction, CLS 15.

Simmons, R. and Slocum, J. 1972 Generating English Discourse from Semantic Networks, Communications of the ACM 15(10).

Waltz, D.L. 1978 An English Language Question Answering System for a Large Relational Data Base, CACM 21(7).

