Shared Preferences


James Barnett and Inderjeet Mani

MCC

3500 West Balcones Center Dr.

Austin, TX

78759

[email protected]

Abstract

This paper attempts to develop a theory of heuristics or preferences that can be shared between understanding and generation systems. We first develop a formal analysis of preferences and consider the relation between their uses in generation and understanding. We then present a bidirectional algorithm for applying them and examine typical heuristics for lexical choice, scope and anaphora in more detail.

1 Introduction

Understanding and generation systems must both deal with ambiguity. In understanding, there are often a number of possible meanings for a string, while there are usually a number of different ways of expressing a given meaning in generation. To control the explosion of possibilities, researchers have developed a variety of heuristics or preferences, for example a preference for low attachment of modifiers in understanding or for concision in generation. This paper investigates the possibility of sharing such preferences between understanding and generation as part of a bidirectional NL system. In Section 2 we formalize the concept of a preference, and Section 3 presents an algorithm for applying such preferences uniformly in understanding and generation. In Section 4 we consider specific heuristics for lexical choice, scope, and anaphora. These heuristics have special properties that permit a more efficient implementation than the general algorithm from Section 3. Section 5 discusses some of the shortcomings of the theory developed here and suggests directions for future research.

2 Preferences in Understanding and Generation

Natural language understanding is a mapping from utterances to meanings, while generation goes in the opposite direction. Given a set String of input strings (of a given language) and a set Int of interpretations or meanings, we can represent understanding as a relation $U \subseteq String \times Int$, and generation as $G \subseteq Int \times String$. U and G are relations, rather than functions, since they allow for ambiguity: multiple meanings for an utterance and multiple ways of expressing a meaning.¹ A minimal requirement for a reversible system is that U and G be inverses of each other. For all $s \in String$ and $i \in Int$:

$$(s, i) \in U \leftrightarrow (i, s) \in G \qquad (1)$$
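The inverse requirement in Formula 1 can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the relations are finite sets of pairs, and the example sentences and meaning labels, as well as the helper names `inverse`, `is_reversible` and `images`, are invented here and are not part of the paper.

```python
# Toy understanding relation U as a set of (string, interpretation) pairs.
# The sentences and meaning labels are invented for illustration only.
U = {
    ("visiting relatives can be boring", "BORING(visit(x, relatives))"),
    ("visiting relatives can be boring", "BORING(relatives-who-visit)"),
    ("relatives who visit can be boring", "BORING(relatives-who-visit)"),
}


def inverse(relation):
    """Swap the two components of every pair in a binary relation."""
    return {(b, a) for (a, b) in relation}


# A generation relation that satisfies Formula 1 by construction.
G = inverse(U)


def is_reversible(u, g):
    """Formula 1: (s, i) is in U exactly when (i, s) is in G."""
    return inverse(u) == g


def images(relation, x):
    """All y such that (x, y) is in the relation."""
    return {b for (a, b) in relation if a == x}
```

Here `images(U, s)` gives the ambiguous readings of a string and `images(G, i)` the alternative ways of expressing a meaning, which is why both are modelled as relations rather than functions.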

Intuitively, preferences are ways of controlling the ambiguity of U and G by ranking some interpretations (for U) or strings (for G) more highly than others. Formally, then, we can view preferences as total orders on the objects in question (we will capitalize the term when using it in this technical sense).² Thus, for any $s \in String$ an understanding Preference $P_{int}$ will order the pairs $\{(s, i) \mid (s, i) \in U\}$, while a generation Preference

¹The definitions of U and G allow for strings with no interpretations and meanings with no strings. Since any meaning can presumably be expressed in any language, we may want to further restrict G so that everything is expressible: $\forall i \in Int\ (\exists s \in String\ [(i, s) \in G])$.

²We use total orders rather than partial orders to avoid having to deal with incommensurate structures. The requirement of commensurability is not burdensome in practice, even though many heuristics apparently don't apply to certain structures. For example, a heuristic favoring low attachment of post-modifiers doesn't clearly tell us how to

$P_{str}$ will rank $\{(i, s) \mid (i, s) \in G\}$.³

Thus we can view the task of understanding as enumerating the interpretations of a string in the order given by $P_{int}$. Similarly, generation will produce strings in the order given by $P_{str}$. Using $U_{P_{int}}$ and $G_{P_{str}}$ to denote the result of combining U and G with these preferences, we have, for all $s \in String$ and $i \in Int$:

$$U_{P_{int}}(s) =_{def} (i_1, \ldots, i_n) \qquad (2)$$

where $U(s) = \{i_1, \ldots, i_n\}$ and $[j < k] \leftarrow [(s, i_j) <_{P_{int}} (s, i_k)]$

$$G_{P_{str}}(i) =_{def} (s_1, \ldots, s_m) \qquad (3)$$

where $G(i) = \{s_1, \ldots, s_m\}$ and $[j < k] \leftarrow [(i, s_j) <_{P_{str}} (i, s_k)]$

Alternatively, we note that any Preference P induces an equivalence relation $\equiv_P$ which groups together the objects that are equal under P.⁴ We can therefore view the task of Generation and Understanding as being the enumeration of P's equivalence classes in order, without worrying about order within classes (note that Formulae 2 and 3 specify the order only of pairs where one member is less than the other under P).
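To illustrate enumeration by equivalence class, the following Python sketch groups objects into classes, best class first. It assumes, purely for illustration, that a Preference can be represented by a numeric key in which smaller means more preferred; the function name `equivalence_classes` and the sample readings are hypothetical.

```python
from itertools import groupby


def equivalence_classes(items, key):
    """Group items into classes of equal key, smallest (best) key first.

    Order inside a class is whatever the input order was; the Preference
    itself says nothing about it.
    """
    ordered = sorted(items, key=key)  # sorted() is stable
    return [list(group) for _, group in groupby(ordered, key=key)]


# Invented readings paired with an invented preference score.
readings = [("low attachment", 0), ("high attachment", 1), ("idiom", 1)]
classes = equivalence_classes(readings, key=lambda r: r[1])
```

With these toy scores the first class contains only the low-attachment reading, and the two remaining readings share the second class.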

The question now arises of what the relation between understanding Preferences and generation Preferences should be. Understanding heuristics are intended to find the meaning that the speaker is most likely to have intended for an utterance, and generation heuristics should select the string that is most likely to communicate a given meaning to the hearer. We would expect these Preferences to be inverses of each other: if s is the best way to express meaning i, then i should be the most likely interpretation of s. If we don't accept this condition, we will generate sentences that we expect the listener to misinterpret. Therefore we define class(Preference, pair) to be the equivalence class that pair is assigned to under Preference's ordering,⁵ and link the first

³Note that this definition allows Preferences to work 'across derivations.' For example, it allows $P_{int}$ to rank pairs $(s, i)$, $(s', i')$ where $s \neq s'$. It permits a Preference to say that i is a better interpretation for s than i' is for s'. It is not clear if this sort of power is necessary, and the algorithms below require only that Preferences be able to rank different interpretations (strings) for a given string (interpretation).

⁴Any order P on a set of objects D partitions D into a set of equivalence classes by assigning each $x \in D$ to the set $\{y \mid y \leq_P x \ \&\ x \leq_P y\}$.

⁵class(Preference, pair) is defined as the number of classes containing items that rank more highly than pair under Preference.

(most highly ranked) classes under $P_{int}$ and $P_{str}$ as follows:

$$class(P_{str}, (i, s)) = 0 \rightarrow class(P_{int}, (s, i)) = 0 \qquad (4)$$

It is also reasonable to require that opposing sets of preferences in understanding be reflected in generation. If string $s_1$ has two interpretations $i_1$ and $i_2$, with $i_1$ being preferred to $i_2$, and string $s_2$ has the same two interpretations with the preferences reversed, then $s_1$ should be a better way of expressing $i_1$ than $i_2$, and vice versa for $s_2$:

$$[(s_1, i_1) <_{P_{int}} (s_1, i_2) \ \&\ (s_2, i_2) <_{P_{int}} (s_2, i_1)] \rightarrow [(i_1, s_1) <_{P_{str}} (i_1, s_2) \ \&\ (i_2, s_2) <_{P_{str}} (i_2, s_1)] \qquad (5)$$

Formula 4 provides a tight coupling of heuristics for understanding and generating the most preferred structures, but it doesn't provide any way to share Preferences for secondary readings. Formula 5 offers a way to share heuristics for secondary interpretations, but it is quite weak and would be highly inefficient to use. To employ it during generation to choose between $s_1$ and $s_2$ as ways of expressing $i_1$, we would have to run the understanding system on both $s_1$ and $s_2$ to see if we could find another interpretation $i_2$ that both strings share but with opposite rankings relative to $i_1$.


To begin to develop a theory of secondary Preferences, we will simply stipulate that the heuristics in question are shared 'across the board' between understanding and generation. The simplest way to do this is to extend Formula 4 into a biconditional, and require it to hold of all classes (we will reconsider this stipulation in Section 5). For all $s \in String$ and $i \in Int$, we have:

$$class(P_{int}, (s, i)) = class(P_{str}, (i, s)) \qquad (6)$$

Since Preferences now work in either direction, we can simplify our notation and represent them as total orderings of a set T of trees, where each node of each tree is annotated with syntactic and semantic information, and, for any $t \in T$, str(t) returns the string in String that t dominates (i.e., spans), and sem(t) returns the interpretation in Int for the root node of t. For a Preference P on T and trees $t_1$, $t_2$, we stipulate:

$$[t_1 <_P t_2 \ \&\ str(t_1) = str(t_2)] \rightarrow U_P(str(t_1)) = (\ldots sem(t_1) \ldots sem(t_2) \ldots) \qquad (7)$$

$$[t_1 <_P t_2 \ \&\ sem(t_1) = sem(t_2)] \rightarrow G_P(sem(t_1)) = (\ldots str(t_1) \ldots str(t_2) \ldots) \qquad (8)$$

We close this Section by noting a property of Preferences that will be important in Section 4: an ordered list of Preferences can be combined into a new Preference by using each item in the list to refine the ordering specified by the previous ones. That is, the second Preference orders pairs that are equal under the first Preference, and the third Preference applies to those that are still equal under the second Preference, etc. If $P_1 \ldots P_n$ are Preferences, we define a new Complex Preference $P_{\langle 1, \ldots, n \rangle}$ as follows:

$$t_1 <_{P_{\langle 1, \ldots, n \rangle}} t_2 \leftrightarrow_{def} \exists\, 1 \leq j \leq n\ [\, t_1 <_{P_j} t_2 \ \&\ \neg \exists\, i < j\ [\, t_2 <_{P_i} t_1 \,]\,] \qquad (9)$$
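Formula 9 amounts to a lexicographic refinement: the first Preference in the list that distinguishes two objects decides their order. A minimal Python sketch follows, assuming each Preference is given as a three-way comparator returning a negative number, zero, or a positive number. The helper names `combine` and `by_key` and the two sample preferences are illustrative assumptions, not heuristics proposed in the paper.

```python
from functools import cmp_to_key


def by_key(key):
    """Build a three-way comparator from a key function (smaller is better)."""
    def pref(t1, t2):
        k1, k2 = key(t1), key(t2)
        return (k1 > k2) - (k1 < k2)
    return pref


def combine(*prefs):
    """Complex Preference in the sense of Formula 9.

    t1 precedes t2 iff some P_j ranks t1 before t2 and no earlier P_i
    ranks t2 before t1, i.e. the first comparator that does not return 0
    decides.
    """
    def complex_pref(t1, t2):
        for pref in prefs:
            result = pref(t1, t2)
            if result != 0:
                return result
        return 0
    return complex_pref


# Invented example: prefer shorter strings, break ties alphabetically.
shortest_first = by_key(len)
alphabetical = by_key(lambda s: s)
combined = combine(shortest_first, alphabetical)

candidates = ["the dog", "a dog", "a cat", "dogs"]
ranked = sorted(candidates, key=cmp_to_key(combined))
```

The second comparator is consulted only for pairs that the first one treats as equal, which is the refinement behaviour the text describes.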

3 An Algorithm for Sharing Preferences

If we consider ways of sharing Preferences between understanding and generation, the simplest one is to produce all possible interpretations (strings), and then sort them using the Preference. This is, of course, inefficient in cases where we are interested in only the more highly ranked possibilities. We can do better if we are willing to make a few assumptions about the structure of Preferences and the understanding and generation routines. The crucial requirement on Preferences is that they be 'upwardly monotonic' in the following sense: if $t_1$ is preferred to $t_2$, then it is also preferred to any tree containing $t_2$ as a subtree. Using subtree($t_1$, $t_2$) to mean that $t_1$ is a subtree of $t_2$, we stipulate

$$[t_1 <_P t_2 \ \&\ subtree(t_2, t_3)] \rightarrow t_1 <_P t_3 \qquad (10)$$

Without such a requirement, there is no way to cut off unpromising paths, since we can't predict the ranking of a complete structure from that of its constituents.

Finally, we assume that both understanding and generation are agenda-driven procedures that work by creating, combining, and elaborating trees.⁶ Under these assumptions, the following high-level algorithm can be wrapped around the underlying parsing and generation routines to cause the output to be enumerated in the order given by a Preference P. In the pseudo-code below, mode specifies the direction of processing and input is a string (if mode is understanding) or a semantic representation (if mode is generation). execute_item removes an item from the agenda and executes it, returning 0 or more new trees. generate_items takes a newly formed tree, a set of previously existing trees, and the mode, and adds a set of new actions to the agenda. (The underlying understanding or generation algorithm is hidden inside generate_items.) The variable active holds the set of trees that are currently being used to generate new items, while frozen holds those that won't be used until later. complete_tree is a termination test that returns True if a tree is complete for the mode in question (i.e., if it has a full semantic interpretation for understanding, or dominates a complete string for generation). The global variable classes holds a list of equivalence classes used by equiv_class (defined below), while level holds the number of the equivalence class currently being enumerated. thaw&restart is called each time level is incremented to generate new agenda items for trees that may belong to that class.

ALGORITHM 1

    classes := Nil; solutions := Nil; new_trees := Nil;
    agenda := Nil; level := 1;
    frozen := initialize_agenda(input, mode);
    {end of global declarations}

    while frozen do
    begin
      solutions := get_complete_trees(frozen, level, mode);
      agenda := thaw&restart(frozen, level, agenda, mode);
      while agenda do
      begin
        new_trees := execute_item(agenda);
        while new_trees do
        begin
          new_tree := pop(new_trees);
          if equiv_class(P, new_tree) > level
            then push(new_tree, frozen);
          else if complete_tree(new_tree, mode)
            then push(new_tree, solutions);
          else generate_items(new_tree, active, agenda, mode);
        end;
      end;
      {agenda exhausted for this level}
      {solutions may need partitioning}
      while solutions do
      begin
        complete_tree := pop(solutions);
        if equiv_class(P, complete_tree) > level
          then push(complete_tree, frozen);
        else output(complete_tree, level);
      end
      {increment level to output next class}
      level := level + 1;
    end;
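The freeze-and-thaw control structure of Algorithm 1 can be sketched in Python under simplifying assumptions. In this sketch each item carries a fixed integer class supplied by a `rank` function, rather than a class built incrementally by equiv_class, and `rank` is assumed to be upwardly monotonic (a child never ranks better than its parent). The names `enumerate_by_class`, `expand`, `is_complete`, `rank`, and the toy lexicon are hypothetical stand-ins for the underlying parser or generator.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def enumerate_by_class(
    initial: Iterable[T],
    expand: Callable[[T], Iterable[T]],
    is_complete: Callable[[T], bool],
    rank: Callable[[T], int],
    max_level: int,
) -> Iterator[Tuple[int, T]]:
    """Yield (level, item) pairs, lowest (best) level first.

    Assumes rank(child) >= rank(parent) for every child from expand().
    """
    frozen: List[T] = list(initial)
    for level in range(1, max_level + 1):
        if not frozen:
            break
        # Thaw the items that may belong to the current level.
        agenda = [t for t in frozen if rank(t) <= level]
        frozen = [t for t in frozen if rank(t) > level]
        solutions: List[T] = []
        while agenda:
            item = agenda.pop()
            if is_complete(item):
                solutions.append(item)
                continue
            for child in expand(item):
                if rank(child) > level:
                    frozen.append(child)  # postpone: do not elaborate yet
                else:
                    agenda.append(child)
        for solution in solutions:
            yield level, solution


# Toy instantiation: build two-word strings from an invented weighted lexicon.
WEIGHTS = {"a": 1, "b": 2}


def rank(words):
    return max((WEIGHTS[w] for w in words), default=1)


def expand(words):
    return [words + (w,) for w in WEIGHTS]


def is_complete(words):
    return len(words) == 2


results = list(enumerate_by_class([()], expand, is_complete, rank, max_level=2))
```

In the toy run the only level-1 result is the string built entirely from weight-1 words; every partial string containing a weight-2 word stays frozen until level 2, so its successors are not created earlier than needed.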

The function equiv_class keeps track of the equivalence classes induced by the Preferences. Given an input tree, it returns the number of the equivalence class that the tree belongs to. Since it must construct the equivalence classes as it goes along, it may return different values on different calls with the same argument (for example, it will always return 1 the first time it is called, even though the tree in question may end up in a lower class). However, successive calls to equiv_class will always return a non-decreasing series of values, so that a given tree is guaranteed to be ranked no more highly than the value returned (it is this property of equiv_class that forces the extra pass over the completed trees in the algorithm above: a tree that was assigned to class n when it was added to solutions may have been demoted to a lower class in the interim as more trees were examined). Less_than and equal take a Preference and a pair of trees and return True if the first tree is less than (equal to) the second under the Preference. Create_class takes a tree and creates a new class whose only member is that tree, while insert adds a class to classes in the indicated position (shifting other classes down, if necessary), and select_member returns an arbitrary member of a class.

    function equiv_class(P: Preference, T: Tree)
    begin
      class_num := 1;
      for class in classes do
      begin
        if less_than(P, T, select_member(class)) then
        begin
          insert(create_class(T), classes, class_num);
          return(class_num);
        end;
        else if equal(P, T, select_member(class)) then
        begin
          add_member(T, class);
          return(class_num);
        end;
        else class_num := class_num + 1;
      end;
      {T > all classes: it ranks below every existing class}
      insert(create_class(T), classes, class_num);
      return(class_num);
    end {equiv_class}
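The incremental behaviour of equiv_class can be tried out with a short Python sketch. It assumes the Preference is supplied as a strict `less_than` predicate and treats two trees as equal when neither is less than the other; the class name `ClassTable` and the string-length preference are invented for the example. Unlike the pseudocode, the sketch avoids adding the same tree to a class twice.

```python
class ClassTable:
    """Incrementally built list of equivalence classes, best class first."""

    def __init__(self, less_than):
        self.less_than = less_than  # strict comparison under the Preference
        self.classes = []

    def equiv_class(self, tree):
        """Return the 1-based number of the class `tree` currently falls in."""
        for index, members in enumerate(self.classes):
            representative = members[0]
            if self.less_than(tree, representative):
                # tree outranks this class: open a new class in front of it
                self.classes.insert(index, [tree])
                return index + 1
            if not self.less_than(representative, tree):
                # neither is less than the other: same class
                if tree not in members:
                    members.append(tree)
                return index + 1
        # tree ranks below every existing class
        self.classes.append([tree])
        return len(self.classes)


# Invented preference: shorter strings are better.
table = ClassTable(lambda a, b: len(a) < len(b))
first = table.equiv_class("bbb")   # no classes yet
table.equiv_class("a")             # outranks "bbb"
second = table.equiv_class("bbb")  # "bbb" has been demoted
table.equiv_class("cc")            # lands between "a" and "bbb"
third = table.equiv_class("bbb")   # demoted again
```

The three values returned for the same tree form a non-decreasing series, which is the property that forces the extra pass over the solutions in Algorithm 1.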

To see that the algorithm enumerates trees in the order given by $<_P$, note that the first iteration outputs trees which are minimal under $<_P$. Now consider any tree $t_x$ which is output on a subsequent iteration N. For all other $t_y$ output on that iteration, $t_x =_P t_y$. Furthermore, $t_x$ contains a subtree $t_{sub}$ which was frozen for all levels up to N. Using T(J) to denote the set of trees output on iteration J, we have: $\forall\, 1 \leq I < N\ [\forall t_i \in T(I)\ [t_i <_P t_{sub}]]$, whence, by stipulation 10, $t_i <_P t_x$ for every such $t_i$; that is, $t_x$ is ranked below all trees which were enumerated before it. To calculate the time complexity of the algorithm, note that it calls equiv_class once for each tree created by the underlying understanding or generation algorithm (and once for each complete interpretation). Equiv_class, in turn, must potentially compare its argument with each existing equivalence class. Assuming that the comparison takes constant time, the complexity of the algorithm depends on the number k of equivalence classes $<_P$ induces: if the underlying algorithm is $O(f(n))$, the overall complexity is $O(f(n)) \times k$. Depending on the Preference, k could be a small constant, or itself proportional to $f(n)$, in which case the complexity would be $O(f(n)^2)$.

4 Optimization of Preferences

As we make more restrictive assumptions about Preferences, more efficient algorithms become possible. Initially, we assumed only that Preferences specified total orders on trees, i.e., that they would take two trees as input and determine if one was less than, greater than, or equal to the other.⁷ Given such an unrestricted view of Preferences, we can do no better than producing all interpretations (strings) and then sorting them. This simple approach is fine if we want all possibilities, especially if we assume that there won't be a large number of them, so that standard $n^2$ or $n \log n$ sorting algorithms (see [Aho, Hopcroft, and Ullman, 1983]) won't be much of an additional burden. However, this approach is inefficient if we are interested in only some of the possibilities. Adding the monotonicity restriction 10 permits Algorithm 1, which is more efficient in that it postpones the creation of (successors of) lower ranked trees. However, we are still operating with a very general view of what Preferences are, and further improvements are possible when we look at individual Preferences in detail. In this section, we will consider heuristics for lexical selection, scope, and anaphor resolution. We do not make any claims for the usefulness of these heuristics as such, but take them as concrete examples that show the importance of considering the computational properties of Preferences.

Note that Algorithm 1 is stated in terms of a single Preference. It is possible to combine multiple Preferences into a single one using Formula 9, and we are currently investigating other methods of combination. Since the algorithms below are highly specialized, they cannot be combined with other Preferences using Formula 9. The ultimate goal of this research, however, is to integrate such specialized algorithms with a more sophisticated version of Algorithm 1.

⁷We also assume that this test takes constant time.

4.1 Lexical Choice

One simple preferencing scheme involves assigning integer weights to lexical items and syntactic rules. Items or rules with higher weights are less common and are considered only if lower ranked items fail. When combined with restriction 10, this weighting scheme yields a Preference $<_{wt}$ that ranks trees according to their lexical and rule weights. Using max_wt(T) to denote the most heavily weighted lexical item or rule used in the construction of T, we have:

$$t_1 <_{wt} t_2 \leftrightarrow_{def} max\_wt(t_1) < max\_wt(t_2) \qquad (11)$$

The significant property here is that the equivalence classes under $<_{wt}$ can be computed without directly comparing trees. Given a lexical item with weight n, we know that any tree containing it must be in class n or lower. Noting that our algorithm works by generate-and-test (trees are created and then ranked by equiv_class), we can achieve a modest improvement in efficiency by not creating trees with level n lexical items or rules until it is time to enumerate that equivalence class. We can implement this change for both generation and understanding by adding level as a parameter to both initialize_agenda and generate_items, and changing the functions they call to consider only rules and lexical items at or below level. How much of an improvement this yields will depend on how many classes we want to enumerate and how many lexical items and rules there are below the last class enumerated.
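The level-restricted lookup described above might look as follows in Python. This is a sketch under assumptions: the lexicon, its weights, and the helper names `max_wt`, `items_up_to` and `new_items_at` are invented, and a tree is reduced to the list of lexical items it uses.

```python
# Invented lexicon: lower weight means more common, hence preferred.
LEXICON = {"dog": 1, "hound": 2, "pooch": 2, "canine": 3}


def max_wt(tree_words, lexicon=LEXICON):
    """Weight of the most heavily weighted item used in a tree."""
    return max(lexicon[w] for w in tree_words)


def items_up_to(level, lexicon=LEXICON):
    """Items a level-aware generate_items would be allowed to consider."""
    return sorted(w for w, wt in lexicon.items() if wt <= level)


def new_items_at(level, lexicon=LEXICON):
    """Items that first become available when `level` is reached."""
    return sorted(w for w, wt in lexicon.items() if wt == level)
```

Because a tree's class is fixed by its heaviest item, no tree containing `canine` needs to be built before level 3 is enumerated.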

4.2 Scope

Scope is another place where we can improve on the basic algorithm. We start by considering scoping during Understanding. Given a sentence s with operators (quantifiers) $o_1 \ldots o_n$, assigning a scope amounts to determining a total order on $o_1 \ldots o_n$.⁸ If a scope Preference can do no more than compare and rank pairs of scopings, then the simple generate-and-test algorithm will require O(n!) steps to find the best scoping since it will potentially have to examine every possible ordering. However, the standard heuristics for assigning scope (e.g., give "strong" quantifiers wide scope, respect left-to-right order in the sentence) can be used to directly assign the preferred ordering of $o_1 \ldots o_n$. If we assume that secondary readings are ranked by how closely they match the preferred scoping, a Preference $<_{sc}$ can be defined. In the following, $(o_i, o_j) \in Sc(s)$ means that $o_i$ precedes $o_j$ in scoping Sc of sentence s, and $Sc_{best}(s)$ is the preferred ordering of the operators in s given by the heuristics:

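$$Sc_1(s) <_{sc} Sc_2(s) \leftrightarrow_{def} \forall o_i, o_j\ [\,[(o_i, o_j) \in Sc_{best}(s) \ \&\ (o_i, o_j) \in Sc_2(s)] \rightarrow (o_i, o_j) \in Sc_1(s)\,] \qquad (12)$$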

Given such a Preference, we can generate the scopings of a sentence more efficiently by first producing the preferred reading (the first equivalence class), then all scopes that have one pair of operators switched (the second class), then all those with two pairs out of order, etc. In the following algorithm, ops is the set of operators in the sentence, and sort is any sorting routine, switched? is a predicate returning True if its two arguments have already been switched (i.e., if its first arg was to the right of its second in $Sc_{best}(s)$), while switch($o_1$, $o_2$, ord) is a function that returns a new ordering which is the same as ord except that $o_2$ precedes $o_1$ in it.

    {the best scoping}
    root_set := sort(ops, Sc_best(s));
    level := 1;
    output(root_set, level);
    new_set := Nil;
    old_set := add_item(root_set, Nil);
    {loop will execute n! - 1 times}
    while old_set do
    begin
      {each pass produces the next equivalence class}
      level := level + 1;
      for ordering in old_set do
      begin
        for op in ordering do
        begin
          {consider adjacent pairs of operators}
          next := right_neighbor(op, ordering);
          {switch any pair that hasn't already been}
          if next and not(switched?(op, next)) then do
          begin
            new_scope := switch(op, next, ordering);
            add_item(new_scope, new_set);
            output(new_scope, level);
          end
        end
      end
      old_set := new_set;
      new_set := Nil;
    end

While Algorithm 1 would require O(n!) steps to generate the first scoping, this algorithm will output the best scoping in the $n^2$ or $n \log n$ steps that it takes to do the sort (cf. [Aho, Hopcroft, and Ullman, 1983]), while each additional scoping is produced in constant time.⁹ The algorithm is profligate in that it generates all possible orderings of quantifiers, many of which do not correspond to legal scopings (see [Hobbs and Shieber, 1987]). It can be tightened up by adding a legality test before scope is output.
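The level-by-level enumeration of scopings can be sketched in Python as a generator. This is an illustration rather than a transcription of the pseudocode: orderings are tuples, an adjacent pair is switched only while it is still in its preferred order, and a `seen` set (an addition not present in the paper's algorithm) suppresses orderings reachable by more than one sequence of switches. The function names and the sample quantifiers are assumptions.

```python
def inversions(ordering, best):
    """Number of operator pairs whose relative order differs from `best`."""
    position = {op: i for i, op in enumerate(best)}
    ranks = [position[op] for op in ordering]
    return sum(
        1
        for a in range(len(ranks))
        for b in range(a + 1, len(ranks))
        if ranks[a] > ranks[b]
    )


def scopings_by_class(best):
    """Yield (level, ordering): level 1 is the preferred scoping, level 2
    has one pair switched, level 3 two pairs, and so on."""
    best = tuple(best)
    position = {op: i for i, op in enumerate(best)}
    level, current, seen = 1, [best], {best}
    while current:
        for ordering in current:
            yield level, ordering
        next_set = []
        for ordering in current:
            for k in range(len(ordering) - 1):
                left, right = ordering[k], ordering[k + 1]
                # switch only adjacent pairs that have not been switched yet
                if position[left] < position[right]:
                    swapped = ordering[:k] + (right, left) + ordering[k + 2:]
                    if swapped not in seen:
                        seen.add(swapped)
                        next_set.append(swapped)
        current = next_set
        level += 1


best_scoping = ("every", "most", "some")  # invented preferred ordering
all_scopings = list(scopings_by_class(best_scoping))
```

Each adjacent switch adds exactly one out-of-order pair, so the level at which an ordering is produced is its number of switched pairs plus one. A legality test of the kind mentioned in the text could be applied just before each `yield`.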

When we move from Understanding to Generation, following Formula 6, we see that the task is to take an input semantics with scoping Sc and enumerate first all strings that have Sc as their best scoping, then all those with Sc as the second best scoping, etc. Equivalently, we enumerate first strings whose scopings exactly match Sc, then those that match Sc except for one pair of operators, then those matching except for two pairs, etc. We can use Algorithm 1 to implement this efficiently if we replace each of the two conditional calls to equiv_class. Instead of first computing the equivalence class and then testing whether it is less than level, we call the following function class_less_than:

    {True iff candidate ranked at level or below}
    {target is the desired scoping}
    function class_less_than(candidate, target, level)
    begin
      switch_limit := level; {global variable}
      switches := 0; {global variable}
      return test_order(candidate, target, target);
    end {class_less_than}

    function test_order(cand, targ_rest, targ)
    begin
      if null(cand)
        return True;
      else
      begin
        targ_tail := member(first(cand), targ_rest);
        if targ_tail
          return test_order(rest(cand), targ_tail, targ);
        else
        begin
          switches := switches + 1;
          if >(switches, switch_limit)
            return False;
          else
            if simple_test(rest(cand), targ_rest)
              return test_order(cand, targ, targ);
            else
              return False;
        end
      end
    end {test_order}

    function simple_test(cand_rest, targ_rest)
    begin
      for cand in cand_rest do
      begin
        if not(member(cand, targ_rest))
        begin
          switches := switches + 1;
          if >(switches, switch_limit)
            return False;
        end
      end
      return True;
    end {simple_test}

⁹switched? can be implemented in constant time if

To estimate the complexity of class_less_than, note that if no switches are encountered, test_order will make one pass through targ_rest (= targ) in O(n) steps, where n is the length of targ. Each switch encountered results in a call to simple_test, O(n) steps, plus a call to test_order on the full list targ for another O(n) steps. The overall complexity is thus O((j + 1) × n), where level = j is the number of switches permitted. Note that class_less_than tests a candidate string's scoping only against the target scope, without having to inspect other possible strings or other possible scopings for the string. We therefore do not need to consider all strings that can have Sc as a scoping in order to find the most highly ranked ones that do. Furthermore, class_less_than will work on partial constituents (it doesn't require that cand have the same number of operators as targ), so unpromising paths can be pruned early.
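The idea of testing a candidate only against the target scoping, with an early exit once too many switches have been seen, can be sketched as follows. This Python version is a simplified stand-in, not a transcription of test_order and simple_test: it counts out-of-order pairs directly, which takes quadratic time in the number of operators rather than the O((j + 1) × n) bound discussed above. It assumes every operator in the candidate also occurs in the target; the function names and operators are illustrative.

```python
def switched_pairs(candidate, target, limit=None):
    """Count operator pairs in `candidate` whose order disagrees with `target`.

    Stops early once the count exceeds `limit`, if a limit is given.
    """
    position = {op: i for i, op in enumerate(target)}
    ranks = [position[op] for op in candidate]
    count = 0
    for a in range(len(ranks)):
        for b in range(a + 1, len(ranks)):
            if ranks[a] > ranks[b]:
                count += 1
                if limit is not None and count > limit:
                    return count
    return count


def class_less_than(candidate, target, level):
    """True iff `candidate` needs at most `level` switches to match `target`.

    `candidate` may mention only some of the operators in `target`, so the
    test can also be applied to partial constituents.
    """
    return switched_pairs(candidate, target, limit=level) <= level


target_scoping = ("every", "most", "some")  # invented target ordering
```

For example, the partial candidate `("some", "every")` already needs one switch, so a search limited to zero switches can discard it before the remaining operators are attached.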

4.3 Anaphora

Next we consider the problem of anaphoric reference. From the standpoint of Understanding, resolving an anaphoric reference can be viewed as a matter of finding a Preference ordering of all the possible antecedents of the pronoun. Algorithm 1 would have to produce a separate interpretation for each object that had been mentioned in the discourse and then rank them all. This would clearly be extremely inefficient in any discourse more than a couple of sentences long. Instead, we will take the anaphora resolution algorithm from [Rich and Luperfoy, 1988], [Luperfoy and Rich, 1991] and show how it can be viewed as an implementation of a Complex Preference, allowing for a more efficient implementation.

Under this algorithm, anaphora resolution is entrusted to Experts of three kinds: a Proposer finds likely candidate antecedents, Filters provide a quick way of rejecting many candidates, and Rankers perform more expensive tests to choose among the rest. Recency is a good example of a Proposer; antecedents are often found in the last couple of sentences, so we should start with the most recent sentences and work back. Gender is a typical Filter; given a use of "he", we can remove from consideration all non-male objects that the Proposers have offered. Semantic plausibility or Syntactic parallelism are Rankers; they are more expensive than the Filters and assign a rational-valued score to each candidate rather than giving a yes/no answer.

class,¹⁰ we see that the algorithm above, when run with Proposer Pr, Filters $F_1 \ldots F_n$ and Rankers $R_1 \ldots R_j$, implements the Complex Preference $P_{\langle F_{\langle 1, \ldots, n \rangle}, Pr, R_1 \ldots R_j \rangle}$, defined in accordance with Formula 9. We thus have the following algorithm, where next_class takes a Proposer and a pronoun as input and returns its next equivalence class of candidate antecedents for the pronoun.

    class := 1; {global variable}
    cands := next_class(Proposer, pronoun);
    while cands do
    begin
      filtered_cands := cands;
      for cand in cands do
      begin
        for filter in Filters do
        begin
          if not(filter(cand))
            then remove(cand, filtered_cands);
        end
      end
      {filtered_cands now contains class n under}
      {P<F<1,...,n>, Pr>. Rankers R_1 ... R_j}
      {may split it into several classes}
      refine&output(filtered_cands, Rankers);
      cands := next_class(Proposer, pronoun);
    end

    function refine&output(cands, Rankers)
    begin
      refined_order := sort(cands, Rankers);
      if rest(Rankers)
        then refine&output(refined_order, rest(Rankers));
      else
      begin
        loc_class := 1;
        for cand in refined_order do
        begin
          if >(equiv_class(first(Rankers), cand), loc_class)
          then
          begin
            loc_class := loc_class + 1;
            class := class + loc_class;
          end
          output(cand, class);
        end
      end
    end {refine&output}
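The Proposer, Filter and Ranker pipeline can be sketched compactly in Python. The sketch assumes that the Proposer is given as an iterable of candidate classes (most preferred first), that Filters are yes/no predicates, and that Rankers return numeric scores where lower is better; it numbers the output classes consecutively instead of reproducing the class arithmetic of the pseudocode. The function name `resolve` and all candidate data are invented for illustration.

```python
def resolve(proposer_classes, filters, rankers):
    """Yield (class_number, candidate) pairs in preference order.

    Each Proposer class is filtered, then split into finer classes by the
    Rankers, earlier Rankers taking priority over later ones.
    """
    def ranker_key(candidate):
        return tuple(ranker(candidate) for ranker in rankers)

    class_number = 0
    for proposed in proposer_classes:
        survivors = [c for c in proposed if all(f(c) for f in filters)]
        survivors.sort(key=ranker_key)
        previous = None  # never merge classes across Proposer classes
        for candidate in survivors:
            key = ranker_key(candidate)
            if key != previous:
                class_number += 1
                previous = key
            yield class_number, candidate


# Invented discourse entities, grouped by recency (most recent first).
recency_classes = [
    [
        {"name": "the waiter", "gender": "m", "implausibility": 1},
        {"name": "John", "gender": "m", "implausibility": 2},
        {"name": "Mary", "gender": "f", "implausibility": 1},
    ],
    [{"name": "Bill", "gender": "m", "implausibility": 1}],
]
gender_filter = lambda c: c["gender"] == "m"       # resolving "he"
plausibility_ranker = lambda c: c["implausibility"]

antecedents = [
    (n, c["name"])
    for n, c in resolve(recency_classes, [gender_filter], [plausibility_ranker])
]
```

Because the Proposer classes are consumed lazily, enumeration can stop as soon as enough antecedent classes have been produced, without ranking every entity in the discourse.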

Moving to Generation, we use this Preference to decide when to use a pronoun. Following Formula 6, we want to use a pronoun to refer to object x at level n iff that pronoun would be interpreted as referring to x in class n during Understanding. First we need a test occurs?(Proposer, x) that will return True iff Proposer will eventually output x in some equivalence class. For example, a Recency Proposer will never suggest a candidate that hasn't occurred in the antecedent discourse, so there is no point in considering a pronoun to refer to such an object. Next, we note that the candidates that the Proposer returns are really pairs consisting of a pronoun and an antecedent, and that Filters work by comparing the features of the pronoun (gender, number, etc.) with those of the antecedent. We can implement Filters to work by unifying the (syntactic) features of the pronoun with the (syntactic and semantic) features of the antecedent, returning either a more fully-specified set of features for the pronoun, or $\bot$ if unification fails. We can now take a syntactically underspecified pronoun and x and use the Filter to choose the appropriate set of features. We are now assured that the Proposer will suggest x at some point, and that x will pass all the filters.

¹⁰In both cases, the result is: $p_1 \cap f_1, \ldots, p_n \cap f_1$, where $p_1 \ldots p_n$ are the equivalence classes induced by the Proposer, and $f_1$ is the Filter's first equivalence class.

Having established that x is a reasonable candidate for pronominal reference, we need to determine what class x will be assigned to as an antecedent. Rankers such as Syntactic Parallelism must look at the full syntactic structure,¹¹ so we must generate complete sentences before doing the final ranking. Given a sentence s containing pronoun p with antecedent x, we can determine the equivalence class of (p, x) by running the Proposer until (p, x) appears, then running the Filters on all other candidates, and passing all the survivors and (p, x) to refine&output, and then seeing what class (p, x) is returned in. Alternatively, if we only want to check whether (p, x) is in a certain class n or not, we can run the resolution algorithm given above until n classes have been enumerated, quitting if (p, x) is not in it. (See the next section for a discussion of this algorithm's obvious weaknesses.)

¹¹The definitions we've given so far do not specify how Preferences should rank "unfinished" structures, i.e., those that don't contain all the information the Preference requires. One obvious solution is to assign incomplete structures to the first equivalence class; as the structures become complete, they can be moved down into lower classes if necessary. Under such a strategy, Preferences such as

5 Discussion

Related Work: There is an enormous amount of work on preferences for understanding, e.g., [Whittemore, Ferrara, and Brunner, 1990], [Jensen and Binot, 1988], [Grosz, Appelt, Mar- tin, and Pereira, 1987] for a few recent examples. In work on generation preferences (in the sense of rankings of structures) are less clearly identifiable since such rankings tend to be contained implic- itly in strategies for the larger problem of deciding what to say (but see [Mann and Moore, 1981] and [Reiter, 1990].)i Algorithm 1 is similar in spirit to the "all possibilities plus constraints" strategy that is common in principle-based approaches (see [Epstein, 1988])i, but it differs from them in that it imposes a preference ordering on interpretations, rather than rest'ricting the set O f legal interpretations to begin With.


Strzalkowski [Strzalkowski, 1990] contrasts two strategies for reversibility: those with a single grammar and two interpreters versus those with a single interpreter and two grammars. Although the top-level algorithm presented here works for both understanding and generation, the underlying generation and understanding algorithms can belong to either of Strzalkowski's categories. However, the more specific algorithms discussed in Section 4 belong to the former category. There is also a clear "directionality" in both the scope and the anaphora Preferences; both are basically understanding heuristics that have been reformulated to work bi-directionally. For this reason, they are both considerably weaker as generation heuristics. In particular, the anaphora Preference is clearly insufficient as a method of choosing when to use a pronoun. At best, it can serve to validate the choices made by a more substantial planning component.

The Two Directions: In general, it is not clear what the relation between understanding and generation heuristics should be. Formulae 4 and 5 are reasonable requirements, but they are too weak to provide the close linkage between understanding and generation that we would like to have in a bi-directional system. On the other hand, Formula 6 is probably too strong, since it requires the equivalence classes to be the same across the board. In particular, it entails the converse of Formula 4, and this has counter-intuitive results. For example, consider any highly convoluted, but grammatical, sentence: it has a best interpretation, and by Formula 6 it is therefore one of the best ways of expressing that meaning. But if it is sufficiently opaque, it is not a good way of saying anything. Similarly, a speaker may suddenly use a pronoun to refer to an object in a distant part of the discourse. If the anaphora Preference is sophisticated enough, it may resolve the pronoun correctly, but we would not want the generation system to conclude that it should use a pronoun in that situation. One way to tackle this problem is to observe that understanding systems tend to be too loose (they accept a lot of things that you don't want to generate), while generation systems are too strict (they cover only a subset of the language). We can therefore view generation Preferences as restrictions of understanding Preferences. On this view, one may construct a generation Preference from one for understanding by adding extra clauses, with the result that its ordering is a refinement of that induced by the understanding Preference.
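The refinement relation can be made concrete with a small Python sketch. It models a Preference as a sort key made of clauses compared lexicographically, which is only one possible encoding; the clause contents (a "penalty" count and a sentence "length") and all function names are hypothetical and chosen purely for illustration.

```python
from itertools import permutations


def understanding_key(structure):
    # Hypothetical single clause: fewer penalties means a better class.
    return (structure["penalty"],)


def generation_key(structure):
    # The understanding clause comes first; the extra (assumed) clause
    # prefers shorter sentences and only breaks ties left by the first.
    return understanding_key(structure) + (structure["length"],)


def is_refinement(fine_key, coarse_key, structures):
    """True if every strict preference under coarse_key also holds under fine_key."""
    return all(
        fine_key(a) < fine_key(b)
        for a, b in permutations(structures, 2)
        if coarse_key(a) < coarse_key(b)
    )


# Made-up candidate structures for one meaning.
CANDIDATES = [
    {"name": "concise", "penalty": 0, "length": 5},
    {"name": "convoluted", "penalty": 0, "length": 12},
    {"name": "odd", "penalty": 1, "length": 4},
]
```

Here `is_refinement(generation_key, understanding_key, CANDIDATES)` holds: the generation ordering never reverses an understanding preference, but it splits the tie between the concise and the convoluted candidate. The converse check fails, which mirrors the point that the two directions need not share the same equivalence classes.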

Internal Structure: Further research is necessary into the internal structure of Preferences. We chose a very general definition of Preferences to start with, and found that further restrictions allowed for improvements in efficiency. Preferences that partition input into a fixed set of equivalence classes that can be determined in advance (e.g., the Preference for lexical choice discussed in Section 4) are particularly desirable, since they allow structures to be categorized in isolation, without comparing them to other alternatives. Other Preferences, such as the scope heuristic, allow us to create the desired structures directly, again without need for comparison with other trees. On the other hand, the anaphora Preference is based on an algorithm that assigns rational-valued scores to candidate antecedents. Thus there can be arbitrarily many equivalence classes, and we can't determine which one a given candidate belongs to without looking at all higher-ranked candidates. This is not a problem during understanding, since the Proposer can provide those candidates efficiently, but the algorithm for generation is quite awkward, amounting to little more than "make a guess, then run understanding and see what happens."
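The contrast between the two kinds of Preference can be illustrated with a short Python sketch. The register labels, the class table, and the scores below are invented for the example and are not the lexical-choice or anaphora Preferences of Section 4; the point is only that the first function needs nothing but the item itself, while the second needs the scores of all competitors.

```python
from fractions import Fraction

# Fixed classes, known in advance (hypothetical lexical-choice table).
LEXICAL_CLASS = {"basic": 0, "technical": 1, "archaic": 2}


def lexical_class(word_register):
    """Classify a word in isolation: no other candidates are consulted."""
    return LEXICAL_CLASS[word_register]


def score_class(candidate, scores):
    """Class index = number of distinct scores strictly above the candidate's.

    The whole score table is required, so a candidate cannot be
    classified without looking at every higher-ranked competitor.
    """
    own = scores[candidate]
    return len({s for s in scores.values() if s > own})


# Made-up rational-valued scores for candidate antecedents.
SCORES = {
    "ball": Fraction(3, 4),
    "dog": Fraction(3, 4),
    "park": Fraction(1, 2),
    "tree": Fraction(1, 3),
}
```

Adding one more candidate with a new score can shift the class of every lower-ranked candidate under `score_class`, whereas `lexical_class` is unaffected by what else is in the candidate set.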


the kind of heuristic-specific algorithms seen in Section 4. We also plan an implementation of these Preferences as part of the KBNL system [Barnett, Mani, Knight, and Rich, 1990].

References

[Aho, Hopcroft, and Ullman, 1983] Alfred Aho, John Hopcroft, and Jeffrey Ullman. Data Structures and Algorithms. Addison-Wesley, 1983.

[Barnett, Mani, Knight, and Rich, 1990] Jim Barnett, Inderjeet Mani, Kevin Knight, and Elaine Rich. Knowledge and natural language processing. CACM, August 1990.

[Calder, Reape, and Zeevat, 1989] J. Calder, M. Reape, and H. Zeevat. An algorithm for generation in unification categorial grammar. In Proceedings of the 4th conference of the European Chapter of the ACL, 1989.

[Epstein, 1988] Samuel Epstein. Principle-based interpretation of natural language quantifiers. In Proceedings of AAAI 88, 1988.

[Grosz, Appelt, Martin, and Pereira, 1987] Barbara Grosz, Douglas Appelt, Paul Martin, and Fernando Pereira. Team: an experiment in the design of portable natural language interfaces. Artificial Intelligence, 1987.

[Hobbs and Shieber, 1987] Jerry Hobbs and Stuart Shieber. An algorithm for generating quantifier scopings. Computational Linguistics, 13(1-2):47-63, 1987.

[Jensen and Binot, 1988] Karen Jensen and Jean-Louis Binot. Dictionary text entries as a source of knowledge for syntactic and other disambiguations. In Second Conference on Applied Natural Language Processing, Austin, Texas, 9-12 February, 1988.

[Luperfoy and Rich, 1991] Susan Luperfoy and Elaine Rich. Anaphora resolution. Computational Linguistics, to appear.

[Mann and Moore, 1981] William Mann and James Moore. Computer generation of multiparagraph English text. American Journal of Computational Linguistics, 7(1):17-29, 1981.

[Reiter, 1990] Ehud Reiter. The computational complexity of avoiding conversational implicatures. In Proceedings of the ACL, Pittsburgh, 6-9 June, 1990.

[Rich and Luperfoy, 1988] Elaine Rich and Susan Luperfoy. An architecture for anaphora resolution. In Second Conference on Applied Natural Language Processing, Austin, Texas, 9-12 February, 1988.

[Shieber, van Noord, Moore and Pereira, 1989] S. Shieber, G. van Noord, R. Moore, and F. Pereira. A semantic head-driven generation algorithm for unification-based formalisms. In Proceedings of the ACL, Vancouver, 26-29 June, 1989.

[Strzalkowski, 1990] Tomek Strzalkowski. Reversible logic grammars for parsing and generation. Computational Intelligence, 6(3), 1990.

[Whittemore, Ferrara, and Brunner, 1990] Greg Whittemore, Kathleen Ferrara, and Hans Brunner. Post-modifier prepositional phrase ambiguity in written interactive dialogues. In Proceedings of the ACL, Pittsburgh, 6-9