• No results found

SAUMER: Sentence Analysis Using Metarules

N/A
N/A
Protected

Academic year: 2020

Share "SAUMER: Sentence Analysis Using Metarules"

Copied!
9
0
0

Loading.... (view fulltext now)

Full text

(1)

S A U M E R : S E N T E N C E A N A L Y S I S U S I N G M E T A R U L E S

F r e d P o p o w i c h Natural Language Group

Laboratory for Computer and Communications Research Department of Computing Science

Simon Fraser U n i v e r s i t y Burnaby. B.C.. C A N A D A V5A 1S6

ABSTRACT

The SAUMER system uses specifications of natural language grammars, which consist of rules and metarules. to provide a semantic interpretation of an input sentence.

The SAUMER ' Specification Language (SSL) is a

programming language which c o m b i n ~ some of the

features of generalised phrase structure grammars (Gazdar.

1981 ). like the correspondence between syntactic and

semantic rules, with definite clause grammars (DCC-s)

(Pereira and Warren. 1980) to create an executable g r a m m a r specification. S S L rules are similar to D C G rules except that they contain a semantic component and m a y also be left recursive. Metarules are used to generate n e w rules t r o m existing rules before any parsing is attempted. A.n implementation is tested which can provide semantic interpretations for sentences containing tepicalisation, relative clauses, passivisation, and questions.

1. INTRODUCTION

The SAUMER system a l l o w s the user to specify a g r a m m a r for a natural language using rules and metarules

rhts g r a m m a r can then be u¢,ed ~ obtain a semantic

interpretation of an input sentence. The SAUMER

Specification l a n g u a g e (SSL). which L~ a variation of definite clause g r ~ s (DCGs) (Pereira and Warren. 1980). captures some ,ff the festures of generaI£.ted phrase structure grammar5 (GPSGs) (Gazdax, 1981) (GaTrl~r and Pullum. 1982). like rule schemata, rule transformations.

structured categories, slash categories, and the

correspondence between syntactic and semantic rules. The

semantics currently used in the system are based on

Schubert and Pelletiers description in (Schubert and

Pelletier. 1982). - which adapts the intetmional logic

intervretation associated w i t h GPSGs. into a more

conventional logical n o t a t i o n

2. THE SEMANTIC LOGICAL NOTATION

The logical notation associated w i t h the gr~mm~r differs f r o m . the usual notation of intensional logic_since it captures some i n t m t i v e aspects of natural language, l

Thus. individuals and objects are treated as entities. instead of collections of prope'rties, and actions are n - a r y relations between these entities. Many of the problems that the intensional notation w o u l d solve are handled by

allowing ambiguity to be represented in the logical

notation. Consequently. as is common in other approaches. (e.g.. Gawron. 1982). much of the processing is deferred to the pragmatic stage. The s t r u c t u r e of the lexicon, and the

appearance of post processing markers (sharp angle

brackets) are designed to reflect this ambiguity. The

lexicon is organised into t w o levels. For the semantic interpretation, the first level gives each w o r d a t e n t a t i v e

interpretation. During the pragmatic analysis, more

complete processing information w i l l result in the final interpretation being obtained f r o m the second level of the lexicon. For e ~ m p l e , the sentence John misses John could be given an initial interpretation of:

(2.1) [ Johnl misa2 John3 ]

w i t h Johnl, miss2 and John3 obtained f r o m the first level of the t w o level lexicon. T h e pragmatic stage w i l l determine if J o h a l and John3 both refer to the same entry, say JOHN SMITH1. of the second level of the lexicon, or if t h e y correspond to d i f f e r e n t entries, say

JOHN_JONES1 and J O H N _ E V A N S 1 . During the

pragmatic stage, the e n t r y of MISS which is referred to by miss2 w i l l be determined (if possible). For example, does John miss John because he has been a w a y for a long time, or is it because he is a poor shot w i t h a rifle?

A n y interpretation contained in sharp angle brackets. < . . . > . m a y require post processing. This is apparent in interpretations containing determiners and co-ordinators.

The

proverb:

(2.2) e v e r y m a n loves some w o m a n could be given the interpretation:

(2.3) [ < e v e r y l m a n 2 > love3 < s o m e 4 w o m a n S > ] w i t h o u t explicitly stating whmh of the two readings is intended. During pragmatic analysis, the scope of every and some w o u l d presumably be determined.

111 should also be noted that. due Io the separabili'~y of the semantic component from ",he g r a m m a r rule, • different semantic notation could easily be introduced at long as ~u~ a p p ~ p r i a t e ~.mantic proce~in8

(2)

The s y n t a x of t h i s logical n o t a t i o n can be b-~mmav~sed as follows. Sentences a n d c o m p o u n d predicate f o r m u l a s are c o n t a i n e d w i t h i n s q u a r e brackets. So. (2.4) s t a t e s t h a t 3oim w a n t s to kiss Mary:

(2.4) [ J o h n l w a n t 2 [John1 kiss3 Mary4]]

These f o r m u l a s can also be expressed e q u i v a l e n t l y in a more f u n c t i o n a l f o r m according to t h e equivalence

(2.5) [ t n P t I . . . t a d ] --- ( • . . ( ( P t l ) t 2) . . . t n ) - - ( P t t . t.

)

Consequently. (2.4) could also be represented as:

(2.6) ( ( w a n t 2 ( ( k i s s 3 M a r y 4 ) J o h n l ) } J o h n l )

However. t h i s n o t a t i o n is u s u a l l y used f o r i n c o m p l e t e phrases, w i t h t h e s q u a r e b r a c k e t s used to o b t a i n a cortvent/ona/ final reading Modified predicate f o r m u l a s are contained in braces. T h u s . a little dog likes F i d o could be expressed as:

(2.7) [ < a l {little2 d o g 3 } > likes4 FidoS]

The l a m b d a calculus o p e r a t i o n s of l a m b d a a b s t r a c t i o n a n d e l i m i n a t i o n are also allowed. W h e n a v a r i a b l e is a b s t r a c t e d f r o m a n expression as in:

(2.8) kx [ • w a n t 2 [ • love3 M a r y 4 ] ]

application of t h i s n e w expression to a n a r g u m e n t , s a y d o h n l :

(2.9) ( k x [ • w a n t 2 [ • love3 l~u~J'4 ] ] J o h n l ) will r e s u l t in an int~,v,©tation of John w a n t s to love Mary: (2.10) [ J o h n l w a n t 2 [ J o h n l love3 M a r y 4 ] ]

F u r t h e r details on this n o t a t i o n are a v a i l a b l e in ( S c h u b e r t a n d Pelletier. 1982).

3. T H E S A U M E R S P E C I F I C A T I O N L A N G U A G E

The SAUMER Specification Language (SSL) is a

p r o g r a m m i n g l a n g u a g e t h a t a l l o w s t h e user to d e f i n e a g r a m m a r of a n a t u r a l language "in ~ of rules, and metarules. M e t a r u l e s operate on rules to produce n e w rules. The language is basically a GPSG realised in a DCG setting. U n l i k e GPSGs. t h e g r a m m a r s defined b y t h i s s y s t e m are not required to be c o n t e x t - f r e e since procedure calls are allowed w i t h i n the rules, and since logic v a r i a b l e s are allowed in t h e g r a m m a r s y m b o l s .

The basic objects of the language are a t o m s , variables. t e r m s , and lists. A n y w o r d s t a r t i n g w i t h a l o w e r case letter, o r enclosed in single q u o t e s is a n atom. V a r i a b l e s s t a r t w i t h a capital letter or a n underscore. A t e r m is a n atom. o p t i o n a l l y followed b y a series of objects ( a r g u m e n t s ) , w h i c h are enclosed in parentheses and separated b y commas. L a s t l y . a list is a series o f one o r m o r e objects, separated b y commas, t h a t are enclosed in

s q u a r e b r a c k e t s

3 . 1 R u l e s

T h e r u l e s are presented in a v a r i a t i o n of t h e DCG n o t a t i o n , a u g m e n t e d w i t h a s e m a n t i c r u l e c o r r e s p o n d i n g to each s y n t a c t i c rule. Each r u l e is of t h e f o r m "A - - > B : ~," w h e r e A is a t e r m w h i c h denotes a n o n t e r m i n a l s y m b o l . B is e i t h e r a n a t o m list r e p r e s e n t i n g a t e r m i n a l s y m b o l or a c o n j u n c t i o n of t e r m s ( s e p a r a t e d by c o m m a s ) c o r r e s p o n d i n g to n o n t e r m i n a l s y m b o l s , and y is a s e m a n t i c r u l e w h i c h m a y reference t h e i n t e r p r e t a t i o n of t h e c o m p o n e n t s of ~ in d e t e r m i n i n g the s e m a n t i c s of A. The r u l e a r r o w . - - > . separates t h e t w o sides of the rule. w i t h t h e colon. :. separating t h e s y n t a c t i c c o m p o n e n t f r o m t h e s e m a n t i c component. If t h e r u l e is preceded b y t h e w o r d a d d , it can be subjected to t h e t r a n s f o r m a t i o n s described in section 3.2. The n o n t e r m i n a l s y m b o l s can possess a r g u m e n t s , w h i c h m a y be used to c a p t u r e t h e f l a v o u r of t h e s t r u a u r a d categor/~s of GPSGs. ~ m a y also possess a r b i t r a r y p r o c e d u r a l r e s t r i c t i o n s c o n t a i n e d in braces.

T consists of expressions in t h e s e m a n t i c n o t a t i o n . T h e d i f f e r e n t t e r m s of t h i s s e m a n t i c expression are joined b y t h e s e m a n t i c connector, t h e a m p e r s a n d "&'. T h e a m p e r s a n d d i f f e r , f r o m t h e s y n t a c t i c connector, t h e c o m m a , sinc~ t h e f o r m e r associates to t h e r i g h t w h i l e t h e l a t t e r associates to the left. The /og/col a n d s y m b o l . w h i c h t r a d i t i o n a l l y m a y also be d e n o t e d b y t h e a m p e r s a n d , m u s t be entered as "&&'. Due to c o n s t r a i n t s imposed b y the c u r r e n t i m p l e m e n t a t i o n , "( exFr )" m u s t be entered as " < [ expr ]'. " < expr >" as " < < [ expr ]'. a n d "k x expr" as "x l m d a expr." An expression m a y c o n t a i n references to t h e i n t e r p r e t a t i o n s of t h e e l e m e n t s of 18 b y s t a t i n g t h e a p p r o p r i a t e n o n t e r m i n a l f o l l o w e d b y t h e left quote, ". To p r e v e n t a m b i g u i t y in "these references t h a t m a y arise w h e n t w o identical s y m b o l s a p p e a r in B. a n o n t e r m i n a l m a y be a p p e n d e d w i t h a m i n u s sign f o l l o w e d b y a u n i q u e integer.

U n l i k e s t a n d a r d Prolog i m p l e m e n t a t i o n s of DCGs. left recursion is allowed in rules, t h u s p e r m i t t i n g m o r e n a t u r a l d e s c r i p t i o n s of certain p h e n o m e n a (like c o - o r d i n a t i o n ) . Since t h e left r e c u r s i v e rules are i n t e r p r e t e d , r a t h e r t h a n c o n v e r t e d into rules t h a t are not left recursive, t h e n u m b e r of rules in the d a t a b a s e will not be affected. H o w e v e r . t h e efficiency of the sentence a n a l y s i s m a y be affected d u e to the e x t r a processing required. Rules of t h e f o r m "A - - > A. A" are not accepted.

A n e x a m p l e of a p r o d u c t i o n t h a t derives J o h n f r o m a p r o p e r n o u n . n p r . is s h o w n in (3.1):

(3.1) n p r - - > [ ' J o h n ' ] : " J o h n ' #

The s e m a n t i c i n t e r p r e t a t i o n of t h i s n p r will be J o h n # . w i t h " # " replaced b y a u n i q u e integer d u r i n g e v a l u a t i o n . (3.2) i l l u s t r a t e s a v e r b p h r a s e r u l e t h a t could be used in sentences like J o h n w a n t s to wa/k:

(3.2) v p ( N u m ) - - >

v ( N u m . R o o t ) w i t h Root in [want.like]. v p ( i n f )

(3)

F i r s t nottce t h a t a restriction on t h e v e r b appears w i t h i n t h e w / t h s t a t e m e n t . In t h e GPSG f o r m a l i s m , t h i s t y p e of r e s t r i c t i o n w o u l d be o b t a i n e d b y n a m i n g t h e r u l e s a n d associating a list of v a l i d r u l e n a m e s w i t h each lexical e n t r y . A l t h o u g h t h e w/~h r e s t r i c t i o n m a y c o n t a i n a n y v a l i d in-ocedure, t y p i c a l l y the i n o p e r a t i o n ( f o r d e t e r m i n i n g list m e m b e r s h i p ) is used. T h e d o u b l e p o u n d . # # . is replaced b y t h e s a m e u n i q u e integer in t h e e n t i r e expression w h e n t h e expression is e v a l u a t e d . I f " # " w e r e used instead, each i n s t a n c e of x # w o u l d be d i f f e r e n t . For t h e a b o v e example, if v' is w a n t 2 a n d

vp' is runJ. then

t h e s e m a n t i c expression could e v a l u a t e to:

(3.3) x4 l m d a [x4 & w a n t 2 & [x4 & r u n 3 ] ] F u r t h e r m o r e . if

np" is Johrtl. then:

(3.4) [np" & v p ' ]

could r e s u l t in:

(3.5) [Johnl & w a n t 2 & [Johnl & run3]]

3.2 T h e Metarules

T r a d i t i o n a l t r a n s f o r m a t i o n a l g r a m m a r s p r o v i d e t r a n s f o r m a t i o n s t h a t operate on parse trees, or s i m i l a r s t r u c t u r e s , a n d o f t e n require t h e t r a n s f o r m a t i o n s to be used in sentence recognition r a t h e r t h a n in generation ( R a d f o r d . 1981). H o w e v e r . t h e approach suggested b y (GaT~2r. 1981) uses t h e t r a n s f o r m a t i o n s g e n e r a t i v e l y a n d applies t h e m to the g r a m m a r . T h u s . t h e g r a m m a r can r e m a i n contex:-free b y compiling t h i s t r a n s f o r m a t i o n a l k n o w l e d g e into t h e g r a m m a r . T r a n s f o r m a t i o n s a n d r u l e s c h e m a t a f o r m t h e m a a z u / ~ s of SSI- 2

Rule s c h e m a t a a l l o w t h e user to specify e n t i r e classes of r u l e s b y p e r m i t t i n g v a r i a b l e s w h i c h range o v e r a selection of categories to a p p e a r in t h e rule. To c o n t r o l t h e v a l u e s of t h e variables, t h e f o r a / / c o n t r o l s t r u c t u r e can be used in the s c h e m a declaration. T h e s c h e m a f o r a / / X ~n List, Body w i l l execute Body f o r each e l e m e n t of L i ~ . w i t h X i n s t a n t i a t e d to t h e c u r r e n t element. T h e

use of this statement is illustrated in the following

m e t a r u l e t h a t generates t h e t e r m i n a l p r o d u c t i o n s f o r p r o p e r nouns."

(3.6) f o r a l l T e r m i n a l in [ ' B o b ' . ' C a r o l ' . ' r e d ' . ' A l i c e ' ] , ( n p r - - > [ T e r m i n a l ] : T e r m i n a l # ) .

T r a n s f o r m a t i o n s m a t c h w i t h g r a m m a r r u l e s in t h e database, using a r u l e p a t t e r n t h a t m a y be a u g m e n t e d w i t h a r b i t r a r y procedures, a n d produce new r u l e s f r o m t h e old rules. A t r a n s f o r m a t i o n is of t h e f o r m :

(3.7) a - - > /i : y - - - > a' - - > B" : 7"

The m e t a r u l e a r r o w . - - > , separates t h e p a t t e r n , a - - > ~ : T. f r o m the t e m p l a t e , a" - - > /i" : T'-

2 O f l e n . metarule~ are considered 1o consisl of t r a n s f o r m a t i o n s o n l y , w h i l e s c h e m a t a are p u l inlo a c a t e g o r y of their o w n . H o w e v e r . sinoe t h e y can both be considered i~ p a r t of • m e t a g r a m m a ~ , t h e y are called me~trule~ in t h l , distna~inn.

T h e ~ n ~ a ~ p a t t e r n , Q - - > /i. c o n t a i n s n o n t e r m i n a l s . w h i c h c o r r e s p o n d to s y m b o l s t h a t m u s t a p p e a r in t h e m a t c h e d rule, a n d free v a r i a b l e s , w h i c h r e p r e s e n t

don't

~ r ~ r e g i o n s o f zero or m o r e n o n t e r m i n a l s . T h e p a t t e r n n o n t e r m m a l s m a y also possess a r g u m e n t s . For each r u l e s y m b o l , a m a t c h i n g p a t t e r n s y m b o l describes p r o p e r t i e s t h a t must exist, b u t n o t all the p r o p e r t i e s t h a t may exist. T h u s . if v p appeared in the p a t t e r n , it w o u l d m a t c h a n y of

vp. vp(Num),

o r

vp(Nura2"ype) with Type in /transl.

H o w e v e r .

pp(to)

w o u l d n o t m a t c h pp or

pp(frora),

b u t it w o u l d m a t c h

plMto,_).

T h e m a t c h i n g c o n d i t i o n s are s u m m a r i s e d in Figures 3-1 and 3-2. In Figure 3-1. A a n d B are n o n t e r m i n a l s . X is a free v a r i a b l e , a n d a a n d /i are c o n j u n c t i o n s o f one o r m o r e s y m b o l s , y a n d 8 o f Figure 3-2 are also c o n j u n c t i o n s of one or m o r e s y m b o l s . "=" is d e f i n e d as u n i f i c a t i o n (Clocksin a n d Mellish, 1981). P a r t s of the r u l e c o n t a i n e d in braces are ignored b y t h e p a t t e r n m a t c h e r . T h e s y n t a c t i c p a t t e r n m a y also c o n t a i n a r b i t r a r y restrictions. 3 enclosed in braces, t h a t are e v a l u a t e d d u r i n g t h e p a t t e r n m a t c h . T h e s e m a n t / c p a t t e r n , y, is v e r y p r i m i t i v e , h m a y c o n t a i n a free v a r i a b l e , w h i c h w i l l b i n d to the e n t i r e s e m a n t i c s field of the m a t c h e d r u l e , or it m a y c o n t a i n t h e s t r u c t u r e < [ ? ~]. w h i c h w i l l b i n d to the e n t i r e s t r u c t u r e c o n t a i n i n g t h e s y m b o l x. If < [ ? y] t h e n a p p e a r s in y ' , t h e r e s u l t will be t h e s e m a n t i c c o m p o n e n t of t h e m a t c h e d r u l e w i t h x replaced b y y.

P a t t e r n

Rule

(B. /3) B

(A.

a )

(X. a )

A

X

A m a t c h e s B A m a t c h e s B a n d a n d a m a t c h e s ~ a is a free v a r i a b l e

(X. a ) m a t c h e s /i a m a t c h e s B or a matches (B. ~ )

No A m a t c h e s B

y e s Yes

F i g u r e 3-1: P a t t e r n M a t c h i n g f o r C o n j u n c t i o n s

P a t t e r n

Rule

b(/i[ .... /I n) b(,/i I .... /in ) with 8

a(a I . . . . a m )

a ( a I . . . . a = ) w i t h

a = b . m ~ < n .

ati=/i i, 1~<i~<m

No

a - - b . m ~ n . a i = / i i, l ~ i ~ m

a = b . m ~ n . a i = / i i. l ~ < i ~ < m . "

m a t c h e s 8

F i g u r e 3-2: P a t t e r n M a t c h i n g for N o n t e r m i n a l s

[image:3.612.318.572.359.706.2]
(4)

The behaviour of patterns can be seen in the following examples. Consider the sentence rule:

(3.8) s(decl) --> n p ( n o m . N u m b ) .

v p ( _ J q u m b ) w i t h agreement(Numb) : [ rip" & vp" ]

The patterns shown in (3.9a) w i l l match (3.8). while those of (3.9b) will not match it.

(3.9) (a) s(A) - - > {not element(A,[foo])L X. vp : Sere s - - > np(nom), X. vp(pass). Y : Sere

(b) s(inter) - - > np. v p : Seam

s - - > vp : Sere

For the v e r b phrase rule shown in (3.10): (3.10) vp(active.[MIN]) - - >

v([MIN],Root,Type,_) w i t h (intrans in Type) : v"

the patterns of (3.11a) will result in a successful match. will those of (3.11b) w i l l not:

W i t h external modification, any nonterminal, or variable instantiated to a nonterminal, m a y be f o l l o w e d by the sequence @rood. This w i l l result in rood being inserted i n t o the a r g u m e n t list f o l l o w i n g the

specified

arguments. Thus, mf N@junk appeared in a rule w h e n N was instantiated to

np(more),

it w o u l d be expanded as rip(more,junk }. Similarly, if the pattern s y m b o l vp

matched

v,v{NumS)

in a rule, then the appearance of

vp@foo in the template w o u l d result in

vp(foo~Vumb)

appearing in the new rule. This extra argument.

introduced by the modifier, can be useful w h e n dealing w i t h the missing components of

slash

or

derived

categories

(Gazdar, 1981).

Internal modification allows the m o d i f i e r to be p u t directly into the argument list. If an a r g u m e n t is followed by @rood. it will be replaced by rood. In the case where @rood appears as an argument by itself, rood is

added as a new argument. For example, if

v(Numb@pastpart)

were contained in a template, it w o u l d

IT-match v(Numb)

in the pattern, and w o u l d result in the appearance of

v(pastpart)

in the new rule.

(3.11)

(a)

v p - >

v :

<[?v]

v p - - > v( . . . . T y p e . _ )

with (X, intrans in Type.

Y).

Z : S e m

(b) v p - - > v ( . . . _ . T y p e . _ )

w i t h (X. trans in Type) : S e m

v p - > v ( _ ~ o o t .... )

w i t h (Root in [fool. X)

:Sem

For every rule that matches the pattern, the template of the transformation is executed, resulting the creation of a new rule. A n y nonterminal. N, that matches a symbol 8 i on the left side of the transformation, will appear in the new rule if there is a symbol ~i" in 8" that

irura-transformation

(IT) matches w i t h ~i" If there are several symbols in 8" that IT-match ~i" the leftmost s y m b o l w i l l be selected. No symbol on one side of the transformation may IT-match with more than one s y m b o l on the other side. T w o symbols will IT-match o n l y if they have the same n u m b e r of arguments, and those

arguments are identical. A n y w/th expressions and

modifiers associated w i t h symbols are ignored during IT- matching. 8" m a y also contain extra s y m b o l s that do not correspond to anything in 8. In this case. t h e y are inserted directly into the new rule. Once again, if the transformation is preceded by the command add. then the

resulting r u l ~ can be subjected to subsequent

transformations.

3.3 Modifiers

Both rules and metarules m a y c o n t a i n s modifiers that alter the ~tructure of the nonterminal symbols. There are

t w o types of modification, which have been dubbed

external

and /nzerrud modification.

4. IMPLEMENTATION

T h e S A U M E R system is currently implemented in highly portable C-Prolog (Pereira. 1984). and runs on a Motorola 68000 based S U N Workstation supporting U N I X 4.

Calls to Prolog are allowed by the system, thus providing

useful tools for debugging grsmmars, and tracing

derivations. However. due to the highly declarative

nature of SSL, it is not restricted to a Prolog

... implementation. Implementations in other languages w o u l d d i f f e r externally o n l y in the syntax of the procedure calls that m a y appear in each rule. Use of the system is described in detail in (Popowich, 1985).

The current implementation converts the g r a m m a r as specified by the rules and metarules into Prolog clauses. This conversion can be examined in terms of how rules are processecl, and h o w the schemata and transformations are processed.

4.1 Rule

P r o c e s s i n g

The syntactic component of the rule processor is based

on Clocksin and Mellish's definite clause grammar

processor (Clocksin and Mellish. 1981) which has been

implemented in C-Prolog. For a DCG rule. each

nonterminal is converted into a Prolog predicate, with t w o

additional arguments, that can be processed by a t o p - d o w n

parser. These ~ t n arguments correspond to the list to be parsed, and the remainder of the list after the predicate has parsed the desired category. W i t h the addition of

semantics to each rule, another argument is required to

represent the semantic interpretation of the current

symbol. Thus. w h e n e v e r a left quoted category name. x'.

(5)

appears in the semantics of the rule. it'is'repla~gl b y a variable bound to the semantic argument of the

corresponding symbol, x. in the rule. The semantic

expression is then evaluated by the eva/ routine w i t h the result bound to the semantic argument of the nonterminal on the left hand side of the production. For ~ffiample. the sentence /ule:

(4.1) add s(decl) - >

np(nom.Numb).

v p ( _ 2 q u m b ) w i t h agreement(Numb) : [ np" & vp" ]

will result in a Prolog expression of the form: (4.2) s(SemS.decl._l. 3) :-

nlKSemNP.nom2qumb. 1 . 2 ) . vp(SemVP, 2qumb. 2. 3). agreement(Numb).

eval([SemNP & SemVP],SemS).

Consequently. to process the sentence John runs. one w o u l d try to satisfy:

(4.3) :- s(Sem, Type. ['John'.runs]. []).

The first argument returns the interpretation, the second a r g u m e n t returns the t y p e of sentence, the third is the

initial input list. and the final argument corresponds to the list rPmaining after finding a sentence. A n y rule R, that is preceded by add w i l l have the axiom r'ul~(R) inserted into the database. These axioms are used by the transformations during pattern matching.

The eva/ routine processes the s u f f i x symbols, # and # # along w l t h the lambda .expressions, and m a y perform

some- reorganisation of the given expression-- before

returning a new semantic form. For each expression of the f o r m n a m e # , a unique integer N is ca-eared and

nan~-N is returned. With " # # ' . the procedure is the

same except that the first occurrence of " # # " w i l l generate a unique integer that w i l l be saved for all subsequent occurrences. To evaluate an expression of the f o r m :

(4.4) ( expr i Lmda e ~ F j & X )

every subexpression of exprj is recursively searched for an

occurrence of expr i. which is then replaced by X.

Left recursion is removed w i t h the aid of a gap predicate identical to the one defined to process gapping g r - a m m a r S

(Dahl

and Abramson. 1984) and unre~Lricte~

gapping g r a m m a r s (Popowich. forthcoming). For any rule of the form:

(4.5) A - - > A. B. a

where A does not equal B. the result of the translation is:

(4.6) A f _ I . N n) :- g a p ( G . _ l . 2). B ( 2 . N o ) . A(G,[]). <Xl (No,N 1 ) . . . tXn(Na_l.Nn), According to (4.6). a phrase is processed by skipping over a region to find a B - - the first non-terminal that does not equal A. The skipped region is then examined to

ensure that it corresponds to an A before the rest of the phrase is processed.

4.2 Schema Processing

To process the metarule control structures used by schemata, a f m l predicate is inserted to force Prolog to t r y all possible alternatives. T h e simple recursive definition of / o r e / / X / ~ /./rt:

(4.7) f o r a l l ( X in [], Body). f o r a l l ( X in [YIRest]~xty) :-

(X=Y. c a l l l ( B o d y ) , fail) : forall(X. Rest. Body).

uses f a / / to undo the binding of Y, the first element of the list. to X before calling fore// w i t h the remainder of the list. The predicate ¢.<d/l is used to evaluate Body since it w i l l prevent the f a / / predicate f r o m causing backtracking into Body.

4.3 Transformation Processing

Execution of transformations requires the most

complex processing of all of the m e t a g r a m m a t i c a l operations. This processing can be divided into the three

stages of transformation c r Y . pattern matching, and rule crem,/on. 5

During the rrar~fornuU/~n trot/on phase, the predicate

rrarts(M,X,Y) is created for the metarule. M . This

predicate will transform a list of elements. X: into another ILSL Y, according to the syntax specification of the metarule. Elements that IT-match will be represented by the s a m e free variable in both lists. This binding will be one to one. since an element cannot match with m o r e than one element on the other side. S y m b o l s that appear on only one side will not have their free variable appearing on the opposite side. Expressions in braces are ignored during this stage. If a transformation like:

( 4 . 8 ) a - - > b, c. X - - > a@foo - - > b. X. c(foo) appears, then a predicate of the form:

(4.9) t r ~ s ( M . L 1 . _ 2 . _ 3 . X ] . L 1 . _ 2 . X . _ 4 ] )

will be created. Notice that the appearance of a modifier does not cause a@/oo to be distinguished from a. since all modifiers are removed before the p a t t e r n - t e m p l a t e match is attempted. However. c and c(foo) are considered to be d i f f e r e n t symbols. M is a unique integer associated w i t h the transformation.

The pattern match phase determines if a rule matches the pattern, and produces a list for each successful match which w i l l be transformed by the trans predicate. Each element of the list is either one of the matched s y m b o l s f r o m the rule. or a list of s y m b o l s corresponding to the don't care region of the pattern. A n y predicates that

(6)

appear in braces in the pattern a r e evaluated during t h e pattern match. Consider the operation of an active-passive v e r b phrase transformation:

(4.10) vp(active~Numb) - - > v(Numb.R.Type.SType) w i t h (X.trans in Type.Y). np. Z

< [ ? np']

v ~ p a s s . N u m b ) - - >

v(Numb.be.T.S)-I w i t h auz in T. v([email protected]) w i t h (X.trans in Type.Y).

z. pp(by._)

: x # # Imda [pp(by)" & <[7 x # # ] ] on the following v e r b phrase:

(4.11) vp(active.Numb) - - >

v(Numb~R.Type._) w i t h trans in Type.

n~[x.A.x] . . . .

)

: < [ v" & np" ] .

The list produced by the pattern match w o u l d resemble:

'.12) [ vp(active.Numb).

v ( N u m b . R . T y p e . _ ) w i t h [[].trans in Type~]].

nr([x.A.~]

.... ).

[]

]

Notice that there w a s nothing in the rule to bind with X. Y or Z. Consequently. these variables were assigned the null list. []. T h e pattern match of the semantics of the

rule will result in an expression which lambda abswacts np" out the of semantics:

(4.13) < [ np" lmda < [ v" & np" ] ]

Finally. the ru/~ crea¢/on phase applies the

transformation to the list produced by the pattern match. and then uses the new list and the template to obtain a new rule. This phase includes conversion of the new list back into rule form. the application of modifiers, and the addition of any extra symbols that appear on the right hand side only. To continue w i t h our *Tample. the trans predicate a.~ociated w i t h (4.10) w o u l d be:

(4.14) trans(N. [ _ 1 . _ 2 . _ 3 . Z ] . [ _ . 3 . 4 . _ 2 1 . . 5 ] )

Notice that the two v p ' s on opposite sides of the metarule do not match. So the transformed list w o u l d resemble: (4.15) [ _ 3 .

4 ,

v ( N u m b . R . T y p e . _ ) w i t h [[].trans in Type,[]].

[3.

_ 5 1

The rule generated by the rule creation phase w o u l d be: (4.16) v p ( p a s s ~ l u m b ) - - >

v ( N u m b . b e . T ~ ) - I w i t h aux in T. v ( p a s t p a r t . R , T y p e . _ ) w i t h t n n s in Type. p p ( b y . _ )

: x # # lmda [ pp(by)" & < [ v" & x # # ] ]

• Notice that the expression " < [ v" & x # # ]'. which is • contained in the semantics of (4.16) was obtained by the

application of (4.13) to x # # .

5. A P P L I C A T I O N S

To examine the usefulness of this t y p e of g r a m m a r

specification, as w e l l as the adequacy of the

implementation, a g r a m m a r was developed that uses the

domain of the Automated Academic Advisor ( A A A )

(Cercone et.al.. 1984). The A A A is an interactive

information system under development at Simon Fraser University. It is intended to act as an aid in "curriculum planning and management', that accepts natural language queries and generates the appropriate responses. Routines

for performing some morphological analysis, and for

retrieving lexical information were also provided.

The SSL grammar allows questions to be posed.

permits some possessive forms, and allows auxiliaries to appear in the sentences. From the base of t w e n t y six rules, eighty additional rules were produced by three metarules in about e i g h t y - f i v e seconds. Ten more rules

were needed to link the lexicon and the grammar. A

selection of the rules and metarules appears in Figure 5-1.

The complete grammar and lexicon is provided in

(Popowich. 1985).

In the interpretations of some ~ m p l e sentences, which can be found in Figure 5-2, some liberties are taken w i t h the semantic notation. Variables of the f o r m wN. where N is any integer, represent entities that are to be instantiated f r o m some database. Thus. any interpretation containing w N w i l l be a question. Possessives. like John's tab/e are represented as:

(5.1) < t a b l e & [John poss table]>

Although m u l t i p l e possessives which associate f r o m left to right are allowed, group possessives as seen in:

(5.2) the man who passed the course's book and in phrases like:

(5.3) John's driver's lice.ace

can not be interpreted correctly by the grammar.

Inverted sentences are preceded by the w o r d Q u e r y in the output. Also. proper nouns are assumed to unambiguously refer to some object, and t h u s are no longer followed by

a unique integer. Analysis times for obtaining an

interpretation are give 9 in CPU seconds. The total time includes the t i m e spent looking for all other possible parses.

(7)

/ - Case ,s d e s c r i b e d by a mask. [N.A,G], w i t h f r e e v a r i a b l e s f o r Ham., Ace. and Gen. * /

add vp(octive.Numb) ~ > v(Numb. Root. T, _) w i t h (Root in [ p a s s . g i v e , t e a c h , o f f e r ] , indabj in T. t r e e s in T ) ,

np([x.D.x] . . . . ) . n p ( [ x . * . x ] . . . . )-1 : <[ v' a np' a np-t' ]

Je WH--<lueetions in i n v e r t e d sentences * / e v c l ( y ~ , V a r ) , NP - np(Case.Numb,Feat)

• ( NPONP ~ > [ ] . |agreement(Case)| : Var )

, ( e ( i n v ) ~ > n p ( [ x , A , x ] , N o m b , F e a t ) w i t h Clword in Feat, e ( i n v ) O n p ( [ x , A , x ] , N u m b , F e a t ) : <[ (Vat lads s ' ) • np' ] ) .

/* p a s s i v e t r e n e f a r n m t i o n e /

add vp(octive.Numb) - - > v(Numb.R.Type.Subtype) w i t h (X. t r e e s in Type0 Y). npo Z : <[? np °] mE> vp(poss,Humb) ~ > v(Numb,be,T,S)--I w i t h aux in T,

v(Numi:gpaetpart, R. Type, Subtype) w i t h (X, t r e e s in Type, Y), Z. o p t i a n a l ( p p ( b y . _ ) ) : x ~ Imda [ o p t i o n a l " k <[ ? x ~ ] ] . / * sentence i n v e r s i o n */

add v p ( T . [ M i N ] ) ~ > v([MJN],R,Type,S) w i t h (X, aux in Type, Y ) , Z : $em

m > s ( i n v ) - - > v ( [ U I N ] , R , T y p e , S ) w i t h (X.aux in Type,Y), n p ( [ N l , x , x ] , [ M l N ] , _ ) , Z : [ n p ' a Semi.

/ , m e t a r u l e f o r the p r o p a g a t i o n of " h o l e s " in the " s l o s h " c a t e g o r i e s e /

f a r a i l Hole in [ p p ( P r e p , F e a t ) , n p ( C a s e , N o m b , F o o t ) ] . ( f o r a l l Cat1 in [ s ( T y p e ) , v p . p p ( P r e p , F e a t ) , o p t i o n a l ]

• ( f o r a l l Cat2 in [ v p , p p ( P r e p , F e a t ) , n p ( C a a e , N u m b , F o a t ) , o p t i o n a l ]

, ( Cat1 m > X. Cot2, Y : Sem m > C e t l I H o i e m > X, Cat2OHalo, Y : Sen ) ) ) .

Figure 5-1: Excerpt from Grammar

Sentence Query: A n a l y o , e : .

d i d Fred take o m p t l e l . [Fred t a k e s c m p t l e l ]

2.25 eec. T o t a l : 4. 28334 sea.

Sentence: who wonts to teach F r e d ' s p r o f e s s o r ' s course. Semantics: [ <wl • [wl onlmgte]>

wont4

[ <wl • [wl an im at e ] > teach13

<course14 k [ < p r o f e s s a r I S • [Fred pace p r o f o s e a r l S ] > poes course14]>

] ]

A n a l y s i s : 6.58337 eec. T o t a l : 18.9834 ee¢.

Sentence' Query"

A n a l y s i s :

whose course does the student whom John l i k e n want t o be t a k i n g . [ <<the38 student39> • [John like4S <the38 s t u d e n t 3 9 > ] >

wont46

[ <<the38 student39> • [John like4S <the38 student39>]> takeS6

<course29 • [<w3e • [w3e a n im a te ]> pose caurwe29]>

] ]

21.9999 eec. T o t a l : 39.4 sac.

Sentence: Query:

A n a l y s i s :

t o whom daee the p r o f e s s o r want which paper to be g i v e n . [ <the14 p r o f e s s o r l S >

want17

[ x39 givo3S <w7 k [w7 aninmte]> <w21 k [w21 paper22]> ]

]

14.3167 sec. T o t a l : 29.5167 sec.

[image:7.612.54.529.64.731.2]
(8)

g r a m m a r u s e d by SAUMER. except that it has a much smaller lexicon, and allows neither relative clauses nor

possessive forms. Running on the same machine as

SAUMER. ProGram required about 35 seconds to parse the sentence

does John take cmpelOl,

w i t h a total processing time of abo,.u 140 second.~ SAUMER required just o v e r 2 seconds to parse this phrase, and had a total processing time of about 4 seconds.

As it stands, the semantic notation used by SAUMER does "not contain much of the relevant information that " w o u l d be required by a real system. Tense. n u m b e r and adverbial information, including concepts like location and time. w o u l d be required in the A A A . If the SSL description were to be extended, w i t h the resulting system behaving as a natural language interface of the A A A . a more database directed semantic notation w o u l d prove invaluable.

6.

PRESENT IXMITATIONS

Although this application of metarules a l l o w s succinct descriptions of a grammar, several problems have been observed.

Since each metarule is applied to the r u l e base only once. the order of the metarules is v e r y important. In our sample grammar, the passive v e r b phrases were generated before the sentence inversion t r a n s f o r m a t i o n was

processed, and then the slash category propagation

transformations were executed. For the c u r r e a t

implementation, if a rule generated by t r a n s f o r m a t i o n T1 is to be subjected to transformation T2. then T1 m u s t appear before T2. Moreover. no rule t h a t is the result of

. . . . T 2 - c a n be operated on by T I . It w o u l d be preferable to remove this restriction and impose one. that is less severe. such as the finite closure restriction which is described in

(Thompson. 1982) and used by ProGram. W i t h this

improvement, the o n l y restriction w o u l d be that a

transformation could only be applied once in the

derivation of a rule.

The system can not c u r r e n t l y process rules expressed in the Immediate Dominance/ Linear Precedence (ID/LP) format. (Gazdar and Pullum. 1982). With this format, a production rule is expressed w i t h an unordered right hand

side with the ordering determined by a separate

declaration of //near precedence. For example, a passive verb phrase rule could appear something like"

(6.1) vp(pass.[MIN]) - - > v([MIN], be . . . . ).

v ( _ . Root. Type. _ ) w i t h

(Root in [pass.carry.give]. indobj in Type.

trans in Type).

p p ( t o ) .

optional(pp(by)) : x # # Imda

[optional" & <[v" & pp(to)" & x # # ] ]

w i t h the components having a linear precedence of:

(6.2) v(_.be) < v < pp

The result w o u l d be that the

pp(by)

could appear before or after the

pp(to),

since there is no restriction on ' t h e i r relative positions. I f this f o r m a t were implemented, o n l y one passive m e t a r u l e w o u l d have to be explicitly stated. The direct processing of ID/LP gremm~rs is discussed in (Shieber. 1982). (Evans and Gazdar. 1984). and (Popowich. forthcoming).

7. CONCLUSIONS

SSL appears to adequately capture the f l a v o u r of GPSG descriptions w h i l e allowing more procedural control. Investigation into a relationship between SSL and GPSG

grammars

could result in a method for translating GPSG

grammars

into SSL for execution by SAUMER. F u r t h e r research could also provide a relationship between SSL and other g r a m m a r formalisms, such as /ex/c~-funct/on,d

granmu~$ (Kaplan and Bresnan. 1982). The prolog

implementation of SAUMER. allowing left recursion in rules, should facilitate a more detailed s t u d y of the specification language, and of some problems associated w i t h m e t a r u l e specifications. Due to the easy separability of the semantic rules, one could attempt to introduce a more database oriented semantic notation and develop an interface to a real database. One could then examine system behaviour w i t h a larger rule base and more involved transi'ormations in an applications e n v i r o n m e n t like that of the A A A . However. as is apparent f r o m the application presented here and f r o m p r e l i m i n a r y

experimentation (Popowich. 1984) (Popowich. 1985),

f u r t h e r investigation of the efficient operation of this Prolog implementation w i t h large grammars w i l l be required.

ACKNOWLEDGEMENTS

l w o u l d like to thank Nick Cercone for reading an earlier version of this paper and providing some useful

suggestions. The comments of the referees were also

helpful. Facilities for this research were provided by the Laboratory for Computer and Communications Research. This w o r k Was supported by the Natural Sciences and Engineering Research Council of Canada under Operating

Grant no. A4309. Installation Grant no. SMI-74 and

Postgraduate Scholarship #800.

REFERENCES

Cercone. N.. Hadley. R.. Martin F.. McFetridge P. and Strzaikowski. T. D e a i ~ i n ~ a n d a u t o m a t i n g t h e q u a l i t y mmesmment o f a k n o w l e d g e - b a . m ~ s y s t e m : t h e i n i t i a l a u t o m a t e d a c a d e m i c a d v i s o r e x p e r i e n c e , pages

193-205. IEEE Principles of Knowledge-Based Systems

Proceedings. Denver. Colorado. 1984.

(9)

Dahl. V. and Abramson. H. On Gapping G r ~ m m ~ . Proceedings of the Second International Joint Conference on Logic. University of Uppsala. Sweden. 1984.

Evans. R. and Gazdar. G. The P r o G r a m M a n u a l . Cognitive Science

Programme.

University of Sussex, 1984.

Fawcett. B. personal c o m m n n i c a t i o n . Dept. of Computing Science. University of Toronto. 1984.

Gawron. J.M. et.aL Procemiag English w i t h a GenersliT~d Phrase S t r u c t u r e G r a m m a r . pages 74-81. Proceedings of the 2Oth Annual Meeting of the Association for Computational Linguistics, June. 1982.

Gazdar. G. Phrase Structure Grammar. In Po Jacobson and G.K. Pullum (Ed.). The N a t u r e o f Syn~cx.ic Representation, D.Reidel. Dortrecht, 1981.

Gazdar. G. and Pullum. G.K. G e n e r a l i z e d Phrase S t r u c t u r e Gr~mm,~r:. A T h e o r e t i c a l Synopsis. Technical Report. Indiana University Linguistics Club. Bloomington Indiana. August 1982.

Kaplan. R. and Bresnan. J. Lexical-Functional Grarnmar: A Formal System for Grammatical Representation. I n J. Bresnan (Ed.). M e n t a l R e p r e s e n t a t i o n o f G r a m m a t i c a l Relation& M r r Press. 1982.

Pereira. F.C.N.(ed). C-Prolog User's Manual. Technical Report. SRI International. Menlo Park. California. 1984.

Pereira. F.C.N. and Warren, D.H.D. Definite Clause Grammars for Language Analysis. A r t i f i c i a l I n t e l l i g e n c e . 1980. 13, 231-278.

Popowich. F. S A ~ Sentence ,t~nlysi~ Using ]~ETaJ~lL].es (]Pl-el iminal-y Report). Technical Report TR-84-10 and LCCR TR-84-2. Department of Computing Science. Simon Fraser University. August

1984.

Popowich. F. The SAUMER User's Manual. Technical Report TR-85-3 and LCCR TR-85-4. Department of Computing Science. Simon Fraser University, 1985.

Popowich. F. E f f e c t i v e I m p l e m e n t a t i o n a n d A p p l i c a t i o n o f Ulxrestricted Gapping

GrammArS.

Master's thesis. Department of Computing Science. Simon Fraser University. forthcoming.

Radford. A. Tr,~-~t'ormational S y n t a x . Cambridge University Press. 1981.

Schubert. L.K. and Pelletier. F J . From English to Logic: Context-Free Computation of "Conventional" Logical Translation. A m e r i c a n J o u r n a l o f C o m p u t a t i o n a l 1=i~nfi,~tics. January-March 1982. 8(1). 26-44.

Shieber. S.M. D i r e c t Parsing o f I D / L P G r a m m a r . draft. 1982.

Figure

Figure 3-1:
Figure 5-1: Excerpt from Grammar

References

Related documents