• No results found

A Lexicon for a Computer Question Answering System

N/A
N/A
Protected

Academic year: 2020

Share "A Lexicon for a Computer Question Answering System"

Copied!
93
0
0

Loading.... (view fulltext now)

Full text

(1)

America Journal

of

Com~rttational

Lianuistics

Microfiche 83

A

L E X I C O N

FOR A COMPUTER

Q U E S T I O N - A N S W ~ R I

N G

SYSTEM

Deparement of Computer Science Depar~ment

o f

Linguistics

I l l i n o i s I n s t i t u t e of Technology Northwestern U n i v e r s i l y

This i s an expanded v e r s i o n of a paper presented a t t h e F i f t e e n t h Annual Meeting of

the Assooiation

f o r Computational L i n g u i s t i c s , March

1 6 ,

1 9 7 7 . Accepted f o r p u b l i c a t i o n S e p t e m b e r 3 0 , 1 9 7 8 .

Copyright

@

1979

(2)

SlJMMARY OF

A 'L;EXICON FOR A COMPUTER Q U F

ST!!

ON-AHSWCRING SYSTEM

An integral part of any natural language understanding sys tern,

but one which has received very l i t t l e a t t e n t i o n i n application, i s

the lexicon. It i s needed during the parsing of the input text for

making inferences, and for generating language output or performing

some a c t i o n . This paper d i s c u s s e s Lhe principal questions concern-

ing the lexicon as i t relates i n particular t o a question-answering system and proposes a specific type of lexicon to f u l f i l l the needs o f t h i s system.

R a t h e r than make a d i s t i n c t i o n between dictionary and encyclo- pedia, w e have a s i n g l e global data base which we c a l l the lexicon.

Homographs are d i f f e r e n t i a t e d and phrases with fixed meanings are

treated as separate entries. A l l the information i n this l e x i c o n i s encoded i n the form O Z relations and

words

or word senses. These

form a large network w i t h the words as nodes and the relations as

edges. In a d d i t i o n the relations define semsntic f i e l d s and these are used t o treat problems of ambiguity. Relations are use6 to en-

code both syntactic and s m n t i c information. Axiom schemes are associated with each relation and these are used f o r inferencing.

The l e x i c a l r e l a t i o n s then are a t the heart (or brain) o f the system for representation, r e t r i e v a l , and inferencfng.

(3)

been

in£

luenced by

the

work

of 4presfan, Me1

cuk,

and Z o l k o v s ~ y

.

The

lexical r e l a t i o n s we

have

p o s i t e d are

the

t r a d i t i o n a l svnonvmy and

antonymy, taxonomy, part whole, gradfig and approximately f o r t y o t h e r s .

The

whole

set,

deliberately

l e f t open ended, is subdiudded i n t o n i n ~

subsets which include attribute

relatibns,

coLl~cational

r e l a t i o n s

and paradigma ti* onEs.

Each

relation has

its own

lexical

entry givhng its properties

and

t e l l i n g how to interpret

lexical

r e l a t i o n s h i p s

i n

a f i r s t order

predicate calculus form. lor example, the

tnformatfon

for the lext- cal entry

dog

gncludes t h e statement

dog

T

animal,

that is, t h a t a

dog

is

a kind of

mim~z.

The

l e x i c a l

enfry

f o r

T,

t h e taxonomic re- lation, i n its turn

includes

infprmatic

,

which allows

the

statement

to

be interpreted

as

HoZda(Ncom(dog,X))

-

RoZds(lYcorn(anima1,X)).

The

inventory

of r e l a t i o n s i s

expandable

s i m p l y by adding l e x i c a l entries

for new

r e l a t i o n s .

In

a d d i t i o n having both

the

l e x i c a l

en-

t r i e s and the relations in the entrdes expressed in t h e same nota-

t i o n a l form as t h a t of input sentences, namely in a Eirst order predicate

calculus

notation, allows for a c o n s i s t e n t , coherent, and

(4)

T A B U OF CQNTENTS

. .

m . . . . . . . . . . . . . . . . . . . . 1, In t r o d u ~ t ion

. . .

2. Design Decfsiohs

a . The Dictionary and the Encyclopedia

-

One Data Base

. . .

or TWO

# .

b . Lexical Models

-

Componential ~ e a t u k e Analysis v s .

. . .

Relational Networks

c . S e l e c t i o n Preference

. . .

d . The Homonymy

-

Polysemy Problem

-

Criteria for

SeparateEntries

. . .

- - -

.

e . Idioms

f. Preliminary Design Decisions f o r the Lexicon

. . .

3 . Some Theories of Lexical Relations

. . .

4 . The Set of Lexical Relations

. . .

a .

me

C l a s s i c a l Relations: Taxonmy and Synonymy

. . .

3 . Antonymy *

. . .

. . .

C . G r a d i n g

e m Parts and Wholes

. . .

. . .

f . Typical Case Relations

. . .

. . .

g . Other Callocation Relations

h . Paradigmatic Ke1;1tions

. . .

j

.

I n f l e c t i o n a l Relations

. . . * . . .

5. The Organization of the Lexicon and the Semantic

Representations

. . .

6 . Tha rorm of the L e x i c a l Entry

. . .

. . .

7. Sunnary

. . .

A p p e n d i x I. The Semantic Representations

(5)

I. INTRODUCTION

The l e x i c o n p r e s e n t e d h e r e i s b e i n g developed a s a n i n t e g r a l p a r t

of a computer question-answering system which answers m u l t i p l e - c h o i c e q u e s t i a n s about simple c h i l d r e n ' s s t o r i e s , It t h u s must make informa- t i o n r e a d i l y a v a i l a b l e f o r t h e p a r s i n g p r o c e s s , f o r b u i l d i n g an i n t e r n a l

nodel O F t h e s t o r y b e i n g r e a d , and f o r making i n f e r e n c e s . Knovledge

nbout words and knowledge about t h e world must both b e s t o r e d i n a com-

pact b u t mediately a c c e s s i b l e form.

Many d e c i s i o n s must b e made, t h e r e f o r e , about t h e d e s i g n of t h e l e x i c o n . The f i r s t problem i s t o d e c i d e on a n o r g a n i z i n g s t r u c t u r e ,

Should Lexical and " e n c y c l o p e d ~ c " i n f o r m a t i o n b e s t o r e d separately o r t o g e t h e r ? Which ltems w i l l have s e p a r a t e l e x i k 3 l entries? Which w i l l be included i n o t h e r e n t r i e s ? What a b o u t homonymy and polysemy? What

connecting l i n k s between words and word s e n s e s w i l l b e recorded and how?

The n e x t problem i s t o determine a c h a r a c t e r i z a t i o n of word mean-

i n g s . This l e a d s t o some d e e p t h e o r e t r c a l q u e s t r o h s What kind of

l e x i c a l semantic r e p r e s e n t a t i o n s are appropriate? What i s t h e structure o t t h e s e r e p r e s e f i t a t i o n s ? What are t h e semantic primes, t h e elements of t h a t s t r u c t u r e ? The d e s i g n of the l e x i c a l e n t r y i s t h u s s u b j e c t t o theo- r e t i c a l b i a s e s a s w e l l as the p r a c t i c a s c o n s t r a j n t s of space, r e t r f e v a l e f f k i e n c y , and e f f e c t i v e s u p p o r t of rnference-making.

The d e c i s i o n t o u s e l e x i c a l r e l a t i o n s as fundamental elements of

(6)

informatgon makes up.a s i g n i f k a n t part o f , t h e lexical e n t r y ,

Lexical r e l a t i o n s offer s i g n i f i c a n t advantages. They aS.Low us to generalize f a m i l i a r gnference patterns into axiom schemes. They can en- capsulate the defining formulae of the commercial d i c t i d n a r y . They have an iritwitive a p p e a l which w e b e l i e v e reflects a certain measure of psy- c h o l o g i c a l r e a l i t y . On a p r a c t i c a l l e v e l they a l l b w ~ t l s t o express both syntactic and semantic informa-tion i n a form that i s compact and eaay t o r e t r i e v e . They can be used i n many ways. For example, the following paragraph from a test administered t o f i r s t and second g r a d e r s - b y

,

l o c a l school system says:

(Pl) Ted has a puppy. H i s name i s Happy. T e d and Happy l i k e to play.

(Ql) The p e t i s a: dog boy toy

In order t o answer t h i s question w e need to know what p& means. In our Lexicon the lexical entry for pet contains a simple definition. a pet i s an animal that is owned by a human. In order t o answer t h i s question w e also need t o know that a puppy is a young dog. 'This information in pre- dicate c a l c u l u s form would b e p a r t o f the lexical e n t r y f o r puxpy. W e would, of course, need axioms o f t h e same form as w ' e l l f o r the entrqies

for

kitten, jamb, etc. Instead of such a representation w e express t h i s ;information b y uslnp, a l exical r e l a t i o n , CIIILD. The Lexical entry for puppy contdins CHILD Jog. S i m i l a r l y , the lexjcal entry for kitten con- tains CHILD cut; w h i l e the lexical entry for CHILD contains the axio~tl scheme from which t h e relevant axioms are formed when needed.

We treat verb& in a s i m i l a r way. Corresponding to each case re- Lation there is a lexical relation which points to t y p f c a l . f i l l e r s of

(7)

also i h c l u d e s ~

T

make where T 1s t h e well-known taxonomy r e l a t i o n , so t h a t i f t h e s t o r y says t h a t "Mother baked a cake " w e can1 in t e r t h a t s h e inade

one add CAUSE

bakel

s o t h a t w e car deduce that t h e cake has baked he s e l e c t Lon r g s t r i c t i o n s t h a t h e l p u s t e l l i n s t a n c e s sf bakel and bdke2 apn'rt can a l s o be expressed compactly u s i n g t h e T r e l a t i o n . W a l s o e need t o make d e d u c t i ~ n s from main verbs i n p r e d i c a t e complemhnt con- s t r u c t i o n s , deduetions such ag t h e s p e a k e r ' s view of t h e t r u t h of t h e

proposition stated i n the complement a s d e r i v e d from t h e f a c t i v i t y of t h e verb. I n o r d e r t o answer s e v e r a l q u e s t i o n s from t h e t e s t c i t e d

above the r e a d e r must infer t h a t everything t h a t Mother says i s t r u e . ~ e x i c a l e n t r i e s f o r main verbs t h a t t a k e p r e d i c a t e complements contakn p o i n t e r s t o t h e i m p l i c a t i o n c l g s s . These r e l a t i o h s can then be expanded t o g i v e t h e proper axioms.

The l e x i c o n i n c l u d e s s e p a r a t e e n t r i e s f o r each derived form u n l e s s

the root can be i d e n t i f i e d by a simple suffix-chopp-tng r o u t i n e . L e x i c a l r e l a t i o n s are u s e f u l , h e r e too i n saving space. The l e x i c a l e n t r y f o r

man

c o n t a i n s PLURAL men. The 1ex;bcal e n t r y f o r

went

c o n s i s t s of PAST go, The lexical entry f o r death consJ.sts of NOMV die. There are, as w e l l ,

l e x i c a l e n t r i e s f o r some m u l t i p l e word expressions such a s b i r t h d a y par ~IJ,

baZZ g m , piggy bank, and thank you.

As t o t h e form of p r e s e n t a t i o n h e r e , t h e next s e c t i o n p r e s e n t s some

of t h e p r a c t i c a l problems and t h e o r e t i c a l c o n v i c t i o n s which determined our most c r i t i c a l design d e c i s i o n s . Then, a f t e r a b r i e f d e s c r i p t i o n of some

e a r l i e r developments i n t h e theory of l e x i c a l r e l a t i o n s , we e x p l a i n t h e system of r e l a t i o n s which s t r u c t u r e o u r l e x i c o n , d i s c u s s i n g each group of

(8)

8

2. DESIGN ~JECISIONS

The l e x i c b n i n t h i s system must make i n f o r m a t i o n r e a d i l y available

f o r p a r s i n g , f o r b u i l d i n g t h e s t o r y model, and for making i n f e r e n c e s

d u r i n g q u e s t i o n answerivg

.

So t h knowledge about words and knowledge about

the world must b e s t o r e d i n a compact b u t immedrlately a c c e s s i b l e form. T h e r e f o r e , many d e c i s i o n s must b e made concerning t h e d e s i g n of t h e lexi-

con The problems involved i n c l u d ~ t h e a s g a n i z a t i a n of l e x i c a l and ency-

c l o p e d i c i.nformation, the c h o t c e of a l e x d c a l model and t h e determ3nafion of a p p r o p r i a t e s e r a n t i c primes, t h e r e p r e s e n t a t i o n of s e l e c t i o n prefer-

e n c e s , t h e r e c o g n i t i o n and s t o r a g e of homographs, and t h e criteria f o r

e s t a b l i s h i n g s e p a r a t e e n t r i e s f o r idioms and o t h e r f i x e d phrases. Th i s

paper a t t e m p t s t o develop some c o n s x s t e n t s o l u t i o n s t o these problems,

sol u t l o n s which determine t h e d e s i g n d e c i s i o n s f o r t h e l e x i c o n i n t h i s ques t ~ o n - - r n s a erinp s y s t e m .

a. 1'he dze-tzonary a d the elzcgctoy~dza

-

one data-base o r two7

Any question-answering system must uBe l e x i c a l i n f o r m a t i o n i n a t

least two ways, In p a r s i n g and i n making i n f e r e n c e s . The f l r s t c r i t l -

c a l dec5s50n t h a t must b e made 9s whether tm s e p a r a t e data-bades a r e needed t o s u p p o r t t h e s e sepdrate f u n c t i o n s o r whether a s i n g l e u n i f i e d

~ l o b a l datq-base i s better. T r a d i t i o n a l l y human b e j n f s have used two

separate stores of inf orma tlon, the d i c t i o q a r y and t h e encycl o p e d i a .

Some linguistic and c o m p u t a t i o n a l mode1.s of language have also been

based on t h e assumption t h z t i n f o r m a t i o n about words should b e s t o r e d i n t w o separate co1 l e c t i a n s .

Ln Chomsky's f l ~ p p ~ t o m o d e l (1965) t h e r e a r e two separate storage

p l z c c .; for lexical Informztf o n , one in t h e base component 2nd a n o t h e r i n

(9)

9

'

dictionary" and semantic inf o m a t i o n i n an

'

encyclopedia" Winograd

(1971) has two separate word lists, one used by the parser and one by the semantic routines, even though the parslng and the semantic routgnes are very closely interwoven in his BLOCKS s y s t e m

Before deciding on whether to carry on this tradition one must ask

whether there is r e a l l y a clearcut d ~ s t i n c t i - o n between these two klnds

of lexical information Is there a simple algorltnm f o r deciding which

data should go where?

B z e m ,

Bzemzsch,

m e f e r ,

and

t h e

Smantzc

Funct

ton

of

the

Lpxzcon

Both the dictionary and the encyclopedia are ways o f recording i n - formation stored i n human memory But human memory n s probably not orga- nized i n the usual graphic form of an alphabetic word l i s t , therefore alternative memory s-tructures should be examlned One such a l t e r n a t ~ v e

has been p r e s e n t e d by Bierman (1964) In h i s system lexical-semantic f i e l d s &re prlmary, they define the b a s i c o r g a n i z a t i o n o f semantic In- formation The function o f the lexicon,

if

it has one

4x1

the semantic domain, i s t o index thebe fields, t o store pointers t o

the

l o c a t i o n of

a word In t h e varlous f i e l d s c o n t a i n i n g it An appropr:iate image f o r such a system is a very large single page d i c t i o n a z y with Language speci- f l c nodes connected by semantic relations (See a l s o Werner

1969)

Can the dictionary and the encyclopedia be d i s t m g u i s h e d i n this

context? Bierwisch and Kiefer (1970) assume that b o t h kinds of informa- t i o n are contained i n the same lexical entry The distinction between

(10)

The

core

o f , a l e x i c a l reading c o p p r i s e s a l l and only t h o s e semantic s p e c i f i c a t i o n s t h a t determine, roughly

speaking, i t s p l a c e w i t h i n t h b system of d i c t i o n a r y e n t r i e s , i. e. d e l j m i t i t from o t h e r (non-synonymous') e n t r i e s . The

periphez-8

c o n e l s t s of t h o s e m a n t i c s p e c i f i c a t i o n s whikh could be removed from

i t s

r e a a i n g withour changing i t s r e l a t i o n t c o t h e r l e x i c a l r e a d i n g s w i t h i n t h e same grammar (2bid: 69-70)

Unfortunately they do n o t s p e q i f y whether t h e lexical-gemanttc r e l a t i o n s

which form t h e s t r u c t u r e of t h e f i e l d s are p a r t of t h e c o r e o r t h e p e r i -

The major d i f f i c u l t y with this crikerion i s i t s i n s t a b i l i t y . A s new e n t r i e s a r e added t o t h e system, information s u f f i c i e n t t o O i s t i n g u i s h one

e n r r y from another may have go b e s h i f t e d from t h e periphery t o t h e c o r e

--and t h u s from t h e encyclopedia t o t h e lexicop. For i n s t a n c e , s b ~ p o s e a

new e n t r y , "2eopard--a l a r g e , w i l d c a t " i s t o be added. The e n t i r e l e x i c o n must be ~ e a r ' c h e d f o r e n t r i e s which mention large w i l d c a s s . I f one i s

found, s a y "lion--a l a r g e wlld c a t " , then enough information must b e added

t o both d e f i n i t i o n s t o d i f f e r e n t i a t e

leopard

and z w f l from each other.

Soviet Lexicography and the Lmical Universe.

Apresyan, iolkovsky, and el

'

Zuk run i n t o t h e same d i f f i c u l t y of d i s - t i n g u i s h i n g d i c t i o n a r y and encyclopedic $nformation i n a t t e m p t i n g t o d e f i n e

t h e l e x i c a l u n i v e r s e of a word C 0'

The main themes d e a l t w i t h under t h e heading ' l e x i c a l u n i v e r s e ' a r e : 1) t h e t y p e s of C o ; 2 ) t h e main p a r t s o r phases of Co; 3) t y p i c a l s i t u a t i o n s o c c u r r i n g % e f o r e o r a f t e r Co etc. Thus, t h e s e c t i o n l e x i c a l u n i v e r s e f o r t h e ward

skis

consis-ts of a l i a t of t h e t y p e s of s k i s

(racing,

mountain,

jumping,

hunting),

t h e i r main p a r t s ( s k i s prdper and

bindings),

t h e main o b j e c t s and a c t i o n s n e c e s s a r y 2o.r

t h e c o r r e c t u s e ( e x p l o i t a t i b n ) of s k i s ( s t i c k s , grease, $0

WLZE), t h e main types of a c t i J i t i e s connected w i t h skis (a

(11)

information about

skis

( t h e i r h i s t o r y , t h e way

they are manufactured, etc.) i s n o t supplied hgre: t h e s e c t i o n s contain only such words and phrases as are necessary f o r t a l k i n g on t h e t o p i c , and nothing e l s e . (1970:19,)

The problem here

is

that

"what

is

needed

f o r t a l k i n g about t h e toqic" hepends very much on who i s going t o do t h e t a l k i n g . The d e f i n i t i o n of

slEi

i n

Webster

's

Nm

tntemationaZ

(2nd Edition) begins :

One of a p a i r of narrow s t r i p s of wood, metal, o r p l a s t i c , u s u a l l y i n conibinatian, bound one on each f o o t and used fox g l i d i n g over a snow-covered surface.

Apresyan, iolkovsky, and

Me1

cuk do n o t provide f o r three of the items, mentioned here: what s k i s a r e made of (wood, p l a s t i c , or m e t a l ) ,

what

shape they come i n (long and narrow) and where they belong s p a t i a l l y

(on t h e f e e t ) . Yet t h e s e items could b e e s s e n t i a l i n understanding in- ferences i n a s t o r y .

It was snowing, Jim took o u t h i s skis. H e waxed t h e wooden s t r i p s . . . .

You could need t h i s information i n answering questions. Jim s k i d down t h e mountain....

What was he wearing on h i s f e e t : s l i p p e r s s k i s skates?

Although

i n English o r Russian i t i s p o s s i b l e t o refer t o s k i s without knowing t h a t t h e y a r e long and narrow, i t i s n o t p o s s i b l e i n Navajo where physical shapes determine

verb

forme. While t h e entry i n Webster's goes

on a t l e n g t h beyond t h e sentence given above, i t does not i n c l u d e a l l

V

the i t e m s which Apresyan, Zolkovsky, dhd Mel'guk mention. This, however, is n o t surprising; t h e boundaries o f t h e l e x i c a l universe a r e not w e l l

(12)

Difficulties

in

updating a system

with

separate dictionary and encyclopedia,

This lack of definition cahses tremepdous problem in a dynamic system.

A "real" d i c t i o n a r y

ar

encyclopedia, t h e one i n a p e r s o n ' s b r a i n , is

c o n s t a n t l y changing. I n f o m a t i o n i s added, c o r r e c t e d , apd perhaps l o s t .

A truly i n t e r e s t i n g memorymodel must be dynamic. T h e problems of u p d a t i n g t h i s information are n o t easy t o s o l v e , t h e problem of d i s t i n g u i s h i n g be- tween d i c t i o n a r y and encyclopedic i n f ~ m i a t i o n i n t h e updating p r o c e s s

seems i n s u p e r a b l e ,

Recognizing d e f i n i t i o n s phrased i n o r d i n m y E n g l i s h i s a l r e a d y , d i f f i

cult (Bferwisch and R i e f e r 1970, Lawler 1972). D e t e r m h i n g t h e r e l i a b i l i t y of such f n f o r n a t i o n i s a l s o a problem and t h e dichotomy of d i c t i o n a r y and

eacyclop&dia i n c r e a s e s t h i s d i f f i c u l t y . U n f o r t u n a t e l y i n f ~ r m a t i o n does

pot cpme n e a t l y packaged and marked " f o r t h e dict30nary1' o r " f o r

the

en-

c y c l o p e d ~ a " . And a d d i t i o n of ififormation t o one p a r t of the e n t r y may n e c e s s i t a t e updating ' o t h e r p a r t s of th-e e n t r y . For example, i f w e l e a r n

t h a t record is a v e r b as well as a noun w e need t o add morphological i n - f o r m a t i o n , l e s c r i b e the r e l a t i s o n s between record and write, and we should #

probably describe r e c o r d i n g materials. Mention must be made t h a t

record

i s a f a c t i t v , i . e , i f someone r e c o r d s t h a t something happened, one can

assume t h a t from t h e s t a n d p o i n t o f t h e speaker t h e something r e a l l y d i d happen. W h i ~ h of t h i s i n f o r m a t i o n i s d i c t i o n a ~ y i n f o r m a t i o n and which i s

encyclopedicf And once this d e c i s i o n i s made, i n f o r m a t i o n added to t h a t e n t r y may r e q u i r e a d d i t i o n s t o o t h e r e n t r i e s I n t h e record example, t h e e n t r i e s for ebase and write would have t o b e updated. Also, a d e c i s i o n

(13)

The w r k of Kiparsky and Kiparsky (1970)

,

Lakof f (1971) and KcCdwley,

(1968) h a s shown t h a t s y n t a x and semantics cannot be s e p a r a t e d i n t o such

n e a t compartments. But i f s y n t a x and semantics a r e interwoven then does

i t make sense t o put s y n t a c t i c information i n one box and semadtic informa- t i o r i n a n o t h e r ? The enswer t o this questgon given a t l e a s t by g e n e r a t i v e semantics c a l l s i n t o q u e s t i o n t h e t r a d i t i o n a l distinction between t h e d i c -

e".&

t i o n a r y and t h e encyclopedia.

We accept the generative semantikist arguments that syntax a d seman-

t i c s cannot be s e p a r a t e d and t h u s do n u t s e p a r a t e s y n t a c t i c and semantic

i n b r m a t i o n . Furthermore, as shown above t h e r e seem t o be no p r a c t i c a l cr,iteria f o r d i g t i n g u i s h i n g d i c t i o n a r y i n f ~ r m a t i o n from encyclopedic in-

formation.- Thus our system h a s one s i n g l s g l o b a l d a t a base. For b r e v i t y

and' s i n c e i t i s a kind of v l l e c t i o n of words, i t w i l l be c a l l e d " t h e lexicon".

b.

Lex{eaZ

Models

-

ComponentiaZ

Feature

AnaZysis vs.

ReZationaZ

Netr~orFs.

A second c r i t i c a l l y important d e c i s i o n i n v o l v e s t h e c h o i c e of an

a p p r o p r i a t e l e x i c a l model, t h e determination of what semantic primes t o

u s e and how t h e y should b e combined i n l e x i c a l s e n a n t i c s t r u c t u r e s . Two

importarit competing models a're provided by componential f e a t u r e a n a l y s i s and by r e l a t i o n a l networks. I n a componential a n a l y s i s model the p r i m e s

are semantic features and words are deffried by bundles of features. This is a n a t u r a l e x t e n s i o n of t h e d j ~ i n c t i v e feature approach t o phoneme

d e s c r i p t i o n which h $ ~ ; been used to e x p l a i n many phonological phenomena. C e r t a i n p r a c t i c a l problems a r i s e . The number of words i n any 1anguage'Fs

f a r l a r g e r than t h e number of phonemes. The number of d i s t i n c t i v e fea-

(14)

14

the phoneme-phonetic f e a t u r e natrix. I n a d d i t i o n , t h i s matrlx woyld b e extremely s p a r s e , Also, i t $8 n o t clear Qhether a l l t h e e n t r i e s i h t h i s

malcr* cd'uld b e

+/-

as i n a phoneme

m a r i x .

Axe semantic f e a t u r e s e i t h e r d e f i n i t e l y a b s e n t o r b f i n i t e l g p r e s e n t d r a r e some f e a t u r e s p r e s e n t by

d e g r e e s ? The s i z e of the c o m p ~ n e n t i a l a n a l y s i s matrix would imfnediatelg i h t r o d u c e d i f f i c u l t i e s i n a computerized model. F o r t u n a t e l y , b o t h numeric

c31 a n a l y s i s and document r e t r i q v a l o f f e t experience i n h a n d l i n g immense

matrices by machipe. When a s i x i s extremely s p a r s e i t t u t n a o u t t p b e s e n s i b l e t o s t o r e a l i s t of e n t l r r e w i t h SOW and wlumn numbers Here i t

would mean s t o r i n g a l i s t of f e a t u r e s f o r each word. This. i n f a c t , i s c l o s e t o K a t z ' s p r o p o s a l (1966).

Ip a r e l a t z a n a l network model, however, t h e primes are r e l a t i o n s and

words o r word s e n s e s . R e l a t i o n s connect w'osds t o g e t h e r i n a network i n

wnlch the woPds a r e nodes and t h e r e l a t i o n s a r e edges. In f a c t , wo.rds,are defined i n terms of t h e i r r e l a t i o n s h i p s t o o t h e r words.

These *models d i f f e r r a d i c a l l y i n t h e i r approach t o t h e c r i t i c a l Lexi-

c a l t a s k p f f i n d i n g r e l a t e d words. I n t h e componential a n a l y s i s model re-

l a t e d words s h a r e r e l a t e d f e a t u r e s . Prl~.;umably, t h e more f e a t u r e s two

words s h a r e t h e more c l o s e l y r e l a t e d khey a r e . Thus, some kind of c l u s t e r

a n a l y s i s must b e u s e d t o i d e n t i f y r e l a t e d worgs. I n t h e r e l a t i o n a l network

model t h e l e x i c o n i s Eormed from 2eTatfionships between words. Thus relatl ed

words a r e immedzately a v a i l a b l e

Both models, componential and r e l a t i o n a l , r e q u i r e a s e a r c h f o r semanric primes The componential a n a l y s i s model r e q u i r e s t h e d i s c o v e r y of p o s s i b l y

thousaads 0% semantic f e a t u r e s

.

For a r e l a t i o n a l network m o b 1 an inventc ry

of l e x i c a l relatAons and theil; p r o p e r t i e s must b e developed. T h i s i s apparent-ly

(15)

w

number of rekvant r e l a t i o n s i s probably q u i t e small. Zolkovsky and ~ e l ' & k (1970) list about f i f t y i n t h e i r paper.

R e l a t e d t o b o t h of t h e s e models i s t h e n o t i o n of semant5c f i e l d s . I n t u i t i v e l y , semantlc f i e l d s are c o l l e c t i o n s of r e l a t e d words used t o t a l k about a p a r t i c u l a r s u b j e c t . Semantic f i e l d s seem-to o f f e r some

h e i p i n coping w i t h t h e problems of ambiguity and c o n t e x t . Many u t t e r -

ances, t a k e n o u t of c o n t e x t , are ambiguous. But remarhlQy, p e o p l e almost n e v e r p e r c e i v e t h i s ambf g u i t g

.

They immediately choose

the

c o r r e c t word s e n s e and i g n o r e t h e o t h e r s . ~ ~ ~ a r e i t l ~ the t o p l c of con-

v e r s a t i o n deterrdines a s'emantic f i e l d and t h e word s e n s e chosen i s t h e one which lj e s i n t h i s f i e l d . The semantic f i e l d pomehow d e f i n e s t h e

I I

v e r b a l c o n t e x t . ( O r as F i l l m o r e 1977 :59 p h r a s e s i t meanings are re-

l a t l v i z e d t o scenes". )

The componential a n a l y s i s model makes i t p o s s i b l e t o d e f i n e d i s -

t i n c t s e m a n t i c f l e l d s , b u t g e t t i n g ,from one word nn t h e f f e l d t o t h e o t h e r s

may take a sigrrifxcant amount of processing time. b e r y s e t o f semantic

markers c a n be used t o d e f i n e a semantic f l e l d ; t h e f i e l & c o n s i s t s of a l l t h e words wldh d e f i n i t i o n s c o n t a i n i n g the markers. The smaller t h e number

of,markers t h e l a r g e r t h e f i e l d o b t a i n e d . I t i s p o s s i b l e t o d e c i d e i m e -

d i a t k l y whether aogfven word i s i n t h e f i e l d o r n o t , j u s t by checking i t s

l i s t of markers.

I n t h e r e l a t i o n a l network model r ~ l a t e d words a r e e a s y t o f i n d , b u t the b o u n d a r i e s of semantic f i e l d 5 are e x t r e m e l y f u z z y and i n d i s t i n c t . A semantic f i e l d can be d e f i n e d by s t a r t i n g a t a p a r t i c u l a r node and going

a given number of s t e p s i n any d i r e c t i o n . The semantic f i e l d s o b t a i n e d

(16)

Certain b a s h philosophical -psycho1 o g i c a l assumptions may create

a strong bias f o r one o f these models over the other. Someone

who

b e l i e v e s that semantic

features exist

as Platonic ideals

or

who accepts

them as psychological r e a l i t i e s may easily find componential a n a l y s i s a most natural kind of description and regard the necessary search for features or sememes as highly relevant. Someone who Feels that "There

i s no thought without words" would be much mre

likely

to prefer a re-

lational network description. A lexicon 2 8 , in an important sense, a memory model. Intuitio'i abbut our own internal memory models must have a strong influence on the lexicon w e s h a ~ e .

We have chosen a r e l a t i o n a l network model for both i n t f i i t i v e and

practical reasons. We find lexical-semantic r e l a t i o n s theoretically

i n t e r e s t i n g (see Evens et

al.

ms), Useful mventorles of thesd re-

lations are available, i n a l a t e r s e c t i o n we describe some of these

sources. As will b e show they proviae a

convenient

way of

s t o r i n g

axiom schemes far deductive inference.

A s l e x i c a l semantic structures @e use t h e same first-order pre-

dicate calculus notation in

which

semantic representations a r e written in the que9tion-answering system

-

meanings of words and meanings of

s e n t e n c ~ s must have t h e same underlyjng form As McCawley (1970) h a s

argued " d e n y i s t " and "doctor who creats teeth" must contain the same u n i t s o f mehnlng t i e d together i n the same %ray

c "'e Zectzqn Preferences

A third important problem to be faced in constructing a l e x i c o n

(17)

Chomsky (1965) developed the theory of s e l e c t i o n r e s t r i c t i o n s

i n

order t o block t h e generation of nonsense sentences i n t h e s y n t a c t i c com-

ponent of h i s model, The l e x i c a l e n t r y f o r

frzghten,

f o r examae,

contains the information that i t r e q u i r e s as object a noun with the featur [+animate]

,

while drznk r e q u l r e s an animate s u b j e c t

.

I f t h e s e c o n d i t i o n s a r e n o t met, generation i s blocked. S e l e c t i o n a l r e s t r i c t r i o n s seem much

too r e s t r i c t i v e . Traller trucks drink diesel f u e l and the

earth

d r i n k s i n the r a i n .

I n

d e s c r i b i n g dreams w e can invent p e r f e c t l y a p p r o p r i a t e sentences i n which inanimate o b j e c t s by t h e dozens g e t up, run around,

and d r i n k u n t j l frightened back t o place. S t i l l i t is %rue t h a t sen- tences l i k e t h e s e are somehow more s u r p r i s i n g than sentences i n which

cows d r i n k from a brook and a r e frightened by l i g h t h i n g . We need some method ~f recording t h e ordinary, everyday ways i n which words combine

w ~ t h o u t excluding t h e unusual, t h e p o e t i c , t h e nletaphoric uses, We

k i l l c a l l them

seZectzonprefe~ences

i n s t e a d of s e l e c t i o n restrictions.

Some truly semantic means of i d e n t i f y i n g semantic anomalies a r e needed. Raphael mentions t h i s question r a t h e r c a s u a l l y , almost as an a s i d e i n t h e SIR paper. He draws taxonomic t r e e s , one f o r t h e nouns and one f o r t h e verbs, from t h e vocabulary of a first grade reader.

Then he makes statements l i k e t h i s

1. Any noun below node 1 i s a s u i t a b l e subject f o r any verb below node 1'.

2 . Only nouns below nodes 3 o r

4

may b e subjects f o r verbe below node 3'. (1968, p. 51)

(18)

The

complete model cohlposed of tree structures

and statements abodt their possible

connectionp,

is

a representation f o r the class o f a l l pos-

s i b l e events. In other words, i t represents the computer's knowledge of the world. We

now

have a mechanism f o r t e s t i n g the

'

coherence'

or

'meaningfulness' of new samples of t e x t . (1968, p .

51)

Werner (1972) has suggested a method for handling the s e l e c t l o n a l prob-

l e m which uses noun taxonomies i n very much the same way that Raphael does.

H i s proposal includes an elegant way of storing s e l e ~ t i o n a l information

within h i s memory model. In

his

network model, noun phrase arguments are

connected t o the

verb

by prepositions. The node representing the lexical entry f o r the verb

has

arcs connecting i t to compound nodes, one for each

p r e p w i t i o n which can b e used with the verb. The o b j e c t of each prepo- s i t i o n i s a node i n the noun taxonomy. This noun or any noun below ft

[image:18.744.135.614.578.739.2]

i n the taxonomy may serve as an argument f o r the verb. Hem i s an over- simplified example o f a network for saZ2.

Figure

1:

Werner's Ansyer t o the Selection Problem

This network says that oeZZ takes a human subject, a t h i n g as

o b j e c t , the preposit-n to followed by a human, t h e preposition

Jbr

followed by money. The square brackets around [human] indicate that this

i s j u s t a pointer t o the top noun i n the taxonomy for human beings.

Any node i n t h i s taxonomy below

the

node marked human, whether i t i s

(19)

not use the

verb

taxonomy as Raphael does. Each

verb

haa

its

own

set

of s e l e c t i o n i n d i c a t o r s

f n h i s discussion of the goals of a semantic theory Winogtad

describes

semantic markers and s e l e c t i o n r e s t r i c t i o n s

,

quotes Katz and Fodor (1964)

and i n d i c a t e s t h a t he i n t e n d s t o mbody t h i s theory i n h i s system. But

i n f a c t semantic markers i n t h e

BLOCKS

program

are

derived from a marker t r e e (Winograd, 1971, Figure

59)

which i s organized taxonomically.

In

t h e implementation process Winograd seems t o have moved from .a s t r i c t

Katz-Chomsky p o s i t i o n t o a p o s i t i o n somewha1 c l o s e r t o RaphaeJ. a

J

Werner.

The Raphael and Werner proposals are t h e guiding p r i n c i p l e s here, ajapted t o a c c m o d a t e case-defxned arguments. The lexlcal entry f o r

mooel,

t h e i n t r a n s i t i v e pave, must t e l l u s about s e l e c t i o n as w e l l as how t o r e l a t e s u b j e c t , o b j e c t , and p r e p o s i t i o n a l phrases t o cases. The in-

formation is organized t h i s way:

gramnutical function

case

frame

selection

i n f a r a t i o n

1. s u b j e c t &per fencer thing

move1 # 2.

from

source t h i n g , place

3,

to,

into,

onto goal t h i n g , place

The

numbers 1, 2 ,

3 ,

indicate argdment

positions

for t h e predicate c a l ~ u l d ~

r e p r e s e n t a t i o n . The next column lists the grammatical function. Next

come dase i n d i c a t i o n s . Last comes t h e s e l e c t i o n information, the t o p

node i n t h e r e l e v a n t p a r t of t h e taxonomy. For move1 the subject i s an

experiencer.

The

source i s u s u a l l y marked by t h e prepositaon f m m . The

(20)

c a n be a physical argument o r

thzw,

t h e s o u r c e and goal can both b e p l a c e s . There is a r u l e t h a t any p h y s i c a l g o a l can b e r e p l a c e d by a

c l a s s of adverbs c o n t a i n i n g

back

and there, s o t h e s e a l t e r n a t i v e s do n o t have t o b e l i s t e d ,

An a t t e m p t i s being made t o u s e t h e verb taxorlomy as Raphgel suggested. I n t h i s l e x i c o n go i s marked a s taxonomically r e l a t e d t o

move. The e n t r y f o r go does n o t c o n t a i n t h e i n f o r m a t i o n l a b e l l e d #

above. I n s t e a d , when t h i s i n f o r m a t i o n i s needed, the look-up r o u t i n e climbs t h e taxonomic t r e e i n t h e l e x i c o n u n t i l i t f i n d s a v e r b which h a s this information and oopies i t from t h a t e n t r y . Thus i t gets,case-

argument and s e l e c t i o n a l i n f o r m a t i o n f o r

go

from t h e e n t r y f o r

move.

It i s n o t c l e a r y e t whether t h i s w i l l r e a l l y work with a s i z a b l e vo-

cabulary.

T h i s selec t i o n a l informa t i o n is t r e a t e d a s s e l e c t f o n p r e f e r e n c e

and n o t s e l e c t i o n r e s t r i c t i o n . Each c a n d i d a t e word s e n s e f o r a verb i s

checked f o r s e l e c t i o n a l p r e f b r e n c e . I f no arrangement of t h e avail-

able noun p h r a s e arguments 1s c o n s i s t e n t w i t h t h e s e preferences a n o t h e r word senqe i s examined But i f a l l word s e n s e e have been r e j e c t e d on

t h e basis of s e l e c t i o n a l information, t h e s e n t e n c e is n o t rejected

Instead w e look again a t the c a n d i d a t e word s e n s e s and count for ~ a c h one t h e number of q t e p s up the tzxonomic t r e e w e have t o make t o m r e s o l v e t h e c o n f l i c t . The word-sensf which requires t h e fewest s t e p s I s ~ h d s e n .

The hope is tHat t h e system will be a b l e t o "understand" s i m p l e metaphors t h i s way. I t would b e i n t e r e s t i n g t o t r y t o c r e a t e metaphors by picking noun phrase arpuments c l a w t o but: n o t u n d t r t h e nodcs I n d i c i t t d by the

(21)

d .

The

Amonm-PoZyeemy Problm

-

Cr$te&a

for

Separate

Entriee.

Words with t h e same

phyeical

shape b u t

dgffprent

meanings c o n s t a n t l y

cause t r o u b l e i n n a t u r a l language processing. I n designing a lexicon w e must decide whether o r not t o c r e a t e a separate e n t r y f o r each v a r i a t i o n

t

meaning aid type

of

use. Q u i l l i a n i s p a r t i c u l a r l y i n t e r e s t e d i n words w'ith m u l t i p l e meafiings and he experimented with tseveral i n h i s memory

model. I n Q u i l l i a n (1968) the word

ptmt

i s treated as a three-way homo-

nym with three separate ttrpe nodes, each with a s e p a r a t e d e f i n i t i o n - p l a n e :

PLANT Living s t r u c t u r e which i s n o t ah animal, f r e q u e n t l y with leaves, g e t t i n g i t s food from a i r ,

water,

earth.

PUNT2 Apparatus used f o r any process i n industry. W T 3 P u t (seed, p l a n t , etc.) i n e a x h f o r growth.

The type node f o r t h e first forms a d i s j u n c t i v e set w i t h token nodes pointing t o the other two

The word food has a s i n g l e d e f i n i t i o n w i t h a l t e r n a t i v e formulations: That which J i v i n g being has t o take i n t o keep i t

l i v i n g and f o r growth. Things f omllng meals,

e s p e c i a l l y o t h e r than dklnk.

A polysemous word l i k e t h i s has a s i n g l e type node and a s i n g l e d e f i n i t i o n - plane, b u t t h e two a l t e r n a t i v e d e f i n i t i o n s a r e combined with an OR l i n k .

r/

Apresyan,

Me1

'guk and ZolkovsQ a t tack t h e homonymy-polysemy problem

with v i g o r . Graphically coincident worda a r e considered homonyms, given

d i s t i n c t i v e s u p e r s c r i p t s and l i s t e d as s e p a r a t e e e n t r i e s , i f t h e i r d e f i n i t i o n s

V

(22)

n o t define "a common part," b u t t h e y do give an example.

KOCA

(scythe),

2 3

KOCA ( b r a i d of h a i r ) ,

KOCA

( s p i t ) . If two gefinitibns have a single

common p a r t , t h e word i s classified as polysemantic w i t h a single entry

divided i n t o s e p a r a t e p a r t s . They d i s t i n g u i s h two t y p e s of polysemy. Ifi

one case the difference between two words i s r e g u l a r .

The

r e l a t i o n of a verb t o i t s typical ob J ect i s such a regular meaning change, e. q. record (v)

-

r e c o r d ( n )

,

f ieh(v)

-

f

i s h ( n )

,

and aid(v)

-

a i d (n)

.

ghese regular

vari-

ations i n meaning a r e numbered with Arabic numerals, while irreguldr vari-

ations are numbered with Roman numerals. Thus part 3 o f t h e l e x i c a l e n t r y f o r

bm,

the d e f i n i t i o n , might have the form:

bod

XC 1. To bend t h e head i n assent o r r e v e ~ e n c e . (vt)

2. To submit o r y r e l d . ( v i )

3 . TO cause t o bend. ( v t )

a

4. An i n c l i n a c i b n of t h e head. (n)

5. A

bent amplement used t o propel an arrow

o r play a stringed mstrument. (n) I . 1. The forward part of a b o a t . (n)

2. One

who

rows in the bow of a boat. (n)

There seems t o b e some redundancy between d e f i n i t i o n - e l e m e n t s and the

lexical f u n c t i o n s . Shouldn't r e g u l a r v a r i a t i o n s i n meaning b e captured by fegular l e x i c a l fllnctions?

If

s o , then the distinction Apresyan,

u?

Zolkovsky and ~el'fuk make between regular and irregular meaning variations

w i l l be appareht from the form and need not h e i n d i c a t e d by d i f f e r e n t no-

tation, such as Arabic and Roman numerals.

For convenience i n lexical lookup we have a single physical e n t r y

(23)

is numbered s e p a r a t e l y w i t h Arabic numerals. Thus t h e a d ~ e c t i v e i s

coozl,

coo22 i$ t h e verb t o become

coozl,

a d c o d 3 i s t h e verb meaning t o cause

t o

become cooll. Separate information aboub. l e x i c a l r e l a t i ~ n s , etc. is

s t o r e d f o r each subentry.

e.

Xdzornb

Idloms present a s e r i o u s problem t o t h e designer of an English l e x i c o n Some c r i t e r l a must be e s t a b l i s h e d f o r deciding which idioms deserve s e p a r a t e

l e x i c a l erntries and how multi-word phrases should be s t o r e d .

When does an idiom deserve t o be t r e a t e d as a s e p a r a t e Lexical unlt? v

Apresyan, Zolkovsky, and

el'

Euk (1970) and Klparsky (1979) r e p r e s e n t

oppbsfte p o l e s of opinion here. I n t h e explanatory-combinatory dictionary

(ECF) of t h e Soviets word c o m b i n a t i ~ n s which have a d e f i n i t i o n of t h e i r own o r "a p e c u l l a r 'combinability p a t t e r n " have s e p a r a t e e n t r i e s . Kiparsky

(1975) c o n s i d e r s an idiom as a s e p a r a t e l e x i c a l u q i t only i f i t invoxves s y n t a c t i c patterns which a r e no longer productive. T h ~ s "house b e a u t i f u l "

and "come h e l l o r high water" are t r e a t e d be u n i t s , but "make headway" i s no't. Instead

headmy

i s defxned as "progress" and marked as appearing after

make'

and

rose.

U p a r s k y ' s proposal p l a c e s a greater burden on the recognition program which would have t o be a b l e t o r e t r i e v e and put to- g e t h e r t h e p i e c e s of t h e idiom using h i s l e x i c o n The system descrQed

here follows Apresyan, ~ o l k o v s k y and Me1 'Euk, and t r e a t s f i x e d phrases

as u n i t s . I n p a r t i c u l a r , a l l noun-noun combinations l i k e pzggg bank and

bz~thday

cake a r e s e p a r a t e l y defined, although t h l s is c e r t a i n l y a productive p a r t u o f English.

(24)

Levi

t h e underlying structure f o r "birthday boy" i s "boy-have-b5rthday"

and the underlying s truc t p r e f o r "Sir thday cake" is "cake-for-bir thday

.

tl

Then under c e r t a i n conditions

h e ,

for,

etc. can be d e l e t e d t o

give

us t h e noun adjunct expressions. Given these r u l e s , she argues, i t is not necessary t o treat these exptessions acCl s e p a r a t e lexical items.

While

her

rules

seem sufficient t o allow us t o syntheeiize these compounds c o r r e c t l y , difficlClJties a r i s e when we t r y t o use them for a n a l y s i s . The question-

answering system needs t o be able t o infer from "birthday boy" that t h e boy i n question is having a birthday, but t o avoid i n f e r r i n g from "birthday cake" t h a t t h e cake is having a b i r t h d a y . For c o r r e c t r e c o g n i t i o n w e need

t o b e a b l e t o recover t h e qnique underlying stl'ucture i f one exists. (For

a similar criticism see Downing 1977 : 814-15. )

Levi'

s t h e o r y accounts

for

the generation of new noun-noun compounds. However, i n order t o pcrount f o r the r e c o g n i t i o n process we need l e x i c a l entries for f a ~ p i l i a r f i x e d com- pounds and her theory t o analyze new compounds. We have used

Le+iVs

struc- ture as a basis for our r e p r e s e n t a t i o n of omp pound nouns.

Noun-noun colppounds have separate e n t r i e s . A

birthday

cake

is treated

as "a cake f o r a birthday." A ball gme is represented as "a game that has a ball". A piggy

bank

1 s deflned as "a bank t h a t i s a pj g. I t

The system i s t o l d that Jim has a piggy bank and asked what the bank looks l i k e . It could be argued t h a t anyone with s u f f i c i e n t c u l t u r a l know-

ledge ought t o be able t o answer t h i s even i f a l l t h e banks i n his p a s t

were shaped

like

bee-hives, b u t w e need a place t o write down this c u l t u r a l encyclopeaic knowledge and a l e x i c a l entry f o r

piggy

bank

seems l i k e a good

(25)

Becker i n

hie

wrk

on

"The Phrasal

Lexicon" (1975) has produced evidence

on

t h e Soviet s i d e of t h i s

argument,

H i s d a t a suggest t h a t

f

h e d phrases comprise approximately h a l f of our spoken o u t p u t and have an independent

lexical

existewe.

He includes i n h i s l e x i c o n euphemisms

("the o l d e s t profession")

,

p h r a s a l c o n s t r a i n t r ('8y

i"""

s h e e r coincidence"), d e i c t i c l o c u t i o n s ("for t h a t matter"), sentence b u i l d e r s (" (person A) gave (person B) a loag song and dance about

(a

topic)'!), s i t u a t i o n a l u t t e r a n c e s ("HOW can I ever repay yau?")

,

and verbatim t e x t s (proverbs, song t i t l e s ,

etc

.)

.

He claims t h a t

we speak mostly .by s t i t c h i n g t o g e t h e r bswatcbs of t e x t that w e have heard before; productive processes have t h e secondary r o l e of adapting t h e o l d phrases

t o t h e new situation....most u t t e r a n c e s a r e produced

i n

s t e ~ e k yped s o c i a l s i t u a t i o n s , where t h e c o m n i - c a t i v e and r i t u a l i s t i c functions of languagg demand not novelty, b u t r a t h e r an a p p r o p r i a t e combination of formulas, c l i c h e a , idioms, a l l u s i o n s , slogans.

.

.

(1975 : 60)

H e has c o l l e c t e d 25,000 phrases f o r t h e p h r a s a l lexicon.

Catherinq Flournoy (1975) has found s e v e r a l hundred f i x e d phrases i n d computer scudy of Father Coughlin's speeches.

This

i s

not a new

i d e a t o s t u d e n t s of o r a l epic poetry. Homer c o n s t a n t l y used f i x e d phrases t o f i t s y n t a c t i c and m e t r i c a l s l o t s . Dawn is always "rosy-tingered";

Hector is c o n s t a n t l y " t a i l Hector of t h e

shining

helm. I I

There

i s

a s e r i o u s space-time t r a d e o f f h e r e between p a r s i n g

time

and

lexical

storage space. It i s probably t r u e t h a t people possess and

c o n s t a n t l y u s e

a phrasal

lexicon. Whether we should

use

atorage

space

(26)

Currently we provide separate e n t r i e s f o r any phrase that we

cannot

parse and i n t e r p r e t c o r r e c t l y from t h e e n t r i e s f o r

individual

words. Briaf entries f o r these phrases seem a b s o l u t e l y necessaty f o r any prwtacal re- c o g n i t i o n scheme. These entries a l s o seem t o b e the a p p r o p r i a t e place for indexing p o i n t e r s t o t h e c u l t u r a l i n £ ormation necessary b r mWog

ifif

er- ences and answering question6 about b i r t h d a y cakes and b i r t h d a y parties.

There

a r e

theoretical

arguments f o r such

entries as

well.

W

e

believe,

as

Becker does, i n t h e p h r a s a l lexicon, although we do n o t Pnslude entries f o r any phrases t h a t can b e parsed and i n t e r p r e t e d correttly without a

s e p a r a t e entry. Any complete system

for

language proceestng must a l s o ,

o f course, contain r u l e s l i k e Levi's t o providg an a b i l i t y t o process

novel forms.

f

.

Prelhznary Design Deczsions

f o r

the

Lex{con.

The

goal of t h i s project is a lexicon suf f i c i e a t f o r p a r s i n g , forming

semantic r e p r e s e n t a t i o n s , and making inferences, c o q a c t b u t s t i l l a l I 0 ~ 4 . n g r a p i d l e x i c a l lookup.

The

lexicon is

a

data-base f o r the question-answering system,

a combination lexicon-encyclopedia

.

S y n t a c t i c aad semantdc i n f o bt i o n are combined i h t h e same l e x i c a l enrries. L e x i c a l semanttc r e p r e s e n t a t i o n s are m i t t e n i n the same form as t h e semantic r e p r e s e n t a t i o n s for sentenced,

i n a many-tiorted. f i r s t o r d e r p r e d i c a t e c a l c u l u s . Homographs w3ich vary i n meaning o r u s e a r e d i f f e r e n t i a t e d by Arabic numetal s u b s c r i p t s . Separate

entries are included f o r

phras

es w i t h fixed meaning.

(27)

The

relations are

used

t o express

and retrieve

plany d i f f

erem kzncls

informaticn,

from

p a s t

participles

t o s e l e c t i o n

preferences

t o proper habjtats

for

l i o n s .

Thus

the system of l e x i c a l relations is crucial t o

representation, retrieval, and inferance.

3 .

s O M ~

THEORIES OF LEItgcat

RELATIONS

While developing

our

lexical relations we examined a variety of

relational theories

in

anthropology and lingui st ice

and

even

c o l l e c t e d

folk

defilpitions of our own (Evens

1975)

.

We have been parrlcularly

influenced by the anthropological fieldwork of Casagrande and

Hale

(1967)

by

the

memory models of Raphael (1968) and Werner (19741, and most of all

by the

ECD of

Apresyan, Me1 cuk,

and

Zolkovsky (1970). But we looked a t

each

of

these relational theories

from

t h e peaaliar point crf view o f

corn

put er ques

tion-answering

and

the

pal-titular

lexical

environment of

c h i l -

dren's stories, adding and discarding relations t o fit the problem.

Cdsagmnde

mrd

Bale

-

L&caZ

RejSations

in

FoZk

Definitions.

Casagrande and

Hale

(1967) collected 800 Papago folk-defini*tions

and sorted them i n t o groups on the basis b f semantic and grammatical

s i m i l a r i t i e s . They produced the following list of thirteen lexical re-

l a t i o n s . (Table 1)

Table

1.

The

Relatims

of Casagrande and Ha3e

ReZatiofi

Word

EngZish

0208s

of

Papagu

Cefinitiow

1,

Attributive

burrming

~ I Z

but they are small; and they act l i k e mice; they l i v e i n holes.

2.

Contingency

t o

get

a g r y

When we

do not

like

something

we

get

angry.

Figure

Figure 1: Werner's Ansyer t o  the Selection Problem
Table 2. TA33LE OF LEXICAL S'EMANTIC RELATIONS*
Table 3 Classification of Main Verbs in Predicate Complement Constructions (adaptled from Joshl and Weischedel 1973)

References

Related documents