American Journal of Computational Linguistics
Microfiche 83
A LEXICON FOR A COMPUTER QUESTION-ANSWERING SYSTEM
Department of Computer Science, Illinois Institute of Technology
Department of Linguistics, Northwestern University
This is an expanded version of a paper presented at the Fifteenth Annual Meeting of the Association for Computational Linguistics, March 16, 1977. Accepted for publication September 30, 1978.
Copyright © 1979
SUMMARY OF A LEXICON FOR A COMPUTER QUESTION-ANSWERING SYSTEM
An integral part of any natural language understanding system, but one which has received very little attention in application, is the lexicon. It is needed during the parsing of the input text, for making inferences, and for generating language output or performing some action. This paper discusses the principal questions concerning the lexicon as it relates in particular to a question-answering system and proposes a specific type of lexicon to fulfill the needs of this system.
Rather than make a distinction between dictionary and encyclopedia, we have a single global data base which we call the lexicon. Homographs are differentiated and phrases with fixed meanings are treated as separate entries. All the information in this lexicon is encoded in the form of relations and words or word senses. These form a large network with the words as nodes and the relations as edges. In addition the relations define semantic fields and these are used to treat problems of ambiguity. Relations are used to encode both syntactic and semantic information. Axiom schemes are associated with each relation and these are used for inferencing. The lexical relations then are at the heart (or brain) of the system for representation, retrieval, and inferencing.
[...] been influenced by the work of Apresyan, Mel'čuk, and Žolkovsky. The lexical relations we have posited are the traditional synonymy and antonymy, taxonomy, part-whole, grading and approximately forty others. The whole set, deliberately left open ended, is subdivided into nine subsets which include attribute relations, collocational relations and paradigmatic ones.
Each relation has its own lexical entry giving its properties and telling how to interpret lexical relationships in a first order predicate calculus form. For example, the information for the lexical entry dog includes the statement dog T animal, that is, that a dog is a kind of animal. The lexical entry for T, the taxonomic relation, in its turn includes information which allows the statement to be interpreted as
    Holds(Ncom(dog,X)) → Holds(Ncom(animal,X)).
The inventory of relations is expandable simply by adding lexical entries for new relations. In addition, having both the lexical entries and the relations in the entries expressed in the same notational form as that of input sentences, namely in a first order predicate calculus notation, allows for a consistent, coherent, and [...]
TABLE OF CONTENTS
1. Introduction
2. Design Decisions
    a. The Dictionary and the Encyclopedia - One Data Base or Two?
    b. Lexical Models - Componential Feature Analysis vs. Relational Networks
    c. Selection Preference
    d. The Homonymy-Polysemy Problem - Criteria for Separate Entries
    e. Idioms
    f. Preliminary Design Decisions for the Lexicon
3. Some Theories of Lexical Relations
4. The Set of Lexical Relations
    a. The Classical Relations: Taxonomy and Synonymy
    b. Antonymy
    c. Grading
    e. Parts and Wholes
    f. Typical Case Relations
    g. Other Collocation Relations
    h. Paradigmatic Relations
    j. Inflectional Relations
5. The Organization of the Lexicon and the Semantic Representations
6. The Form of the Lexical Entry
7. Summary
Appendix I. The Semantic Representations
1. INTRODUCTION
The lexicon presented here is being developed as an integral part of a computer question-answering system which answers multiple-choice questions about simple children's stories. It thus must make information readily available for the parsing process, for building an internal model of the story being read, and for making inferences. Knowledge about words and knowledge about the world must both be stored in a compact but immediately accessible form.
Many decisions must be made, therefore, about the design of the lexicon. The first problem is to decide on an organizing structure. Should lexical and "encyclopedic" information be stored separately or together? Which items will have separate lexical entries? Which will be included in other entries? What about homonymy and polysemy? What connecting links between words and word senses will be recorded and how?
The next problem is to determine a characterization of word meanings. This leads to some deep theoretical questions. What kind of lexical semantic representations are appropriate? What is the structure of these representations? What are the semantic primes, the elements of that structure? The design of the lexical entry is thus subject to theoretical biases as well as the practical constraints of space, retrieval efficiency, and effective support of inference-making.
The decision to use lexical relations as fundamental elements of [...] information makes up a significant part of the lexical entry.
Lexical relations offer significant advantages. They allow us to generalize familiar inference patterns into axiom schemes. They can encapsulate the defining formulae of the commercial dictionary. They have an intuitive appeal which we believe reflects a certain measure of psychological reality. On a practical level they allow us to express both syntactic and semantic information in a form that is compact and easy to retrieve. They can be used in many ways. For example, the following paragraph from a test administered to first and second graders by a local school system says:
    (P1) Ted has a puppy. His name is Happy. Ted and Happy like to play.
    (Q1) The pet is a: dog boy toy
In order to answer this question we need to know what pet means. In our lexicon the lexical entry for pet contains a simple definition: a pet is an animal that is owned by a human. In order to answer this question we also need to know that a puppy is a young dog. This information in predicate calculus form would be part of the lexical entry for puppy. We would, of course, need axioms of the same form as well for the entries for kitten, lamb, etc. Instead of such a representation we express this information by using a lexical relation, CHILD. The lexical entry for puppy contains CHILD dog. Similarly, the lexical entry for kitten contains CHILD cat, while the lexical entry for CHILD contains the axiom scheme from which the relevant axioms are formed when needed.
We treat verbs in a similar way. Corresponding to each case relation there is a lexical relation which points to typical fillers of [...] also includes T make, where T is the well-known taxonomy relation, so that if the story says that "Mother baked a cake" we can infer that she made one, and CAUSE bake1, so that we can deduce that the cake has baked. The selection restrictions that help us tell instances of bake1 and bake2 apart can also be expressed compactly using the T relation. We also need to make deductions from main verbs in predicate complement constructions, deductions such as the speaker's view of the truth of the proposition stated in the complement as derived from the factivity of the verb. In order to answer several questions from the test cited above the reader must infer that everything that Mother says is true. Lexical entries for main verbs that take predicate complements contain pointers to the implication class. These relations can then be expanded to give the proper axioms.
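The expansion of relations into axioms described above can be pictured with a minimal Python sketch. It is an illustration only, not the system's actual code: the names LEXICON, AXIOM_SCHEMES, expand_axioms and kinds_of are hypothetical, the T scheme follows the Holds(Ncom(...)) form given in the summary, and the Young predicate in the CHILD scheme is an assumption made for the example.

```python
# Hypothetical miniature lexicon: each entry lists relations and their targets.
LEXICON = {
    "puppy": {"CHILD": ["dog"]},
    "kitten": {"CHILD": ["cat"]},
    "dog": {"T": ["animal"]},
    "cat": {"T": ["animal"]},
}

# Assumed axiom schemes, stored once with the relation rather than with each word.
# The T scheme follows the summary; the CHILD scheme (Young) is illustrative.
AXIOM_SCHEMES = {
    "T": "Holds(Ncom({word},X)) -> Holds(Ncom({target},X))",
    "CHILD": "Holds(Ncom({word},X)) -> Holds(Ncom({target},X)) & Holds(Young(X))",
}


def expand_axioms(word):
    """Instantiate the scheme of every relation found in the word's entry."""
    axioms = []
    for relation, targets in LEXICON.get(word, {}).items():
        scheme = AXIOM_SCHEMES[relation]
        for target in targets:
            axioms.append(scheme.format(word=word, target=target))
    return axioms


def kinds_of(word):
    """Follow CHILD and T links upward, collecting what the word is a kind of."""
    found, frontier = [], [word]
    while frontier:
        current = frontier.pop(0)
        for relation in ("CHILD", "T"):
            for target in LEXICON.get(current, {}).get(relation, []):
                if target not in found:
                    found.append(target)
                    frontier.append(target)
    return found
```

Under these assumptions, question (Q1) reduces to checking which of the choices dog, boy, toy appears in kinds_of("puppy"). The axiom for puppy is built only when it is requested, so the entry itself stays as small as CHILD dog.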
The lexicon includes separate entries for each derived form unless the root can be identified by a simple suffix-chopping routine. Lexical relations are useful here too in saving space. The lexical entry for man contains PLURAL men. The lexical entry for went consists of PAST go. The lexical entry for death consists of NOMV die. There are, as well, lexical entries for some multiple word expressions such as birthday party, ball game, piggy bank, and thank you.
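As a rough sketch of how such a lookup might proceed (an assumption about the control flow, not a description of the actual routine), the Python fragment below first tries the word or phrase as its own entry and only then falls back to suffix chopping. ENTRIES, SUFFIXES, chop_suffix and look_up are hypothetical names, and the suffix list is deliberately tiny.

```python
# Hypothetical entries: irregular forms carry a relation to their root,
# and fixed phrases are entries in their own right.
ENTRIES = {
    "go": {}, "die": {}, "play": {}, "puppy": {},
    "man": {"PLURAL": "men"},
    "went": {"PAST": "go"},
    "death": {"NOMV": "die"},
    "birthday party": {}, "piggy bank": {}, "thank you": {},
}

SUFFIXES = ("ing", "ed", "es", "s")  # assumed list, far from complete


def chop_suffix(word):
    """Return (root, suffix) if removing a known suffix leaves a known entry."""
    for suffix in SUFFIXES:
        root = word[: -len(suffix)]
        if word.endswith(suffix) and root in ENTRIES:
            return root, suffix
    return None


def look_up(word):
    """Prefer a stored entry; otherwise try to recover the root by chopping."""
    if word in ENTRIES:
        return word, ENTRIES[word]
    chopped = chop_suffix(word)
    if chopped:
        root, suffix = chopped
        return root, {"SUFFIX": suffix}
    return None
```

In this sketch went needs its own entry because no suffix rule recovers go, while played needs none because chopping -ed yields the stored root play.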
As to the form of presentation here, the next section presents some of the practical problems and theoretical convictions which determined our most critical design decisions. Then, after a brief description of some earlier developments in the theory of lexical relations, we explain the system of relations which structure our lexicon, discussing each group of [...]
2. DESIGN DECISIONS
The lexicon in this system must make information readily available for parsing, for building the story model, and for making inferences during question answering. So both knowledge about words and knowledge about the world must be stored in a compact but immediately accessible form. Therefore, many decisions must be made concerning the design of the lexicon. The problems involved include the organization of lexical and encyclopedic information, the choice of a lexical model and the determination of appropriate semantic primes, the representation of selection preferences, the recognition and storage of homographs, and the criteria for establishing separate entries for idioms and other fixed phrases. This paper attempts to develop some consistent solutions to these problems, solutions which determine the design decisions for the lexicon in this question-answering system.
a. The dictionary and the encyclopedia - one data-base or two?
Any question-answering system must use lexical information in at least two ways, in parsing and in making inferences. The first critical decision that must be made is whether two separate data-bases are needed to support these separate functions or whether a single unified global data-base is better. Traditionally human beings have used two separate stores of information, the dictionary and the encyclopedia. Some linguistic and computational models of language have also been based on the assumption that information about words should be stored in two separate collections.
In Chomsky's Aspects model (1965) there are two separate storage places for lexical information, one in the base component and another in [...] "dictionary" and semantic information in an "encyclopedia". Winograd (1971) has two separate word lists, one used by the parser and one by the semantic routines, even though the parsing and the semantic routines are very closely interwoven in his BLOCKS system.
Before deciding on whether to carry on this tradition one must ask whether there is really a clearcut distinction between these two kinds of lexical information. Is there a simple algorithm for deciding which data should go where?
Bierman, Bierwisch, Kiefer, and the Semantic Function of the Lexicon
Both the dictionary and the encyclopedia are ways of recording information stored in human memory. But human memory is probably not organized in the usual graphic form of an alphabetic word list; therefore alternative memory structures should be examined. One such alternative has been presented by Bierman (1964). In his system lexical-semantic fields are primary; they define the basic organization of semantic information. The function of the lexicon, if it has one in the semantic domain, is to index these fields, to store pointers to the location of a word in the various fields containing it. An appropriate image for such a system is a very large single page dictionary with language specific nodes connected by semantic relations. (See also Werner 1969.)
Can the dictionary and the encyclopedia be distinguished in this context? Bierwisch and Kiefer (1970) assume that both kinds of information are contained in the same lexical entry. The distinction between [...]
    The core of a lexical reading comprises all and only those semantic specifications that determine, roughly speaking, its place within the system of dictionary entries, i.e. delimit it from other (non-synonymous) entries. The periphery consists of those semantic specifications which could be removed from its reading without changing its relation to other lexical readings within the same grammar. (ibid: 69-70)
Unfortunately they do not specify whether the lexical-semantic relations which form the structure of the fields are part of the core or the periphery.
The major difficulty with this criterion is its instability. As new entries are added to the system, information sufficient to distinguish one entry from another may have to be shifted from the periphery to the core--and thus from the encyclopedia to the lexicon. For instance, suppose a new entry, "leopard--a large, wild cat" is to be added. The entire lexicon must be searched for entries which mention large wild cats. If one is found, say "lion--a large wild cat", then enough information must be added to both definitions to differentiate leopard and lion from each other.
Soviet Lexicography and the Lexical Universe.
Apresyan, Žolkovsky, and Mel'čuk run into the same difficulty of distinguishing dictionary and encyclopedic information in attempting to define the lexical universe of a word C0.
    The main themes dealt with under the heading 'lexical universe' are: 1) the types of C0; 2) the main parts or phases of C0; 3) typical situations occurring before or after C0 etc. Thus, the section lexical universe for the word skis consists of a list of the types of skis (racing, mountain, jumping, hunting), their main parts (skis proper and bindings), the main objects and actions necessary for the correct use (exploitation) of skis (sticks, grease, to wax), the main types of activities connected with skis (a [...] information about skis (their history, the way they are manufactured, etc.) is not supplied here: the sections contain only such words and phrases as are necessary for talking on the topic, and nothing else. (1970:19)
The problem here is that "what is needed for talking about the topic" depends very much on who is going to do the talking. The definition of ski in Webster's New International (2nd Edition) begins:
    One of a pair of narrow strips of wood, metal, or plastic, usually in combination, bound one on each foot and used for gliding over a snow-covered surface.
Apresyan, Žolkovsky, and Mel'čuk do not provide for three of the items mentioned here: what skis are made of (wood, plastic, or metal), what shape they come in (long and narrow) and where they belong spatially (on the feet). Yet these items could be essential in understanding inferences in a story.
    It was snowing. Jim took out his skis. He waxed the wooden strips....
You could need this information in answering questions.
    Jim skied down the mountain....
    What was he wearing on his feet: slippers skis skates?
Although in English or Russian it is possible to refer to skis without knowing that they are long and narrow, it is not possible in Navajo where physical shapes determine verb forms. While the entry in Webster's goes on at length beyond the sentence given above, it does not include all the items which Apresyan, Žolkovsky, and Mel'čuk mention. This, however, is not surprising; the boundaries of the lexical universe are not well defined.
Difficulties in updating a system with separate dictionary and encyclopedia.
This lack of definition causes tremendous problems in a dynamic system. A "real" dictionary or encyclopedia, the one in a person's brain, is constantly changing. Information is added, corrected, and perhaps lost. A truly interesting memory model must be dynamic. The problems of updating this information are not easy to solve; the problem of distinguishing between dictionary and encyclopedic information in the updating process seems insuperable.
Recognizing definitions phrased in ordinary English is already difficult (Bierwisch and Kiefer 1970, Lawler 1972). Determining the reliability of such information is also a problem, and the dichotomy of dictionary and encyclopedia increases this difficulty. Unfortunately information does not come neatly packaged and marked "for the dictionary" or "for the encyclopedia". And addition of information to one part of the entry may necessitate updating other parts of the entry. For example, if we learn that record is a verb as well as a noun we need to add morphological information, describe the relations between record and write, and we should probably describe recording materials. Mention must be made that record is a factive, i.e., if someone records that something happened, one can assume that from the standpoint of the speaker the something really did happen. Which of this information is dictionary information and which is encyclopedic? And once this decision is made, information added to that entry may require additions to other entries. In the record example, the entries for erase and write would have to be updated. Also, a decision [...]
The work of Kiparsky and Kiparsky (1970), Lakoff (1971) and McCawley (1968) has shown that syntax and semantics cannot be separated into such neat compartments. But if syntax and semantics are interwoven then does it make sense to put syntactic information in one box and semantic information in another? The answer to this question given at least by generative semantics calls into question the traditional distinction between the dictionary and the encyclopedia.
We accept the generative semanticist arguments that syntax and semantics cannot be separated and thus do not separate syntactic and semantic information. Furthermore, as shown above there seem to be no practical criteria for distinguishing dictionary information from encyclopedic information. Thus our system has one single global data base. For brevity, and since it is a kind of collection of words, it will be called "the lexicon".
b. Lexical Models - Componential Feature Analysis vs. Relational Networks.
A second critically important decision involves the choice of an appropriate lexical model, the determination of what semantic primes to use and how they should be combined in lexical semantic structures. Two important competing models are provided by componential feature analysis and by relational networks. In a componential analysis model the primes are semantic features and words are defined by bundles of features. This is a natural extension of the distinctive feature approach to phoneme description which has been used to explain many phonological phenomena. Certain practical problems arise. The number of words in any language is far larger than the number of phonemes. The number of distinctive features [...] the phoneme-phonetic feature matrix. In addition, this matrix would be extremely sparse. Also, it is not clear whether all the entries in this matrix could be +/- as in a phoneme matrix. Are semantic features either definitely absent or definitely present or are some features present by degrees? The size of the componential analysis matrix would immediately introduce difficulties in a computerized model. Fortunately, both numerical analysis and document retrieval offer experience in handling immense matrices by machine. When a matrix is extremely sparse it turns out to be sensible to store a list of entries with row and column numbers. Here it would mean storing a list of features for each word. This, in fact, is close to Katz's proposal (1966).
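The storage argument can be made concrete with a small Python sketch. The feature names, the three words, and the helper names to_sparse and stored_cells are all invented for illustration; the point is only that a sparse word-by-feature matrix collapses into a short feature list per word.

```python
# Hypothetical feature inventory and a tiny dense word-by-feature matrix.
FEATURES = ["animate", "human", "young", "male", "liquid"]
DENSE = {
    "boy":   [1, 1, 1, 1, 0],
    "puppy": [1, 0, 1, 0, 0],
    "milk":  [0, 0, 0, 0, 1],
}


def to_sparse(dense):
    """Keep only the features that are present, as a list for each word."""
    return {
        word: [FEATURES[i] for i, value in enumerate(row) if value]
        for word, row in dense.items()
    }


def stored_cells(sparse):
    """Count how many feature mentions the sparse form has to store."""
    return sum(len(features) for features in sparse.values())


SPARSE = to_sparse(DENSE)
```

Even in this toy case the list form stores 7 feature mentions where the full matrix has 15 cells; with thousands of features and most of them absent from any given word, the gap would presumably be far larger. The sketch assumes strictly binary features, which is exactly the point the text leaves open.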
In a relational network model, however, the primes are relations and words or word senses. Relations connect words together in a network in which the words are nodes and the relations are edges. In fact, words are defined in terms of their relationships to other words.
These models differ radically in their approach to the critical lexical task of finding related words. In the componential analysis model related words share related features. Presumably, the more features two words share the more closely related they are. Thus, some kind of cluster analysis must be used to identify related words. In the relational network model the lexicon is formed from relationships between words. Thus related words are immediately available.
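The contrast between the two models can be sketched in Python as follows. Both data sets are invented, and the shared-feature threshold is only a crude stand-in for the cluster analysis mentioned above, not an implementation of it; related_by_features and related_by_relations are hypothetical names.

```python
# Componential view: hypothetical feature bundles.
FEATURE_BUNDLES = {
    "dog": {"animate", "canine"},
    "puppy": {"animate", "canine", "young"},
    "cat": {"animate", "feline"},
    "toy": {"artifact"},
}


def related_by_features(word, minimum_shared=2):
    """Compare the word's bundle against every other bundle in the lexicon."""
    own = FEATURE_BUNDLES[word]
    return sorted(
        other
        for other, bundle in FEATURE_BUNDLES.items()
        if other != word and len(own & bundle) >= minimum_shared
    )


# Relational view: hypothetical entries listing (relation, target) pairs.
NETWORK = {
    "puppy": [("CHILD", "dog")],
    "dog": [("T", "animal")],
    "cat": [("T", "animal")],
}


def related_by_relations(word):
    """Read the related words directly off the word's own entry."""
    return [target for _relation, target in NETWORK.get(word, [])]
```

The first function has to scan the whole vocabulary for every query, while the second touches only one entry, which is the sense in which related words are "immediately available" in a network.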
Both models, componential and relational, require a search for semantic primes. The componential analysis model requires the discovery of possibly thousands of semantic features. For a relational network model an inventory of lexical relations and their properties must be developed. This is apparently [...] number of relevant relations is probably quite small. Žolkovsky and Mel'čuk (1970) list about fifty in their paper.
Related to both of these models is the notion of semantic fields. Intuitively, semantic fields are collections of related words used to talk about a particular subject. Semantic fields seem to offer some help in coping with the problems of ambiguity and context. Many utterances, taken out of context, are ambiguous. But remarkably, people almost never perceive this ambiguity. They immediately choose the correct word sense and ignore the others. Apparently the topic of conversation determines a semantic field and the word sense chosen is the one which lies in this field. The semantic field somehow defines the verbal context. (Or as Fillmore 1977:59 phrases it, "meanings are relativized to scenes".)
The componential analysis model makes it possible to define distinct semantic fields, but getting from one word in the field to the others may take a significant amount of processing time. Every set of semantic markers can be used to define a semantic field; the field consists of all the words with definitions containing the markers. The smaller the number of markers the larger the field obtained. It is possible to decide immediately whether a given word is in the field or not, just by checking its list of markers.
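A marker-defined field of this kind can be sketched in a few lines of Python. The marker sets and the names MARKERS, in_field and field are hypothetical; the sketch only illustrates the two properties just stated, namely that membership is a local check while enumerating the field requires a pass over all words.

```python
# Hypothetical marker lists for a handful of words.
MARKERS = {
    "dog": {"animate", "canine"},
    "puppy": {"animate", "canine", "young"},
    "kitten": {"animate", "feline", "young"},
    "toy": {"artifact"},
}


def in_field(word, field_markers):
    """Membership test: the word's markers must include all field markers."""
    return field_markers <= MARKERS[word]


def field(field_markers):
    """Enumerating the field means checking every word in the lexicon."""
    return sorted(word for word in MARKERS if in_field(word, field_markers))
```

With these data the single marker animate picks out dog, kitten and puppy, while adding young narrows the field to kitten and puppy, matching the observation that fewer markers give a larger field.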
In the relational network model related words are easy to find, but the boundaries of semantic fields are extremely fuzzy and indistinct. A semantic field can be defined by starting at a particular node and going a given number of steps in any direction. The semantic fields obtained [...]
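One way to read "a given number of steps in any direction" is as a bounded breadth-first search over the network with edge direction ignored. The Python sketch below takes that reading as an assumption; the edge list and the names EDGES, neighbours and semantic_field are invented for illustration.

```python
from collections import deque

# Hypothetical network edges: (source word, relation, target word).
EDGES = [
    ("puppy", "CHILD", "dog"),
    ("kitten", "CHILD", "cat"),
    ("dog", "T", "animal"),
    ("cat", "T", "animal"),
    ("pet", "T", "animal"),
]


def neighbours(word):
    """Yield words one step away, following edges in either direction."""
    for source, _relation, target in EDGES:
        if source == word:
            yield target
        elif target == word:
            yield source


def semantic_field(start, steps):
    """Collect every word reachable from start in at most `steps` steps."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == steps:
            continue  # do not expand beyond the chosen radius
        for other in neighbours(current):
            if other not in distance:
                distance[other] = distance[current] + 1
                queue.append(other)
    return sorted(distance)
```

The field around puppy grows from two words at one step to five at three steps, and nothing in the network itself says which radius is the right one, which is one way of seeing why the boundaries are fuzzy.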
Certain basic philosophical-psychological assumptions may create a strong bias for one of these models over the other. Someone who believes that semantic features exist as Platonic ideals or who accepts them as psychological realities may easily find componential analysis a most natural kind of description and regard the necessary search for features or sememes as highly relevant. Someone who feels that "There is no thought without words" would be much more likely to prefer a relational network description. A lexicon is, in an important sense, a memory model. Intuition about our own internal memory models must have a strong influence on the lexicon we shape.
We have chosen a relational network model for both intuitive and practical reasons. We find lexical-semantic relations theoretically interesting (see Evens et al. ms). Useful inventories of these relations are available; in a later section we describe some of these sources. As will be shown, they provide a convenient way of storing axiom schemes for deductive inference.
As lexical semantic structures we use the same first-order predicate calculus notation in which semantic representations are written in the question-answering system: meanings of words and meanings of sentences must have the same underlying form. As McCawley (1970) has argued, "dentist" and "doctor who treats teeth" must contain the same units of meaning tied together in the same way.
c. Selection Preferences
A third important problem to be faced in constructing a lexicon [...]
Chomsky (1965) developed the theory of selection restrictions in order to block the generation of nonsense sentences in the syntactic component of his model. The lexical entry for frighten, for example, contains the information that it requires as object a noun with the feature [+animate], while drink requires an animate subject. If these conditions are not met, generation is blocked. Selectional restrictions seem much too restrictive. Trailer trucks drink diesel fuel and the earth drinks in the rain. In describing dreams we can invent perfectly appropriate sentences in which inanimate objects by the dozens get up, run around, and drink until frightened back to place. Still it is true that sentences like these are somehow more surprising than sentences in which cows drink from a brook and are frightened by lightning. We need some method of recording the ordinary, everyday ways in which words combine without excluding the unusual, the poetic, the metaphoric uses. We will call them selection preferences instead of selection restrictions.
Some truly semantic means of identifying semantic anomalies are needed. Raphael mentions this question rather casually, almost as an aside in the SIR paper. He draws taxonomic trees, one for the nouns and one for the verbs, from the vocabulary of a first grade reader.
Then he makes statements like this:
    1. Any noun below node 1 is a suitable subject for any verb below node 1'.
    2. Only nouns below nodes 3 or 4 may be subjects for verbs below node 3'. (1968, p. 51)
    The complete model, composed of tree structures and statements about their possible connections, is a representation for the class of all possible events. In other words, it represents the computer's knowledge of the world. We now have a mechanism for testing the 'coherence' or 'meaningfulness' of new samples of text. (1968, p. 51)
Werner (1972) has suggested a method for handling the selectional problem which uses noun taxonomies in very much the same way that Raphael does. His proposal includes an elegant way of storing selectional information within his memory model. In his network model, noun phrase arguments are connected to the verb by prepositions. The node representing the lexical entry for the verb has arcs connecting it to compound nodes, one for each preposition which can be used with the verb. The object of each preposition is a node in the noun taxonomy. This noun or any noun below it in the taxonomy may serve as an argument for the verb. Here is an oversimplified example of a network for sell.

Figure 1: Werner's Answer to the Selection Problem

This network says that sell takes a human subject, a thing as object, the preposition to followed by a human, the preposition for followed by money. The square brackets around [human] indicate that this is just a pointer to the top noun in the taxonomy for human beings. Any node in this taxonomy below the node marked human, whether it is [...] not use the verb taxonomy as Raphael does. Each verb has its own set of selection indicators.

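The network reading given above for sell can be sketched in a few lines. This is only an illustration under stated assumptions: the toy taxonomy, the names `PARENT`, `SELL`, `is_under`, and `fits`, and the use of "subject" and "object" as arc labels are all hypothetical and are not taken from Werner's model.

```python
# Hypothetical toy noun taxonomy: child -> parent (names are illustrative).
PARENT = {
    "human": "animate",
    "animate": "thing",
    "boy": "human",
    "farmer": "human",
    "money": "thing",
    "dollar": "money",
    "cow": "animate",
}

# Selection indicators for "sell", keyed by the arc that links each
# argument to the verb. "subject" and "object" stand in for unmarked arcs.
SELL = {"subject": "human", "object": "thing", "to": "human", "for": "money"}


def is_under(noun, top):
    """True if `noun` is `top` itself or lies below it in the taxonomy."""
    node = noun
    while node is not None:
        if node == top:
            return True
        node = PARENT.get(node)
    return False


def fits(verb_frame, arguments):
    """Check every supplied argument against the verb's selection indicators."""
    return all(
        slot in verb_frame and is_under(noun, verb_frame[slot])
        for slot, noun in arguments.items()
    )
```

With these assumed entries, `fits(SELL, {"subject": "farmer", "object": "cow", "to": "boy", "for": "dollar"})` holds, while a cow as subject does not fit, since cow is not below [human].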
In his discussion of the goals of a semantic theory Winograd describes semantic markers and selection restrictions, quotes Katz and Fodor (1964), and indicates that he intends to embody this theory in his system. But in fact semantic markers in the BLOCKS program are derived from a marker tree (Winograd, 1971, Figure 59) which is organized taxonomically. In the implementation process Winograd seems to have moved from a strict Katz-Chomsky position to a position somewhat closer to Raphael and Werner.

The Raphael and Werner proposals are the guiding principles here, adapted to accommodate case-defined arguments. The lexical entry for move1, the intransitive move, must tell us about selection as well as how to relate subject, object, and prepositional phrases to cases. The information is organized this way:

    move1 #
         grammatical function    case frame     selection information
      1. subject                 experiencer    thing
      2. from                    source         thing, place
      3. to, into, onto          goal           thing, place

The numbers 1, 2, 3 indicate argument positions for the predicate calculus representation. The next column lists the grammatical function. Next come case indications. Last comes the selection information, the top node in the relevant part of the taxonomy. For move1 the subject is an experiencer. The source is usually marked by the preposition from. The [...] can be a physical argument or thing; the source and goal can both be places. There is a rule that any physical goal can be replaced by a class of adverbs containing back and there, so these alternatives do not have to be listed.

An attempt is being made to use the verb taxonomy as Raphael suggested. In this lexicon go is marked as taxonomically related to move. The entry for go does not contain the information labelled # above. Instead, when this information is needed, the look-up routine climbs the taxonomic tree in the lexicon until it finds a verb which has this information and copies it from that entry. Thus it gets case-argument and selectional information for go from the entry for move. It is not clear yet whether this will really work with a sizable vocabulary.

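The look-up routine described above might be sketched as follows. This is a minimal sketch, not the system's actual code: the dictionary layout, the field names `taxonomy_parent` and `case_frame`, and the extra verb `walk` are assumptions introduced for illustration.

```python
# Hypothetical lexicon fragment. Only move1 stores the information
# labelled # in the text; go (and the invented walk) inherit it.
LEXICON = {
    "move1": {
        "taxonomy_parent": None,
        "case_frame": [
            # (argument position, grammatical function, case, selection)
            (1, ("subject",), "experiencer", ("thing",)),
            (2, ("from",), "source", ("thing", "place")),
            (3, ("to", "into", "onto"), "goal", ("thing", "place")),
        ],
    },
    "go": {"taxonomy_parent": "move1"},
    "walk": {"taxonomy_parent": "go"},  # illustrative only
}


def case_frame(verb):
    """Climb the verb taxonomy until an entry with a case frame is found.

    Returns (name of the entry that supplied the frame, the frame),
    or (None, None) if no ancestor stores one.
    """
    name = verb
    while name is not None:
        entry = LEXICON[name]
        if "case_frame" in entry:
            return name, entry["case_frame"]
        name = entry.get("taxonomy_parent")
    return None, None
```

Here `case_frame("go")` returns the frame stored under move1. The cost of each look-up grows with the depth of the verb taxonomy, which is one reason the open question about a sizable vocabulary matters.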
This selectional information is treated as selection preference and not selection restriction. Each candidate word sense for a verb is checked for selectional preference. If no arrangement of the available noun phrase arguments is consistent with these preferences another word sense is examined. But if all word senses have been rejected on the basis of selectional information, the sentence is not rejected. Instead we look again at the candidate word senses and count for each one the number of steps up the taxonomic tree we have to make to resolve the conflict. The word sense which requires the fewest steps is chosen. The hope is that the system will be able to "understand" simple metaphors this way. It would be interesting to try to create metaphors by picking noun phrase arguments close to but not under the nodes indicated by the [...]

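One way to read the step-counting procedure is sketched below. It is an interpretation under stated assumptions rather than the system's implementation: the taxonomy, the two senses of drink, and the function names are hypothetical, and a "step" is taken to mean one move upward from the preferred node until it dominates the actual argument.

```python
# Hypothetical single-rooted noun taxonomy: child -> parent.
PARENT = {
    "thing": None,
    "physical_object": "thing",
    "animate": "physical_object",
    "human": "animate",
    "animal": "animate",
    "cow": "animal",
    "liquid": "physical_object",
    "water": "liquid",
    "machine": "physical_object",
    "car": "machine",
}

# Hypothetical word senses with their selection preferences.
SENSES = {
    "drink1": {"subject": "animate", "object": "liquid"},
    "drink2": {"subject": "human", "object": "liquid"},
}


def ancestors(node):
    """The node itself followed by everything above it, up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def steps_to_cover(preferred, noun):
    """Steps up from `preferred` before it dominates `noun` (0 = fits)."""
    above_noun = set(ancestors(noun))
    for steps, node in enumerate(ancestors(preferred)):
        if node in above_noun:
            return steps
    return None  # not reached when the taxonomy has a single root


def choose_sense(senses, arguments):
    """Pick the sense whose preferences need the fewest steps in total."""
    best = None
    for name, prefs in senses.items():
        cost = sum(steps_to_cover(prefs[slot], noun)
                   for slot, noun in arguments.items())
        if best is None or cost < best[1]:
            best = (name, cost)
    return best
```

A sense with cost 0 satisfies its preferences outright, so the ordinary check and the fallback collapse into one ranking in this sketch. For "the car drinks water" no sense fits, and drink1 wins because its subject preference needs only one step up (from animate to physical_object).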
d. The Homonymy-Polysemy Problem - Criteria for Separate Entries.

Words with the same physical shape but different meanings constantly cause trouble in natural language processing. In designing a lexicon we must decide whether or not to create a separate entry for each variation in meaning and type of use. Quillian is particularly interested in words with multiple meanings and he experimented with several in his memory model. In Quillian (1968) the word plant is treated as a three-way homonym with three separate type nodes, each with a separate definition-plane:

    PLANT   Living structure which is not an animal, frequently with leaves, getting its food from air, water, earth.
    PLANT2  Apparatus used for any process in industry.
    PLANT3  Put (seed, plant, etc.) in earth for growth.

The type node for the first forms a disjunctive set with token nodes pointing to the other two.

The word food has a single definition with alternative formulations:

    That which living being has to take in to keep it living and for growth. Things forming meals, especially other than drink.

A polysemous word like this has a single type node and a single definition-plane, but the two alternative definitions are combined with an OR link.

Apresyan, Mel'čuk and Žolkovsky attack the homonymy-polysemy problem with vigor. Graphically coincident words are considered homonyms, given distinctive superscripts and listed as separate entries, if their definitions have no common part. They do not define "a common part," but they do give an example: KOCA¹ (scythe), KOCA² (braid of hair), KOCA³ (spit). If two definitions have a single common part, the word is classified as polysemantic with a single entry divided into separate parts. They distinguish two types of polysemy. In one case the difference between two words is regular. The relation of a verb to its typical object is such a regular meaning change, e.g. record (v) - record (n), fish (v) - fish (n), and aid (v) - aid (n). These regular variations in meaning are numbered with Arabic numerals, while irregular variations are numbered with Roman numerals. Thus part 3 of the lexical entry for bow, the definition, might have the form:

    bow
    I.  1. To bend the head in assent or reverence. (vt)
        2. To submit or yield. (vi)
        3. To cause to bend. (vt)
        4. An inclination of the head. (n)
        5. A bent implement used to propel an arrow or play a stringed instrument. (n)
    II. 1. The forward part of a boat. (n)
        2. One who rows in the bow of a boat. (n)

There seems to be some redundancy between definition-elements and the lexical functions. Shouldn't regular variations in meaning be captured by regular lexical functions? If so, then the distinction Apresyan, Žolkovsky and Mel'čuk make between regular and irregular meaning variations will be apparent from the form and need not be indicated by different notation, such as Arabic and Roman numerals.

For convenience in lexical lookup we have a single physical entry [...] is numbered separately with Arabic numerals. Thus the adjective is cool1, cool2 is the verb to become cool1, and cool3 is the verb meaning to cause to become cool1. Separate information about lexical relations, etc. is stored for each subentry.

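A possible storage layout for such numbered subentries is sketched here. It assumes that one physical entry per graphic form holds all the numbered senses; the field names and the relation labels `BECOME` and `CAUSE` are hypothetical placeholders, not the relation names used in this lexicon.

```python
# Hypothetical layout: one physical entry per graphic form, holding
# numbered subentries, each with its own relation information.
LEXICON = {
    "cool": {
        1: {"category": "adjective", "relations": {}},
        # cool2: the verb "to become cool1"
        2: {"category": "verb", "relations": {"BECOME": ("cool", 1)}},
        # cool3: the verb "to cause to become cool1", i.e. to cause cool2
        3: {"category": "verb", "relations": {"CAUSE": ("cool", 2)}},
    }
}


def subentries(form):
    """Return the sense numbers stored under one graphic form."""
    return sorted(LEXICON.get(form, {}))


def subentry(form, number):
    """Fetch one numbered word sense, e.g. cool3."""
    return LEXICON[form][number]
```

A single dictionary access on the graphic form then retrieves every candidate sense at once, which is the convenience in lookup that the text mentions.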
e. Idioms

Idioms present a serious problem to the designer of an English lexicon. Some criteria must be established for deciding which idioms deserve separate lexical entries and how multi-word phrases should be stored.

When does an idiom deserve to be treated as a separate lexical unit? Apresyan, Žolkovsky, and Mel'čuk (1970) and Kiparsky (1975) represent opposite poles of opinion here. In the explanatory-combinatory dictionary (ECD) of the Soviets, word combinations which have a definition of their own or "a peculiar combinability pattern" have separate entries. Kiparsky (1975) considers an idiom as a separate lexical unit only if it involves syntactic patterns which are no longer productive. Thus "house beautiful" and "come hell or high water" are treated as units, but "make headway" is not. Instead headway is defined as "progress" and marked as appearing after make and lose. Kiparsky's proposal places a greater burden on the recognition program, which would have to be able to retrieve and put together the pieces of the idiom using his lexicon. The system described here follows Apresyan, Žolkovsky and Mel'čuk, and treats fixed phrases as units. In particular, all noun-noun combinations like piggy bank and birthday cake are separately defined, although this is certainly a productive part of English.

Levi [...] the underlying structure for "birthday boy" is "boy-have-birthday" and the underlying structure for "birthday cake" is "cake-for-birthday." Then under certain conditions have, for, etc. can be deleted to give us the noun adjunct expressions. Given these rules, she argues, it is not necessary to treat these expressions as separate lexical items.

While her rules seem sufficient to allow us to synthesize these compounds correctly, difficulties arise when we try to use them for analysis. The question-answering system needs to be able to infer from "birthday boy" that the boy in question is having a birthday, but to avoid inferring from "birthday cake" that the cake is having a birthday. For correct recognition we need to be able to recover the unique underlying structure if one exists. (For a similar criticism see Downing 1977: 814-15.)

Levi's theory accounts for the generation of new noun-noun compounds. However, in order to account for the recognition process we need lexical entries for familiar fixed compounds and her theory to analyze new compounds. We have used Levi's structure as a basis for our representation of compound nouns.

Noun-noun compounds have separate entries. A birthday cake is treated as "a cake for a birthday." A ball game is represented as "a game that has a ball." A piggy bank is defined as "a bank that is a pig."

The system is told that Jim has a piggy bank and asked what the bank looks like. It could be argued that anyone with sufficient cultural knowledge ought to be able to answer this even if all the banks in his past were shaped like bee-hives, but we need a place to write down this cultural encyclopedic knowledge and a lexical entry for piggy bank seems like a good place.

Becker in his work on "The Phrasal Lexicon" (1975) has produced evidence on the Soviet side of this argument. His data suggest that fixed phrases comprise approximately half of our spoken output and have an independent lexical existence. He includes in his lexicon euphemisms ("the oldest profession"), phrasal constraints ("by sheer coincidence"), deictic locutions ("for that matter"), sentence builders ("(person A) gave (person B) a long song and dance about (a topic)"), situational utterances ("How can I ever repay you?"), and verbatim texts (proverbs, song titles, etc.). He claims that

    we speak mostly by stitching together swatches of text that we have heard before; productive processes have the secondary role of adapting the old phrases to the new situation.... most utterances are produced in stereotyped social situations, where the communicative and ritualistic functions of language demand not novelty, but rather an appropriate combination of formulas, cliches, idioms, allusions, slogans... (1975: 60)

He has collected 25,000 phrases for the phrasal lexicon.

Catherine Flournoy (1975) has found several hundred fixed phrases in a computer study of Father Coughlin's speeches. This is not a new idea to students of oral epic poetry. Homer constantly used fixed phrases to fit syntactic and metrical slots. Dawn is always "rosy-fingered"; Hector is constantly "tall Hector of the shining helm."

There is a serious space-time tradeoff here between parsing time and lexical storage space. It is probably true that people possess and constantly use a phrasal lexicon. Whether we should use storage space [...]

Currently we provide separate entries for any phrase that we cannot parse and interpret correctly from the entries for individual words. Brief entries for these phrases seem absolutely necessary for any practical recognition scheme. These entries also seem to be the appropriate place for indexing pointers to the cultural information necessary for making inferences and answering questions about birthday cakes and birthday parties. There are theoretical arguments for such entries as well. We believe, as Becker does, in the phrasal lexicon, although we do not include entries for any phrases that can be parsed and interpreted correctly without a separate entry. Any complete system for language processing must also, of course, contain rules like Levi's to provide an ability to process novel forms.

f. Preliminary Design Decisions for the Lexicon.

The goal of this project is a lexicon sufficient for parsing, forming semantic representations, and making inferences, compact but still allowing rapid lexical lookup. The lexicon is a data-base for the question-answering system, a combination lexicon-encyclopedia. Syntactic and semantic information are combined in the same lexical entries. Lexical semantic representations are written in the same form as the semantic representations for sentences, in a many-sorted first order predicate calculus. Homographs which vary in meaning or use are differentiated by Arabic numeral subscripts. Separate entries are included for phrases with fixed meaning.

The relations are used to express and retrieve many different kinds of information, from past participles to selection preferences to proper habitats for lions. Thus the system of lexical relations is crucial to representation, retrieval, and inference.

3. SOME THEORIES OF LEXICAL RELATIONS

While developing our lexical relations we examined a variety of relational theories in anthropology and linguistics and even collected folk definitions of our own (Evens 1975). We have been particularly influenced by the anthropological fieldwork of Casagrande and Hale (1967), by the memory models of Raphael (1968) and Werner (1974), and most of all by the ECD of Apresyan, Mel'čuk, and Žolkovsky (1970). But we looked at each of these relational theories from the peculiar point of view of computer question-answering and the particular lexical environment of children's stories, adding and discarding relations to fit the problem.

mrd
Bale
-
L&caZ
RejSations
in
FoZk
Definitions.
Casagrande and Hale (1967) collected 800 Papago folk-definitions and sorted them into groups on the basis of semantic and grammatical similarities. They produced the following list of thirteen lexical relations. (Table 1)

Table 1. The Relations of Casagrande and Hale

    Relation          Word             English Gloss of Papago Definition
    1. Attributive    burrowing owl    but they are small; and they act like mice; they live in holes.
    2. Contingency    to get angry     When we do not like something we get angry.