The Design and Application
of a Domain Specific Knowledgebase
in the TACITUS
Text Understanding System
Abstract
TACITUS is a text understanding system being developed at SRI International. One of the main components in the system is a knowledgebase which contains commonsense and domain specific world knowledge encoded as axioms in a first order predicate calculus language. The prime function of the knowledgebase is to provide extra-linguistic facts to be used in resolving a range of ambiguities, such as compound nominal constructions and definite reference, and in drawing conclusions on the basis of the implicatures in the text. The paper discusses the methodology used in building a knowledgebase for analyzing news reports about terrorist attacks, and demonstrates how it is used in an application extracting information to be stored in a simulated database.
1 Preamble
During my term as International Fellow at SRI International, California, this past winter, I had the opportunity to familiarize myself with the TACITUS text understanding system. Under the supervision of Jerry Hobbs, who is head of the TACITUS project, I developed a domain specific knowledgebase for the TACITUS system. The present paper is a brief, fairly high-level and nontechnical overview of the enterprise.
Section 2 of the paper presents the methodology used in the construction of the knowledgebase for news reports about terrorist attacks; a crude outline of the TACITUS system is given in section 3 as necessary background information before we go on to look in detail at an example text in sections 4 and 5. We conclude with some final remarks in section 6.
2 The Methodology behind the Construction of the Knowledgebase
Our goal was to build a fairly large knowledgebase for a specific domain, namely terrorist attacks, to be used as a basis for automated understanding of texts falling within this domain, and subsequent automatic extraction of specific information. We decided to work on the basis of a set of sample texts, and we compiled a corpus consisting of several news reports about terrorist events. This corpus then constituted the backbone of our work.
Rather than adopt what might be termed a strict sublanguage approach to the descriptive task (cf. Hirschman 1986, and Hobbs 1984 for more detailed discussions), we employed a methodology of stepwise refinement (cf. Hobbs 1984). The three steps of our working methodology, which will be elaborated on below, consisted in:
• An (informal) analysis of the corpus texts in order to establish a basic vocabulary and to determine and select relevant facts for the domain.
• Breaking up the domain into self-contained and coherent sub-domains.
• Axiomatizing the facts of the sub-domains.
2.1 The Analysis of Corpus Texts
Firstly, the corpus texts served the purpose of establishing the basic vocabulary in our system. Secondly, they constituted a picture of the world we intended to model in our knowledgebase, i.e. what are the settings, what are the typical actions, who are the agents, what are the roles and relations between the entities in our ‘terrorist’ universe, etc. Thus they indicated what linguistic and extra-linguistic information would be needed in our knowledgebase.
Using a full-sentence concordance of the sample texts, we looked at each single lexical item in context, and noted down, in an informal manner, what facts were linguistically presupposed and what general background knowledge would be needed in order to understand a given occurrence of a lexical item in its context. (We will not discuss the meaning of ‘understand’ here, but we use it in a sense similar to that of Eco’s term ‘actualisation’ (Eco 1979).)
The analysis results in a first breaking down of each item into component parts and explicit statements about the implicatures (Grice 1975) carried by the text.
2.2 Structuring the Domain Information
Sorting facts into sub-domains is generally a straightforward process. The first crude distinction which can be made is that between facts pertaining to commonsense knowledge and facts pertaining to domain specific or specialized knowledge. The former comprises facts about the world in general that are not particularly tied to a specific domain (be it terrorist actions, information technology, or what have you), whereas the latter comprises facts which are quite often found to be restricted and highly specialized.
Facts pertaining to, for example, space, time, and belief are considered commonsense knowledge, whereas various facts about terrorist organizations are clearly domain specific, and essential for the understanding of reports about terrorist events. Geographical facts about the location of cities and countries seem to fall somewhere between the more abstract commonsense notion and the specialized domain knowledge.
On the basis of the results from our fact-finding, i.e. step one above, we defined 30 sub-domains. The overall conceptual structure of the knowledgebase, i.e. the sub-domains and the relations between them, can be schematically rendered by the illustration in figure 1.
Apart from providing conceptual clarity, the advantage of this modular approach is obviously that it permits the sub-domains in the knowledgebase to be enhanced or modified later, independently of each other.
2.3 Axiomatization of the Facts
The final step in the construction of the knowledgebase consisted in creating precise ontologies for the individual sub-domains, i.e. what entities exist and what are the relations between them, and axiomatizing the facts.
The main task here was to decide on which predicates to decompose, i.e. characterize by other or new predicates, and which were to be basic predicates, i.e. ground terms for which no further description is provided.
The idea behind the adopted approach is neither to fully define each lexical item in the sense of providing necessary and sufficient conditions, nor to decompose it into a predefined set of primitives in the Schankian tradition. Rather, the purpose was to characterize the predicates used in the knowledgebase. Consider as an example the following axioms from the ‘organization’ sub-domain.
organization(o) -> ∃s (∀x. x ∈ s -> person(x) & member(x,o)) & ∃p,g plan(p,g,o)
member(x,o) -> ∃e. role(e,x,o)
role(e,x,o) <- agent(x,e) & in_service_of(e,g,p) & plan(p,g,o)
[Figure 1: Conceptual structure of the knowledgebase, showing the sub-domains and the relations between them. Sub-domain labels recoverable from the figure: bomb_attack, kidnap_attack, terrorist_organization, country, ethnicity, organization, police, city, publish, firm, plans&goals, communication, property, daily_life, belief, possession, person_move, injury, process, time, scale, set, numbers, bomb_structure, phys_obj, normal, geography, space, modality, predicates.]
[...] has a role, which is being the agent of some action which is in service of the plan of the organization.
3 The Knowledgebase and the TACITUS System
To test our knowledgebase, we implemented a subset (approx. 100) of the axioms we had defined on the system, and ran different types of sentences. The axioms are stated in the ‘ontologically promiscuous’ notation developed by Hobbs (cf. Hobbs 1985b).
This notation is a first order predicate calculus language with the addition of a nominalization operator, written ‘!’, and an extra argument, informally referred to as the ‘self’ argument.
To be more concrete and to convey the basic intuition of the notation to the reader, let us consider a simple example:
explode(b)          which is to be read as: b explodes
explode!(e1*, b)    which is to be read as: the explosion of b.
Where p(x) says that p is true of x, p!(e,x) says that e is the eventuality or possible situation of p being true of x. Consequently, Hobbs’ notation can be related to standard first order predicate expressions by the following axiom:
(∀x) p(x) <=> (∃e) p!(e,x) & Rexists(e)
where Rexists(e) says that the eventuality ‘e’ does in fact really exist.
In sum, the basic idea of the notation is that of splitting a sentence into its propositional content and an assertional/existential claim. Furthermore, the self argument, i.e. the ‘e’, provides a ‘handle’ for referring to a predication, i.e. a predicate and its argument, in other predicates.
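The relation between p(x) and p!(e,x) can be illustrated with a minimal Python sketch. The names used here (`Eventuality`, `holds`, `rexists`) are illustrative assumptions and not part of TACITUS; the sketch only mirrors the axiom above, in that p is true of x exactly when some eventuality e with p!(e,x) really exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eventuality:
    """A possible situation e such that pred!(e, *args) holds."""
    name: str    # the 'self' argument, e.g. "e1"
    pred: str    # the nominalized predicate, e.g. "explode"
    args: tuple  # the remaining arguments, e.g. ("b",)


def holds(pred, args, eventualities, rexists):
    """p(x) <=> there is an e with p!(e, x) and Rexists(e)."""
    return any(
        e.pred == pred and e.args == tuple(args) and e.name in rexists
        for e in eventualities
    )


# The explosion of b really exists; the explosion of c is only a possible situation.
eventualities = [
    Eventuality("e1", "explode", ("b",)),
    Eventuality("e2", "explode", ("c",)),
]
rexists = {"e1"}

print(holds("explode", ["b"], eventualities, rexists))  # True
print(holds("explode", ["c"], eventualities, rexists))  # False
```

Because each eventuality carries its own name, other predications can take it as an argument, which is the ‘handle’ role of the self argument.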
Before we go on to discussing a sample text, we will give a crude overview of the basic components and their functioning in the TACITUS system. We deliberately ignore some of the more advanced features of TACITUS in order not to get bogged down by too many technical details. Unfortunately, this means that we do not do TACITUS full justice (but for more detailed and comprehensive descriptions of the system, see for example Hobbs 1986c and later).
[...] the pragmatics module is to resolve referential expressions and some syntactic ambiguities, to expand metonymies, and to interpret the implicit relations in compound nominals. The pragmatics module works by constructing a logical expression for the basic semantic analysis result, and calling the KADS theorem prover (Stickel 1982) to prove or derive it using a scheme of abductive inferencing in which it is permitted to assume the existence of ‘new’ facts. The theorem prover draws on the knowledgebase of commonsense and domain knowledge to complete the task.
Abductive inference is, of course, a logically invalid mode of inference, i.e. given p(X) -> q(X) and q(a) we conclude p(a). However, we may argue, as does Hobbs (cf. Hobbs et al. 1988), that it is a reasonable way of looking at text understanding because abduction is inference to the ‘best explanation’ in a given context. q(a) can be thought of as the observable evidence, the implication as the general principle that could explain the occurrence of q(a), and the antecedent of the implication as the underlying cause or explanation of q(a).
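The abductive step itself can be sketched in a few lines of Python. This is only an illustration of the inference pattern, not of the KADS theorem prover; the rule set and the predicate names `bomb_explosion` and `gas_leak_explosion` are hypothetical.

```python
# Rules are (antecedent, consequent) pairs over unary predicates: p(X) -> q(X).
RULES = [
    ("bomb_explosion", "blast"),
    ("gas_leak_explosion", "blast"),
    ("blast", "damage"),
]


def abduce(observation, rules):
    """Return every p(a) that would explain the observed q(a).

    This is the logically invalid step: from p(X) -> q(X) and q(a),
    hypothesize p(a).
    """
    pred, arg = observation
    return [(antecedent, arg)
            for antecedent, consequent in rules
            if consequent == pred]


explanations = abduce(("blast", "b1"), RULES)
print(explanations)  # [('bomb_explosion', 'b1'), ('gas_leak_explosion', 'b1')]
```

The observation generally has several candidate explanations, which is why some way of ranking them is needed before one can speak of the ‘best’ explanation.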
An interesting feature of the pragmatics module is that it uses a scheme for abductive inferencing in which weights and costs are assigned to the axioms (for further details, see e.g. Stickel 1988). Thus if we cannot prove an antecedent, we assume its existence at some cost. Some basic heuristic principles controlling the weights and assumability costs are hardwired into the system (e.g. it is more expensive to assume a fact than to prove it, and it is less expensive to assume an indefinite entity than a definite one), but the axioms in the knowledgebase may be assigned costs manually (cf. 4.2). The interpretation of a text in this abductive and assumption-based framework amounts to producing the minimal explanation of why the text would be true (cf. Hobbs et al. 1988 for a detailed discussion).
The analysis component, i.e. the component for extracting task specific information from an interpreted text, is basically a specialized call to the theorem prover (see further below). The enhanced logical form, i.e. the result output from the pragmatics module, is abductively proved by back-chaining over the axioms in the knowledgebase.
In the next sections, we will have a look at an example text and show how the knowledgebase is used for disambiguation and computation of implicit information.
4 An Example
Let us now consider the following two sentences as an example text to be treated within our framework:
(1) A bomb exploded at a Renault showroom in Bilbao. A person claiming to represent the ETA-M had warned of the blast in a call to the police.
The extra-linguistic knowledge needed in order to achieve some reasonable level of understanding of the text includes, among other things, the following: Renault is a French firm manufacturing products, i.e. cars; a showroom is a building owned by a firm where the products of that firm are on display; Bilbao is a city in the country Spain; ETA-M is a terrorist organization; terrorist organizations have members, certain plans and goals, and violent methods for reaching their goals; and an explosion generally involves a blast.
The basic facts, such as Spain being a country and ETA-M being a terrorist organization, are encoded as existential axioms in the knowledgebase, e.g.:
(1a) (Defaxiom COUNTRY-SPAIN-1 (terror)
       "Spain is a country"
       (SOME ((e1* . ev)) (country! e1* Spain)))
(1b) (Defaxiom TERORG-ETA-M-1 (terror)
       "ETA-M is a terrorist organization"
       (SOME ((e1* . ev)) (terorg! e1* eta-m)))
The quantified variables in the axioms are marked for their type such that ‘ev’ denotes event and ‘nev’ non-event variables.
4.1 Axioms for Disambiguating Compound Nominal Constructions
From the linguistic point of view, the TACITUS framework offers interesting possibilities for disambiguating compound nominal expressions using linguistic as well as extra-linguistic knowledge.
The individual nouns in a compound nominal construction are analyzed as arguments of the generic ‘nn’-predicate. That is, the expression ‘Renault showroom’ would appear as nn(e1*, Renault, Showroom) in the initial logical form of the sentence produced as output from the parsing module.
In formulating the axioms for resolving such nn-relations, we adopted a strategy combining the line of analysis for compound nominals proposed by Downing (1977), and that advocated by Levi (1978). In summary, Downing argues that the semantic relationship between the elements of a compound cannot be characterized in terms of a finite list of appropriate compounding relationships, whereas Levi tries to establish such a list for the most common cases on the basis of the transformational relationship between the elements.
(2a) (Defaxiom NN-1 (terror)
       "An nn-relation: for"
       (ALL ((e1* . ev) (p . nev) (s . nev))
         (IMPLY (for! e1* s p)
                (SOME ((e2* . ev))
                  (nn! e2* p s)))))
(2b) (Defaxiom NN-2 (terror)
       "An nn-relation: of"
       (ALL ((e1* . ev) (f . nev) (s . nev))
         (IMPLY (of! e1* s f)
                (SOME ((e2* . ev))
                  (nn! e2* f s)))))
(3a) (Defaxiom FOR-1 (terror)
       "A showroom is for products"
       (ALL ((e2* . ev) (s . nev) (e3* . ev) (p . nev) (e4* . ev) (f . nev))
         (IMPLY (AND (showroom! e2* s) (product! e3* p) (firm! e4* f))
                (SOME ((e1* . ev))
                  (for! e1* s p)))))
(3b) (Defaxiom OF-1 (terror)
       "A showroom is owned by a firm"
       (ALL ((e2* . ev) (s . nev) (e3* . ev) (e4* . ev) (f . nev))
         (IMPLY (AND (showroom! e2* s) (own! e3* f s) (firm! e4* f))
                (SOME ((e1* . ev))
                  (of! e1* s f)))))
In trying to abductively prove a relevant logical form output from the parsing module and to make implicit information explicit, the pragmatics module has the theorem prover back-chain over the axioms in the knowledgebase. Thus an nn-relation such as the above is resolved against 2a and 2b, then the new goals, for!(e1* s p) and of!(e1* s f), are resolved against 3a and 3b respectively, yielding new goals to be resolved.
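The back-chaining over axioms 2a, 2b, 3a and 3b can be sketched as a small Python prover. This is a simplification under several assumptions: eventuality arguments are dropped, nothing may be assumed (so the prover is purely deductive, unlike the pragmatics module), and the three facts about the showroom `s1` are hypothetical stand-ins for what the text and the knowledgebase would supply.

```python
import itertools

# Simplified versions of axioms 2a, 2b, 3a and 3b, written as head <- body.
# Terms starting with "?" are variables.
RULES = {
    "NN-1":  (("nn", "?p", "?s"), [("for", "?s", "?p")]),
    "NN-2":  (("nn", "?f", "?s"), [("of", "?s", "?f")]),
    "FOR-1": (("for", "?s", "?p"),
              [("showroom", "?s"), ("product", "?p"), ("firm", "?f")]),
    "OF-1":  (("of", "?s", "?f"),
              [("showroom", "?s"), ("own", "?f", "?s"), ("firm", "?f")]),
}
# Hypothetical facts for the 'Renault showroom' example.
FACTS = [("showroom", "s1"), ("firm", "renault"), ("own", "renault", "s1")]

_fresh = itertools.count()


def is_var(term):
    return term.startswith("?")


def walk(term, subst):
    """Follow variable bindings until a constant or an unbound variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Unify two atoms; return the extended substitution or None."""
    if len(a) != len(b) or a[0] != b[0]:
        return None
    subst = dict(subst)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None
    return subst


def rename(rule):
    """Give the rule fresh variable names so separate uses do not clash."""
    n = next(_fresh)
    head, body = rule

    def fresh(atom):
        return tuple(f"{t}_{n}" if is_var(t) else t for t in atom)

    return fresh(head), [fresh(atom) for atom in body]


def prove(goals, subst, depth=0):
    """Yield (substitution, axioms used) for every proof of all goals."""
    if not goals:
        yield subst, []
        return
    if depth > 10:  # guard against runaway recursion
        return
    first, rest = goals[0], goals[1:]
    for fact in FACTS:
        s2 = unify(first, fact, subst)
        if s2 is not None:
            yield from prove(rest, s2, depth)
    for name, rule in RULES.items():
        head, body = rename(rule)
        s2 = unify(first, head, subst)
        if s2 is not None:
            for s3, used in prove(body + rest, s2, depth + 1):
                yield s3, [name] + used


proofs = [used for _, used in prove([("nn", "renault", "s1")], {})]
print(proofs)  # [['NN-2', 'OF-1']]
```

With these facts the ‘for’ reading fails because Renault is not known to be a product, so only the ‘of’ reading (the showroom is owned by the firm) survives. In the abductive setting the failed goal could instead be assumed at a cost.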
4.2 Axioms for Resolving Referring Expressions
As mentioned above, one of the basic heuristic assumptions hardwired into TACITUS’ pragmatics module is that an indefinite noun phrase introduces new information and a definite noun phrase refers to a known entity, i.e. something which is either in the knowledgebase or has been introduced in the previously processed text. Hence the cost of assuming an indefinite noun phrase is lower than that of assuming a definite noun phrase.
In the example sentences given in (1), the noun phrase ‘the blast’ is related to the event of the explosion mentioned in the preceding sentence. Simplifying somewhat (cf. further below), we could say that ‘the blast’ is in a sense a nominalization of ‘a bomb exploded’.
(4) (Defaxiom EXPLOSION-BLAST-1 (terror)
      "An explosion generates a blast"
      (ALL ((e1* . ev) (x . nev) (y . nev) (z . nev))
        (IMPLY (AND (ASSUMABLE (etc-expl e1* x y z) 0.3)
                    (explode! e1* x y z))
               (SOME ((e2* . ev) (b . nev))
                 (AND (blast! e2* b) (genn e1* e2*))))))
Essentially, this axiom says that a blast (e2*) implies the occurrence of some explosion event (e1*), and that the latter generates the former, which is stated by way of the primitive predicate ‘genn’. The predicate ‘etc-expl’, which can be seen as standing for ‘additional’ properties of the explode predicate that are not spelled out, is introduced because we do not want to state flatly that ‘a blast’ and ‘an explosion’ are the same thing.
Since an ‘explosion’ is known (it was introduced in the previous sentence), it is free of charge to resolve the second predicate in the antecedent of the axiom against this known fact. The first predicate in the antecedent has been assigned such a low assumability cost (0.3) that proving ‘blast’ by use of the axiom is cheaper than assuming its existence.
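The cost comparison can be made concrete with a small sketch. It assumes, in line with the weighted abduction scheme discussed in Hobbs et al. and Stickel, that the weight on an antecedent is multiplied by the cost of the goal to give the cost of assuming that antecedent. The goal cost of 10 and the weight on ‘explode’ are made-up figures; only the 0.3 comes from axiom (4).

```python
def cost_via_axiom(goal_cost, antecedents, known_facts):
    """Cost of proving a goal by back-chaining over one axiom.

    `antecedents` maps each antecedent literal to its weight. A literal that
    matches a known fact is free; any other literal is assumed at
    weight * goal_cost.
    """
    return sum(0.0 if literal in known_facts else weight * goal_cost
               for literal, weight in antecedents.items())


goal_cost = 10.0                  # hypothetical cost of assuming 'blast' outright
antecedents = {
    "etc-expl": 0.3,              # assumable at 0.3, as in axiom (4)
    "explode": 1.0,               # hypothetical weight; not charged below
}
known_facts = {"explode"}         # introduced by the first sentence of (1)

via_axiom = cost_via_axiom(goal_cost, antecedents, known_facts)
print(via_axiom < goal_cost)      # True: proving via the axiom is cheaper
```

If the explosion had not been mentioned earlier, both antecedents would have to be assumed, and under these figures the axiom would no longer be the cheaper route.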
5 Extracting Specific Information from the Texts
The logical form encapsulating the interpretation found for a text, i.e. the output from the interpretation component, is the input to the task specific analysis component. The analysis is performed on the basis of the logical form and a ‘task schema specification’ given to the theorem prover.
5.1 The Schema
Let us here consider a simplified example of the kind of event related specific information we would like the system to compute. For a given text describing a terrorist event, we would like to find answers (if any) to ‘questions’ such as the following:
INCIDENT TYPE:
TARGET TYPE:
TARGET NATIONALITY:
INCIDENT CITY:
INCIDENT COUNTRY:
RESPONSIBLE ORGANIZATION:
etc.
[...] entries, the answers found are printed out on the screen. The slots in the ‘record’ are filled by the values found for variables when presenting the theorem prover with goals to be abductively proven by using the information from the text interpreted and the facts in the knowledgebase.
The goals of the schema appear as the consequent in what might informally be called the ‘linking axioms’ in the application task specific part of the knowledgebase. Linking axioms can be thought of as guidelines for how to find answers to the ‘questions’ posed by way of the schema specification.
The schema itself is a metalogical LISP expression in a first-order predicate calculus form annotated by non-logical operators for search control and resource bounds. The two non-logical operators are ‘proving’ and ‘enumerated-for-all’. Without going into technical details about these two operators (for more details, see Tyson and Hobbs 1988), let us simply present a small excerpt from the schema for the above example ‘record’, and make some explanatory comments in order to convey the basic intuitions of the process to the reader:
(proving
  (enumerated-for-all ((e1 . ev))
    (proving
      (some ((it . nev)) (incident-type e1 it))
      (terror-limits default-time)
      print-incident)
    (and
      (enumerated-for-all ((it . nev))
        (proving
          (incident-type e1 it)
          (terror-limits default-time)
          print-incident-type)
        :true)
      (enumerated-for-all ((ro . nev))
        (proving
          (responsible-organization e1 ro)
          (terror-limits default-time)
          print-responsible-organization)
        :true)))
  (terror-limits default-time)
  print-sentence-finished)
The linking axiom in the knowledgebase for ‘responsible organization’ could be the following statement:
(5) (Defaxiom RESP-ORG-1 (terror)
      "The organization responsible for the attack"
      (ALL ((e1* . ev) (e . ev) (e2* . ev) (o . nev) (e3* . ev))
        (IMPLY (AND (terattack! e1* e) (responsible! e2* o e)
        [...]
Thus, we find the organization (o) responsible for an attack (e) by proving that e is a terrorist attack, that o is a terrorist organization, and that o is responsible for e.
Contrary to the pragmatics module, no assumptions are made in the task specific analysis phase when trying to prove the goals of the schema; this step is meant to extract information only. However, the process is still back-chaining controlled abductive inferencing. This means that everything has to be proved against the knowledge in the database in conjunction with the interpretation of the text.
Proving the antecedents of the linking axioms may of course involve resolving the new goals with knowledge asserted in the text or, in this case, proving further axioms in the knowledgebase.
There may also be different axioms for the same goal, indicating that a goal can be explained, or more correctly proved, in different ways. Actually, this is only a reflection of the fact that a given phenomenon can be brought about in different ways. For example, there are actually three different axioms for ‘responsible’ in our knowledgebase.
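How a linking axiom fills a slot when nothing may be assumed can be sketched as follows. The sketch is an assumption-laden simplification: the facts are ground tuples without eventuality arguments, the event names `e4` and `e99` and the function names are hypothetical, and the three antecedents are the ones named in the description of the ‘responsible organization’ axiom above.

```python
# Ground facts standing in for the interpreted text plus the knowledgebase.
FACTS = {
    ("terattack", "e4"),
    ("terorg", "eta-m"),
    ("responsible", "eta-m", "e4"),
    ("terattack", "e99"),  # a second attack with no known perpetrator
}


def responsible_organizations(event, facts):
    """terattack(e) & terorg(o) & responsible(o, e)
    -> responsible-organization(e, o).

    Nothing may be assumed here, so every antecedent has to be found
    among the facts.
    """
    if ("terattack", event) not in facts:
        return []
    orgs = sorted(f[1] for f in facts if f[0] == "terorg")
    return [o for o in orgs if ("responsible", o, event) in facts]


def fill_slot(event, facts):
    """Fill the record slot, or mark it unknown if the goal cannot be proved."""
    found = responsible_organizations(event, facts)
    return {"RESPONSIBLE ORGANIZATION": found[0] if found else "<unknown>"}


print(fill_slot("e4", FACTS))   # {'RESPONSIBLE ORGANIZATION': 'eta-m'}
print(fill_slot("e99", FACTS))  # {'RESPONSIBLE ORGANIZATION': '<unknown>'}
```

Several axioms for the same goal would correspond to several such functions whose results are combined, each representing one way of proving the goal.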
5.2 The Information Extracted from the Interpretation Result
Let us now return to our example text. For illustration, we first show an excerpt from the result of the interpretation of the sentences in external format (6), where the resolved compounding relationship should be noted, and then the print-out of the information automatically extracted by the analysis component from the interpretation (7) of the two example sentences.
(6)
INTERPRETATION 1 OF SENTENCE:
Cost: 34
New and Assumed Information:
x1:
  bomb!(e2, x1)
y1:
  explode!(e4, y1, x1)
x12:
  bilbao!(e13, x12)
x8:
  renault!(e9, x8)
x6:
  showroom!(e7, x6)
  in!(e11, x6, x12)
e4:
  at!(e5, e4, x6)
  past!(e15, e4)
Given or Inferred Information:
x8:
  renault!(e9, x8)
  nn!(e10, x8, x6)
  own!(e25, x8, x6)
  firm!(e26, x8)
(7)
INCIDENT TYPE: explosion
TARGET TYPE: commercial
TARGET NATIONALITY: french
INCIDENT CITY: bilbao
INCIDENT COUNTRY: Spain
PROPERTY DAMAGE: <unknown>
WARNING: yes
METHOD: phone
RESPONSIBLE ORGANIZATION: eta-m
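As a rough illustration of how the location slots in (7) relate to the literals in (6), the following sketch chains from the explosion event through at! and in! to a named city. The traversal and the helper names are assumptions made for this example, and the actual linking axioms used in TACITUS are not shown in this paper. The city-to-country table stands in for geographical knowledge in the knowledgebase.

```python
# Literals taken from interpretation (6).
INTERPRETATION = [
    ("explode", "e4", "y1", "x1"),
    ("at", "e5", "e4", "x6"),
    ("in", "e11", "x6", "x12"),
    ("bilbao", "e13", "x12"),
    ("showroom", "e7", "x6"),
]
# Geographical knowledge assumed to be available in the knowledgebase.
CITY_COUNTRY = {"bilbao": "Spain"}


def related(pred, subject, literals):
    """Objects o such that pred!(_, subject, o) is among the literals."""
    return [lit[3] for lit in literals if lit[0] == pred and lit[2] == subject]


def incident_location(event, literals, city_country):
    """Follow at!(_, event, place) and in!(_, place, region) to a known city."""
    for place in related("at", event, literals):
        for region in related("in", place, literals):
            for name, *args in literals:
                if name in city_country and args[-1] == region:
                    return {"INCIDENT CITY": name,
                            "INCIDENT COUNTRY": city_country[name]}
    return {"INCIDENT CITY": "<unknown>", "INCIDENT COUNTRY": "<unknown>"}


print(incident_location("e4", INTERPRETATION, CITY_COUNTRY))
# {'INCIDENT CITY': 'bilbao', 'INCIDENT COUNTRY': 'Spain'}
```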
6 Final Remarks
TACITUS offers an interesting framework for experimenting with knowledge-based natural language processing, and in fact it is a quite sophisticated system. Previously, the TACITUS team at SRI has been experimenting with implementations of knowledgebases for domains such as the break-down or malfunctioning of mechanical parts in ships (Hobbs 1987). Constructing a knowledgebase for the terrorist attack domain was the first attempt to deal with a slightly less restricted subject field in the TACITUS system. The main conclusion to be drawn from the experiment with the terrorist texts is that very careful axiomatization of the facts is necessary in order to achieve good results, i.e. ‘nuts and bolts’ have to be carefully fitted together to create ‘delusions of grandeur’.
Acknowledgements:
The Danish Carlsberg Foundation provided the financial support for my stay at SRI International. Constructing and testing the domain specific knowledgebase for terrorist texts in the TACITUS system described here was suggested to me by Jerry Hobbs and carried out under his supervision. I am indebted to Jerry for his guidance and many useful hints. Needless to say, if the present paper contains errors or misconceptions in the presentation of TACITUS, the author alone can be blamed.
References
Eco, U. 1979. Lector in Fabula. Milan.
Grice, H.P. 1975. Logic and Conversation. R. Schank and B. Nash-Webber [Eds.], Theoretical Issues in Natural Language Processing: 169-174. Cambridge, Mass.
Grosz, B., N. Haas, G. Hendrix, J. Hobbs, P. Martin, R. Moore, J. Robinson, and S. Rosenschein. 1982. DIALOGIC: A Core Natural-Language System. SRI Tech. Note 270. SRI, Menlo Park, California.
Grosz, B., D.E. Appelt, P.A. Martin, and F.N.C. Pereira. 1987. TEAM: An Experiment in the Design of Transportable Natural-Language Interfaces. Artificial Intelligence, 32:173-243.
Hayes, P.J. 1985. The Second Naive Physics Manifesto. J.R. Hobbs and R.C. Moore [Eds.], Formal Theories of the Commonsense World: 1-36. Ablex, New Jersey.
Hirschman, L. 1986. Discovering Sublanguage Structures. R. Grishman and R. Kittredge [Eds.], Analyzing Language in Restricted Domains: Sublanguage Description and Processing: 211-234. Erlbaum, New Jersey.
Hobbs, J.R. 1978. Coherence and Coreference. SRI Tech. Note 168. SRI, Menlo Park, California.
Hobbs, J.R. 1984. Sublanguage and Knowledge. SRI Tech. Note 329. SRI, Menlo Park, California.
Hobbs, J.R. 1985a. Granularity. In: Proceedings of IJCAI-85: 1-4.
Hobbs, J.R. 1985b. Ontological Promiscuity. In: Proceedings of ACL-85: 61-69. University of Chicago, Illinois.
Hobbs, J.R. 1986a. Commonsense Metaphysics and Lexical Semantics. SRI Tech. Note 392. SRI, Menlo Park, California.
Hobbs, J.R. 1986b. Discourse and Inference. Ms. SRI, Menlo Park, California.
Hobbs, J.R. 1986c. Overview of the TACITUS Project. Computational Linguistics, 12:220-222.
Hobbs, J.R. 1987. Local Pragmatics. SRI Tech. Note 429. SRI, Menlo Park, California.
Hobbs, J.R., W. Croft, T. Davies, D. Edwards, and K. Laws. 1988. The TACITUS Commonsense Knowledge Base. Ms. SRI, Menlo Park, California.
Hobbs, J.R., M. Stickel, P. Martin, and D. Edwards. 1989. Interpretation as Abduction. Ms. SRI, Menlo Park, California.
Levi, J. 1978. The Syntax and Semantics of Complex Nominals. Academic Press, New York.
Stickel, M.E. 1982. A Nonclausal Connection-Graph Resolution Theorem-Proving Program. Proceedings of the AAAI-82 National Conference on Artificial Intelligence: 229-233. Pittsburgh, Pennsylvania.
Stickel, M.E. 1988. A Prolog-like Inference System for Computing Minimum Cost Abductive Explanations in Natural Language Interpretation. Proceedings of ICCSC 58:343-350. Hong Kong.
Tyson, M. and J.R. Hobbs. 1988. Domain-Independent Task Specification in the TACITUS Natural Language System. Ms. SRI, Menlo Park, California.