The Design and Application of a Domain Specific Knowledgebase in the TACITUS Text Understanding System


Abstract

TACITUS is a text understanding system being developed at SRI International. One of the main components in the system is a knowledgebase which contains commonsense and domain specific world knowledge encoded as axioms in a first order predicate calculus language. The prime function of the knowledgebase is to provide extra-linguistic facts to be used in the resolution of a range of ambiguities such as compound nominal constructions, definite reference, and in drawing conclusions on the basis of the implicatures in the text. The paper discusses the methodology used in building a knowledgebase for analyzing news reports about terrorist attacks, and demonstrates how it is used in an application extracting information to be stored in a simulated database.

1 Preamble

During my term as International Fellow at SRI International, California, this past winter, I had the opportunity to familiarize myself with the TACITUS text understanding system. Under the supervision of Jerry Hobbs, who is head of the TACITUS project, I developed a domain specific knowledgebase for the TACITUS system. The present paper is a brief, fairly high-level and non-technical overview of the enterprise.

Section 2 of the paper presents the methodology used in the construction of the knowledgebase for news reports about terrorist attacks; a crude outline of the TACITUS system is given in section 3 as necessary background information before we go on to look in detail at an example text in sections 4 and 5. We conclude with some final remarks in section 6.

2 The Methodology behind the Construction of the Knowledgebase

Our goal was to build a fairly large knowledgebase for a specific domain, namely terrorist attacks, to be used as a basis for automated understanding of texts falling within this domain, and subsequent automatic extraction of specific information. We decided to work on the basis of a set of sample texts, and we compiled a corpus consisting of several news reports about terrorist events. This corpus then constituted the backbone in our work.

Rather than adopt what might be termed a strict sublanguage approach to the descriptive task (cf. Hirschman 1986, and Hobbs 1984 for more detailed discussions), we employed a methodology of stepwise refinement (cf. Hobbs 1984). The three steps of our working methodology, which will be elaborated on below, consisted in:

• An (informal) analysis of the corpus texts in order to establish a basic vocabulary, and to determine and select relevant facts for the domain.

• Breaking up the domain into self-contained and coherent sub-domains.

• Axiomatizing the facts of the sub-domains.

2.1 The Analysis of Corpus Texts

Firstly, the corpus texts served the purpose of establishing the basic vocabulary in our system. Secondly, they constituted a picture of the world we intended to model in our knowledgebase, i.e. what are the settings, what are the typical actions, who are the agents, what are the roles and relations between the entities in our ‘terrorist’ universe, etc. Thus they indicated what linguistic and extra-linguistic information would be needed in our knowledgebase.

Using a full-sentence concordance of the sample texts, we looked at each single lexical item in context, and noted down, in an informal manner, what facts were linguistically presupposed and what general background knowledge would be needed in order to understand a given occurrence of a lexical item in its context. (We will not discuss the meaning of ‘understand’ here, but we use it in a sense similar to that of Eco’s term ‘actualisation’ (Eco 1979).)

The analysis results in a first breaking down of each item into component parts and explicit statements about the implicatures (Grice 1975) carried by the text.

2.2 Structuring the Domain Information

Sorting facts into sub-domains is generally a straightforward process. The first crude distinction which can be made is that between facts pertaining to commonsense knowledge and domain specific or specialized knowledge. The former consists of facts about the world in general, not particularly tied to a specific domain (be it terrorist actions, information technology, or what have you), whereas the latter characterizes the facts which are quite often found to be restricted and highly specialized.

Facts pertaining to, for example, space, time, and belief are considered commonsense knowledge, whereas various facts about terrorist organizations are clearly domain specific, and essential for the understanding of reports about terrorist events. Geographical facts about the location of cities and countries seem to fall somewhere between the more abstract commonsense notion and the specialized domain knowledge.

On the basis of the results from our fact-finding, i.e. step one above, we defined 30 sub-domains. The overall conceptual structure for the knowledgebase, the sub-domains and the relations between them, can be schematically rendered by the illustration in figure 1.

Apart from providing conceptual clarity, the advantage of this modular approach is obviously that it permits you to later enhance or modify the sub-domains in the knowledgebase independently of each other.

2.3 Axiomatization of the Facts

The final step in the construction of the knowledgebase consisted in creating precise ontologies for the individual sub-domains, i.e. what entities exist and what are the relations between them, and axiomatizing the facts.

The main task here was to decide on which predicates to decompose, i.e. characterize by other or new predicates, and which were to be basic predicates, i.e. ground terms for which no further description is provided.

The idea behind the adopted approach is neither to fully define each lexical item in the sense of providing necessary and sufficient conditions, nor to decompose it into a predefined set of primitives in the Schankian tradition. Rather, the purpose was to characterize the predicates used in the knowledgebase. Consider as an example the following axioms from the ‘organization’ sub-domain.

organization(o) -> ∃s (∀x. x ∈ s -> person(x) & member(x,o)) & ∃p,g plan(p,g,o)

member(x,o) -> ∃e. role(e,x,o)

role(e,x,o) <- agent(x,e) & in_service_of(e,g,p) & plan(p,g,o)

[Figure 1: The overall conceptual structure of the knowledgebase, showing the sub-domains and the relations between them. Sub-domain labels in the figure: bomb_attack, kidnap_attack, terrorist_organization, country, ethnicity, organization, police, city, publish, firm, plans&goals, communication, property, daily_life, belief, possession, person_move, injury, process, time, scale, set, numbers, bomb_structure, phys_obj, normal, geography, space, modality, predicates.]

[...] has a role, which is being the agent of some action which is in service of the plan of the organization.
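As an informal illustration, the role axiom can be read as a rule that derives role facts from agent, in_service_of and plan facts. The following Python sketch is not TACITUS code: the tuple encoding of facts, the function name `derive_roles`, and all constants (`ann`, `bob`, `org1`, and so on) are assumptions made purely for the example.

```python
# Facts are tuples (predicate, arg1, arg2, ...). All constants are hypothetical.
facts = {
    ("plan", "p1", "g1", "org1"),          # org1 has plan p1 with goal g1
    ("agent", "ann", "e1"),                # ann is the agent of action e1
    ("in_service_of", "e1", "g1", "p1"),   # e1 is in service of goal g1 of plan p1
    ("agent", "bob", "e2"),                # bob acts, but not in service of any plan
}


def derive_roles(facts):
    """Apply role(e,x,o) <- agent(x,e) & in_service_of(e,g,p) & plan(p,g,o)."""
    agents = [f[1:] for f in facts if f[0] == "agent"]
    services = [f[1:] for f in facts if f[0] == "in_service_of"]
    plans = [f[1:] for f in facts if f[0] == "plan"]
    return {
        ("role", e, x, o)
        for (x, e) in agents
        for (e2, g, p) in services
        for (p2, g2, o) in plans
        if e == e2 and g == g2 and p == p2
    }


roles = derive_roles(facts)
```

Under these assumed facts only `ann` receives a role in `org1`, since the action of `bob` is not tied to any plan of the organization.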

3 The Knowledgebase and the TACITUS System

To test our knowledgebase, we implemented a subset (approx. 100) of the axioms we had defined on the system, and ran different types of sentences. The axioms are stated in the ‘ontologically promiscuous’ notation developed by Hobbs (cf. Hobbs 1985b).

This notation is a first order predicate calculus language with the addition of a nominalization operator, written ‘!’, and an extra argument, informally referred to as the ‘self’ argument.

To be more concrete and to convey the basic intuition of the notation to the reader, let us consider a simple example:

explode(b) which is to be read as: b explodes

explode!(e1*, b) which is to be read as: the explosion of b.

Where p(x) says that p is true of x, p!(e,x) says that e is the eventuality or possible situation of p being true of x. Consequently, Hobbs’ notation can be related to standard first order predicate expressions by the following axiom:

(∀x) p(x) <=> (∃e) p!(e,x) & Rexists(e)

where Rexists(e) says that the eventuality ‘e’ does in fact really exist.

In sum, the basic idea of the notation is that of splitting a sentence into its propositional content and an assertional/existential claim. Furthermore, the self argument, i.e. the ‘e’, provides a ‘handle’ for referring to a predication, i.e. a predicate and its argument, in other predicates.
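The split between propositional content and existential claim can be sketched in a few lines of Python. This is only an illustration of the axiom relating p(x) to p!(e,x) and Rexists(e); the dictionary layout, the names `eventualities`, `really_exists` and `holds`, and the constants are our own assumptions and not part of TACITUS.

```python
# Each entry encodes p!(e, x): handle e -> (predicate name, argument).
eventualities = {
    "e1": ("explode", "b"),   # explode!(e1, b): the explosion of b
    "e2": ("explode", "c"),   # explode!(e2, c): a merely possible explosion of c
    "e3": ("past", "e1"),     # past!(e3, e1): e1 used as a handle in another predication
}

# The set of eventualities for which Rexists holds (hypothetical choice).
really_exists = {"e1", "e3"}


def holds(pred, x):
    """p(x) <=> there is an e with p!(e, x) and Rexists(e)."""
    return any(
        content == (pred, x) and e in really_exists
        for e, content in eventualities.items()
    )
```

With these assumed entries, `holds("explode", "b")` is true while `holds("explode", "c")` is false, because the eventuality `e2` is described but not asserted to exist.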

Before we go on to discuss a sample text, we will give a crude overview of the basic components and their functioning in the TACITUS system. We deliberately ignore some of the more advanced features of TACITUS in order not to get bogged down by too many technical details. Unfortunately, this means that we do not do TACITUS full justice (but for more detailed and comprehensive descriptions of the system, see for example Hobbs 1986c and later).

[...] the pragmatics module is to resolve referential expressions and some syntactic ambiguities, to expand metonymies, and to interpret the implicit relations in compound nominals. The pragmatics module works by constructing a logical expression for the basic semantic analysis result, and calling the KADS theorem prover (Stickel 1982) to prove or derive it using a scheme of abductive inferencing in which it is permitted to assume the existence of ‘new’ facts. The theorem prover draws on the knowledgebase of commonsense and domain knowledge to complete the task.

Abductive inference is, of course, a logically invalid mode of inference, i.e. given p(X) -> q(X) and q(a) we conclude p(a). However, we may argue, as does Hobbs (cf. Hobbs et al. 1988), that it is a reasonable way of looking at text understanding because abduction is inference to the ‘best explanation’ in a given context. q(a) can be thought of as the observable evidence, the implication as the general principle that could explain the occurrence of q(a), and the antecedent of the implication as the underlying cause or explanation of q(a).
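The inference pattern itself is easy to state operationally. The following Python sketch only illustrates the pattern "from p(X) -> q(X) and q(a), hypothesize p(a)"; the rule list and the predicate names (`explode`, `thunder`, `noise`, `rain`, `wet`) are invented for the example.

```python
# Rules are (antecedent, consequent) pairs standing for p(X) -> q(X).
# The predicates are hypothetical.
rules = [("explode", "noise"), ("thunder", "noise"), ("rain", "wet")]


def abduce(observed_pred, constant):
    """Return every hypothesis p(constant) that would explain observed_pred(constant)."""
    return [(p, constant) for (p, q) in rules if q == observed_pred]


hypotheses = abduce("noise", "a")
```

Here the observation noise(a) yields two competing explanations, explode(a) and thunder(a). Neither follows logically, which is why some measure of what counts as the best explanation is needed.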

An interesting feature of the pragmatics module is that it uses a scheme for abductive inferencing in which weights and costs are assigned to the axioms (for further details, see e.g. Stickel 1988). Thus if we cannot prove an antecedent, we assume its existence at some cost. Some basic heuristic principles controlling the weights and assumability costs are hardwired into the system (e.g. it is more expensive to assume a fact than to prove it, and it is less expensive to assume an indefinite entity than a definite one), but the axioms in the knowledgebase may be assigned costs manually (cf. 4.2). The interpretation of a text in this abductive and assumption-based framework amounts to producing the minimal explanation of why the text would be true (cf. Hobbs et al. 1988 for a detailed discussion).
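The idea of a minimal explanation can be illustrated by comparing the total cost of candidate explanations. In the Python sketch below, the numeric costs are hypothetical; only their ordering (proving is cheaper than assuming, and assuming an indefinite entity is cheaper than assuming a definite one) is taken from the heuristics just described. The names `COST`, `explanation_cost` and `minimal_explanation` are our own.

```python
# Hypothetical costs; only the ordering reflects the heuristics in the text.
COST = {"proved": 0, "assume_indefinite": 5, "assume_definite": 10}


def explanation_cost(steps):
    """Total cost of one candidate explanation: the sum over its literals."""
    return sum(COST[how] for _literal, how in steps)


def minimal_explanation(candidates):
    """Pick the candidate explanation with the lowest total cost."""
    return min(candidates, key=explanation_cost)


candidates = [
    # The definite 'the blast' is simply assumed.
    [("bomb(x)", "assume_indefinite"), ("blast(b)", "assume_definite")],
    # The definite 'the blast' is proved from something already known.
    [("bomb(x)", "assume_indefinite"), ("blast(b)", "proved")],
]
best = minimal_explanation(candidates)
```

Under these assumed costs the second candidate is preferred, since it avoids the expensive assumption of a definite entity.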

The analysis component, i.e. the component for extracting task specific information from an interpreted text, is basically a specialized call to the theorem prover (see further below). The enhanced logical form, i.e. the result output from the pragmatics module, is abductively proved by back-chaining over the axioms in the knowledgebase.

In the next sections, we will have a look at an example text and show how the knowledgebase is used for disambiguation and computation of implicit information.

4 An Example

Let us now consider the following two sentences as an example text to be treated within our framework:

(1) A bomb exploded at a Renault showroom in Bilbao. A person claiming to represent the ETA-M had warned of the blast in a call to the police.

The extra-linguistic knowledge needed in order to achieve some reasonable level of understanding of the text is, among other things: Renault is a French firm manufacturing products, i.e. cars, a showroom is a building owned by a firm where the products of that firm are on display, Bilbao is a city in the country Spain, ETA-M is a terrorist organization, and terrorist organizations have members, certain plans and goals and violent methods for reaching their goals, and an explosion generally involves a blast.

The basic facts, such as for instance Spain being a country and ETA-M being a terrorist organization, are encoded as existential axioms in the knowledgebase. E.g.:

(1a) (Defaxiom COUNTRY-SPAIN-1 (terror)
       "Spain is a country"
       (SOME ((e1* . ev)) (country! e1* Spain)))

(1b) (Defaxiom TERORG-ETA-M-1 (terror)
       "ETA-M is a terrorist organization"
       (SOME ((e1* . ev)) (terorg! e1* eta-m)))

The quantified variables in the axioms are marked for their type such that ‘ev’ denotes event and ‘nev’ non-event variables.

4.1 Axioms for Disambiguating Compound Nominal Constructions

From the linguistic point of view, the TACITUS framework offers interesting possibilities for disambiguating compound nominal expressions using linguistic as well as extra-linguistic knowledge.

The individual nouns in a compound nominal construction are analyzed as arguments of the generic ‘nn’-predicate. That is, the expression ‘Renault showroom’ would appear as nn(e1*, Renault, Showroom) in the initial logical form of the sentence produced as output from the parsing module.

In formulating the axioms for resolving such nn-relations, we adopted a strategy combining the line of analysis for compound nominals proposed by Downing (1977), and that advocated by Levi (1978). In summary, Downing argues that the semantic relationship between the elements of a compound cannot be characterized in terms of a finite list of appropriate compounding relationships, whereas Levi tries to establish such a list for the most common cases on the basis of the transformational relationship between the elements.

(2a) (Defaxiom NN-1 (terror)
       "An nn-relation: for"
       (ALL ((e1* . ev) (p . nev) (s . nev))
         (IMPLY (for! e1* s p)
           (SOME ((e2* . ev))
             (nn! e2* p s)))))

(2b) (Defaxiom NN-2 (terror)
       "An nn-relation: of"
       (ALL ((e1* . ev) (f . nev) (s . nev))
         (IMPLY (of! e1* s f)
           (SOME ((e2* . ev))
             (nn! e2* f s)))))

(3a) (Defaxiom FOR-1 (terror)
       "A showroom is for products"
       (ALL ((e2* . ev) (s . nev) (e3* . ev) (p . nev) (e4* . ev) (f . nev))
         (IMPLY (AND (showroom! e2* s) (product! e3* p) (firm! e4* f))
           (SOME ((e1* . ev))
             (for! e1* s p)))))

(3b) (Defaxiom OF-1 (terror)
       "A showroom is owned by a firm"
       (ALL ((e2* . ev) (s . nev) (e3* . ev) (e4* . ev) (f . nev))
         (IMPLY (AND (showroom! e2* s) (own! e3* f s) (firm! e4* f))
           (SOME ((e1* . ev))
             (of! e1* s f)))))

In trying to abductively prove a relevant logical form output from the parsing module and to make implicit information explicit, the pragmatics module has the theorem prover back-chain over the axioms in the knowledgebase. Thus an nn-relation such as the above is resolved against 2a and 2b, then the new goals, for!(e1* s p) and of!(e1* s f), are resolved against 3a and 3b respectively, yielding new goals to be resolved.
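The back-chaining over the nn-axioms can be sketched as follows. This Python fragment is a deliberately simplified stand-in for the theorem prover: it drops the eventuality arguments, proves goals only against known facts (no assumptions at a cost), and hard-codes the four axioms as functions. The fact set, the constant `s1`, and the function names are assumptions made for the example.

```python
# Known facts, with eventuality arguments omitted. Constants are hypothetical.
facts = {("showroom", "s1"), ("firm", "renault"), ("own", "renault", "s1")}


def prove_for(s, p):
    # 3a: showroom(s) & product(p) & firm(f) -> for(s, p), for some firm f
    has_firm = any(fact[0] == "firm" for fact in facts)
    return ("showroom", s) in facts and ("product", p) in facts and has_firm


def prove_of(s, f):
    # 3b: showroom(s) & own(f, s) & firm(f) -> of(s, f)
    return {("showroom", s), ("own", f, s), ("firm", f)} <= facts


def resolve_nn(first, second):
    """Back-chain from nn(first, second) through 2a/2b to 3a/3b."""
    readings = []
    if prove_for(second, first):   # 2a: for(s, p) -> nn(p, s)
        readings.append("for")
    if prove_of(second, first):    # 2b: of(s, f) -> nn(f, s)
        readings.append("of")
    return readings


readings = resolve_nn("renault", "s1")
```

With these facts only the ‘of’ reading succeeds, because Renault is known as a firm owning the showroom and not as a product. In the actual system a failed antecedent could instead be assumed at a cost, which this sketch leaves out.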

4.2 Axioms for Resolving Referring Expressions

As mentioned above, one of the basic heuristic assumptions hardwired into TACITUS’ pragmatics module is that an indefinite noun phrase introduces new information and a definite noun phrase refers to a known entity, i.e. something which is either in the knowledgebase or has been introduced in the previously processed text. Hence assuming an indefinite noun phrase is cheaper than assuming a definite noun phrase.

In the example sentences given in (1), the noun phrase ‘the blast’ is related to the event of the explosion mentioned in the preceding sentence. Simplifying somewhat (cf. further below), we could say that ‘the blast’ is in a sense a nominalization of ‘a bomb exploded’.

(4) (Defaxiom EXPLOSION-BLAST-1 (terror)
      "An explosion generates a blast"
      (ALL ((e1* . ev) (x . nev) (y . nev) (z . nev))
        (IMPLY (AND (ASSUMABLE (etc-expl e1* x y z) 0.3)
                    (explode! e1* x y z))
          (SOME ((e2* . ev) (b . nev))
            (AND (blast! e2* b) (genn e1* e2*))))))

Essentially, this axiom says that a blast (e2*) implies the occurrence of some explosion event (e1*), and that the latter generates the former, which is stated by way of the primitive predicate ‘genn’. The predicate ‘etc-expl’, which can be seen as ‘additional’, but not spelled out, properties relating to the explode predicate, is introduced because we do not want to state flatly that ‘a blast’ and ‘an explosion’ are the same thing.

Since an ‘explosion’ is known (it was introduced in the previous sentence), it is free of charge to resolve the second predicate in the antecedent of the axiom against this known fact. The first predicate in the antecedent has been assigned such a low assumability cost (0.3) that proving ‘blast’ by use of the axiom is cheaper than assuming its existence.
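The cost comparison can be made concrete with a small Python sketch. It rests on several assumptions that are not taken from the paper: the goal ‘blast’ is given a hypothetical assumption cost of 10, the cost of assuming an antecedent is taken to be the goal's cost multiplied by the antecedent's weight, and the weight 0.7 for ‘explode’ is invented (only the 0.3 for ‘etc-expl’ appears in axiom (4)). Arguments of the predicates are omitted.

```python
# Facts already established by the preceding text (arguments omitted).
known = {"explode"}

# consequent -> alternative antecedent lists; each antecedent carries a weight.
# 0.3 comes from axiom (4); 0.7 is a hypothetical weight.
axioms = {"blast": [[("etc_expl", 0.3), ("explode", 0.7)]]}


def cheapest(goal, cost, depth=3):
    """Lowest cost of establishing `goal`, whose own assumption cost is `cost`."""
    if goal in known:
        return 0.0                      # resolved against a known fact: free
    best = cost                         # fallback: assume the goal outright
    if depth > 0:
        for antecedents in axioms.get(goal, []):
            total = sum(cheapest(a, w * cost, depth - 1) for a, w in antecedents)
            best = min(best, total)
    return best


cost_of_blast = cheapest("blast", 10.0)
```

Under these assumptions, establishing ‘blast’ through the axiom costs about 3 (the assumed ‘etc-expl’ literal) instead of 10, since ‘explode’ is resolved against the known explosion at no cost.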

5 Extracting Specific Information from the Texts

The logical form encapsulating the interpretation found for a text, i.e. the output from the interpretation component, is the input to the task specific analysis component. The analysis is performed on the basis of the logical form and a ‘task schema specification’ given to the theorem prover.

5.1 The Schema

Let us here consider a simplified example of the kind of event related specific information we would like the system to compute. For a given text describing a terrorist event, we would like to find answers (if any) to ‘questions’ such as the following:

INCIDENT TYPE:

TARGET TYPE:

TARGET NATIONALITY:

INCIDENT CITY:

INCIDENT COUNTRY:

RESPONSIBLE ORGANIZATION:

etc.

[...] entries, the answers found are printed out on the screen. The slots in the ‘record’ are filled by the values found for variables when presenting the theorem prover with goals to be abductively proven by using the information from the text interpreted and the facts in the knowledgebase.

The goals of the schema appear as the consequent in what might informally be called the ‘linking axioms’ in the application task specific part of the knowledgebase. Linking axioms can be thought of as guidelines for how to find answers to the ‘questions’ posed by way of the schema specification.

The schema itself is a metalogical LISP expression in a first-order predicate calculus form annotated by non-logical operators for search control and resource bounds. The two non-logical operators are ‘proving’ and ‘enumerated-for-all’. Without going into technical details about these two operators (for more details, see Tyson and Hobbs 1988), let us simply present a small excerpt from the schema for the above example ‘record’, and make some explanatory comments in order to convey the basic intuitions of the process to the reader:

(proving
 (enumerated-for-all ((e1 . ev))
  (proving
   (some ((it . nev)) (incident-type e1 it))
   (terror-limits default-time)
   print-incident)
  (and
   (enumerated-for-all ((it . nev))
    (proving
     (incident-type e1 it)
     (terror-limits default-time)
     print-incident-type)
    :true)
   (enumerated-for-all ((ro . nev))
    (proving
     (responsible-organization e1 ro)
     (terror-limits default-time)
     print-responsible-organization)
    :true)
   (terror-limits default-time)
   print-sentence-finished)))

The linking axiom in the knowledgebase for ‘responsible organization’ could be the following statement:

(5) (Defaxiom RESP-ORG-1 (terror)
      "The organization responsible for the attack"
      (ALL ((e1* . ev) (e . ev) (e2* . ev) (o . nev) (e3* . ev))
        (IMPLY (AND (terattack! e1* e) (responsible! e2* o e)
        [...]

Thus, we find the organization (o) responsible for an attack (e) by proving that e is a terrorist attack, that o is a terrorist organization, and that o is responsible for e.

Contrary to the pragmatics module, no assumptions are made in the task specific analysis phase when trying to prove the goals of the schema; this step is meant to extract information only. However, the process is still back-chaining controlled abductive inferencing. This means that everything has to be proved against the knowledge in the database in conjunction with the interpretation of the text.
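How a slot is filled without making assumptions can be sketched in Python. The fragment below is not the TACITUS schema mechanism: it simply checks the three conditions of the linking axiom against a fixed set of facts and reports `<unknown>` when nothing can be proved. The fact set, the constant `other-org`, and the function names are assumptions made for the example.

```python
# Facts from the interpreted text and the knowledgebase (hypothetical selection,
# eventuality arguments omitted).
facts = {
    ("terattack", "e4"),
    ("terorg", "eta-m"),
    ("terorg", "other-org"),
    ("responsible", "eta-m", "e4"),
}


def responsible_organizations(event):
    """All o such that terattack(event), terorg(o) and responsible(o, event) are facts."""
    if ("terattack", event) not in facts:
        return []
    orgs = sorted(f[1] for f in facts if f[0] == "terorg")
    return [o for o in orgs if ("responsible", o, event) in facts]


def fill_slot(event):
    """Fill the RESPONSIBLE ORGANIZATION slot; nothing is ever assumed."""
    found = responsible_organizations(event)
    return {"RESPONSIBLE ORGANIZATION": found[0] if found else "<unknown>"}


record = fill_slot("e4")
```

Because every condition must be found among the facts, an event for which no responsibility has been established leaves the slot as `<unknown>` rather than being filled by a guess.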

Proving the antecedents of the linking axioms may of course involve resolving the new goals with knowledge asserted in the text or, in this case, proving further axioms in the knowledgebase.

There may also be different axioms for the same goal, indicating that a goal can be explained, or more correctly proved, in different ways. Actually, this is only a reflection of the fact that a given phenomenon can be brought about in different ways. For example, there are actually three different axioms for ‘responsible’ in our knowledgebase.

5.2 The Information Extracted from the Interpretation Result

Let us now return to our example text. For illustration, we first show an excerpt from the result of the interpretation of the sentences in external format (6), in which the resolved compounding relationship should be noted, and then the print-out of the information automatically extracted by the analysis component from the interpretation of the two example sentences (7).

(6)

INTERPRETATION 1 OF SENTENCE:
Cost: 34

New and Assumed Information:
x1:
  bomb!(e2, x1)
y1:
  explode!(e4, y1, x1)
x12:
  bilbao!(e13, x12)
x8:
  renault!(e9, x8)
x6:
  showroom!(e7, x6)
  in!(e11, x6, x12)
e4:
  at!(e5, e4, x6)
  past!(e15, e4)

Given or Inferred Information:
x8:
  renault!(e9, x8)
  nn!(e10, x8, x6)
  own!(e25, x8, x6)
  firm!(e26, x8)

(7)

INCIDENT TYPE: explosion

TARGET TYPE: commercial

TARGET NATIONALITY: french

INCIDENT CITY: bilbao

INCIDENT COUNTRY: Spain

PROPERTY DAMAGE: <unknown>

WARNING: yes

METHOD: phone

RESPONSIBLE ORGANIZATION: eta-m

6 Final Remarks

TACITUS offers an interesting framework for experimenting with knowledge-based natural language processing, and in fact it is quite a sophisticated system. Previously, the TACITUS team at SRI has been experimenting with implementations of knowledgebases for domains such as the break-down or malfunctioning of mechanical parts in ships (Hobbs 1987). Constructing a knowledgebase for the terrorist attack domain was the first attempt to deal with a slightly less restricted subject field in the TACITUS system. The main conclusion to be drawn from the experiment with the terrorist texts is that very careful axiomatization of the facts is necessary in order to achieve good results, i.e. ‘nuts and bolts’ have to be carefully fitted together to create ‘delusions of grandeur’.

Acknowledgements:

The Danish Carlsberg Foundation provided the financial support for my stay at SRI International. Constructing and testing the domain specific knowledgebase for terrorist texts in the TACITUS system described here was suggested to me by Jerry Hobbs and carried out under his supervision. I am indebted to Jerry for his guidance and many useful hints. Needless to say, if the present paper contains errors or misconceptions in the presentation of TACITUS, the author alone can be blamed.

References

Eco, U. 1979. Lector in Fabula. Milan.

Grice, H.P. 1975. Logic and Conversation. R. Schank and B. Nash-Webber [Eds.], Theoretical Issues in Natural Language Processing:169-174. Cambridge, Mass.

Grosz, B., N. Haas, G. Hendrix, J. Hobbs, P. Martin, R. Moore, J. Robinson, and S. Rosenschein. 1982. DIALOGIC: A Core Natural-Language System. SRI Tech. Note 270. SRI, Menlo Park, California.

Grosz, B., D.E. Appelt, P.A. Martin, and F.N.C. Pereira. 1987. TEAM: An Experiment in the Design of Transportable Natural-Language Interfaces. Artificial Intelligence, 32:173-243.

Hayes, P.J. 1985. The Second Naive Physics Manifesto. J.R. Hobbs and R.C. Moore [Eds.], Formal Theories of the Commonsense World:1-36. Ablex, New Jersey.

Hirschman, L. 1986. Discovering Sublanguage Structures. R. Grishman and R. Kittredge [Eds.], Analyzing Language in Restricted Domains: Sublanguage Description and Processing:211-234. Erlbaum, New Jersey.

Hobbs, J.R. 1978. Coherence and Coreference. SRI Tech. Note 168. SRI, Menlo Park, California.

Hobbs, J.R. 1984. Sublanguage and Knowledge. SRI Tech. Note 329. SRI, Menlo Park, California.

Hobbs, J.R. 1985a. Granularity. In: Proceedings of IJCAI-85:1-4.

Hobbs, J.R. 1985b. Ontological Promiscuity. In: Proceedings of ACL-85:61-69. University of Chicago, Illinois.

Hobbs, J.R. 1986a. Commonsense Metaphysics and Lexical Semantics. SRI Tech. Note 392. SRI, Menlo Park, California.

Hobbs, J.R. 1986b. Discourse and Inference. Ms. SRI, Menlo Park, California.

Hobbs, J.R. 1986c. Overview of the TACITUS Project. Computational Linguistics, 12:220-222.

Hobbs, J.R. 1987. Local Pragmatics. SRI Tech. Note 429. SRI, Menlo Park, California.

Hobbs, J.R., W. Croft, T. Davies, D. Edwards, and K. Laws. 1988. The TACITUS Commonsense Knowledge Base. Ms. SRI, Menlo Park, California.

Hobbs, J.R., M. Stickel, P. Martin, and D. Edwards. 1989. Interpretation as Abduction. Ms. SRI, Menlo Park, California.

Levi, J. 1978. The Syntax and Semantics of Complex Nominals. Academic Press, New York.

Stickel, M.E. 1982. A Nonclausal Connection-Graph Resolution Theorem-Proving Program. Proceedings of the AAAI-82 National Conference on Artificial Intelligence:229-233. Pittsburgh, Pennsylvania.

Stickel, M.E. 1988. A Prolog-like Inference System for Computing Minimum Cost Abductive Explanations in Natural Language Interpretation. Proceedings of ICCSC 58:343-350. Hong Kong.

Tyson, M. and J.R. Hobbs. 1988. Domain-Independent Task Specification in the TACITUS Natural Language System. Ms. SRI, Menlo Park, California.
