Learning Semantic Level Information Extraction Rules by Type Oriented ILP

(1)

Typ e-Oriented ILP

Yutaka Sasaki and Yoshihiro Matsuo

NTT Communication ScienceLab oratories

2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0237,Japan

fsasaki, [email protected]

Abstract

This paper describ es an approach to using

se-manticrepresentationsforlearninginformation

extraction (IE) rules bya typ e-oriented

induc-tive logic programming (ILP) system. NLP

comp onentsofamachinetranslationsystemare

used toautomaticallygenerate semantic

repre-sentations of text corpus that can b e given

di-rectly toan ILPsystem. Thelatestexp

erimen-talresultsshowhighprecisionand recallof the

learnedrules.

1 Intro duction

Informationextraction (IE)tasks inthis pap er

involve the MUC-3 style IE.The input for the

information extraction task is an empty

tem-plateandasetofnaturallanguagetextsthat

de-scrib earestrictedtargetdomain,suchascorp

o-ratemergersorterroristattacksinSouth

Amer-ica. Templates have a record-like data

struc-turewithslotsthathavenames,e.g.,\company

name"and\mergerdate",andvalues. The

out-put is a set of lled templates. IE tasks are

highly domain-dep endent, so rules and

dictio-nariesforllingvaluesinthetemplateslots

de-p endon thedomain.

It is a heavy burden for IE system

develop-ers that such systems dep end on hand-made

rules, which cannot be easily constructed and

changed. For example, Umass/MUC-3 needed

ab out1,500p erson-hoursofhighlyskilledlabor

to build the IE rules and represent them as a

dictionary (Lehnert, 1992). All the rules must

b e reconstructed from scratch when the target

domainis changed.

To cop e with this problem, some pioneers

have studied metho ds for learning information

extraction rules (Rilo,1996; So derland et al.,

1995; Kim et al., 1995; Human, 1996; Cali

proach is to apply an inductive logic

program-ming (ILP) (Muggleton, 1991) system to the

learning of IE rules, where information is

ex-tracted from semantic representations of news

articles. The ILP system that we employed is

a type-oriented ILP systemRHB +

(Sasaki and

Haruno, 1997), which can ecientlyand

eec-tivelyhandletype(orsort)informationin

train-ing data.

2 Our Approach to IE Tasks

Thissectiondescrib esourapproachtoIEtasks.

Figure1isanoverviewofourapproachto

learn-ing IE rules using an ILP system from

seman-tic representations. First, training articles are

analyzed andconvertedintosemantic

represen-tations,whicharelledcaseframesrepresented

asatomicformulae. Trainingtemplatesare

pre-pared by hand aswell. The ILP system learns

IErulesintheformoflogicprogramswithtyp e

information. Toextractkeyinformation froma

new article, semanticrepresentations

automat-ically generated from the article ismatched by

theIErules. Extractedinformationislledinto

the templateslots.

3 NLP Resources and Tools

3.1 The Semantic AttributeSystem

Weusedthesemanticattributesystemof

\Goi-Taikei | A Japanese Lexicon" (Ikehara et al.,

1997a; KurohashiandSakai,1999)compiledby

the NTT Communication Science Lab oratories

for a Japanese-to-English machine translation

system,ALT-J/E(Ikeharaetal.,1994). The

se-manticattributesystemisasort ofhierarchical

concept thesaurus represented as a tree

struc-ture in which each no de is called a semantic

category. An edgein thetree representsan is a

(2)

articles

semantic representation

templates filled

by hand

ILP

system

IE rules

new article

semantic

representation

release(c2,p2)

announce(c2,d2)

company(X)

release(X,P)

Analyze

sentences

Analyze

sentences

release(c1,p1)

announce(c1,d1)

Apply rules

to semantic

representation

Company: c2

Date: d2

Product: p2

answer

positive

examples

background

knowledge

sort hierarchy

Figure1: Blockdiagram of IEusing ILP

contains ab out 3,000 semantic category no des.

Morethan300,000Japanesewordsarelinkedto

thecategory nodes.

3.2 Verb Case Frame Dictionary

The Japanese-to-English valency pattern

dic-tionary of \Goi-Taikei" (Ikehara et al., 1997b;

Kurohashi and Sakai, 1999) was also originally

developedforALT-J/E.Thevalencydictionary

containsabout15,000 case frames with

seman-tic restrictions on their arguments for 6,000

Japaneseverbs. Eachcaseframeconsistsofone

predicate and one or more case elements that

have alist ofsemanticcategories.

3.3 Natural Language Processing Tools

We used the NLPcomponentsof ALT-J/E for

text analysis. These include the morphological

analyzer, the syntactic analyzer, and the case

analyzerforJapanese. The comp onentsare

ro-bustandgenericto ols,mainlytargetedto

news-pap er articles.

3.3.1 Generic Case Analyzer

Let us examine the case analysis in more

de-tail. The caseanalyzerreads asetof parsetree

candidatespro duced bytheJapanese syntactic

analyzer. The parsetree isrepresentedasa

de-p endency of phrases(i.e., Japanesebunsetsu).

First, it dividesthe parse tree intounit

sen-tences, where a unit sentence consists of one

predicate and its noun and adverb dep endent

phrases. Second, it compares each unit

sen-tence with a verb case frame dictionary. Each

frameconsistsapredicateconditionandseveral

case elements conditions. The predicate

con-and eachcase-rolehasacase element condition

whichsp eciesparticlesandsemanticcategories

of noun phrases. The preference value is

de-ned as the summation of noun phrase

prefer-ences which are calculated from the distances

between the categories of the input sentences

and the categories written in the frames. The

case analyzerthen cho oses the most preferable

parse treeand themostpreferablecombination

of case frames.

The valency dictionary also has case-roles

(Table1)fornounphraseconditions. The

case-roles of adjuncts are determined by using the

particlesofadjunctsandthesemanticcategories

of noun phrases.

Asaresult,theoutputof thecase analysisis

a setofcaseframesforeachunitsentence. The

nounphrasesinframesarelab eledbycase-roles

in Table 1.

Forsimplicity,weusecase-roleco des,suchas

N1 andN2,asthelab els(orslotnames)to

rep-resent case frames. The relation b etween

sen-tences and case-roles is described in detail in

(Ikeharaet al., 1993).

3.3.2 Logical Form Translator

We developed a logical form translator FEP

that generates semantic representations

ex-pressedasatomicformulaefromthecaseframes

and parse trees. For later use, do cument ID

andtenseinformationarealsoaddedtothecase

frames.

Forexample,thecase frameinTable2is

ob-tained afteranalyzingthe followingsentenceof

(3)

Name Co de Description Example

Subject N1 theagent/experiencerof I throwa ball.

anevent/situation

Object1 N2 theobject ofan event I throwa bal l.

Object2 N3 anotherobjectof an event Icompare itwith them.

Lo c-Source N4 sourcelo cationof amovement I startfrom Japan.

Lo c-Goal N5 goallocationofa movement I goto Japan.

Purp ose N6 thepurpose ofan action I goshopping.

Result N7 theresult ofan event It resultsin failure.

Lo cative N8 thelo cationof an event It o ccurs atthe station.

Comitative N9 co-experiencer Ishare aro omwith him.

Quotative N10 quotedexpression Isay that....

Material N11 material/ingredient I lltheglass withwater.

Cause N12 thereasonforan event It collapsedfrom the weight.

Instrument N13 aconcrete instrument I sp eakwith a microphone.

Means N14 an abstractinstrument Ispeakin Japanese.

Time-Position TN1 thetimeof an event I gotob ed at10:00.

Time-Source TN2 thestartingtimeof an event IworkfromMonday.

Time-Goal TN3 theend timeof an event It continuesuntilMonday.

Amount QUANT quantityof something I sp end$10.

shokuba(the oce) kara(from) kuko(the

air-p ort)ni(to) hakobu(carry)"

(\Jackcarriesasuitcasefromtheocetothe

airp ort.")

Table 2: Case Frame ofthe SampleSentence

predicate: hakobu (carry)

article: D1

tense: present

N1: Jakku (Jack)

N2: sutsukesu (suitcase)

N4: shokuba (theoce)

N5: kuko (theairport)

4 Inductive Learning To ol

ConventionalILPsystemstakeasetof p ositive

andnegativeexamples,andbackground

knowl-edge. The output is a set of hypotheses in the

formoflogicprogramsthatcoverspositivesand

do notcovernegatives. Weemployedthe typ

e-+

4.1 Features of Type-oriented ILP

System RHB +

The typ e-orientedILPsystemhasthefollowing

features that match the needs for learning IE

rules.

A type-oriented ILP systemcan eciently

and eectively handle type (or

seman-tic category) information intraining data.

Thisfeature isadvantageousincontrolling

the generality and accuracy of learned IE

rules.

Itcandirectlyusesemanticrepresentations

of thetextasbackgroundknowledge.

It canlearn fromonly positiveexamples.

Predicates are allowed to have labels (or

keywords) for readability and

expressibil-ity.

4.2 Summary of Typ e-oriented ILP

System RHB +

This section summarizes the employed typ

e-oriented ILP system RHB +

. The input of

RHB +

(4)

back-the semantic attribute system). The output is

asetof Hornclauses(Lloyd,1987)having

vari-ableswith type information. That is,the term

isextended tothe-term.

4.3 -terms

-termsaretherestrictedformof -terms(A

t-KaciandNasr,1986;At-Kacietal.,1994).

In-formally, -termsare Prolog terms whose

vari-ablesare replaced withvariable Var of type T,

whichis denoted asVar:T.Predicateand

func-tion symbols are allowed to have features (or

lab els). Forexample,

speak(agent)X:human,object)Y:l ang uage)

is a clause based on -terms which has lab els

agent and object, and typ es human and

l anguag e.

4.4 Algorithm

The algorithm of RHB +

is basically a greedy

covering algorithm. It constructs clauses

one-by-one by calling inner loop (Algorithm 1)

which returns a hypothesis clause. A

hypoth-esisclauseisrepresentedintheform ofhead:0

body. CoveredexamplesareremovedfromP in

eachcycle.

The inner lo op consists of two phases: the

headconstructionphaseandtheb o dy

construc-tion phase. It constructs headsin a b ottom-up

mannerand constructstheb odyina top-down

manner, followingtheresultdescribed in(Zelle

et al.,1994).

The search heuristic PWI is weighted

infor-mativity employing the Laplace estimate. Let

T = fHead:0Body g [BK.

Pj denotes the number of positive

ex-amples coveredbyT and Q(T)is theempirical

content. ThesmallerthevalueofPWI,the

can-didate clause is b etter. Q(T) is dened as the

set of atoms (1) that arederivablefrom Tand

(2)whosepredicate isthetargetpredicate,i.e.,

thepredicate name of thehead.

The dynamic type restriction by positive

ex-amplesusesp ositiveexamplescurrentlycovered

inorder todetermineappropriatetypesto

vari-1. Given positives P, original positives P

0 ,

back-groundknowledgeBK.

2. Decidetypesof variablesin aheadby

comput-ingthe typedleastgeneral generalizations(lgg)

ofNpairsofelementsinP,andselectthemost

general headasHead.

3. If the stopping condition is satised, return

Head.

4. LetBody beempty.

5. CreateasetofallpossibleliteralsLusing

vari-ables inHeadandBody.

6. Let BEAM be top K literals l

k

of L with

respect to the positive weighted informativity

PWI.

7. Do later steps, assuming that l

k

is added to

Body foreachliteral l

k

inBEAM.

8. Dynamical ly restrict types in Body by cal ling

the dynamictype restriction by positive

exam-ples.

9. If the stopping condition is satised, return

(Head:0Body).

10. Goto5.

5 Illustration of a Learning Process

Now, we examine thetwo short notices of new

productsreleaseinTable3. Thefollowingtable

shows a sample template for articles rep orting

a new pro ductrelease.

Template

1. articleid:

2. company:

3. pro duct:

4. releasedate:

5.1 Preparation

Suppose that the following semantic

represen-tations is obtainedfrom Article1.

(c1) announce( article => 1,

tense => past,

tn1 => "this week",

n1 => "ABC Corp.",

n10 => (c2) ).

(c2) release( article => 1,

tense => future,

tn1 => "Jan. 20",

(5)

Articleid Sentence

#1 \ABC Corp. this week announced thatitwill release a colorprinteron Jan. 20."

#2 \XYZCorp. released acolorscanner lastmonth."

The lledtemplateforArticle1 isasfollows.

Template1

1. articleid: 1

2. company: ABCCorp.

3. pro duct: acolorprinter

4. release date: Jan. 20

Supposethatthefollowingsemantic

represen-tationis obtainedfrom Article2.

tense => past,

tn1 => "last month",

n1 => "XYZ Corp.",

n2 => "a color scanner" ).

The lledtemplateforArticle2 isasfollows.

Template2

1. articleid: 2

2. company: XYZCorp.

3. pro duct: acolorscanner

4. release date: last month

5.2 Head Construction

Twop ositiveexamplesareselectedforthe

tem-plate slot\company".

company(article-number => 1

name => "ABC Corp").

company(article-number => 2

name => "XYZ Corp").

By computing a least general generalization

(lgg)sasaki97,the followingheadisobtained:

company( article-number => Art: number

name => Co: organization).

5.3 Body Construction

Generatep ossible literals 1

by combining

predi-cate names and variables, then check the PWI

1

\literals" here means atomic formulae or negated

values of clauses to which one of the literal

added. Inthiscase,supp osethataddingthe

fol-lowing literal with predicate release is theb est

one. After the dynamic type restriction, the

current clause satises the stopping condition.

Finally,theruleforextracting\companyname"

is returned. Extraction rules for other slots

\product"and \releasedate"canb elearned in

thesamemanner. Notethatseveralliteralsmay

be needed in the bo dy of the clause to satisfy

the stoppingcondition.

company(article-number => Art:number,

name => Co: organization )

:- release( article => Art,

tense => T: tense,

tn1 => D: time,

n1 => Co,

n2 => P: product ).

5.4 Extraction

Now, we have the following semantic

represen-tation extractedfrom thenewarticle:

Article3: \JPNCorp. hasreleasedanewCD

player." 2

tense => perfect_present,

tn1 => nil,

n1 => "JPN Corp.",

n2 => "a new CD player" ).

ApplyingthelearnedIErulesandotherrules,

wecan obtain thelledtemplateforArticle3.

Template 3

1. articleid: 3

2. company: JPNCorp.

3. pro duct: CD player

4. releasedate:

2

(6)

in-(a)Without datacorrection

company pro duct release date announce date price

Precision 89.6% 80.5% 90.6% 100.0% 58.4%

Recall 82.1% 66.7% 66.7% 82.4% 60.8%

Averagetime(sec.) 15.8 22.1 14.4 2.2 11.0

(b)With datacorrection

company pro duct release date announce date price

Precision 91.1% 80.0% 92.3% 100.0% 87.1%

Recall 85.7% 69.7% 82.8% 88.2% 82.4%

Averagetime(sec.) 22.9 25.2 33.55 5.15 11.9

6 Experimental Results

6.1 Setting of Experiments

We extracted articles related to the release of

new pro ducts from a one-year newspap er

cor-pus written in Japanese 3

. One-hundred

arti-cles were randomly selected from 362 relevant

articles. The template we used consisted of

ve slots: company name, product name,

re-lease date, announce date, and price. We also

lled one template for each article. After

an-alyzing sentences, case frames were converted

intoatomicformulaerepresentingsemantic

rep-resentationsasdescribedinSection2and3. All

the semantic representations were given to the

learnerasbackgroundknowledge,andthelled

templates were givenas p ositive examples. To

sp eed-upthelearningpro cess,weselected

pred-icatenamesthatarerelevanttothewordsinthe

templatesasthetargetpredicatestobeusedby

theILPsystem,andwealsorestrictedthe

num-b erof literalsintheb o dyofhyp othesestoone.

Precisionand recall,thestandardmetricsfor

IEtasks, arecounted byusing the

remove-one-out cross validation on the examples for each

item. We used a VArStation with thePentium

I I Xeon(450MHz) forthis exp eriment.

6.2 Results

Table4shows theresultsofourexperiment. In

theexp erimentoflearningfromsemantic

repre-sentations,includingerrorsincase-roleselection

and semantic category selection, precision was

3

We used articles from the MainichiNewspap ersof

very high. The precision of the learned rules

forpricewaslowb ecausethesemanticcategory

name automatically given to the price

expres-sions in the data were not quite appropriate.

Forthe ve items, 67-82% recall wasachieved.

Withthebackgroundknowledgehaving

seman-tic representationscorrectedbyhand,precision

was veryhigh and 70-88% recall wasachieved.

The precision ofpricewas markedlyimproved.

It is imp ortant that the extraction of ve

dierent piecesof information showed go o d

re-sults. ThisindicatesthattheILPsystemRHB +

has ahigh p otentialinIEtasks.

7 Related Work

Previous researches on generating IE rules

from texts with templates include

AutoSlog-TS (Rilo,1996), CRYSTAL (So derland et al.,

1995), PALKA (Kimet al., 1995),LIEP

(Hu-man, 1996) and RAPIER (Cali and Mooney,

1997). In our approach, we use the

type-oriented ILP system RHB +

, which is indep

en-dent of natural language analysis. This p oint

dierentiates our approach from the others.

Learning semantic-level IE rules using an ILP

system from semantic representations is also a

new challenge inIEstudies.

Sasaki (Sasaki and Haruno, 1997) applied

RHB +

totheextractionofthenumberofdeaths

and injuries from twenty ve articles. That

experiment was sucient to assess the p

(7)

This pap er describ ed a use of semantic

repre-sentationsforgeneratinginformationextraction

rules by applying a type-oriented ILP system.

Exp eriments were conducted on the data

gen-eratedfrom 100news articles inthe domainof

new pro duct release. The results showed very

high precision, recall of 67-82% without data

correction and 70-88% recall with correct

se-mantic representations. The extraction of ve

dierentpieces of information showed go od

re-sults. ThisindicatesthatourlearnerRHB +

has

a highpotentialinIE tasks.

References

H. At-Kaci and R.Nasr, LOGIN:A logic

pro-gramminglanguagewithbuilt-ininheritance,

Journal of Logic Programming, 3, pp.185{

215, 1986.

H. At-Kaci, B. Dumant, R. Meyer, A.

Podel-ski,andP.VanRoy,TheWildLifeHandbook,

1994.

M. E. Cali and R. J. Mo oney, Relational

LearningofPattern-MatchRulesfor

Informa-tion Extraction, Proc. of ACL-97 Workshop

in Natural Language Learning,1997.

S. B. Human, Learning Information

Extrac-tion Patterns from Examples, Statisticaland

SymbolicApproaches toLearningfor Natural

Language Processing,pp.246{260,1996.

S. Ikehara, M. Miyazaki, and A. Yokoo,

Clas-sication of language knowledge for

mean-ing analysis in machine translations,

Trans-actions of Information Processing Society

of Japan, Vol.34, pp.1692-1704, 1993. (in

Japanese)

S. Ikehara, S. Shirai, K. Ogura, A. Yokoo,

H. Nakaiwa and T. Kawaoka, ALT-J/E: A

JapanesetoEnglishMachineTranslation

Sys-tem for Communication with Translation,

Proc. of The 13th IFIP World Computer

Congress,pp.80{85, 1994.

S. Ikehara, M. Miyazaki, S. Shirai, A. Yokoo,

H. Nakaiwa, K. Ogura, Y. Oyama and

Y. Hayashi (eds.), The Semantic Attribute

System, Goi-Taikei | A Japanese

Lexi-con, Vol.1, Iwanami Publishing, 1997. (in

Japanese)

S. Ikehara, M. Miyazaki, S. Shirai, A. Yokoo,

H. Nakaiwa, K. Ogura, Y. Oyama and

IwanamiPublishing, 1997. (inJapanese)

J.-T. Kim and D. I. Moldovan, Acquisition

of Linguistic Patterns for Knowledge-Based

Information Extraction, IEEE Transaction

on Knowledge and Data Engineering (IEEE

TKDE),Vol.7,No.5, pp.713{724,1995.

S. Kurohashi and Y. Sakai, Semantic Analysis

ofJapaneseNounPhrases: ANew Approach

toDictionary-BasedUnderstandingThe37th

Annual Meeting of the Association for

Com-putationalLinguistics(ACL-99),pp.481{488,

1999.

W.Lehnert, C.Cardie,D.Fisher,J.McCarthy,

E.RiloandS.So derland,Universityof

Mas-sachusetts: MUC-4 Test Results and

Analy-sis,Proc.ofTheFourthMessage

Understand-ingConference(MUC-4),pp.151{158,1992.

J. Lloyd, Foundations of Logic Programming,

Springer,1987.

S. Muggleton, Inductive logic programming,

New Generation Computing, Vol.8, No.4,

pp.295-318,1991.

E. Rilo, Automatically Generating

Extrac-tion Pattern from Untagged Text, Proc.of

American Association for Articial

Intelli-gence(AAAI-96), pp.1044{1049,1996.

Y. Sasaki and M. Haruno, RHB +

: A

Type-OrientedILPSystem Learningfrom Positive

Data, Proc. of The 14th International Joint

Conferenceon Articial Intelligence

(IJCAI-97),pp.894{899,1997.

S.So derland,D.Fisher,J.Aseltine, W.Lenert,

CRYSTAL: Inducing a Conceptual

Dictio-nary, Proc. of The 13th International Joint

Conferenceon Articial Intelligence

(IJCAI-95),pp.1314{1319, 1995.

J. M. Zelleand R. J. Mo oney,J. B. Konvisser,

Combining Top-down and Bottom-up

Meth-o ds in Inductive Logic Programming, Proc

ofThe11thInternationalConferenceon