Typ e-Oriented ILP
Yutaka Sasaki and Yoshihiro Matsuo
NTT Communication ScienceLab oratories
2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0237,Japan
fsasaki, [email protected]
Abstract
This paper describ es an approach to using
se-manticrepresentationsforlearninginformation
extraction (IE) rules bya typ e-oriented
induc-tive logic programming (ILP) system. NLP
comp onentsofamachinetranslationsystemare
used toautomaticallygenerate semantic
repre-sentations of text corpus that can b e given
di-rectly toan ILPsystem. Thelatestexp
erimen-talresultsshowhighprecisionand recallof the
learnedrules.
1 Intro duction
Informationextraction (IE)tasks inthis pap er
involve the MUC-3 style IE.The input for the
information extraction task is an empty
tem-plateandasetofnaturallanguagetextsthat
de-scrib earestrictedtargetdomain,suchascorp
o-ratemergersorterroristattacksinSouth
Amer-ica. Templates have a record-like data
struc-turewithslotsthathavenames,e.g.,\company
name"and\mergerdate",andvalues. The
out-put is a set of lled templates. IE tasks are
highly domain-dep endent, so rules and
dictio-nariesforllingvaluesinthetemplateslots
de-p endon thedomain.
It is a heavy burden for IE system
develop-ers that such systems dep end on hand-made
rules, which cannot be easily constructed and
changed. For example, Umass/MUC-3 needed
ab out1,500p erson-hoursofhighlyskilledlabor
to build the IE rules and represent them as a
dictionary (Lehnert, 1992). All the rules must
b e reconstructed from scratch when the target
domainis changed.
To cop e with this problem, some pioneers
have studied metho ds for learning information
extraction rules (Rilo,1996; So derland et al.,
1995; Kim et al., 1995; Human, 1996; Cali
proach is to apply an inductive logic
program-ming (ILP) (Muggleton, 1991) system to the
learning of IE rules, where information is
ex-tracted from semantic representations of news
articles. The ILP system that we employed is
a type-oriented ILP systemRHB +
(Sasaki and
Haruno, 1997), which can ecientlyand
eec-tivelyhandletype(orsort)informationin
train-ing data.
2 Our Approach to IE Tasks
Thissectiondescrib esourapproachtoIEtasks.
Figure1isanoverviewofourapproachto
learn-ing IE rules using an ILP system from
seman-tic representations. First, training articles are
analyzed andconvertedintosemantic
represen-tations,whicharelledcaseframesrepresented
asatomicformulae. Trainingtemplatesare
pre-pared by hand aswell. The ILP system learns
IErulesintheformoflogicprogramswithtyp e
information. Toextractkeyinformation froma
new article, semanticrepresentations
automat-ically generated from the article ismatched by
theIErules. Extractedinformationislledinto
the templateslots.
3 NLP Resources and Tools
3.1 The Semantic AttributeSystem
Weusedthesemanticattributesystemof
\Goi-Taikei | A Japanese Lexicon" (Ikehara et al.,
1997a; KurohashiandSakai,1999)compiledby
the NTT Communication Science Lab oratories
for a Japanese-to-English machine translation
system,ALT-J/E(Ikeharaetal.,1994). The
se-manticattributesystemisasort ofhierarchical
concept thesaurus represented as a tree
struc-ture in which each no de is called a semantic
category. An edgein thetree representsan is a
articles
semantic representation
templates filled
by hand
ILP
system
IE rules
new article
semantic
representation
release(c2,p2)
announce(c2,d2)
company(X)
release(X,P)
Analyze
sentences
Analyze
sentences
release(c1,p1)
announce(c1,d1)
Apply rules
to semantic
representation
Company: c2
Date: d2
Product: p2
answer
positive
examples
background
knowledge
sort hierarchy
Figure1: Blockdiagram of IEusing ILP
contains ab out 3,000 semantic category no des.
Morethan300,000Japanesewordsarelinkedto
thecategory nodes.
3.2 Verb Case Frame Dictionary
The Japanese-to-English valency pattern
dic-tionary of \Goi-Taikei" (Ikehara et al., 1997b;
Kurohashi and Sakai, 1999) was also originally
developedforALT-J/E.Thevalencydictionary
containsabout15,000 case frames with
seman-tic restrictions on their arguments for 6,000
Japaneseverbs. Eachcaseframeconsistsofone
predicate and one or more case elements that
have alist ofsemanticcategories.
3.3 Natural Language Processing Tools
We used the NLPcomponentsof ALT-J/E for
text analysis. These include the morphological
analyzer, the syntactic analyzer, and the case
analyzerforJapanese. The comp onentsare
ro-bustandgenericto ols,mainlytargetedto
news-pap er articles.
3.3.1 Generic Case Analyzer
Let us examine the case analysis in more
de-tail. The caseanalyzerreads asetof parsetree
candidatespro duced bytheJapanese syntactic
analyzer. The parsetree isrepresentedasa
de-p endency of phrases(i.e., Japanesebunsetsu).
First, it dividesthe parse tree intounit
sen-tences, where a unit sentence consists of one
predicate and its noun and adverb dep endent
phrases. Second, it compares each unit
sen-tence with a verb case frame dictionary. Each
frameconsistsapredicateconditionandseveral
case elements conditions. The predicate
con-and eachcase-rolehasacase element condition
whichsp eciesparticlesandsemanticcategories
of noun phrases. The preference value is
de-ned as the summation of noun phrase
prefer-ences which are calculated from the distances
between the categories of the input sentences
and the categories written in the frames. The
case analyzerthen cho oses the most preferable
parse treeand themostpreferablecombination
of case frames.
The valency dictionary also has case-roles
(Table1)fornounphraseconditions. The
case-roles of adjuncts are determined by using the
particlesofadjunctsandthesemanticcategories
of noun phrases.
Asaresult,theoutputof thecase analysisis
a setofcaseframesforeachunitsentence. The
nounphrasesinframesarelab eledbycase-roles
in Table 1.
Forsimplicity,weusecase-roleco des,suchas
N1 andN2,asthelab els(orslotnames)to
rep-resent case frames. The relation b etween
sen-tences and case-roles is described in detail in
(Ikeharaet al., 1993).
3.3.2 Logical Form Translator
We developed a logical form translator FEP
that generates semantic representations
ex-pressedasatomicformulaefromthecaseframes
and parse trees. For later use, do cument ID
andtenseinformationarealsoaddedtothecase
frames.
Forexample,thecase frameinTable2is
ob-tained afteranalyzingthe followingsentenceof
Name Co de Description Example
Subject N1 theagent/experiencerof I throwa ball.
anevent/situation
Object1 N2 theobject ofan event I throwa bal l.
Object2 N3 anotherobjectof an event Icompare itwith them.
Lo c-Source N4 sourcelo cationof amovement I startfrom Japan.
Lo c-Goal N5 goallocationofa movement I goto Japan.
Purp ose N6 thepurpose ofan action I goshopping.
Result N7 theresult ofan event It resultsin failure.
Lo cative N8 thelo cationof an event It o ccurs atthe station.
Comitative N9 co-experiencer Ishare aro omwith him.
Quotative N10 quotedexpression Isay that....
Material N11 material/ingredient I lltheglass withwater.
Cause N12 thereasonforan event It collapsedfrom the weight.
Instrument N13 aconcrete instrument I sp eakwith a microphone.
Means N14 an abstractinstrument Ispeakin Japanese.
Time-Position TN1 thetimeof an event I gotob ed at10:00.
Time-Source TN2 thestartingtimeof an event IworkfromMonday.
Time-Goal TN3 theend timeof an event It continuesuntilMonday.
Amount QUANT quantityof something I sp end$10.
shokuba(the oce) kara(from) kuko(the
air-p ort)ni(to) hakobu(carry)"
(\Jackcarriesasuitcasefromtheocetothe
airp ort.")
Table 2: Case Frame ofthe SampleSentence
predicate: hakobu (carry)
article: D1
tense: present
N1: Jakku (Jack)
N2: sutsukesu (suitcase)
N4: shokuba (theoce)
N5: kuko (theairport)
4 Inductive Learning To ol
ConventionalILPsystemstakeasetof p ositive
andnegativeexamples,andbackground
knowl-edge. The output is a set of hypotheses in the
formoflogicprogramsthatcoverspositivesand
do notcovernegatives. Weemployedthe typ
e-+
4.1 Features of Type-oriented ILP
System RHB +
The typ e-orientedILPsystemhasthefollowing
features that match the needs for learning IE
rules.
A type-oriented ILP systemcan eciently
and eectively handle type (or
seman-tic category) information intraining data.
Thisfeature isadvantageousincontrolling
the generality and accuracy of learned IE
rules.
Itcandirectlyusesemanticrepresentations
of thetextasbackgroundknowledge.
It canlearn fromonly positiveexamples.
Predicates are allowed to have labels (or
keywords) for readability and
expressibil-ity.
4.2 Summary of Typ e-oriented ILP
System RHB +
This section summarizes the employed typ
e-oriented ILP system RHB +
. The input of
RHB +
back-the semantic attribute system). The output is
asetof Hornclauses(Lloyd,1987)having
vari-ableswith type information. That is,the term
isextended tothe-term.
4.3 -terms
-termsaretherestrictedformof -terms(A
t-KaciandNasr,1986;At-Kacietal.,1994).
In-formally, -termsare Prolog terms whose
vari-ablesare replaced withvariable Var of type T,
whichis denoted asVar:T.Predicateand
func-tion symbols are allowed to have features (or
lab els). Forexample,
speak(agent)X:human,object)Y:l ang uage)
is a clause based on -terms which has lab els
agent and object, and typ es human and
l anguag e.
4.4 Algorithm
The algorithm of RHB +
is basically a greedy
covering algorithm. It constructs clauses
one-by-one by calling inner loop (Algorithm 1)
which returns a hypothesis clause. A
hypoth-esisclauseisrepresentedintheform ofhead:0
body. CoveredexamplesareremovedfromP in
eachcycle.
The inner lo op consists of two phases: the
headconstructionphaseandtheb o dy
construc-tion phase. It constructs headsin a b ottom-up
mannerand constructstheb odyina top-down
manner, followingtheresultdescribed in(Zelle
et al.,1994).
The search heuristic PWI is weighted
infor-mativity employing the Laplace estimate. Let
T = fHead:0Body g [BK.
Pj denotes the number of positive
ex-amples coveredbyT and Q(T)is theempirical
content. ThesmallerthevalueofPWI,the
can-didate clause is b etter. Q(T) is dened as the
set of atoms (1) that arederivablefrom Tand
(2)whosepredicate isthetargetpredicate,i.e.,
thepredicate name of thehead.
The dynamic type restriction by positive
ex-amplesusesp ositiveexamplescurrentlycovered
inorder todetermineappropriatetypesto
vari-1. Given positives P, original positives P
0 ,
back-groundknowledgeBK.
2. Decidetypesof variablesin aheadby
comput-ingthe typedleastgeneral generalizations(lgg)
ofNpairsofelementsinP,andselectthemost
general headasHead.
3. If the stopping condition is satised, return
Head.
4. LetBody beempty.
5. CreateasetofallpossibleliteralsLusing
vari-ables inHeadandBody.
6. Let BEAM be top K literals l
k
of L with
respect to the positive weighted informativity
PWI.
7. Do later steps, assuming that l
k
is added to
Body foreachliteral l
k
inBEAM.
8. Dynamical ly restrict types in Body by cal ling
the dynamictype restriction by positive
exam-ples.
9. If the stopping condition is satised, return
(Head:0Body).
10. Goto5.
5 Illustration of a Learning Process
Now, we examine thetwo short notices of new
productsreleaseinTable3. Thefollowingtable
shows a sample template for articles rep orting
a new pro ductrelease.
Template
1. articleid:
2. company:
3. pro duct:
4. releasedate:
5.1 Preparation
Suppose that the following semantic
represen-tations is obtainedfrom Article1.
(c1) announce( article => 1,
tense => past,
tn1 => "this week",
n1 => "ABC Corp.",
n10 => (c2) ).
(c2) release( article => 1,
tense => future,
tn1 => "Jan. 20",
Articleid Sentence
#1 \ABC Corp. this week announced thatitwill release a colorprinteron Jan. 20."
#2 \XYZCorp. released acolorscanner lastmonth."
The lledtemplateforArticle1 isasfollows.
Template1
1. articleid: 1
2. company: ABCCorp.
3. pro duct: acolorprinter
4. release date: Jan. 20
Supposethatthefollowingsemantic
represen-tationis obtainedfrom Article2.
(c3) release( article => 2,
tense => past,
tn1 => "last month",
n1 => "XYZ Corp.",
n2 => "a color scanner" ).
The lledtemplateforArticle2 isasfollows.
Template2
1. articleid: 2
2. company: XYZCorp.
3. pro duct: acolorscanner
4. release date: last month
5.2 Head Construction
Twop ositiveexamplesareselectedforthe
tem-plate slot\company".
company(article-number => 1
name => "ABC Corp").
company(article-number => 2
name => "XYZ Corp").
By computing a least general generalization
(lgg)sasaki97,the followingheadisobtained:
company( article-number => Art: number
name => Co: organization).
5.3 Body Construction
Generatep ossible literals 1
by combining
predi-cate names and variables, then check the PWI
1
\literals" here means atomic formulae or negated
values of clauses to which one of the literal
added. Inthiscase,supp osethataddingthe
fol-lowing literal with predicate release is theb est
one. After the dynamic type restriction, the
current clause satises the stopping condition.
Finally,theruleforextracting\companyname"
is returned. Extraction rules for other slots
\product"and \releasedate"canb elearned in
thesamemanner. Notethatseveralliteralsmay
be needed in the bo dy of the clause to satisfy
the stoppingcondition.
company(article-number => Art:number,
name => Co: organization )
:- release( article => Art,
tense => T: tense,
tn1 => D: time,
n1 => Co,
n2 => P: product ).
5.4 Extraction
Now, we have the following semantic
represen-tation extractedfrom thenewarticle:
Article3: \JPNCorp. hasreleasedanewCD
player." 2
(c4) release( article => 3,
tense => perfect_present,
tn1 => nil,
n1 => "JPN Corp.",
n2 => "a new CD player" ).
ApplyingthelearnedIErulesandotherrules,
wecan obtain thelledtemplateforArticle3.
Template 3
1. articleid: 3
2. company: JPNCorp.
3. pro duct: CD player
4. releasedate:
2
in-(a)Without datacorrection
company pro duct release date announce date price
Precision 89.6% 80.5% 90.6% 100.0% 58.4%
Recall 82.1% 66.7% 66.7% 82.4% 60.8%
Averagetime(sec.) 15.8 22.1 14.4 2.2 11.0
(b)With datacorrection
company pro duct release date announce date price
Precision 91.1% 80.0% 92.3% 100.0% 87.1%
Recall 85.7% 69.7% 82.8% 88.2% 82.4%
Averagetime(sec.) 22.9 25.2 33.55 5.15 11.9
6 Experimental Results
6.1 Setting of Experiments
We extracted articles related to the release of
new pro ducts from a one-year newspap er
cor-pus written in Japanese 3
. One-hundred
arti-cles were randomly selected from 362 relevant
articles. The template we used consisted of
ve slots: company name, product name,
re-lease date, announce date, and price. We also
lled one template for each article. After
an-alyzing sentences, case frames were converted
intoatomicformulaerepresentingsemantic
rep-resentationsasdescribedinSection2and3. All
the semantic representations were given to the
learnerasbackgroundknowledge,andthelled
templates were givenas p ositive examples. To
sp eed-upthelearningpro cess,weselected
pred-icatenamesthatarerelevanttothewordsinthe
templatesasthetargetpredicatestobeusedby
theILPsystem,andwealsorestrictedthe
num-b erof literalsintheb o dyofhyp othesestoone.
Precisionand recall,thestandardmetricsfor
IEtasks, arecounted byusing the
remove-one-out cross validation on the examples for each
item. We used a VArStation with thePentium
I I Xeon(450MHz) forthis exp eriment.
6.2 Results
Table4shows theresultsofourexperiment. In
theexp erimentoflearningfromsemantic
repre-sentations,includingerrorsincase-roleselection
and semantic category selection, precision was
3
We used articles from the MainichiNewspap ersof
very high. The precision of the learned rules
forpricewaslowb ecausethesemanticcategory
name automatically given to the price
expres-sions in the data were not quite appropriate.
Forthe ve items, 67-82% recall wasachieved.
Withthebackgroundknowledgehaving
seman-tic representationscorrectedbyhand,precision
was veryhigh and 70-88% recall wasachieved.
The precision ofpricewas markedlyimproved.
It is imp ortant that the extraction of ve
dierent piecesof information showed go o d
re-sults. ThisindicatesthattheILPsystemRHB +
has ahigh p otentialinIEtasks.
7 Related Work
Previous researches on generating IE rules
from texts with templates include
AutoSlog-TS (Rilo,1996), CRYSTAL (So derland et al.,
1995), PALKA (Kimet al., 1995),LIEP
(Hu-man, 1996) and RAPIER (Cali and Mooney,
1997). In our approach, we use the
type-oriented ILP system RHB +
, which is indep
en-dent of natural language analysis. This p oint
dierentiates our approach from the others.
Learning semantic-level IE rules using an ILP
system from semantic representations is also a
new challenge inIEstudies.
Sasaki (Sasaki and Haruno, 1997) applied
RHB +
totheextractionofthenumberofdeaths
and injuries from twenty ve articles. That
experiment was sucient to assess the p
This pap er describ ed a use of semantic
repre-sentationsforgeneratinginformationextraction
rules by applying a type-oriented ILP system.
Exp eriments were conducted on the data
gen-eratedfrom 100news articles inthe domainof
new pro duct release. The results showed very
high precision, recall of 67-82% without data
correction and 70-88% recall with correct
se-mantic representations. The extraction of ve
dierentpieces of information showed go od
re-sults. ThisindicatesthatourlearnerRHB +
has
a highpotentialinIE tasks.
References
H. At-Kaci and R.Nasr, LOGIN:A logic
pro-gramminglanguagewithbuilt-ininheritance,
Journal of Logic Programming, 3, pp.185{
215, 1986.
H. At-Kaci, B. Dumant, R. Meyer, A.
Podel-ski,andP.VanRoy,TheWildLifeHandbook,
1994.
M. E. Cali and R. J. Mo oney, Relational
LearningofPattern-MatchRulesfor
Informa-tion Extraction, Proc. of ACL-97 Workshop
in Natural Language Learning,1997.
S. B. Human, Learning Information
Extrac-tion Patterns from Examples, Statisticaland
SymbolicApproaches toLearningfor Natural
Language Processing,pp.246{260,1996.
S. Ikehara, M. Miyazaki, and A. Yokoo,
Clas-sication of language knowledge for
mean-ing analysis in machine translations,
Trans-actions of Information Processing Society
of Japan, Vol.34, pp.1692-1704, 1993. (in
Japanese)
S. Ikehara, S. Shirai, K. Ogura, A. Yokoo,
H. Nakaiwa and T. Kawaoka, ALT-J/E: A
JapanesetoEnglishMachineTranslation
Sys-tem for Communication with Translation,
Proc. of The 13th IFIP World Computer
Congress,pp.80{85, 1994.
S. Ikehara, M. Miyazaki, S. Shirai, A. Yokoo,
H. Nakaiwa, K. Ogura, Y. Oyama and
Y. Hayashi (eds.), The Semantic Attribute
System, Goi-Taikei | A Japanese
Lexi-con, Vol.1, Iwanami Publishing, 1997. (in
Japanese)
S. Ikehara, M. Miyazaki, S. Shirai, A. Yokoo,
H. Nakaiwa, K. Ogura, Y. Oyama and
IwanamiPublishing, 1997. (inJapanese)
J.-T. Kim and D. I. Moldovan, Acquisition
of Linguistic Patterns for Knowledge-Based
Information Extraction, IEEE Transaction
on Knowledge and Data Engineering (IEEE
TKDE),Vol.7,No.5, pp.713{724,1995.
S. Kurohashi and Y. Sakai, Semantic Analysis
ofJapaneseNounPhrases: ANew Approach
toDictionary-BasedUnderstandingThe37th
Annual Meeting of the Association for
Com-putationalLinguistics(ACL-99),pp.481{488,
1999.
W.Lehnert, C.Cardie,D.Fisher,J.McCarthy,
E.RiloandS.So derland,Universityof
Mas-sachusetts: MUC-4 Test Results and
Analy-sis,Proc.ofTheFourthMessage
Understand-ingConference(MUC-4),pp.151{158,1992.
J. Lloyd, Foundations of Logic Programming,
Springer,1987.
S. Muggleton, Inductive logic programming,
New Generation Computing, Vol.8, No.4,
pp.295-318,1991.
E. Rilo, Automatically Generating
Extrac-tion Pattern from Untagged Text, Proc.of
American Association for Articial
Intelli-gence(AAAI-96), pp.1044{1049,1996.
Y. Sasaki and M. Haruno, RHB +
: A
Type-OrientedILPSystem Learningfrom Positive
Data, Proc. of The 14th International Joint
Conferenceon Articial Intelligence
(IJCAI-97),pp.894{899,1997.
S.So derland,D.Fisher,J.Aseltine, W.Lenert,
CRYSTAL: Inducing a Conceptual
Dictio-nary, Proc. of The 13th International Joint
Conferenceon Articial Intelligence
(IJCAI-95),pp.1314{1319, 1995.
J. M. Zelleand R. J. Mo oney,J. B. Konvisser,
Combining Top-down and Bottom-up
Meth-o ds in Inductive Logic Programming, Proc
ofThe11thInternationalConferenceon