Herve Dejean
SeminarfurSprachwissenschaft
UniversitatTubingen
[email protected] ing en.d e
Abstract
This paperpresentsalearning systemfor
identify-ing syntacticstructures. This systemrelies on the
use ofbackgroundknowledge and default valuesin
ordertobuildupaninitialgrammarandtheuseof
theoryrenementinordertoimprovethisgrammar.
Thiscombinationprovidesagoodmachinelearning
framework for Natural Language Learning. We
il-lustrate this point with thepresentation of ALLiS,
alearningsystemwhichgeneratesaregular
expres-sion grammarof non-recursivephrases from
brack-etedcorpora.
1 Introduction
Applying Machine Learning techniques to Natural
Language Processing is a booming domain of
re-search. Oneofthereasonsisthedevelopmentof
cor-pora with morpho-syntactic and syntactic
annota-tion(Marcusetal.,1993),(Sampson,1995). One
re-centpopularsubtaskisthelearningofnon-recursive
NounsPhrases(NP)(RamshawandMarcus,1995),
(Tjong Kim Sang and Veenstra, 1999), (Mu~noz et
al.,1999),(Group,1998),(CardieandPierce,1999),
(Buchholzetal.,1999).
Whenotherlearningtechniques(symbolicor
sta-tistical)arewidelyusedinNaturalLanguage
Learn-ing,theoryrenement(AbeckerandSchmid,1996),
(Mooney,1993)seemstobeignored(except(Brunk
and Pazzani, 1995)). Theory renementconsistsof
improvinganexistingknowledgebasesothatit
bet-ter accords with data. No work using theory
re-nementappliedtothegrammarlearningparadigm
seems to have been developed. We would like to
point outin this articletheadequacybetween
the-oryrenementandNaturalLanguageLearning.
Toillustratethisclaim,wepresentALLiS
(Archi-tectureforLearningLinguisticStructures), a
learn-ingsystemwhichusestheoryrenementinorderto
learn non-recursive noun phrases (also called base
nounphrases)andnon-recursiveverbalphrases. We
will show that this technique combined with the
This research is funded be the TMR network Learning
use of default values provides a good architecture
tolearnnaturallanguagestructures.
Thisarticleisorganisedasfollows: Section2gives
anoverviewoftheoryrenement. Section3explains
theadvantageofcombiningdefault valuesand
the-oryrenementtobuildalearningsystem. Section4
describesthe general characteristicsof ALLiS, and
Section5explainsthelearningalgorithm. Theev
al-uationof ALLiS isdescribedSection 6. The
exam-ples which illustrate this article correspond to
En-glishNPs.
2 Theory Renement
We present here a brief introduction to theory
re-nement. Foramoredetailedpresentation,werefer
thereaderto(Abeckerand Schmid, 1996),(Brunk,
1996) or (Ourston and Mooney, 1990). (Mooney,
1993)denesitas:
Theory renement systems developed in
Machine Learningautomatically modify a
Knowledge Base to render it consistent
withasetofclassiedtrainingexamples.
This technique thus consists of improving a given
Knowledge Base (here a grammar) on the basis
of examples (here a treebank). Some impose to
modify the initial knowledge base as little as
pos-sible. Appliedin conjunctionwithexisting learning
techniques (Explanation-BasedLearning, Inductive
Logic Programming), TR seems to achieve better
results than these techniques used alone (Mooney,
1997). Theoryrenementismainlyused(andhasits
origin) in KnowledgeBased Systems (KBS) (Craw
andSleeman, 1990). Itconsistsoftwomainsteps:
1. Build a more or less correct grammar on the
basisof backgroundknowledge.
2. Renethisgrammarusing trainingexamples:
(a) Identifytherevisionpoints
(b) Correctthem
Therststep consistsof acquiring aninitial
step(therenement)comparesthepredictionofthe
initialgrammarwiththetrainingcorpusinorderto
rstly identify the revision points, i.e. points that
are not correctly described by the grammar, and
secondly, to correct these revision points. The
er-ror identicationand renementoperations are
ex-plainedSection 5.3.
The main dierence between a TR system and
othersymboliclearningsystemsisthataTRsystem
must be able to revise existing rules given to the
system as backgroundknowledge. (A system such
as TBL (Brill, 1993) can not be considered as TR
sinceitonlyacquiresnewrules). Inthecaseofother
techniques,newrulesarelearnedinordertoimprove
thegeneraleÆciencyofthesystem(selectionofthe
\best rule"accordingtoapreference function)and
notinordertocorrectaspecic rule.
3 Theory Renement, Default values
and Natural Language Learning
This section explains how default values combined
withtheoryrenementcan provideagoodmachine
learningframeworkforNLP.
3.1 The Use ofDefault Values
Theuse ofdefault valuesisnotnewin NLP (Brill,
1993), (Vergne and Giguet, 1998). Wecan observe
that often (but not necessarily) in a language, an
elementbelongstoapredominantclass(Vergneand
Giguet, 1998). Some systems such as stochastic
modelsusethispropertyimplicitly. Someothersuse
it explicitly. For instance, the general principleof
the Transformation-BasedLearning (Brill, 1993)is
toassigntoeachelementitsmostfrequentcategory,
and then to learn transformation rules which
cor-rectits initial categorisation. Asecond exampleis:
theparser described in (Vergne and Giguet, 1998).
They rst assign to each grammatical word a
de-faultcategory(defaulttag),andthenmightmodify
itthankstolocalcontextsandgrammaticalrelation
assignment(inordertodealwithconstraintsdueto
longdistance relationswhich can not beexpressed
bylocalcontexts).
Themainworkisdonebythelexicon andby
de-faultvalues(eveniffurtheroperationsareobviously
necessary)
Theseapproachesarethusdierentforthe
disam-biguation often used in tagging. The default rules
are notnumerous (one per tag), easyto
automati-cally generatebut theynevertheless producea
sat-isfactorystartinglevel.
3.2 The Combination ofDefault Values
with TR
The ideaon which ALLiS relies is thefollowing: a
rst\naivegrammar"is builtupusingdefault
val-each element its default category(the algorithm is
explainedinSection5.2). Theruleslearnedare
cat-egorisationrules: assignacategorytoanelement (a
tag ora word). Since an element is automatically
assignedtoits defaultcategory, thesystemhasnot
tolearnthecategorisationrulesforitscategory,and
justlearnscategorisationruleswhichcorrespondto
casesinwhichtheelementdoesnotbelongtoits
de-faultcategory. This minimised thenumberofrules
thathaveto belearned. Supposetheelementecan
belong to several categories (a frequent case). The
rstrule learnedisthe\default"rule: assign(e,dc),
wheredc is thedefault categoryofe. Then ALLiS
justlearnsrulesforcaseswhereedoesnobelongto
its default category. The numerous rules
concern-ingthedefault categoryarereplaced bythesimple
defaultrule.
4 ALLiS
ThegoalofALLiS 1
istoautomaticallybuilda
regu-larexpressiongrammarfromabracketedandtagged
corpus 2
. In this training data, only the structures
wewanttolearnaremarkedat theirboundariesby
squarebrackets 3
. The followingsentence showsan
exampleofthetrainingcorpusfortheNPstructure
(onlybase-NPsoccurinsidebrackets).
In/IN [ early/JJ trading/NN ] in/IN [
Hong/NNP Kong/NNP ] [ Monday/NNP
] ,/, [ gold/NN ] was/VBD quoted/VBN
at/IN [ $/$ 366.50/CD ] [ an/DT
ounce/NN ]./.
ALLiSusesaninternalformalismin orderto
rep-resentthegrammarrules. Inorder toparse atext,
amodule converts its formalism into a regular
ex-pressiongrammarwhichcanbeusedbyaparser
us-ingsuch representation(twomodules exist: one for
theCASS parser(Abney, 1996)and onefor XFST
(Karttunenet al.,1997)).
Followingthe principle of theoryrenement, the
learningtaskiscomposedoftwosteps. Therststep
isthegenerationoftheinitialgrammar. This
gram-mar is generated from examples and background
knowledge (Section 5). This initial grammar
pro-videsanincompleteand/orincorrectanalysisofthe
data. Thesecondstepistherenementofthis
gram-mar(Section 5.3). Duringthis step,the validity of
thegrammarrules ischecked andtherules are
im-proved(rened)ifnecessary. Thisimprovement
cor-responds to nd contexts in which elements which
1
http://www.sfb441.uni-t ueb inge n.d e/~d eje an/
chunker.html.
2
TheWSJcorpus(Marcusetal.,1993).
3
(Mu~noz et al., 1999) showed that this representation
tendstoprovidebetterresultsthantherepresentationused
are considered to be members of the structure do
notbelongto thisstructure(andreciprocally).
We give here a simple example to illustrate the
learning process. The rst step (initial grammar
generation)categorisesthetagJJ (adjective)as
be-longing by default to the NP structure if it occurs
before a noun. The second step(renement) nds
outthatsomeadjectivesdonotobeytotheserules 4
.
The renement is triggered in order to modify the
defaultrulesothattheseexceptionscanbecorrectly
processed.
Thus, the learning algorithm simply consists of
categorising the elements of the corpus (tags and
words) into specic categories,and this
categorisa-tionallowstheextractionofthestructureswewant
tolearn. Thesecategoriesareexplainedinthenext
section.
5 The Learning System
5.1 The Background Knowledge
Inordertoeasethelearning,thesystemuses
back-ground knowledge. This knowledge provides a
for-mal and general description of the structures that
ALLiScanlearn. Wesupposethatthestructuresare
composed of anucleus with optional left and right
adjuncts. Weheregiveinformaldenitions,the
for-mal/distributionalonesaregiveninSection 5.2.
The nucleus is the head of the structure. We
authorisethepresenceof severalnucleiin thesame
structure.
All the other elements in the structure (except
the linker) are considered as adjuncts. They are
in dependence relation with the head of the
struc-ture. The adjuncts are characterisedby their
posi-tion(left/right)relativeto thenucleus.
Alinkerisaspecialelementwhichbuildsan
en-docentric structure with two elements. It usually
correspondstocoordination 5
.
Anelement(nucleusoradjunct)mightpossessthe
break property. This notion is introduced (as in
(RamshawandMarcus,1995),(Mu~nozetal.,1999))
inordertodealwithsequenceswhereadjacentnuclei
composeseveralstructures(Section5.2.4).
This pattern can be seen as a variant of the
X-bar template (Head, Spec and Comp), which was
already used in a learning system (Berwick, 1985)
(although Comp is notuseful forthe non-recursive
structures).
Thepossibledierentcategoriesofanelementare
summarisedin Figure5.1.
Thefollowingsentenceshowsan exampleof
cat-egorisation (elements which do not appear in the
structure(NP)aretaggedO):
4
Forexample: [the/DT7/CD%/NNbenchmark/NN
is-sue/NN]due/JJ[October/NNP1999/CD].
5
Onlylinkersoccurringbetween thetwocoordinated
ele-NU
Figure1: Thedierentcategories.
In
Thestructureisformallydened as:
S !A
Thesymbol corresponds to the Kleene star. For
thesakeoflegibility,wedonotintroducelinkersin
this expression. But each symbol X (NU, A) can
bedened by therule 6
X !X jX lX where lis
thelistoflinkers. ThesymbolsB+andB indicate
whethertheelementhasthebreakerpropertyornot.
Since the corpus does not contain information
aboutthesedistributional categories,ALLiS hasto
gure them out. This categorisation relies on the
distributionalbehaviouroftheelements,andcanbe
automaticallyachieved.
5.2 The InitialCategorisation
Thegeneralideatocategoriseelementsistouse
spe-ciccontextswhichpoint outsomeofthe
distribu-tionalpropertiesofthecategory. Thecategorisation
isa sequentialprocess. First the nucleihave to be
found out. For each tag of the corpus, we apply
the function f
nu
(described below). This function
selects a list of elements which are categorised as
nuclei. Thefunction f
b
isapplied tothis listin
or-der to gure out nuclei which are breakers. Then
theadjuncts are foundout, and thefunction f
b is
alsoappliedtothemtogureoutbreakers.
5.2.1 Categorisation ofNuclei
Thecontextusedtondoutthenucleireliesonthis
simple following observation: A structure requires
at least one nucleus 7
. Thus, the elements that
oc-curalone in astructure are assimilatedto nucleus,
sincea structure requires a nucleus. Forexample,
thetagsPRP (pronouns)and NNP(propernouns)
may compose alone a structure (respectively 99%
6
The regular expression formalism does not allow such
rule,andthen,thisrecursionissimulated.
7
Thepartialstructures(structureswithoutnucleus)
and48%ofthese tagsappear alonein NP)but the
tagJJ appears aloneonly0.009%. Wededucethat
PRPandNNPbelongtothenucleiandnotJJ.But
thiscriteriondoesnotallowtheidentication ofall
thenuclei. Someoftenappearwithadjuncts(an
En-glishnoun(NN)often 8
occurswithadetermineror
anadjectiveandthusappearsaloneonly13%). The
singleuseofthischaracteristicprovidesacontinuum
ofvalueswheretheautomaticsetupofathreshold
betweenadjunctsand nucleiisproblematicand
de-pends onthestructure. Tosolvethisproblem, the
ideais todecompose thecategorisationof nucleiin
twosteps. Firstweidentify characteristicadjuncts.
The adjuncts can not appear alone since they
de-pend on a nucleus. The function f
char
is built so
that it provides averysmall valuefor adjuncts. If
thevalueofanelementislowerthanagiven
thresh-old (
char
=0:05), then itis categorisedasa
char-acteristicadjunct.
P correspondsto thenumberof occurrencesof
the pattern P in the corpus C. For example the
numberofoccurrencesin thetrainingcorpusofthe
pattern[JJ]is99,andthenumberofoccurrencesof
thepatternJJ(includingthepattern[JJ])is11097.
So f
char
(JJ) = 0:009, value being low enough to
considerJJasacharacteristicadjunct. Thelist
pro-videdbyf
char
forEnglishNPis:
A
char
=fDT;PR P$;POS;JJ;JJR ;JJS;;g
These elements can correspond to left orright
ad-juncts. All theadjunctsare notidentied,but this
list allows the identication of the nuclei as
ex-plainedin thenextparagraph.
Thesecondstepconsistsofintroducingthese
ele-mentsintoanewpatternusedbythefunction f
nu .
Thispatternmatcheselementssurroundedbythese
characteristicadjuncts. Itthusmatchesnucleiwhich
oftenappearwithadjuncts. Sinceasequenceof
ad-juncts(asanadjunctalone)cannotalonecomposea
completestructure,X onlymatcheselementswhich
correspondtoanucleus.
f
Thefunctionf
nu
isagooddiscriminationfunction
betweennuclei and adjunctsand providesverylow
values for adjuncts and veryhigh values for nuclei
(table1).
5.2.2 Categorisationof Adjuncts
Oncethenucleiareidentied,wecaneasilyndout
theadjuncts. Theycorrespondto allthe other
ele-ments which appear in the context: There are two
8
nu
POS 1759 0.00 no
PRP$ 1876 0.01 no
JJ 11097 0.02 no
DT 18097 0.03 no
RB 919 0.06 no
NNP 11046 0.74 yes
NN 21240 0.87 yes
NNPS 164 0.93 yes
NNS 7774 0.95 yes
WP 527 0.97 yes
PRP 3800 0.99 yes
Table1: DetectionofsomenucleioftheEnglishNP
(nounsandpronouns).
kindsof adjuncts: the left and the right adjuncts.
Thecontextsusedare:
[ NU ] fortheleftadjuncts
[ A
l
*NU ] fortherightadjuncts
Ifanelementappearsat theposition ofthe
under-score, it is categorised as adjunct. Once the left
adjunctsarefoundout,theycanbeusedforthe
cat-egorisationoftherightadjuncts. Theythus appear
in the context as optional elements(this is helpful
to capturecircumpositions).
SincetheAdjectivePhraseoccurringinsideanNP
is notmarkedin theUpenn treebank,weintroduce
the class of adjunct of adjunct. The contexts used
to ndouttheadjunctsoftheleftadjunct are:
[ A
NU ]fortherightadjuncts ofA
l
The contexts are similar for adjuncts of right
ad-juncts.
5.2.3 The Linkers
By denition, a linker connects two elements, and
appears between them. The contexts used to nd
linkersare:
[ NU NU ] linkerofnuclei
[ A ANU ] linkerofleftadjuncts
[ NUA A ] linkerofrightadjuncts
Elements which occurin these contexts but which
havealreadybeencategorisedasnucleusoradjuncts
are deletedfromthelist.
5.2.4 The Break Property
Asequenceofseveralnuclei(andtheadjunctswhich
depend on them) canbelong to aunique structure
orcomposeseveraladjacentstructures. An element
isabreakerifitspresenceintroducesabreakintoa
sequence ofadjacentnuclei. Forexample,the
singlestructureinthetrainingcorpus.
...[the/DT coming/VBG week/NN]
[the/DT foreign/JJ exchange/NN
mar-ket/NN] ...
ThetagDT introducesabreakonitsleft,butsome
tagscanintroduceabreakontheirrightorontheir
left and right. Forinstance, thetagWDT (NU by
default) introduces a break on its left and on its
right. In other words, this tag can not belong to
thesamestructureastheprecedingadjacentnucleus
andto thesamestructureasthefollowingadjacent
nucleus.
...[ railroads/NNS and/CC trucking/NN
companies/NNS ] [ that/WDT ]
be-gan/VBDin/IN[1980/CD]...
...in/IN[which/WDT][people/NNS]
generally/RBare/VBP...
Inordertodetectwhichtaghasthebreakproperty,
webuilduptwofunctionsf
bleft andf
bright .
f
bleft (X)=
P
C
NU ] [X
P
C
wb NU X
f
bright (X)=
P
C
X ][NU
P
C
wb XNU
NU:fnucl eig
C
wb
:corpuswithoutbrackets
Thesefunctionsareusedtocomputethebreak
prop-erty for nuclei, but also for adjuncts (In this case,
the pattern X is completed by adding the element
NU to the left or to the right of X (the potential
adjunct) according to the kind of adjunct (left or
rightadjunct)). Thetable 2showssome valuesfor
sometags. Anelementcanbealeftbreaker(DT),a
rightbreaker(noexampleforEnglishNPatthetag
level),orboth(PRP).Thebreak propertyis
gener-allywell-markedandthethresholdiseasytosetup
(0.66in practice).
TAG f
bleft f
bright
DT 0.97(yes) 0.00(no)
PRP 0.97(yes) 0.68(yes)
POS 0.95(yes) 0.00(no)
PRP$ 0.94(yes) 0.00(no)
JJ 0.44(no) 0.00(no)
NN 0.04(no) 0.11(no)
NNS 0.03(no) 0.14(no)
Table2: Breakerdetermination. Valuesofthe
func-tionsf
bleft andf
bright
forsomeelements.
In therenement step(Section 5.3), thebreaker
itstag(NN)isnot.
5.3 The RenementStep
5.3.1 The notion ofreliability
Thepreceding functions identifythecategoryofan
elementwhenitoccurs inthe structure. Butan
ele-mentcanoccurinthestructureaswellasoutofthe
structure. Forexample,thetagVBGisonly
consid-eredasadjunctwhenitoccursinthestructure.
Nev-ertheless,itmainlyoccursoutofthestructure(84%
ofitsoccurrences). Ifanelementmainly 9
occursout
ofthestructure,itisconsideredasnon-reliable and
itsdefaultcategoryisOUT.Foreachelement
occur-ring inside the structure, its reliable is tested. The
initialgrammar correspondsto thegrammarwhich
onlycontainsthereliableelements. Itsprecisionand
itsrecallarearound86%.
How is determined the reliability of an element?
Thisnotionofreliableelementiscontextualand
de-pendsonthecategoryoftheelement.
Forthenuclei,thecontextisempty. Wejust
com-putetheratiobetweenthenumberofoccurrencesin
thestructureoverthenumberofoccurrences
occur-ringoutsideofthestructure.
For the adjuncts, the context includes an
adja-centnucleus(ontherightforleftadjunctsoronthe
left forrightadjuncts). Forinstance, thetag JJ is
categorisedasleftadjunctfortheEnglishNP.It
ap-pears 9617 times before a nucleus and 9489 times
in the structure. It is thus considered as reliable,
anditsdefault categoryisleft adjunct. Inthecase
wherethetagJJoccurswithoutnucleusonitsright
(a predicative use), it is not considered as adjunct
and this kind of occurrences is not used to
deter-minethereliabilityoftheelement. Onthecontrary,
the tag VBG appears 468 times before a nucleus,
but,in this context,itoccurs only138timesin the
structure. Thisisnotenough(29%)toconsiderthe
element as areliable left adjunct, and thus its
de-fault categoryis OUT.Fortheadjunct of adjunct,
thecontextincludes adjunctandnucleus.
5.3.2 Detection oferrors
Oncetheinitialgrammarisbuiltup,itserrorshave
to be corrected. Thedetection of the errors
corre-spondstoamiscategorisationofatag. Anautomatic
errordonebytheinitialgrammaristowrongly
anal-yse the structures composed with non-reliable
ele-ments(falsenegativeexamples). Each time that a
non-reliableelement occurs in the structure
corre-spondstoanerror. Forinstance,theinitialgrammar
cannotcorrectlyrecognisethefollowingsequenceas
anNP, the default categoryof the tagVBG being
OUT(outsidethestructure):
...[the/DTcoming/VBGweek/NN]...
The second kind of errors corresponds to
se-quenceswronglyrecognisedasstructures(false
pos-itiveexamples). Thiskind of erroris generated by
reliableelementswhichexceptionallydonotoccurin
the structure. In the following example, order/NN
occursoutsideofthestructure,althoughthedefault
categoryof the tag NN is NU (nucleus), and thus
theinitialgrammarrecognisesanNP.
...in/INorder/NNto/TO pay/VB...
5.3.3 Correction
Inbothkindsof errors,the sametechniqueisused
to correct them. For this purpose, ALLiS disposes
oftwooperators,thecontextualisation andthe
lex-icalisation.
Thecontextualisation consistsofnding out
con-texts in order to x the errors. Theideais to add
constraints for recategorising non-reliable elements
asreliable 10
. Thepresenceofsomespecicelements
cancompletelychangethebehaviourofatag. The
table3showsthelistofcontextswherethetagVBN
isnotcategorisedasOUTbutas leftAdjunct.
PRP$ VBG NU
IN[ VBG NU
DT VBG NU
JJ VBG NU
POS VBG NU
VBG[ VBG NU
Table3: Somecontextswhere the non-reliable
ele-mentVBN becomesreliable.
For each tag occurring in the structure, all the
possible contexts 11
are generated. For the
non-reliable tags (rst kind of error), we evaluate the
reliability of them contextually, and we delete the
contexts in which the tag is still non-reliable (the
list of contexts canbe empty, and in this case the
errorcannotbexed). Forthereliabletags(second
kindoferror),wekeepthecontextsinwhichthetag
iscategorisedOUT.
The lexicalisation consists of introducing lexical
information: theword level. Some words canhave
a specic behaviour which does not appear at the
Part-Of-Speech(POS)level. Forinstance,theword
yesterday isaleftbreaker,behaviourwhichcannot
begured outat thePOS level (Table4). The
in-troduction of the lexicalisationimprovesthe result
by2%(Section6).
The lexicalisation and the contextualisation can
becombinedwhenbothseparatelyarenotpowerful
enoughtoxtheerror. Forexample,thewordabout
taggedRB (defaultcategoryofRB:OUT) followed
10
Thesametechniqueisusedin(Sima'an,1997).
11
Thecontextsdependonthe categoryofthetag,butare
about/RB( CD) OUT A
l,B+
order(IN TO) NU OUT
yesterday/NN NU
B-NU
B+
operating/VBG OUT A
l,B-last/JJ A
l,B-A
l,B+
Table4: Somespecic lexicalbehaviours.
by the tag CD is recategorised asleft adjunct and
leftbreaker(Table4).
6 Evaluation
Wenowshow someresultsand givesome
compar-isons with other works (Table 5). The results are
quite similar to other approaches. Two rates are
measured: precisionanrecall.
R=
Numberof correctproposed patterns
Numberof correct patterns
P=
Numberof correctproposed patterns
Number of proposedpatterns
The training data are composed of the sections
15-18oftheWallStreetJournalCorpus(Marcuset
al., 1993), and we use the section 20 for the test
corpus 12
. The data is tagged with the Brill
tag-ger. TheworksgeneratingsymbolicruleslikeALLiS
are(RamshawandMarcus,1995)(T
ransformation-Based learning) and (Cardie and Pierce, 1998)
(error-driven pruningof treebank grammars).
AL-LiS provides better results than them. (Argamon
etal.,1998)useaMemory-BasedShallowLearning
system, (Tjong Kim Sangand Veenstra, 1999) the
Memory-BasedLearningmethodand(Mu~nozetal.,
1999)usesanetwork oflinearfunctions. Thelatter
work seems to integrate better lexical information
sinceALLiSgetsbetterresultswithPOSonly.
POSonly withwords
NP MPRZ99 90.3/90.9 92.40/93.10
ALLiS 91.0/91.2 92.56/92.36
TV99 92.50/92.25
ADK98 91.6/91.6
RM95 90.7/90.5 92.3/91.8
CP98 91.1/90.7
VP ALLiS 91.39/90.52 92.15/91.95
Table 5: Resultsfor NP and VP structures
(preci-sion/recall).
The main errors done by ALLiS are due to
er-rorsof tagging (the corpusis tagged with theBrill
tagger) or errors in the bracketing of the training
corpus. Then, the second type of errors concerns
12
correspondto51%oftheoverallerrors. Wecannd
the same typology in other works (Ramshaw and
Marcus, 1995), (Cardieand Pierce, 1998). We did
some tries in order to manually improve the nal
grammar,but theonlytypeof errorswhich canbe
manuallyimprovedconcernstheproblemofthe
quo-tationmarks(the improvementisaboutof0.2%in
precisionandrecall).
Errortypes # %
tagging/bracketing errors 57 28.5%
coordination 45 22.5%
gerund 15 7.5%
adverb 13 6.5%
appositives 13 6.5%
quotationmarks,punctuation 10 5%
pastparticiple 9 4.5%
that (IN) 6 3%
Table6: Typologyofthe200rsterrors.
References
Andreas Abecker and Klaus Schmid. 1996. From
theory renement to kb maintenance: aposition
statement. InECAI'96,Budapest,Hungary.
StevenAbney. 1996. Partialparsingvianite-state
cascades. InProceedingsoftheESSLLI'96Robust
ParsingWorkshop.
Shlomo Argamon, Ido Dagan, and Yuval
Kry-molowski. 1998. A memory-based approach to
learning shallow natural language patterns. In
COLING'98,Montreal.
RobertC.Berwick. 1985. Theacquisitionof
syntac-tic knowledge. MITpress,Cambridge.
Eric Brill. 1993. A Corpus-Based Approach to
Language Learning. Ph.D. thesis,Departmentof
ComputerandInformationScience,Universityof
Pennsylvania.
Cliord Alan Brunk and Michael Pazzani. 1995.
A lexically based semantics bias for theory
revi-sion. InMorganKauman,editor,Twelfth
Inter-national Conference on Machine Learning, pages
81{89.
Cliord Alan Brunk. 1996. An investigation
of Knowledge Intensive Approaches to Concept
Learning and Theory Renement. Ph.D. thesis,
UniversityofCalifornia,Irvine.
SabineBuchholz, JornVeenstra,andWalter
Daele-mans. 1999. Cascaded grammatical relation
as-signment. In Proceedings of EMNLP/VLC-99,
pagespp.239{246,UniversityofMaryland,USA.
ClaireCardieandDavidPierce. 1998. Error-driven
pruning of treebank grammars for base noun
guistics (COLING-ACL '98).
Claire Cardie and David Pierce. 1999. The role of
lexicalization and pruning for base noun phrase
grammars. In Proceedings of the Sixteenth
Na-tional Conferenceon Articial Intelligence.
SusanCrawandD.Sleeman. 1990. Automatingthe
renement of knowledge-based systems. In
Pro-ceedings of the ECAI'90 Conference, pages 167{
172.
The XTAG Research Group. 1998. A lexicalized
treeadjoininggrammarforenglish. Technical
Re-portIRCS98-18,UniversityofPennsylvania.
Lauri Karttunen, Tamas Gal, and Andre Kempe.
1997. Xerox nite-state tool. Technical report,
XeroxResearchCentre Europe,Grenoble.
MitchellMarcus, BeatriceSantorini,andMarcAnn
Marcinkiewicz. 1993. Buildingalargeannotated
corpus of english: thepenn treebank.
Computa-tional Linguistics,19(2):313{330.
Raymond J.Mooney. 1993. Induction overthe
un-explained: Using overly-general domain theories
toaidconceptlearning. MachineLearning,10:79.
Raymond J. Mooney. 1997. Inductive logic
pro-gramming for natural language processing. In
Sixth International Inductive Logic Programming
Workshop,pages205{224,Stockholm,Sweden.
Marcia Mu~noz,Vasin Punyakanok, Dan Roth, and
DavZimak. 1999. Alearningapproachtoshallow
parsing. InProceedings of EMNLP-WVLC'99.
DirkOurstonandRaymondMooney. 1990.
Chang-ingtherules: Acomprehensiveapproachtotheory
renement. InProceedings of the Eight National
Conference on Articial Intelligence, pages 815{
820.
LanceA. Ramshawand Mitchell P. Marcus. 1995.
Text chunking using transformation-based
learn-ing. InACLThirdWorkshop onVeryLarge
Cor-pora,pages82{94.
GeoreySampson. 1995. Englishfor theComputer.
The SUSANNE Corpus and Analytic Scheme.
Oxford: ClarendonPress.
Khalil Sima'an. 1997. Explanation-based learning
of dataorientedparsing. In T.MarkEllison,
ed-itor, Computational Natural Language Learning
(CoNLL), ACL/EACL-97,Madrid.
Erik Tjong Kim Sang and Jorn Veenstra. 1999.
Representing text chunks. In Proceedings of
EACL'99, Association for Computational
Lin-guistics, Bergen.
JacquesVergne and Emmanuel Giguet. 1998.
Re-gards theoriques sur le "tagging". In
proceed-ings of TraitementAutomatiquedesLangues