Theory Refinement and Natural Language Learning

Herve Dejean

Seminar für Sprachwissenschaft

Universität Tübingen

[email protected] ing en.d e

Abstract

This paper presents a learning system for identifying syntactic structures. This system relies on the use of background knowledge and default values in order to build up an initial grammar, and on the use of theory refinement in order to improve this grammar. This combination provides a good machine learning framework for Natural Language Learning. We illustrate this point with the presentation of ALLiS, a learning system which generates a regular expression grammar of non-recursive phrases from bracketed corpora.

1 Introduction

Applying Machine Learning techniques to Natural Language Processing is a booming domain of research. One of the reasons is the development of corpora with morpho-syntactic and syntactic annotation (Marcus et al., 1993), (Sampson, 1995). One recent popular subtask is the learning of non-recursive Noun Phrases (NP) (Ramshaw and Marcus, 1995), (Tjong Kim Sang and Veenstra, 1999), (Muñoz et al., 1999), (Group, 1998), (Cardie and Pierce, 1999), (Buchholz et al., 1999).

While other learning techniques (symbolic or statistical) are widely used in Natural Language Learning, theory refinement (Abecker and Schmid, 1996), (Mooney, 1993) seems to be ignored (except in (Brunk and Pazzani, 1995)). Theory refinement consists of improving an existing knowledge base so that it better accords with data. No work applying theory refinement to the grammar learning paradigm seems to have been developed. In this article, we would like to point out how well theory refinement fits Natural Language Learning.

To illustrate this claim, we present ALLiS (Architecture for Learning Linguistic Structures), a learning system which uses theory refinement in order to learn non-recursive noun phrases (also called base noun phrases) and non-recursive verbal phrases. We will show that this technique, combined with the use of default values, provides a good architecture to learn natural language structures.

This research is funded by the TMR network Learning

This article is organised as follows: Section 2 gives an overview of theory refinement. Section 3 explains the advantage of combining default values and theory refinement to build a learning system. Section 4 describes the general characteristics of ALLiS, and Section 5 explains the learning algorithm. The evaluation of ALLiS is described in Section 6. The examples which illustrate this article correspond to English NPs.

2 Theory Refinement

We present here a brief introduction to theory refinement. For a more detailed presentation, we refer the reader to (Abecker and Schmid, 1996), (Brunk, 1996) or (Ourston and Mooney, 1990). (Mooney, 1993) defines it as:

Theory refinement systems developed in Machine Learning automatically modify a Knowledge Base to render it consistent with a set of classified training examples.

This technique thus consists of improving a given Knowledge Base (here a grammar) on the basis of examples (here a treebank). Some approaches require that the initial knowledge base be modified as little as possible. Applied in conjunction with existing learning techniques (Explanation-Based Learning, Inductive Logic Programming), TR seems to achieve better results than these techniques used alone (Mooney, 1997). Theory refinement is mainly used (and has its origin) in Knowledge Based Systems (KBS) (Craw and Sleeman, 1990). It consists of two main steps:

1. Build a more or less correct grammar on the basis of background knowledge.

2. Refine this grammar using training examples:

(a) Identify the revision points

(b) Correct them

The first step consists of acquiring an initial grammar. The second step (the refinement) compares the prediction of the initial grammar with the training corpus in order, firstly, to identify the revision points, i.e. points that are not correctly described by the grammar, and secondly, to correct these revision points. The error identification and refinement operations are explained in Section 5.3.

The main difference between a TR system and other symbolic learning systems is that a TR system must be able to revise existing rules given to the system as background knowledge. (A system such as TBL (Brill, 1993) cannot be considered as TR since it only acquires new rules.) In the case of other techniques, new rules are learned in order to improve the general efficiency of the system (selection of the "best rule" according to a preference function) and not in order to correct a specific rule.

3 Theory Refinement, Default Values and Natural Language Learning

This section explains how default values combined with theory refinement can provide a good machine learning framework for NLP.

3.1 The Use of Default Values

The use of default values is not new in NLP (Brill, 1993), (Vergne and Giguet, 1998). We can observe that often (but not necessarily) in a language, an element belongs to a predominant class (Vergne and Giguet, 1998). Some systems, such as stochastic models, use this property implicitly. Some others use it explicitly. For instance, the general principle of Transformation-Based Learning (Brill, 1993) is to assign to each element its most frequent category, and then to learn transformation rules which correct this initial categorisation. A second example is the parser described in (Vergne and Giguet, 1998). They first assign to each grammatical word a default category (default tag), and then might modify it thanks to local contexts and grammatical relation assignment (in order to deal with constraints due to long-distance relations which cannot be expressed by local contexts).

The main work is done by the lexicon and by default values (even if further operations are obviously necessary).

These approaches are thus different from the disambiguation often used in tagging. The default rules are not numerous (one per tag) and are easy to generate automatically, but they nevertheless produce a satisfactory starting level.

3.2 The Combination of Default Values with TR

The idea on which ALLiS relies is the following: a first "naive grammar" is built up using default values, which assign to each element its default category (the algorithm is explained in Section 5.2). The rules learned are categorisation rules: assign a category to an element (a tag or a word). Since an element is automatically assigned to its default category, the system does not have to learn the categorisation rules for this category, and just learns categorisation rules which correspond to cases in which the element does not belong to its default category. This minimises the number of rules that have to be learned. Suppose the element e can belong to several categories (a frequent case). The first rule learned is the "default" rule: assign(e, dc), where dc is the default category of e. Then ALLiS just learns rules for cases where e does not belong to its default category. The numerous rules concerning the default category are replaced by the simple default rule.
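The interplay between the default rule assign(e, dc) and the learned exception rules can be illustrated with a minimal sketch. It is not the ALLiS implementation: the class name `Categoriser`, the label `AL` for a left adjunct, the context string `"DT _ NU"` and the fallback to `OUT` for unknown elements are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Categoriser:
    """One default rule per element, plus exception rules that override it."""
    default: dict                                   # element -> default category
    exceptions: dict = field(default_factory=dict)  # (element, context) -> category

    def assign(self, element, context=""):
        # A matching exception rule takes precedence over the default rule.
        if (element, context) in self.exceptions:
            return self.exceptions[(element, context)]
        # Elements without any rule are treated as outside the structure (assumption).
        return self.default.get(element, "OUT")


cat = Categoriser(default={"NN": "NU", "JJ": "AL", "VBG": "OUT"})
cat.exceptions[("VBG", "DT _ NU")] = "AL"  # hypothetical learned exception

print(cat.assign("NN"))
print(cat.assign("VBG"))
print(cat.assign("VBG", "DT _ NU"))
```

Only the exceptions need to be stored explicitly, which is the economy of rules the section argues for.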

4 ALLiS

The goal of ALLiS¹ is to automatically build a regular expression grammar from a bracketed and tagged corpus². In this training data, only the structures we want to learn are marked at their boundaries by square brackets³. The following sentence shows an example of the training corpus for the NP structure (only base-NPs occur inside brackets).

In/IN [ early/JJ trading/NN ] in/IN [ Hong/NNP Kong/NNP ] [ Monday/NNP ]
,/, [ gold/NN ] was/VBD quoted/VBN at/IN [ $/$ 366.50/CD ]
[ an/DT ounce/NN ] ./.
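To make the training format concrete, the following sketch reads one line of such a corpus into (word, tag, inside) triples. The function name `parse_bracketed` is hypothetical, and the sketch assumes that brackets and word/tag pairs are separated by whitespace, as in the example sentence.

```python
def parse_bracketed(line):
    """Return (word, tag, inside_structure) triples for one corpus line."""
    tokens, inside = [], False
    for item in line.split():
        if item == "[":
            inside = True
        elif item == "]":
            inside = False
        else:
            # Split on the last slash so that tokens such as "./." are handled.
            word, _, tag = item.rpartition("/")
            tokens.append((word, tag, inside))
    return tokens


sentence = ("In/IN [ early/JJ trading/NN ] in/IN [ Hong/NNP Kong/NNP ] "
            "[ Monday/NNP ] ,/, [ gold/NN ] was/VBD quoted/VBN at/IN "
            "[ $/$ 366.50/CD ] [ an/DT ounce/NN ] ./.")
tokens = parse_bracketed(sentence)
print(tokens[:3])
```

The boolean flag is all the supervision the corpus provides; the distributional categories discussed in Section 5 are not annotated and have to be inferred.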

ALLiS uses an internal formalism in order to represent the grammar rules. In order to parse a text, a module converts this formalism into a regular expression grammar which can be used by a parser using such a representation (two modules exist: one for the CASS parser (Abney, 1996) and one for XFST (Karttunen et al., 1997)).

Following the principle of theory refinement, the learning task is composed of two steps. The first step is the generation of the initial grammar. This grammar is generated from examples and background knowledge (Section 5). This initial grammar provides an incomplete and/or incorrect analysis of the data. The second step is the refinement of this grammar (Section 5.3). During this step, the validity of the grammar rules is checked and the rules are improved (refined) if necessary. This improvement corresponds to finding contexts in which elements which are considered to be members of the structure do not belong to this structure (and reciprocally).

¹ http://www.sfb441.uni-tuebingen.de/~dejean/chunker.html

² The WSJ corpus (Marcus et al., 1993).

³ (Muñoz et al., 1999) showed that this representation tends to provide better results than the representation used

We give here a simple example to illustrate the learning process. The first step (initial grammar generation) categorises the tag JJ (adjective) as belonging by default to the NP structure if it occurs before a noun. The second step (refinement) finds out that some adjectives do not obey this rule⁴. The refinement is triggered in order to modify the default rule so that these exceptions can be correctly processed.

Thus, the learning algorithm simply consists of categorising the elements of the corpus (tags and words) into specific categories, and this categorisation allows the extraction of the structures we want to learn. These categories are explained in the next section.

5 The Learning System

5.1 The Background Knowledge

In order to ease the learning, the system uses background knowledge. This knowledge provides a formal and general description of the structures that ALLiS can learn. We suppose that the structures are composed of a nucleus with optional left and right adjuncts. We give here informal definitions; the formal/distributional ones are given in Section 5.2.

The nucleus is the head of the structure. We authorise the presence of several nuclei in the same structure.

All the other elements in the structure (except the linker) are considered as adjuncts. They are in a dependence relation with the head of the structure. The adjuncts are characterised by their position (left/right) relative to the nucleus.

A linker is a special element which builds an endocentric structure with two elements. It usually corresponds to coordination⁵.

An element (nucleus or adjunct) might possess the break property. This notion is introduced (as in (Ramshaw and Marcus, 1995), (Muñoz et al., 1999)) in order to deal with sequences where adjacent nuclei compose several structures (Section 5.2.4).

This pattern can be seen as a variant of the X-bar template (Head, Spec and Comp), which was already used in a learning system (Berwick, 1985) (although Comp is not useful for the non-recursive structures).

The possible different categories of an element are summarised in Figure 1.

The following sentence shows an example of categorisation (elements which do not appear in the structure (NP) are tagged O):

⁴ For example: [ the/DT 7/CD %/NN benchmark/NN issue/NN ] due/JJ [ October/NNP 1999/CD ].

⁵ Only linkers occurring between the two coordinated elements

Figure 1: The different categories.

The structure is formally defined as:

$$S \rightarrow A_l^{*}\ NU\ A_r^{*}$$

The symbol $*$ corresponds to the Kleene star. For the sake of legibility, we do not introduce linkers in this expression. But each symbol X (NU, A) can be defined by the rule⁶ $X \rightarrow X \mid X\ l\ X$, where l is the list of linkers. The symbols B+ and B- indicate whether the element has the breaker property or not.
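Since the target of ALLiS is a regular expression grammar, the general shape described in this section (a nucleus with optional left and right adjuncts) can be sketched directly as a regular expression over category sequences. This is an illustrative sketch only: the one-letter encoding (L, N, R, O), the use of `N+` to allow several adjacent nuclei, and the omission of linkers and of the break property are assumptions, not the grammar actually produced by ALLiS.

```python
import re

# One letter per element: L = left adjunct, N = nucleus,
# R = right adjunct, O = outside the structure (hypothetical encoding).
STRUCTURE = re.compile(r"L*N+R*")


def find_structures(categories):
    """Return the (start, end) spans of the structures in a category string."""
    return [m.span() for m in STRUCTURE.finditer(categories)]


spans = find_structures("OLNOLLNRO")
print(spans)
```

A sequence of adjuncts without a nucleus, such as `"LLO"`, yields no match, which mirrors the requirement that a structure contains at least one nucleus.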

Since the corpus does not contain information about these distributional categories, ALLiS has to figure them out. This categorisation relies on the distributional behaviour of the elements, and can be automatically achieved.

5.2 The Initial Categorisation

The general idea to categorise elements is to use specific contexts which point out some of the distributional properties of the category. The categorisation is a sequential process. First the nuclei have to be found. For each tag of the corpus, we apply the function f_nu (described below). This function selects a list of elements which are categorised as nuclei. The function f_b is applied to this list in order to figure out nuclei which are breakers. Then the adjuncts are found, and the function f_b is also applied to them to figure out breakers.

5.2.1 Categorisation of Nuclei

The context used to find out the nuclei relies on this simple observation: a structure requires at least one nucleus⁷. Thus, the elements that occur alone in a structure are assimilated to nuclei. For example, the tags PRP (pronouns) and NNP (proper nouns) may compose a structure alone (respectively 99% and 48% of the occurrences of these tags appear alone in an NP), but the tag JJ appears alone in only 0.9% of its occurrences. We deduce that PRP and NNP belong to the nuclei and JJ does not. But this criterion does not allow the identification of all the nuclei. Some often appear with adjuncts (an English noun (NN) often⁸ occurs with a determiner or an adjective and thus appears alone in only 13% of its occurrences). The single use of this characteristic provides a continuum of values where the automatic setup of a threshold between adjuncts and nuclei is problematic and depends on the structure. To solve this problem, the idea is to decompose the categorisation of nuclei into two steps. First we identify characteristic adjuncts. The adjuncts cannot appear alone since they depend on a nucleus. The function f_char is built so that it provides a very small value for adjuncts. If the value of an element is lower than a given threshold (0.05 for f_char), then it is categorised as a characteristic adjunct.

⁶ The regular expression formalism does not allow such a rule, and so this recursion is simulated.

⁷ The partial structures (structures without nucleus)

$$f_{char}(X) = \frac{\sum_{C} [\,X\,]}{\sum_{C} X}$$

$\sum_{C} P$ corresponds to the number of occurrences of the pattern P in the corpus C. For example, the number of occurrences in the training corpus of the pattern [JJ] is 99, and the number of occurrences of the pattern JJ (including the pattern [JJ]) is 11097. So f_char(JJ) = 0.009, a value low enough to consider JJ as a characteristic adjunct. The list provided by f_char for English NP is:

A_char = {DT, PRP$, POS, JJ, JJR, JJS, , }

These elements can correspond to left or right adjuncts. Not all the adjuncts are identified, but this list allows the identification of the nuclei, as explained in the next paragraph.

The second step consists of introducing these elements into a new pattern used by the function f_nu. This pattern matches elements surrounded by these characteristic adjuncts. It thus matches nuclei which often appear with adjuncts. Since a sequence of adjuncts (like an adjunct alone) cannot compose a complete structure by itself, X only matches elements which correspond to a nucleus.

$$f_{nu}(X) = \frac{\sum_{C} [\,A_{char}^{*}\ X\ A_{char}^{*}\,]}{\sum_{C} X}$$

The function f_nu is a good discrimination function between nuclei and adjuncts: it provides very low values for adjuncts and very high values for nuclei (Table 1).

TAG     Occurrences   f_nu   nucleus
POS     1759          0.00   no
PRP$    1876          0.01   no
JJ      11097         0.02   no
DT      18097         0.03   no
RB      919           0.06   no
NNP     11046         0.74   yes
NN      21240         0.87   yes
NNPS    164           0.93   yes
NNS     7774          0.95   yes
WP      527           0.97   yes
PRP     3800          0.99   yes

Table 1: Detection of some nuclei of the English NP (nouns and pronouns).

5.2.2 Categorisation of Adjuncts

Once the nuclei are identified, we can easily find out the adjuncts. They correspond to all the other elements which appear in the context. There are two kinds of adjuncts: the left and the right adjuncts. The contexts used are:

[ _ NU ]          for the left adjuncts
[ A_l* NU _ ]     for the right adjuncts

If an element appears at the position of the underscore, it is categorised as adjunct. Once the left adjuncts are found, they can be used for the categorisation of the right adjuncts. They thus appear in the context as optional elements (this is helpful to capture circumpositions).

Since the Adjective Phrase occurring inside an NP is not marked in the UPenn treebank, we introduce the class of adjunct of adjunct. The contexts used to find out the adjuncts of the left adjunct are:

[ A_l _ NU ]      for the right adjuncts of A_l

The contexts are similar for adjuncts of right adjuncts.

5.2.3 The Linkers

By definition, a linker connects two elements, and appears between them. The contexts used to find linkers are:

[ NU _ NU ]       linker of nuclei
[ A _ A NU ]      linker of left adjuncts
[ NU A _ A ]      linker of right adjuncts

Elements which occur in these contexts but which have already been categorised as nucleus or adjunct are deleted from the list.

5.2.4 The Break Property

A sequence of several nuclei (and the adjuncts which depend on them) can belong to a unique structure or compose several adjacent structures. An element is a breaker if its presence introduces a break into a sequence of adjacent nuclei. For example, the [...] single structure in the training corpus.

... [ the/DT coming/VBG week/NN ] [ the/DT foreign/JJ exchange/NN market/NN ] ...

The tag DT introduces a break on its left, but some tags can introduce a break on their right or on their left and right. For instance, the tag WDT (NU by default) introduces a break on its left and on its right. In other words, this tag cannot belong to the same structure as the preceding adjacent nucleus, nor to the same structure as the following adjacent nucleus.

... [ railroads/NNS and/CC trucking/NN companies/NNS ] [ that/WDT ] began/VBD in/IN [ 1980/CD ] ...

... in/IN [ which/WDT ] [ people/NNS ] generally/RB are/VBP ...

In order to detect which tags have the break property, we build two functions, f_bleft and f_bright:

$$f_{bleft}(X) = \frac{\sum_{C} NU\ ]\ [\ X}{\sum_{C_{wb}} NU\ X}$$

$$f_{bright}(X) = \frac{\sum_{C} X\ ]\ [\ NU}{\sum_{C_{wb}} X\ NU}$$

where NU = {nuclei} and C_wb is the corpus without brackets.

These functions are used to compute the break property for nuclei, but also for adjuncts (in this case, the pattern X is completed by adding the element NU to the left or to the right of X (the potential adjunct), according to the kind of adjunct (left or right adjunct)). Table 2 shows some values for some tags. An element can be a left breaker (DT), a right breaker (no example for English NP at the tag level), or both (PRP). The break property is generally well-marked and the threshold is easy to set up (0.66 in practice).
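As a rough sketch of the left-break ratio, the code below counts, for a tag X, how often a nucleus immediately followed by X is separated from it by a structure boundary (the pattern "NU ] [ X"), relative to all adjacent "NU X" pairs. The function name, the corpus encoding (each token as a (tag, chunk_id) pair, with `None` for tokens outside any structure) and the toy sentences are assumptions; the extension to adjuncts mentioned in the text is not covered.

```python
def f_bleft(sentences, nuclei, x):
    """Share of adjacent 'NU X' pairs in which X opens a new structure."""
    broken = adjacent = 0
    for sent in sentences:
        for (t1, c1), (t2, c2) in zip(sent, sent[1:]):
            if t1 in nuclei and t2 == x:
                adjacent += 1  # pattern "NU X" in the corpus without brackets
                if c1 is not None and c2 is not None and c1 != c2:
                    broken += 1  # pattern "NU ] [ X"
    return broken / adjacent if adjacent else 0.0


nuclei = {"NN", "NNS"}
sentences = [
    # [ the coming week ] [ the foreign exchange market ]
    [("DT", 0), ("VBG", 0), ("NN", 0), ("DT", 1), ("JJ", 1), ("NN", 1), ("NN", 1)],
    # [ trucking companies ] began
    [("NN", 0), ("NNS", 0), ("VBD", None)],
]

print(f_bleft(sentences, nuclei, "DT"))
print(f_bleft(sentences, nuclei, "NNS"))
```

With the 0.66 threshold quoted above, DT would be a left breaker on this toy data and NNS would not; f_bright is obtained symmetrically by swapping the roles of the two adjacent tokens.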

TAG     f_bleft       f_bright
DT      0.97 (yes)    0.00 (no)
PRP     0.97 (yes)    0.68 (yes)
POS     0.95 (yes)    0.00 (no)
PRP$    0.94 (yes)    0.00 (no)
JJ      0.44 (no)     0.00 (no)
NN      0.04 (no)     0.11 (no)
NNS     0.03 (no)     0.14 (no)

Table 2: Breaker determination. Values of the functions f_bleft and f_bright for some elements.

In the refinement step (Section 5.3), the breaker [...] its tag (NN) is not.

5.3 The Refinement Step

5.3.1 The notion of reliability

The preceding functions identify the category of an element when it occurs in the structure. But an element can occur in the structure as well as outside the structure. For example, the tag VBG is only considered as an adjunct when it occurs in the structure. Nevertheless, it mainly occurs outside the structure (84% of its occurrences). If an element mainly⁹ occurs outside the structure, it is considered as non-reliable and its default category is OUT. For each element occurring inside the structure, its reliability is tested. The initial grammar corresponds to the grammar which only contains the reliable elements. Its precision and its recall are around 86%.

How is the reliability of an element determined? This notion of reliable element is contextual and depends on the category of the element.

For the nuclei, the context is empty. We just compute the ratio of the number of occurrences in the structure to the number of occurrences outside of the structure.

For the adjuncts, the context includes an adjacent nucleus (on the right for left adjuncts or on the left for right adjuncts). For instance, the tag JJ is categorised as left adjunct for the English NP. It appears 9617 times before a nucleus, of which 9489 times in the structure. It is thus considered as reliable, and its default category is left adjunct. In the case where the tag JJ occurs without a nucleus on its right (a predicative use), it is not considered as an adjunct, and this kind of occurrence is not used to determine the reliability of the element. On the contrary, the tag VBG appears 468 times before a nucleus but, in this context, it occurs only 138 times in the structure. This is not enough (29%) to consider the element as a reliable left adjunct, and thus its default category is OUT. For the adjunct of adjunct, the context includes adjunct and nucleus.
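The reliability test for adjuncts can be checked against the figures quoted above with a short sketch. The function names, the label `AL` for a left adjunct and the 0.5 threshold are assumptions: the text only says that an element that "mainly" occurs outside the structure is non-reliable, without giving the exact cut-off.

```python
def reliability(in_structure, in_context):
    """Share of the occurrences in the relevant context that lie inside the structure."""
    return in_structure / in_context if in_context else 0.0


def default_category(category, in_structure, in_context, threshold=0.5):
    """Keep the category if the element is reliable, otherwise fall back to OUT."""
    if reliability(in_structure, in_context) > threshold:
        return category
    return "OUT"


# Figures from the text: occurrences before a nucleus / of which inside the NP.
print(round(reliability(9489, 9617), 3), default_category("AL", 9489, 9617))  # JJ
print(round(reliability(138, 468), 2), default_category("AL", 138, 468))      # VBG
```

JJ stays a left adjunct by default, while VBG is sent to OUT and left for the refinement step to recover in specific contexts.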

5.3.2 Detection of errors

Once the initial grammar is built up, its errors have to be corrected. An error corresponds to the miscategorisation of a tag. An automatic error made by the initial grammar is to wrongly analyse the structures composed with non-reliable elements (false negative examples). Each occurrence of a non-reliable element in the structure corresponds to an error. For instance, the initial grammar cannot correctly recognise the following sequence as an NP, the default category of the tag VBG being OUT (outside the structure):

... [ the/DT coming/VBG week/NN ] ...

The second kind of errors corresponds to sequences wrongly recognised as structures (false positive examples). This kind of error is generated by reliable elements which exceptionally do not occur in the structure. In the following example, order/NN occurs outside of the structure, although the default category of the tag NN is NU (nucleus), and thus the initial grammar recognises an NP.

... in/IN order/NN to/TO pay/VB ...

5.3.3 Correction

For both kinds of errors, the same technique is used to correct them. For this purpose, ALLiS has two operators at its disposal: contextualisation and lexicalisation.

Contextualisation consists of finding contexts in order to fix the errors. The idea is to add constraints for recategorising non-reliable elements as reliable¹⁰. The presence of some specific elements can completely change the behaviour of a tag. Table 3 shows the list of contexts where the tag VBG is not categorised as OUT but as left adjunct.

PRP$      VBG NU
IN [      VBG NU
DT        VBG NU
JJ        VBG NU
POS       VBG NU
VBG [     VBG NU

Table 3: Some contexts where the non-reliable element VBG becomes reliable.

For each tag occurring in the structure, all the possible contexts¹¹ are generated. For the non-reliable tags (first kind of error), we evaluate their reliability contextually, and we delete the contexts in which the tag is still non-reliable (the list of contexts can be empty, and in this case the error cannot be fixed). For the reliable tags (second kind of error), we keep the contexts in which the tag is categorised OUT.
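For the first kind of error, the filtering of contexts can be sketched as follows: reliability is recomputed per context, and only the contexts in which the tag becomes reliable are kept. The function name, the context strings, the toy counts and the 0.5 threshold are all hypothetical; how ALLiS actually generates the candidate contexts is not shown here.

```python
from collections import defaultdict


def reliable_contexts(occurrences, threshold=0.5):
    """Keep the contexts in which a tag mostly occurs inside the structure.

    occurrences: iterable of (context, inside) pairs for a single tag,
    where inside is True when that occurrence lies inside a structure.
    """
    inside = defaultdict(int)
    total = defaultdict(int)
    for context, is_inside in occurrences:
        total[context] += 1
        inside[context] += int(is_inside)
    return sorted(c for c in total if inside[c] / total[c] > threshold)


# Toy observations for a non-reliable tag such as VBG (hypothetical counts).
vbg = ([("DT _ NU", True)] * 3 + [("DT _ NU", False)]
       + [("IN _ NU", True)] + [("IN _ NU", False)] * 2)
kept = reliable_contexts(vbg)
print(kept)
```

An empty result corresponds to the case mentioned above where no context makes the tag reliable and the error cannot be fixed by contextualisation alone.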

Lexicalisation consists of introducing lexical information: the word level. Some words can have a specific behaviour which does not appear at the Part-Of-Speech (POS) level. For instance, the word yesterday is a left breaker, a behaviour which cannot be figured out at the POS level (Table 4). The introduction of lexicalisation improves the result by 2% (Section 6).

Lexicalisation and contextualisation can be combined when neither is powerful enough on its own to fix the error. For example, the word about tagged RB (default category of RB: OUT) followed by the tag CD is recategorised as left adjunct and left breaker (Table 4).

Element (context)      Default category   Category of the element
about/RB (_ CD)        OUT                A_l, B+
order (IN _ TO)        NU                 OUT
yesterday/NN           NU, B-             NU, B+
operating/VBG          OUT                A_l, B-
last/JJ                A_l, B-            A_l, B+

Table 4: Some specific lexical behaviours.

¹⁰ The same technique is used in (Sima'an, 1997).

¹¹ The contexts depend on the category of the tag, but are

6 Evaluation

We now show some results and give some comparisons with other works (Table 5). The results are quite similar to other approaches. Two rates are measured: precision and recall.

$$R = \frac{\text{Number of correct proposed patterns}}{\text{Number of correct patterns}}$$

$$P = \frac{\text{Number of correct proposed patterns}}{\text{Number of proposed patterns}}$$
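The two rates can be computed as in the sketch below, where each pattern is represented by its (start, end) token span and a proposed pattern counts as correct only if it matches a reference pattern exactly. The span representation and the toy data are assumptions made for illustration.

```python
def precision_recall(proposed, gold):
    """Precision and recall of proposed patterns against the correct ones."""
    correct = len(proposed & gold)
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy spans (hypothetical): one proposed chunk merges two reference chunks.
gold = {(1, 3), (4, 6), (6, 7), (8, 9)}
proposed = {(1, 3), (4, 7), (8, 9)}

p, r = precision_recall(proposed, gold)
print(round(p, 3), round(r, 3))
```

Here a single boundary mistake costs one proposed pattern in precision and two reference patterns in recall, which is why break detection matters for both rates.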

The training data are composed of sections 15-18 of the Wall Street Journal Corpus (Marcus et al., 1993), and we use section 20 for the test corpus¹². The data is tagged with the Brill tagger. The works generating symbolic rules like ALLiS are (Ramshaw and Marcus, 1995) (Transformation-Based Learning) and (Cardie and Pierce, 1998) (error-driven pruning of treebank grammars). ALLiS provides better results than them. (Argamon et al., 1998) use a Memory-Based Shallow Learning system, (Tjong Kim Sang and Veenstra, 1999) the Memory-Based Learning method, and (Muñoz et al., 1999) use a network of linear functions. The latter work seems to integrate lexical information better, since ALLiS gets better results with POS only.

POS only   with words

NP   MPRZ99   90.3/90.9   92.40/93.10
NP   ALLiS   91.0/91.2   92.56/92.36
NP   TV99   92.50/92.25
NP   ADK98   91.6/91.6
NP   RM95   90.7/90.5   92.3/91.8
NP   CP98   91.1/90.7
VP   ALLiS   91.39/90.52   92.15/91.95

Table 5: Results for NP and VP structures (precision/recall).

The main errors made by ALLiS are due to tagging errors (the corpus is tagged with the Brill tagger) or errors in the bracketing of the training corpus. Then, the second type of errors concerns coordination. Together, these two types correspond to 51% of the overall errors. We can find the same typology in other works (Ramshaw and Marcus, 1995), (Cardie and Pierce, 1998). We made some attempts to manually improve the final grammar, but the only type of error which can be manually improved concerns the problem of the quotation marks (the improvement is about 0.2% in precision and recall).

Error types                     #     %
tagging/bracketing errors       57    28.5%
coordination                    45    22.5%
gerund                          15    7.5%
adverb                          13    6.5%
appositives                     13    6.5%
quotation marks, punctuation    10    5%
past participle                 9     4.5%
that (IN)                       6     3%

Table 6: Typology of the first 200 errors.

References

Andreas Abecker and Klaus Schmid. 1996. From theory refinement to KB maintenance: a position statement. In ECAI'96, Budapest, Hungary.

Steven Abney. 1996. Partial parsing via finite-state cascades. In Proceedings of the ESSLLI'96 Robust Parsing Workshop.

Shlomo Argamon, Ido Dagan, and Yuval Krymolowski. 1998. A memory-based approach to learning shallow natural language patterns. In COLING'98, Montreal.

Robert C. Berwick. 1985. The acquisition of syntactic knowledge. MIT Press, Cambridge.

Eric Brill. 1993. A Corpus-Based Approach to Language Learning. Ph.D. thesis, Department of Computer and Information Science, University of Pennsylvania.

Clifford Alan Brunk and Michael Pazzani. 1995. A lexically based semantics bias for theory revision. In Morgan Kaufmann, editor, Twelfth International Conference on Machine Learning, pages 81–89.

Clifford Alan Brunk. 1996. An investigation of Knowledge Intensive Approaches to Concept Learning and Theory Refinement. Ph.D. thesis, University of California, Irvine.

Sabine Buchholz, Jorn Veenstra, and Walter Daelemans. 1999. Cascaded grammatical relation assignment. In Proceedings of EMNLP/VLC-99, pages 239–246, University of Maryland, USA.

Claire Cardie and David Pierce. 1998. Error-driven pruning of treebank grammars for base noun [...] Linguistics (COLING-ACL '98).

Claire Cardie and David Pierce. 1999. The role of lexicalization and pruning for base noun phrase grammars. In Proceedings of the Sixteenth National Conference on Artificial Intelligence.

Susan Craw and D. Sleeman. 1990. Automating the refinement of knowledge-based systems. In Proceedings of the ECAI'90 Conference, pages 167–172.

The XTAG Research Group. 1998. A lexicalized tree adjoining grammar for English. Technical Report IRCS 98-18, University of Pennsylvania.

Lauri Karttunen, Tamas Gal, and Andre Kempe. 1997. Xerox finite-state tool. Technical report, Xerox Research Centre Europe, Grenoble.

Mitchell Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313–330.

Raymond J. Mooney. 1993. Induction over the unexplained: Using overly-general domain theories to aid concept learning. Machine Learning, 10:79.

Raymond J. Mooney. 1997. Inductive logic programming for natural language processing. In Sixth International Inductive Logic Programming Workshop, pages 205–224, Stockholm, Sweden.

Marcia Muñoz, Vasin Punyakanok, Dan Roth, and Dav Zimak. 1999. A learning approach to shallow parsing. In Proceedings of EMNLP-WVLC'99.

Dirk Ourston and Raymond Mooney. 1990. Changing the rules: A comprehensive approach to theory refinement. In Proceedings of the Eighth National Conference on Artificial Intelligence, pages 815–820.

Lance A. Ramshaw and Mitchell P. Marcus. 1995. Text chunking using transformation-based learning. In ACL Third Workshop on Very Large Corpora, pages 82–94.

Geoffrey Sampson. 1995. English for the Computer. The SUSANNE Corpus and Analytic Scheme. Oxford: Clarendon Press.

Khalil Sima'an. 1997. Explanation-based learning of data oriented parsing. In T. Mark Ellison, editor, Computational Natural Language Learning (CoNLL), ACL/EACL-97, Madrid.

Erik Tjong Kim Sang and Jorn Veenstra. 1999. Representing text chunks. In Proceedings of EACL'99, Association for Computational Linguistics, Bergen.

Jacques Vergne and Emmanuel Giguet. 1998. Regards théoriques sur le "tagging". In proceedings of Traitement Automatique des Langues