
An Efficient Algorithm for Projective Dependency Parsing


Joakim Nivre

Växjö University

School of Mathematics and Systems Engineering

SE-351 95 VÄXJÖ

[email protected]

Abstract

This paper presents a deterministic parsing algorithm for projective dependency grammar. The running time of the algorithm is linear in the length of the input string, and the dependency graph produced is guaranteed to be projective and acyclic. The algorithm has been experimentally evaluated in parsing unrestricted Swedish text, achieving an accuracy above 85% with a very simple grammar.

1 Introduction

Despite a long and venerable tradition in linguistics, dependency grammar has until quite recently played a fairly marginal role in natural language parsing. However, dependency-based representations have turned out to be useful in statistical approaches to parsing and disambiguation (see, e.g., Collins [4, 5, 6], Eisner [15, 16, 17], Samuelsson [25]) and they also appear well suited for languages with less rigid word order constraints (Covington [9, 10], Collins et al. [7]).

Several different parsing techniques have been used with dependency grammar. The most common approach is probably to use some version of the dynamic programming algorithms familiar from context-free parsing, with or without statistical disambiguation (Eisner [15, 16, 17], Barbero et al. [2], Courtin and Genthial [8], Samuelsson [25]). Another school proposes that dependency parsing should be cast as a constraint satisfaction problem and solved using constraint programming (Maruyama [22], Menzel and Schröder [24], Duchier [12, 13, 14]).

Here I will instead pursue a deterministic approach to dependency parsing, similar to shift-reduce parsing for context-free grammar. In the past, deterministic parsing of natural language has mostly been motivated by psycholinguistic concerns, as in the well-known Parsifal system of Marcus [21]. However, deterministic parsing also has the more direct practical advantage of providing very efficient disambiguation. If the disambiguation can be performed with high accuracy and robustness, deterministic parsing becomes an interesting alternative to more traditional algorithms for natural language parsing. There are potential applications of natural language parsing, for example in information retrieval, where it may not always be necessary to construct a complete dependency structure for a sentence, as long as dependency relations can be identified with good enough accuracy, but where efficiency can be vital because of the large amount of data to be processed. It may also be possible to improve parse accuracy by adding [...]

PP   NN        VB       PN   JJ      NN        HP     VB       PM      PM
På   60-talet  målade   han  djärva  tavlor    som    retade   Nikita  Chrusjtjov.
(In  the-60's  painted  he   bold    pictures  which  annoyed  Nikita  Chrustjev.)

Figure 1: Dependency graph for Swedish sentence

[...] and shallow processing. It is a kind of deep processing in that the goal is to build a complete syntactic analysis for the input string, not just identify basic constituents as in partial parsing. But it resembles shallow processing in being robust, efficient and deterministic.

In this paper, I present an algorithm that produces projective dependency graphs deterministically in linear time. Preliminary experiments indicate that parsing accuracy above 85% for unrestricted text is attainable with a very simple grammar. In section 2, I introduce the necessary concepts from dependency grammar and the specific form of grammar rules required by the parser. In section 3, I define the algorithm and prove that its worst case running time is linear in the size of the input; I also prove that the dependency graphs produced by the algorithm are projective and acyclic. In section 4, I present preliminary results concerning the parsing accuracy when applied to unrestricted Swedish text, using a simple hand-crafted grammar. In section 5, I conclude with some suggestions for further research.

2 Projective Dependency Grammar

The linguistic tradition of dependency grammar comprises a large and fairly diverse family of grammatical theories and formalisms that share certain basic assumptions about syntactic structure, in particular the assumption that syntactic structure consists of lexical nodes linked by binary relations called dependencies (see, e.g., Tesnière [30], Sgall et al. [26], Mel'čuk [23], Hudson [20]). Thus, the common formal property of dependency structures, as compared to the representations based on constituency (or phrase structure), is the lack of phrasal nodes.

In a dependency structure, every lexical node is dependent on at most one other lexical node, usually called its head or regent, which means that the structure can be represented as a directed graph, with nodes representing lexical elements and arcs representing dependency relations. Figure 1 shows a dependency graph for a simple Swedish sentence, where each word of the sentence is labeled with its part of speech.

Over and above these minimal assumptions, different varieties of dependency grammar impose different conditions on dependency graphs. (In fact, not even the constraint that each lexical node has at most one head is adopted universally; cf. Hudson [20].) The following three constraints are common in the literature and will all be adopted in the sequel:

1. Dependency graphs should be acyclic.
2. Dependency graphs should be connected.
3. Dependency graphs should be projective.

[...] (painted) as the root node.

While the conditions of acyclicity and connectedness are adopted in most versions of dependency grammar, the projectivity constraint is more controversial and can in fact be seen as a major branch point within this tradition. However, most researchers seem to agree that, even if the assumption of projectivity is questionable from a theoretical point of view, it is nevertheless a reasonable approximation in practical parsing systems. First of all, it must be noted that projectivity is not a property of the dependency graph in itself but only in relation to the linear ordering of tokens in the surface string. Several different definitions of projectivity can be found in the literature, but they are all roughly equivalent (see, e.g., Mel'čuk [23], Hudson [20], Sleator and Temperley [27, 28]). I will follow Hudson [20] and define projectivity in terms of an extended notion of adjacency:

1. A dependency graph is projective iff every dependent node is graph adjacent to its head.
2. Two nodes $n$ and $n'$ are graph adjacent iff every node $n''$ occurring between $n$ and $n'$ in the surface string is dominated by $n$ or $n'$ in the graph.

Note that projectivity does not imply that the graph is connected; nor does it imply that the graph is acyclic (although it does exclude cycles of length greater than 2).
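The adjacency-based definition of projectivity can be illustrated with a short Python sketch. It is only one possible reading of the definition, under assumptions that are not part of the paper: nodes are represented by their integer string positions, arcs are (head, dependent) pairs, and "dominated" is read as reachability along arcs. The function names are hypothetical.

```python
def dominates(head, node, arcs):
    """True if `node` is reachable from `head` along arcs (reflexive, transitive)."""
    seen, frontier = {head}, [head]
    while frontier:
        cur = frontier.pop()
        for h, d in arcs:
            if h == cur and d not in seen:
                seen.add(d)
                frontier.append(d)
    return node in seen


def graph_adjacent(n1, n2, arcs):
    """Every node strictly between n1 and n2 must be dominated by n1 or n2."""
    lo, hi = sorted((n1, n2))
    return all(
        dominates(n1, m, arcs) or dominates(n2, m, arcs)
        for m in range(lo + 1, hi)
    )


def is_projective(arcs):
    """A graph is projective iff every dependent is graph adjacent to its head."""
    return all(graph_adjacent(h, d, arcs) for h, d in arcs)


# Assumed toy data: positions 1..5 stand for "på 60-talet målade han tavlor".
projective_arcs = {(1, 2), (3, 1), (3, 4), (3, 5)}
crossing_arcs = {(1, 3), (2, 4)}   # node 2 is dominated by neither 1 nor 3
two_cycle = {(1, 2), (2, 1)}       # projective, although cyclic
```

In this sketch the two-node cycle passes the projectivity check, which agrees with the remark that projectivity alone does not rule out cycles of length 2.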

We are now in a position to define what we mean by a well-formed dependency graph:

A string of words $W$ is represented as a list of tokens, where each token $n = (i, w)$ is a pair consisting of a position $i$ and a word form $w$; the functional expressions $\mathrm{pos}(n) = i$ and $\mathrm{lex}(n) = w$ can be used to extract the position and word form of a token. We let $<$ denote the complete and strict ordering of tokens in $W$, i.e. $n < n'$ iff $\mathrm{pos}(n) < \mathrm{pos}(n')$.

A dependency graph for $W$ is a directed graph $D = (N_W, A)$, where the set of nodes $N_W$ is the set of tokens in $W$, and the arc relation $A$ is a binary, irreflexive relation on $N_W$. We write $n \to n'$ to say that there is an arc from $n$ to $n'$, i.e. $(n, n') \in A$; we use $\to^*$ to denote the reflexive and transitive closure of the arc relation $A$; and we use $\leftrightarrow$ and $\leftrightarrow^*$ for the corresponding undirected relations, i.e. $n \leftrightarrow n'$ iff $n \to n'$ or $n' \to n$.

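A dependency graph $D = (N_W, A)$ is well-formed iff the following conditions are satisfied:

Single head: $(\forall n\, n'\, n'')\ (n \to n' \land n'' \to n') \Rightarrow n = n''$
Acyclic: $(\forall n\, n')\ \neg(n \to n' \land n' \to^* n)$
Connected: $(\forall n\, n')\ n \leftrightarrow^* n'$
Projective: $(\forall n\, n'\, n'')\ (n \leftrightarrow n' \land n < n'' < n') \Rightarrow (n \to^* n'' \lor n' \to^* n'')$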

Finally, let us note that all dependency graphs considered in this paper are unlabeled, in the sense that they have no labels on the edges representing different kinds of dependency relations (such as subject, object, adverbial, etc.).
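As a complement to the formal conditions, the following Python sketch checks the single-head, acyclicity and connectedness conditions for an unlabeled graph. It is an illustration under assumptions of my own: nodes are arbitrary hashable values, arcs are (head, dependent) pairs, and all function names are hypothetical.

```python
def reachable(start, arcs):
    """Set of nodes reachable from `start` along arcs, including `start` itself."""
    seen, frontier = {start}, [start]
    while frontier:
        cur = frontier.pop()
        for h, d in arcs:
            if h == cur and d not in seen:
                seen.add(d)
                frontier.append(d)
    return seen


def is_single_headed(arcs):
    """No node occurs as the dependent of more than one arc."""
    dependents = [d for _, d in arcs]
    return len(dependents) == len(set(dependents))


def is_acyclic(arcs):
    """For no arc h -> d may h be reachable again from d."""
    return all(h not in reachable(d, arcs) for h, d in arcs)


def is_connected(nodes, arcs):
    """Every node is linked to every other node when arc direction is ignored."""
    nodes = set(nodes)
    if not nodes:
        return True
    undirected = set(arcs) | {(d, h) for h, d in arcs}
    return reachable(next(iter(nodes)), undirected) == nodes


# Assumed toy data: positions 1..5 of a five-word sentence.
nodes = {1, 2, 3, 4, 5}
arcs = {(1, 2), (3, 1), (3, 4), (3, 5)}
```

The linear scans over the arc set keep the sketch short; they are not meant to reflect the data structures a practical parser would use.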

Most formalizations of dependency grammar use rules that specify whole configurations of dependents for a given head, using some notion of valence frames (Hays [19], Gaifman [18], De- [...] where $T$ is a (terminal) vocabulary (set of word forms) and $R$ is a set of rules of the following form (where $w, w' \in T$):

$w \leftarrow w'$
$w \to w'$

Given a string of words with nodes $n$ and $n'$ such that $\mathrm{lex}(n) = w$, $\mathrm{lex}(n') = w'$ and $n < n'$, the first rule says that $n'$ can be the head of $n$, while the second rule says that $n$ can be the head of $n'$. Rules of the first kind are referred to as right-headed, rules of the second kind as left-headed. The grammar rules used here are very similar to Covington's [9] notion of D-rules, except that the latter are undirected, and I will therefore call them directed D-rules. Directed D-rules are also similar to but simpler than the dependency relations of Courtin and Genthial [8].

Finally, it is worth pointing out that, even though the parsing algorithm defined in the next section presupposes directed D-rules, this does not exclude its use with more traditional dependency grammars, since it is usually straightforward to extract directed D-rules from these grammars. For example, given a grammar in the formalism of Hays [19], we can construct a set of directed D-rules in the following way:

For every rule $X(X_{-m} \cdots X_{-1} * X_1 \cdots X_n)$ in the original grammar, introduce rules $w_l \leftarrow w$ and $w \to w_r$ for every $w \in X$, $w_l \in X_{-m} \cup \cdots \cup X_{-1}$, $w_r \in X_1 \cup \cdots \cup X_n$.

However, using the parsing algorithm defined in the next section with the directed D-rules extracted in this way will not result in a parser for the original grammar, since the extracted rules normally impose much weaker constraints on the dependency structure than the original rules.
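The extraction of directed D-rules from Hays-style rules can be sketched as follows. The encoding is an assumption made for illustration only: a Hays-style rule is a triple (head category, left dependent categories, right dependent categories), a lexicon maps each category to its word forms, and a directed D-rule is a triple (left word, arrow, right word) with the arrow written as "<-" or "->". The toy grammar is invented and is not taken from the paper.

```python
from itertools import product


def extract_d_rules(hays_rules, lexicon):
    """Turn Hays-style rules into a set of directed D-rules.

    Each left dependent w_l of a head w yields the right-headed rule w_l <- w;
    each right dependent w_r yields the left-headed rule w -> w_r.
    """
    rules = set()
    for head_cat, left_cats, right_cats in hays_rules:
        heads = lexicon[head_cat]
        for cat in left_cats:
            for w_l, w in product(lexicon[cat], heads):
                rules.add((w_l, "<-", w))
        for cat in right_cats:
            for w, w_r in product(heads, lexicon[cat]):
                rules.add((w, "->", w_r))
    return rules


# Hypothetical toy grammar: a verb with a pronoun to its left and a noun to its right.
lexicon = {"VB": {"målade"}, "PN": {"han"}, "NN": {"tavlor"}}
hays_rules = [("VB", ["PN"], ["NN"])]
d_rules = extract_d_rules(hays_rules, lexicon)
```

Note that the extracted rules no longer record that the pronoun and the noun were licensed together by one rule, which is one way of seeing why they constrain the structure less than the original grammar.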

3 Parsing Algorithm

The parsing algorithm presented in this section is in many ways similar to the basic shift-reduce algorithm for context-free grammars (Aho et al. [1]), although the parse actions are different given that we are using directed D-rules instead of context-free grammar rules. For example, since there are no nonterminal symbols in the grammar, the stack will never contain anything but input tokens (graph nodes), and a reduce action simply amounts to popping the topmost element from the stack.

Parser configurations are represented by triples $\langle S, I, A\rangle$, where $S$ is the stack (represented as a list), $I$ is the list of (remaining) input tokens, and $A$ is the (current) arc relation for the dependency graph. (The set of nodes $N_W$ in the dependency graph is given by the input string, as defined in the preceding section, and need not be represented explicitly in the configuration.) Given an input string $W$, the parser is initialized to $\langle \mathrm{nil}, W, \emptyset\rangle$ and terminates when it reaches a configuration $\langle S, \mathrm{nil}, A\rangle$ (for any list $S$ and set of arcs $A$). The input string $W$ is accepted if the dependency graph $D = (N_W, A)$ given at termination is well-formed; otherwise $W$ is rejected. The behavior of the parser is defined by the transitions defined in Figure 2 (where [...]

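Termination: $\langle S, \mathrm{nil}, A\rangle$

Left-Arc: $\langle n|S,\ n'|I,\ A\rangle \to \langle S,\ n'|I,\ A \cup \{(n', n)\}\rangle$ if $\mathrm{lex}(n) \leftarrow \mathrm{lex}(n') \in R$ and $\neg\exists n''\ (n'', n) \in A$

Right-Arc: $\langle n|S,\ n'|I,\ A\rangle \to \langle n'|n|S,\ I,\ A \cup \{(n, n')\}\rangle$ if $\mathrm{lex}(n) \to \mathrm{lex}(n') \in R$ and $\neg\exists n''\ (n'', n') \in A$

Reduce: $\langle n|S,\ I,\ A\rangle \to \langle S,\ I,\ A\rangle$ if $\exists n'\ (n', n) \in A$

Shift: $\langle S,\ n|I,\ A\rangle \to \langle n|S,\ I,\ A\rangle$

Figure 2: Parser transitions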

The transition Left-Arc adds an arc $n' \to n$ from the next input token $n'$ to the node $n$ on top of the stack and reduces (pops) $n$ from the stack, provided that the grammar contains the rule $\mathrm{lex}(n) \leftarrow \mathrm{lex}(n')$ and provided that the graph does not contain an arc $n'' \to n$. The reason that the dependent node is immediately reduced is to eliminate the possibility of adding an arc $n \to n'$ (should the grammar contain the rule $\mathrm{lex}(n) \to \mathrm{lex}(n')$), which would create a cycle in the graph.

The transition Right-Arc adds an arc $n \to n'$ from the node $n$ on top of the stack to the next input token $n'$, licensed by an appropriate grammar rule, and shifts (pushes) $n'$ onto the stack. The reason that the dependent node is immediately shifted is the same as in the preceding case, i.e. to prevent the creation of cycles. (Since $n'$ must be reduced before $n$ can become topmost again, no further arc linking these nodes can ever be added.)

The transition Reduce simply reduces (pops) the node $n$ on top of the stack, provided that this node has a head. This transition is needed for cases where a single head has multiple dependents on the right, in which case the closer dependent nodes must be reduced before arcs can be added to the more distant ones. The condition that the reduced node has a head is necessary to ensure that the dependency graph is projective, since otherwise there could be ungoverned nodes between a head and its dependent.

The transition Shift, finally, simply shifts (pushes) the next input token $n$ onto the stack. This transition is needed for cases where a right dependent has its own left dependents, in which case these dependents have to be reduced (by Left-Arc transitions) before the arc can be added from the head to the right dependent. Moreover, the Shift transition, which has no condition associated with it except that the input list is non-empty, is needed to guarantee termination.
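The preconditions of the four transitions can be made concrete with a small Python sketch that lists every transition applicable to a given configuration. The representation is assumed for illustration and is not prescribed by the paper: tokens are (position, word) pairs, the stack is a Python list with its top at the end, arcs are (head, dependent) pairs of tokens, and rules are triples (left word, arrow, right word) with the arrow written as "<-" or "->".

```python
def permissible(stack, buffer, arcs, rules):
    """Names of all transitions applicable to the configuration <stack, buffer, arcs>."""
    has_head = {dep for _, dep in arcs}
    options = []
    if stack and buffer:
        top, nxt = stack[-1], buffer[0]
        # Left-Arc: rule lex(top) <- lex(nxt), and top must not have a head yet.
        if (top[1], "<-", nxt[1]) in rules and top not in has_head:
            options.append("Left-Arc")
        # Right-Arc: rule lex(top) -> lex(nxt), and nxt must not have a head yet.
        if (top[1], "->", nxt[1]) in rules and nxt not in has_head:
            options.append("Right-Arc")
    # Reduce: the node on top of the stack must already have a head.
    if stack and stack[-1] in has_head:
        options.append("Reduce")
    # Shift: only requires a non-empty input list.
    if buffer:
        options.append("Shift")
    return options


# Assumed example grammar, matching the rules needed for "på 60-talet målade han tavlor".
RULES = {
    ("på", "->", "60-talet"),
    ("på", "<-", "målade"),
    ("målade", "->", "han"),
    ("målade", "->", "tavlor"),
}
PA, TALET, MALADE = (1, "på"), (2, "60-talet"), (3, "målade")

after_first_shift = permissible([PA], [TALET, MALADE], set(), RULES)
after_right_arc = permissible([PA, TALET], [MALADE], {(PA, TALET)}, RULES)
```

Both example configurations admit more than one transition, so some policy for choosing among the permissible transitions is needed before the system can be used as a parser.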

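The transitions Left-Arc and Reduce have in common that they reduce the stack size by 1 and will therefore be called pop-transitions. In the same way, the transitions Right-Arc and Shift have in common that they reduce the length of the input list by 1 and increase the size of the stack by 1 and will therefore be called push-transitions.

på 60-talet målade han tavlor
(in) (the-60's) (painted) (he) (pictures)

$R = \{\ \text{på} \to \text{60-talet},\ \ \text{på} \leftarrow \text{målade},\ \ \text{målade} \to \text{han},\ \ \text{målade} \to \text{tavlor}\ \}$

$\langle \mathrm{nil},\ \text{på 60-talet målade han tavlor},\ \emptyset\rangle$
$\xrightarrow{S}\ \langle \text{på},\ \text{60-talet målade han tavlor},\ \emptyset\rangle$
$\xrightarrow{RA}\ \langle \text{60-talet på},\ \text{målade han tavlor},\ \{(\text{på}, \text{60-talet})\}\rangle$
$\xrightarrow{R}\ \langle \text{på},\ \text{målade han tavlor},\ \{(\text{på}, \text{60-talet})\}\rangle$
$\xrightarrow{LA}\ \langle \mathrm{nil},\ \text{målade han tavlor},\ \{(\text{på}, \text{60-talet}), (\text{målade}, \text{på})\}\rangle$
$\xrightarrow{S}\ \langle \text{målade},\ \text{han tavlor},\ \{(\text{på}, \text{60-talet}), (\text{målade}, \text{på})\}\rangle$
$\xrightarrow{RA}\ \langle \text{han målade},\ \text{tavlor},\ \{(\text{på}, \text{60-talet}), (\text{målade}, \text{på}), (\text{målade}, \text{han})\}\rangle$
$\xrightarrow{R}\ \langle \text{målade},\ \text{tavlor},\ \{(\text{på}, \text{60-talet}), (\text{målade}, \text{på}), (\text{målade}, \text{han})\}\rangle$
$\xrightarrow{RA}\ \langle \text{tavlor målade},\ \mathrm{nil},\ \{(\text{på}, \text{60-talet}), (\text{målade}, \text{på}), (\text{målade}, \text{han}), (\text{målade}, \text{tavlor})\}\rangle$

Figure 3: Parse of Swedish sentence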

As it stands, this transition system is nondeterministic, since several transitions often apply to the same configuration, and in order to get a deterministic parser we need to impose some scheduling policy on top of the system. The simplest way to do this is to use a constant priority ordering of transitions Left-Arc > Right-Arc > Reduce > Shift (where a > b means that a has higher priority than b). Figure 3 shows how a reduced variant of the Swedish sentence in Figure 1 can be parsed using this priority ordering (with the obvious transition labels LA = Left-Arc, RA = Right-Arc, R = Reduce, S = Shift).
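The constant priority ordering can be sketched as a small deterministic parser in Python. This is a minimal illustration, not the implementation used in the experiments; the token and rule encodings (tokens as (position, word) pairs, rules as (left word, arrow, right word) triples) and the function name `parse` are assumptions. The rule set is the one required by the transition sequence of Figure 3.

```python
def parse(words, rules):
    """Deterministic parse with priority Left-Arc > Right-Arc > Reduce > Shift.

    Returns the arc set (pairs of (position, word) tokens, head first)
    and the sequence of transition labels that was applied.
    """
    tokens = list(enumerate(words, start=1))
    stack, arcs, headed, trace = [], set(), set(), []
    i = 0  # index of the next input token; the input list is tokens[i:]
    while i < len(tokens):
        nxt = tokens[i]
        top = stack[-1] if stack else None
        if top is not None and (top[1], "<-", nxt[1]) in rules and top not in headed:
            arcs.add((nxt, top))      # Left-Arc: nxt becomes the head of top
            headed.add(top)
            stack.pop()
            trace.append("LA")
        elif top is not None and (top[1], "->", nxt[1]) in rules and nxt not in headed:
            arcs.add((top, nxt))      # Right-Arc: top becomes the head of nxt
            headed.add(nxt)
            stack.append(nxt)
            i += 1
            trace.append("RA")
        elif top is not None and top in headed:
            stack.pop()               # Reduce: pop a node that already has a head
            trace.append("R")
        else:
            stack.append(nxt)         # Shift: always possible while input remains
            i += 1
            trace.append("S")
    return arcs, trace


RULES = {
    ("på", "->", "60-talet"),
    ("på", "<-", "målade"),
    ("målade", "->", "han"),
    ("målade", "->", "tavlor"),
}
arcs, trace = parse("på 60-talet målade han tavlor".split(), RULES)
```

Each iteration does a constant amount of work, given that rule lookup and the head test use hash sets. Tracing the loop by hand on the example sentence gives the label sequence S, RA, R, LA, S, RA, R, RA, the same as in Figure 3.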

Deciding how to resolve these conflicts is crucial from the point of view of parsing accuracy, as we will see in the next section, but it does not affect the basic properties of the parser with respect to running time and well-formedness of the dependency graph produced. For the remainder of this section, I will therefore simply assume that there exists some unspecified determinization of the parser (perhaps randomized) such that exactly one of the permissible transitions is chosen for every non-terminal configuration.

Proposition 1: Given an input string $W$ of length $n$, the parser terminates after at most $2n$ transitions.

Proof of Proposition 1: We first show that a transition sequence starting from an initial configuration $\langle \mathrm{nil}, W, \emptyset\rangle$ can contain at most $n$ push-transitions and that such a sequence must be terminating:

1. push-transitions have as a precondition that the current input list is non-empty and decrease its length by 1. Since there are no transitions that increase the length of the input list, the maximum number of push-transitions is $n$. For the same reason, a transition [...]

[...] $n$ pop-transitions:

2. pop-transitions have as a precondition that the stack $S$ is non-empty and have as an effect that the size of $S$ is decreased by 1. Since the initial configuration has an empty $S$, and since the only transitions that increase the size of $S$ are push-transitions of which there can be no more than $n$ instances, it follows that the maximum number of pop-transitions is also $n$.

Given that the number of pop-transitions is bounded by the number of push-transitions, and given that at least one push-transition (Shift) is applicable to every non-terminal configuration, we conclude that, for every initial configuration $\langle \mathrm{nil}, W, \emptyset\rangle$ with an input list $W$ of length $n$, there exists a transition sequence containing exactly $n$ push-transitions and at most $n$ pop-transitions leading to a terminal configuration $\langle S, \mathrm{nil}, A\rangle$. □

The practical significance of Proposition 1 is that the parser is guaranteed to be both efficient and robust. As long as each transition can be performed in constant time, which only requires a reasonable choice of data structures for lookup of grammar rules and graph arcs, the worst case running time of the algorithm will be linear in the length of the input. Moreover, even if the parser does not succeed in building a well-formed dependency graph, it is always guaranteed to terminate.
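The bound of Proposition 1 does not depend on the scheduling policy, and this can be checked informally with a randomized determinization. The Python sketch below picks a random permissible transition at every step and counts the steps. The encodings and the function name `random_parse_length` are assumptions made for illustration, and the experiment is a sanity check on one toy grammar, not a substitute for the proof.

```python
import random


def random_parse_length(words, rules, rng):
    """Run the transition system with random choices; return the number of transitions."""
    tokens = list(enumerate(words, start=1))
    stack, headed, i, steps = [], set(), 0, 0
    while i < len(tokens):
        nxt = tokens[i]
        options = ["S"]                       # Shift is always permissible here
        if stack:
            top = stack[-1]
            if (top[1], "<-", nxt[1]) in rules and top not in headed:
                options.append("LA")
            if (top[1], "->", nxt[1]) in rules and nxt not in headed:
                options.append("RA")
            if top in headed:
                options.append("R")
        action = rng.choice(options)
        if action == "LA":                    # pop-transition, adds a head for top
            headed.add(stack.pop())
        elif action == "R":                   # pop-transition
            stack.pop()
        else:                                 # push-transitions: RA or S
            if action == "RA":
                headed.add(nxt)
            stack.append(nxt)
            i += 1
        steps += 1
    return steps


RULES = {
    ("på", "->", "60-talet"),
    ("på", "<-", "målade"),
    ("målade", "->", "han"),
    ("målade", "->", "tavlor"),
}
WORDS = "på 60-talet målade han tavlor".split()
lengths = [random_parse_length(WORDS, RULES, random.Random(seed)) for seed in range(200)]
```

Every run performs exactly one push-transition per input token and at most as many pop-transitions, so each entry of `lengths` lies between n and 2n for n = 5.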

Proposition 2: The dependency graph $G = (N_W, A)$ given at parser termination is projective and acyclic.

Proof of Proposition 2: For projectivity we need to show that, for every input string $W$, if $\langle \mathrm{nil}, W, \emptyset\rangle \to^* \langle S, \mathrm{nil}, A\rangle$ then the graph $(N_W, A)$ is projective, which means that, for every pair $(n, n') \in A$, $n$ and $n'$ are graph adjacent in $G$. Assume $\langle \mathrm{nil}, W, \emptyset\rangle \to^* \langle S, \mathrm{nil}, A\rangle$ and assume $(n, n') \in A$. Since graph adjacency is a symmetric relation, we can without loss of generality assume that $n < n'$. Then we have

$\langle \mathrm{nil}, W, \emptyset\rangle \to^* \langle S',\ n|n_1|\cdots|n_k|n'|I,\ A'\rangle \to \langle n|S',\ n_1|\cdots|n_k|n'|I,\ A''\rangle \to^* \langle n|S',\ n'|I,\ A'''\rangle \to^* \langle S, \mathrm{nil}, A\rangle$.

What we need to show is that all of $n_1, \ldots, n_k$ are dominated by $n$ or $n'$ in $A$, but since no arcs involving $n_1, \ldots, n_k$ can be in $A''$ or $A - A'''$, it is sufficient to show that this holds in $A''' - A''$. We first observe that the reduction of $k$ nodes requires exactly $2k$ transitions, since every reduction requires a Shift followed by a Left-Arc, or a Right-Arc followed by a Reduce (although the two transitions in each pair need not be adjacent in the sequence). Let $P(k)$ be the claim that if $\langle n|S',\ n_1|\cdots|n_k|n'|I,\ A''\rangle \to^{2k} \langle n|S',\ n'|I,\ A'''\rangle$ then all of $n_1, \ldots, n_k$ are dominated by $n$ or $n'$ in $A'''$, parameterized on the number $k$ of nodes between $n$ and $n'$. We now use an inductive proof to show that $P(k)$ holds for all $k \geq 0$:

1. Basis: If $k = 0$ then $n$ and $n'$ are string adjacent and $P(k)$ holds vacuously.

2. Induction: Assume $P(k)$ ($k \geq 0$) and assume that $\langle n|S',\ n_1|\cdots|n_k|n'|I,\ A''\rangle \to^{2(k+1)} \langle n|S',\ n'|I,\ A'''\rangle$. We begin by noting that the first transition in this sequence must be a push-transition, because otherwise $n$ would be reduced and would not be on the stack in the final configuration. Hence, we need to consider two cases:

(a) If the first transition is Right-Arc, then $n_1$ must be reduced with a Reduce transition and there exists some $m \geq 0$ such that:

$\langle n|S',\ n_1|\cdots|n_k|n'|I,\ A''\rangle \to \langle n_1|n|S',\ n_2|\cdots|n_k|n'|I,\ A'' \cup \{(n, n_1)\}\rangle \to^{2m} \langle n_1|n|S',\ n_{2+m}|\cdots|n_k|n'|I,\ A''''\rangle \to \langle n|S',\ n_{2+m}|\cdots|n_k|n'|I,\ A''''\rangle \to^{2(k+1)-(2m+2)} \langle n|S',\ n'|I,\ A'''\rangle$

Since $2(k+1) - (2m+2) \leq 2k$, we can use the inductive hypothesis to infer that $n$ and $n'$ are adjacent.

(b) If the first transition is Shift, then $n_1$ must be reduced with a Left-Arc transition and there exists some $m \geq 0$ such that:

$\langle n|S',\ n_1|\cdots|n_k|n'|I,\ A''\rangle \to \langle n_1|n|S',\ n_2|\cdots|n_k|n'|I,\ A''\rangle \to^{2m} \langle n_1|n|S',\ n_{2+m}|\cdots|n_k|n'|I,\ A'''\rangle \to \langle n|S',\ n_{2+m}|\cdots|n_k|n'|I,\ A''' \cup \{(n_{2+m}, n_1)\}\rangle \to^{2(k+1)-(2m+2)} \langle n|S',\ n'|I,\ A'''\rangle$

Again, it follows from the inductive hypothesis that $n$ and $n'$ are adjacent.

The proof that $(N_W, A)$ is free of cycles is analogous to the proof of projectivity, except that we consider the claim $P'(k)$ that if $\langle n|S',\ n_1|\cdots|n_k|n'|I,\ A''\rangle \to^{2k} \langle n|S',\ n'|I,\ A'''\rangle$ then there is no path from $n$ to $n'$ or from $n'$ to $n$, again parameterized on the number $k$ of nodes between $n$ and $n'$. Here the base case follows from the definition of Left-Arc and Right-Arc, which prevents cycles of length 2. □

In virtue of Proposition 2, we know that even if a string $W$ is rejected by the parser, the resulting dependency graph $(N_W, A)$ is projective and acyclic, although not necessarily connected. In fact, the graph will consist of a number of connected components, each of which forms a well-formed dependency graph for a substring of $W$. This is important from the point of view of robustness, since each of these components represents a partial analysis of the input string.

4 Empirical Evaluation

In order to estimate the parsing accuracy that can be expected with the algorithm described in the preceding section and a grammar consisting of directed D-rules, a small experiment was performed using data from the Stockholm-Umeå Corpus of written Swedish (SUC [29]), which is a balanced corpus consisting of fictional and non-fictional texts, organized in the same way as the Brown corpus of American English. The experiment is based on a random sample consisting of about 4000 words, made up of 16 connected segments, and containing 257 sentences in total, which have been manually annotated with dependency graphs by the author.

The Stockholm-Umeå Corpus is annotated for parts of speech (and manually corrected), and the tagged sentences were used as input to the parser. This means that the vocabulary of the grammar consists of word-tag pairs, although most of the grammar rules used only take parts of speech into account. The grammar used in the experiment is hand-crafted and contains a total of 126 rules, divided into 90 left-headed rules (of the form $w \to w'$) and 36 right-headed [...]

Parser     Mean   Std. dev.
Baseline   80.0   13.2
S/R        87.8   11.0
S/RA       89.0   10.6

Table 1: Attachment scores (mean and standard deviation)

Parsing accuracy was measured by the attachment score used by Eisner [15] and Collins et al. [7], which is computed as the proportion of words in a sentence that are assigned the correct head (or no head if the word is a root). The overall attachment score was then calculated as the mean attachment score over all sentences in the sample.
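The attachment score as described here can be written down in a few lines of Python. The sketch assumes, for illustration, that each sentence is given as two parallel lists holding for every word the position of its head, with `None` for a word without a head; the function names and the example annotations are hypothetical.

```python
def attachment_score(gold_heads, predicted_heads):
    """Proportion of words in one sentence that received the correct head (or no head)."""
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


def mean_attachment_score(sentences):
    """Mean of the per-sentence attachment scores over a sample of (gold, predicted) pairs."""
    scores = [attachment_score(gold, pred) for gold, pred in sentences]
    return sum(scores) / len(scores)


# Invented example: "på 60-talet målade han tavlor", heads given by position.
gold = [3, 1, None, 3, 3]
pred = [3, 1, None, 3, None]     # the last word is left without a head
sample = [(gold, pred), (gold, gold)]
```

Because the overall figure is a mean over sentences rather than over words, short sentences weigh as much as long ones in this measure.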

As mentioned in the previous section, different scheduling policies for the parser transitions yield different deterministic parsers. In this experiment, three different versions of the parser were compared:

The baseline parser uses the constant priority ordering of transitions defined in section 3, Left-Arc > Right-Arc > Reduce > Shift (cf. also Figure 3).

The second parser, called S/R, retains the ordering Left-Arc > Right-Arc > Reduce, Shift but uses the following simple rule to resolve Shift/Reduce conflicts:

If the node on top of the stack can be a transitive head of the next input token (according to the grammar) then Shift; otherwise Reduce.

The third parser, called S/RA, in addition uses a simple lookahead for resolving Shift/Right-Arc conflicts, preferring a Shift-transition over a Right-Arc-transition in configurations where the token on top of the stack is a verb and the next token could be a post-modifier of this verb but could also be a pre-modifier of a following token, as in the following example (cf. Figure 1):

PN     VB        AB           JJ      NN
han    målar     extremt      djärva  tavlor
(he)   (paints)  (extremely)  (bold)  (pictures)

Making a Shift-transition in these cases is equivalent to a general preference for the pre-modifier interpretation.
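The Shift/Reduce rule of the S/R parser can be sketched in Python under one possible reading of "transitive head (according to the grammar)". The reading assumed here is that a word can be a transitive head of another word if the second is reachable from the first through a chain of directed D-rules, ignoring linear order. This interpretation, the rule encoding and the function names are my assumptions and are not confirmed by the paper.

```python
from collections import defaultdict, deque


def can_be_transitive_head(head_word, dep_word, rules):
    """True if dep_word is reachable from head_word via head-to-dependent rule links."""
    dependents = defaultdict(set)
    for left, arrow, right in rules:
        if arrow == "->":            # left-headed rule: left word is the head
            dependents[left].add(right)
        else:                        # right-headed rule: right word is the head
            dependents[right].add(left)
    seen, queue = set(), deque([head_word])
    while queue:
        cur = queue.popleft()
        for dep in dependents[cur]:
            if dep == dep_word:
                return True
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return False


def resolve_shift_reduce(top_word, next_word, rules):
    """Shift if the stack top can be a transitive head of the next token; otherwise Reduce."""
    return "Shift" if can_be_transitive_head(top_word, next_word, rules) else "Reduce"


RULES = {
    ("på", "->", "60-talet"),
    ("på", "<-", "målade"),
    ("målade", "->", "han"),
    ("målade", "->", "tavlor"),
}
```

With this toy rule set, `resolve_shift_reduce("han", "tavlor", RULES)` yields Reduce, which is the choice that lets the verb become topmost again before the next arc is added.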

Table 1 shows the mean attachment score and standard deviation obtained for the three different parsers. We can see that the parsing accuracy improves with the more complex scheduling policies, all differences being statistically significant despite the relatively small sample (paired t-test, $\alpha = .05$). There are no directly comparable results for Swedish text, but Eisner [15] reports an accuracy of 90% for probabilistic dependency parsing of English text, sampled from the Wall Street Journal section of the Penn Treebank. Moreover, if the correct part of speech tags are given with the input, accuracy increases to almost 93%. Collins et al. [7] report an accuracy of 91% for English text (the same corpus as in Eisner [15]) and 80% accuracy for Czech text. Given that Swedish is intermediate between English and Czech with regard to inflectional [...]

[...] is still very small. With a 95% confidence interval, the accuracy of the best parser can be estimated to lie in the range 85–93%, so a conservative conclusion is that a parsing accuracy above 85% is achievable.

5 Conclusion

The parsing algorithm presented in this paper has two attractive properties. It is robust, in the sense that it produces a projective and acyclic dependency graph for any input string, and it is efficient, producing these graphs in linear time. The crucial question for further research is whether the algorithm can achieve good enough accuracy to be useful in practical parsing systems. So far, it has been used with deterministic scheduling policies and simple directed D-rules to achieve reasonable accuracy when parsing unrestricted Swedish text. Topics to be investigated in the future include both alternative scheduling policies, possibly involving stochastic models, and alternative grammar formalisms, encoding stronger constraints on well-formed dependency structures.

References

[1] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison Wesley, 1986.

[2] Cristina Barbero, Leonardo Lesmo, Vincenzo Lombardo, and Paola Merlo. Integration of syntactic and lexical information in a hierarchical dependency grammar. In Sylvain Kahane and Alain Polguère, editors, Proceedings of the Workshop on Processing of Dependency-Based Grammars, pages 58–67, Université de Montréal, Quebec, Canada, August 1998.

[3] Glenn Carroll and Eugene Charniak. Two experiments on learning probabilistic dependency grammars from corpora. Technical Report TR-92, Department of Computer Science, Brown University, 1992.

[4] Michael Collins. A new statistical parser based on bigram lexical dependencies. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, pages 184–191, Santa Cruz, CA, 1996.

[5] Michael Collins. Three generative, lexicalised models for statistical parsing. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, pages 16–23, Madrid, Spain, 1997.

[6] Michael Collins. Head-Driven Statistical Models for Natural Language Parsing. PhD thesis, University of Pennsylvania, 1999.

[7] Michael Collins, Jan Hajič, Eric Brill, Lance Ramshaw, and Christoph Tillmann. A [...]

[...] parsing. In Sylvain Kahane and Alain Polguère, editors, Proceedings of the Workshop on Processing of Dependency-Based Grammars, pages 95–101, Université de Montréal, Quebec, Canada, August 1998.

[9] Michael A. Covington. A dependency parser for variable-word-order languages. Technical Report AI-1990-01, University of Georgia, Athens, GA, 1990.

[10] Michael A. Covington. Discontinuous dependency parsing of free and fixed word order: Work in progress. Technical Report AI-1994-02, University of Georgia, Athens, GA, 1994.

[11] Ralph Debusmann. A declarative grammar formalism for dependency grammar. Master's thesis, Computational Linguistics, Universität des Saarlandes, November 2001.

[12] Denys Duchier. Axiomatizing dependency parsing using set constraints. In Sixth Meeting on Mathematics of Language, pages 115–126, Orlando, Florida, July 1999.

[13] Denys Duchier. Lexicalized syntax and topology for non-projective dependency grammar. In Joint Conference on Formal Grammars and Mathematics of Language FGMOL'01, Helsinki, August 2001.

[14] Denys Duchier. Configuration of labeled trees under lexicalized constraints and principles. Journal of Language and Computation, 1, 2002.

[15] Jason M. Eisner. An empirical comparison of probability models for dependency grammar. Technical Report IRCS-96-11, Institute for Research in Cognitive Science, University of Pennsylvania, 1996.

[16] Jason M. Eisner. Three new probabilistic models for dependency parsing: An exploration. In Proceedings of COLING-96, Copenhagen, 1996.

[17] Jason M. Eisner. Bilexical grammars and their cubic-time parsing algorithms. In Harry Bunt and Anton Nijholt, editors, Advances in Probabilistic and Other Parsing Technologies. Kluwer, 2000.

[18] Haim Gaifman. Dependency systems and phrase-structure systems. Information and Control, 8:304–337, 1965.

[19] David G. Hays. Dependency theory: A formalism and some observations. Language, 40:511–525, 1964.

[20] Richard A. Hudson. English Word Grammar. Blackwell, 1990.

[21] Mitchell P. Marcus. A Theory of Syntactic Recognition for Natural Language. MIT Press, 1980.

[22] Hiroshi Maruyama. Structural disambiguation with constraint propagation. In Proceedings of the 28th ACL, pages 31–38, Pittsburgh, PA, 1990.

[...]ing graded constraints. In Sylvain Kahane and Alain Polguère, editors, Proceedings of the Workshop on Processing of Dependency-Based Grammars, pages 78–87, Université de Montréal, Quebec, Canada, August 1998.

[25] Christer Samuelsson. A statistical theory of dependency syntax. In Proceedings COLING-2000. Morgan Kaufmann, 2000.

[26] Petr Sgall, Eva Hajicova, and Jarmila Panevova. The Meaning of the Sentence in Its Pragmatic Aspects. Reidel, 1986.

[27] Daniel Sleator and Davy Temperley. Parsing English with a link grammar. Technical Report CMU-CS-91-196, Carnegie Mellon University, Computer Science, 1991.

[28] Daniel Sleator and Davy Temperley. Parsing English with a link grammar. In Third International Workshop on Parsing Technologies, 1993.

[29] Stockholm Umeå Corpus. Version 1.0. Produced by Department of Linguistics, Umeå University and Department of Linguistics, Stockholm University. ISBN 91-7191-348-3, August 1997.
