DEPENDENCY PARSING
Joakim Nivre
Växjö University
School of Mathematics and Systems Engineering
SE-35195VÄXJÖ
Abstract
Thispaperpresentsadeterministicparsingalgorithmforprojectivedependencygrammar. The
runningtimeofthealgorithmislinearinthelengthoftheinputstring,andthedependencygraph
produced is guaranteed to be projective and acyclic. The algorithm has been experimentally
evaluated in parsing unrestricted Swedish text, achieving anaccuracy above 85% with a very
simplegrammar.
1 Introduction
Despitea long andvenerabletradition in linguistics,dependencygrammar hasuntilquite
re-centlyplayed a fairly marginal role in natural language parsing. However,dependency-based
representationshaveturnedouttobeusefulin statisticalapproachestoparsing and
disambig-uation(see,e.g.,Collins[4,5,6],Eisner[15 ,16,17],Samuelsson[25])andtheyalsoappearwell
suitedforlanguageswithlessrigidwordorderconstraints(Covington[9,10],Collinsetal.[7]).
Several dierent parsing techniques havebeen used with dependency grammar. The most
common approach is probably to use some version of the dynamic programming algorithms
familiar from context-freeparsing, withor without statistical disambiguation (Eisner[15, 16,
17], Barbero et al.[2 ], Courtin and Genthial [8], Samuelsson [25]). Another school proposes
thatdependencyparsing should be castas a constraint satisfactionproblemand solvedusing
constraintprogramming(Maruyama[22],Menzeland Schröder[24],Duchier[12,13,14]).
HereI willinsteadpursue a deterministicapproachto dependencyparsing, similarto
shift-reduceparsingforcontext-freegrammar. Inthepast,deterministicparsingofnaturallanguage
hasmostlybeenmotivated bypsycholinguisticconcerns,as in thewell-known Parsifalsystem
of Marcus [21]. However, deterministic parsing also has the more direct practical advantage
ofproviding very ecient disambiguation. If thedisambiguation can be performedwith high
accuracyandrobustness,deterministicparsingbecomesaninterestingalternativetomore
tra-ditionalalgorithms for natural language parsing. There are potential applications of natural
languageparsing,forexampleininformationretrieval,whereitmaynotalwaysbenecessaryto
constructacompletedependencystructureforasentence,aslongasdependencyrelationscan
beidentiedwithgoodenoughaccuracy,butwhereeciencycan bevitalbecauseofthelarge
amountofdata tobe processed. It mayalsobe possibleto improveparseaccuracybyadding
PP
På
(In ?
NN
60-talet
the-60's
?
VB
målade
painted PN
han
he
?
JJ
djärva
bold
?
NN
tavlor
pictures ?
HP
som
which
?
VB
retade
annoyed
?
PM
Nikita
Nikita
?
PM
Chrusjtjov.
Chrustjev.) ?
Figure1: DependencygraphforSwedish sentence
andshallowprocessing. Itis a kindofdeepprocessingin that thegoalisto builda complete
syntacticanalysisfortheinputstring,notjustidentifybasic constituentsasinpartialparsing.
Butitresemblesshallowprocessingin beingrobust,ecientanddeterministic.
In this paper, I present an algorithm that produces projective dependency graphs
deter-ministicallyinlineartime. Preliminaryexperimentsindicatethatparsing accuracyabove85%
forunrestricted text is attainable with a very simple grammar. Insection 2, I introducethe
necessaryconceptsfromdependencygrammarandthespecic formofgrammarrulesrequired
by theparser. Insection3,I dene thealgorithmand provethat itsworstcaserunningtime
is linear in the size of the input; I also prove that the dependency graphs produced by the
algorithm are projective and acyclic. In section 4, I present preliminary results concerning
theparsing accuracywhen applied to unrestricted Swedish text, using a simple hand-crafted
grammar. Insection5,Iconcludewithsome suggestionsforfurtherresearch.
2 Projective Dependency Grammar
The linguistic tradition of dependency grammar comprises a large and fairly diverse family
ofgrammatical theoriesand formalisms that share certain basic assumptions aboutsyntactic
structure,inparticulartheassumptionthatsyntacticstructureconsistsoflexical nodes linked
by binaryrelations called dependencies (see, e.g.,Tesnière[30], Sgall etal. [26], Mel'£uk [23],
Hudson [20]). Thus, the common formal property of dependency structures, as compared to
therepresentationsbasedonconstituency(orphrase structure),isthelackofphrasal nodes.
In a dependency structure, every lexical node is dependent on at most one other lexical
node, usuallycalled its head or regent, which meansthat thestructure can be represented as
a directed graph, with nodes representing lexical elementsand arcs representing dependency
relations. Figure1 showsadependencygraphforasimple Swedish sentence,where eachword
ofthesentenceislabeledwithitspartofspeech.
Overandabovethese minimal assumptions, dierentvarietiesof dependencygrammar
im-pose dierent conditions on dependency graphs. (In fact, not even the constraint that each
lexicalnodehasatmostoneheadisadopteduniversally;cf.Hudson[20].) Thefollowingthree
constraintsarecommonin theliteratureandwill allbe adoptedinthesequel:
1. Dependencygraphsshould beacyclic.
2. Dependencygraphsshould beconnected.
3. Dependencygraphsshould beprojective.
(painted)astherootnode.
Whiletheconditionsofacyclicityandconnectednessareadoptedinmostversionsof
depend-encygrammar, theprojectivity constraint is more controversial and can in factbe seen as a
major branchpointwithin this tradition. However,mostresearchersseemto agreethat,even
ifthe assumption of projectivity is questionable from a theoretical point of view, itis
never-thelessa reasonableapproximationin practicalparsing systems. Firstof all,itmustbe noted
thatprojectivityisnota propertyofthedependencygraphin itselfbutonlyin relationtothe
linearorderingof tokensin thesurface string. Several dierentdenitions of projectivity can
be foundin theliterature, but theyare allroughly equivalent (see,e.g., Mel£uk [23], Hudson
[20],SleatorandTemperley[27,28]). IwillfollowHudson [20]anddeneprojectivityinterms
ofanextendednotionofadjacency:
1. Adependencygraph isprojective ievery dependentnodeisgraph adjacent toitshead.
2. Two nodesnandn 0
aregraph adjacent ieverynoden 00
occurring betweennand n 0
inthe
surfacestringisdominatedbynorn 0
in thegraph.
Notethat projectivitydoesnotimplythat thegraphis connected;nordoesitimply thatthe
graphisacyclic(althoughitdoesexcludecyclesoflengthgreaterthan2).
Wearenowin apositiontodene whatwemeanbyawell-formed dependencygraph:
Astring ofwords W isrepresented asa listoftokens, whereeach tokenn=(i;w)is apair
consisting of a position i and a word form w; the functional expressions pos(n) = i and
lex(n)=wcanbe usedtoextractthepositionandwordform ofatoken. Welet<denote
thecompleteandstrict orderingoftokensin W,i.e.n<n 0
ipos(n)<pos(n 0
).
A dependency graph for W is a directed graph D =(N
W
;A), where the set of nodes N
W
is the set of tokens in W, and the arc relation A is a binary, irreexive relation on N
W .
Wewrite n ! n 0
to say that there is an arc from n to n 0
, i.e. (n;n 0
) 2 A; we use !
to
denotethe reexiveandtransitive closureof thearc relationA; and weuse $and $
for
thecorrespondingundirectedrelations,i.e.n$n 0
in!n 0
orn 0
!n.
Adependencygraph D=(N
W
;A)iswell-formedithefollowingconditions aresatised:
Singlehead (8nn 0
n 00
)(n!n 0
^ n 00
!n 0
) ) n=n 00
Acyclic (8nn
0
):(n!n 0 ^n 0 ! n) Connected (8nn 0
)n$ n 0 Projective (8nn 0 n 00
)(n$n 0
^ n<n 00
<n 0
) ) (n! n 00 _ n 0 ! n 00 )
Finally, letus note that alldependencygraphs considered in this paperare unlabeled,in the
sensethattheyhavenolabelsontheedgesrepresentingdierentkindsofdependencyrelations
(suchassubject,object,adverbial, etc.).
Mostformalizations of dependencygrammar use rules that specifywhole congurations of
dependents for a given head, using some notion of valence frames (Hays [19], Gaifman [18],
De-whereT isa (terminal)vocabulary (setofwordforms)andR isa setofrulesofthefollowing
form(wherew;w 0
2T):
w w
0
w ! w
0
Givena stringofwordswith nodesnandn 0
suchthat lex(n)=w,lex(n 0
)=w 0
andn<n 0
,
therstrule saysthat n 0
can be the headof n, whilethe second rulesays that ncan be the
headofn 0
. Rulesof therstkindare referred toas right-headed,rulesof thesecond kindas
left-headed. ThegrammarrulesusedhereareverysimilartoCovington's[9]notionofD-rules,
exceptthatthelatterareundirected,andI willthereforecallthemdirectedD-rules. Directed
D-rules are also similar to but simpler than the dependency relations of Courtin and
Genthial[8].
Finally, it is worth pointing out that, even though the parsing algorithm dened in the
nextsectionpresupposesdirected D-rules,this doesnotexcludeitsusewithmore traditional
dependencygrammars,sinceitisusuallystraightforwardtoextractdirectedD-rulesfromthese
grammars. For example, given a grammar in the formalism of Hays [19 ], we can construct a
setofdirectedD-rulesinthefollowingway:
Forevery ruleX(X
m
X
1 X
1
X
n
)intheoriginalgrammar,introducerulesw
l w
andw!w
r
forevery w2X;w
l 2X
m
[[X
1 ;w
r 2X
1
[[X
n .
However, using the parsing algorithm dened in the next section with the directed D-rules
extractedin this way willnotresultin a parser forthe originalgrammar, sincetheextracted
rulesnormallyimposemuch weakerconstraintsonthedependencystructurethantheoriginal
rules.
3 Parsing Algorithm
Theparsingalgorithmpresentedinthissectionisinmanywayssimilartothebasicshift-reduce
algorithm forcontext-free grammars(Aho etal. [1]),although the parse actionsare dierent
giventhat we areusing directedD-rules insteadofcontext-freegrammar rules. For example,
sincethere arenononterminalsymbols inthegrammar,thestackwillnevercontainanything
butinput tokens (graphnodes), anda reduce actionsimplyamountsto popping thetopmost
elementfromthestack.
Parser congurations arerepresentedby tripleshS;I;Ai, where S is thestack (represented
as a list), I is the list of (remaining) input tokens, and A is the (current) arc relation for
the dependency graph. (The set of of nodes N
W
in the dependency graph is given by the
inputstring,as dened inthepreceding section,andneednotberepresentedexplicitlyinthe
conguration.) Givenaninputstring W,theparserisinitializedto hnil;W;;iandterminates
whenitreachesacongurationhS;nil;Ai(foranylistSandsetofarcsA). TheinputstringW
isacceptedifthedependencygraphD=(N
W
;A)givenatterminationiswell-formed;otherwise
W isrejected. ThebehavioroftheparserisdenedbythetransitionsdenedinFigure2(where
Termination hS;nil;Ai
Left-Arc hnjS;n 0
jI;Ai!hS;n 0
jI;A[f(n 0
;n)gi lex(n) lex(n 0
)2R
:9n 00
(n 00
;n)2A
Right-Arc hnjS;n 0
jI;Ai!hn 0
jnjS;I;A[f(n;n 0
)gi lex(n)!lex(n 0
)2R
:9n 00
(n 00
;n 0
)2A
Reduce hnjS;I;Ai!hS;I;Ai 9n
0
(n 0
;n)2A
Shift hS;njI;Ai!hnjS;I;Ai
Figure 2: Parsertransitions
ThetransitionLeft-Arcaddsanarcn 0
!nfromthenextinputtokenn 0
to thenodenon
topofthe stack and reduces(pops) n from thestack, providedthat the grammar contains
therule lex(n) lex(n 0
)and provided that the graph doesnot contain an arc n 00
! n.
Thereason that the dependent node is immediately reduced isto eliminatethe possibility
ofadding an arc n! n 0
(should the grammarcontain the rulelex(n) !lex(n 0
)),which
wouldcreatea cycleinthegraph.
Thetransition Right-Arc addsan arcn! n 0
from the noden ontop ofthe stack tothe
nextinput token n 0
, licensed byan appropriate grammar rule, and shifts(pushes) n 0
onto
thestack. Thereasonthat thedependentnode isimmediatelyshiftedisthesame as inthe
precedingcase,i.e.topreventthecreationofcycles. (Sincen 0
mustbereducedbeforencan
becometopmostagain,nofurtherarclinking thesenodescaneverbeadded.)
ThetransitionReducesimplyreduces(pops)thenodenontopofthestack,providedthat
this node hasa head. This transition is needed for cases where a single headhas multiple
dependentson theright, in which case thecloserdependentnodes must be reducedbefore
arcscanbeaddedtothemoredistantones. Theconditionthatthereducednodehasahead
isnecessary to ensurethat the dependency graph is projective, sinceotherwisethere could
beungovernednodesbetweenaheadanditsdependent.
Thetransition Shift, nally, simply shifts(pushes) thenext inputtoken n onto the stack.
Thistransition is neededfor cases where a rightdependenthasits own left dependents, in
whichcasethesedependentshavetobereduced(byLeft-Arctransitions)beforethearccan
beaddedfromtheheadtotherightdependent. Moreover,theShifttransition,whichhasno
conditionassociatedwith itexcept thatthe inputlistis non-empty, is neededto guarantee
termination.
The transitions Left-Arc and Reduce have in common that they reduce the stack size by
pop-(in) (the-60's) (painted) (he) (pictures) på målade,
målade !han,
målade !tavlor g
hnil;på60-taletmålade hantavlor;;i
S
! hpå;60-taletmålade hantavlor;;i
RA
! h60-taletpå;målade hantavlor;f(på;60-talet)gi
R
! hpå;måladehantavlor;f(på;60-talet)gi
LA
! hnil;måladehan tavlor;f(på;60-talet);(målade;på)gi
S
! hmålade ;hantavlor;f(på;60-talet);(målade;på)gi
RA
! hhanmålade;tavlor;f(på;60-talet);(målade;på);(målade;han )gi
R
! hmålade ;tavlor;f(på;60-talet);(målade;på);(målade;han)gi
RA
! htavlormålade ;nil;f(på;60-talet);(målade;på);(målade;han);(målade;tavlor)gi
Figure3: ParseofSwedish sentence
theyreduce thelengthof theinput listby1 and increases thesize of thestack by1 and will
thereforebe calledpush-transitions.
Asitstands,this transitionsystemisnondeterministic,sinceseveraltransitionsoftenapply
tothesame conguration,and in orderto geta deterministicparser weneedto impose some
schedulingpolicyontopofthesystem. Thesimplestwaytodothisistouseaconstantpriority
orderingoftransitions Left-Arc>Right-Arc>Reduce >Shift(wherea>b meansthat
ahashigherprioritythanb). Figure3showshowareducedvariantoftheSwedish sentencein
Figure 1 can be parsed using this priorityordering (with theobvious transition labels LA =
Left-Arc,RA=Right-Arc,R=Reduce,S=Shift).
Decidinghowtoresolvethese conictsiscrucialfrom thepointofviewofparsingaccuracy,
as we will see in the next section, but it does not aect the basic properties of the parser
withrespecttorunningtimeand well-formednessofthedependencygraphproduced. Forthe
remainder of this section, I will therefore simply assume that there exists some unspecied
determinization of the parser (perhaps randomized) such that exactly one of the permissible
transitionsischosenforevery non-terminalconguration.
Proposition 1: Givenan inputstring W of lengthn, theparser terminates after at most
2ntransitions.
Proof of Proposition 1: Werstshowthata transitionsequence startingfromaninitial
conguration hnil;W;;i can contain at most n push-transitions and that such a sequence
mustbeterminating:
1. push-transitions have as a precondition that the currentinput listis non-empty and
de-creasesitslengthby1. Sincetherearenotransitionsthatincreasethelengthoftheinput
list, the maximum number of push-transitions is n. For the same reason, a transition
npop-transitions:
2. pop-transitionshaveasapreconditionthatthestackS isnon-emptyandhaveasaneect
that thesize ofS is decreased by1. Since theinitial conguration has anemptyS, and
since the onlytransitions that increase thesize ofS are push-transitionsof which there
can benomorethanninstances,itfollowsthatthemaximumnumberof pop-transitions
is alson.
Given that the number of pop-transitions is bounded by the number of push-transitions,
andgiventhatat leastone push-transition(Shift)isapplicabletoevery non-terminal
con-guration, weconclude that,for every initialconguration hnil;W;;i with aninputlist W
oflengthn, thereexists a transition sequencecontainingexactlynpush-transitionsand at
mostnpop-transitionsleading toa terminalcongurationhS;nil;Ai. 2
ThepracticalsignicanceofProposition 1isthat theparserisguaranteed tobe bothecient
androbust. Aslongaseachtransitioncanbeperformedinconstanttime,whichonlyrequiresa
reasonablechoiceofdatastructuresforlookupofgrammarrulesandgrapharcs,theworstcase
runningtimeof thealgorithm will be linearin thelengthof theinput. Moreover, evenifthe
parserdoesnotsucceedinbuildingawell-formeddependencygraph,itisalwaysguaranteedto
terminate.
Proposition2: ThedependencygraphG=(N
W
;A)givenatparserterminationis
project-iveandacyclic.
Proof of Proposition 2: For projectivity we need to show that, for every input string
W,ifhnil;W;;i!
hS;nil;Aithenthegraph (N
W
;A)is projective,whichmeansthat,for
every pair(n;n 0
)2A, nand n 0
aregraph adjacentin G. Assume hnil;W;;i!
hS;nil;Ai
andassume(n;n 0
)2A. Sincegraph adjacencyisasymmetricrelation,wecan withoutloss
ofgenerality assumethat n<n 0
. Thenwehave hnil;W;;i! hS 0 ;njn 1 jjn k jn 0 jI;A 0 i! hnjS 0 ;n 1 jjn k jn 0 jI;A 00 i! hnjS 0 ;n 0 jI;A 000 i!
hS;nil;Ai. Whatweneedto showis that
allof n
1 ;:::;n
k
are dominatedby n or n 0
in A, but since noarcs involving n
1 ;:::;n
k can
be in A 00
or A A 000
, it is sucient to show that this holds in A 00
A 000
. Werst observe
thatthereduction ofk nodesrequires exactly2k transitions,sinceevery reduction requires
aShiftfollowed by aLeft-Arc, ora Right-Arc followed bya Reduce(although thetwo
transitionsin each pairneed notbe adjacentin the sequence). Let (k) be theclaim that
ifhnjS 0 ;n 1 jjn k jn 0 jI;A 00 i! 2k hnjS 0 ;n 0 jI;A 000
i then all of n
1 ;:::;n
k
are dominated by n
or n 0
in A 000
, parameterized on the number k of nodes between n and n 0
. We nowuse an
inductiveproofto showthat(k)holdsforallk0:
1. Basis: Ifk=0thennandn 0
arestring adjacentand(k)holdsvacuously.
2. Induction: Assume (k) (k 0) and assume that hnjS 0 ;n 1 jjn k jn 0 jI;A 00 i ! 2(k+1) hnjS 0 ;n 0 jI;A 000
i. We beginbynoting that therst transition in this sequence must be a
push-transition, becauseotherwisenwouldbe reducedandwould notbeonthestackin
thenalconguration. Hence,weneedtoconsider two cases:
(a) IfthersttransitionisRight-Arc,thenn
1
hnjS;n
1 jjn
k
jnjI;A i !
hn 1 jnjS 0 ;n 2 jjn k jn 0 jI;A 00
[f(n;n
1 )gi ! 2m hn 1 jnjS 0 ;n 2+m jjn k jn 0 jI;A 0000 i ! hnjS 0 ;n 2+m jjn k jn 0 jI;A 0000 i ! 2(k+1) (2m+2) hnjS 0 ;n 0 jI;A 000 i
Since2(k+1) (2m+2)2k, wecan usetheinductivehypothesisto inferthat nand
n 0
areadjacent.
(b) Ifthersttransition isShift,then n
1
mustbe reducedwitha Left-Arctransitionand
thereexistssome m0suchthat:
hnjS 0 ;n 1 jjn k jn 0 jI;A 00 i ! hn 1 jnjS 0 ;n 2 jjn k jn 0 jI;A 00 i ! 2m hn 1 jnjS 0 ;n 2+m jjn k jn 0 jI;A 000 i ! hnjS 0 ;n 2+m jjn k jn 0 jI;A 000 [f(n 2+m ;n 1 )gi ! 2(k+1) (2m+2) hnjS 0 ;n 0 jI;A 000 i
Again,itfollowsfromtheinductivehypothesisthat nand n 0
areadjacent.
Theproofthat(N
W
;A)isfreeofcyclesisanalogoustotheproofofprojectivity,exceptthat
weconsidertheclaim (k)thatifhnjS 0 ;n 1 jjn k jn 0 jI;A 00 i! 2k hnjS 0 ;n 0 jI;A 000
ithenthere
isnopathfromnton 0
orfromn 0
ton,againparameterizedonthenumberkofnodesbetween
nandn 0
. HerethebasecasefollowsfromthedenitionofLeft-ArcandRight-Arc,which
preventscyclesoflength2. 2
InvirtueofProposition2,weknowthatevenifastringW isrejectedbytheparser,theresulting
dependency graph (N
W
;A) is projective and acyclic, although notnecessarily connected. In
fact,thegraph willconsist ofanumberofconnectedcomponents,each ofwhich formsa
well-formed dependency graph for a substring of W. This is importantfrom thepoint of viewof
robustness,sinceeachofthese componentsrepresenta partialanalysisoftheinputstring.
4 Empirical Evaluation
Inorder to estimatethe parsing accuracythat can be expectedwith thealgorithm described
intheprecedingsectionandagrammarconsistingofdirectedD-rules,asmallexperimentwas
performedusingdatafromtheStockholm-UmeåCorpusofwrittenSwedish(SUC[29]),which
isabalancedcorpusconsistingofctionalandnon-ctionaltexts,organizedinthesamewayas
theBrowncorpusofAmericanEnglish. Theexperimentisbasedonarandomsampleconsisting
ofabout4000words,madeupof16connectedsegments,andcontaining257sentencesintotal,
whichhavebeenmanuallyannotatedwithdependencygraphsbytheauthor.
TheStockholm-UmeåCorpusisannotatedforpartsofspeech(andmanuallycorrected),and
thetaggedsentenceswereusedas inputtotheparser. Thismeansthat thevocabularyofthe
grammarconsistsofword-tagpairs,although mostofthegrammarrulesusedonlytakeparts
of speech into account. The grammar used in the experiment is hand-craftedand contains a
totalof126 rules,dividedinto 90left-headedrules(oftheform w!w 0
)and36right-headed
Baseline 80.0 13.2
S/R 87.8 11.0
S/RA 89.0 10.6
Table1: Attachmentscores (meanandstandard deviation)
Parsing accuracywas measured bytheattachment score used byEisner [15] andCollins et
al.[7],whichiscomputedastheproportionofwords ina sentencethatis assignedthecorrect
head(or noheadif thewordis a root). Theoverall attachment score was then calculatedas
themeanattachmentscore overallsentencesinthesample.
Asmentionedintheprevious section, dierentschedulingpolicies fortheparsertransitions
yielddierent deterministicparsers. Inthis experiment,three dierent versions of theparser
werecompared:
The baseline parser uses the constant priority ordering of transitions dened in section 3
Left-Arc>Right-Arc>Reduce>Shift(cf. alsoFigure3).
The second parser, called S/R, retains the ordering Left-Arc > Right-Arc > Reduce,
Shiftbut usesthefollowingsimple ruletoresolveShift/Reduceconicts:
Ifthenodeontopofthestackcanbeatransitiveheadofthenextinputtoken(according
to thegrammar)thenShift;otherwiseReduce.
Thethirdparser,calledS/RA,inadditionusesasimplelookaheadforresolving
Shift/Right-Arc conicts, preferring a Shift-transition over a Right-Arc-transition in congurations
wherethetokenontopofthestackisa verbandthenexttokencouldbea post-modierof
thisverbbut couldalso be a pre-modier ofa followingtoken, as in thefollowing example
(cf.Figure1):
PN VB AB JJ NN
han målar extremt djärva tavlor
(he) (paints) (extremely) (bold) (pictures)
Making a Shift-transition in these cases is equivalent to a general preference for the
pre-modierinterpretation.
Table1showsthemeanattachmentscoreandstandarddeviationobtainedforthethreedierent
parsers. We can see that the parsing accuracy improves with the more complex scheduling
policies,alldierencesbeingstatisticallysignicantdespitetherelativelysmallsample(paired
t-test, = :05). There are no directlycomparable results for Swedish text, but Eisner [15]
reportsanaccuracyof90% forprobabilisticdependencyparsingofEnglishtext,sampledfrom
theWallStreetJournal sectionofthePennTreebank. Moreover,ifthecorrectpartof speech
tagsare given with the input, accuracy increases to almost 93%. Collins et al.[7] report an
accuracyof91%forEnglishtext(thesamecorpusasinEisner[15])and80%accuracyforCzech
text. GiventhatSwedishisintermediatebetweenEnglishandCzechwithregardtoinectional
is still very small. With a 95% condence interval, the accuracy of the best parser can be
estimatedto lie in the range8593%, so a conservativeconclusion is that a parsing accuracy
above85%isachievable.
5 Conclusion
Theparsing algorithm presented in this paperhas two attractiveproperties. It is robust, in
thesensethat itproducesaprojectiveandacyclicdependencygraphforanyinputstring,and
itisecient,producing these graphsin lineartime. Thecrucialquestion for furtherresearch
is whether the algorithm can achieve goodenough accuracy to be useful in practical parsing
systems. So far, it has been used with deterministic scheduling policies and simple directed
D-rules to achieve reasonable accuracy when parsing unrestricted Swedish text. Topics to
be investigated in the future include both alternative scheduling policies, possibly involving
stochasticmodels,andalternativegrammarformalisms,encodingstrongerconstraintson
well-formeddependencystructures.
References
[1] AlfredV.Aho,RaviSethi,andJereyD.Ullman. Compilers: Principles Techniques,and
Tools. AddisonWesley,1986.
[2] Cristina Barbero,LeonardoLesmo,VincenzoLombardo, andPaolaMerlo. Integrationof
syntacticandlexicalinformationinahierarchicaldependencygrammar.InSylvainKahane
and Alain Polguère, editors, Proceedings of the Workshop on Processing of
Dependency-BasedGrammars,pages5867,UniversitédeMontréal,Quebec,Canada,August1998.
[3] Glenn Carrolland EugeneCharniak. Two experimentsonlearning probabilistic
depend-encygrammarsfromcorpora.TechnicalReportTR-92,DepartmentofComputerScience,
BrownUniversity,1992.
[4] Michael Collins. A newstatistical parser basedon bigramlexical dependencies. In
Pro-ceedingsof the 34th Annatual Meeting of the Association for Computational Linguistics,
pages184191, SantaCruz,CA,1996.
[5] MichaelCollins. Threegenerative,lexicalisedmodelsforstatisticalparsing. InProceedings
of the35th AnnatualMeetingofthe AssociationforComputationalLinguistics,pages16
23,Madrid,Spain,1997.
[6] MichaelCollins.Head-DrivenStatisticalModelsforNaturalLanguageParsing.PhDthesis,
UniversityofPennsylvania, 1999.
[7] MichaelCollins,Jan Haji£,EricBrill,Lance Ramshaw,andChristophTillmann. A
parsing. In Sylvain Kahane and Alain Polguère, editors, Proceedings of the Workshop
on Processing of Dependency-Based Grammars, pages 95101, Université de Montréal,
Quebec,Canada,August1998.
[9] MichaelA.Covington. Adependencyparserforvariable-word-orderlanguages. Technical
ReportAI-1990-01,UniversityofGeorgia,Athens,GA,1990.
[10] Michael A. Covington. Discontinuous dependency parsing of free and xed word order:
Workinprogress. TechnicalReportAI-1994-02,UniversityofGeorgia,Athens,GA,1994.
[11] RalphDebusmann. Adeclarativegrammarformalismfordependencygrammar. Master's
thesis,ComputationalLinguistics,UniversitätdesSaarlandes,November2001.
[12] DenysDuchier. Axiomatizingdependencyparsingusing setconstraints. InSixth Meeting
on MathematicsofLanguage, pages115126, Orlando,Florida,July1999.
[13] DenysDuchier. Lexicalizedsyntaxandtopologyfornon-projectivedependencygrammar.
InJointConferenceonFormalGrammarsandMathematicsofLanguageFGMOL'01,
Hel-sinki, August2001.
[14] DenysDuchier.Congurationoflabeledtreesunderlexicalizedconstraintsandprinciples.
Journalof LanguageandComputation,1,2002.
[15] JasonM.Eisner. Anempiricalcomparisonofprobabilitymodelsfordependencygrammar.
Technical Report IRCS-96-11, Institute for Research in Cognitive Science, University of
Pennsylvania, 1996.
[16] JasonM.Eisner. Threenewprobabilisticmodelsfordependencyparsing: Anexploration.
InProceedingsofCOLING-96,Copenhagen,1996.
[17] Jason M. Eisner. Bilexicalgrammars and theircubic-time parsing algorithms. In Harry
BuntandAntonNijholt,editors,AdvancesinProbabilisticandOtherParsingTechnologies.
Kluwer,2000.
[18] HaimGaifman. Dependencysystemsandphrase-structuresystems.Informationand
Con-trol,8:304337,1965.
[19] David G. Hays. Dependency theory: A formalism and some observations. Language,
40:511525,1964.
[20] RichardA.Hudson. EnglishWordGrammar. Blackwell,1990.
[21] MitchellP.Marcus. A Theory ofSyntacticRecognition forNaturalLanguage. MITPress,
1980.
[22] HiroshiMaruyama.Structuraldisambiguationwithconstraintpropagation.InProceedings
of the 28thACL,pages3138,Pittsburgh,PA, 1990.
ing graded constraints. In Sylvain Kahane and Alain Polguère, editors, Proceedings of
the Workshop onProcessingof Dependency-BasedGrammars, pages7887, Universitéde
Montréal,Quebec,Canada,August1998.
[25] ChristerSamuelsson. Astatisticaltheoryofdependencysyntax. InProceedings
COLING-2000.MorganKaufmann,2000.
[26] Petr Sgall, Eva Hajicova, and Jarmila Panevova. The Meaning of the Sentence in Its
PragmaticAspects. Reidel,1986.
[27] Daniel Sleator and Davy Temperley. Parsing English with a link grammar. Technical
ReportCMU-CS-91-196,CarnegieMellon University,ComputerScience, 1991.
[28] Daniel Sleator and Davy Temperley. Parsing English with a link grammar. In Third
International WorkshoponParsing Technologies,1993.
[29] Stockholm Umeå Corpus. Version 1.0. Produced by Department of Linguistics, Umeå
University and Department of Linguistics, Stockholm University. ISBN 91-7191-348-3.,
August1997.