Dependency Parsing
Norman MacAskill Fraser
Thesis submitted for the degree of PhD, University College London
ProQuest Number: 10106699
Published by ProQuest LLC(2016). Copyright of the Dissertation is held by the Author.
Abstract
Syntactic structure can be expressed in terms of either constituency or dependency. Constituency relations hold between phrases and their constituent lexical or phrasal parts. Dependency relations hold between individual words.
Almost all results in formal language theory relate to constituency grammars,
of which the phrase structure grammars are best known. In the realm of natural
language description, almost all major linguistic theories express syntactic
structure in terms of constituency. This dominance carries over into natural
language processing, where most parsers are designed to discover the vertical
constituency relations which hold between words and phrases, rather than the
horizontal dependency relations which hold between pairs of words.
This thesis introduces dependency grammars, their formal properties, their
origins in linguistic theory and, particularly, their use in parsers for natural
language processing. A survey of dependency parsers, the most comprehensive
to date, is presented. It includes detailed discussions of twelve published
dependency parsing algorithms. The survey highlights similarities and differences
between dependency parsing and mainstream phrase structure grammar parsing.
In particular, it examines the hypotheses that (i) it is possible to construct
a fully functional dependency parser based on an established phrase structure
parsing algorithm without altering any fundamental aspects of the algorithm,
and (ii) it is possible to construct a fully functional dependency parser using
an algorithm which could not be applied without substantial modification in
a fully functional phrase structure parser.
Elements of a taxonomy of dependency parsing are outlined. These include
variables in origin, manner, order, and focus of search, as well as in the number
of passes made during parsing, techniques for the management of ambiguity,
and the use of an adjacency constraint to limit search.
Computer implementations of a number of original dependency parsing
algorithms are presented in an Appendix, together with new implementations
Contents
Acknowledgements
Abbreviations
1 Introduction
1.1 Scope of the thesis
1.2 Chapter outline
2 Dependency grammar
2.1 Overview
2.2 Gaifman grammars
2.2.1 Definitions
2.2.2 A recognizer for Gaifman grammars
2.2.3 Representing dependency structures
2.2.4 The generative capacity of Gaifman grammars
2.3 Beyond Gaifman grammars
2.4 Origins in linguistic theory
2.5 Related grammatical formalisms
2.5.1 Case grammar
2.5.2 Categorial grammar
2.5.3 Head-driven phrase structure grammar
2.6 Summary
3.1.1 Machine translation systems
3.1.2 Speech understanding systems
3.1.3 Other applications
3.1.4 Implementations of theories
3.1.5 Exploratory systems
3.2 PARS: Parsing Algorithm Representation Scheme
3.2.1 Data structures
3.2.2 Expressions
3.3 Summary
4 The RAND parsers
4.1 Overview
4.2 The bottom-up algorithm
4.2.1 Basic principles
4.2.2 The parsing algorithm
4.3 The top-down algorithm
4.3.1 The parsing algorithm
4.4 Summary
5 Hellwig’s PLAIN system
5.1 Overview
5.2 Dependency Representation Language
5.2.1 The form of DRL expressions
5.2.2 Word order constraints
5.2.3 The base lexicon
5.2.4 The valency lexicon
5.3 The parsing algorithm
5.4 The well-formed substring table
6 The Kielikone parser
6.1 Overview
6.2 Evolution of the parser
6.2.1 The earliest version: two way finite automata
6.2.2 A grammar representation language: DPL
6.2.3 Constraint based grammar: FUNDPL
6.3 The parser
6.3.1 The grammar
6.3.2 Blackboard-based control
6.3.3 The parsing algorithm
6.3.4 Ambiguity
6.3.5 Long distance dependencies
6.3.6 Statistics and performance
6.3.7 Open questions
6.4 Summary
7 The DLT MT system
7.1 Overview
7.2 Dependency grammar in DLT
7.3 An ATN for parsing dependencies
7.4 A probabilistic dependency parser
7.5 Summary
8 Lexicase parsers
8.1 Overview
8.2 Lexicase theory
8.2.1 Dependency in Lexicase
8.2.2 Lexical entries in Lexicase
8.3 Lexicase parsing
8.3.2 Lindsey’s parser
8.4 Summary
9 Word Grammar parsers
9.1 Overview
9.2 Word Grammar theory
9.2.1 Facts about words
9.2.2 Generalizations about words
9.2.3 A single-predicate system
9.2.4 Syntax in WG
9.2.5 Semantics in Word Grammar
9.3 Word Grammar parsing
9.3.1 Fraser’s parser
9.3.2 Hudson’s parser
9.4 Summary
10 Covington’s parser
10.1 Overview
10.2 Early dependency grammarians
10.3 Unification-based dependency grammar
10.4 Covington’s parser
10.5 Summary
11 The CSELT lattice parser
11.1 Overview
11.2 The problem: lattice parsing
11.3 The solution: the SYNAPSIS parser
11.3.1 Overview of SYNAPSIS
11.3.2 Dependency grammar
11.3.3 Caseframes
11.3.5 The sequential parser
11.3.6 The parallel parser
11.4 Summary
12 Elements of a taxonomy of dependency parsing
12.1 Search origin
12.1.1 Bottom-up dependency parsing
12.1.2 Top-down dependency parsing
12.1.3 Mixed top-down and bottom-up dependency parsing
12.2 Search manner
12.3 Search order
12.4 Number of passes
12.5 Search focus
12.5.1 Network navigation
12.5.2 Pair selection
12.5.3 Heads seek dependents
12.5.4 Dependents seek heads
12.5.5 Heads seek dependents or dependents seek heads
12.5.6 Heads seek dependents and dependents seek heads
12.5.7 Heads seek dependents then dependents seek heads
12.5.8 Dependents seek heads then heads seek dependents
12.6 Ambiguity management
12.7 Adjacency as a constraint on search
12.8 Summary
List of Figures
2.1 stemma for Smart people dislike stupid robots
2.2 tree diagram (D-marker) for Smart people dislike stupid robots
2.3 arc diagram for Smart people dislike stupid robots
2.4 dependency tree for *Smart people stupid dislike robots
2.5 arc diagram for *Smart people stupid dislike robots
2.6 Dependency structure of Old sailors tell tall tales
2.7 First phrase structure analysis of They are racing horses
2.8 Second phrase structure analysis of They are racing horses
2.9 Dependency structure for They are racing horses. The sentence root is racing.
2.10 syntactic structure in DG (a) and in HPSG (b)
3.1 dependency-based NLP projects
5.1 stemma showing a simple dependency structure
5.2 Hellwig’s WFST for Flying planes can be dangerous
6.1 a functional dependency structure
6.2 left and right context stacks
6.3 a DPL definition of Subject
6.4 the general form of functional schemata
6.5 a schema for Finnish transitive verbs
6.6 the binary relation ‘Subject’
6.7 the ‘SynCat’ category
6.9 the Kielikone parser control strategy automaton
7.1 the Distributed Language Translation system
7.2 dependency analysis of the sentence Whom did you say it was given to?
7.3 the use of comma in coordinate structure analyses
7.4 an ATN for parsing Danish sentences
7.5 an ATN for parsing Danish subjects
7.6 a dependency link network for the sentence You can remove the document from the drawer
8.1 a syntactic structure with empty nodes
8.2 a syntactic structure without empty nodes
8.3 a syntactic structure constrained by the one-bar constraint
8.4 a Lexicase syntactic structure
8.5 components of Starosta & Nomura’s Lexicase parser
8.6 a master entry showing the intersection of the feature sets of two homographic words
9.1 dependency structure of Ollie obeyed Ronnie
9.2 part of the WG ontological hierarchy
9.3 part of the WG word type hierarchy
9.4 part of the WG grammatical relation hierarchy
9.5 a WG dependency analysis
9.6 the use of constituency in WG
9.7 a structure permitted by WG’s version of adjacency
9.8 the use of visitor links to bind an extracted element to the main verb
9.10 the use of visitor links to interpret the object of an embedded sentence
9.11 semantic structure is very similar to syntactic structure in WG
9.12 a prohibited dependency structure
9.13 with a telescope depends on saw
9.14 with a telescope depends on the man
11.1 a simple lattice for the uttered words I know
11.2 a SYNAPSIS case frame
11.3 a SYNAPSIS dependency rule
11.4 another SYNAPSIS caseframe
11.5 a SYNAPSIS knowledge source
11.6 a simplified DI showing jolly slots
11.7 a single parse tree
11.8 a distributed representation of the same parse tree
12.1 PSG and DG analyses of the sentence Tall people sleep in long beds
12.2 phrase structure of A cat sleeps on the computer
List of Tables
2.1 Subtrees in Figure 2.6
2.2 Complete subtrees in Figure 2.6
2.3 Complete subtree labels in Figure 2.6
2.4 Subtrees and complete subtrees in the DG analysis of the sentence They are racing horses shown in Figure 2.9. Only complete subtrees are labelled.
2.5 Constituents in the phrase structure analysis of the sentence They are racing horses shown in Figure 2.7
4.1 main features of Hays’ bottom-up dependency parser
4.2 main features of Hays’ top-down dependency parser
5.1 main features of Hellwig’s dependency parser
6.1 main features of the Kielikone dependency parser
7.1 different dependency links retrieved from the BKB
7.2 main features of the DLT ATN dependency parser
7.3 main features of the DLT probabilistic dependency parser
8.1 main features of Starosta and Nomura’s Lexicase parser
8.2 main features of Lindsey’s Lexicase parser
9.1 inheriting properties for w1
9.2 main features of Fraser’s Word Grammar parser
10.1 main features of Covington’s first two dependency parsers
11.1 main features of the SYNAPSIS dependency parser
12.1 origin of search: summary
12.2 manner of search: summary
12.3 order of search: summary
12.4 number of passes: summary
12.5 focus of search: summary
Acknowledgements
This thesis may bear one name on its title page but it represents an
investment of time and effort, of wise advice and honest criticism, of practical
support and unfailing love on the part of many people. I am grateful to them
all.
First mention must go to Dick Hudson, who has been so much more than
just a thesis supervisor. Over the years he has selflessly given me his time,
enthusiasm and insight. He has listened patiently to all of my hair-brained
ideas and helped me to have fewer of them. My heartfelt thanks go to him
and to his family, Gay, Lucy and Alice, who have never failed to respond
positively to my all too frequent disruptions of their domestic lives.
I am very grateful to Neil Smith and all members of the Department of
Phonetics and Linguistics at University College London for supporting me so
well during my time in their midst. Special thanks are due to Mark Huckvale,
Monika Pounder, and a number of members of the Word Grammar seminar,
including Billy Clark, John Fletcher, and And Rosta. I have also benefited
enormously from the support and encouragement I have received as a member
of the Social and Computer Sciences Research Group at the University of
Surrey. I am grateful to all members of the group, and especially to Nigel Gilbert
for enabling me to fit thesis-writing into a hectic research schedule, and to
Scott McGlashan for his expert assistance with the LaTeX typesetting
package. I have gained much from discussions with other people at the University
of Surrey, particularly Grev Corbett and Ron Knott.
The finishing touches were added while I was a member of the Speech
and Language Division of Logica Cambridge Ltd. I am grateful to Jeremy
Peckham for his persistent belief in the value of NLP research and for his
practical support, and to Nick Youd, Simon Thornton, Trevor Thomas and
A significant portion of this thesis is devoted to dissecting other people’s
dependency parsers. I would not have been able to do so without the help of
those individuals who made otherwise unobtainable information available to
me. Many of them have read drafts of parts of the thesis, and their comments
have been invaluable. They include Doug Arnold, Paulo Baggia, Michael
Covington, Peter Hellwig, Gerhard Niedermair, Claudio Rullent, Klaus Schubert,
Stan Starosta and Job van Zuijlen.
I have lost track of the number of friends and relations who have helped
me by providing practical support, by telling me to get on with it, and by
making me laugh. The generous gift of Ian and Mair Bunting, who provided
the perfect retreat in which to work without fear of interruption, has hastened
the completion of this thesis by an enormous amount. Likewise, the practical
support of Jim and Rilla Cannon, whose hospitality knows no bounds. My
family have provided the sort of long-distance support which always feels close
at hand.
Most of all, I want to thank Sarah for putting up with my nocturnal writing
habits, for believing that I really would finish this thing, and for being my
friend.
Abbreviations
APSG augmented phrase structure grammar
ATN augmented transition network
BFP best fit principle
CCG combinatory categorial grammar
CD conceptual dependency
CFPSG context-free phrase structure grammar
CG categorial grammar
CNF Chomsky normal form
DCG definite clause grammar
DDG daughter dependency grammar
DG dependency grammar
DUG dependency unification grammar
FUG functional unification grammar
GB government-binding theory
GPSG generalized phrase structure grammar
HPSG head-driven phrase structure grammar
ID immediate dominance
LFG lexical-functional grammar
LP linear precedence
MT machine translation
NLP natural language processing
PSG phrase structure grammar
TAG tree-adjoining grammar
Chapter 1
Introduction
The intuitive appeals of the two theories cannot be discussed, since intuitions are personal and irrational. (Hays 1964: 522)
1.1 Scope of the thesis
There are, in contemporary linguistic theory, two different views of grammatical
relations. The first of these sees relations of grammatical dependency as
basic: syntactic structures are essentially networks of grammatically related
entities. The second view denies grammatical relations basic status, instead
seeing them as being derived from more fundamental structures, such as
constituent structures. This latter view has predominated throughout most of
this century, first in Immediate Constituent (IC) analysis (Bloomfield 1914,
1933), and later, from the mid-1950s onwards, in Phrase Structure Grammar
(PSG) (Chomsky 1957).
The domination of constituency-based approaches has not been limited
to theoretical linguistics. In computational linguistics also, the overwhelming
majority of proposals which posit a distinct syntactic layer assume that
that layer is based on constituent structure rather than dependency structure.
This asymmetry cannot legitimately be attributed to any established results
showing the superiority of one system over the other in respect of descriptive
adequacy, or any other substantive function: no such results exist. However,
although the notion of grammatical dependency is almost as old as the study
of grammar, it has, for most of its existence, remained just that: a notion.
The first rigorous formalization of a dependency grammar (DG) came just
over thirty years ago (see Gaifman 1965), a few years after the first
formalization of the class of PSGs (Chomsky 1956). By the time the formal definition of
a DG was published in a wide circulation journal, the corresponding definitions
of PSG had been in the public domain for a decade, with large international
programmes of research in formal language theory and theoretical linguistics
building on a PSG foundation. DG as an explicitly articulated system thus
entered an arena in which PSG was already well-established. Given that the
earliest published formal accounts of DG established its equivalence (weak and
strong) with context-free PSG (CFPSG)¹, there was little incentive to abandon
the now familiar and well-understood formalism in favour of the unfamiliar
and comparatively less well understood formalism.
A remarkable situation now obtains. Formal work in DG is virtually frozen
in the state it was in around the mid-1960s, with only a handful of groups
around the world making any (modest) advances since then (hardly any of
which has ever been published in English). In contrast, a much larger
(though still modest by PSG standards) number of theoretical linguists
continues to assume some version of DG as the foundation of syntactic
structure. Unfortunately, almost all linguistic theories based on DG have departed
to some extent from the terra firma of formal definition. Since the choice
of DG as basic is a minority preference, those making the choice have gone
to some lengths to argue the case for DG rather than PSG (for example,
Hudson 1984: 92-8, forthcoming; Starosta 1988: 35-6). The opposite is
generally not found: proponents of theories based on PSG do not typically support
the choice of PSG with arguments for the superiority of PSG over DG (but
see the debate in Hudson 1980a; Dahl 1980; Hudson 1980b; Hietaranta 1981;
and Hudson 1981b for some responses to arguments against PSG).
¹ Given a definition of equivalence to be described in Chapter 2 below.
The principal argument offered by proponents of DG is that PSG
approaches introduce a redundant layer of structure. Lexical-Functional
Grammar (LFG) offers a particularly clear illustration of this, with its c-structure
(constituent structure) and separate f-structure (functional structure), the
latter being constructed by reference to the former (Kaplan and Bresnan 1982).
In a DG approach a single structure suffices. The position adopted by many
advocates of PSG is that it is unnecessary, not to say impossible, to argue
against moving targets such as the underformalized versions of DG on offer.
This is to present the issues as being neatly polarized. In fact, most
linguists nowadays work with hybrid systems which express both dependency
and constituency in a single structure, albeit one which owes more to the
PSG tradition than to the DG tradition. The most widespread example is
X-bar grammar (originally proposed by Harris 1951) which augments a CFPSG
by distinguishing one element in each constituent as the head of that
constituent. However, there are complications here since a number of syntactic
theories have been charged with uncritically adopting unformalized versions of
X-bar theory (Pullum 1985; Kornai and Pullum 1990), the very charge laid at
the door of certain DG theories!
The general paucity of formal results concerning DG carries over from
theoretical to computational linguistics. Here DG is scarcely mentioned, far
less argued against. In the small number of cases in which it achieves passing
mention, the same reasons for not using DG are employed: first, the only
existing formal results show the equivalence of DG and CFPSG so there is
no incentive to work with the less familiar system; second, almost nothing
else is known formally about DG so until such time as additional solid results
become available there is no incentive to invest effort in trying to work within
Let us consider these points in turn. First, then, the equivalence of DG
and CFPSG. In their monograph Linguistics and Information Science Sparck
Jones and Kay provide a brief introduction to DG and then furnish an account
of why DG is not mentioned again:
We have put phrase structure and dependency together in the same
class because it is easy to show that the differences between them
are trivial from almost every point of view (see Gaifman 1965).
It is also possible to write grammatical rules in a suitable
notation which describes a single language and which assigns to
each sentence of that language both phrase-structure and
dependency trees (see Kay 1965; Robinson 1967). In this paper we shall
make no further references to dependency grammar, intending what
we say about phrase-structure grammar to be understood as
applying also to dependency with occasional minor modifications
(Sparck Jones and Kay 1973: 83-4).
Sparck Jones and Kay’s observation that it is possible to devise a
meta-formalism which includes both dependency and constituency information is
useful from a descriptive point of view. However, the point it misses is that the
equivalence of the formalisms or the possibility of devising a meta-formalism
leaves open the question of whether phrase structure parsing and dependency
parsing can be achieved by means of identical algorithms. This is a question
which has hardly ever been raised in the literature. Hays’ claim that “a
phrase-structure parser can be converted into a dependency parser with only a minor
alteration” (Hays 1966b: 79) is presented without argument or illustration so
its status is, at best, uncertain. A seminal text in computer science bears the
title Algorithms + Data Structures = Programs (Wirth 1975). It is well
understood that a change in data structure may necessitate a change in algorithm
if the net effects of the program are to remain constant. “The development of
ture” (Goldschlager and Lister 1982: 65). Thus it cannot be taken for granted
a priori that familiar phrase structure parsing algorithms will map effortlessly
into the dependency parsing domain.
The second criticism of DG in computational linguistics is that where DG
has been employed, for example in parsing systems, the resulting systems
have not been constructed on a principled or even well-defined foundation.
Winograd writes:
The formal theory of dependency grammar has emphasized ways
of describing structures rather than how the system’s permanent
knowledge is structured or how a sentence is processed. It does not
address in a systematic way the problem of finding the correct
dependency structure for a given sequence of words. In systems that
use dependency as a way of characterizing structure, the parsing
process is generally of an ad hoc nature (Winograd 1983: 75).
Once again, this claim is presented without further argument or evidence.
The absence of empirical data which characterizes these claims is not as
surprising as it might first seem when it is understood that the number of
dependency parsing systems in existence is severely limited in comparison with
the number of phrase structure parsing systems. It is also the case that those
descriptions of dependency parsing systems which have appeared in print have,
on the whole, been published in relatively obscure sources or have only been
circulated privately. Some accounts have been terse to the point of leaving most
of the detail unreported. No survey or comparative account of dependency
parsers is currently in existence.
One of the chief objectives of this thesis is to fill this gap in the literature
by presenting an extensive survey of existing dependency parsing systems, the
first such survey to be prepared.
The availability of this survey material presents a unique opportunity to
in dependency parsing compare with those which are widely used and
well-understood in phrase structure parsing. This study focuses on two hypotheses:
Hypothesis 1
It is possible to construct a fully functional dependency parser
based directly on an established phrase structure parsing algorithm
without altering any fundamental aspects of the algorithm.
This hypothesis is a strong version of Hays’ (1966b: 79) claim. It is motivated
by Gaifman’s definition of strong equivalence between DG and PSG which
guarantees some measure of structural correspondence at each point in the DG
and PSG parse trees (see Chapter 2 below). However, it is not the strongest
possible hypothesis, since it stops short of predicting that a dependency parser
can be constructed based on any phrase structure parsing algorithm.
Hypothesis 2
It is possible to construct a fully functional dependency parser using
an algorithm which could not be used without substantial
modification in a fully functional conventional phrase structure parser.
This hypothesis is motivated by an appreciation of the particular way in which
DG rules encode information, as compared with the way in which PSG rules
encode information.
As I have previously noted, most linguistically motivated DGs have
proceeded beyond the limits of what has been defined in a mathematically
rigorous way. It is impossible to undertake a survey of dependency parsing systems
without encountering some of these devices of unknown formal power. While
noting in passing these extensions where relevant, I shall concentrate my
analysis on the parsing of the context free backbone of these theories (i.e. that
which can be mapped onto a Gaifman grammar). I shall not be concerned
in this thesis to make any qualitative judgements between DG and PSG qua
1.2 Chapter outline
What follows divides conceptually into three parts:
1. Chapter 2 introduces dependency grammar. It presents a formal account
of DG and outlines the equivalence relation used to compare DG with
PSG. The development of DG from its origins in the classical world
through to the present day is charted in the latter part of the chapter.
2. Chapters 3 to 11 present the most detailed review and critique of
dependency parsers yet assembled. Chapter 3 describes the growth of the
use of DG in computational systems for natural language processing.
Chapters 4 to 11 are each devoted to the description and evaluation of
a different dependency parser or closely related family of dependency
parsers. The chapters are arranged in approximate chronological order;
the oldest parser is presented first and the most recent parser is presented
last. Needless to say, the development phases of some parsers overlapped
so the ordering of chapters must be regarded as no more than a rough
guide to the relative age of the systems reported therein.
3. Finally, drawing heavily on the preceding analyses of existing
dependency parsers, Chapter 12 sets out some elements of a first taxonomy of
dependency parsing, defines some technical vocabulary for the field and
specifies the range of relevant variables. The two hypotheses stated above
are examined in Chapter 13 in light of the survey of existing dependency
Chapter 2
Dependency grammar
“It all depends.” C.E.M. Joad,
BBC Radio ‘Brains Trust’, 1942-1948
2.1 Overview
Before proceeding w ith a survey of parsing system s based on DG it is necessary
to b e clear ab o u t exactly w hat a DG is. One of th e dangers when working
w ith a notion like gram m atical dependency is th a t it can come to m ean all
things to all people. T h e purpose of this ch apter is therefore to furnish an
unam biguous definition of DG, to introduce some terminology, and to review
w here system s approxim ating to this definition of DG have been employed in
theoretical linguistics.
Section 2.2 introduces G aifm an gram m ars, th e only version of DG to be
defined w ith full m ath em atical rigour. Accordingly, these system s are tak en as
a stab le reference point in this thesis. T h e formal properties of G aifm an gram
m ars a re defined, together w ith a decision procedure for determ ining w hether
or n o t a given strin g is accepted or rejected by an a rb itra ry G aifm an gram m ar.
A ltern ativ e conventions for p o rtrayin g dependency stru ctu res diagram m ati-
cally are introduced. A lthough th ere is insufficient space here to reproduce
PSG , th e equivalence relation employed is described an d scrutinized.
In practice, very few (if any) linguists have used Gaifman’s system in the description of natural language without making use of various augmentations of unknown formal power. These augmentations are flagged in Section 2.3. Those which must necessarily be examined in the course of the survey of dependency parsing systems are described in greater detail in later chapters. Section 2.4 charts the origins and development of DG in linguistic theory.
In Section 2.5, three grammatical formalisms bearing some similarities to DG are identified, namely Case Grammar, Categorial Grammar, and Head-Driven Phrase Structure Grammar. Although a full description of these frameworks is not appropriate here, their basic concepts are introduced and some reasons for excluding them from this study are provided.
2.2 Gaifman grammars
2.2.1 Definitions
The first formal definition of DG was offered by Haim Gaifman (1965). In this section, I present his definition along with illustrative examples.
Definition
A dependency grammar $\Delta$ is a 5-tuple
$\Delta = (T, C, \mathcal{A}, \mathcal{R}, \mathcal{G})$
where
1. $T$ is a finite set of word symbols, i.e. the terminal symbols. For the purposes of exposition, the letters u, v, w, x, y, z, with or without subscripts, will denote members of this set.
2. $C$ is a finite set of category symbols. For the purposes of exposition, the letters U, V, W, X, Y, Z, with or without subscripts, will denote members of this set.
3. $\mathcal{A}$ is a set of assignment rules, whose elements are all members of $T \times C$. Every word belongs to at least one category and every category must have at least one word assigned to it. A word may be assigned to more than one category.
4. $\mathcal{R}$ is a set of rules which give for each category the set of categories which may derive directly from it with their relative positions. For each category $X$, there is a finite number of rules of the form
$X(Y_1, \ldots, Y_i, *, Y_{i+1}, \ldots, Y_n)$
(where $Y_1$ to $Y_n$ are members of $C$) indicating that $Y_1 \cdots Y_n$ may depend on $X$ in the order given, where $*$ marks the position of $X$ in the sequence. A rule of the form $X(*)$ allows $X$ to occur without any dependents.
5. $\mathcal{G}$ is a subset of $C$ whose members are those categories which may govern a sentence, i.e. the start symbols.
Example
$\Delta_1$ is an example of a dependency grammar, where $\Delta_1$ = ({people, robots, dislike, smart, stupid}, {N, V, A}, {(people, N), (robots, N), (dislike, V), (smart, A), (stupid, A)}, {N(*), N(A,*), V(N,*,N), A(*)}, {V}).
Convention
By convention, the fact that some $X$ is a member of $\mathcal{G}$ may be indicated by writing *(X). Following this convention, $\mathcal{G}$ of $\Delta_1$ may be represented as *(V).
Convention
By convention, $\mathcal{A}$ may be represented as follows: for each distinct category $X$ in $C$ create a correspondence of the form X : L where L is the set of all words $x$ such that $(x, X)$ is in $\mathcal{A}$.
Thus, $\mathcal{A}$ of $\Delta_1$ may be represented as {N:{people, robots}, V:{dislike}, A:{smart, stupid}}.
Convention
To improve readability, a grammar of type $\Delta$ may be represented by writing each member of $\mathcal{G}$ on a line by itself, followed by each member of $\mathcal{R}$ on a line by itself, followed by each member of $\mathcal{A}$ on a line by itself. $T$ and $C$ are implicitly defined in $\mathcal{A}$.
Thus, $\Delta_1$ may be represented as follows:
*(V)
N(*)
N(A,*)
V(N,*,N)
A(*)
N:{people, robots}
V:{dislike}
A:{smart, stupid}
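The line-by-line representation above maps naturally onto a small data structure. The following is a minimal sketch in Python rather than the notation of the thesis; the names `DELTA_1`, `STAR`, `terminals`, `categories` and `categories_of` are hypothetical and chosen only for illustration. It shows how $T$ and $C$ can be recovered from the assignment rules, as the convention states.

```python
# Hypothetical encoding of the example grammar; none of these names come from the thesis.
STAR = "*"  # marks the position of the head within a rule

DELTA_1 = {
    # G: categories which may govern a sentence
    "start": {"V"},
    # R: head category -> list of rules, each a tuple of dependents with STAR for the head
    "rules": {
        "N": [(STAR,), ("A", STAR)],
        "V": [("N", STAR, "N")],
        "A": [(STAR,)],
    },
    # A: category -> set of words assigned to it
    "assign": {
        "N": {"people", "robots"},
        "V": {"dislike"},
        "A": {"smart", "stupid"},
    },
}


def terminals(grammar):
    """Recover T, the set of word symbols, from the assignment rules."""
    return set().union(*grammar["assign"].values())


def categories(grammar):
    """Recover C, the set of category symbols, from the assignment rules."""
    return set(grammar["assign"])


def categories_of(grammar, word):
    """Return every category to which a word is assigned."""
    return {cat for cat, words in grammar["assign"].items() if word in words}
```

Storing the assignment rules by category mirrors the X : L convention; a word-indexed dictionary would serve equally well for lookup.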
The next definition elucidates the relationship between the sentences of a language and the grammar of type $\Delta$ which generates that language.
In this definition it is necessary to make reference to occurrences of words or categories in a sequence. An occurrence is an ordered pair $(x, i)$, where $x$ is the word or category and $i$ is the position number of $x$ in the sequence. $P$, $Q$ and $R$, with or without subscripts, denote occurrences of words or categories.
A word which is assigned to a category $X$ by an assignment rule is said to be of category $X$.
Definition
A sentence $x_1 x_2 \cdots x_m$ is analyzed by a grammar of type $\Delta$ iff the following are true:
1. A sequence of categories $X_1 X_2 \cdots X_m$ can be formed such that $x_i$ is of category $X_i$ for $1 \le i \le m$.
2. A 2-place relation $d$ can be established between pairs of words in $x_1 x_2 \cdots x_m$. $P\,d\,Q$ signifies the fact that $P$ depends on $Q$, i.e. the relation $d$ holds between $P$ and $Q$.
For every $d$ we define another relation $d^*$ where $P\,d^*\,Q$ iff there is a sequence $P_0, P_1, \ldots, P_n$ such that $P_0 = P$, $P_n = Q$ and $P_i\,d\,P_{i+1}$ for every $0 \le i \le n-1$.
The relation $d$ is constrained in the following ways:
(a) For no $P$, $P\,d^*\,P$.
(b) For every $P$, there is at most one $Q$ such that $P\,d\,Q$.
(c) If $P\,d^*\,Q$ and $R$ is between $P$ and $Q$ in sequence (i.e. either $S(P) < S(R) < S(Q)$ or $S(P) > S(R) > S(Q)$, where $S(P)$ is the position of $P$), then $R\,d^*\,Q$.
(d) The whole set of occurrences is connected by the relation $d$.
3. If $P$ is an occurrence of $x_j$ and if the occurrences that depend on it are $P_1, P_2, \ldots, P_n$, also, if $P_h$ is an occurrence of $x_{i_h}$ where $h = 1, \ldots, n$, and the order in which these words occur in the sentence is $x_{i_1}, \ldots, x_{i_k}, x_j, x_{i_{k+1}}, \ldots, x_{i_n}$, then $X_j(X_{i_1}, \ldots, X_{i_k}, *, X_{i_{k+1}}, \ldots, X_{i_n})$ is a rule of $\mathcal{R}$. In the case that no occurrence depends on $P$, $X_j(*)$ is a rule of $\mathcal{R}$.
4. The occurrence which governs the sentence (i.e. which depends on no other occurrence) is an occurrence of a word whose category is a member of $\mathcal{G}$.
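Constraints (a) to (d) on the relation $d$ can be checked mechanically. The sketch below is an illustration only, not part of Gaifman's formalization. It assumes that $d$ is supplied as a dictionary `heads` mapping each word position (counted from 1) to the position of its head, or to `None` for a word with no head, and it takes $d^*$ to involve at least one step. The function names are hypothetical. Constraint (b) is enforced by the representation itself, since a dictionary gives each position at most one head.

```python
def ancestors(heads, p):
    """Positions Q such that p d* Q (one or more steps), or None if a cycle is found."""
    seen = []
    q = heads[p]
    while q is not None:
        if q == p or q in seen:
            return None  # following heads never terminates: constraint (a) fails
        seen.append(q)
        q = heads[q]
    return seen


def well_formed(heads):
    """Check constraints (a), (c) and (d); (b) holds by construction of `heads`."""
    anc = {p: ancestors(heads, p) for p in heads}
    if any(a is None for a in anc.values()):
        return False  # (a): some occurrence depends, directly or indirectly, on itself
    roots = [p for p, h in heads.items() if h is None]
    if len(roots) != 1:
        return False  # (d): with one head per word, connectedness means exactly one root
    for p, qs in anc.items():
        for q in qs:
            for r in range(min(p, q) + 1, max(p, q)):
                if q not in anc[r]:
                    return False  # (c): an intervening word is not subordinate to q
    return True
```

For example, a three-word sentence in which the first and third words both depend on the second is encoded as `{1: 2, 2: None, 3: 2}` and passes the check.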
The structure corresponding to a sentence of a language generated by a grammar of type $\Delta$ is called a dependency tree.
Definition
A dependency tree for a sentence $x_1 \cdots x_n$ consists of the string of categories $X_1 \cdots X_n$, together with the relation $d$.
Definition
A language is weakly generated by a dependency grammar iff for every sentence in that language there is a corresponding dependency tree and no dependency tree exists for a sequence of words which is not a sentence. A language is strongly generated by a dependency grammar iff it is weakly generated by that dependency grammar and, for every syntactically correct interpretation, and only for these, there are corresponding dependency trees.
The above definitions can be summarized informally as follows. In the structure corresponding to a sentence of a language generated by a dependency grammar of type $\Delta$:
1. one and only one occurrence is independent (i.e. does not depend on any other);
2. all other occurrences depend on some element;
3. no occurrence depends on more than one other; and
4. if A depends directly on B and some occurrence C intervenes between them (in linear order of string), then C depends directly on A or on B or on some other intervening element.
To aid discussion, I shall adopt the following terminology. All occurrences of words in a sentence shall be called words. Where the intention is to refer to words in the lexicon, this will be stated explicitly. The single independent word in a sequence (i.e. the word which depends on no other) shall be called the root. One word W1 is said to be a subordinate of another word W2 if W1 depends on W2 or on another subordinate of W2, i.e. W1 depends directly or indirectly on W2. The word on which another word depends shall be called its head. The requirement that a head-dependent pair either be next to each other or separated by direct or indirect dependents of themselves (point 4 above) is known as the adjacency constraint.
Example
Given these definitions, the sentences in (1) belong to the language defined by $\Delta_1$, whereas the sequences in (2) are outside of that language. (By convention, sequences which are not well-formed in respect of a particular grammar are prefixed by ‘*’).
(1) a People dislike robots.
    b Stupid people dislike smart robots.
    c Smart robots dislike people.
    d People dislike smart people.
(2) a *Smart people dislike.
    b *Stupid dislike robots.
    c *Stupid robots.
    d *Robots people dislike.
    e *Robots smart dislike people.
Example (2a) is ill-formed because dislike is a V, and Vs require two dependents, one preceding and one following. In this case, no following dependent is present. Example (2b) is ill-formed because all of the words are not connected together by dependency. The sequence is divided into two parts: stupid (which category N for dislike). None of the words in (2c) is missing a dependent. However, the independent word robots is of category N, but only words of category V may govern a sentence. In example (2d), none of the words is missing a dependent and the independent element dislike belongs to the required category V. However, the dependents of V are required to occur one on either side of V, whereas here they both occur before it. Example (2e) is ill-formed because of the inappropriate position of smart. Either it is a dependent of robots, in which case it should precede that word, or it is a dependent of people. If it is a dependent of people then it precedes it as it ought, but smart and people are separated by the word dislike, which is dependent on neither.
I shall henceforth refer to dependency grammars of type $\Delta$ as Gaifman Grammars.
2.2.2 A recognizer for Gaifman grammars
So far, I have characterized Gaifman grammars in terms of constraints on the well-formedness of grammar rules and dependency structures. In this section I describe a decision procedure, a recognizer, which accepts all and only the well-formed strings of the language described by a Gaifman grammar. The recognizer is based on one described by Hays (1964: 516-17).
The principal data structure used by the recognizer is a table. To determine whether or not a string is generated by a Gaifman grammar $\Delta$ proceed as follows:
1. Starting from 1, and counting upwards in units of 1, assign an integer to each word in the string, working from left to right. The integer assigned to a word shall be known as the position of that word. Let Max equal the position of the rightmost word.
2. Set up a table, having Max positions, numbered from 1 to Max. A cell $[a, b]$ shall occupy all the positions from $P_a$ to $P_b$, where $1 \le a \le b \le Max$.
3. For each word $W_i$ in the string retrieve all the classes $X_1$ to $X_n$ assigned to that word by assignment rules of the form $W : \{X_1, \ldots, X_n\}$ in $\Delta$. If $P_i$ is the position of $W_i$, write $X_1$ to $X_n$ in the table at cell $[P_i, P_i]$.
4. For each word class $X$ at cell $[j, j]$ in the table ($1 \le j \le Max$) determine whether a rule of the form $X(*)$ exists in $\Delta$. If so, insert $X(*)$ in the table at cell $[j, j]$.
5. Let $V$ be a variable. Set $V = 2$.
6. Consider each sequence of $V$ adjacent cells in the table. For each sequence which consists of exactly one word class symbol $X$ and $V-1$ trees, arranged in the order
$Y_1, \ldots, Y_i, X, Y_{i+1}, \ldots, Y_{V-1}$
search in $\Delta$ for a corresponding rule of the form:
$X(Z_1, \ldots, Z_i, *, Z_{i+1}, \ldots, Z_{V-1})$
If the root of each tree $Y_n$ in the table is identical to each dependent $Z_n$ in the grammar rule then, if $Y_1$ is located at cell $[Y_{1\,left}, Y_{1\,right}]$ and $Y_{V-1}$ is located at cell $[Y_{V-1\,left}, Y_{V-1\,right}]$, insert a new tree in the table occupying cell $[Y_{1\,left}, Y_{V-1\,right}]$. The form of the new tree should be as follows:
$X(Y_1, Y_2, \ldots, Y_i, *, Y_{i+1}, \ldots, Y_{V-1})$
7. If $V = Max$ then go to step 8, otherwise increment $V$ and go to step 6.
8. If a tree exists in the table occupying cell $[1, Max]$ then succeed if the root of the tree is of type $X$ and a rule of the form $*(X)$ exists in $\Delta$.
Hays presents his algorithm informally, so it has been necessary to reconstruct some of the details in the above account.
A Prolog implementation of this recognition algorithm can be found in the file hays_recognizer.pl in Appendix A.3.
Hays also outlines a generative procedure for enumerating all the strings generated by a Gaifman grammar (Hays 1964: 514-15). A Prolog implementation of a reconstructed version of Hays’ procedure can be found in the file hays_generator.pl in Appendix A.3.
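For readers who prefer to see the idea in executable form, the following Python sketch illustrates table-based recognition for the example grammar $\Delta_1$. It is not the Prolog program of Appendix A.3 and it departs from the step-by-step account above in one respect, which should be read as an assumption of this sketch: the table is filled in order of increasing span length rather than by counting adjacent cells, and only the root category of each tree is stored, which suffices for recognition. All names (`RULES`, `ASSIGN`, `START`, `segments`, `recognize`) are hypothetical.

```python
STAR = "*"
# Example grammar: rules R, assignments A and start categories G (hypothetical encoding)
RULES = {"N": [(STAR,), ("A", STAR)], "V": [("N", STAR, "N")], "A": [(STAR,)]}
ASSIGN = {"N": {"people", "robots"}, "V": {"dislike"}, "A": {"smart", "stupid"}}
START = {"V"}


def segments(chart, a, b, cats):
    """True if positions a..b split into adjacent trees whose roots are `cats`, in order."""
    if not cats:
        return a > b  # nothing left to cover and nothing left to place
    if a > b:
        return False  # dependents are still required but no words remain
    return any(
        cats[0] in chart.get((a, m), ()) and segments(chart, m + 1, b, cats[1:])
        for m in range(a, b + 1)
    )


def recognize(words):
    """Accept the word list iff a tree rooted in a start category spans the whole string."""
    n = len(words)
    if n == 0:
        return False
    word_cats = [{c for c, ws in ASSIGN.items() if w in ws} for w in words]
    chart = {}  # (left, right) -> root categories of trees spanning exactly that cell
    for length in range(1, n + 1):
        for a in range(1, n - length + 2):
            b = a + length - 1
            found = set()
            for h in range(a, b + 1):  # candidate position of the head word
                for x in word_cats[h - 1]:
                    for rule in RULES.get(x, ()):
                        k = rule.index(STAR)
                        if segments(chart, a, h - 1, rule[:k]) and segments(
                            chart, h + 1, b, rule[k + 1:]
                        ):
                            found.add(x)
            chart[(a, b)] = found
    return bool(chart[(1, n)] & START)
```

Called on lower-cased word lists, `recognize("stupid people dislike smart robots".split())` succeeds, whereas `recognize("stupid robots".split())` fails because the only spanning tree is rooted in N, which is not a start category.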
2.2.3 Representing dependency structures
There are at least three conventions for presenting dependency structures diagrammatically: stemmas, tree diagrams and arc diagrams.
The first representational scheme, due to Tesnière (1959), presents words as nodes in a graph which is known as a stemma (see Figure 2.1, for example). Dependencies between word occurrences are signalled by links between nodes. By convention, heads are located nearer the top of the diagram than their dependents. The first occurrence in a sentence is positioned furthest to the left in a diagram and the nth occurrence appears to the right of the (n-1)th occurrence and to the left of the (n+1)th occurrence. For simplicity, category labels are usually omitted from diagrams of all types.
Although stemmas contain the appropriate amount of information, they can sometimes prove to be difficult to read, especially when the sentences represented are long and involve a lot of alternation between left-pointing and right-pointing dependencies.
In the second type of diagram, exemplified in Figure 2.2, dependency is represented by the relative vertical position of nodes in a tree; if a line connects a lower node to a higher node then the symbol corresponding to the lower node depends on the one corresponding to the higher node. I shall call diagrams of this kind tree diagrams. They are also known as D-markers.
Figure 2.1: stemma for Smart people dislike stupid robots
Figure 2.3: arc diagram for Smart people dislike stupid robots
means of directed arcs. I shall adopt the convention of directing arcs from heads to dependents, although (unfortunately) there is no generally accepted convention and it is not unusual to find examples in the literature of arcs being oppositely directed. I shall refer to diagrams of this kind as arc diagrams. Figure 2.3 is equivalent to Figures 2.1 and 2.2 in the information it expresses.
Some authors (such as Matthews 1981) draw arc diagrams with the arcs below the symbols in the sentence rather than above them as shown here. Hudson sometimes divides the arcs so that those having a designated function appear below the sentence symbols, whilst the rest appear above them (Hudson 1988b: 202; page 189 below).
The adjacency constraint is satisfied in the sentence Smart people dislike stupid robots, as can be seen in the dependency structure variously represented in Figures 2.1, 2.2 and 2.3. The constraint is violated in the dependency structure shown in Figure 2.4.
In Figure 2.4, stupid violates the constraint: stupid is separated from its head robots by dislike, which depends on neither stupid nor robots, and which is not a subordinate of either. In a tree diagram, the dotted line which connects a word with its node is called its projection. Note that in Figure 2.2, links and projections do not intersect. Such tree diagrams and their corresponding syntactic structures are said to be projective. In Figure 2.4 a link and a projection intersect at precisely the point where ill-formedness was
Figure 2.4: dependency tree for *Smart people stupid dislike robots
Figure 2.5: arc diagram for *Smart people stupid dislike robots
are said to be non-projective.
The vocabulary of projectivity is rooted in the imagery of tree diagrams. I shall henceforth make use of the more neutral terms adjacent and non-adjacent.
The arc diagram corresponding to Figure 2.4 is shown in Figure 2.5. Notice that arcs never cross in arc diagrams of structures which satisfy the adjacency constraint, whereas arcs do cross where the structures violate the adjacency constraint. (The only exception to this generalization is discussed below).
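The crossing-arcs observation lends itself to a simple test. The sketch below is illustrative only: it assumes a dependency structure is given as a dictionary `heads` from word positions to head positions (`None` for the root), a representation chosen for this example and not taken from the thesis, and the function name `arcs_cross` is hypothetical. It detects crossing arcs and nothing more, so it does not cover the exceptional case mentioned in the parenthesis above.

```python
def arcs_cross(heads):
    """Return True if any two arcs of the diagram cross one another."""
    # Each arc is stored as (left end, right end), ignoring its direction.
    arcs = [tuple(sorted((dep, head))) for dep, head in heads.items() if head is not None]
    # Two arcs (a, b) and (c, d) cross when exactly one end of the second lies inside the first.
    return any(a < c < b < d for a, b in arcs for c, d in arcs)


# Smart(1) people(2) dislike(3) stupid(4) robots(5), as in Figure 2.3
FIGURE_2_3 = {1: 2, 2: 3, 3: None, 4: 5, 5: 3}
# *Smart(1) people(2) stupid(3) dislike(4) robots(5), as in Figure 2.5
FIGURE_2_5 = {1: 2, 2: 4, 3: 5, 4: None, 5: 4}
```

Here `arcs_cross(FIGURE_2_3)` is false, while `arcs_cross(FIGURE_2_5)` is true because the arc between people and dislike crosses the arc between stupid and robots.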
In general, I shall use arc diagrams to represent dependency structures; when describing a particular dependency system reported in the literature I
Figure 2.6: Dependency structure of Old sailors tell tall tales
2.2.4 The generative capacity of Gaifman grammars
As well as providing a formally explicit definition of one class of DG, Gaifman went on to investigate the generative capacity of the class. He did this by comparing his DG with phrase structure grammar.
He concluded that for every DG there is a strongly equivalent CFPSG and for a subclass of CFPSGs (in which every phrase is a projection of a lexical category) there is a strongly equivalent DG. His proof is too lengthy to reproduce here; it can be found in Gaifman (1965). Definitions of strong equivalence between the two systems can be found in Hays (1961b) and in Gaifman (1965: 320-25).
Let a subtree be a connected subset of a dependency tree. (This is what Pickering and Barry (1991) have recently called a ‘dependency constituent’.) Let a complete subtree consist of some element of a tree, plus all other elements directly or indirectly dependent on it. Thus, the dependency tree in Figure 2.6 includes the subtrees shown in Table 2.1. Of these, only those shown in Table 2.2 are complete subtrees.
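The notion of a complete subtree can be made concrete with a short computation. The Python sketch below is an illustration under stated assumptions: the head assignments in `HEADS` (Old depends on sailors, sailors and tales on tell, tall on tales) are assumed here as the structure of Old sailors tell tall tales, and the function names are hypothetical. For each word it collects the word itself plus everything directly or indirectly dependent on it.

```python
WORDS = ["Old", "sailors", "tell", "tall", "tales"]
# Assumed structure: index of each word's head, None for the root (tell)
HEADS = {0: 1, 1: 2, 2: None, 3: 4, 4: 2}


def complete_subtree(heads, root):
    """Return the sorted positions of `root` and of all its direct or indirect dependents."""
    nodes = {root}
    changed = True
    while changed:  # keep adding dependents of already collected words
        changed = False
        for dep, head in heads.items():
            if head in nodes and dep not in nodes:
                nodes.add(dep)
                changed = True
    return sorted(nodes)


def complete_subtrees(words, heads):
    """Map each word to the string covered by the complete subtree which it heads."""
    return {
        words[r]: " ".join(words[i] for i in complete_subtree(heads, r))
        for r in range(len(words))
    }
```

Under the assumed structure, `complete_subtrees(WORDS, HEADS)` yields five complete subtrees: Old, Old sailors, Old sailors tell tall tales, tall, and tall tales.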
A phrase structure and a dependency structure, both defined over the same string, correspond relationally if every constituent is coextensive with a subtree and every complete subtree is coextensive with a constituent. Two structural entities are coextensive if they refer to exactly the same elements in a string.
Old
Old sailors
Old sailors tell
Old sailors tell tall tales
sailors
sailors tell
tell
tell tall tales
tell tales
tall
tall tales
tales
Table 2.1: Subtrees in Figure 2.6

Old
Old sailors
Old sailors tell tall tales
tall
tall tales
Table 2.2: Complete subtrees in Figure 2.6

LABEL      SUBTREE
Old        Old
sailors    Old sailors
tell       Old sailors tell tall tales
tall       tall
tales      tall tales
Table 2.3: Complete subtree labels in Figure 2.6
which depends on no other word in the same subtree. Labels for the complete subtrees of the dependency tree shown in Figure 2.6 are given in Table 2.3.
Let each phrasal constituent in a PSG also have a label, where the label is conventionally understood (for example, the label of a noun phrase is often given as ‘NP’, etc).
In dependency theory, a string is said to derive from the label of the corresponding complete subtree. In phrase structure theory, a string is said to derive from the label of the corresponding constituent. A label accounts for the set of strings that derive from it. Two labels are substantively equivalent if they account for the same set of strings.
A phrase structure and a dependency structure correspond if (i) they correspond relationally and (ii) every complete subtree has a label which is substantively equivalent to the label of the coextensive constituent.
A DG is strongly equivalent to a PSG if (i) they have the same terminal alphabet, and (ii) for every string over that alphabet, every structure attributed by either grammar corresponds to a structure attributed by the other.
Let us consider, by way of example, the ambiguous sentence (3), the two phrase structure interpretations of which are shown in Figures 2.7 and 2.8. (The linguistic plausibility of these analyses is not an issue here.)
(3) They are racing horses.
Figure 2.7: First phrase structure analysis of They are racing horses
Figure 2.9: Dependency structure for They are racing horses. The sentence root is racing.
LABEL      SUBTREE
they       they
are        are
           they racing
           they are racing
racing     they are racing horses
           are racing
           are racing horses
           racing
           racing horses
horses     horses
Table 2.4: Subtrees and complete subtrees in the DG analysis of the sentence They are racing horses shown in Figure 2.9. Only complete subtrees are labelled.
Now consider the dependency structure in Figure 2.9. This includes the subtrees shown in Table 2.4.
The constituents in Figure 2.7 are shown in Table 2.5 (ignoring the initial category assignments).
Since every constituent in Figure 2.7 is coextensive with a subtree in Figure 2.9 and every complete subtree in Figure 2.9 is coextensive with a constituent, the structures correspond relationally. Since it is also the case that every complete subtree has a label which is substantively equivalent to the label of the coextensive constituent, the structures correspond. Close exami
LABEL      CONSTITUENT
NP         they
S          they are racing horses
AuxP       are
VP         are racing
VP         are racing horses
NP         horses
Table 2.5: Constituents in the phrase structure analysis of the sentence They are racing horses shown in Figure 2.7
However, only Figures 2.7 and 2.9 share substantively equivalent labellings so only these structures can be said to correspond.
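The test for relational correspondence can be sketched in a few lines of Python. The sketch is an illustration, not part of the formal apparatus: constituents are represented as sets of word positions taken from Table 2.5, the dependency structure is the one described for Figure 2.9 (racing is the root and every other word depends on it), and all names are hypothetical. It checks relational correspondence only; the substantive equivalence of labels is not modelled.

```python
WORDS = ["they", "are", "racing", "horses"]
# Dependency structure with racing as root; every other word depends on it
HEADS = {0: 2, 1: 2, 2: None, 3: 2}
# Constituents of the first phrase structure analysis, as sets of word positions
CONSTITUENTS = [{0}, {0, 1, 2, 3}, {1}, {1, 2}, {1, 2, 3}, {3}]


def is_subtree(heads, nodes):
    """A non-empty set is connected iff exactly one member has its head outside the set."""
    tops = [n for n in nodes if heads[n] not in nodes]
    return len(tops) == 1


def complete_subtree(heads, root):
    """Return the set holding `root` and all its direct or indirect dependents."""
    nodes = {root}
    changed = True
    while changed:
        changed = False
        for dep, head in heads.items():
            if head in nodes and dep not in nodes:
                nodes.add(dep)
                changed = True
    return nodes


def correspond_relationally(heads, constituents):
    """Every constituent is a subtree and every complete subtree is a constituent."""
    complete = [complete_subtree(heads, r) for r in heads]
    return all(is_subtree(heads, c) for c in constituents) and all(
        cs in constituents for cs in complete
    )
```

With these inputs `correspond_relationally(HEADS, CONSTITUENTS)` holds. Adding a constituent such as `{0, 1}` (they are) would break the correspondence, since those two words are not connected to each other in the dependency structure.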
2.3 Beyond Gaifman grammars
In presenting his work on PSGs, Chomsky frequently and explicitly represented them as a formalization of the structuralist Immediate Constituent model (e.g. Chomsky 1962). This claim has recently been contested by Manaster-Ramer and Kac (1990), thus highlighting some of the difficulties inherent in trying to formalize a pre-existing linguistic notion faithfully.
The issues are somewhat clearer in the case of DG, since Gaifman, as author of the formalization, makes no claims regarding its relation to any existing notion other than that embodied in a RAND Corporation machine translation program. Hays, on the other hand, represents Gaifman’s work as being a formalization of the linguistic notion of dependency. For example, following a discussion of the different linguistic notions underlying IC theory and dependency theory in his 1964 Language paper, his summary of what is to follow includes the following statement:
Section 2 presents a formalism for the theory, identifying the components of any dependency grammar (Hays 1964: 512, my emphasis)
I have been unable to find any discussions anywhere in the literature which investigate this assertion by reference to actual linguistic theories which claim to be based on some notion of dependency.
What is noticeable is that few of the self-proclaimed dependency-based theories of language have made use of Gaifman’s formalism. This contrasts sharply with the uptake of Chomsky’s PSG formalism, and particularly CFPSG. The only DGs which incorporate a more or less intact version of Gaifman grammar are those which use it as the base component in a transformational grammar (Hays 1964: 522-4; Robinson 1970) or as the transcription system on one stratum of a stratificational grammar (Hays 1964: 522-4). Otherwise, alternative quasi-formalisms are employed.
It is common to find versions of DG which make use of complex feature structures instead of or as well as word category labels, with dependency rules being allowed to manipulate features in arbitrary ways (e.g. Starosta 1988; Covington 1990b). Consider the following illustrative example of a dependency rule for intransitive verbs which enforces subject-verb agreement (adapted from Covington 1990b: 234):
[category: verb, person: X, number: Y] ( [category: noun, person: X, number: Y, case: nominative], * )
Here the head is of syntactic category ‘verb’, of person ‘X’ and number ‘Y’. Its single dependent must be a preceding nominative case noun, also of person ‘X’ and number ‘Y’. ‘X’ and ‘Y’ are variables over feature values.
This kind of augmentation could easily be formalized as an extension to Gaifman’s definition of DG. So long as the feature structures are simply arrangements of symbols drawn from a finite set, the generative power remains unchanged. The proof is trivial: any arrangement of features may be ‘frozen’ and treated as though it were an atomic symbol. This is directly analogous to