For well-nested structures generally, the running time increases exponentially with the gap degree: well-nested structures with a gap degree bounded by a constant $k$ can be parsed in time $O(n^{5+2k})$ (Gómez-Rodríguez et al., 2011), so the exponent grows by 2 for each additional gap. When $k = 1$, this gives the familiar $O(n^7)$ parsing time for well-nested structures with gap degree at most 1.
However, with the added restriction of no gap inheritance, the restriction to gap degree one is unnecessary. In this section, we show how to modify Algorithm 1 to find the maximum scoring tree that has no gap inheritance, is well-nested, and can have arbitrary gap degree. This change has no effect on the running time of the algorithm: the maximum scoring tree in this class can still be found in $O(n^5)$.
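To make the two structural properties in these paragraphs concrete, here is a minimal Python sketch that computes the gap degree of a dependency tree and tests well-nestedness. It assumes a head-array encoding with an artificial root 0 (a representation chosen for illustration, not taken from the text), and all function names are hypothetical. Gap inheritance itself is not checked. `well_nested_exponent` simply evaluates the exponent of the $O(n^{5+2k})$ bound cited above.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set


def yields(heads: List[int]) -> Dict[int, Set[int]]:
    """Map each node to its projection (the node plus all of its descendants).

    Assumed encoding: heads[d] is the head of word d for d = 1..n, and node 0
    is an artificial root, so heads[0] is ignored.
    """
    kids: Dict[int, List[int]] = defaultdict(list)
    for dep in range(1, len(heads)):
        kids[heads[dep]].append(dep)
    result: Dict[int, Set[int]] = {}
    for node in range(len(heads)):
        stack, seen = [node], set()
        while stack:  # depth-first walk collecting all descendants
            v = stack.pop()
            seen.add(v)
            stack.extend(kids[v])
        result[node] = seen
    return result


def gap_degree(heads: List[int]) -> int:
    """Largest number of gaps (breaks between sorted positions) in any projection."""
    best = 0
    for proj in yields(heads).values():
        pos = sorted(proj)
        best = max(best, sum(1 for a, b in zip(pos, pos[1:]) if b != a + 1))
    return best


def is_well_nested(heads: List[int]) -> bool:
    """True iff no two nodes with disjoint projections interleave (a1 < b1 < a2 < b2)."""
    ys = yields(heads)
    for u, v in combinations(range(len(heads)), 2):
        a, b = ys[u], ys[v]
        if a & b:
            continue  # projections in a tree are either nested or disjoint
        labels = [x in a for x in sorted(a | b)]
        runs = 1 + sum(1 for p, q in zip(labels, labels[1:]) if p != q)
        if runs >= 4:  # alternating runs A B A B: the two subtrees interleave
            return False
    return True


def well_nested_exponent(k: int) -> int:
    """Exponent of n in the O(n^(5+2k)) bound for gap degree at most k."""
    return 5 + 2 * k
```

For example, `is_well_nested([-1, 0, 0, 1, 2])` is `False`: the subtrees of words 1 and 2 cover {1, 3} and {2, 4}, which interleave. By contrast, `[-1, 0, 0, 1, 0, 1]` is well-nested but has gap degree 2 (word 1 projects to {1, 3, 5}), the kind of tree this section allows.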
| Tree class | Arabic | Czech | Danish | Dutch | Portuguese | Swedish | Parsing time |
|---|---|---|---|---|---|---|---|
| Well-nested + block degree 2 | 1458 (99.9) | 72321 (99.5) | 5175 (99.7) | 12896 (96.6) | 8650 (95.4) | 10955 (99.2) | $O(n^7)$ |
| Well-nested + block degree 2 + 0-Inherit | 1394 (95.5) | 70695 (97.2) | 4985 (96.1) | 12068 (90.4) | 8481 (93.5) | 10787 (97.7) | $O(n^5)$ |
| Well-nested + 0-Inherit | 1394 (95.5) | 70883 (97.5) | 4986 (96.1) | 12116 (90.8) | 8825 (97.3) | 10792 (97.7) | $O(n^5)$ |
Table 3.4: Empirical coverage when the gap degree restriction is dropped.
The effect of dropping the gap degree restriction on empirical coverage is shown in Table 3.4. For Portuguese, coverage is actually higher for no inheritance with unbounded gap degree (97.3%) than for gap degree 1 (block degree 2) with unbounded inheritance degree (95.4%). We can parse well-nested trees with no gap inheritance by modifying the definition of the $D$ helper function:
$D'[i,j,h,m,b]$: The maximum of the score of the edge from $h$ to $m$ plus the scores of any set of two or more gap-minding trees, alternating between trees rooted at $h$ and trees rooted at $m$, with vertices $[i,j] \cup \{h,m\}$, such that vertex $i$ is in a tree rooted at $h$ if $b = \mathrm{true}$ (and at $m$ if $b = \mathrm{false}$), and vertex $j$ is always in a tree rooted at $m$.
The intuition is that now, rather than concatenating together just one pair of a node’s interval and its gap, we can repeatedly alternate between concatenating on another interval or concatenating on another gap. No gap inheritance means that all the projection intervals of a node are independent given that node, and this holds equally well for an arbitrary number of intervals as it did when we had just two (gap degree one).
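The alternation in $D'$ mirrors the way a node's projection breaks into intervals separated by gaps: a node with gap degree $g$ has $g + 1$ projection intervals. The sketch below (hypothetical helper names, assuming a projection is given as a collection of word positions) splits a projection into its maximal intervals and lists the gaps between them. In the $D'$ decomposition, the pieces rooted at $m$ correspond to $m$'s projection intervals and the pieces rooted at $h$ fill the gaps between them.

```python
from typing import Iterable, List, Tuple


def projection_intervals(projection: Iterable[int]) -> List[Tuple[int, int]]:
    """Split a projection (set of word positions) into maximal contiguous intervals."""
    intervals: List[Tuple[int, int]] = []
    for pos in sorted(projection):
        if intervals and pos == intervals[-1][1] + 1:
            intervals[-1] = (intervals[-1][0], pos)  # extend the current interval
        else:
            intervals.append((pos, pos))  # a gap (or the start) precedes pos
    return intervals


def gaps(projection: Iterable[int]) -> List[Tuple[int, int]]:
    """Return the gaps, as inclusive (start, end) ranges, between consecutive intervals."""
    iv = projection_intervals(projection)
    return [(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(iv, iv[1:])]
```

For the projection {1, 3, 4, 7}, the intervals are (1, 1), (3, 4), (7, 7) and the gaps are (2, 2) and (5, 6), so the gap degree is 2; the leftmost projection interval used in the M/C case is simply the first element of `projection_intervals`.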
The two cases which need to be updated are below:
For $C[i,j,h]$, let $T$ be the maximum scoring tree rooted at $h$ with vertices $V = [i,j] \cup \{h\}$. Then $T$ and its score are derived from one of the following:
M/C: If $h$ has multiple children in $T$ and $i$ and $j$ are descended from the same child $m$ in $T$, then there is a split point $k$ such that $T$'s score is $C[i,k,m] + D'[k+1,j,h,m,T]$.
M/C: Let $k$ be the rightmost vertex in $m$'s leftmost projection interval. By no gap inheritance, we can split the score of the tree into the score of the subtree corresponding to $m$'s leftmost interval ($C[i,k,m]$), plus the score of the edge from $h$ to $m$, the scores of the subtrees rooted at $h$ with its remaining children, and the scores of the subtrees rooted at $m$ corresponding to all the other intervals of $m$'s projection (together, $D'[k+1,j,h,m,T]$).
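A minimal sketch of the updated M/C case, assuming the chart entries are available as plain Python callables `C(i, k, r)` and `d_prime(i, j, h, m, b)` (hypothetical stand-ins for the $C$ and $D'$ tables). It fixes the child $m$; since $C[i,j,h]$ does not index $m$, the full algorithm would also maximize over the choice of $m$, which is omitted here.

```python
from typing import Callable

# Hypothetical chart interfaces: C(i, k, r) and d_prime(i, j, h, m, b).
ChartC = Callable[[int, int, int], float]
ChartD = Callable[[int, int, int, int, bool], float]


def mc_case(i: int, j: int, h: int, m: int, C: ChartC, d_prime: ChartD) -> float:
    """Best score of the updated M/C case of C[i, j, h] for a fixed child m.

    [i, k] is m's leftmost projection interval; [k+1, j] is an alternation of
    h-rooted and m-rooted pieces that starts at h (b = True), which also
    carries the score of the edge h -> m.
    """
    best = float("-inf")
    for k in range(i, j):  # k = last vertex of m's leftmost interval
        best = max(best, C(i, k, m) + d_prime(k + 1, j, h, m, True))
    return best
```

When $i = j$ there is no split point, so the sketch returns negative infinity: $m$'s projection then has a single interval and this case does not apply.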
The other case which needs to be updated is the definition of $D'$:
$$
D'[i,j,h,m,T] = \max_{k} \begin{cases} C[i,k,h] + D'[k+1,j,h,m,F] \\ C[i,k,h] + C[k+1,j,m] + \mathrm{Score}(\mathrm{Edge}(h,m)) \end{cases} \tag{3.3}
$$

$$
D'[i,j,h,m,F] = \max_{k} \; C[i,k,m] + D'[k+1,j,h,m,T]
$$
When $b = \mathrm{true}$, $D'$ is made up of two or more trees that alternate between being rooted at $h$ and at $m$, such that the leftmost subtree is rooted at $h$ and the rightmost subtree is rooted at $m$ (so the number of subtrees is even). Either there are exactly two subtrees (the base case), in which case we concatenate two individual trees ($C[i,k,h]$ and $C[k+1,j,m]$) and add the score of the edge from $h$ to $m$; or there are four or more subtrees, and the interval is created by concatenating an interval rooted at $h$ ($C[i,k,h]$) to a $D'$ alternating interval that begins with a tree rooted at $m$ (and so has $b = \mathrm{false}$).
When $b = \mathrm{false}$, the number of subtrees is odd and at least three, so the interval can only be built by concatenating an interval rooted at $m$ ($C[i,k,m]$) to an existing alternating interval that begins with a tree rooted at $h$ (with $b = \mathrm{true}$).
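Recurrence (3.3) can be sketched as a memoized Python function. This is an illustration under stated assumptions, not the thesis's implementation: `C(i, k, r)` and `edge_score(h, m)` are hypothetical callables standing in for the $C$ table and $\mathrm{Score}(\mathrm{Edge}(h,m))$, and $C$ is treated as already filled in, whereas in the full chart parser $C$ and $D'$ are computed together.

```python
from functools import lru_cache
from typing import Callable

NEG_INF = float("-inf")


def make_d_prime(C: Callable[[int, int, int], float],
                 edge_score: Callable[[int, int], float]):
    """Return a memoized D'(i, j, h, m, b) following recurrence (3.3).

    Assumption: C(i, k, r) is a precomputed score for a tree rooted at r over
    [i, k]; edge_score(h, m) scores the edge h -> m. b=True means the leftmost
    piece is rooted at h; the rightmost piece is always rooted at m.
    """

    @lru_cache(maxsize=None)
    def d_prime(i: int, j: int, h: int, m: int, b: bool) -> float:
        best = NEG_INF  # a single vertex cannot hold two or more pieces
        for k in range(i, j):  # left piece [i, k], remainder [k+1, j]
            if b:
                # base case: exactly two pieces (h-rooted, then m-rooted) plus edge h -> m
                best = max(best, C(i, k, h) + C(k + 1, j, m) + edge_score(h, m))
                # h-rooted piece followed by an alternation that starts at m
                best = max(best, C(i, k, h) + d_prime(k + 1, j, h, m, False))
            else:
                # m-rooted piece followed by an alternation that starts at h
                best = max(best, C(i, k, m) + d_prime(k + 1, j, h, m, True))
        return best

    return d_prime
```

Counting the work: $D'$ has four position indices and one boolean, so $O(n^4)$ entries, each filled by a maximization over $O(n)$ split points $k$, for $O(n^5)$ in total, in line with the running time stated for the modified algorithm. As a toy check, if `C` gives 1 to every single-vertex tree rooted at $m$ and 0 otherwise (with a zero edge score), `d_prime(1, 5, h, m, False)` returns 3, from the five singleton pieces $m, h, m, h, m$, while the `True` variant can fit at most two $m$-rooted singletons and returns 2.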
We do not pursue this modification experimentally at this time, as the next chapter will show a different type of non-projectivity (defined over edges, rather than subtrees) that has higher coverage in every language and lower asymptotic parsing time.
3.9 Conclusion
Gap inheritance, a structural property on trees, has implications both for natural language syntax and for natural language parsing. We have shown that the well-nested block degree 2 trees present in natural language treebanks all have zero or one children that inherit each parent's gap. We also showed that the assumption of 1-gap inheritance removes a factor of $n$ from parsing time, and the further assumption of 0-gap inheritance removes yet another factor of $n$. More recent work has shown that the 1-gap-inherit class, restricted to trees that are head-split (requiring a child that gaps over its parent to also inherit its parent's gap), can also be parsed in $O(n^5)$, with almost the same coverage as the full 1-gap-inherit class (Satta and Kuhlmann, 2013). The space of gap-minding trees provides a closer fit to naturally occurring linguistic structures than the space of projective trees, and unlike for spanning trees, the inclusion of higher-order factors does not substantially increase the difficulty of finding the maximum scoring tree in that space. Furthermore, we showed that unlike general well-nested trees, whose parsing complexity increases exponentially with the gap degree, arbitrarily large gap degrees pose no additional complexity for well-nested trees without gap inheritance.
Chapter 4
Finding Optimal 1-Endpoint-Crossing Trees
Material in this chapter previously appeared in Pitler et al. (2013).