2.1.4 Summary
So far I have introduced the formalism of dependency grammar, explained how the corresponding representations can be produced by means of dependency parsing, and listed the most important properties of such systems. For each property I have mentioned the state-of-the-art parsers associated with it. Before introducing my own parser, MDParser, in the next section, I will briefly summarise the most important systems according to their most prominent characteristics. I will also assess some of their properties qualitatively; a detailed quantitative evaluation follows in a later section dedicated to that.
MaltParser (Nivre et al., 2006) is a data-driven, transition-based, deterministic parser. It provides various parsing strategies which support both projective and non-projective parsing, and the degree of incrementality, i.e. the size of the lookahead, can be specified with a formal feature specification language. It supports parsing and labeling in one step as well as in separate steps. It does not have a mechanism for automatic feature engineering; the feature models therefore have to be specified carefully, and they have a big impact on the quality of the results. Two machine learning libraries are currently supported: LibSVM and Liblinear. I will refer to the resulting configurations as MaltParser(LibSVM) and MaltParser(Liblinear) in the course of the thesis. LibSVM requires a lot of time during both training and testing, whereas Liblinear is very fast during both. MaltParser(LibSVM), however, produces slightly more accurate results than MaltParser(Liblinear). MaltParser can be trained for both Stanford and CoNLL dependencies, and it delivers only the syntactic analysis of the input.
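To make the notion of a deterministic, transition-based parser more concrete, the following minimal sketch shows one common formulation of an arc-standard transition system. It is not MaltParser's implementation: the function names are hypothetical, and the trained classifier is replaced by an assumed oracle that reads the transitions off a projective gold-standard tree.

```python
def arc_standard_parse(n_words, next_transition):
    """Parse tokens 1..n_words; token 0 is an artificial root.

    next_transition(stack, buffer, heads) stands in for the trained
    classifier and returns "SHIFT", "LEFT-ARC" or "RIGHT-ARC".
    """
    stack, buffer = [0], list(range(1, n_words + 1))
    heads = {}  # dependent -> head
    while buffer or len(stack) > 1:
        action = next_transition(stack, buffer, heads)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC" and len(stack) > 2:
            dependent = stack.pop(-2)      # second item depends on the top
            heads[dependent] = stack[-1]
        elif action == "RIGHT-ARC" and len(stack) > 1:
            dependent = stack.pop()        # top depends on the second item
            heads[dependent] = stack[-1]
        else:
            raise ValueError(f"transition {action!r} is not applicable")
    return heads


def gold_oracle(gold_heads):
    """Assumed stand-in for a classifier: derives transitions from a gold tree."""
    def next_transition(stack, buffer, heads):
        if len(stack) >= 2:
            second, top = stack[-2], stack[-1]
            if second != 0 and gold_heads[second] == top:
                return "LEFT-ARC"
            # The top item may only be attached once it has all its dependents.
            complete = all(d in heads for d, h in gold_heads.items() if h == top)
            if gold_heads[top] == second and complete:
                return "RIGHT-ARC"
        return "SHIFT"
    return next_transition


# "Economic news had little effect": heads given per token, 0 = root.
gold = {1: 2, 2: 3, 3: 0, 4: 5, 5: 3}
parsed = arc_standard_parse(5, gold_oracle(gold))
```

The parser never revises a decision once a transition has been applied, which is what makes the strategy deterministic. In a real system the oracle is only used to generate training examples, and a classifier predicts the transitions at parsing time.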
MST Parser (McDonald et al., 2005) is a graph-based parser and is therefore non-deterministic and non-incremental by nature. It performs parsing and labeling separately. The feature models are responsible for a large portion of its accuracy, so careful feature engineering is necessary. MST Parser relies on a very complex machine learning approach, which requires a lot of time and resources to train the model. The resulting model is huge, and applying it in the parsing phase also takes a lot of time and resources. MST Parser can learn both Stanford and CoNLL dependencies and likewise produces only the syntactic analysis.
Stanford Parser (Klein and Manning, 2003b) is a phrase structure grammar (PSG) parser. It derives so-called Stanford dependencies from phrase structures with the help of head rules, which identify the head and the children for all possible PSG constructions. Since Stanford Parser is a PSG parser, its training phase is completely different. In PSG parsing one first reads off all possible rules from the training treebank and, in a second stage, learns their probabilities. The starting point for this in Stanford Parser is maximum-likelihood estimation; further steps such as markovisation or smoothing (Klein and Manning, 2003b) are then performed to deal with the problem of sparsity. Stanford models are relatively compact, but because of the two-stage processing (first PSG parsing, then transformation to DG) the overall parsing time is still long. Stanford Parser is the only system that is able to work with plain text. It contains all preprocessing components, such as a sentence splitter, a tokeniser and a POS tagger, which are necessary to transform plain text input into a parsable format.
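The derivation of dependencies from phrase structures by means of head rules can be sketched as follows. The rule table is a hypothetical, heavily simplified example and not the rule set used by Stanford Parser. The tree encoding as nested tuples is likewise an assumption made for illustration, and the resulting dependencies are unlabeled.

```python
# Hypothetical head rules: for each phrase label, the child labels that
# may serve as head, in order of priority.
HEAD_RULES = {
    "S": ["VP", "NP"],
    "VP": ["VBD", "VB", "VP"],
    "NP": ["NN", "NNS", "NP"],
}


def head_index(label, children):
    """Return the position of the head child according to HEAD_RULES."""
    for wanted in HEAD_RULES.get(label, []):
        for i, child in enumerate(children):
            if child[0] == wanted:
                return i
    return len(children) - 1  # assumed fallback: rightmost child


def to_dependencies(tree, deps):
    """Return the lexical head of `tree` and append (head, dependent)
    word pairs for all constructions inside it to `deps`."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]  # preterminal node: (tag, word)
    h = head_index(label, children)
    head_word = to_dependencies(children[h], deps)
    for i, child in enumerate(children):
        if i != h:
            # The lexical head of every non-head child depends on head_word.
            deps.append((head_word, to_dependencies(child, deps)))
    return head_word


tree = ("S",
        ("NP", ("JJ", "Economic"), ("NN", "news")),
        ("VP", ("VBD", "had"),
               ("NP", ("JJ", "little"), ("NN", "effect"))))
dependencies = []
root = to_dependencies(tree, dependencies)
```

For this tree the sketch selects "had" as the root and attaches "news" and "effect" to it. Words are used as node identifiers only for readability; a real conversion would use token positions so that repeated words stay distinct.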
Minipar (Lin, 1998) is a rule-based dependency parser for English. It is an old parser which has not been developed for a long time, but it is still popular because of its efficiency. Minipar is a chart parser which constructs all possible parse trees for a sentence and then selects the best one. Additionally, Minipar’s parsing algorithm requires all words to be initialised with their categories before processing starts, and thus the parser is neither deterministic nor incremental. As far as the quality of the results is concerned, Minipar performs worse than the state-of-the-art data-driven parsers. However, because it is rule-based, it is able to judge whether a sentence is grammatical or not, whereas data-driven parsers always provide some dependency analysis for any input sentence.
Ensemble (Surdeanu and Manning, 2010) is a combination of all parsing algorithms implemented in MaltParser (Nivre’s arc-eager, Nivre’s arc-standard and Covington’s non-projective model, each with the parsing directions left to right and right to left) plus the default algorithm of MST Parser, i.e. 7 different models overall. Each individual parser runs in a separate thread, which makes it possible to parse in the time of a single parser, provided the necessary processing resources are available. Ensemble differs from many other experiments, which combine different strategies at learning time, so that one obtains either a single hybrid model or individual models that are at least enriched with features based on the output of other models. Ensemble performs the combination after parsing is done, by means of a voting mechanism. Additionally, it has a procedure to guarantee that the resulting dependency tree is well-formed. Otherwise the system has the same properties as MaltParser.
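The combination of parser outputs after parsing can be illustrated with a minimal sketch of unweighted majority voting over the head proposed for each token. The exact voting scheme and the repair procedure of Ensemble are not described here, so both functions below are assumptions: `vote_heads` uses a plain majority, and `is_well_formed` only shows why a separate well-formedness step is needed, since token-wise voting can produce cycles.

```python
from collections import Counter


def vote_heads(predictions):
    """Pick, for every token, the head proposed by most parsers.

    predictions[p][i] is the head that parser p assigns to token i + 1
    (0 denotes the artificial root). Ties go to the earliest parser.
    """
    n_tokens = len(predictions[0])
    voted = []
    for i in range(n_tokens):
        counts = Counter(parser_heads[i] for parser_heads in predictions)
        voted.append(counts.most_common(1)[0][0])
    return voted


def is_well_formed(heads):
    """True if every token reaches the artificial root without a cycle."""
    for token in range(1, len(heads) + 1):
        seen = set()
        node = token
        while node != 0:
            if node in seen:
                return False  # cycle: the voted arcs do not form a tree
            seen.add(node)
            node = heads[node - 1]
    return True


outputs = [
    [2, 3, 0, 5, 3],  # parser 1
    [2, 3, 0, 3, 3],  # parser 2 disagrees on token 4
    [3, 3, 0, 5, 3],  # parser 3 disagrees on token 1
]
combined = vote_heads(outputs)
```

In this example the two deviating decisions are outvoted and the combined analysis is a valid tree. When the voted arcs contain a cycle, an additional procedure has to restore a well-formed tree.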
Clear Parser (Choi and Palmer, 2011a) is a new parser inspired by MaltParser. It proposes some modifications which improve both efficiency and accuracy. The efficiency improvement comes from a modification of Nivre’s parsing strategy: Clear Parser differentiates between projective and non-projective structures and is able to avoid the unnecessary search for non-projectivity if its model predicts that no such search is needed. For the accuracy boost the developers use a different training strategy. They do not learn only from the gold-standard training data, but also produce a variant of the training data parsed by the parser itself. Thus the parser is trained both on gold-standard annotation and on automatically parsed trees. This helps to create a better model, which reduces the gap between the training data and what the parser is able to replicate. Otherwise the parser has the same properties as MaltParser. Besides syntactic analysis it also offers a semantic labelling component.
Mate-tools (Bohnet, 2010) is another new parser, this one inspired by MST Parser. It proposes a modification which considerably improves parsing speed by optimising the feature extraction procedure, which accounts for most of the processing time. On the one hand, the developers employ a method which improves the mapping of feature values to their indexes, i.e. the corresponding unique integers. On the other hand, they make use of several CPU cores, if available, in order to extract features in parallel. Mate-tools also offers a semantic labelling component.
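What is meant by mapping feature values to indexes can be shown with a minimal sketch. It uses a plain dictionary and therefore illustrates only the basic idea, not the optimised method of Mate-tools. The class name, the feature templates and the helper function are hypothetical, and parallel extraction is not shown.

```python
class FeatureIndex:
    """Maps feature strings to unique, consecutive integers."""

    def __init__(self):
        self._ids = {}

    def index(self, feature):
        # A new feature receives the next free integer;
        # a known feature keeps the integer it already has.
        return self._ids.setdefault(feature, len(self._ids))

    def __len__(self):
        return len(self._ids)


def extract_features(words, tags, i):
    """Hypothetical feature templates for the token at position i."""
    return [
        f"w={words[i]}",
        f"t={tags[i]}",
        f"w|t={words[i]}|{tags[i]}",
    ]


words = ["Economic", "news", "had", "little", "effect"]
tags = ["JJ", "NN", "VBD", "JJ", "NN"]

feature_index = FeatureIndex()
vectors = [
    [feature_index.index(f) for f in extract_features(words, tags, i)]
    for i in range(len(words))
]
```

Each token is thereby represented by a short list of integers which can be used directly as positions in a weight vector. The tag features of "Economic" and "little" share one index, because both tokens carry the tag JJ.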
In this summary I have presented the most prominent parsers of recent years. The older parsers have had high accuracy as their top priority and are very slow. The only exception is Minipar, which is remarkably fast considering the year of its development. Although it is less accurate, it still enjoys wide popularity. Probably the most widely used system, however, is Stanford Parser, despite the fact that it is neither the most accurate one nor does it allow fast processing. In fact Stanford Parser is very slow, because it first creates phrase structures, and it has problems with longer sentences. To my mind the popularity of the parser arises from the fact that it allows processing of plain text, so that the user does not have to search for components which do this in the desired format, as is the case with all other systems. The newer developments show that parsing speed is becoming more important; however, speed improvements are only admissible if they do not impair accuracy, as is the case with more efficient data structures or parallel computing.