Dependency Parsing - Performance-oriented dependency parsing


2.1.2 Dependency Parsing

Parsing is the process of assigning a syntactic analysis to a sentence. In dependency parsing, the analysis is based on dependency grammar. There are several ways of obtaining dependency representations; the most important are rule-based dependency parsing, data-driven dependency parsing, and the derivation of dependencies from a different syntactic theory.
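A common way to store a dependency analysis is one head index per word, with 0 standing for an artificial ROOT node. This encoding is assumed here for illustration and is not necessarily the one used in this thesis. The following Python sketch shows it, together with a check (the helper name `is_valid_tree` is hypothetical) that every word reaches ROOT without running into a cycle, i.e. that the head indices actually form a tree.

```python
from typing import List

def is_valid_tree(heads: List[int]) -> bool:
    """Return True if `heads` encodes a dependency tree.

    heads[i] is the head of word i+1 (words are numbered from 1) and the
    value 0 denotes the artificial ROOT node.
    """
    n = len(heads)
    if any(not 0 <= h <= n for h in heads):
        return False  # head index outside the sentence
    for start in range(1, n + 1):
        # Follow head links upwards; in a tree ROOT is reached within n steps.
        node, steps = start, 0
        while node != 0:
            node, steps = heads[node - 1], steps + 1
            if steps > n:
                return False  # a cycle: ROOT is never reached
    return True

words = ["John", "saw", "Mary"]
heads = [2, 0, 2]  # John <- saw -> Mary, with "saw" attached to ROOT
for word, head in zip(words, heads):
    print(word, "->", words[head - 1] if head else "ROOT")
print(is_valid_tree(heads))      # True
print(is_valid_tree([2, 1, 2]))  # False: "John" and "saw" head each other
```

Because each word stores exactly one head, the single-head condition holds by construction; only reachability of ROOT has to be checked.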

In the rule-based approach, a formal grammar describes the language that can be parsed and what the structure of its sentences looks like. A parsing algorithm for this grammar can then determine whether a given sentence belongs to the language. If it does, the derivation tree is the dependency analysis of the sentence. The main challenge of this approach is to provide a grammar with sufficient coverage: it is difficult to write enough rules to accept all possible sentences while at the same time avoiding ungrammatical structures. Another challenge is disambiguation, since a sentence can have several possible derivations, so a mechanism for determining the most probable one is necessary.
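Both challenges can be made concrete with a deliberately tiny, hypothetical grammar. In the Python sketch below, the "grammar" is just a set of licensed (head, dependent) word pairs, and the hypothetical `parse` function enumerates every head assignment the grammar allows that also forms a tree. Exhaustive enumeration is only workable for very short sentences and is not how practical grammar-based parsers search; it merely exposes ambiguity (several analyses) and missing coverage (no analysis).

```python
from itertools import product
from typing import List, Set, Tuple

# Hypothetical toy grammar: the (head, dependent) word pairs it licenses.
# "ROOT" marks the artificial root node.
GRAMMAR: Set[Tuple[str, str]] = {
    ("ROOT", "saw"),
    ("saw", "I"), ("saw", "men"), ("saw", "with"),
    ("men", "with"),
    ("with", "telescopes"),
}

def reaches_root(heads: List[int]) -> bool:
    """True if every word reaches ROOT (index 0) without a cycle."""
    n = len(heads)
    for start in range(1, n + 1):
        node, steps = start, 0
        while node != 0 and steps <= n:
            node, steps = heads[node - 1], steps + 1
        if node != 0:
            return False
    return True

def parse(words: List[str]) -> List[List[int]]:
    """Return every head assignment licensed by GRAMMAR that forms a tree."""
    nodes = ["ROOT"] + words
    # Candidate heads for each word: only those the grammar allows.
    candidates = [
        [h for h in range(len(nodes)) if h != i and (nodes[h], w) in GRAMMAR]
        for i, w in enumerate(words, start=1)
    ]
    return [list(heads) for heads in product(*candidates)
            if reaches_root(list(heads))]

print(parse(["I", "saw", "men", "with", "telescopes"]))
# [[2, 0, 2, 2, 4], [2, 0, 2, 3, 4]]
print(parse(["I", "saw", "telescopes"]))
# []
```

The first sentence receives two analyses ("with" attached to the verb or to the noun), so a disambiguation mechanism is needed; the second receives none, because no rule licenses "telescopes" as a dependent of "saw".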

In the data-driven approach, the mapping from strings to structures is induced from data. A data-driven parsing algorithm can then construct different analyses, score them and select the most probable one, i.e. the one that is most reasonable given the data it has seen before. Data-driven parsing can thus be split into two phases: the training phase, in which the mapping is learned, and the parsing phase, in which the mapping is applied to obtain the most probable structure. Rule-based parsers achieve good accuracy, but only on sentences covered by the grammar, whereas data-driven parsers accept any input string and try to make the best of it. The drawback is that even ungrammatical sentences receive an analysis, but coverage is no longer a problem. Good accuracy, however, requires a lot of data, which is the main challenge of this approach.
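The two phases can be illustrated with a deliberately simple, hypothetical model. In the Python sketch below, training counts how often each (head word, dependent word) arc occurs in a small annotated treebank, and parsing scores every candidate tree of a new sentence by the sum of its arc counts and returns the best one. Unseen arcs are merely scored low rather than forbidden, so every input receives an analysis. Real data-driven parsers use far richer features and efficient search instead of the exhaustive enumeration assumed here.

```python
from collections import Counter
from itertools import product
from typing import List, Optional, Tuple

Sentence = List[str]
Heads = List[int]  # heads[i] is the head of word i+1; 0 is ROOT

def train(treebank: List[Tuple[Sentence, Heads]]) -> Counter:
    """Training phase: count (head word, dependent word) arcs in gold trees."""
    counts: Counter = Counter()
    for words, heads in treebank:
        nodes = ["ROOT"] + words
        for dep, head in enumerate(heads, start=1):
            counts[(nodes[head], nodes[dep])] += 1
    return counts

def is_tree(heads: Heads) -> bool:
    """True if every word reaches ROOT (index 0) without a cycle."""
    n = len(heads)
    for start in range(1, n + 1):
        node, steps = start, 0
        while node != 0 and steps <= n:
            node, steps = heads[node - 1], steps + 1
        if node != 0:
            return False
    return True

def parse(words: Sentence, model: Counter) -> Optional[Heads]:
    """Parsing phase: return the highest-scoring tree by exhaustive search."""
    nodes = ["ROOT"] + words
    best: Optional[Heads] = None
    best_score = -1
    for candidate in product(range(len(nodes)), repeat=len(words)):
        heads = list(candidate)
        if not is_tree(heads):
            continue
        # Arcs never seen in training simply score 0 instead of being forbidden.
        score = sum(model[(nodes[h], nodes[d])]
                    for d, h in enumerate(heads, start=1))
        if score > best_score:
            best, best_score = heads, score
    return best

treebank = [
    (["John", "saw", "Mary"], [2, 0, 2]),
    (["Mary", "saw", "dogs"], [2, 0, 2]),
]
model = train(treebank)
print(parse(["John", "saw", "dogs"], model))  # [2, 0, 2]
print(parse(["Mary", "barked"], model))       # [0, 0]: an analysis, though an uninformed one
```

The second call shows the trade-off described above: the unseen word "barked" does not block parsing, but with no relevant training data the chosen analysis is arbitrary.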

Both approaches are inherently doing the same thing: the parsers are given rules, apply them and derive a structure. The difference lies in how the rules are obtained: in the data-driven approach they are induced from the data, whereas in the rule-based approach the data should be deducible from the rules. The difference thus mirrors that between inductive and deductive reasoning. Simplifying, one can say that rule-based systems start with perfect accuracy and grow in coverage during their development, whereas data-driven systems start with perfect coverage and then grow in accuracy (Nivre, 2006).

From a technical point of view, the data-driven approach is more appealing, since its main labour lies in preparing appropriate data, which has usually already been done by someone else. In the rule-based approach, the developer of the parser has to create the set of rules, as well as their weights and priorities, manually, whereas in data-driven approaches this is automated by machine learning techniques. In particular, a multilingual parser is feasible only with the data-driven approach, since one and the same parser can work for various languages, provided suitable data are available. A rule-based system, by contrast, needs a separate language-specific grammar for each language, which quickly becomes unmanageable.

The third way of obtaining dependency representations does not strictly belong to the field of dependency parsing, because the underlying parsing process is based on a different syntactic theory. It is nevertheless relevant, because it yields the same kind of representation. For example, one can derive dependencies from phrase structures by applying head rules (de Marneffe et al., 2006), or, even more easily, from deeper formalisms such as LFG (Kaplan and Bresnan, 1995) or HPSG (Pollard and Sag, 1994), since their analyses contain even more of the linguistic information needed to construct dependency trees (Zhang and Wang).
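Head rules select one head child in every phrase; the lexical head of that child becomes the head of the whole phrase, and the lexical heads of the other children become its dependents. The Python sketch below uses a nested-tuple tree encoding and a tiny, hypothetical head table chosen for illustration; it is not the rule set of de Marneffe et al. (2006), which is far more detailed. For readability, words serve as node names, so repeated words in a sentence would collide.

```python
from typing import Dict, List, Tuple

# Trees are nested tuples: a leaf is (POS tag, word) and a phrase is
# (label, [children]).
# Hypothetical head table: for each phrase label, the child labels that
# may serve as its head, in priority order.
HEAD_RULES: Dict[str, List[str]] = {
    "S": ["VP"],
    "VP": ["VBD", "VB", "VP"],
    "NP": ["NN", "NNS", "NNP", "NP"],
    "PP": ["IN"],
}

def to_dependencies(tree: tuple, arcs: List[Tuple[str, str]]) -> str:
    """Return the lexical head of `tree` and append its (head, dependent) arcs."""
    label, content = tree
    if isinstance(content, str):
        return content  # a leaf: the word is its own head
    child_heads = [to_dependencies(child, arcs) for child in content]
    labels = [child[0] for child in content]
    head_idx = 0  # fall back to the first child if no rule matches
    for wanted in HEAD_RULES.get(label, []):
        if wanted in labels:
            head_idx = labels.index(wanted)
            break
    head = child_heads[head_idx]
    # The heads of all non-head children become dependents of the phrase head.
    for i, dependent in enumerate(child_heads):
        if i != head_idx:
            arcs.append((head, dependent))
    return head

tree = ("S", [
    ("NP", [("NNP", "John")]),
    ("VP", [("VBD", "saw"),
            ("NP", [("NNP", "Mary")]),
            ("PP", [("IN", "with"),
                    ("NP", [("DT", "a"), ("NN", "telescope")])])]),
])
arcs: List[Tuple[str, str]] = []
print(to_dependencies(tree, arcs))  # saw
print(arcs)
# [('telescope', 'a'), ('with', 'telescope'), ('saw', 'Mary'), ('saw', 'with'), ('saw', 'John')]
```

The head of the topmost phrase ("saw") is the word that would be attached to ROOT in the resulting dependency tree.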

Since we are interested in efficiency-oriented parsing for applications, and applications often require multilinguality, I will focus on data-driven parsing in this thesis. I will therefore define the training and parsing phases of this approach in more detail.

Depending on whether the data one wants to learn from is annotated or not, learning is supervised or unsupervised, respectively. I will restrict myself to supervised parsing, because supervised dependency parsing is a mature technology, whereas unsupervised dependency parsing is still a relatively unexplored area. It would therefore be too difficult to perform a reasonable evaluation and a meaningful comparison of my work with other developments.

Given appropriate data, there are two main approaches to producing dependency representations: graph-based and transition-based parsing.

Algorithms belonging to the category of transition-based parsing strategies deliver the parse of a sentence by performing a sequence of actions one after another. The final result should be the set of all dependency relations required to construct the correct dependency tree. The module responsible for choosing the best action at every step is called the oracle. During the training phase, a perfect oracle is simulated from the training data: the system is driven by the given gold-standard analysis and learns which action is likely to be chosen in which situation. These situations depend on the current state of the system and its auxiliary data structures, and are called configurations. The result is a model, which is then used in the parsing phase to predict the most likely action when no gold-standard analysis is available. An example of a transition-based parser is, again, MaltParser.

In graph-based parsing, the algorithms construct dependency graphs for a given sentence, typically by initially assuming edges between all words and then eliminating the wrong ones until the graph becomes a valid dependency tree with the maximum global score according to the model. The model is approximated in the training phase by learning the edge weights from the data. This approach is very different from transition-based parsing, since it finds the solution in several global steps that involve information about the entire sentence, whereas transition-based algorithms use only local information and require a long sequence of parser decisions to arrive at the final result. MST Parser is an example of a parser of this class.
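The simulated oracle of the transition-based training phase can be sketched with the arc-standard transition system, one of the transition systems available in MaltParser; choosing it here is an assumption made for illustration. A configuration consists of a stack, a buffer of remaining words and the arcs built so far (for brevity, only the stack and buffer are recorded below). The hypothetical `oracle_sequence` function reads the gold tree and, in each configuration, returns the transition that keeps the derivation consistent with it; the recorded (configuration, transition) pairs are the kind of training instances a classifier would learn from. The sketch assumes a projective gold tree.

```python
from typing import List, Set, Tuple

Step = Tuple[Tuple[int, ...], Tuple[int, ...], str]

def oracle_sequence(heads: List[int]) -> List[Step]:
    """Simulate a static arc-standard oracle on a projective gold tree.

    heads[i] is the gold head of word i+1 (0 = ROOT). Returns the
    (stack, buffer, transition) triples visited during the derivation.
    """
    gold = {dep: head for dep, head in enumerate(heads, start=1)}
    stack: List[int] = [0]                      # ROOT starts on the stack
    buffer: List[int] = list(range(1, len(heads) + 1))
    attached: Set[int] = set()                  # words that already have a head
    history: List[Step] = []
    while buffer or len(stack) > 1:
        action = "SHIFT"
        if len(stack) >= 2:
            s0, s1 = stack[-1], stack[-2]
            if s1 != 0 and gold[s1] == s0:
                action = "LEFT-ARC"             # s0 becomes the head of s1
            elif gold[s0] == s1 and all(
                d in attached for d, h in gold.items() if h == s0
            ):
                action = "RIGHT-ARC"            # s1 becomes the head of s0
        if action == "SHIFT" and not buffer:
            raise ValueError("gold tree is not projective")
        history.append((tuple(stack), tuple(buffer), action))
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            attached.add(stack.pop(-2))
        else:
            attached.add(stack.pop())
    return history

# "John saw Mary": John <- saw -> Mary, saw attached to ROOT
for stack, buffer, action in oracle_sequence([2, 0, 2]):
    print(stack, buffer, action)
# (0,) (1, 2, 3) SHIFT
# (0, 1) (2, 3) SHIFT
# (0, 1, 2) (3,) LEFT-ARC
# (0, 2) (3,) SHIFT
# (0, 2, 3) () RIGHT-ARC
# (0, 2) () RIGHT-ARC
```

A sentence of n words always takes 2n transitions in this system (n shifts and n arc transitions), which illustrates the "long sequence of parser decisions" mentioned above.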
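The graph-based search for the highest-scoring tree can be illustrated with Eisner's dynamic-programming algorithm, which finds the best projective tree under an arc-factored model (the score of a tree is the sum of its edge scores) in O(n³) time. MST Parser is usually described as offering this algorithm for projective decoding and the Chu-Liu/Edmonds maximum spanning tree algorithm for non-projective decoding. Eisner's algorithm combines spans bottom-up rather than literally deleting edges from a complete graph, but it answers the same question. The Python sketch below is a minimal version under these assumptions; the arc scores are hypothetical stand-ins for weights learned in the training phase.

```python
from typing import List

NEG = float("-inf")

def eisner(scores: List[List[float]]) -> List[int]:
    """Highest-scoring projective dependency tree (Eisner's algorithm).

    scores[h][d] is the score of an arc from head h to dependent d; index 0
    is ROOT and words are 1..n-1. Returns the head of each word, in order.
    """
    n = len(scores)
    # [s][t][0]: span s..t headed by t; [s][t][1]: span s..t headed by s.
    complete = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    incomplete = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    comp_bp = [[[0, 0] for _ in range(n)] for _ in range(n)]
    inc_bp = [[[0, 0] for _ in range(n)] for _ in range(n)]
    for length in range(1, n):
        for s in range(n - length):
            t = s + length
            # Incomplete span: join two halves and add an arc between s and t.
            best, r_arg = max(
                (complete[s][r][1] + complete[r + 1][t][0], r) for r in range(s, t)
            )
            incomplete[s][t][0] = best + scores[t][s] if s > 0 else NEG  # ROOT has no head
            incomplete[s][t][1] = best + scores[s][t]
            inc_bp[s][t][0] = inc_bp[s][t][1] = r_arg
            # Complete span: an incomplete span extended by a complete one.
            complete[s][t][0], comp_bp[s][t][0] = max(
                (complete[s][r][0] + incomplete[r][t][0], r) for r in range(s, t)
            )
            complete[s][t][1], comp_bp[s][t][1] = max(
                (incomplete[s][r][1] + complete[r][t][1], r) for r in range(s + 1, t + 1)
            )

    heads = [0] * n

    def backtrack(s: int, t: int, d: int, is_complete: bool) -> None:
        """Follow back-pointers and record the arc of every incomplete span."""
        if s == t:
            return
        if is_complete:
            r = comp_bp[s][t][d]
            if d == 0:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
            else:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
        else:
            r = inc_bp[s][t][d]
            if d == 0:
                heads[s] = t
            else:
                heads[t] = s
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    backtrack(0, n - 1, 1, True)
    return heads[1:]

#           ROOT  John  saw   Mary    <- dependents; rows are heads
scores = [[NEG,  1.0, 10.0,  1.0],   # ROOT
          [NEG,  NEG,  2.0,  3.0],   # John
          [NEG,  9.0,  NEG,  9.0],   # saw
          [NEG,  1.0,  2.0,  NEG]]   # Mary
print(eisner(scores))  # [2, 0, 2]

# Raising John -> Mary makes a non-projective tree score highest (39),
# but Eisner's algorithm only searches projective trees.
scores[1][3] = 20.0
print(eisner(scores))  # [2, 0, 2]
```

The second call shows the projectivity restriction at work: the arc from "John" to "Mary" would cross the arc from ROOT to "saw", so the decoder keeps the best projective tree (score 28) instead.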

In document Performance-oriented dependency parsing (Page 37-40)