3.2 State-of-the-Art for German Data-Driven
3.2.1 Morphological Information
An approach which has not been tried for English (for obvious reasons) is the enrichment of parsing models with morphological information. Cahill (2004), Schiehlen (2004) and Versley (2005) present a somewhat simplistic way of integrating morphological information into the syntactic node labels of their grammars and report contradictory results.
As Cahill (2004) and Schiehlen (2004) both work with a treebank which does not include explicit morphological annotation (TiGer Release 1 and NEGRA, respectively), they automatically simulate morphological information in the trees. They exploit functional annotations in the treebanks and percolate case information, which is implicitly encoded in the grammatical function labels, down to the leaf nodes. Cahill (2004) annotates the POS tags of determiners, adjectives and pronouns with case information, while Schiehlen (2004) assigns case marking to the categorial nodes themselves and, for NPs, also to NP-internal common nouns and pronouns. Grammatical function labels triggering such a transformation are SB, PD and SP (nominative), OA and OA2 (accusative), DA (dative), and AG and OG (genitive).
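The case percolation described above can be illustrated with a minimal sketch. It is not the implementation of either author: the tuple-based tree encoding, the function name `percolate_case`, the choice of case-bearing POS tags and the rule that case is inherited only along NK (noun kernel) edges are all assumptions made for illustration. Only the mapping from function labels to case is taken from the text.

```python
# Mapping from grammatical function labels to case, as listed in the text.
FUNCTION_TO_CASE = {
    "SB": "nom", "PD": "nom", "SP": "nom",
    "OA": "acc", "OA2": "acc",
    "DA": "dat",
    "AG": "gen", "OG": "gen",
}

# Assumption: an illustrative selection of STTS tags for determiners,
# adjectives and pronouns that receive a case suffix.
CASE_BEARING_POS = {"ART", "ADJA", "PPER", "PDS", "PRELS"}


def percolate_case(tree, inherited=None):
    """Return a copy of `tree` with case suffixes on case-bearing POS tags.

    Assumed encoding: a phrase is (category, function, [children]) and a
    leaf is (pos_tag, function, word) with `word` a string.
    """
    label, function, rest = tree
    if function in FUNCTION_TO_CASE:
        case = FUNCTION_TO_CASE[function]   # this edge determines the case
    elif function == "NK":
        case = inherited                    # noun-kernel elements inherit it
    else:
        case = None                         # any other edge blocks percolation
    if isinstance(rest, str):               # leaf node
        if case and label in CASE_BEARING_POS:
            label = f"{label}-{case}"
        return (label, function, rest)
    return (label, function, [percolate_case(child, case) for child in rest])


# "Der Hund jagt die Katze" with TiGer-style function labels.
sentence = ("S", "--", [
    ("NP", "SB", [("ART", "NK", "Der"), ("NN", "NK", "Hund")]),
    ("VVFIN", "HD", "jagt"),
    ("NP", "OA", [("ART", "NK", "die"), ("NN", "NK", "Katze")]),
])
annotated = percolate_case(sentence)
```

In this sketch only the determiners change: `ART` becomes `ART-nom` in the subject NP and `ART-acc` in the object NP. Marking the NP nodes or the common nouns instead, as Schiehlen (2004) does, would amount to changing which labels receive the suffix.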
Cahill (2004) did not observe any improvement over parsing models without case information. She attributes this to the incompleteness and coarseness of the grammar transformation and expects better results for a more detailed and complete morphological analysis. In contrast, the results of Schiehlen (2004) show a clear improvement of around 4% for a constituency-based evaluation and around 3% for a dependency-based evaluation. It is not clear whether the contradictory results are due to differences in the tree transformations, the different sizes of the training sets (Cahill trained on a TiGer training set of about twice the size of the NEGRA treebank) or the parsing models themselves (Schiehlen’s PCFG includes grammatical function labels only for the case-marking transformations described above, while Cahill’s model integrates grammatical functions and LFG f-structure annotations into the syntactic node labels and thus encodes far more information).
Cahill (2004) and Schiehlen (2004) try to improve parser accuracy for German by enriching the node labels with case information. Dubey (2005) presents a different approach to incorporating morphology into the parsing model. He provides a special treatment for unknown words by means of a suffix analyser (Brants, 2000). Results show that the suffix analysis does improve parser performance, but only after applying a number of linguistically motivated treebank transformation strategies. In contrast to Schiehlen (2004), who argued that Markovisation does not help for the German NEGRA treebank, Dubey (2005) achieves better results for a Markovised grammar induced from NEGRA. However, Dubey (2005) presents a constituency-based evaluation only, so the question whether Markovisation helps for parsing German in general (i.e. also for a dependency-based evaluation) cannot be answered here. Versley (2005) addresses this issue by presenting parsing experiments for German across different text types. Like Schiehlen (2004) and Dubey (2005), he applies a number of linguistically motivated treebank transformations. In his experiments Markovisation gives a slight improvement for the transformed grammar (dependency evaluation), while it hurts performance for a vanilla PCFG. Case marking, included in the syntactic node labels of NPs as well as the POS tag labels of determiners and pronouns, also helps for all text types.
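To make the notion of Markovisation concrete, the following sketch shows one common form of horizontal Markovisation, in which a flat rule is binarised and each intermediate symbol remembers only the last `h` sisters already generated. It does not reproduce the exact scheme of any of the cited studies; the function name `markovise`, the `@`-prefixed symbol format and the parameter `h` are assumptions made for illustration.

```python
def markovise(lhs, rhs, h=1):
    """Binarise the rule lhs -> rhs from left to right.

    Each intermediate symbol records only the last `h` children generated
    so far, so that long flat rules share their intermediate rules.
    """
    if len(rhs) <= 2:
        return [(lhs, tuple(rhs))]
    rules = []
    parent = lhs
    for i in range(len(rhs) - 2):
        context = rhs[max(0, i + 1 - h): i + 1]   # last h generated children
        new_symbol = f"@{lhs}|{'_'.join(context)}"
        rules.append((parent, (rhs[i], new_symbol)))
        parent = new_symbol
    rules.append((parent, (rhs[-2], rhs[-1])))
    return rules


# Two flat NP rules that differ only in the number of adjectives.
short_np = markovise("NP", ["ART", "ADJA", "ADJA", "NN"])
long_np = markovise("NP", ["ART", "ADJA", "ADJA", "ADJA", "NN"])
```

With `h=1`, every binary rule derived from the shorter NP also occurs among the rules derived from the longer one. This sharing is what lets a Markovised grammar generalise to flat expansions that never occur in the training data, at the price of weaker context, and it is one reason why its effect may depend on the other transformations applied.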
So far the literature on parsing German has reported a rather confusing picture of the usefulness of different features like grammatical functions, lexicalisation, Markovisation, split & merge operations and morphology for boosting parsing performance for German. Rafferty and Manning (2008) follow up on this and try to establish baselines for unlexicalised and lexicalised parsing of German, using the Stanford parser (Klein and Manning, 2003) with different parameter settings, trained on the German TiGer and TüBa-D/Z treebanks. The results obtained, however, do not settle the case but rather add to the confusion. What becomes clear is that the three settings tested in the experiments (Markovisation, lexicalisation and state splitting) strongly interact with each other, and also with a number of other factors like the size of the training set, the encoding and, in particular, the number of different categorial node labels to be learned by the parser. This number crucially increases when including grammatical function labels in the categorial node labels. It becomes apparent that especially the TiGer treebank suffers from a sparse data problem, caused by the flat trees, and that smoothing could present a possible way out of the dilemma. This is consistent with Dubey (2004, 2005), who achieves considerable improvements by experimenting with different smoothing techniques.
Rafferty and Manning (2008) present no dependency-based evaluation but PARSEVAL F-scores only, which leads them to conclude that including grammatical functions in the parsing model increases data sparseness and therefore reduces parser performance by 10-15%. The inclusion of grammatical functions into the node labels results in a set of 192 (instead of 24) syntactic category labels for TiGer, which have to be learned by the parser. Therefore, a decrease in F-score is not surprising. However, due to the variability of the relatively free order of complements and adjuncts in German, it is not sufficient to identify, say, an NP node label with the correct phrase span. In order to recover the meaning of a sentence, it is also necessary to distinguish arguments from adjuncts, and to identify the grammatical function of each argument. Therefore it is arguable whether higher F-scores for an impoverished parser output present useful information, or whether lower scores for a more meaningful representation are, in fact, better.
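The argument can be illustrated with a small, simplified computation of a labelled bracketing F-score. The sketch below treats brackets as sets of (label, start, end) triples, which ignores details of the actual PARSEVAL metric such as duplicate brackets and the treatment of punctuation. The example brackets, the `-` separator between category and function, and the function names are invented for illustration.

```python
def bracket_f1(gold, predicted):
    """Labelled bracketing F-score over sets of (label, start, end) triples."""
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


def strip_functions(brackets):
    """Reduce labels such as 'NP-SB' to the bare category 'NP'."""
    return [(label.split("-")[0], start, end) for label, start, end in brackets]


# Hypothetical parse in which subject and object have been confused.
gold = [("S", 0, 5), ("NP-SB", 0, 2), ("NP-OA", 3, 5)]
predicted = [("S", 0, 5), ("NP-OA", 0, 2), ("NP-SB", 3, 5)]

score_with_functions = bracket_f1(gold, predicted)
score_without_functions = bracket_f1(strip_functions(gold),
                                     strip_functions(predicted))
```

Without function labels the hypothetical parse is scored as perfect, although it assigns the roles of subject and object the wrong way round. With function labels only one of three brackets matches. The lower score is the one that reflects whether the meaning of the sentence could be recovered.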