
words, automatic syntactic analysis (i.e., syntactic parsing) would have a hard time discovering the correct syntactic structure in Warlpiri. While not all languages in the world are as extreme as Warlpiri, most of them employ morphology to some extent to express syntactic structure. Morphological analysis is therefore an important part of syntactic parsing in a multilingual setting. However, automatic morphological analysis in itself is not a simple task either, since—as always in natural language processing—it needs to deal with ambiguity.

3.2 Syncretism

Consider the two German sentences presented in Examples 3.1 and 3.2. Both sentences contain the word fahren (to drive). In the first example, fahren is a finite word form in third person plural present tense; in the second example, fahren is the present tense infinitive. Although both word forms look the same, they have different morphological features and are indeed different inflections of the lemma fahren. The phenomenon in which two (or more) morphological forms of a lemma have the same surface form is called syncretism.

(3.1) Peter  sagte,  daß        sie   morgen    nach  Berlin  fahren.
      Peter  said    that.CONJ  they  tomorrow  to    Berlin  drive.3-PL-PRES

      Peter said that they will drive to Berlin tomorrow.

(3.2) Peter  will        morgen    nach  Berlin  fahren.
      Peter  want.MODAL  tomorrow  to    Berlin  drive.INFINITIVE

      Peter wants to drive to Berlin tomorrow.

To correctly predict the morphological features of fahren in each sentence, one needs to know if the verb is the main verb of a subordinate clause introduced by daß (Example 3.1) or if the verb is embedded by the modal verb will (Example 3.2). Thus, the resolution of the syncretism must rely on syntactic information about the sentential context of the syncretic word.
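The reasoning above can be sketched as a small lookup: each surface form has several candidate analyses, and the kind of word that embeds the verb selects among them. This is only a toy illustration, not a method proposed in the text. The context labels `subordinating_conjunction` and `modal_verb`, the `finiteness` feature name, and the function `disambiguate` are assumptions introduced for the example.

```python
# Candidate analyses of the surface form "fahren".
# Only the two readings discussed in Examples 3.1 and 3.2 are listed.
CANDIDATES = {
    "fahren": [
        {"lemma": "fahren", "finiteness": "finite"},
        {"lemma": "fahren", "finiteness": "infinitive"},
    ]
}

# Hypothetical context labels: the kind of word that embeds the verb.
REQUIRED_FINITENESS = {
    "subordinating_conjunction": "finite",  # e.g. daß in Example 3.1
    "modal_verb": "infinitive",             # e.g. will in Example 3.2
}


def disambiguate(form, embedding_context):
    """Keep only the analyses of `form` that the embedding context allows."""
    required = REQUIRED_FINITENESS[embedding_context]
    return [a for a in CANDIDATES[form] if a["finiteness"] == required]


if __name__ == "__main__":
    print(disambiguate("fahren", "subordinating_conjunction"))
    print(disambiguate("fahren", "modal_verb"))
```

The point of the sketch is that the word form itself contributes nothing to the decision; all of the disambiguating information comes from the syntactic context.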

Baerman et al. (2005: 2) characterize syncretism as a mismatch between morphology and syntax: a syntactically relevant distinction is not made by the morphology. In the examples above, it is the distinction between finite and non-finite verb forms.

36 3 Motivation

Syncretism is a common phenomenon in many languages and occurs in verbal and nominal morphological paradigms. The World Atlas of Language Structures Online² lists 60 languages with syncretism in verbal PERSON and NUMBER marking out of the 141 languages listed that mark PERSON or NUMBER at all (Baerman and Brown 2013b). Similarly, it lists 40 languages with a syncretic case system out of the 75 languages listed that have a case system (Baerman and Brown 2013a).

Formally, syncretism can be characterized as an identity in form between two grammatically different inflections (Trask 1997, as cited in Baerman et al. 2005: 2). Syncretism occurs when one surface form of a single word occupies more than one cell in this word's inflection paradigm. To illustrate this, Table 3.1 shows the declension paradigms of two Czech nouns, bratr (brother) and město (city). Syncretic forms are marked by different colors. The two examples mostly show syncretism with respect to CASE, with the exception of the form města, which is additionally ambiguous with respect to NUMBER. Case syncretism is a typical property of Indo-European languages. Among these, Slavonic languages show the highest degree of variation and complexity (Baerman et al. 2005: 38).

MASC ANI   SG           PL
NOM        bratr        bratři
ACC        bratra       bratry
DAT        bratrovi/u   bratrům
GEN        bratra       bratrů
VOC        bratře       –
LOC        bratrovi/u   bratrech
INS        bratrem      bratry

(a) Czech, masculine animate: brother

NEUT       SG           PL
NOM        město        města
ACC        město        města
DAT        městu        městům
GEN        města        měst
VOC        město        –
LOC        městě/u      městech
INS        městem       městy

(b) Czech, neuter: city

Table 3.1: Syncretism in two Czech nominal inflection paradigms (masculine animate and neuter).
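The definition of syncretism as one surface form occupying more than one paradigm cell can be checked mechanically. The following sketch, which assumes the paradigms of Table 3.1 are stored as dictionaries from (CASE, NUMBER) cells to forms, groups the cells by form and reports every form that fills more than one cell. The names `BRATR`, `MESTO`, and `syncretic_forms` are introduced here for illustration only; variant forms such as bratrovi/u are kept as a single string, and the empty vocative plural cells are left out.

```python
from collections import defaultdict

# Paradigms copied from Table 3.1; keys are (CASE, NUMBER) cells.
BRATR = {
    ("NOM", "SG"): "bratr",      ("NOM", "PL"): "bratři",
    ("ACC", "SG"): "bratra",     ("ACC", "PL"): "bratry",
    ("DAT", "SG"): "bratrovi/u", ("DAT", "PL"): "bratrům",
    ("GEN", "SG"): "bratra",     ("GEN", "PL"): "bratrů",
    ("VOC", "SG"): "bratře",
    ("LOC", "SG"): "bratrovi/u", ("LOC", "PL"): "bratrech",
    ("INS", "SG"): "bratrem",    ("INS", "PL"): "bratry",
}

MESTO = {
    ("NOM", "SG"): "město",   ("NOM", "PL"): "města",
    ("ACC", "SG"): "město",   ("ACC", "PL"): "města",
    ("DAT", "SG"): "městu",   ("DAT", "PL"): "městům",
    ("GEN", "SG"): "města",   ("GEN", "PL"): "měst",
    ("VOC", "SG"): "město",
    ("LOC", "SG"): "městě/u", ("LOC", "PL"): "městech",
    ("INS", "SG"): "městem",  ("INS", "PL"): "městy",
}


def syncretic_forms(paradigm):
    """Return {form: [cells]} for every form that fills more than one cell."""
    cells_by_form = defaultdict(list)
    for cell, form in paradigm.items():
        cells_by_form[form].append(cell)
    return {form: cells for form, cells in cells_by_form.items() if len(cells) > 1}


if __name__ == "__main__":
    for name, paradigm in (("bratr", BRATR), ("město", MESTO)):
        for form, cells in syncretic_forms(paradigm).items():
            print(name, form, sorted(cells))
```

For města, the reported cells differ in both CASE and NUMBER, which is the additional number ambiguity noted in the text.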

3.2.1 Disambiguation in Context

Given in isolation, neither machines nor humans are able to fully disambiguate a word form like bratra in Table 3.1a. By the word form alone, we can disambiguate it to masculine singular but cannot decide on the case value; for a word like města (Table 3.1b), we would not even be able to decide on the number value. However, we rarely encounter words in isolation but rather in sentences and texts. In a sentence, a word is embedded into a syntactic context that can be used to disambiguate the ambiguous word form.³

Examples 3.1 and 3.2 already show that the syntactic context (presence of a subordinating conjunction versus embedding under a modal verb) can serve to disambiguate a syncretic word form. Another way of disambiguating a given word form in a sentence is to use the morphosyntactic rules in the grammar of a language, e.g., rules of government and agreement. Both describe a particular relationship between words in a sentence:

• Government describes the situation when a word imposes certain morphological values onto another word, e.g., when a verb imposes specific case values on its nominal dependents. For example, German subjects have to be in nominative case.

• Agreement can be defined as a systematic co-variance of a semantic or formal feature between two words (Corbett 2006: 4). For example, the number feature of a subject and a predicate co-vary in English.

Figure 3.3 illustrates government and agreement with a simple German sentence. Government is shown in the arc above the sentence, agreement with the arcs below. The predicate schläft governs the nominative case of the noun Schnecke. The verb and the noun agree with respect to NUMBER, and the noun agrees with the adjective and the determiner with respect to CASE, NUMBER, and GENDER.

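Die           klein-e         Schnecke        schläf-t       .
the.NOM.SG.F  small-NOM.SG.F  snail.NOM.SG.F  sleep-3SG.PRS  .

Arc above the sentence: gov:NOM (schläft, Schnecke)
Arcs below the sentence: agr:SG (schläft, Schnecke); agr:NOM.SG.F (Schnecke, kleine); agr:NOM.SG.F (Schnecke, Die)

Figure 3.3: Government and agreement examples in German.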

³ Humans use many different linguistic and non-linguistic sources to disambiguate a sentence. In this work, we focus on the syntactic context and its relation to morphological ambiguity. It is clear, however, that even though a system that perfectly models all interactions between syntax and morphology in a language would get us a big step forward, there would still be a long road ahead of us.


The government and agreement rules shown in Figure 3.3 are part of the grammar of German and need to be obeyed in German sentences. To see how they help in dealing with syncretism, consider the inflection paradigms of the German definite article and the noun Schnecke in Table 3.2. The noun in Table 3.2b is morphologically marked only for NUMBER; there are no forms that distinguish case values. German noun inflection generally distinguishes NUMBER and only in some cases marks CASE. The definite determiner in Table 3.2a, however, is generally better marked for CASE than for NUMBER. Both words therefore carry different loads of morphological information within a noun phrase, a situation that has been dubbed Funktionsteilung (function sharing) by Eisenberg (2006: 142).

           M.SG   N.SG   F.SG   MFN.PL
NOM        der    das    die    die
ACC        den    das    die    die
DAT        dem    dem    der    den
GEN        des    des    der    der

(a) German, definite determiner: the

Schnecke.F   SG         PL
NOM          Schnecke   Schnecke-n
ACC          Schnecke   Schnecke-n
DAT          Schnecke   Schnecke-n
GEN          Schnecke   Schnecke-n

(b) German, feminine: snail

Table 3.2: Syncretism in the definite determiner and a noun in feminine gender in German.

By relating the two words by government and agreement, as shown previously in Figure 3.3, both words are able to disambiguate each other. The nominative case is imposed on the noun by the governing verb and thus excludes all other case values. At the same time, the determiner agrees with it with respect to GENDER, NUMBER, and CASE. The fully specified noun therefore fully disambiguates the determiner by virtue of the agreement relation.

We can thus see that access to syntactic structure is an important source for disambiguating morphological information in the cases where a word form is ambiguous. A word form that is ambiguous between nominative and accusative case in a language where subjects are marked by nominative case fails at marking subjecthood. The syntax must then rely on other means to determine the subject, e.g., NUMBER agreement.
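The mutual disambiguation of determiner and noun can be sketched as set operations over candidate analyses. In this illustration, which is a simplification and not a parser design taken from the text, each analysis is a (CASE, NUMBER, GENDER) triple read off Table 3.2, agreement is modelled as set intersection, and government as a filter on the case value. The names `DIE`, `SCHNECKE`, `agree`, and `govern` are assumptions made for the example.

```python
CASES = ("NOM", "ACC", "DAT", "GEN")
GENDERS = ("M", "F", "N")

# Candidate analyses of the determiner form "die" (Table 3.2a):
# nominative or accusative, either feminine singular or plural of any gender.
DIE = {(c, "SG", "F") for c in ("NOM", "ACC")} | {
    (c, "PL", g) for c in ("NOM", "ACC") for g in GENDERS
}

# Candidate analyses of the noun form "Schnecke" (Table 3.2b):
# feminine singular, but any of the four cases.
SCHNECKE = {(c, "SG", "F") for c in CASES}


def agree(analyses_a, analyses_b):
    """Agreement: both words must share the same CASE, NUMBER, and GENDER."""
    return analyses_a & analyses_b


def govern(analyses, case):
    """Government: the governing word imposes a case value on its dependent."""
    return {a for a in analyses if a[0] == case}


# Agreement alone narrows the noun phrase to feminine singular,
# but leaves nominative versus accusative open.
by_agreement = agree(DIE, SCHNECKE)

# The verb governs nominative case on its subject, which fixes the noun;
# the fully specified noun then fixes the determiner through agreement.
noun = govern(SCHNECKE, "NOM")
determiner = agree(DIE, noun)

if __name__ == "__main__":
    print(sorted(by_agreement))
    print(sorted(noun), sorted(determiner))
```

Neither word is unambiguous on its own: the determiner form has eight candidate analyses and the noun form four. Only the combination of the two relations leaves a single analysis for each.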

A third source of information that interacts with government and agreement is the valency, or the subcategorization, of a verb. Consider the example in Figure 3.4. The word form města can be nominative or accusative plural, or genitive singular (see Table 3.1b). But the fact that the verb already has a subject (bratr) restricts the choice of functions for města, because verbs cannot have more than one subject. If města turns out to be direct object, its