In this section, we specify how we implement the incremental entity-mention model and the mention-pair model, both of which we evaluate later in this chapter. In chapter 2, we outlined the conceptual differences between the two models and argued for the advantages of our incremental entity-mention model. Here, we focus on how certain functions of the algorithms are realized in their implementations. This includes, for example, choosing appropriate filter settings for parameters such as the sentence distance between two potentially coreferring markables.
5.2.1 Outline of the algorithms
In section 2.1, we outlined the standard algorithms for training and testing in the Soon et al. (2001) implementation of the mention-pair model. In the previous chapter, we discussed how German coreference resolution approaches adapted this model (Strube et al., 2002; Hinrichs et al., 2005, 2007; Klenner and Ailloud, 2009; Wunsch, 2010). Since related work used different evaluation protocols and different test sets, we implement a related-work baseline based on the mention-pair model. This allows a more direct comparison of our incremental entity-mention model to previous work than contrasting F-scores obtained in different evaluation settings would.
Table 5.3 juxtaposes the two algorithms we investigate in our experiments. Note that we depart slightly from the original mention-pair algorithm presented in Soon et al. (2001) and use an adaptation in line with Hinrichs et al. (2005), Klenner and Ailloud (2009), and Wunsch (2010). That is, for personal and possessive pronouns, we query the preceding three sentences for antecedent candidates, both during the creation of training instances and during automatic resolution of the pronouns. During testing, we choose the highest-weighted candidate within the three-sentence window as the antecedent, i.e. we adopt a best-first heuristic.8
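The difference between the closest-first selection of Soon et al. (2001) and the best-first selection adopted here can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this chapter: the `Candidate` record, its `position` and `weight` fields, and the `threshold` parameter are hypothetical, and the candidates are assumed to be already restricted to the three-sentence window.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    mention: str
    position: int   # hypothetical: index of the markable in the document
    weight: float   # hypothetical: score assigned to the candidate


def closest_first(candidates: Sequence[Candidate],
                  threshold: float = 0.5) -> Optional[Candidate]:
    """Walk backwards from the pronoun; take the first acceptable candidate."""
    for cand in sorted(candidates, key=lambda c: c.position, reverse=True):
        if cand.weight > threshold:
            return cand
    return None


def best_first(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Take the highest-weighted candidate, regardless of its distance."""
    return max(candidates, key=lambda c: c.weight, default=None)
```

With two acceptable candidates, closest-first returns the nearer one even if a more distant candidate carries a higher weight, whereas best-first always returns the candidate with the highest weight.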
Comparing the two algorithms, we see that the mention-pair model lacks three main features compared to our incremental entity-mention model:
• The incremental formation of the coreference partition (lines 11-13), which removes the need for a final clustering step
• The division of antecedent candidates into those stemming from the coreference partition and those coming from the buffer list (lines 2-7)
Chapter 5. Empirical validation of our entity-mention model 81
Algorithm: Mention-pair model
Input: Markables
Output: Coreference partition
 1: for m_i ∈ Markables do
 2:     for m_j ∈ BufferList do
 3:         if compatible(m_j, m_i) then
 4:             Candidates ⊕ m_j
 5:     ante ← get_best(Candidates)
 6:     if ante ≠ ∅ then
 7:         Pairs ⊕ {ante ⊕ m_i}
 8:     BufferList ⊕ m_i
 9: CorefPartition ← trans_merge(Pairs)
10: return CorefPartition

Algorithm: Entity-mention model
Input: Markables
Output: Coreference partition
 1: for m_i ∈ Markables do
 2:     for e_k ∈ CorefPartition do
 3:         if compatible(e_kn, m_i) then
 4:             Candidates ⊕ e_kn
 5:     for m_j ∈ BufferList do
 6:         if compatible(m_j, m_i) then
 7:             Candidates ⊕ m_j
 8:     ante ← get_best(Candidates)
 9:     if ante ≠ ∅ then
10:         ante, m_i ← disambiguate(ante, m_i)
11:         if ∃e_k ∈ CorefPartition : ante ∈ e_k then e_k ⊕ m_i
12:         else
13:             CorefPartition ⊕ {ante ⊕ m_i}
14:             BufferList ⊖ ante
15:     else
16:         BufferList ⊕ m_i
17: return CorefPartition
Table 5.3: Mention-pair vs. entity-mention algorithms used in our experiments.
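To make the control flow of the two algorithms concrete, the following Python sketch mirrors the pseudocode in table 5.3. It is an illustration under stated assumptions rather than the code used in our experiments: `compatible`, `get_best`, and `disambiguate` are passed in as parameters, mentions are assumed to be unique hashable objects, `e_kn` is taken to be the most recent mention of entity `e_k`, and `disambiguate` is assumed not to alter the identity of the antecedent.

```python
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

Mention = Hashable
Entity = List[Mention]


def trans_merge(pairs: Sequence[Tuple[Mention, Mention]]) -> List[Entity]:
    """Final clustering step: merge pairs that share a mention."""
    entities: List[Entity] = []
    for ante, m_i in pairs:
        hits = [e for e in entities if ante in e or m_i in e]
        merged = [m for e in hits for m in e]
        merged += [m for m in (ante, m_i) if m not in merged]
        entities = [e for e in entities if e not in hits] + [merged]
    return entities


def mention_pair(markables, compatible, get_best) -> List[Entity]:
    buffer_list, pairs = [], []
    for m_i in markables:
        candidates = [m_j for m_j in buffer_list if compatible(m_j, m_i)]
        ante = get_best(candidates)
        if ante is not None:
            pairs.append((ante, m_i))
        buffer_list.append(m_i)          # every markable stays accessible
    return trans_merge(pairs)


def entity_mention(markables, compatible, get_best,
                   disambiguate=lambda ante, m_i: (ante, m_i)) -> List[Entity]:
    partition: List[Entity] = []
    buffer_list: List[Mention] = []
    for m_i in markables:
        # Lines 2-4: one candidate per entity (assumption: its last mention).
        candidates = [e_k[-1] for e_k in partition if compatible(e_k[-1], m_i)]
        # Lines 5-7: candidates that are not yet part of any entity.
        candidates += [m_j for m_j in buffer_list if compatible(m_j, m_i)]
        ante = get_best(candidates)
        if ante is not None:
            ante, m_i = disambiguate(ante, m_i)
            entity = next((e_k for e_k in partition if ante in e_k), None)
            if entity is not None:
                entity.append(m_i)               # line 11
            else:
                partition.append([ante, m_i])    # line 13
                buffer_list.remove(ante)         # line 14
        else:
            buffer_list.append(m_i)              # line 16
    return partition
```

Note that `entity_mention` returns the partition directly, while `mention_pair` needs the separate `trans_merge` pass over the collected pairs.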
We outlined in section 2.3 that the function disambiguate(·, ·) propagates all semantic and morphological information from the antecedent onto the anaphor, effectively disambiguating its morphological properties in the case of underspecification. Additionally, we project the grammatical role of an antecedent onto a possessive pronoun if both occur in the same sentence. In our entity-mention model, the salience of an entity depends to a large degree on the grammatical role of its last mention. If an entity occurs, for example, first as a subject (high salience) and then as a possessive pronoun (lower salience) in a sentence, we want to keep the high salience evoked by the subject mention. In other words, we do not want the salience of the entity to degrade within a sentence merely because the sentence also contains a possessive pronoun mention of that entity. We found in Tuggener and Klenner (2014) that this technique improves performance overall.
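The two effects of disambiguate(·, ·) described above can be sketched in Python as follows. The `Mention` record, its field names, and the tag values are hypothetical. The sketch covers only number, gender, and grammatical role; semantic information is left out, and the assumption that only underspecified (`None`) values of the anaphor are overwritten is ours for the purpose of illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Mention:
    text: str
    sentence: int
    pos: str                      # hypothetical tags: "noun", "pers", "poss"
    number: Optional[str] = None  # "sg", "pl", or None if underspecified
    gender: Optional[str] = None  # "masc", "fem", "neut", or None
    role: Optional[str] = None    # grammatical role, e.g. "subj"


def disambiguate(ante: Mention, m_i: Mention) -> Tuple[Mention, Mention]:
    """Project antecedent properties onto the anaphor."""
    # Fill in only those morphological values the anaphor leaves open.
    number = m_i.number or ante.number
    gender = m_i.gender or ante.gender
    # A possessive pronoun inherits the antecedent's grammatical role
    # if both mentions occur in the same sentence.
    role = m_i.role
    if m_i.pos == "poss" and m_i.sentence == ante.sentence:
        role = ante.role
    return ante, replace(m_i, number=number, gender=gender, role=role)
```

For instance, an underspecified possessive resolved to a plural feminine subject in the same sentence comes back as plural, feminine, and with the role of a subject, so the entity keeps the salience of its subject mention.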
5.2.2 Morphological agreement and distance constraints
One main function in both algorithms is compatible(·, ·). This function determines the morphological compatibility of a pronoun and a potential antecedent candidate. As we have seen in the previous chapter, filtering based on morphological agreement is a crucial step in reducing the number of potential antecedent candidates for German pronouns: it removes up to 50% of the incorrect candidates. However, the underspecification of certain German pronouns complicates this step, as these underspecified pronouns allow for specific subsets of all possible morphological attributes. That is, these pronouns are not fully underspecified; if they were, any candidate antecedent would be licensed. Therefore, testing morphological compatibility is not a simple unification process. To account for the specific possible combinations of morphological properties, we implement a PoS- and lemma-based filtering scheme for matching number and gender properties, similar to Wunsch (2010). Note that all antecedent candidates and pronouns have to match regarding their person feature.9 Also, a personal pronoun cannot link to an antecedent governed by the same verb as the pronoun. For example, in the sentence “Peter likes him”, “him” cannot refer to “Peter” due to binding constraints. Our filtering scheme works as follows:
• Lemma-based filtering: We first ensure that gender-incompatible pronouns exclude each other by lemma-based matching. That is, pairs are discarded if the pronoun lemma is either sie (she/they) or ihr (her/their) and the antecedent candidate is singular and either masculine or neuter. Conversely, if the pronoun lemma is either er (he) or sein (his/its) and the pronoun lemma of the antecedent candidate is sie (she/they) or ihr (her/their), the pair is discarded.
• Gender and number matching for non-possessive pronouns: Pairs of personal, relative, and demonstrative pronouns and potential antecedent candidates are licensed if they match in number and gender, i.e. if they share the same respective values. This constraint is relaxed in the following manner. If the pronoun or the antecedent candidate is underspecified in both number and gender, the pair is licensed. If either the pronoun or the antecedent is underspecified in number, but gender matches, the pair is licensed. Conversely, if either the pronoun or the antecedent is underspecified in gender, but number matches, the pair is licensed.
• Possessive pronouns: For possessive pronouns, we license antecedent candidates based on the lemma of the possessive pronoun. That is, for sein (his/its), the candidate has to be singular and either masculine or neuter, but not feminine. For ihr (her/their), the candidate has to be either plural, or singular and feminine.
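The filtering scheme above can be sketched as a Python predicate. This is a minimal illustration, not our actual implementation: the `Markable` record, its field names, and the tag values are hypothetical. We further assume here that underspecified values (`None`) of a candidate do not block a possessive pronoun, a case the rules above leave open, and we omit the binding constraint because it requires syntactic information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Markable:
    lemma: str
    pos: str                      # hypothetical: "noun", "pers", "rel", "dem", "poss"
    number: Optional[str] = None  # "sg", "pl", or None if underspecified
    gender: Optional[str] = None  # "masc", "fem", "neut", or None
    person: int = 3


def lemma_filter_rejects(ante: Markable, pron: Markable) -> bool:
    if pron.lemma in {"sie", "ihr"}:
        return ante.number == "sg" and ante.gender in {"masc", "neut"}
    if pron.lemma in {"er", "sein"}:
        return ante.lemma in {"sie", "ihr"}
    return False


def number_gender_ok(ante: Markable, pron: Markable) -> bool:
    number_match = ante.number is not None and ante.number == pron.number
    gender_match = ante.gender is not None and ante.gender == pron.gender
    number_open = ante.number is None or pron.number is None
    gender_open = ante.gender is None or pron.gender is None
    if number_match and gender_match:
        return True
    # Relaxation 1: one side is underspecified in both features.
    if any(m.number is None and m.gender is None for m in (ante, pron)):
        return True
    # Relaxations 2 and 3: one feature is open, the other one matches.
    return (number_open and gender_match) or (gender_open and number_match)


def possessive_ok(ante: Markable, pron: Markable) -> bool:
    if pron.lemma == "sein":
        return ante.number != "pl" and ante.gender != "fem"
    if pron.lemma == "ihr":
        return not (ante.number == "sg" and ante.gender in {"masc", "neut"})
    return True


def compatible(ante: Markable, pron: Markable) -> bool:
    if ante.person != pron.person or lemma_filter_rejects(ante, pron):
        return False
    if pron.pos == "poss":
        return possessive_ok(ante, pron)
    return number_gender_ok(ante, pron)
```

Under these assumptions, an underspecified possessive ihr is licensed as a candidate for a singular sie, whereas an ihr that already carries plural number is not.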
Both the mention-pair and our entity-mention model use the same filtering scheme. The difference between the two models w.r.t. morphological filtering becomes evident when a pronoun which is per se underspecified serves as an antecedent candidate for another pronoun. For example, assume we resolve an instance of the personal pronoun [sie]Sg. (she) and the antecedent candidate is a possessive pronoun [ihr]∗ (her/their). Let us assume we have already resolved [ihr]∗ to [Frauen]Pl. in both models. The mention-pair model would generate the pair [ihr]∗ − [sie]Sg., although the two are mutually exclusive. By contrast, the entity-mention model would have projected the morphological properties of the antecedent onto the possessive pronoun (line 10 of the entity-mention algorithm in table 5.3), and thus the pair [ihr]Pl. − [sie]Sg. would not be created.
9 Cf. section A.1 for the heuristics we apply to resolve first person pronouns to their third person antecedents.
As stated in the previous section, we allow for a window of three preceding sentences to look for antecedent candidates for personal and possessive pronouns. Relative pronouns can only bind to antecedents in the same sentence. Demonstrative pronouns present a difficult class, since our assumption that all pronouns are anaphoric leads to many false positives here, as can be seen in figure 5.1. The difficulty of resolving demonstrative pronouns is also documented in Schiehlen (2004) and Strube et al. (2002). Schiehlen reported an overall pronoun F-score of 65.4%, but demonstratives only achieved a 16.6% F-score. Similarly, Strube et al. reported an F-score of 82.79% for personal pronouns, but only 15.38% for demonstratives. To cope with the anaphoricity detection problem, we limit the search for antecedent candidates of demonstrative pronouns to the current and previous sentence. This restriction can be grounded in linguistic theory on entity accessibility in short-term memory. Theories on givenness (Ariel, 1988; Gundel et al., 1993, inter alia) state that demonstratives can be used to refer to entities that are activated in the hearer’s short-term memory, but are not necessarily in the discourse focus at the time the demonstrative occurs. That is, demonstratives are generally used to refer to entities which have been mentioned very recently but are not currently the most salient ones. We thus argue that limiting the sentence distance more strictly for antecedent candidates of demonstrative pronouns is a reasonable approach to the anaphoricity problem. That is, if no candidates are found in the current or previous sentence, we do not resolve a demonstrative pronoun.
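The distance constraints per pronoun class can be summarized in a small lookup, sketched below in Python. The tag names, the `(text, sentence)` representation of mentions, and the function names are hypothetical. The sketch also assumes that the current sentence (distance 0) is part of the window for personal and possessive pronouns, and it ignores the linear order of mentions within a sentence.

```python
from typing import List, Tuple

# Maximum sentence distance between pronoun and antecedent candidate.
MAX_SENTENCE_DISTANCE = {
    "pers": 3,  # personal pronouns: three preceding sentences
    "poss": 3,  # possessive pronouns: three preceding sentences
    "rel": 0,   # relative pronouns: same sentence only
    "dem": 1,   # demonstrative pronouns: current and previous sentence
}

MentionRef = Tuple[str, int]  # (text, sentence index)


def within_window(pron_type: str, pron_sentence: int,
                  cand_sentence: int) -> bool:
    distance = pron_sentence - cand_sentence
    return 0 <= distance <= MAX_SENTENCE_DISTANCE[pron_type]


def candidates_in_window(pron_type: str, pron_sentence: int,
                         mentions: List[MentionRef]) -> List[MentionRef]:
    """Keep only mentions that are close enough to the pronoun.

    An empty result for a demonstrative means it is left unresolved.
    """
    return [m for m in mentions
            if within_window(pron_type, pron_sentence, m[1])]
```

Keeping the limits in one table makes the stricter treatment of demonstratives a single parameter rather than a separate code path.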