This chapter outlined the goals of this thesis and introduced the coreference phenomenon. We presented a Text Linguistics account of coreference and discussed the implications of coreference for an exemplary set of applications in Computational Linguistics.
We discussed a common pipeline for coreference resolution systems and showed how its individual steps affect evaluation. Additionally, we introduced the nomenclature commonly encountered in shared tasks on coreference resolution.
Chapter 2
Discourse processing models for coreference resolution
In this chapter, we focus on the discourse processing models that coreference resolution approaches apply to process the markables. We first discuss the predominant model and its shortcomings, and then give an overview of conceptual and empirical improvements. We then introduce our coreference model, which addresses the underspecification of mentions, with a focus on German pronouns.
We note that there is a large diversity in the experimental setup in the related work discussed below. Thus, it is difficult to assess which approach performs best in general.1 Nonetheless, we will indicate performance scores in cases where a baseline is compared to an extension in the same experimental setup.
To facilitate our discussion, we overload the term mention to denote NPs and pronouns (potentially) partaking in coreference relations.2
2.1 Mention-pair model
The most prevalent model for establishing coreference between mentions is the so-called mention-pair model, introduced by Aone and Bennett (1995) and McCarthy and Lehnert (1995) but popularized in the variant presented in Soon et al. (2001). The model's name implies its basic workings: it creates pairs of mentions and represents them as feature vectors. A binary classifier then decides for each pair instance whether it should be labeled as coreferent.

1Some of the approaches discussed below resolve gold mentions only. Given the gold mentions, a system only needs to establish the correct coreference links between them; that is, it does not need to decide which NPs it should resolve (cf. section 1.5.4). Other systems process all markables in a more realistic setting, which raises the task complexity. Furthermore, the approaches are evaluated on different test sets collected from different corpora, sometimes using different evaluation metrics. Therefore, we cannot compare their scores directly.

2For a concise explanation of the nomenclature, cf. section 1.5.4.
Algorithm: Mention-pair Training
Input: Markables, gold coreference partition
Output: Pair instances Pairs
1: for mi ∈ Markables do
2:   if mi ∈ CorefPartition then
3:     for mj ∈ reversed(Markables) do
4:       if j < i ∧ coref(mj, mi) then
5:         Pairs ⊕ {mj, mi, positive}
6:         break
7:       else if j < i ∧ ¬coref(mj, mi) then
8:         Pairs ⊕ {mj, mi, negative}
9: return Pairs

Algorithm: Mention-pair Testing (Closest first)
Input: Markables
Output: Coreference partition
1: for mi ∈ Markables do
2:   for mj ∈ reversed(Markables) do
3:     if j < i then
4:       class ← classify(mj, mi)
5:       if class == positive then
6:         PositivePairs ⊕ {mj, mi}
7:         break
8: CorefPartition ← trans_merge(PositivePairs)
9: return CorefPartition

Table 2.1: Mention-pair algorithms for creating training instances (left) and for resolving markables (right).
Table 2.1 shows the algorithms for creating training and testing instances as proposed by Soon et al. To obtain training instances (left algorithm), a gold mention mi (line 2 checks that the mention is coreferent according to the gold standard) is paired with its immediate antecedent in its coreference chain to create a positive instance (line 5). Negative instances are formed by pairing mi with all mentions (including singletons) of other entities on the way back to the closest antecedent of mi (lines 7-8). Soon et al. trained a binary decision tree on these training instances; subsequent work has explored other machine learning frameworks.
In our running example (Die Staatsanwaltschaft [...] Im Januar hat die Arbeiterwohlfahrt Bremen ihren langjährigen Geschäftsführer Hans Taake fristlos entlassen; "The public prosecutor's office [...] In January, the Arbeiterwohlfahrt Bremen dismissed its long-standing managing director Hans Taake without notice"), consider mi to be the possessive pronoun [ihre]. The algorithm would iterate over the preceding (morphologically compatible) mentions from right to left, i.e. [die Arbeiterwohlfahrt Bremen] and [Staatsanwaltschaft]. It would, however, stop at [die Arbeiterwohlfahrt Bremen], since it is the closest antecedent according to the gold standard. The algorithm would thus create only one instance, the positive pair [die Arbeiterwohlfahrt Bremen − ihre]. Had there been intervening mentions of other entities, these would have been used to form negative instances.
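The training algorithm in table 2.1 can be sketched in Python as follows. This is a minimal sketch under our own assumptions, not Soon et al.'s implementation: mentions are encoded as hypothetical (surface string, gold entity id) tuples, with None marking a singleton, and we additionally skip chain-initial mentions, since they have no antecedent at which the backward walk could stop.

```python
from typing import List, Optional, Tuple

# Hypothetical mention representation: (surface string, gold entity id).
# An entity id of None marks a singleton that belongs to no gold chain.
Mention = Tuple[str, Optional[int]]
Instance = Tuple[str, str, str]  # (antecedent candidate, anaphor, label)


def training_pairs(mentions: List[Mention]) -> List[Instance]:
    """Pair instances: one positive pair per anaphoric mention and one
    negative pair per mention passed on the way back to its closest antecedent."""
    instances: List[Instance] = []
    for i, (text_i, ent_i) in enumerate(mentions):
        # Assumption: singletons and chain-initial mentions produce no instances.
        if ent_i is None or all(ent != ent_i for _, ent in mentions[:i]):
            continue
        # Walk backwards (right to left), starting at the mention preceding m_i.
        for j in range(i - 1, -1, -1):
            text_j, ent_j = mentions[j]
            if ent_j == ent_i:
                instances.append((text_j, text_i, "positive"))
                break  # stop at the closest gold antecedent
            instances.append((text_j, text_i, "negative"))
    return instances


if __name__ == "__main__":
    # Toy encoding of the running example; the entity ids are illustrative.
    doc = [("Staatsanwaltschaft", None),
           ("die Arbeiterwohlfahrt Bremen", 1),
           ("ihre", 1)]
    print(training_pairs(doc))
    # [('die Arbeiterwohlfahrt Bremen', 'ihre', 'positive')]
```

As in the prose above, the walk for [ihre] ends at [die Arbeiterwohlfahrt Bremen], so [Staatsanwaltschaft] never becomes part of an instance.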
When establishing coreference relations (right algorithm in table 2.1), the mentions are traversed from left to right, and for each mention an antecedent (i.e. a preceding mention) is sought, again in a backward-looking manner (lines 2-7). That is, each mention is paired with the preceding mentions, sorted by proximity, until a pair is classified as positive. This is called the closest-first heuristic. The best-first heuristic instead pairs a mention with all preceding mentions; the positive pair with the highest score then yields the antecedent for the mention at hand. Once all mentions have been traversed, the positive pairs are transitively merged (line 8) to form the desired coreference chains.
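The two antecedent selection heuristics described above can be contrasted in a short Python sketch. The scorer is a hypothetical stand-in for a trained pairwise classifier that returns a coreference probability, and the 0.5 threshold for a positive decision is our assumption; the transitive merge is a separate step and is not shown here.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]  # hypothetical classifier: P(coreferent)


def resolve(mentions: List[str], score: Scorer, strategy: str = "closest",
            threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Return the positive (antecedent, anaphor) pairs selected by the
    closest-first or best-first heuristic; merging happens afterwards."""
    positive_pairs: List[Tuple[str, str]] = []
    for i, m_i in enumerate(mentions):
        chosen, best = None, threshold
        for j in range(i - 1, -1, -1):  # preceding mentions, closest first
            s = score(mentions[j], m_i)
            if s <= threshold:  # classified as negative
                continue
            if strategy == "closest":
                chosen = j  # the first positive pair wins
                break
            if s > best:  # best-first: keep the highest-scoring positive pair
                chosen, best = j, s
        if chosen is not None:
            positive_pairs.append((mentions[chosen], m_i))
    return positive_pairs


if __name__ == "__main__":
    # Made-up classifier scores for the running example.
    scores = {("die Arbeiterwohlfahrt Bremen", "ihre"): 0.9,
              ("Staatsanwaltschaft", "ihre"): 0.7}

    def scorer(a: str, b: str) -> float:
        return scores.get((a, b), 0.0)

    doc = ["Staatsanwaltschaft", "die Arbeiterwohlfahrt Bremen", "ihre"]
    print(resolve(doc, scorer, "closest"))
    print(resolve(doc, scorer, "best"))
```

With these made-up scores, both heuristics select [die Arbeiterwohlfahrt Bremen]; closest-first never scores the second candidate, while best-first compares both. The heuristics diverge only when a more distant candidate receives a higher positive score than the closest positive one.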
Consider again our running example with [ihre] as mi. The algorithm would again iterate over the preceding mentions from right to left and pair the pronoun with each of them to create the vector representations. Using the closest-first heuristic, if the classifier labeled the first pair [die Arbeiterwohlfahrt Bremen − ihre] as positive, the algorithm would append the pair to the list of positive pairs and stop. Using the best-first heuristic, the algorithm would also consider the pair [Staatsanwaltschaft − ihre], and the pair labeled positive with the highest score would then be appended to the list of positive pairs.

Despite using what Soon et al. called a shallow feature set3, the approach yielded competitive results compared to the mainly rule-driven coreference resolution approaches participating in the MUC-6 and MUC-7 coreference shared tasks (Hirschman and Chinchor, 1998). Furthermore, the system has often been re-implemented and extended. For instance, most systems in the CoNLL shared tasks relied on a mention-pair architecture (Pradhan et al., 2011, p. 23; 2012, p. 22), despite its well-known weaknesses, which we discuss next.
2.1.1 Issues of the mention-pair model
Research revolving around the mention-pair model has revealed several conceptual weaknesses. The main issue of the model lies in the local nature of its coreference decisions. All decisions are made in isolation during the iteration over the mentions, i.e. no information is propagated to subsequent decisions. This runs counter to the transitive nature of the coreference phenomenon and causes several problems.
2.1.1.1 Underspecification of antecedent candidates
During resolution, the local confinement of the mention-pair model tends to lead to inconsistent coreference sets when the locally found pairs of coreferring mentions are merged (Klenner and Ailloud, 2008, Raghunathan et al., 2010, Ng, 2010, Klenner and Tuggener, 2011a, inter alia). The merge operation (line 8 in the right algorithm in table 2.1) exploits the transitive nature of coreference: if mention A is coreferent with mention B, and mention B is coreferent with mention C, then A and C have to be coreferent. Thus, systems implementing the mention-pair model use the transitive closure to merge the local pairs [A − B] and [B − C] into a coreference chain [A − B − C].
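How the merge step computes the transitive closure is not specified here; a common way to obtain it is to treat the positive pairs as edges and collect connected components with a union-find structure. The following Python sketch does this under the assumption that mention strings are unique identifiers; the name trans_merge is our own, chosen after the merge step of the resolution algorithm.

```python
from typing import Dict, List, Tuple


def trans_merge(positive_pairs: List[Tuple[str, str]]) -> List[List[str]]:
    """Merge positive pairs into chains via their transitive closure
    (connected components), using a small union-find structure."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in positive_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union: the two partial chains become one

    chains: Dict[str, List[str]] = {}
    for mention in list(parent):  # insertion order = order of first appearance
        chains.setdefault(find(mention), []).append(mention)
    return list(chains.values())


if __name__ == "__main__":
    print(trans_merge([("A", "B"), ("B", "C")]))  # [['A', 'B', 'C']]
```

Note that the function only ever sees positive pairs: nothing in its input could tell it that two mentions ending up in the same chain are incompatible.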
3Here, we focus on the discourse models and leave out the discussion of feature sets. For an overview of a feature set like the one used in Soon et al., see section 4.2.
However, this approach suffers from the underspecification of mentions in local contexts. For example, assume we have processed the three mentions [Bill Clinton], [Clinton] and [she], and have established the positive pairwise decisions [Bill Clinton − Clinton] and [Clinton − she]. The transitive closure will construct the coreference chain [Bill Clinton − Clinton − she], which is obviously inconsistent, since [Bill Clinton] and [she] are incompatible. Yet, since the mention [Clinton] is morphologically underspecified when viewed in isolation, it is a valid local antecedent candidate for the pronoun [she].
This is particularly problematic in combination with the morphological underspecification of certain German pronouns, e.g. sein (its/his). A classifier might label the following pairs as coreferent: [Berlin − sein] and [sein − er] ([Berlin − its/his], [its/his − he]). This would yield the chain [Berlin − sein − er], since the incompatibility of [Berlin] and [er] is not evident to the greedy merge operation, which only considers positive pairs and has no knowledge of negative evidence. As outlined in the introduction in section 1.2, this particular problem is one of the main concerns of this thesis.
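The Berlin example can be made concrete with a small Python sketch that represents gender as a set of possible values, so that an underspecified form such as sein carries two values. The feature assignments are illustrative assumptions, and the chain-level check is only meant to show where the inconsistency becomes visible; it is not the model proposed in this thesis.

```python
from typing import Dict, FrozenSet, List

# Illustrative gender features; an underspecified form carries several values.
GENDER: Dict[str, FrozenSet[str]] = {
    "Berlin": frozenset({"neut"}),
    "sein": frozenset({"masc", "neut"}),  # sein: its/his
    "er": frozenset({"masc"}),            # er: he
}


def pair_compatible(a: str, b: str) -> bool:
    """Local check, as seen by a pairwise classifier with an agreement filter."""
    return bool(GENDER[a] & GENDER[b])


def chain_consistent(chain: List[str]) -> bool:
    """Chain-level check: some gender value must be shared by all mentions."""
    return bool(frozenset.intersection(*(GENDER[m] for m in chain)))


if __name__ == "__main__":
    print(pair_compatible("Berlin", "sein"), pair_compatible("sein", "er"))  # True True
    print(chain_consistent(["Berlin", "sein", "er"]))  # False
```

Each local pair passes the agreement check, yet no single gender value survives across the merged chain.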
2.1.1.2 Redundant instances and skewed training sets
A second major issue of the mention-pair model is the generally large number of instances and the imbalance between positive and negative ones. For training, the method for creating pair instances shown in table 2.1 leads to a skewed set, since the closest antecedent of a given mention can be far away, and every intervening mention of another entity yields a negative instance. For example, Soon et al. (2001) reported that only 4-7% of the instances in their training set were positive. This imbalance biases the trained classifier towards negative classification (Ng, 2010, inter alia), which can leave, e.g., third-person pronouns unresolved if no pair of antecedent candidate and pronoun is classified as positive (Hinrichs et al., 2005, Wunsch, 2010).
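The source of the skew can be made explicit by counting instances: each anaphoric mention contributes exactly one positive instance but as many negatives as there are mentions between it and its closest antecedent. The Python sketch below counts both for a document given as the sequence of gold entity ids of its mentions (None for singletons); the toy document is made up and is not meant to reproduce Soon et al.'s figures.

```python
from typing import Dict, List, Optional, Tuple


def instance_counts(entities: List[Optional[int]]) -> Tuple[int, int]:
    """Count (positive, negative) training instances created by the
    closest-antecedent scheme, for a document given as the gold entity ids
    of its mentions in textual order (None marks a singleton)."""
    positives = negatives = 0
    last_seen: Dict[int, int] = {}  # entity id -> index of its latest mention
    for i, ent in enumerate(entities):
        if ent is None:
            continue  # singletons only ever appear as negative candidates
        if ent in last_seen:
            positives += 1
            # every mention between m_i and its closest antecedent is a negative
            negatives += i - last_seen[ent] - 1
        last_seen[ent] = i
    return positives, negatives


if __name__ == "__main__":
    toy = [1, None, 2, None, None, 3, 1, 2, None, 1]  # made-up document
    pos, neg = instance_counts(toy)
    print(pos, neg, f"{pos / (pos + neg):.0%} positive")  # 3 11 21% positive
```

Even in this ten-mention toy document, negatives outnumber positives by more than three to one; longer distances between coreferent mentions make the ratio worse.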
Since transitive coreference links between mentions are only established after classification, i.e. during the merge step, the model yields many redundant instances. Consider the two coreference chains [Bill Clinton − Clinton − President − he] and [Angela Merkel − Merkel − Chancellor − she], and assume we want to resolve the pronouns. Since all mentions of each coreference chain are generally accessible, the mention-pair model potentially pairs each pronoun with all of them, except for the first mention of the other entity ([Angela Merkel] for he, [Bill Clinton] for she), which has clear morphological properties on its own (given that the approach implements morphological agreement as a hard filter), as shown in figure 2.1.
[Graphic: nodes he, she, Bill Clinton, Clinton, President, Angela Merkel, Merkel, Chancellor]
Figure 2.1: Example of redundant pairs formed by the mention-pair model for pronouns. Solid arrows denote pairs considered by the model, green ones positive instances and red ones negative instances. Dashed arrows signify coreference links invisible to the model.
The mention-pair model could thus create five pairs for each of the pronouns, i.e. ten in total. Of the pairs a pronoun forms with the mentions of a single entity, all but one can be considered (at least implicitly) redundant, since they point to the same underlying entity. Pairs that link a pronoun to a morphologically incompatible entity, e.g. [Merkel − he], can be regarded as irrelevant, since they should not be considered for resolution.
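The count of ten pairs can be reproduced with a small Python sketch. The mention order (all full mentions before the two pronouns) and the gender feature sets are our assumptions, with the first mentions specified and all later full mentions underspecified. The second function pools the features of each entity's mentions to show which candidate entities remain compatible; it is an illustrative contrast only, not the model proposed in this thesis.

```python
from typing import Dict, FrozenSet, List, Tuple

M, F = "masc", "fem"
# (surface, gold entity id, gender features) in an assumed textual order.
Mention = Tuple[str, int, FrozenSet[str]]
MENTIONS: List[Mention] = [
    ("Bill Clinton", 1, frozenset({M})),
    ("Clinton", 1, frozenset({M, F})),  # underspecified in isolation
    ("President", 1, frozenset({M, F})),
    ("Angela Merkel", 2, frozenset({F})),
    ("Merkel", 2, frozenset({M, F})),
    ("Chancellor", 2, frozenset({M, F})),
    ("he", 1, frozenset({M})),
    ("she", 2, frozenset({F})),
]


def candidate_pairs(mentions: List[Mention], i: int) -> List[str]:
    """Mention-pair view: preceding mentions passing the hard agreement filter."""
    feats_i = mentions[i][2]
    return [text for text, _, feats in mentions[:i] if feats & feats_i]


def compatible_entities(mentions: List[Mention], i: int) -> List[int]:
    """Entity view: preceding entities whose pooled features agree with m_i."""
    feats_i = mentions[i][2]
    pooled: Dict[int, FrozenSet[str]] = {}
    for _, ent, feats in mentions[:i]:
        pooled[ent] = pooled.get(ent, feats) & feats
    return sorted(ent for ent, feats in pooled.items() if feats & feats_i)


if __name__ == "__main__":
    for i in (6, 7):  # the pronouns he and she
        print(MENTIONS[i][0], len(candidate_pairs(MENTIONS, i)),
              compatible_entities(MENTIONS, i))
    # he 5 [1]
    # she 5 [2]
```

Each pronoun forms five pairs, yet only one candidate entity per pronoun remains compatible once the features of all its mentions are taken into account.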