5.5 Error analysis
5.5.3 Candidate ranking performance
Finally, we analyze the quality of the candidate ranking in those cases where the entity-mention model selects an incorrect antecedent. This analysis provides additional insight into the ranking strength of our approach.
For each incorrectly resolved pronoun, we record the rank index of the correct antecedent and track how frequently each rank index occurs.
[Figure 5.8: four bar charts; x-axis: rank index of the correct antecedent, y-axis: frequency. Bar values:]
Development set, personal pronouns: rank 2: 190; rank 3: 36; rank 4: 8; rank 5: 2; rank 6: 2
Development set, possessive pronouns: rank 2: 70; rank 3: 17; rank 4: 1; rank 5: 1; rank 6: 2
Test set, personal pronouns: rank 2: 187; rank 3: 32; rank 4: 10; rank 5: 4; rank 6: 3
Test set, possessive pronouns: rank 2: 63; rank 3: 11; rank 4: 2; rank 5: 3; rank 6: 2; rank 7: 0; rank 8: 1
Figure 5.8: Rank index frequency of the correct antecedent for third person pronouns where the entity-mention model ranks an incorrect candidate highest. A lower rank means a higher score for a candidate w.r.t. denoting the antecedent.
Figure 5.8 shows the rank frequency of the correct antecedent for such incorrectly resolved personal and possessive pronouns. Note that lower rank indices denote better candidates, i.e. rank 1 denotes the selected antecedent. We see that in the cases where the entity-mention model selects an incorrect antecedent, the
correct one is ranked 2nd in 79.83% (development set; i.e. 190 of 238 total cases) and 79.24% (test set) of the personal pronoun instances and in 76.92% (development set) and 76.82% (test set) of the possessive pronoun cases. That is, selecting the second-best candidate as antecedent whenever the first one is incorrect would eliminate almost 80% of the errors made by the classifiers in the entity-mention model. Thus, re-ranking the top two candidates appears promising. We explore such a re-ranking approach based on distributional semantics in the next chapter.
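As a minimal sketch, the shares quoted above can be recomputed from the bar values of Figure 5.8. The dictionary layout and the function name `share_at_rank` are illustrative assumptions and not part of our system; the results agree with the reported percentages up to rounding in the last digit.

```python
# Bar values read off Figure 5.8:
# rank index of the correct antecedent -> number of incorrectly resolved pronouns.
RANK_COUNTS = {
    ("development", "personal"): {2: 190, 3: 36, 4: 8, 5: 2, 6: 2},
    ("development", "possessive"): {2: 70, 3: 17, 4: 1, 5: 1, 6: 2},
    ("test", "personal"): {2: 187, 3: 32, 4: 10, 5: 4, 6: 3},
    ("test", "possessive"): {2: 63, 3: 11, 4: 2, 5: 3, 6: 2, 7: 0, 8: 1},
}


def share_at_rank(counts, rank=2):
    """Fraction of errors whose correct antecedent sits at the given rank."""
    total = sum(counts.values())
    return counts.get(rank, 0) / total


for (dataset, pronoun_type), counts in RANK_COUNTS.items():
    total = sum(counts.values())
    share = share_at_rank(counts)
    print(f"{dataset:12s} {pronoun_type:10s} rank 2: {100 * share:.2f}% of {total} errors")
```

The share at rank 2 is an upper bound on what a re-ranking of the top two candidates could recover, since such a re-ranker cannot repair cases where the correct antecedent is ranked 3rd or lower.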
5.6 Chapter summary
In this chapter, we empirically validated our incremental entity-mention approach for coreference resolution for German. We evaluated pronoun resolution in detail and applied different evaluation measures and strategies that assess performance on different levels.
Throughout our evaluation, we substantiated the claimed theoretical advantages of our incremental entity-mention model over the commonly used mention-pair model with empirical evidence. In all our experiments, the entity-mention model outperformed the mention-pair competitor.
We introduced an extended feature set for German pronoun resolution and compared it to a standard feature set typically encountered in related work. Our extended set improved the performance of the classifiers in both the mention-pair and the entity-mention model. In the subsequent evaluation, the extended set also improved the performance of the entity-mention model. For the mention-pair model, however, the extended features only improved resolution performance for possessive pronouns, while lowering the performance on personal pronouns.
Furthermore, we investigated and compared different machine learning frameworks that correspond to different antecedent selection methods. We introduced our own simple feature weighting scheme which performed on par with the top ranking machine learning approaches. Overall, we found that the different machine learning approaches and the respective antecedent selection strategies did not show substantial performance differences in the top ranks. We found larger performance differences between the mention-pair and the entity-mention model.
In error analysis, we demonstrated that the mention-pair model does indeed produce coreference chains with inconsistent morphological properties due to underspecification of certain German pronouns. In our entity-mention approach, such errors are avoided.
Chapter 6
Semantics for pronoun resolution
In this chapter, we explore distributional semantics as a device to determine the degree of compatibility between an antecedent candidate and a pronoun’s context. As we saw in the analysis of error examples in section 5.5.2, our approach to ranking candidates, which is primarily based on the discourse salience of the candidates, sometimes selects an antecedent that is either incompatible or less compatible with the pronoun’s context than the correct antecedent.
In one of our previous examples, the salience-based approach selected the antecedent “people” for the pronoun “them” in the verb-argument tuple “collect them”, although the correct antecedent “donations” was accessible. Obviously, “donations” is a more likely candidate for the direct object slot of the verb “to collect” than “people”. Thus, selectional preferences of verbs are potentially beneficial to resolving pronouns. However, successfully incorporating these preferences into a real-world pronoun resolution system has proven to be notoriously difficult (Kehler et al., 2004; Wunsch, 2010, inter alia).
We explore two frameworks to model compatibility of antecedent candidates and a pronoun’s context: a co-occurrence graph and word embeddings. The co-occurrence graph estimates compatibility by traversing weighted co-occurrence paths between nodes that denote nouns and verbs. The word embedding model represents words as vectors and estimates compatibility of words based on the cosine of their vectors.
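The cosine-based compatibility estimate can be illustrated with a minimal sketch. The three-dimensional vectors below are invented toy values, not trained embeddings, and the names `cosine` and `rank_by_compatibility` are hypothetical helpers; the sketch only shows the general idea of comparing a governing verb to each antecedent candidate and is not the model presented in this chapter.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, chosen only so that the example is easy to follow.
EMBEDDINGS = {
    "collect": [0.9, 0.1, 0.3],
    "donations": [0.8, 0.2, 0.4],
    "people": [0.1, 0.9, 0.2],
}


def rank_by_compatibility(verb, candidates, embeddings):
    """Sort candidates by descending cosine similarity to the governing verb."""
    return sorted(
        candidates,
        key=lambda noun: cosine(embeddings[verb], embeddings[noun]),
        reverse=True,
    )


ranking = rank_by_compatibility("collect", ["people", "donations"], EMBEDDINGS)
print(ranking)
```

With these toy values, “donations” is ranked above “people” for the verb “collect”, mirroring the intuition in the example above.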
We expand on previous work by estimating not only the compatibility between the verb argument slot of a pronoun and the antecedent candidate, but also by taking into account the additional verb argument of the verb governing the pronoun, i.e. the syntactic co-argument of the pronoun in cases of (di-)transitive verbs. We first outline the rationale for exploring verb semantics for pronoun resolution and then present our models for doing so.
6.1 Pronominalization as a discourse phenomenon
From a discourse perspective, pronoun resolution can be viewed as the task of determining which entity is most likely to be mentioned at a given point in discourse, i.e. when a pronoun is encountered. That is, the task of pronoun resolution is to assess which of the previously mentioned entities in the discourse is likely to be discussed at the very point of encountering a pronoun.
In general, pronoun resolution approaches, including our own, work by modeling the salience of discourse entities based on a set of features that captures relevant aspects of their occurrences, such as grammatical functions etc. If a pronoun is encountered, the most salient entity is chosen as antecedent.
However, these models do not exclusively model pronominalization, but provide a general model of entity salience in discourse.1 For example, given we have established the salience record of entities in a particular discourse, we could point to any position within the discourse and blank out the subsequent sentence. Querying the salience model, we could then determine which entities are likely to be mentioned in the subsequent sentence, based on the salience configuration at the point we have chosen, without any hint regarding the specifics of the subsequent sentence. If we knew that there is a pronoun in the next sentence, we could determine the entity in the current sentence which is most likely to be pronominalized in the next sentence. Neglecting many important details, this can be argued to be the general model for pronoun resolution in the majority of approaches.
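The thought experiment above can be made concrete with a small sketch of a salience record that is queried at an arbitrary position in the discourse. The weights per grammatical function, the per-sentence decay factor, and the toy mentions are all hypothetical assumptions chosen for illustration; they are not the features or values of our model.

```python
from collections import defaultdict

# Hypothetical weights per grammatical function (not values from our system).
FUNCTION_WEIGHTS = {"subject": 3.0, "object": 2.0, "other": 1.0}
# Hypothetical decay: a mention loses half of its weight per intervening sentence.
DECAY = 0.5


def salience_at(mentions, position):
    """Salience of each entity, using only mentions up to sentence `position`.

    `mentions` is a list of (sentence_index, entity, grammatical_function).
    """
    scores = defaultdict(float)
    for sentence_index, entity, function in mentions:
        if sentence_index <= position:
            distance = position - sentence_index
            scores[entity] += FUNCTION_WEIGHTS.get(function, 1.0) * DECAY ** distance
    return dict(scores)


def most_salient(mentions, position):
    """Entity the salience record predicts to be mentioned next, or None."""
    scores = salience_at(mentions, position)
    return max(scores, key=scores.get) if scores else None


# Toy discourse: the sentence after position 1 is "blanked out".
MENTIONS = [
    (0, "volunteers", "subject"),
    (0, "donations", "object"),
    (1, "volunteers", "subject"),
    (1, "charity", "other"),
]

print(salience_at(MENTIONS, 1))
print(most_salient(MENTIONS, 1))
```

Note that the query uses no information about the blanked-out sentence: the prediction is the same whatever verb governs a pronoun occurring there.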
However, from the perspective of pronoun resolution, there is an overlooked aspect in this model, i.e. the local context in which a pronoun occurs. The majority of our features are concerned with bookkeeping the salience of the discourse entities, regardless of particular pronouns and their respective, specific contexts. If a pronoun is encountered at a certain point in discourse, the most salient entity at that point is looked up in the salience record and chosen as antecedent, ignoring the specific context that surrounds the pronoun. This is somewhat surprising, given that the antecedent of a pronoun has to be compatible with the context surrounding the pronoun, because, compared to its nominal antecedent, the pronoun is simply an altered linguistic manifestation of the same underlying entity. Therefore, it can be argued that the pronoun’s context itself raises certain expectations regarding the antecedent. Consider the following example:
(10) Er bellt.
     He barks.
Although we are not given any discourse history of entities and their salience, we automatically infer, based on the selectional preference of the verb bellen, that Er is likely to refer to something canine-like. This likelihood is ignored in the purely salience-based model.
Thus, there exists a line of research dating back to the pioneering era of automated pronoun resolution that has attempted to incorporate the context surrounding the pronoun in the antecedent selection process.2 The main road that researchers have taken in this direction is to represent the context of a pronoun by the verb that governs the pronoun. The selectional preferences of that verb are then taken to rank the antecedent candidates.
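A minimal sketch of this idea follows, assuming that selectional preferences are approximated by smoothed relative frequencies of nouns observed in a verb's argument slot. The counts are invented for illustration, and `preference` and `rank_candidates` are hypothetical names; the sketch does not reproduce any particular approach from the literature.

```python
# Hypothetical counts of nouns observed in a given argument slot of a verb.
SLOT_COUNTS = {
    ("bellen", "subject"): {"Hund": 40, "Nachbar": 2},
    ("collect", "object"): {"donations": 25, "people": 3},
}


def preference(verb, slot, noun, smoothing=1.0):
    """Additively smoothed relative frequency of `noun` in the verb's slot.

    `smoothing` must be positive so that unseen verbs and nouns get a score.
    """
    counts = SLOT_COUNTS.get((verb, slot), {})
    total = sum(counts.values())
    vocabulary = len(counts) + 1  # one extra type reserved for unseen nouns
    return (counts.get(noun, 0) + smoothing) / (total + smoothing * vocabulary)


def rank_candidates(candidates, verb, slot):
    """Sort antecedent candidates by descending preference for the slot."""
    return sorted(
        candidates,
        key=lambda noun: preference(verb, slot, noun),
        reverse=True,
    )


print(rank_candidates(["people", "donations"], "collect", "object"))
print(rank_candidates(["Nachbar", "Hund"], "bellen", "subject"))
```

For a verb without any counts, all candidates receive the same score and the sort keeps their original order, so a salience-based ranking passed in as `candidates` would be left unchanged.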