
6.2 A graph-based representation of verb-argument tuples

6.2.2 Addressing sparsity

We still face the sparsity problem, since we cannot expect to see all permissible verb-argument combinations even in a large corpus. In Tuggener and Klenner (2012), we proposed an approach that uses non-negative matrix factorization to estimate counts of unseen verb-argument combinations. Here, we explore three simpler approaches that are expressed naturally in the graph representation of word co-occurrences.

6 Taking the sum of all counts in the denominators would yield $\frac{1}{2} \cdot \left(\frac{261}{300} + \frac{261}{14738}\right) = 0.4$, which is rather low. We want combinations that are intuitively strong, like dog being the subject of bark, to have a high compatibility, i.e. close to 1. Taking the max count in the denominator gets us closer than taking the sum. Also, the Jaccard and DICE coefficients yield rather low scores for the (dog, bark, subj) combination, i.e. $\mathrm{Jacc} = \frac{|A \cap B|}{|A \cup B|} = \frac{261}{300 + 14738} = 0.0174$ and $\mathrm{DICE} = \frac{2 \cdot |A \cap B|}{|A| + |B|} = 2 \cdot 261 \,/\, \ldots$

Chapter 6. Semantics for pronoun resolution 142

6.2.2.1 Estimating compatibility through distributional siblings

First, we compare the seen argument fillers (i.e. the nouns $args$ in a grammatical slot $gf_k$ for a verb $v_j$) with the antecedent candidate noun under scrutiny, $a_i$, and select the compatibility of the argument $arg \in args$ that is most similar to $a_i$ as the compatibility rating of $a_i$ and the pronoun verb $v_j$ in slot $gf_k$.

\[
\mathrm{comp}(v_j, a_i, gf_k) \approx \mathrm{comp}(v_j, arg, gf_k)
\quad \text{where } arg = \arg\max_{args} \, \mathrm{sim}(a_i, arg, gf_k)
\tag{6.7}
\]

To determine the verb argument most similar to the antecedent candidate noun at hand, we need a similarity measure of nouns w.r.t. a specific grammatical relation, i.e. sim(·, ·, ·). For our purposes, we want to define noun similarity in relation to the verbs they occur with, since we want nouns to be of high similarity if they frequently occur with the same verbs w.r.t. the grammatical function of interest.

For example, suppose we have not seen “banana” as the direct object of “to eat”. We then want to find the direct object of “to eat” which is most similar to “banana”, given the grammatical relation direct object, e.g. “apple”. To achieve this, we model noun similarity based on second-order co-occurrence with verbs. As mentioned, we want nouns to be of high similarity if they frequently occur in the same grammatical argument slots of the same verbs.

A straightforward approach for this purpose is to consider the number of verbs that the two nouns co-occur with in the given grammatical relation. This second-order co-occurrence then serves as the basis for calculating the distributional similarity of the nouns.

We calculate the similarity of two nouns $n_i$, $n_j$, given a set of verbs $V$ and an argument slot $gf_k$, as the number of verbs that they share as first-order co-occurrences divided by the number of all their individual first-order co-occurrences w.r.t. $gf_k$, i.e.:

\[
\mathrm{sim}(n_i, n_j, gf_k) =
\frac{\left|\{v \in V : |(v, n_i, gf_k)| > 0 \wedge |(v, n_j, gf_k)| > 0\}\right|}
     {\left|\{v \in V : |(v, n_i, gf_k)| > 0\}\right| + \left|\{v \in V : |(v, n_j, gf_k)| > 0\}\right|}
\tag{6.8}
\]

where the numerator counts the verbs that the two nouns share as first-order co-occurrences and the denominator counts all first-order co-occurrences of the two nouns.
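The count-based similarity of equation 6.8 can be sketched in a few lines of Python. This is only an illustrative reading of the formula, not the implementation used in this work: the dictionary `COUNTS`, the helper `verbs_of`, the function name `sim_count`, and all counts other than the 261 for Hund as subject of bellen are assumptions made up for the example.

```python
# Toy co-occurrence counts, invented for illustration:
# (verb, noun, grammatical function) -> absolute frequency.
COUNTS = {
    ("bellen", "Hund", "subj"): 261,
    ("laufen", "Hund", "subj"): 150,
    ("beißen", "Hund", "subj"): 128,
    ("laufen", "Fuchs", "subj"): 8,
    ("schleichen", "Fuchs", "subj"): 5,
}


def verbs_of(counts, noun, slot):
    """Verbs seen at least once with `noun` in `slot` (first-order co-occurrences)."""
    return {v for (v, n, gf), c in counts.items() if n == noun and gf == slot and c > 0}


def sim_count(counts, n_i, n_j, slot):
    """Literal reading of equation 6.8: number of shared verbs divided by
    the sum of the two nouns' individual verb counts."""
    verbs_i = verbs_of(counts, n_i, slot)
    verbs_j = verbs_of(counts, n_j, slot)
    denom = len(verbs_i) + len(verbs_j)
    # Returning 0.0 for two unseen nouns is an assumption of this sketch.
    return len(verbs_i & verbs_j) / denom if denom else 0.0


print(sim_count(COUNTS, "Hund", "Fuchs", "subj"))  # 1 shared verb / (3 + 2) = 0.2
```

Because the denominator adds the two neighbor sets rather than taking their union, a noun compared with itself scores 0.5 under this literal reading.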


In the graph representation, this corresponds to the count of verb nodes that the nouns share as neighbors, divided by the total number of neighboring nodes the two nouns connect to, given a specific grammatical relation. Figure 6.2 shows this overlap of first-order co-occurrences in the graph. Here, we determine the similarity of Hund (dog) and Fuchs (fox) given the grammatical relation subject. The similarity of the two nouns is depicted by the ratio of nodes that both nouns connect to (in the center of the figure) divided by the total number of verb nodes they connect to, given the subject relation. That is, the more verb nodes they share as neighbors, the more similar the two nouns are.

Figure 6.2: Excerpt of the co-occurrence graph, showing second-order co-occurrence of Hund and Fuchs in subject position. Numbers on edges denote absolute counts.

However, there are nodes (verbs) that the nouns associate with more strongly than with others. In the previous section, we have defined an association measure for nouns and verbs, i.e. comp(noun, verb, gram. funct.). We can use this measure in our similarity measure to weight the importance of the nodes that two nouns share. That is, a shared node that both nouns strongly associate with should have more impact on the similarity measure than a node with weaker association strengths w.r.t. the nouns. Thus, we replace the counts in equation 6.8 by the sum of the compatibility scores:

\[
\mathrm{sim}(n_i, n_j, gf_k) =
\frac{\sum_{v \in V : |(v, n_i, gf_k)| > 0 \wedge |(v, n_j, gf_k)| > 0} \left(\mathrm{comp}(v, n_i, gf_k) + \mathrm{comp}(v, n_j, gf_k)\right)}
     {\sum_{v \in V : |(v, n_i, gf_k)| > 0} \mathrm{comp}(v, n_i, gf_k) + \sum_{v \in V : |(v, n_j, gf_k)| > 0} \mathrm{comp}(v, n_j, gf_k)}
\tag{6.9}
\]

where the numerator simply sums all edge weights (where the weights are calculated by comp(·, ·, ·)) from the nouns to the shared verbs, and the denominator sums all edge weights that the two nouns are connected to.
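As a rough illustration of equation 6.9, the following Python sketch sums edge weights instead of counting nodes. It assumes that compatibility scores are stored only for seen (verb, noun, slot) triples, so that the presence of a key stands in for a count greater than zero; the table `COMP`, its values, and the names `edges` and `sim_weighted` are hypothetical.

```python
# Hypothetical compatibility scores comp(v, n, gf), stored only for seen
# triples; all values are invented for illustration.
COMP = {
    ("bellen", "Hund", "subj"): 0.9,
    ("laufen", "Hund", "subj"): 0.3,
    ("laufen", "Fuchs", "subj"): 0.2,
    ("schleichen", "Fuchs", "subj"): 0.6,
}


def edges(comp, noun, slot):
    """Map each verb seen with `noun` in `slot` to its edge weight."""
    return {v: w for (v, n, gf), w in comp.items() if n == noun and gf == slot}


def sim_weighted(comp, n_i, n_j, slot):
    """Equation 6.9: weights of edges to shared verbs divided by the
    weights of all edges of the two nouns."""
    e_i, e_j = edges(comp, n_i, slot), edges(comp, n_j, slot)
    shared = e_i.keys() & e_j.keys()
    numerator = sum(e_i[v] + e_j[v] for v in shared)
    denominator = sum(e_i.values()) + sum(e_j.values())
    # Returning 0.0 when neither noun has any edge is an assumption.
    return numerator / denominator if denominator else 0.0


# Shared verb: laufen. (0.3 + 0.2) / (0.9 + 0.3 + 0.2 + 0.6) = 0.25
print(round(sim_weighted(COMP, "Hund", "Fuchs", "subj"), 3))
```

Unlike the count-based variant, this weighted form reaches 1 when two nouns have exactly the same verb neighbors, since each shared edge then appears once in the numerator and once in the denominator.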

For our sparsity problem, where we have not seen a specific noun-verb combination, we can now identify the seen noun argument of the verb that is most similar to our target noun, based on equation 6.7.

Returning to our previous example, where we would like to estimate the compatibility of “banana” as the direct object (obja) of the verb “to eat” and where we have not seen the combination, we first identify all first-order co-occurrences of the verb “to eat” with the grammatical relation direct object, i.e. args. Then, we calculate sim(banana, arg, obja) for all these first-order co-occurrences arg ∈ args. The arg most similar to “banana”, say “apple” with a similarity of 0.45, then serves as the distributional sibling of “banana”, and we can take the compatibility score comp(apple, eat, obja), which is 0.55, as the score for “banana”.

However, since we have not seen “banana” as the direct object of “to eat”, we want to exercise caution in taking over the score of “apple”. While in our example the similarity between the target noun and its distributional sibling is obvious, we might identify siblings that are less similar to the target noun. Therefore, we multiply the compatibility score of “apple” as the direct object of “to eat” by the similarity between “apple” and “banana”. This product then serves as the final compatibility score of the unseen pair:

\[
\mathrm{comp}(v_j, a_i, gf_k) \approx \mathrm{comp}(v_j, arg, gf_k) \cdot \mathrm{sim}(a_i, arg, gf_k)
\quad \text{where } arg = \arg\max_{args} \, \mathrm{sim}(a_i, arg, gf_k)
\tag{6.10}
\]

That is, if the target noun and its sibling are very similar, the compatibility score of the sibling will not be lowered significantly, but it will decrease with increasing dissimilarity between the target noun and its sibling.
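The back-off of equation 6.10 can be sketched as follows. The scores 0.55 and 0.45 are the ones from the banana example; the second argument “bread” with its scores, the lookup tables `COMP` and `SIM`, and the names `comp_with_sibling` and `toy_sim` are assumptions introduced only for this illustration. Returning the stored score for a seen combination and 0.0 for a verb without seen arguments are likewise choices of the sketch.

```python
def comp_with_sibling(verb, noun, slot, comp, sim):
    """Estimate comp(verb, noun, slot) via the most similar seen argument."""
    key = (verb, noun, slot)
    if key in comp:
        # Seen combination: no estimate needed (assumption of this sketch).
        return comp[key]
    seen_args = [n for (v, n, gf) in comp if v == verb and gf == slot]
    if not seen_args:
        # Assumption: without any seen argument there is nothing to back off to.
        return 0.0
    # Distributional sibling: the seen argument most similar to the target noun.
    sibling = max(seen_args, key=lambda arg: sim(noun, arg, slot))
    # Discount the sibling's score by its similarity to the target noun.
    return comp[(verb, sibling, slot)] * sim(noun, sibling, slot)


# comp is keyed as (verb, noun, slot); "bread" and its scores are invented.
COMP = {("eat", "apple", "obja"): 0.55, ("eat", "bread", "obja"): 0.70}
SIM = {("banana", "apple"): 0.45, ("banana", "bread"): 0.20}


def toy_sim(n_i, n_j, slot):
    """Stand-in for sim(., ., .); a real system would use equation 6.9."""
    return SIM.get((n_i, n_j), 0.0)


score = comp_with_sibling("eat", "banana", "obja", COMP, toy_sim)
print(round(score, 4))  # 0.55 * 0.45 = 0.2475
```

Note that the sibling is chosen by similarity alone: “bread” has the higher compatibility with “eat” in the toy data, but “apple” is selected because it is more similar to “banana”.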


6.2.2.2 Similarity to n-best arguments

As a second measure of compatibility between an antecedent candidate noun and the verb governing the pronoun at hand, we measure the similarity of the candidate noun to the n most strongly associated arguments of the verb in the grammatical function slot of the pronoun. We set n to 10 if there are more than 100 seen arguments in the specific grammatical slot of the verb. If there are fewer than 100, we take the top 10% of the arguments (but at least three).

In our example, we would measure the similarity of “banana” to the n most strongly associated direct objects of “to eat”, i.e. Fleisch (meat), Brot (bread), Obst (fruit), Gemüse (vegetables), Eis (ice cream), Kleinigkeit (snack), Pizza, Schokolade (chocolate), Mittag (lunch), Salat (salad). The average similarity then serves as the compatibility score, in this case 0.41.
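A minimal sketch of the n-best measure, under stated assumptions: rounding the 10% share up to the next integer is a choice of this sketch (the text does not specify the rounding), and the toy tables `COMP` and `SIM`, their values, and the function names are hypothetical.

```python
def n_best_size(num_seen):
    """Number of top arguments to compare against, following the rule above."""
    if num_seen > 100:
        return 10
    # Top 10%, rounded up (assumption), but at least three.
    return max(3, (num_seen + 9) // 10)


def comp_n_best(verb, noun, slot, comp, sim):
    """Average similarity of `noun` to the verb's most strongly associated arguments."""
    ranked = sorted(
        ((w, n) for (v, n, gf), w in comp.items() if v == verb and gf == slot),
        reverse=True,
    )
    top = [n for _, n in ranked[: n_best_size(len(ranked))]]
    if not top:
        return 0.0  # assumption: no seen arguments, no evidence
    return sum(sim(noun, arg, slot) for arg in top) / len(top)


# Invented compatibility and similarity values for illustration only.
COMP = {
    ("essen", "Fleisch", "obja"): 0.8,
    ("essen", "Brot", "obja"): 0.7,
    ("essen", "Obst", "obja"): 0.6,
    ("essen", "Stein", "obja"): 0.1,
}
SIM = {"Fleisch": 0.2, "Brot": 0.3, "Obst": 0.7, "Stein": 0.0}


def toy_sim(noun, arg, slot):
    """Stand-in similarity of "Banane" to each seen argument."""
    return SIM.get(arg, 0.0) if noun == "Banane" else 0.0


# Four seen arguments -> n = 3; (0.2 + 0.3 + 0.7) / 3 = 0.4
print(round(comp_n_best("essen", "Banane", "obja", COMP, toy_sim), 2))
```

Averaging over several strongly associated arguments makes the score less dependent on a single sibling than the first approach, at the cost of diluting a very close match.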

6.2.2.3 Compatibility of verbs

Thirdly, we measure the similarity of the verb governing the antecedent candidate and the verb governing the pronoun w.r.t. their grammatical functions, i.e. $\mathrm{sim}(v_i, gf_k, v_j, gf_l)$.

Since our similarity score is based on shared arguments, it can be interpreted as a measure of how likely it is to see an argument of the antecedent verb as an argument of the pronoun verb. Assume for example the following sentences, where we want to resolve the last pronoun, i.e. sie$^4_1$:

(12) Sie$^1$ schenkt ihm eine Banane$^2$. Er schält die Banane$^3_1$ und isst sie$^4_1$.

She presents him with a banana. He peels the banana$_1$ and eats it$_1$.

Our similarity measure between verbs is geared toward estimating how likely it is for a noun that is peeled (Banane$^3_1$ is the direct object of “to peel”) to be eaten (sie$^4_1$ is the direct object of “to eat”), versus how likely it is for someone who presents someone with something (Sie$^1$ is the subject of “to present”) to be eaten, etc. In this case sim(peel, obja, eat, obja) = 0.14. By contrast, subjects of “to present” are less likely to be eaten, i.e. sim(present, subj, eat, obja) = 0.04. Also, nouns that are presented to someone are less likely to be eaten than nouns that are peeled, i.e. sim(present, obja, eat, obja) = 0.10. Thus, the compatibility of the verb governing the antecedent candidate and the verb governing the pronoun can be an additional cue to identify the correct antecedent.
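The text states only that the verb similarity is based on shared arguments, so the following Python sketch is a guess at one possible form: it mirrors the count-based noun similarity of equation 6.8 with the roles of nouns and verbs swapped. The formula itself, the toy table `COUNTS`, and the names `args_of` and `sim_verbs` are assumptions, and the toy scores are not meant to reproduce the values reported above, only their ordering.

```python
# Toy counts keyed as (verb, noun, slot); all entries invented for illustration.
COUNTS = {
    ("peel", "banana", "obja"): 12, ("peel", "apple", "obja"): 9,
    ("peel", "potato", "obja"): 20,
    ("eat", "banana", "obja"): 30, ("eat", "apple", "obja"): 41,
    ("eat", "bread", "obja"): 55,
    ("present", "mother", "subj"): 7, ("present", "friend", "subj"): 5,
    ("present", "banana", "obja"): 1, ("present", "book", "obja"): 14,
}


def args_of(counts, verb, slot):
    """Nouns seen at least once as the `slot` argument of `verb`."""
    return {n for (v, n, gf), c in counts.items() if v == verb and gf == slot and c > 0}


def sim_verbs(counts, v_i, gf_k, v_j, gf_l):
    """Assumed analogue of equation 6.8 for verbs: shared arguments divided
    by the sum of the two verbs' individual argument counts."""
    args_i = args_of(counts, v_i, gf_k)
    args_j = args_of(counts, v_j, gf_l)
    denom = len(args_i) + len(args_j)
    return len(args_i & args_j) / denom if denom else 0.0


# Antecedent candidates for the object pronoun of "eat", described by the
# verb and slot that govern them.
candidates = {
    "Banane (obja of peel)": ("peel", "obja"),
    "Sie (subj of present)": ("present", "subj"),
    "Banane (obja of present)": ("present", "obja"),
}
scores = {
    name: sim_verbs(COUNTS, verb, slot, "eat", "obja")
    for name, (verb, slot) in candidates.items()
}
best = max(scores, key=scores.get)
print(best)  # the candidate governed by "peel" shares most arguments with "eat"
```

In a resolver, such a score would presumably be one cue among several rather than the sole ranking criterion.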
