CHAPTER 4 : Semantic Containment in Compositional Noun Phrases

4.1 Annotating Compositional Noun Phrases in Context

In this section, we describe our methodology for annotating and analyzing modifier-noun compositions. Our focus is on characterizing modifier-noun (M H) compounds in a way that promotes better natural language inference by automatic systems. As discussed in Section 2.3, many factors contribute to the interpretation of an M H, including context, common sense assumptions, and cultural conventions. Rather than attempt to control for these confounding factors, we choose instead to embrace them and treat them as inseparable from the M H composition itself.

4.1.1. Focusing on Denotations vs. Focusing on Inferences

As we discussed in Section 2.3.2, there are two broad approaches to the study of natural language semantics. The first, commonly taken in linguistics, aims primarily to model the underlying denotations of words and phrases: e.g. where does the set of “imaginary cats” stand in relation to the set of “cats”? The second approach, predominant in NLP, aims primarily to make correct inferences about natural language statements. This latter approach is agnostic about the underlying representation of individual words beyond what is necessary to produce the right behavior in a given situation or on a given task. That is, the main concern from the point of view of an NLP system is not whether the set of “imaginary cats” is a subset of the set of “cats”, but rather: can we infer that a particular mention of “cat” is an “imaginary cat”? Or, relatedly, if we replace the phrase “imaginary cat” with “cat” in a particular context, will it change the meaning of the utterance?

In this thesis, we adopt this inference-focused approach. As a result, in our experimental design, rather than asking humans “Is any/every instance of M H an instance of H?” we instead ask “Is this statement that is true of M H also true of H?” We accept that this design openly conflates semantic inference with pragmatic reasoning, and that it prevents us from drawing conclusions about the underlying set-theoretic relationship between the denotation of M H and that of H. However, the benefit is that it enables us to explore the types of inferences that automatic systems will be expected to make in the “real world”.

4.1.2. Studying Composition through Atomic Edits

Our goal is to determine which of our five basic entailment relations, as defined in Section 2.4, is generated by composing M with H. To do this, we want to design a task for studying modifier-noun composition that is as simple as possible, while still capturing realistic complexities that exist in natural language inference. To the extent possible, we would like to isolate the effect of the modifier-noun composition on the meaning of the noun phrase. However, we want to avoid collecting annotations in the “laboratory” setting, for example by studying M H pairs out of context, or in contrived, overly simplistic sentences (e.g. “Fido is a dog”). Our intention is to design a task that is not unnaturally easier or unnaturally harder than what is found in the real world. Thus, if humans exploit context in order to make inferences that may not be explicitly justified by formal reasoning, our automatic systems should learn to do the same.

We define a simplified RTE task, which is identical to the standard RTE task (Section 2.1.2) but has the additional constraint that p and h differ only by the insertion of a single modifier. Specifically, if p = s, then h = e(s), where e = INS(M) and M is a single modifier. To determine the relation generated by the modifier-noun composition for a given sentence s, we must determine whether s entails or contradicts e(s), and similarly whether e(s) entails or contradicts s, using the three-way classification of Section 2.1.2. That is, given a p/h pair, we must determine which of the three relationships holds: p ⇒ h (entailment), p ⇒ ¬h (contradiction), or p ⇏ h (unknown). By determining the classification in both the forward (s → e(s)) and the reverse (e(s) → s) directions, we are able to determine which of the five basic entailment relations is generated by the insertion of the modifier in the chosen context (Table 24).

Relation            Symbol  Forward (s → e(s))  Reverse (e(s) → s)
Equivalence         ≡       s ⇒ e(s)            e(s) ⇒ s
Forward Entailment  ⊏       s ⇒ e(s)            e(s) ⇏ s
Reverse Entailment  ⊐       s ⇏ e(s)            e(s) ⇒ s
Independence        #       s ⇏ e(s)            e(s) ⇏ s
Exclusion           ⊣       s ⇒ ¬e(s)           e(s) ⇒ ¬s

Table 24: Inference conditions used to determine which of the basic entailment relations is generated by the composition of M with H.

For example, if s = “She wore a dress” and e = INS(“red”), then e(s) = “She wore a red dress”. In this case, since s ⇏ e(s) and e(s) ⇒ s, we can determine that β(e) is ⊐.
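
The lookup described by Table 24 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this thesis: the names `Judgment`, `ins`, and `basic_relation` are hypothetical, `ins` assumes the head noun appears as a whitespace-delimited token and inserts the modifier before its first occurrence, and any pair of judgments not listed in Table 24 is returned as undetermined (`None`).

```python
from enum import Enum
from typing import Optional


class Judgment(Enum):
    """Three-way RTE label for a premise/hypothesis pair."""
    ENTAILMENT = "entailment"        # p => h
    CONTRADICTION = "contradiction"  # p => not h
    UNKNOWN = "unknown"              # p does not entail h


# (forward: s -> e(s), reverse: e(s) -> s) -> relation, following Table 24.
_TABLE_24 = {
    (Judgment.ENTAILMENT, Judgment.ENTAILMENT): "equivalence",
    (Judgment.ENTAILMENT, Judgment.UNKNOWN): "forward entailment",
    (Judgment.UNKNOWN, Judgment.ENTAILMENT): "reverse entailment",
    (Judgment.UNKNOWN, Judgment.UNKNOWN): "independence",
    (Judgment.CONTRADICTION, Judgment.CONTRADICTION): "exclusion",
}


def ins(sentence: str, modifier: str, head: str) -> str:
    """Hypothetical INS(M) edit: insert `modifier` directly before `head`.

    Assumes `head` occurs as a whitespace-delimited token; the first
    occurrence is used. Raises ValueError if `head` is not found.
    """
    tokens = sentence.split()
    i = tokens.index(head)
    return " ".join(tokens[:i] + [modifier] + tokens[i:])


def basic_relation(forward: Judgment, reverse: Judgment) -> Optional[str]:
    """Return the relation named in Table 24, or None if the pair is not listed."""
    return _TABLE_24.get((forward, reverse))


s = "She wore a dress"
e_s = ins(s, "red", "dress")
relation = basic_relation(Judgment.UNKNOWN, Judgment.ENTAILMENT)
```

For the “red dress” example, `e_s` is “She wore a red dress”, and the pair (unknown, entailment) maps to reverse entailment, matching the relation ⊐ derived above.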

4.1.3. Limitations of our Methodology

In the above-described simplified RTE task, we assume that the entailment relation that holds overall between s and e(s) is attributable wholly to the atomic edit (i.e. the inserted modifier). This is an over-simplification. In practice, several factors can cause the entailment relation that holds between the sentences overall to differ from the relation that is generated by the INS(M) edit. For example, negation, quantifiers, or other downward-monotone operators can block or reverse entailments (“brown dog” entails “dog”, but “no brown dog” does not entail “no dog”). We make an effort to avoid selecting such sentences for our analysis (Section 4.2.1), but fully identifying and handling such cases is beyond the scope of this thesis. We acknowledge that downward-monotone operators and other complicating factors (e.g. multiword expressions) are present in our data. However, based on manual inspection, they do not occur frequently enough to substantially affect our analyses.

4.1.4. Treating Entailment as a Continuum

Very often, humans draw conclusions about natural language based on “assumptions that seem plausible, rather than assumptions that are known to be true” (Kadmon, 2001). For example, given s and e(s) below, most readers would agree that, while it cannot be guaranteed that s ⇒ e(s), it seems artificially naive to say s ⇏ e(s):

s: A cat sitting on the ground looks out through a clear door screen.

e(s): A domestic cat sitting on the ground looks out through a clear door screen.

While RTE has thus far always been treated as a discrete classification task by the NLP community (Section 2.1.2), systems are increasingly expected to make informal and probabilistic inferences like the one above (see Table 4 in Section 2.1.3). There is thus a strong case for treating entailment as a continuum rather than as a discrete classification. Doing so provides a clearer treatment for “edge case” inferences and is arguably better aligned with the way humans reason about language.

Therefore, when collecting human annotations for the simplified RTE task described in Section 4.1.2, we replace the hard three-way classification (entailment, contradiction, or unknown) with a softer 5-point scale in which 1 corresponds to definite contradiction, 3 corresponds to unknown, and 5 corresponds to definite entailment, while scores of 2 and 4 allow humans to specify likely (but not certain) contradiction and entailment, respectively. Allowing for weak judgments of probable entailment and contradiction lets us more naturally capture inferences such as “cat” very likely entailing “domestic cat” in the example above. When necessary, for example to interface with existing RTE systems, we collapse this 5-point scale to the standard three-way classification.
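
One plausible way to collapse the 5-point scale is sketched below in Python. The cut-offs are an assumption made for illustration (scores 1 and 2 become contradiction, 3 becomes unknown, 4 and 5 become entailment); the text above does not specify the exact mapping, and the function name `collapse_score` is hypothetical.

```python
def collapse_score(score: int) -> str:
    """Map a 1-5 annotation score to a three-way RTE label.

    Assumed cut-offs (not specified in the text):
    1-2 -> contradiction, 3 -> unknown, 4-5 -> entailment.
    """
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
    if score <= 2:
        return "contradiction"
    if score >= 4:
        return "entailment"
    return "unknown"


labels = [collapse_score(x) for x in (1, 2, 3, 4, 5)]
```

Under this assumed mapping, a “likely entailment” score of 4, such as might be given to the “cat” / “domestic cat” pair, collapses to entailment. A stricter mapping that sends 2 and 4 to unknown would be equally easy to write and would give a more conservative label.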