Annotating Factive Verbs

(1)

Annotating Factive Verbs

Alvin Grissom II, Yusuke Miyao

National Institute of Informatics, Japan

Hitotsubashi 2-1-2, Chiyoda-ku, Tokyo 101-8430 JAPAN [email protected], [email protected]

Abstract

We have created a scheme for annotating corpora designed to capture relevant aspects of factivity in verb-complement constructions. Factivity constructions are a well-known linguistic phenomenon that embed presuppositions about the state of the world into a clause. These embedded presuppositions provide implicit information about facts assumed to be true in the world, and are thus potentially valuable in areas of research such as textual entailment. We attempt to address both clear-cut cases of factivity and non-factivity, as well as account for the fluidity and ambiguous nature of some realizations of this construction. Our extensible scheme is designed to account for distinctions between claims, performatives, atypical uses of factivity, and the authority of the one making the utterance. We introduce a simple XML-based syntax for the annotation of factive verbs and clauses, in order to capture this information. We also provide an analysis of the issues which led to these annotative decisions, in the hope that these analyses will be beneficial to those dealing with factivity in a practical context.

Keywords:factivity, annotation, semantics

1. Introduction

This paper introduces our work on building an annotated corpus which captures the linguistic notion offactivityand addresses some of the challenges inherent to the task. Presuppositions are facts assumed to be true at the time of an utterance, and factivity is a kind of presupposition that a part of an utterance is presupposed to be true. There are many ways in which factivity can be realized; here, we fo-cus specifically onfactive verbs, which are verbs that take a clause as a complement and introduce a presupposition that the complement clause is true.

Inherently, factivity is an indispensable factor for the recog-nition of textual entailment, factuality, etc. However, to our knowledge, there are no corpora which annotate factivity in quite this way, while there are other kinds of semantically annotated corpora, such as PropBank and FactBank. We devise an annotation scheme which captures the relation-ship between a factive verb, its complement clause, and the degree to which it is factive. This work constitutes a first step towards the recognition of various kinds of factivity in real-world texts.

2. Problem

In the field of pragmatics, the notion of presupposition (Kiparsky, 1970; Beaver, 2001; Gazdar, 1979) has been studied extensively in linguistics literature. Presupposition is defined as a fact which is assumed to be true at the time of an utterance, but not explicitly mentioned. In natural language text, this information is implicit in the text, but not explicitly enumerated. Extracting presuppositions from text allows for the extraction of implicit information, po-tentially valuable for textual entailment systems.

(1) He reached the third door on the left.

Presupposition: There is a first door and a second door on the left.

(2) Cain does not work at a bank anymore. Presupposition: Cain once worked at a bank.

Presuppositions differ from other kinds of entailments in some interesting ways; for instance, presuppositions are be-lieved to be blocked by certain linguistic phenomena, such as negation, though there are known complications (Heim, 1990).

This work is focused on factive verbs and their comple-ments, a specific type of presupposition. When verbs func-tion factively, they presuppose the truth of the complement of the verb.

(3) • a. I regret [that] I came to school late.

• b. I came to school late, I regret.

⇒I came to school late.

(4) I propose [that] I came to school late.

In example (3) a., the complement I came to school late

is assumed to be true. Regardless of whether the person speaking is truthful, the presupposition is projected lexi-cally byregret. This is also true of the syntactic variant, (3) b. The same cannot be said of (4), however, since some-thing proposed is not necessarily somesome-thing done.

In the computational linguistics literature, there has not, to our knowledge, been much concentrated research into the notion of factive verbs and factivity annotation or classifica-tion. Factivity is largely determined lexically. For example, some verbs, such as emotive verbs (e.g., love, hate, regret) are nearly always factive (Giannakidou, 2006). However, a complication in this task is that factivity is sometimes not only lexically determined but also dependent on its syntac-tic realization, context, etc. Indeed, we have observed that, while there appears to be a factivity continuum – that is, some verbs are almost always used factively, while others never are – verbs may be coerced in certain situations to behave in counterintuitive ways, as is noted in the literature (Nairn et al., 2006).

Complicating annotation efforts somewhat are ambiguities.

(2)

(3)

(4)

of factivity, and 17 instances of non-factivity (e.g., hope,

think), out of the 800 test-hypothesis pairs.

To determine ambiguous cases, such as coercion, we are currently relying on intuition, but we believe that there may be ways of quantifying this, described in the section on fu-ture work. In addition, there are syntactic methods for de-tecting certain kinds of ambiguity, but they are beyond the scope of this paper. At present, tags marking ambiguity are essentially warnings, without much specific informa-tion. Since our scheme is XML-based, it is extensible with-out compromising backward compatibility, and some of the details of the attributes may vary based on what is deemed useful for a particular corpus, such as the broad category of “claim.” Based on our experience with the RTE-2 corpus, in general, the kinds of claims that warrant special atten-tion with regard to factivity are, at least, speech actsand

authoritativeclaims.

4. Related Work

There is a natural relationship between factivity and event extraction, as often, but not always, facts presupposed ex-plicitly refer to events. Pustejovsky et al. (Pustejovsky et al., 2003) discussTimeBank, which annotates events, times, and temporal relations, and events may be labeled for fac-tivity. FactBank (Saur´ı and Pustejovsky, 2009), an addi-tional layer atop TimeBank, annotates “factuality” of the clauses. The key difference between factuality and factiv-ity is this: factivfactiv-ity refers to a fact that is presupposed in an utterance, independent of the veracity of the utterance in the real world; factuality refers such veracity. While factive usages do generally betray a certain level of commitment to the factuality of a proposition, we are not directly con-cerned with factuality. Prasad et al. (Prasad et al., 2006) describe an annotation scheme for “abstract objects” (e.g., propositions) in the Penn Discourse Treebank, designed to capture, among other things, the degree of factuality of the objects.

Factivity has been considered an important factor in tex-tual entailment recognition, although this problem was not studied intensively in natural language processing litera-ture. Nairn et al. (Nairn et al., 2006) presented a detailed analysis of the notion of “relative polarity,” which concerns the relationship between the factuality of a matrix clause and its complement clause. This literature discusses factive verbs as a part of relative polarity, although it is primar-ily concerned with theoretical considerations and does not provide resources or tools for factive verbs. Clausen and Manning (Clausen and Manning, 2009) used lexical fac-tivity triggers in a tree structure for natural language in-ference. They note that lexically triggered presuppositions can expand the information available for solving a partic-ular problem without deep semantic interpretation. Their goal was neither annotation nor recognition of factivity, but inference based on prior knowledge of the factivity or an-tifactivity of a given verb. We believe that proper annota-tion of factivity, and presupposiannota-tion in general, will bene-fit these areas of research. Note that (Garoufi, 2007), in a study of textual entailment datasets, found that counter-factivity occurred very infrequently. We do not address it in our current annotation scheme; it is subsumed under the

notion of non-factivity.

5. Conclusion

We have devised a scheme for annotating factive verbs in reworld texts. Sufficiently developed corpora should al-low natural language processing researchers to develop new means of factivity detection and allow linguists to research these phenomena quantitatively. Following this scheme, the next step will be to account for other kinds of factivity, be-yond verbs and their complements.

6. Future Work

We would like to use this for textual entailment, as the be-havior of the entailments is predictable if one can determine the (non)factivity of an expression. In addition, we have done some preliminary work on automatically identifying factive verbs from a large corpus. We used UKWAC (Fer-raresi et al., 2008), a 2 billion word English corpus from the web, to search for certain syntactic patterns that we be-lieved were likely to yield a (non)factive verb. To do this, we created a program to search for syntactic patterns in a large corpus, and computed the odds that each verb would occur in the “factive” or “non-factive” category. The crite-ria used for determining these syntactic patterns are beyond the scope of this paper, but the following shows some fun-damental idea. By searching, for example, for a past par-ticiple, followed by any verb, and then the word that, we found that the verb in question was often factive.

• VHZ/have VVN/reportIN/that

• VHZ/have VVN/noticeIN/that

• VHZ/have VVN/suggestDT/that/

• VHZ/have VVN/concludeIN/that

Likewise, the following are also from our “factive” syntac-tic category:

• NN/industry MD/should VV/recogniseIN/that/

• NP/Microsoft MD/should/ VV/acknowledgeIN/that/

• NN/paragraph/ MD/should/ VB/beIN/that/

• NN/evidence MD/should/ VV/showIN/that/

The verbs yielded are not perfect – nor should be expect them to be – but they do tend to skew toward factivity or non-factivity. We have begun experimenting with measur-ing these occurrences of verbs in these syntactic frames statistically by making assumptions concerning which cat-egories are factive and which are not, as we believe that interesting and potentially useful linguistic insights may be gleaned from such an analysis.

7. References

(5)

R. Bar-Haim, I. Dagan, B. Dolan, L. Ferro, D. Giampic-colo, B. Magnini, and I. Szpektor. 2006. The second PASCAL recognising textual entailment challenge. In

Proceedings of the second PASCAL challenges workshop on recognising textual entailment, pages 1–9. Citeseer. D.I. Beaver. 2001. Presupposition and assertion in

dy-namic semantics. CSLI publications.

D. Clausen and C.D. Manning. 2009. Presupposed con-tent and entailments in natural language inference. In

Proceedings of the 2009 Workshop on Applied Textual Inference, pages 70–73. Association for Computational Linguistics.

A. Ferraresi, E. Zanchetta, M. Baroni, and S. Bernardini. 2008. Introducing and evaluating ukwac, a very large web-derived corpus of english. InProceedings of the 4th Web as Corpus Workshop (WAC-4) Can we beat Google, pages 47–54.

K. Garoufi. 2007. Towards a better understanding of ap-plied textual entailment.

G. Gazdar. 1979. Pragmatics: Implicature, presupposi-tion, and logical form. Academic Press New York. A. Giannakidou. 2006. Only, emotive factive verbs, and

the dual nature of polarity dependency. Language, 82(3):575–603.

I. Heim. 1990. Presupposition projection. In Reader for the Nijmegen Workshop on Presupposition, Lexical Meaning, and Discourse Processes, Univ. of Nijmegen. P. Kiparsky. 1970. Semantic rules in grammar.The Nordic

languages and modern linguistics, pages 262–285. R. Nairn, C. Condoravdi, and L. Karttunen. 2006.

Com-puting relative polarity for textual inference. In Proceed-ings of ICoS-5. Citeseer.

R. Prasad, N. Dinesh, A. Lee, A. Joshi, and B. Webber. 2006. Attribution and its annotation in the Penn Dis-course Treebank. Traitement Automatique des Langues, 47(2):43–63.

J. Pustejovsky, P. Hanks, R. Sauri, A. See, R. Gaizauskas, A. Setzer, D. Radev, B. Sundheim, D. Day, L. Ferro, et al. 2003. The TimeBank corpus. InCorpus Linguis-tics, volume 2003, page 40.

R. Saur´ı and J. Pustejovsky. 2009. FactBank: a corpus an-notated with event factuality. Language Resources and Evaluation, 43(3):227–268.