• No results found

Distinction Based on Specific Events or Predicates

In contrast to the Meyers et al. (1994)’s approach, the lexical resources PropBank and FrameNet approach the issue of argumenthood by considering verbs or event frames on an individual basis. The approach is not precisely to establish a set of detailed criteria that will apply to all instances in the data. Rather, for both PropBank and FrameNet, the goal is to come up with semantic frames for verbs, which will allow for consistent annotation.

4.2.1 English PropBank

The English PropBank, is a semantic resource that was developed for the task of semantic role labeling (SRL), which seeks to automatically identify and extract the different semantic relation- ships between words in a given text (Palmer et al., 2005; Kingsbury and Palmer, 2003). Resources that PropBank provides include two things: annotated data and a repository of lexical informa- tion. PropBank’s annotated data includes various types of text including newspaper, magazine, and weblog texts. The data is annotated with the lexical repository that defines verbs’ argument structures.

It is in the definitions of argument structures that the argument-adjunct treatment is found. The definitions of verbs are contained in what is called a Frame File that can include one or more Frameset for a given verb, and each Frameset corresponds to a different subcategorization frame of the verb. Here is an example:

Frameset id: give.01, transfer

Arg0: giver

Arg1: thing given

Arg2: entity given to

structure. For the verb give, the argument structure includes the numbered arguments (i.e. ARG0, ARG1 and ARG2) as described in the Frameset. The Frameset id makes a distinction between this use of the verb as in “John gave cookies to Mary” and the verb in “I am given to wonder what so great about it!”, which has a different argument structure, and this is assigned a different Frameset id leave.01. Thus, the annotation of the sentence “John gave cookies to Mary” would look like the following. With respect to the verb leave, John would be annotated as the ARG0, cookiesas ARG1 and for Mary as ARG2 according to the transfer Frameset.

[John]-ARG0 left [cookies]-ARG1 [for Mary]-ARG2.

In addition to the numbered arguments, PropBank defines modifier labels starting with ARGM. Accompanying the label ARGM will be one of the many function (semantic) labels that describe the modifier (e.g. TMP for temporal, LOC for locative). For example, in a sentence such as “John left cookies for Mary on Thursday”, on Thursday would be labeled as ARGM- TMP, with the temporal function tag, TMP, which defines the semantics of the modifier. Thus, the argument and adjunct distinction to a large extent lines up with the numbered arguments and the modifier labels.

However, numbered arguments do not always correspond to the general linguistic analysis of arguments. Consider the following Frameset and the corresponding examples:

(69) raise.01:‘go up quantifiably’ Arg0: Agent

Arg1: Thing rising

Arg2: Extent, amount risen Arg3: Start point

Arg4: End point

a. [John]-ARG0 raised [his hand]-ARG1 [to ask a question]-ARGM-PRP.

b. [Oil production]-ARG1 raised [by 11k barrels/day]-ARG2 [to 321k barrels]-ARG4.

quently, the likely analysis of the arguments for the verb would include the raiser, the AGENT

of the action, and the raised entity, theTHEMEof the action. However, PropBank includes three other participants for the verb. The reason behind this is that PropBank relies on the frequency of occurrence in a corpus to determine whether or not a constituent should be considered an ar- gument of the verb. The following is a statement found in the guidelines for PropBank Frameset creators or ‘framers’.

The problem of distinguishing arguments from modifiers is not an easy one. There are different syntactic, semantic, and other types of criteria which could be used, however, they do not usually agree on where to draw the line. Our approach is a practical one: a semantic role is being marked as an argument, if it frequently occurs in a corpus and is specific to a particular class of verbs. (Babko-Malaya, 2005)

Since the framing and annotation of the example (69) was based on the Wall Street Journal and the sentences such as (b) occur more frequently, PropBank frame creators included the extent, start and endpoint of the rising event. Moreover, PropBank frames are not necessarily the same across semantically similar types of verbs. Consider the following example:

(70) push.01:‘(try to) cause motion’

[John]-ARG0 pushed [the box]-ARG1 [into the hallway]-ARG2.

(71) kick.01:‘drive or impel with the foot’

[John]-ARG0 kicked [the box]-ARG1 [into the hallway]-ARGM-DIR.

In (70), the PP into the hallway is considered an argument of the verb push, while that is not so for the verb kick in (71). Given that the semantics of both PPs are implicit in the semantics of the verb (e.g. *John [pushed / kicked] the box into the hallway), and that both sentences behave in the same manner given the do so test (e.g. John [pushed / kicked] a box into the hallway, and Mary did so into her room), regardless whether they should be both adjuncts or both arguments, it stands to reason that the verbs should be treated in the same way in its argument structure. The likely explanation of the discrepancy between the two Framesets is that given the corpus the

framers were working with, likely push was far more often in found in company with the PATH OF MOTIONthan kick was.

Thus, PropBank treats the argument and adjunct distinction somewhat uniquely. Unlike Mey- ers et al. (1994), the PropBank does not seek to develop general criteria to distinguish between arguments and adjuncts. Rather, its task is to identify the semantic participants of an individual verb and describe its subcategorization frame using the frequency of occurrence with the verb to decide if an argument is numbered or not.

There is one other noteworthy point. As one can see, numbered argument labels are highly coarse grained. That is, they do not specify the semantics related to the identity of that argument, so there is no good way of telling what the thematic role of the label would be without specific examples. The effect of having a coarse grained label was evaluated in a paper by Yi et al. (2007). In this paper, the authors found that while ARG0 and ARG1 were fairly consistentlyAGENTand

PATIENT, respectively. However, they also noted that the thematic roles of the higher numbered arguments (i.e. ARG0-ARG4) varied widely. In an effort to see the effect of introducing more fine grained distinctions amongst PropBank labels, Yi et al. (2007) introduced to the numbered arguments thematic labels from VerbNet, a lexical resource containing more fine grained labels. Yi et al. (2007) observed an increase in the performance of automatic systems when PropBank’s ARG2 labels were paired up with thematic labels that more narrowly identified the semantics of the argument.

As a followup to this finding, PropBank is now in the process of introducing thematic infor- mation into ARG2 to ARG5 labels. Specifically, for all numbered arguments on or above 2, the numbered argument are being paired with the function or semantic tags found in ARGMs. For example, given the verb raise, we would now see:

(72) raise.01:‘go up quantifiably’ Arg0: Agent

Arg2-EXT: Extent, amount risen Arg3-DIR: Start point

Arg4-DIR: End point

This generally coincides with Dowty’s analysis. Instead of just providing numbered ar- gument labels, they would be paired with or modifier function tags or, effectively, adjunct-like information. Perhaps with a different process, it allows for the exactly same capability Dowty suggests: the NLP community has the ability to do both argument and adjunct semantics over the same data and same labels.

4.2.2 FrameNet

Much like PropBank, FrameNet is a lexical resource that provides semantic representations of predicates, including verbs, nouns, and adjectives. In FrameNet, each sense of a word is paired with a semantic frame, which is defined as a schematic representation of event or situation and is described though the use of participant roles called frame elements (FEs) (Fillmore and Petruck 2003, p.359; Ruppenhofer et al. 2010, p.5). An FE can either be a considered core or non-core to the semantic frame. For example, currently there are 35 verbs and nouns whose senses are associated with the semantic frame Ingestion, which, in turn, is described by two core FEs,

INGESTORandINGESTIBLE, and nine non-core FEs includingINSTRUMENTandMANNER19. Closer in concept to PropBank than COMLEX Syntax, FrameNet does not make the argu- ment and adjunct distinction on the basis of a pre-established set of semantic and syntactic criteria that apply evenly to all predicates. Rather, the notion of argumenthood and argument structure is approached through the concept of semantic coreness of the participant roles involved in a semantic frame.

A core FE “instantiates a conceptually necessary component of a frame, while making the

19For the purposes of this paper, semantic frames will be in Courier font, core FEs will be in

BOLDED SMALL CAPSand non-core FEs will be inREGULAR SMALL CAPS.

frame unique and different from other frames” (Ruppenhofer et al., 2010, p.19) and is considered essential to the meaning of a semantic frame (Fillmore and Petruck, 2003, p.359). Core FEs are thought to have a certain set of properties. Core FEs generally have to either be always expressed overtly expressed or implicit in the interpretation when omitted. Consider the following examples(Ruppenhofer et al., 2010, p.19-20) that invoke the Ingestion frame:

(73) [Mary]-INGESTORdevoured [the soup]-INGESTIBLE[with a spoon]-INSTRUMENT. (74) [John]-INGESTOReats [her lunch]-INGESTIBLE[at school]-PLACE[everyday]-TIME. (75) [Marc]-INGESTOReats [every three hours]-TIME.

The semantic frame Ingestion identifies the INGESTOR and INGESTIBLE as they are both necessary participants in the act of devouring or eating. Even when theINGESTORis not explicit in the sentence (75), it still has to be understood from the context. FEs that are not explicit in the sentence, but nevertheless implicitly or conceptually instantiated as a participant FE (i.e. if present would fill a semantic role), are considered to be null instantiated elements, they are recognized and annotated accordingly. For example, sentence (75) would be annotated as the following:

(76) [Marc]-INGESTOReats [every three hours]-TIME[INI20INGESTIBLE].

A non-core FE, also called a peripheral, does not characterize a frame uniquely like a core FE does, nor does it “introduce additional, independent or distinct events from the main reported event” allowing it to be “instantiated in any semantically appropriate frame” (Ruppenhofer et al., 2010, p.20). Moreover, all of the circumstantial modifiers such as TIME, PLACE, and MANNER

20INI stands for Indefinite Null Instantiation, instantiation of an FE. It is to be distinguished from the Definite Null Instantiation (DNI). DNI is an FE whose identity is fully recoverable from the context ( e.g. Stephanie contributed $20, whereRECIPIENT, a charitable organization, is the missing FE.). INI is an FE whose identity itself is not recov- erable from the context but the type of the FE is available (e.g. objects of transitive verbs that are used intransitively) (Fillmore and Petruck 2003, p.360; c.f. Ruppenhofer et al. 2010, p.24-26 for complete discussion of null FEs).

are strictly considered non-core FEs in any given frame. In examples (73)-(75), INSTRUMENT, TIME, andPLACE are defined by the frame as instantiating non-core FEs.

Thus, generally speaking, the concept of argument coincides with FrameNet’s core FE and that of adjunct coincides with the non-core FE. However, it is not always the case that all of FrameNet’s core FEs necessarily will be ones that would be classified as arguments in the general linguistic literature. Here is an example of the frame Practice:

(77) [Mary]-AGENTpracticed [piano]-ACTION[for John’s wedding]-OCCASION. (78) [Mary]-AGENTpracticed [typing]-ACTION[for a raise]-PURPOSE.

Generally speaking for verb practice, two arguments are often specified: the agent of prac- tice and the practiced action. In addition to the two arguments, FrameNet specifies a third core FE – theOCCASIONfor the practice as seen in example (77). This adverbial phrase would be con- sidered a modifier, hence an adjunct, in general linguistic literature since it expresses the purpose of an action21. However, as seen (78), the frame Practice also includes a non-corePURPOSE

FE, which tells us that FrameNet makes a finer distinction between the two types of phrases22.

In addition to the core and non-core FEs, there is a third FE called an extra-thematic FE, which on the whole is unique to FrameNet. Extra-thematic FEs are defined as FEs that introduce “additional, independent or distinct” events and situations that do not belong in the main semantic frame (Ruppenhofer et al., 2010, p.20,97). In other words, extra-thematic FEs acts as elements that combine one frame with another frame in such a way that an external frame is embedded into the main frame or an external frame is used to describe one of the FEs in the main frame. In the following examples, only the extra thematic FE is labeled and the verb is italicized:

21According to the do so test, the phrase is an adjunct: John practiced piano for Mary’s wedding, and Marc did so for his recital.

22Interestingly, PropBank leaves both types of purpose phrases to ARGM (i.e. ARGM-PRP). Also, in addition to the agent (ARG0) and theme of practice (ARG1), it also adds anINSTRUMENTas a third argument (ARG2) for sentences like “[Mary]-ARG0 practiced [the sonata]-ARG1 [on her piano]-ARG2 [for John’s wedding]-ARGM-PRP.

(79) The ferry that Kenneth was on was hijacked [twice]-ITERATION.

(80) The mousse was chilled [around the edges]-SUBREGION.

(81) Someone even mowed the lawn [for Camile]-BENEFICIARY. (82) Every pet I’ve had has croaked [on me]-MALEFICIARY.

(83) Pat put the butter in the fridge [along with the margarine]-COPARTICIPANT.

In example (79), the FE twice describes the number of times the event instantiated by the frame Piracyoccurs (Ruppenhofer et al., 2010, p.20,99). In example (79), the extra-thematic SUB-

REGIONfocuses on a subregion of one of the participants that the predicate specifically describes (i.e. the edges of the mousse was chilled). In the case of (81) and (82), the extra-thematic FE de- scribes the beneficiary (i.e. benefiting entity) or the maleficiary (i.e. harmed entity) on which the semantics of the frame acts. Finally in (83), the extra-thematic indicates a co-participating entity in the Placing frame. That is, the FE is not a part of the Placing frame. It simply combines the frame with another conceptual frame in which there is a mention of a second participant.

4.2.3 Summary

For both PropBank and FrameNet, the aim is to identify the proper thematic roles that are necessary in the event or state described by the verb. PropBank relies on frequency of occurrence to decide which constituents should be counted as numbered arguments, and FrameNet decides on the conceptual elements a semantic frame needs to identify, i.e., the core FEs. In both cases, rather than relying on a set of criteria for the identification of the arguments, the task is done on a verb by verb or frame by frame basis.

The strength of this approach is that, for any given verb of a particular usage there is a good chance that the framed arguments are correct. Also, both resources will be able to account for the unique and unusual usages of predicates, provided that there are frames associated with the predicates. The downside of this approach is it is less generalizable across verbs as each new instance of a verb or frame would have to be evaluated in its own right.

5

Conclusion

In this paper, we have analyzed the current status of argument and adjunct distinction in the linguistics literature and explored how thisdistinction is handled in NLP resources. Ideally, the best solution for any NLP application would be to have a specific set of features or criteria that makes a clean distinction between constituents that are arguments and those that are adjuncts. However, it is clear that there is no one set of criteria that suffices for distinctions across all verbs. If Dowty is actually correct in that all elements in the sentence can be given both an argument and an adjunct analysis, it would suggest that the quest for cleanly identifying the distinction might be misguided.

References

Olga Babko-Malaya. Guidelines for propbank framers. Unpublished manual, September 2005.

Ann Bies, Mark Ferguson, Karen Katz, and Robert MacIntyre. Bracketing Guidelines for Treebank II Style Penn Treebank Project, 1995. URL http://www.sfs.uni-tuebingen.de/˜dm/07/ autumn/795.10/ptb-annotation-guide/root.html.

Andrew Carnie. Syntax: A Generative Introduction. Blackwell, 2006. Noam Chomsky. The Minimalist Program. The MIT Press, 1995.

Michael Collins. Head-Driven Statistical Models for Natural Language Processing. PhD thesis, University of Pennsylvania, 1999.

Elizabeth A. Cowper. A Concise Introduction to Syntactic Theory. The University of Chicago Press, 1992. Peter W. Culicover and Ray Jackendoff. Simpler Syntax. Oxford University Press, 2005.

Steve DeNeefe and Kevin Knight. Synchronous tree adjoining machine translation. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2009.

Dmitriy Dligach and Martha Palmer. Word sense disambiguation with automatically retrieved semantic knowledge. International Journal of Semantic Computing (IJSC), 2(3):365–380, 2008.

David Dowty. On the semantic content of the notion ‘thematic role’. Properties, Types and Meaning, 2: 69–130, 1989.

David Dowty. The Dual Analysis of Adjuncts/Complements in Categorial Grammar. Mouton de Gruyter, 2003.

Charles J. Fillmore. The case for case. In Emmon Bach and R. Harms, editors, Universals in Linguistic Theory. Holt, Rinehart, and Winston, New York, 1968.

Charles J. Fillmore. Under the circumstances. In Proceedings of the 20th Annual Meeting of the Berkeley Linguistics Society, Berkeley, California, 1994.

Charles J. Fillmore and Miriam R. L. Petruck. Framenet glossary. International Journal of Lexicography, 16(3):235–250, 2003.

Jean Mark Gawron. Lexical Representations and the Semantics of Complementation. New York: Garland Publisher, 1988.

Ralph Grishman, Catherine Macleod, and Adam Meyers. Comlex syntax: Building a computational lexi- con. In Proceedings of the International Conference on Computational Linguistics (COLING), Kyoto, Japan, 1994.

Liliane Haegeman. Introduction to Government and Binding Theory. Blackwell, Oxford UK and Cam- bridge USA, 1 edition, 1991.

Jorge Hankamer and Ivan Sag. Deep and surface anaphora. Linguistic Inquiry, 7(3):391–426, 1976. Donald Hindle and Mats Rooth. Structural ambiguity and lexical relations. Computational Linguistics, 19

(1), March 1993.

Ray Jackendoff. Semantic Interpretation in Generative Grammar. The MIT Press, Cambridge, Mas- sachusetts, 1972.

Ray Jackendoff. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford University Press, 2002.

Paul Kingsbury and Martha Palmer. Propbank: The next level of treebank. In Proceedings of Treebanks and Lexical Theories, V¨axj¨o, Sweden, 2003.

Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. A large-scale classification of english verbs. Language Resources and Evaluation Journal, 42(1):21–40, 2008.

Jean-Pierre Koenig, Gail Mauner, and Breton Brienvenue. Arguments for adjuncts. Cognition, 89:67–103, 2003.

George Lakoff and John R. Ross. Why you can’t do so into the sink. In J. McCawley, editor, Notes from the Linguistic Underground, volume 7 of Syntax and Semantics. Academic press, New York, 1976. Catherine Macleod, Ralph Grishman, and Adam Meyers. Comlex syntax: A large syntactic dictionary for

natural language processing. Computers and the Humanities, 31(6):459–481, 1998a.

Catherine Macleod, Ralph Grishman, and Adam Meyers. Comlex syntax reference manual version 3.0. Technical report, Computer Science Department, New York University, 1998b.

Paola Merlo and Eva Esteve Ferrer. The notion of argument in prepositional phrase attachment. Compu- tational Linguistics, 32(3):341–378, 2006.

Adam Meyers, Catherine Macleod, and Ralph Grishman. Standardization of the complement adjunct distinction. In Proceedings of the 7th EURALEX International Congress, Goteborg, Sweden, 1994. Philip H. Miller. Clitics and Constituents in Phrase Structure Grammar. Garland, New York, 1992. Martha Palmer, Dan Guildea, and Paul Kingsbury. The proposition bank: An annotated corpus of semantic

roles. Computational Linguistics, 31(1):71–105, March 2005.

Carl Pollard and Ivan A. Sag. Head-Driven Phrase Structure Grammar. University of Chicago Press,

Related documents