
2.5 Capturing events in their diversity

2.5.1 Partitioning events into types

The challenge presented by diversity is exhibited in the transition from MUC to the more general type-based extraction of the ACE programme. MUC was defined by its selectiveness: it targeted a “fixed and closely circumscribed subject domain” (Yangarber and Grishman, 1997) for each evaluation (for instance, management succession and aircraft accidents). Through iterative refinement of templates and detailed annotation specifications, this yielded human inter-annotator agreement of 70 to 90% (Will, 1993), with the best system in each evaluation performing in the 50 to 60% F1 range (Chinchor, 1998b).¹⁶ The ACE programme was the natural descendant of the MUC evaluations in terms of its tasks and participants; it further specified the extraction of entities and other values, the entity coreference considered by MUC-6 (Sundheim, 1995) and the entity-entity relations of MUC-7 (Chinchor, 1998a), before considering events (Doddington et al., 2004).

One outcome of MUC was the understanding that targeted applications could utilise small, portable, textually fine-grained components, to be determined and benchmarked separately (Grishman and Sundheim, 1996). In an attempt to parallel entity and relation extraction, ACE thus targeted more general notions of event extraction than the application-specific scenario templates considered in MUC. Hence the first ACE attempt at event annotation considered five very broad event types (LDC, 2003):

• destruction/damage;
• creation/improvement;
• transfer of possession or control;
• movement; and
• interaction of agents.

The edition for 10-11 September 2010.

¹⁶ Note that this agreement is calculated over detailed templates rather than over whether an event of the target type is present.


Event type    Event subtype
Life          Be-Born, Marry, Divorce, Injure, Die
Movement      Transport
Transaction   Transfer-Ownership, Transfer-Money
Business      Start-Org, Merge-Org, Declare-Bankruptcy, End-Org
Conflict      Attack, Demonstrate
Contact       Meet, Phone-Write
Personnel     Start-Position, End-Position, Nominate, Elect
Justice       Arrest-Jail, Release-Parole, Trial-Hearing, Charge-Indict, Sue, Convict, Sentence, Fine, Execute, Extradite, Acquit, Appeal, Pardon

Table 2.2: Event types and subtypes in the ACE05 evaluation (NIST, 2005).

The annotation guide (LDC, 2003) provides arrest and winning an election as examples of transfer of possession or control; presumably hijacking falls into this category as well, rather than with other attack-like events under destruction/damage, where a tsunami might also reside. So while these categories capture ontological families of events, and may represent a vast proportion of newsworthy events, such broad types naturally hide distinguishing features of event semantics. This pilot annotation produced much lower inter-annotator agreement than the entity and relation detection tasks (Strassel et al., 2004), and so the schema was reinvented for later evaluations. ACE05 introduced events into the evaluation, categorising those of interest into eight more thematic types which break down further into 33 subtypes,¹⁷ listed in Table 2.2. It distinguishes, for instance, the birth of a person from the creation of an organisation, where the 2003 schema did not, but it does not completely cover that earlier typology: for example, it cannot mark the creation of an interesting artifact.

To examine the heterogeneity of ACE05 event subtypes, we plot the frequency of each in an annotated corpus (Walker et al., 2006) against the average length of its coreference chains. As shown in Figure 2.8, the frequencies of event subtypes vary from two Justice:Pardon events to 1119 Conflict:Attack events. The distribution of the Conflict type is also clearly imbalanced between its two constituent subtypes, with Attack over ten times more frequent than Demonstrate. The infrequent types are too scarce for supervising a learnt extractor, while the most frequent types are impractically broad for applications: annotated Movement:Transport instances include the withdrawal of troops, climbing Mount Everest, a Mars Rover voyage, swimming and weapons smuggling. Even so, numerous interesting events are missed by the schema, from natural disasters to construction to legislation and other publication. This variation in frequency is present even though the corpus was not sampled randomly from its sources, but selected to ensure sufficient instances of targeted types within a corpus of predetermined size.¹⁸ Variation along the other axis, the number of references per distinct event, indicates a few classes of event subtype: Divorce and Release-Parole tend to be the subjects of their documents, which make a number of references to the same generic concept of such events rather than to specific referents; in contrast, Pardon annotations tend to be single references in passing; while Attack exhibits a mix of single focal events and articles mentioning a number of distinct events of that type.

¹⁷ All systems known to the author focus on the subtypes, ignoring the broader groupings.

[Figure 2.8: plot of event frequency (x-axis, logarithmic, 10^0 to 10^3) against references per event (y-axis, 1 to 2.2) for each event subtype, grouped by type (Business, Conflict, Contact, Justice, Life, Movement, Personnel, Transaction); labelled points include Pardon, Divorce, Release-Parole, Transport and Attack.]

Figure 2.8: Frequencies of event subtypes in all 600 ACE05 training documents.

Evaluation                      Attack  Transport  Die  Meet  Injure  Charge-Indict
# gold references               984     472        392  160   87      85
Annotator 1                     84      78         89   80    89      89
Annotator 2                     88      85         92   79    88      87
Inter-annotator                 73      61         82   64    76      76
Naughton et al. trigger-based   25      20         80   65    65      80
Naughton et al. SVM             70      40         75   70    60      80

Table 2.3: Human and system (Naughton et al., 2010) performance (F1) on a sentence-level event type identification task, over six frequent event types in the newswire portion of the ACE05 corpus (Walker et al., 2006).
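The two axes of Figure 2.8 can be derived from mention-level annotations roughly as follows. This is a minimal sketch, assuming that each annotated mention carries a document-scoped event identifier (so that mentions sharing one form a coreference chain) and taking a subtype's frequency to be its number of distinct events. The data and the names `MENTIONS` and `subtype_statistics` are hypothetical, not drawn from the ACE05 corpus or its tools.

```python
from collections import defaultdict

# Hypothetical toy annotations, NOT ACE05 data: one tuple per event mention,
# (document id, event id within that document, event subtype).
MENTIONS = [
    ("doc1", "E1", "Justice:Pardon"),
    ("doc2", "E1", "Life:Divorce"),
    ("doc2", "E1", "Life:Divorce"),
    ("doc2", "E1", "Life:Divorce"),
    ("doc3", "E1", "Conflict:Attack"),
    ("doc3", "E2", "Conflict:Attack"),
    ("doc3", "E2", "Conflict:Attack"),
]


def subtype_statistics(mentions):
    """Map each subtype to (number of distinct events, mean references per event)."""
    # Coreference chain length: count mentions sharing a (document, event) key.
    chain_lengths = defaultdict(int)
    subtype_of = {}
    for doc_id, event_id, subtype in mentions:
        chain_lengths[(doc_id, event_id)] += 1
        subtype_of[(doc_id, event_id)] = subtype
    # Group the chain lengths by the subtype of each event.
    by_subtype = defaultdict(list)
    for key, length in chain_lengths.items():
        by_subtype[subtype_of[key]].append(length)
    return {
        subtype: (len(lengths), sum(lengths) / len(lengths))
        for subtype, lengths in by_subtype.items()
    }


stats = subtype_statistics(MENTIONS)
for subtype, (events, refs_per_event) in sorted(stats.items()):
    print(f"{subtype}: events={events}, references/event={refs_per_event:.2f}")
```

On this toy input the single Divorce event has three references, the two Attack events average 1.5, and the Pardon event has one, loosely echoing the contrasts described above. Plotting the first value on a logarithmic axis against the second gives a plot of the same form as Figure 2.8.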

such as phone number and events as in Table 2.2; some of these may substantially bias the corpus domain. Type-targeted sampling was first adopted for the 2005 evaluation in place of random sampling (Walker et al., 2006), and follows from MUC evaluations where corpora were selected to match the target event domain: the MUC-3 corpus includes only documents matching topical keywords (Chinchor et al., 1993); MUC-6 collates an

By reducing ACE05 event detection to a sentence-level classification task, Naughton et al. (2010) illustrate the difficulty of identifying such broad event types. Despite its relative infrequency, a homogeneous type like Charge-Indict is reliably recognised by the human annotators (see Table 2.3). Broader types of event such as Meet and Transport are recognised with reasonably high precision, resulting in high F1 for each annotator with respect to the final corpus, but with lower recall, such that the two annotators often fail to mark the same sentences (inter-annotator F1 of 64 and 61 respectively), presumably owing to less salient references. Using support vector machines (SVMs), Naughton et al. (2010) are able to approach inter-annotator performance for most types, but perform half as well for Transport as for Charge-Indict (F1 of 40 versus 80). For the latter type, a small list of trigger words is equally effective, while for the former, trigger terms perform only half as well as the SVM.¹⁹ The Attack type is also notable for being identifiable with a machine-learning model but not with a word list (F1 of 70 versus 25), suggesting that, unlike four of the six types that Naughton et al. (2010) consider, this type is lexically diverse.²⁰
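To make the trigger-list baseline concrete, the following is a minimal sketch of sentence-level classification with a word list, scored by F1 against gold sentence labels. It assumes a sentence is assigned a type whenever it contains any listed trigger word; the trigger lists, sentences and gold labels (`TRIGGERS`, `SENTENCES`, `GOLD`) are hypothetical illustrations, not Naughton et al.'s (2010) resources or data.

```python
import re

# Hypothetical trigger lists for illustration only; these are NOT the lists
# used by Naughton et al. (2010).
TRIGGERS = {
    "Charge-Indict": {"charged", "indicted", "indictment"},
    "Attack": {"attack", "bombing"},
}

# Hypothetical sentences with hand-assigned gold labels per event type.
SENTENCES = [
    "He was charged with fraud on Monday.",
    "Prosecutors indicted three officials.",
    "Rebels shelled the town overnight.",
    "A car bombing killed six people.",
    "Troops opened fire on the convoy.",
]
GOLD = {
    "Charge-Indict": [True, True, False, False, False],
    "Attack": [False, False, True, True, True],
}


def predict(sentences, event_type):
    """Label a sentence as event_type if it contains any listed trigger word."""
    triggers = TRIGGERS[event_type]
    return [bool(set(re.findall(r"[a-z]+", s.lower())) & triggers) for s in sentences]


def f1(predicted, gold):
    """Sentence-level F1 of binary predictions against gold labels."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


for event_type in TRIGGERS:
    score = f1(predict(SENTENCES, event_type), GOLD[event_type])
    print(f"{event_type}: F1 = {score:.2f}")
```

This prints an F1 of 1.00 for Charge-Indict and 0.50 for Attack: the list catches "bombing" but misses "shelled" and "opened fire", so the lexically varied Attack sentences are recalled poorly, a miniature version of the gap between trigger-based and SVM scores for Attack in Table 2.3.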

Having reviewed two (related) attempts to schematise broad-coverage event types, we find that the extreme variability within and across types suggests this approach does not readily generalise to the breadth of events. Although we again consider such a typology in Section 3.1, the data presented here suggest that the approach is flawed: while considering a few prescribed event types may suit specific applications, alternatives must be considered for more general event processing.

In document Grounding event references in news (Page 43-46)