
2.2 Temporal Information in the IE Process

2.2.2 Temporal Annotation Guidelines

The development of temporal annotation standards and corpora has a long history. Of note is the TimeBank corpus [Pustejovsky et al., 2003c], which contains 183 news articles annotated with events, time expressions and the temporal links between them. This corpus was developed over multiple iterations, and analyses of the annotated data and of the annotation standard fed into the evolution of both. For example, [Boguraev and Ando, 2007] presented an extensive analysis of the TimeBank reference corpus from the perspective of supporting the development of TimeML-compliant analytics, which helped advance the state of the art in temporal annotation. Indeed, iterative application of an annotation standard and examination of the resulting annotated data are critical steps in the MATTER development cycle used to construct annotation standards [Pustejovsky, 2006, Pustejovsky and Stubbs, 2012].

ISO SemAF

ISO 8601 [ISO, 2007], used within the ISO Semantic Annotation Framework (SemAF), is an international standard for representing dates and times using a combined representation in the format YYYY-MM-DDThh:mm:ss, where the letter T separates the date from the time of day. Durations can also be represented using the format PnYnMnDTnHnMnS (or PnW): P is the period designator, n is the value of each date and time element, Y, M, W and D designate years, months, weeks and days, and H, M and S designate hours, minutes and seconds after the time designator T. M is therefore read as months before T and as minutes after it. Parts of the day, weekends, seasons, decades and centuries were also introduced as new calendar concepts.
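As a rough illustration of the duration format, the Python sketch below splits a duration string into its components. It is a minimal sketch that assumes integer values only (ISO 8601 also permits a decimal fraction on the smallest component) and ignores the alternative and recurring-interval forms; `parse_duration` is a hypothetical helper name. The combined date-time form is handled by the standard library's `datetime.fromisoformat`.

```python
import re
from datetime import datetime

# PnW, or PnYnMnDTnHnMnS with every component optional; only integer
# values are accepted (a simplifying assumption of this sketch).
_DURATION = re.compile(
    r"^P(?:(?P<weeks>\d+)W"
    r"|(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?)$"
)


def parse_duration(value):
    """Split an ISO 8601 duration string into its designated components."""
    match = _DURATION.match(value)
    # Reject an empty duration ("P", "PT") and a dangling time designator ("P1DT").
    if match is None or value in ("P", "PT") or value.endswith("T"):
        raise ValueError(f"unsupported ISO 8601 duration: {value!r}")
    # Keep only the components that actually occur in the string.
    return {name: int(num) for name, num in match.groupdict().items() if num is not None}


# The combined date-time form YYYY-MM-DDThh:mm:ss is handled by the standard library.
stamp = datetime.fromisoformat("2012-01-20T10:30:00")
```

Because the same letter M designates both months and minutes, the pattern only accepts M as minutes after T, so `P5M` yields five months while `PT5M` yields five minutes.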

The extended version of ISO 8601 also supports underspecified or unknown values. A placeholder character X stands for calendar field values that the context does not allow to be specified (e.g. 2012-01-XX for January 2012). Temporal expressions that refer vaguely to the past, the present or the future (e.g. "lately", "recently", "nowadays") are given the alphabetical tokens PAST_REF, PRESENT_REF and FUTURE_REF, respectively [Kolomiyets, 2012].
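The sketch below shows one way a consumer of such annotations might interpret these values. It assumes full calendar dates of the form YYYY-MM-DD in which any field may be replaced by X, plus the three reference tokens; other extended forms (weeks, seasons, parts of the day) are deliberately left out, and `describe_value` is a hypothetical helper name.

```python
import re

# Symbolic values for vague references to the past, present and future.
REFERENCE_TOKENS = {"PAST_REF", "PRESENT_REF", "FUTURE_REF"}

# A calendar date YYYY-MM-DD in which any character of a field may be the placeholder X.
_DATE = re.compile(r"^([0-9X]{4})-([0-9X]{2})-([0-9X]{2})$")


def describe_value(value):
    """Report which calendar fields of a normalised value are specified."""
    if value in REFERENCE_TOKENS:
        # "PAST_REF" -> "past", "PRESENT_REF" -> "present", "FUTURE_REF" -> "future".
        return {"reference": value.split("_")[0].lower()}
    match = _DATE.match(value)
    if match is None:
        raise ValueError(f"unsupported value: {value!r}")
    fields = zip(("year", "month", "day"), match.groups())
    # A field counts as known only if it contains no placeholder character.
    return {name: None if "X" in text else int(text) for name, text in fields}
```

For example, `describe_value("2012-01-XX")` reports the year and month as known and the day as unspecified.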

STAG

STAG (Sheffield Temporal Annotation Guidelines) [Verhagen, 2004] was proposed as a means to annotate events, time expressions and the relations between them, classifying events into four groups: occurrences, perception events, reporting events and aspectuals. Events and time expressions are related to one another through three attributes: relatedToEvent, relatedToTime and relType. The first two take as values the identifiers of other events or time expressions. The last one takes a restricted value that identifies the relation type (BEFORE, AFTER, INCLUDES, IS_INCLUDED or SIMULTANEOUS); of these, SIMULTANEOUS is intentionally fuzzy and covers all kinds of overlap.

Example: The plane crashed on Wednesday

The plane <event eid="9" class="OCCURRENCE" tense="PAST" relatedToTime="5" relType="IS_INCLUDED">crashed</event> on <timex tid="5">Wednesday</timex>
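Because STAG stores the relation on the event itself, event-to-time links can be read directly off the event's attributes. The Python sketch below does this with the standard library's `xml.etree.ElementTree`; it assumes the example is written with quoted attribute values and wrapped in a hypothetical `<s>` root element so that it parses as XML, and `stag_relations` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The STAG example, with quoted attribute values and a hypothetical <s> root
# element added so that the fragment is well-formed XML.
SNIPPET = (
    '<s>The plane <event eid="9" class="OCCURRENCE" tense="PAST" '
    'relatedToTime="5" relType="IS_INCLUDED">crashed</event> on '
    '<timex tid="5">Wednesday</timex></s>'
)


def stag_relations(markup):
    """Return (event text, relType, time text) triples for event-to-time links."""
    root = ET.fromstring(markup)
    times = {timex.get("tid"): timex.text for timex in root.iter("timex")}
    triples = []
    for event in root.iter("event"):
        # STAG stores the relation as attributes of the event itself.
        tid = event.get("relatedToTime")
        if tid is not None:
            triples.append((event.text, event.get("relType"), times[tid]))
    return triples
```

Applied to the snippet, this yields a single triple linking "crashed" to "Wednesday" with IS_INCLUDED.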

TIDES

TIDES (Translingual Information Detection, Extraction, and Summarization) is defined as a set of annotation guidelines for time expressions, together with a canonicalised representation of the times they refer to [Verhagen, 2004]. The TIDES standard specifies which kinds of expressions are markable and which are not, and how to capture the semantics of temporal expressions. TIDES introduced the <TIMEX2> tag and a set of tag attributes to identify temporal expressions in text and their related information [Ferro et al., 2005]. The TIDES standard also describes how to determine normalised values of temporal expressions using different kinds of temporal units [Kolomiyets, 2012]. In TIDES, the <TIMEX2> tag is intended to support a variety of applications, and temporal expressions are treated as stand-alone targets to be annotated and extracted [Verhagen, 2004].

Table 2.5 describes the attributes of the <TIMEX2> tag. Examples:

<TIMEX2 VAL="2001-12-31">Last day of 2001</TIMEX2>

<TIMEX2 VAL="2014-03-21">today</TIMEX2>

Table 2.5: <TIMEX2> tag attributes as specified in TIDES [Kolomiyets, 2012].

Attribute Function Example

VAL Normalised value VAL="2010-01-20"

MOD Temporal modifier MOD="APPROX"

ANCHOR_VAL Normalised form of the anchoring time expression ANCHOR_VAL="2010-01-21"

ANCHOR_DIR Relative direction between VAL and ANCHOR_VAL ANCHOR_DIR="BEFORE"

SET Used for expressions denoting sets of times SET="YES"

COMMENT Annotator's comment COMMENT="autogenerated"
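To illustrate how a VAL such as the one for "today" depends on an anchoring date, here is a minimal Python sketch that normalises a few relative day expressions. The small lexicon `DAY_OFFSETS` and the helper `timex2` are hypothetical, and the anchor is assumed to be the document creation date; actual TIDES normalisation covers far more expression types and attributes.

```python
from datetime import date, timedelta

# Hypothetical lexicon of relative day expressions and their offsets in days.
DAY_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def timex2(expression, anchor):
    """Tag a relative day expression with a VAL normalised against `anchor`."""
    offset = DAY_OFFSETS[expression.lower()]
    # date.isoformat() yields the YYYY-MM-DD form used in VAL.
    value = (anchor + timedelta(days=offset)).isoformat()
    return f'<TIMEX2 VAL="{value}">{expression}</TIMEX2>'
```

With an anchor of `date(2014, 3, 21)`, `timex2("today", ...)` reproduces the second example above; the same call with "yesterday" crosses month boundaries correctly because the arithmetic is done on `date` objects.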

TimeML

TimeML [Pustejovsky et al., 2003a] is an expressive language for annotating temporal information, designed to connect the temporal analysis of a text with a representation and formal meaning of time. It is a specification language for events and temporal expressions in natural language text, able to capture distinct phenomena in temporal markup, to anchor events to temporally denoting expressions, and to order event expressions relative to one another.

TimeML is a metadata standard for marking up events and their temporal anchoring: it can link an event to a time and recognise some temporal adverbials, such as temporal prepositions (e.g. "for", "during", "on", "at") and connectives (e.g. "before", "after", "while") [Mani, 2003]. As a general annotation scheme, TimeML provides an XML-compliant markup language for times and events that is intended to capture all salient temporal information in a text [Verhagen, 2004].

TimeML captures temporal semantics in text, focusing on systematically anchoring events to times and ordering them relative to one another. TimeML adopted the core of STAG and remained compliant with the TIDES time expression annotation, keeping the notions of temporal object and temporal relation central. Temporal objects are time expressions and events, marked up with the <TIMEX3> and <EVENT> tags respectively [Kolomiyets, 2012].

TimeML is based on four major tags:

• <TIMEX3> was introduced for annotating temporal expressions in text, extending the TIDES <TIMEX2> attributes;

• <EVENT> is used for annotating events and states in text, comprising tensed and untensed verbs, nominalisations, adjectives, predicative clauses and prepositional phrases;

• <SIGNAL> annotates textual elements that indicate the relation holding between two temporal elements, such as temporal prepositions and conjunctions, prepositions signalling modality ("to"), and special characters ("-" and "/") that can denote ranges;

• <LINK> encodes the different types of relations between temporal elements that establish their temporal ordering: BEFORE, AFTER, INCLUDES, IS_INCLUDED, DURING, DURING_INV, SIMULTANEOUS, IAFTER, IBEFORE, IDENTITY, BEGINS, ENDS, BEGUN_BY and ENDED_BY.
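Most of these relation types come in converse pairs, so a link can be rewritten from the point of view of its other argument. The sketch below encodes the fourteen values as a Python enum and pairs each with its converse; the enum, the `INVERSE` table and the `invert` helper are illustrative names, not part of TimeML itself.

```python
from enum import Enum


class RelType(Enum):
    """The fourteen TimeML TLINK relation types."""
    BEFORE = "BEFORE"
    AFTER = "AFTER"
    INCLUDES = "INCLUDES"
    IS_INCLUDED = "IS_INCLUDED"
    DURING = "DURING"
    DURING_INV = "DURING_INV"
    SIMULTANEOUS = "SIMULTANEOUS"
    IAFTER = "IAFTER"
    IBEFORE = "IBEFORE"
    IDENTITY = "IDENTITY"
    BEGINS = "BEGINS"
    ENDS = "ENDS"
    BEGUN_BY = "BEGUN_BY"
    ENDED_BY = "ENDED_BY"


# Converse pairs: "A r B" holds exactly when "B converse(r) A" holds.
_PAIRS = [("BEFORE", "AFTER"), ("INCLUDES", "IS_INCLUDED"), ("DURING", "DURING_INV"),
          ("IBEFORE", "IAFTER"), ("BEGINS", "BEGUN_BY"), ("ENDS", "ENDED_BY")]

# SIMULTANEOUS and IDENTITY are symmetric, so they are their own converse.
INVERSE = {RelType.SIMULTANEOUS: RelType.SIMULTANEOUS, RelType.IDENTITY: RelType.IDENTITY}
for left, right in _PAIRS:
    INVERSE[RelType[left]] = RelType[right]
    INVERSE[RelType[right]] = RelType[left]


def invert(source, rel, target):
    """Rewrite a TLINK (source, rel, target) from the target's point of view."""
    return target, INVERSE[rel], source
```

Normalising every link to one direction in this way is a common preprocessing step before comparing or merging annotations.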

TimeML distinguishes three kinds of links used to encode relations between events and time expressions: a) <TLINK> encodes temporal relations proper, b) <ALINK> encodes aspectual relations, and c) <SLINK> encodes subordination relations such as modality, negation and factuality. Instead of being annotated as attributes of the event itself, as in STAG, these relations are annotated in separate non-consuming tags that link events and time expressions to each other [Verhagen, 2004].

Example: Paul taught on Friday

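Paul <EVENT eid="e1" class="OCCURRENCE">taught</EVENT> on <TIMEX3 tid="t1">Friday</TIMEX3>

<MAKEINSTANCE eiid="ei1" eventID="e1" tense="PAST" aspect="NONE" polarity="POS" pos="VERB"/>

<TLINK lid="l1" eventInstanceID="ei1" relatedToTime="t1" relType="IS_INCLUDED"/>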
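In TimeML 1.2, a TLINK refers to an event instance (eiid) rather than to the event tag, and a <MAKEINSTANCE> element ties the instance back to the <EVENT> that consumed the text. The Python sketch below resolves such links into readable triples with `xml.etree.ElementTree`. It assumes that a MAKEINSTANCE element is present, that attribute values are quoted, and that the fragment is wrapped in a hypothetical `<s>` root element; `event_time_links` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# TimeML 1.2-style fragment; the <s> root element is added only so that it parses.
DOC = """<s>Paul <EVENT eid="e1" class="OCCURRENCE">taught</EVENT> on
<TIMEX3 tid="t1">Friday</TIMEX3>
<MAKEINSTANCE eiid="ei1" eventID="e1" tense="PAST" aspect="NONE" polarity="POS" pos="VERB"/>
<TLINK lid="l1" eventInstanceID="ei1" relatedToTime="t1" relType="IS_INCLUDED"/></s>"""


def event_time_links(markup):
    """Resolve event-to-time TLINKs into (event text, relType, time text) triples."""
    root = ET.fromstring(markup)
    events = {e.get("eid"): e.text for e in root.iter("EVENT")}
    times = {t.get("tid"): t.text for t in root.iter("TIMEX3")}
    # Map each event instance back to the text of the event it instantiates.
    instances = {m.get("eiid"): events[m.get("eventID")] for m in root.iter("MAKEINSTANCE")}
    return [
        (instances[link.get("eventInstanceID")], link.get("relType"),
         times[link.get("relatedToTime")])
        for link in root.iter("TLINK")
        # Event-to-event or time-to-time links are skipped in this sketch.
        if link.get("eventInstanceID") and link.get("relatedToTime")
    ]
```

Keeping the relation in a separate, non-consuming TLINK means the text-consuming tags stay untouched when links are added or revised.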

In TimeML, each TLINK has a relation type attribute, relType, whose value is one of fourteen relation types, described in Table 2.6. These relations are intended to be mutually exclusive, but the guidelines acknowledge that the simultaneous relation in particular can be somewhat fuzzy [Verhagen, 2004].

Table 2.6: TimeML <TLINK> relation types [Verhagen, 2004].

Relation Type Description

simultaneous Events that happen at the same time, or so close together that distinguishing their times makes no difference to the temporal interpretation

before, after Used for temporal precedence of events and times

ibefore, iafter One event is immediately before or after the other

includes, is included Relations between a temporal expression and an event

holds, held by Like the simultaneous relation, but relating an event to a particular time

begins, begun by A relation between an event and the start time of a period

ends, ended by A relation between an event and the end time of a period

identity Annotated as a TLINK even though it is not a temporal relation proper

An Unknown relation was added because it is often not possible to specify a temporal relation between two arbitrary events in a text: without it, the annotator would be forced to provide a relation anyway, and with it a distinction can be made between relations that the annotator has not yet considered and relations that were considered but have no value. Furthermore, TimeML has no Overlap relation, motivated by the observation that this relation does not occur naturally in real texts [Verhagen, 2004].
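To see why partially overlapping events fall outside this inventory, one can map two intervals with known numeric endpoints to a relation type. The Python sketch below assumes the commonly used correspondence between TimeML relations and Allen's interval relations (for example, IBEFORE as "meets" and INCLUDES as strict containment) and leaves out DURING, DURING_INV and IDENTITY; `classify` is a hypothetical helper. It returns None exactly in the two partial-overlap configurations, for which TimeML offers no relation type.

```python
def classify(a, b):
    """Map intervals a=(s1, e1), b=(s2, e2), each with start < end, to a relType.

    Returns None for a partial overlap, which has no TimeML relation type.
    """
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "BEFORE"
    if e2 < s1:
        return "AFTER"
    if e1 == s2:
        return "IBEFORE"  # a ends exactly where b starts
    if e2 == s1:
        return "IAFTER"
    if (s1, e1) == (s2, e2):
        return "SIMULTANEOUS"
    if s1 == s2:
        return "BEGINS" if e1 < e2 else "BEGUN_BY"
    if e1 == e2:
        return "ENDS" if s1 > s2 else "ENDED_BY"
    if s1 < s2 and e2 < e1:
        return "INCLUDES"
    if s2 < s1 and e1 < e2:
        return "IS_INCLUDED"
    return None  # the intervals partially overlap
```

For instance, `classify((1, 3), (2, 5))` returns None, whereas shifting the first interval to `(1, 2)` makes it IBEFORE.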