
3.2 Annotation schemes

3.2.2 The TIDES TIMEX2

The TIDES TIMEX2 is an annotation scheme for marking the extent of English time expressions and representing their values in the standard format defined by ISO-8601 (ISO8601:2004, 2004). It was developed to support research activities under the DARPA TIDES (Translingual Information Detection, Extraction and Summarisation) research program (TIDES, 2002) and the Automatic Content Extraction program (ACE, 1999).

The TIMEX2 annotation scheme extends the MUC-7 scheme by widening the range of markable expressions and by replacing the TYPE attribute of the MUC-7 TIMEX tag with a set of attributes that specify the semantic representation of a time expression in more detail. In addition, the format that TIMEX2 uses to represent time values complies with the ISO-8601 standard.

TIMEX2 was originally developed in 2000 under the TIDES program and was first documented in Ferro et al. (2000). It has since undergone several revisions, resulting in the newer versions of the guidelines described in Ferro et al. (2001, 2003); the latest version is presented in Ferro et al. (2005).

The latest annotation guidelines describe a wide set of markable time expressions, covering most of the temporal expressions presented in Section 2.3. According to the guidelines, the full extent of a TE should be a noun, adjective or adverb, or the corresponding phrase (noun, adjectival or adverbial phrase). A temporal expression cannot be a prepositional phrase or a clause, so it cannot start with a preposition or a subordinating conjunction: "after Friday" and "before they meet on Monday" are not valid temporal expressions, and only "Friday" and "Monday" are correct markables. Premodifiers of temporal expressions, such as determiners, and postmodifiers, such as prepositional phrases or subordinate clauses, should be included in the time expression. Appositives that follow a TE are not included in the expression's tag, but if they contain temporal trigger words, they are tagged separately.
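The rule that a TE cannot start with a preposition or a subordinating conjunction can be approximated mechanically by checking the first word of a candidate extent. The Python sketch below illustrates this; the word lists and the function name `violates_extent_rule` are our own hypothetical choices rather than part of the TIMEX2 guidelines, and the check deliberately ignores the other extent rules (premodifiers, postmodifiers, appositives).

```python
# Illustrative word lists only: the TIMEX2 guidelines state the rule,
# but these particular sets are our own (hypothetical) approximation.
PREPOSITIONS = {"after", "before", "since", "until", "during", "on", "in", "at", "by", "from"}
SUBORDINATING_CONJUNCTIONS = {"when", "while", "because", "although", "if", "as"}
DISALLOWED_FIRST_WORDS = PREPOSITIONS | SUBORDINATING_CONJUNCTIONS


def violates_extent_rule(extent: str) -> bool:
    """Return True if a candidate TE extent starts with a preposition or a
    subordinating conjunction, i.e. it looks like a prepositional phrase or
    a clause rather than a markable noun, adjective or adverb (phrase)."""
    words = extent.lower().split()
    return bool(words) and words[0] in DISALLOWED_FIRST_WORDS


for candidate in ["after Friday", "Friday", "before they meet on Monday", "Monday"]:
    verdict = "disallowed" if violates_extent_rule(candidate) else "ok"
    print(f"{candidate!r}: {verdict}")
```

A first-word check cannot recover the correct markable from a clause (stripping "before" from "before they meet on Monday" would not yield "Monday"), so it is only useful for flagging candidates.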

In the case of a conjunction (e.g. "today and tomorrow morning") or disjunction (e.g. "six months or a year from now") of time expressions, the individual time expressions should be tagged separately, even if they share modifiers.

Type of expression | Format of VAL | Example | Comment
--- | --- | --- | ---
Points in time (anchored expressions) | VAL="YYYY-MM-DDThh:mm:ss" | <TIMEX2 VAL="2004-02-23T15:00">3 p.m. Monday</TIMEX2> | T = ISO time-of-day designator
 | VAL="YYYY-WOY-D" | <TIMEX2 VAL="2004-W10">next week</TIMEX2> | Week-based format
 | VAL="*token*" | <TIMEX2 VAL="PRESENT_REF">now</TIMEX2> | Token replaces the entire value of VAL
 | VAL="YYYY-*token*" | <TIMEX2 VAL="2003-FA">Fall 2003</TIMEX2> | Token replaces a particular position in VAL
 | VAL="YYYY-MM-DDT*token*" | <TIMEX2 VAL="2004-02-24TMO">Tuesday morning</TIMEX2> | Token replaces a particular position in VAL
 | VAL="WOY-*token*" | <TIMEX2 VAL="W09-WE">this weekend</TIMEX2> | Token replaces a particular position in VAL
Durations (expressions answering the question "how long?") | VAL="PnYnMnDTnHnMnS" | <TIMEX2 VAL="PT1H">one hour long</TIMEX2> |
 | VAL="PnW" | <TIMEX2 VAL="P3W">three weeks</TIMEX2> |

Table 3.1: Possible formats of the TIMEX2 attribute VAL

The tag element used to mark up time expressions is TIMEX2, and its attributes are VAL, MOD, ANCHOR_VAL, ANCHOR_DIR, SET and COMMENT. Each attribute is presented below together with its use.
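Because TIMEX2 annotations are inline XML-style tags, they can be read back with an ordinary XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function `extract_timex2` and the example sentence are hypothetical, and the sketch assumes the annotated text becomes well-formed XML once it is wrapped in a root element.

```python
import xml.etree.ElementTree as ET

# Attribute names as they appear in TIMEX2 markup.
TIMEX2_ATTRIBUTES = ("VAL", "MOD", "ANCHOR_VAL", "ANCHOR_DIR", "SET", "COMMENT")


def extract_timex2(annotated: str) -> list[dict[str, str]]:
    """Collect the text and attributes of every TIMEX2 element in a string.

    Assumes the string is well-formed XML once wrapped in a root element
    (e.g. a bare '&' in the text would have to be escaped as '&amp;').
    """
    root = ET.fromstring(f"<doc>{annotated}</doc>")
    spans = []
    for element in root.iter("TIMEX2"):  # also visits embedded TIMEX2 tags
        span = {"text": "".join(element.itertext())}
        span.update({name: element.get(name) for name in TIMEX2_ATTRIBUTES
                     if name in element.attrib})
        spans.append(span)
    return spans


sentence = (
    'We met <TIMEX2 VAL="2004-02-24TMO">Tuesday morning</TIMEX2> and will '
    'meet again <TIMEX2 VAL="2004-W10">next week</TIMEX2>.'
)
for span in extract_timex2(sentence):
    print(span)
```

Attributes that are absent from a tag are simply omitted from the resulting dictionary, mirroring the fact that most TIMEX2 tags carry only a subset of the six attributes.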

The VAL attribute is used for any expression that indicates a point or interval on a calendar/clock or that can be identified as an unanchored duration. The placeholder character “X” is used when parts of the value are unknown. The possible formats of VAL are captured in Table 3.1.

The value of VAL can contain tokens used to represent time points and durations: some tokens occupy the entire value of VAL, while others cover only particular positions within it. These tokens are listed in Table 3.2.
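To make the formats of Table 3.1 and the tokens of Table 3.2 concrete, the sketch below assigns a VAL string to a coarse category using regular expressions. The patterns and the function `classify_val` are our own simplification, not an official TIMEX2 validator: they accept the placeholder "X" in digit positions and cover only the token positions listed in Table 3.2.

```python
import re

D = r"[\dX]"  # one digit, or the placeholder "X" for an unknown digit
WHOLE_VALUE_TOKENS = {"PAST_REF", "PRESENT_REF", "FUTURE_REF"}

# Simplified patterns modelled on Tables 3.1 and 3.2 (our approximation).
# Each is tried in order with fullmatch, so no ^/$ anchors are needed.
VAL_PATTERNS = [
    ("duration", re.compile(
        rf"P(?:{D}+W|(?={D}|T{D})(?:{D}+Y)?(?:{D}+M)?(?:{D}+D)?"
        rf"(?:T(?={D})(?:{D}+H)?(?:{D}+M)?(?:{D}+S)?)?)")),
    ("point", re.compile(
        rf"{D}{{4}}(?:-{D}{{2}}(?:-{D}{{2}}(?:T{D}{{2}}(?::{D}{{2}}(?::{D}{{2}})?)?)?)?)?")),
    ("week", re.compile(rf"{D}{{4}}-W{D}{{2}}(?:-{D})?")),
    ("month-position token", re.compile(rf"{D}{{4}}-(?:SP|SU|FA|WI|Q[1-4]|H[12])")),
    ("hour-position token", re.compile(rf"{D}{{4}}-{D}{{2}}-{D}{{2}}T(?:MO|MI|AF|DT|EV|NI)")),
    ("day-position token", re.compile(rf"(?:{D}{{4}}-)?W{D}{{2}}-WE")),
]


def classify_val(val: str) -> str:
    """Return a coarse category for a TIMEX2 VAL string."""
    if val in WHOLE_VALUE_TOKENS:
        return "whole-value token"
    for label, pattern in VAL_PATTERNS:
        if pattern.fullmatch(val):
            return label
    return "unrecognised"


for val in ["2004-02-23T15:00", "2004-W10", "PRESENT_REF", "2003-FA",
            "2004-02-24TMO", "W09-WE", "PT1H", "P3W"]:
    print(val, "->", classify_val(val))
```

With these patterns a bare "P1H" is rejected, because ISO 8601 requires the time designator T before hour, minute and second components (PT1H).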

Tokens covering the whole value of VAL:

Token | Markable expressions | Non-markable expressions
--- | --- | ---
PAST_REF | past, yesterday, former, lately, long ago, medieval | before, previously, earlier, beforehand, once
PRESENT_REF | now, today, current, currently, present, presently, nowadays, (at) this (point in) time, (at) the present time, (at) the present moment | immediately, instantly, forthwith
FUTURE_REF | future, tomorrow, ahead | after, soon, sooner, shortly, later, eventually, subsequent

Tokens occupying only one position in VAL:

Token | Expressions | Position
--- | --- | ---
MO | morning | Hour
MI | midday | Hour
AF | afternoon | Hour
DT | daytime or working hours | Hour
EV | evening | Hour
NI | night | Hour
WE | weekend | Day
SP | spring | Month
SU | summer | Month
FA | fall, autumn, fall term/semester | Month
WI | winter | Month
Qn | n-th quarter (n = 1..4) | Month
H1 | first half (of year) | Month
H2 | second half (of year) | Month

Table 3.2: Tokens that may appear in the value of the TIMEX2 attribute VAL

The MOD attribute is used together with other attributes when the time expression includes a modifier that changes or clarifies the interpretation of VAL. MOD captures the semantics of quantifier modifiers (e.g. "approximately", "no more than") and lexicalized aspect markers (e.g. "early", "start of"), but not the semantics of prepositions or other terms outside the temporal expression. The tokens representing the possible values of MOD, together with the expressions that trigger them, are presented in Table 3.3.
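The correspondence in Table 3.3 between trigger expressions and MOD values can be sketched as a prefix lookup, in which the presence of "ago" decides whether a quantifier modifies a point in time or a duration. This is a deliberately naive Python illustration under our own assumptions (lower-cased string prefixes, an optional leading "the", hypothetical phrasings such as "start of" and "end of"); actual MOD assignment follows the guidelines and the full context of the expression.

```python
from typing import Optional

# Surface patterns transcribed (and simplified) from Table 3.3. The "..."
# slot of the table is left implicit: only the fixed words are matched.
POINT_PREFIXES = [  # expressions of the form "... ago"
    ("more than ", "BEFORE"),
    ("less than ", "AFTER"),
    ("no less than ", "ON_OR_BEFORE"),
    ("no more than ", "ON_OR_AFTER"),
]
DURATION_PREFIXES = [
    ("less than ", "LESS_THAN"),
    ("nearly ", "LESS_THAN"),
    ("more than ", "MORE_THAN"),
    ("no more than ", "EQUAL_OR_LESS"),
    ("at least ", "EQUAL_OR_MORE"),
]
# Aspect and approximation markers apply to both points and durations.
SHARED_PREFIXES = [
    ("early ", "START"), ("dawn of ", "START"), ("start of ", "START"),
    ("beginning of ", "START"), ("middle of ", "MID"), ("mid-", "MID"),
    ("end of ", "END"), ("late ", "END"),
    ("about ", "APPROX"), ("around ", "APPROX"), ("approximately ", "APPROX"),
]


def infer_mod(expression: str) -> Optional[str]:
    """Guess the MOD value of a TE from its leading words, or return None
    if no trigger from the simplified table is found."""
    text = expression.lower().strip()
    if text.startswith("the "):
        text = text[len("the "):]
    specific = POINT_PREFIXES if text.endswith(" ago") else DURATION_PREFIXES
    for prefix, mod in specific + SHARED_PREFIXES:
        if text.startswith(prefix):
            return mod
    return None


for te in ["more than two years ago", "no more than two years", "mid-2004", "next week"]:
    print(te, "->", infer_mod(te))
```

The same trigger can map to different values: "no more than" yields ON_OR_AFTER for a point ("no more than two years ago") but EQUAL_OR_LESS for a duration ("no more than two years"), which is why the sketch branches on "ago" before consulting the table.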

Type of expression | Value of MOD | Expressions
--- | --- | ---
Points in time | BEFORE | more than ... ago
 | AFTER | less than ... ago
 | ON_OR_BEFORE | no less than ... ago
 | ON_OR_AFTER | no more than ... ago
Durations | LESS_THAN | less than ... (long), nearly
 | MORE_THAN | more than ... (long)
 | EQUAL_OR_LESS | no more than
 | EQUAL_OR_MORE | at least
Points and durations | START | early, dawn, start, beginning
 | MID | middle, mid-
 | END | end, late
 | APPROX | about, around, approximately

Table 3.3: Tokens that may represent the value of the TIMEX2 attribute MOD

The attributes ANCHOR_VAL and ANCHOR_DIR are always used together to indicate the orientation and anchoring of certain durations with respect to other points or periods of time. The value of ANCHOR_VAL is the ISO normalisation of the anchoring date or time, while the value of ANCHOR_DIR shows the orientation of the duration with respect to the date or time denoted by ANCHOR_VAL. The possible values of ANCHOR_DIR are WITHIN, STARTING, ENDING, AS_OF, BEFORE and AFTER. For example, given the expression "the three months ending May 31", ANCHOR_VAL would be assigned the value 2010-05-31 and ANCHOR_DIR the value ENDING.
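ANCHOR_VAL and ANCHOR_DIR only record the anchoring; one way to see what they contribute is to compute the calendar span they pick out. The sketch below assumes VAL="P3M" for the example above (following the duration format of Table 3.1), handles only PnM durations with STARTING or ENDING, and clamps day numbers at month ends. These are our simplifications, not rules from the TIMEX2 guidelines, and the helper names `add_months` and `anchored_span` are hypothetical.

```python
import calendar
import re
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    year, month_index = divmod(d.year * 12 + (d.month - 1) + months, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(d.day, last_day))


def anchored_span(val: str, anchor_val: str, anchor_dir: str) -> tuple[date, date]:
    """Return the (first day, last day) covered by a PnM duration anchored by
    ANCHOR_VAL and ANCHOR_DIR. Only STARTING and ENDING are handled."""
    match = re.fullmatch(r"P(\d+)M", val)
    if match is None:
        raise ValueError(f"this sketch only handles PnM durations, got {val!r}")
    months = int(match.group(1))
    anchor = date.fromisoformat(anchor_val)
    if anchor_dir == "ENDING":
        return add_months(anchor, -months) + timedelta(days=1), anchor
    if anchor_dir == "STARTING":
        return anchor, add_months(anchor, months) - timedelta(days=1)
    raise ValueError(f"ANCHOR_DIR {anchor_dir!r} is not handled in this sketch")


# "the three months ending May 31": VAL="P3M", ANCHOR_VAL="2010-05-31", ANCHOR_DIR="ENDING"
start, end = anchored_span("P3M", "2010-05-31", "ENDING")
print(start, end)  # 2010-03-01 2010-05-31
```

Subtracting three months from May 31 lands on the nonexistent February 31, which is clamped to February 28; the span then starts on the following day, March 1.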

The SET attribute is used to represent expressions denoting sets of times, i.e. times that recur regularly or irregularly (e.g. "every Tuesday", "numerous weeks", "some Thursdays"); its only value is YES.
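As an illustration of a set-denoting value, the sketch below assumes that "every Tuesday" is annotated with SET="YES" and the week-based VAL="XXXX-WXX-2" (ISO weekday 2 being Tuesday), where each "X" marks an unspecified position. The helpers `iso_week_val` and `matches_set` are hypothetical; TIMEX2 itself only records the value and does not prescribe how the members of a set are enumerated.

```python
from datetime import date, timedelta


def iso_week_val(d: date) -> str:
    """Render a date in the ISO week-based format, e.g. '2004-W09-2'."""
    year, week, weekday = d.isocalendar()
    return f"{year:04d}-W{week:02d}-{weekday}"


def matches_set(pattern: str, d: date) -> bool:
    """True if the date's week-based value agrees with pattern, where each
    'X' in the pattern stands for an unspecified character."""
    value = iso_week_val(d)
    return len(value) == len(pattern) and all(
        p in ("X", c) for p, c in zip(pattern, value)
    )


# Assumed annotation of "every Tuesday": SET="YES", VAL="XXXX-WXX-2".
fortnight = [date(2004, 2, 23) + timedelta(days=i) for i in range(14)]
tuesdays = [d for d in fortnight if matches_set("XXXX-WXX-2", d)]
print(tuesdays)  # [datetime.date(2004, 2, 24), datetime.date(2004, 3, 2)]
```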

The COMMENT attribute was introduced so that annotators can insert remarks about why they made a specific decision for ambiguous expressions, or to signal certain cases of doubt.

The TIMEX2 annotation guidelines are the most refined annotation specifications developed so far for any temporal entity; as a result, the annotated corpus described in detail in Section 3.3.1 is highly reliable. The level of detail in the guidelines has also enabled the development of automatic systems that perform well on the TIMEX2 annotation task (Section 4.4 gives more details of the results obtained by such systems). However, the TIMEX2 annotation scheme is concerned only with time expressions. To build a temporal representation of a given text, one needs ways to represent not only temporal expressions, but also events and the temporal relations holding among temporal expressions and events. This need is addressed by STAG, an annotation scheme that enables the annotation of the three most important temporal phenomena in a given text: temporal expressions, events and temporal relations.