4.3.2 Checking syntactic correctness
The second stage of the system is concerned with generating syntactically valid TEs. Considering that the full extent of a TE should be a well-formed syntactic constituent, there is a need for a module that checks the syntactic well-formedness of the entities identified at the previous stage. This module, apart from checking the syntactic correctness of the identified TEs, should also modify their extent so that they adhere to the TIMEX2 specifications defining the correct extent of a TE.
As already mentioned in Section 2.3.1, the full extent of a TE should be a noun, adjective or adverb, or one of the corresponding phrases (noun, adjectival or adverbial phrases). TEs cannot be prepositional phrases or clauses, so they cannot start with a preposition or a subordinating conjunction (e.g. after Friday and before they meet on Monday are disallowed as temporal expressions).
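The constraint on the first token of a TE can be illustrated with a minimal sketch. It is not the implementation used by the system: the function name has_valid_start, the (word, POS) token representation and the labels PREP and SCONJ are assumptions made only for this example, and a real parser's tag set will differ.

```python
from typing import List, Tuple

# Hypothetical coarse POS labels for the categories a TE may not start with.
DISALLOWED_INITIAL_POS = {"PREP", "SCONJ"}

Token = Tuple[str, str]  # (word form, POS label)


def has_valid_start(candidate: List[Token]) -> bool:
    """Return True if the candidate TE does not open with a preposition
    or a subordinating conjunction. An empty candidate is rejected."""
    if not candidate:
        return False
    _, first_pos = candidate[0]
    return first_pos not in DISALLOWED_INITIAL_POS


# "after Friday" is a prepositional phrase, so it is rejected as a TE,
# whereas the noun "Friday" on its own is accepted.
print(has_valid_start([("after", "PREP"), ("Friday", "N")]))  # False
print(has_valid_start([("Friday", "N")]))                     # True
```

The check only inspects the first token; deciding what the corrected extent should be requires the syntactic information discussed later in this section.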
Premodifiers of temporal expressions such as determiners (e.g. a great day), and postmodifiers such as prepositional phrases or subordinate clauses should be included in the time expression (e.g. the year of the elections, the year when he started University). The appositives that may appear after a TE are not to be included in the expression's tag, but, if they contain trigger words, they are to be tagged separately. In the case of temporal range expressions (from 1990 to 1999), and conjunctions (today and tomorrow morning) or disjunctions (six months or a year from now) of time expressions, the points should be tagged separately, even within the same TE. In such contexts, where several temporal indicators are present, the number and full extent of the corresponding TEs are determined using the following rules defined in the TIMEX2 annotation guidelines (Ferro et al., 2005):
• one TE is created if there are no intervening words between the temporal terms that qualify a unit of time (e.g. <twelve o’clock midnight>), if the terms are connected with the preposition of (e.g. <the evening of December, 31>) or if the prepositions to, till, after, in are used for expressing a certain point of time in a day. In these cases, but also in the case of the “MONTH DAY, YEAR” format, the expression containing all the terms should be tagged as a single unit.
• multiple TEs with embedding appear in two cases. One is when the larger TE denotes an offset to another TE included in it. In this case two tags are created, with the one corresponding to the anchoring phrase contained within the extent of the tag of the complete phrase (e.g. <two weeks from <next Tuesday>>). The second case is characterized by the larger TE being a possessive construction. If both the possessive phrase and the phrase that it modifies are time-denoting expressions, then two tags are created, and the possessive phrase tag is contained within the extent of the complete phrase tag (e.g. <<This year>’s spring>).
• multiple TEs without embedding are created in cases other than those described above, meaning that temporal phrases appearing in close proximity (like appositive phrases, range expressions, and conjoined expressions) are tagged as independent phrases. Although the phrases are tagged independently in terms of extent, they are dependent in terms of value: the expression with finer granularity inherits the value of the coarser-grained expression. This inheritance happens regardless of the relative ordering of the two expressions (e.g. <8.00 pm> on <Friday>).
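The value dependency between independently tagged expressions can be sketched as follows. This is only an illustration under stated assumptions: the TE class, the GRANULARITY ranking, the function inherit_value and the calendar date assigned to Friday are all hypothetical, and values are simplified to ISO 8601-style strings that are concatenated, which is much cruder than a full TIMEX2 normalisation.

```python
from dataclasses import dataclass

# Hypothetical granularity ranks: a larger number means a finer unit.
GRANULARITY = {"year": 0, "month": 1, "day": 2, "time": 3}


@dataclass
class TE:
    text: str
    granularity: str  # a key of GRANULARITY
    value: str        # partial ISO 8601-style value


def inherit_value(first: TE, second: TE) -> None:
    """Prefix the value of the finer-grained TE with the value of the
    coarser-grained one, in place. The order of arguments is irrelevant."""
    if GRANULARITY[first.granularity] == GRANULARITY[second.granularity]:
        return  # same granularity: nothing to inherit
    coarse, fine = sorted((first, second),
                          key=lambda te: GRANULARITY[te.granularity])
    fine.value = coarse.value + fine.value


# <8.00 pm> on <Friday>, with Friday assumed to resolve to 2005-03-04.
time_te = TE("8.00 pm", "time", "T20:00")
day_te = TE("Friday", "day", "2005-03-04")
inherit_value(time_te, day_te)
print(time_te.value)  # 2005-03-04T20:00
print(day_te.value)   # 2005-03-04
```

Sorting by granularity rather than by textual position reflects the statement that inheritance does not depend on which of the two expressions comes first.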
According to these TIMEX2 specifications, the functionality of the module that checks the syntactic correctness of the TEs identified at the previous stage is as follows. Firstly, the input text is parsed using Connexor’s FDG parser (Tapanainen and Järvinen, 1997). This parser returns information on a word’s part of speech, morphological lemma and its functional dependencies on surrounding words. The system uses this syntactic information under the assumption that it is 100% correct, although the evaluation and error analysis presented in Section 4.4 show that the parsing step introduces errors as well. Secondly, errors introduced by the rule-based TE identification module are corrected by using syntactic information. Such errors include:
• TEs starting with a determiner that is syntactically dependent on a noun that follows the TE. In these cases the determiner should be removed from the TE (e.g. the rule-based TE identification module provides as output for the noun phrase the night shift the TE the night, but syntactic information indicates that the is actually linked to the noun shift rather than night, and as a consequence the determiner is eliminated from the TE).
• verbs wrongly annotated as TEs due to being homographs of certain lexical triggers (e.g. the verb present could be mistaken, because of its identical spelling, for the noun or adjective present referring to the present time). These cases are removed from the set of TEs previously identified.
• TEs that can be extended to their left with pre-modifiers that are syntactically dependent on the head of the expression (e.g. only part of the noun phrase is initially annotated in the sentence It was a long night, but after considering syntactic information the TE is extended to the entire NP headed by the trigger word, yielding the expression a long night).
• TEs that can be extended to their right with post-modifiers such as prepositional phrases or relative clauses syntactically dependent on the head of the expression (e.g. the TE an evening is initially annotated in the sentence It was an evening he will never forget, but syntactic information leads to the inclusion of the relative clause in the extent of the final TE an evening he will never forget).
• embedded TEs, for which the rule-based TE identifier either returns two separate TEs where a single TE embedding another TE is required (e.g. <two weeks from <next Tuesday>>), or detects only the larger TE without annotating the embedded one (e.g. <<this year>’s spring>).
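Three of the corrections listed above (removal of a determiner governed from outside the TE, removal of verb homographs, and extension to the left) can be sketched over a simplified dependency representation. The sketch is not the system's implementation and does not reproduce the output format of Connexor's FDG parser: the Token class, the POS labels, the head indices and the function correct_span are assumptions introduced for illustration. Extension to the right and the treatment of embedded TEs are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    word: str
    pos: str   # hypothetical coarse tags: DET, ADJ, N, V, PRON
    head: int  # index of the governing token, -1 for the root


Span = Tuple[int, int]  # inclusive token indices of a candidate TE


def correct_span(tokens: List[Token], span: Span,
                 trigger: int) -> Optional[Span]:
    """Return the corrected span, or None if the candidate is discarded.
    `trigger` is the index of the lexical trigger inside the span."""
    start, end = span
    # A trigger parsed as a verb is a homograph, not a TE.
    if tokens[trigger].pos == "V":
        return None
    # Drop a leading determiner whose governor lies outside the span.
    if tokens[start].pos == "DET" and not (start <= tokens[start].head <= end):
        start += 1
    # Extend leftwards over pre-modifiers that depend on the trigger.
    while (start > 0 and tokens[start - 1].head == trigger
           and tokens[start - 1].pos in {"DET", "ADJ"}):
        start -= 1
    return (start, end)


def text_of(tokens: List[Token], span: Span) -> str:
    """Join the word forms covered by an inclusive span."""
    return " ".join(t.word for t in tokens[span[0]:span[1] + 1])


# "the night shift": the determiner is governed by "shift", so it is dropped.
shift = [Token("the", "DET", 2), Token("night", "N", 2),
         Token("shift", "N", -1)]
print(text_of(shift, correct_span(shift, (0, 1), trigger=1)))  # night

# "It was a long night": the span grows to the NP headed by "night".
long_night = [Token("It", "PRON", 1), Token("was", "V", -1),
              Token("a", "DET", 4), Token("long", "ADJ", 4),
              Token("night", "N", 1)]
print(text_of(long_night, correct_span(long_night, (4, 4), trigger=4)))  # a long night
```

Because the determiner check runs before the leftward extension, a determiner removed for depending on a later noun is not re-attached: the extension only admits tokens governed by the trigger itself.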
This section has focused on using syntactic information in order to check and correct the extent of the TEs identified at the pattern-matching stage. However, some problems of a semantic nature cannot be solved using either patterns or syntactic information. This is the case of the adverb then, which can take several semantic values, of which only the anaphoric one should be labelled as a time expression. A novel methodology developed as part of this research to disambiguate each usage of then, and to annotate only the anaphoric cases, is presented in the following section.