Outline of the Thesis - Processing temporal information in unstructured documents

extends the meaning representations output by the grammar with the information coming from the temporal extractor. This combination illustrates an application of temporal processing and leads to an enhanced representation of time in the meaning representations output by a computational grammar for the deep processing of Portuguese. This part is described in Chapter 5.

1.6 Outline of the Thesis

This thesis is organized in the following manner.

Chapter 2 presents the related work. Here some fundamental concepts about the way time is mentioned in natural language are introduced. The chapter starts by presenting the work on which temporal processing is based, drawing from the fields of Linguistics, Logic and Artificial Intelligence. These disciplines have been concerned with the way that temporality is conveyed in natural language and with reasoning about time. It then addresses more recent work, specifically in the area of temporal information processing, which has flourished with the recent development of annotation standards, annotated data sets, evaluation competitions and a large body of research based on these resources.

Chapter 3 describes TimeBankPT. This data set is used for the development and testing of the technology presented in the following chapters. To develop TimeBankPT, an existing resource of English data with temporal annotations was translated to Portuguese, adapting the existing annotations. We explain how this adaptation was carried out, and we also explain the format and meaning of the temporal annotations that are used in the original data and in TimeBankPT. A quantitative comparison between the original English corpus and TimeBankPT is also presented. Finally, we check whether the size of TimeBankPT is adequate, and describe an automated error mining procedure that was applied to the corpus in order to guarantee consistent annotations.

Chapter 4 focuses on the most difficult and interesting problem of temporal processing: classifying temporal relations between various kinds of elements (events and times). The approach taken in this chapter is to use machine learning techniques to tackle this issue. The chapter presents a series of different classifier features that are tested with the purpose of improving performance on this task. We explore many different types of information, from morphology and syntax to semantics and even pragmatics, presenting motivating examples for trying them out, and discussing how they are implemented and then tested.

Chapter 5 is about applications. The first part of this chapter presents an effort to replicate for Portuguese the remaining tasks of temporal processing. Together with the temporal relation classifiers developed in the previous chapter, the result of this is full temporal annotation for Portuguese, materialized in a temporal extraction system. A second contribution of this chapter is the expansion of an existing deep computational grammar for Portuguese with a temporal module. Because the grammar with this module cannot extract as much temporal information from input text as the temporal extraction system can, the two are combined, extending the output of the grammar with information coming from the temporal extractor.

Finally, Chapter 6 summarizes the main achievements of this study and discusses

Chapter 2

Related Work

This chapter presents the work on which temporal processing is based, as well as some recent work on the computational processing of time phenomena in natural language. A large contribution comes from the fields of Linguistics and Logic, which have focused on the issues of time in natural language and temporal reasoning for decades now. The field of Artificial Intelligence also produced work that is relevant to our problem.

Temporal information processing has flourished quite recently. The present century has seen the development of annotation standards, annotated data sets, evaluation competitions and a large body of research based on these resources.

2.1 Outline

This chapter is organized in the following way. In Section 2.2 we present some of the foundational work on the topics of tense, aspect and temporal reasoning. It draws from related areas, like Linguistics, Logic and Artificial Intelligence. We then turn our attention to the computational processing of time phenomena in natural language, mentioning some of the early approaches in Section 2.3.

Recent years have seen the appearance of several competitions and the development and maturing of annotation schemes and data sets relevant to this task. In Section 2.4 we talk about TERN 2004, which was a competition on the recognition and normalization of time expressions, focusing on English and Chinese. In Section 2.5 we discuss TimeML, the current de facto annotation standard for temporal phenomena, as well as available corpora annotated with it. In addition to time expressions, TimeML covers the annotation of events mentioned in text, as well as the temporal relations holding between these events and the times and dates mentioned in the same text. The following sections, Section 2.6 and Section 2.7, are about the two TempEval competitions, which made use of similarly annotated data. They attracted participants working on English and Spanish. There have also been efforts on the temporal processing of other languages. Section 2.8 lists some of the more recent corpora annotated with time phenomena. Many of them feature new languages.

These data sets have fueled much of the recent research on temporal processing. They have been used not only by the participants of these competitions (TERN 2004, TempEval, etc.), but also by much of the work published outside them. Section 2.9 presents some of the more recent approaches to the problem of temporal information processing.
