Lecture Notes in Artificial Intelligence 5190

Edited by R. Goebel, J. Siekmann, and W. Wahlster

Subseries of Lecture Notes in Computer Science


António Teixeira
Vera Lúcia Strube de Lima
Luís Caldas de Oliveira
Paulo Quaresma (Eds.)

Computational Processing of the Portuguese Language

8th International Conference, PROPOR 2008
Aveiro, Portugal, September 8-10, 2008

Proceedings


Series Editors

Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany

Volume Editors

António Teixeira
Universidade de Aveiro, Dep. de Electrónica, Telecomunicações e Informática, and Instituto de Engenharia Electrónica e Telemática de Aveiro (IEETA)
3810-193 Aveiro, Portugal

Vera Lúcia Strube de Lima
Pontifícia Universidade Católica do Rio Grande do Sul, Faculdade de Informática, Grupo PLN
90619-900 Porto Alegre, RS, Brazil

Luís Caldas de Oliveira
Universidade Técnica de Lisboa, and INESC-ID, L2F
1000 Lisboa, Portugal

Paulo Quaresma
Universidade de Évora, Departamento de Informática
7000-671 Évora, Portugal

Library of Congress Control Number: 2008933855
CR Subject Classification (1998): H.3.1, H.5.2, I.2.1, I.2.7
LNCS Sublibrary: SL 7 – Artificial Intelligence

ISSN 0302-9743
ISBN-10 3-540-85979-9 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-85979-6 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springer.com

© Springer-Verlag Berlin Heidelberg 2008
Printed in Germany

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 12513574 06/3180 5 4 3 2 1 0


Preface

The International Conference on Computational Processing of Portuguese, formerly the Workshop on Computational Processing of the Portuguese Language – PROPOR – is the main event in the area of Natural Language Processing that focuses on Portuguese and the theoretical and technological issues related to this specific language. The meeting has been a very rich forum for the interchange of ideas and partnerships for the research communities dedicated to the automated processing of the Portuguese language.

This year’s PROPOR, the first one to adopt the International Conference label, followed workshops held in Lisbon, Portugal (1993), Curitiba, Brazil (1996), Porto Alegre, Brazil (1998), Évora, Portugal (1999), Atibaia, Brazil (2000), Faro, Portugal (2003) and Itatiaia, Brazil (2006).

The constitution of a steering committee (PROPOR Committee), an international program committee, the adoption of high-standard refereeing procedures and the support of the prestigious ACL and ISCA international associations demonstrate the steady development of the field and of its scientific community.

A total of 63 papers were submitted to PROPOR 2008. Each submitted paper received a careful, triple-blind review by the program committee or by their commitment. All those who contributed are mentioned on the following pages. The reviewing process led to the selection of 21 regular papers for oral presentation and 16 short papers for poster sessions.

The workshop and this book were structured around the following main topics: Speech Analysis; Ontologies, Semantics and Anaphora Resolution; Speech Synthesis; Machine Learning Applied to Natural Language Processing; Speech Recognition; and Natural Language Processing Tools and Applications. Short papers and related posters were organized according to the two main areas of PROPOR: Natural Language Processing and Speech Technology.

This year’s PROPOR had two important novelties: one was the fact that the two main areas of the conference were more equally represented and the other was the inclusion of a special session dedicated to Applications of Portuguese Speech and Language Technologies. The special session, promoted by the Microsoft Language Development Center (MLDC), provided an opportunity for university and industrial communities working on Portuguese natural language processing and speech technology to report their most recent products, systems, resources or tools for Portuguese. Two satellite events were also organized in association with PROPOR: the Second HAREM Workshop, Named Entity Recognition in Portuguese, and the workshop “Ten years of Linguateca”.

We would like to express here our thanks to all members of our technical program committee and additional reviewers, as listed on the following pages.

We are especially grateful to our invited speakers, Tanja Schultz (University of Karlsruhe and CMU) and Chris Quirk (Microsoft), for their invaluable contribution, which undoubtedly increased the interest in the conference and its quality.

We are indebted to the PROPOR 2008 secretary, Anabela Viegas, for all her support.

We would like to publicly acknowledge the institutions and companies without which this conference would not have been possible: Universidade de Aveiro, Institute of Electronics and Telematics Engineering of Aveiro (IEETA), Association for Computational Linguistics (ACL), International Speech Communication Association (ISCA), ISCA Special Interest Group on Iberian Language (SIG-IL), Fundação para a Ciência e a Tecnologia (FCT), Microsoft, Springer, !UZ Technologies, DESIGNEED and Grande Hotel da Curia.

June 2008

António Teixeira
Vera Lúcia Strube de Lima
Luís Caldas de Oliveira
Paulo Quaresma


Organization

Conference Chair

António Teixeira, DETI/IEETA, Universidade de Aveiro, Portugal

Program Co-chairs

Vera Lúcia Strube de Lima, Pontifícia Universidade Católica do Rio Grande do Sul, Brazil
Luís Caldas de Oliveira, L2F/INESC-ID, IST, Portugal

Publication Chair

Paulo Quaresma, Universidade de Évora, Portugal

Program Committee

Alexandre Agustini, Pontifícia Universidade Católica do Rio Grande do Sul, Brazil
Sandra Aluisio, Universidade de São Paulo, Brazil
Amália Andrade, CLUL, Universidade de Lisboa, Portugal
Jorge Baptista, Universidade do Algarve, Portugal
Plínio Barbosa, Universidade Estadual de Campinas, Brazil
Dante Barone, Universidade Federal do Rio Grande do Sul, Brazil
Steven Bird, University of Melbourne, Australia
Antonio Bonafonte, Universitat Politècnica de Catalunya, Spain
António Branco, Universidade de Lisboa, Portugal
Luís Caldas de Oliveira, INESC-ID/IST, Portugal
Nick Campbell, NiCT/ATR, Japan
Diamantino Caseiro, INESC-ID, Portugal
Berthold Crysmann, Bonn University, Germany
Gaël Dias, Universidade da Beira Interior, Portugal
Bento Dias da Silva, Universidade Estadual Paulista, Brazil
Marcelo Finger, IME-USP, Brazil
Diamantino Freitas, Faculdade de Engenharia, Universidade do Porto, Portugal
Pablo Gamallo, Universidade de Santiago de Compostela, Spain


Caroline Hagège, Xerox Research Centre Europe, France
Julia Hirschberg, Columbia University, USA
Isabel Hub Faria, Universidade de Lisboa, Portugal
Tracy Holloway King, Palo Alto Research Center, USA
Eric Laporte, Université Paris-Est Marne-la-Vallée, France
Gabriel Lopes, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Portugal
Saturnino Luz, Trinity College Dublin, Ireland
Lúcia Machado Rino, Dep. de Computação, Universidade Federal de São Carlos, Brazil
Sandra Madureira, Pontifícia Universidade Católica de São Paulo, Brazil
Belinda Maia, Faculdade de Letras, Universidade do Porto, Portugal
Ranniery Maia, ATR Spoken Language Communication Labs, Japan
Nuno Mamede, INESC-ID/IST, Portugal
Jean-Luc Minel, MoDyCo, CNRS, France
Climent Nadeu, Universitat Politècnica de Catalunya, Spain
João Neto, INESC-ID/IST, Portugal
Viviane Moreira Orengo, Universidade Federal do Rio Grande do Sul, Brazil
Manuel Palomar, Universidad de Alicante, Spain
Fernando Perdigão, Universidade de Coimbra, Portugal
Carlos Prolo, Pontifícia Universidade Católica do Rio Grande do Sul, Brazil
Paulo Quaresma, Universidade de Évora, Portugal
Violeta Quental, Pontifícia Universidade Católica do Rio de Janeiro, Brazil
Elisabete Ranchhod, Universidade de Lisboa, Portugal
Fernando Gil Resende Jr., Universidade Federal do Rio de Janeiro, Brazil
António Ribeiro, IPSC, Italy
Irene Rodrigues, Departamento de Informática, Universidade de Évora, Portugal
Solange Rossato, University of Grenoble 3, France
Diana Santos, SINTEF, Norway
Luís Seabra Lopes, DETI/IEETA, Universidade de Aveiro, Portugal
António Serralheiro, INESC-ID and Academia Militar, Portugal
Vera Strube de Lima, Pontifícia Universidade Católica do Rio Grande do Sul, Brazil
António Teixeira, DETI/IEETA, Universidade de Aveiro, Portugal

Ana Maria Tramunt Ibaños, Pontifícia Universidade Católica do Rio Grande do Sul, Brazil
Isabel Trancoso, INESC-ID/IST, Portugal
João Veloso, Universidade do Porto, Portugal
Renata Vieira, UNISINOS, Brazil
Aline Villavicencio, Universidade Federal do Rio Grande do Sul, Brazil
Fábio Violaro, Universidade Estadual de Campinas, Brazil
Maria das Graças Volpe Nunes, Universidade de São Paulo, Brazil
Dina Wonsever, Universidad de la Republica, Uruguay
Nestor Yoma, Universidad de Chile, Chile

Additional Reviewers

Petra Wagner, Bonn University, Germany
Luísa Coheur, INESC-ID, Portugal
José Adrián Rodríguez Fonollosa, Universitat Politècnica de Catalunya, Spain
Thiago Pardo, Universidade de São Paulo, Brazil


Table of Contents

Speech Analysis

Event Detection by HMM, SVM and ANN: A Comparative Study
Carla Lopes and Fernando Perdigão

Frication and Voicing Classification
Luis M.T. Jesus and Philip J.B. Jackson

A Spoken Dialog System Speech Interface Based on a Microphone Array
Gustavo Esteves Coelho, António Joaquim Serralheiro, and João Paulo Neto

Ontologies, Semantics and Anaphora Resolution

PAPEL: A Dictionary-Based Lexical Ontology for Portuguese
Hugo Gonçalo Oliveira, Diana Santos, Paulo Gomes, and Nuno Seco

Comparing Window and Syntax Based Strategies for Semantic Extraction
Pablo Gamallo Otero

The Mitkov Algorithm for Anaphora Resolution in Portuguese
Amanda Rocha Chaves and Lucia Helena Machado Rino

Semantic Similarity, Ontologies and the Portuguese Language: A Close Look at the subject
Juliano Baldez de Freitas, Vera Lúcia Strube de Lima, and Josiane Fontoura dos Anjos Brandolt

Speech Synthesis

Boundary Refining Aiming at Speech Synthesis Applications
Monique V. Nicodem, Sandra G. Kafka, Rui Seara Jr., and Rui Seara

Evolutionary-Based Design of a Brazilian Portuguese Recording Script for a Concatenative Synthesis System
Monique Vitório Nicodem, Izabel Christine Seara, Daiana dos Anjos, Rui Seara Jr., and Rui Seara

DIXI – A Generic Text-to-Speech System for European Portuguese
Sérgio Paulo, Luís C. Oliveira, Carlos Mendes, Luís Figueira, Renato Cassaca, Céu Viana, and Helena Moniz

European Portuguese Articulatory Based Text-to-Speech: First Results
António Teixeira, Catarina Oliveira, and Plínio Barbosa

Machine Learning Applied to Natural Language Processing

Statistical Machine Translation of Broadcast News from Spanish to Portuguese
Raquel Sánchez Martínez, João Paulo da Silva Neto, and Diamantino António Caseiro

Combining Multiple Features for Automatic Text Summarization through Machine Learning
Daniel Saraiva Leite and Lucia Helena Machado Rino

Some Experiments on Clustering Similar Sentences of Texts in Portuguese
Eloize Rossi Marques Seno and Maria das Graças Volpe Nunes

Portuguese Part-of-Speech Tagging Using Entropy Guided Transformation Learning
Cícero Nogueira dos Santos, Ruy L. Milidiú, and Raúl P. Rentería

Learning Coreference Resolution for Portuguese Texts
José Guilherme C. de Souza, Patricia Nunes Gonçalves, and Renata Vieira

Speech Recognition and Applications

Domain Adaptation of a Broadcast News Transcription System for the Portuguese Parliament
Luís Neves, Ciro Martins, Hugo Meinedo, and João Neto

Automatic Classification and Transcription of Telephone Speech in Radio Broadcast Data
Alberto Abad, Hugo Meinedo, and João Neto

A Platform of Distributed Speech Recognition for the European Portuguese Language
João Miranda and João P. Neto

Natural Language Processing Tools and Applications

Supporting e-Learning with Language Technology for Portuguese
Mariana Avelãs, António Branco, Rosa Del Gaudio, and Pedro Martins

ParaMT: A Paraphraser for Machine Translation
Anabela Barreiro

POSTERS

Natural Language Processing

Second HAREM: New Challenges and Old Wisdom
Diana Santos, Cláudia Freitas, Hugo Gonçalo Oliveira, and Paula Carvalho

Floresta Sintá(c)tica: Bigger, Thicker and Easier
Cláudia Freitas, Paulo Rocha, and Eckhard Bick

The Identification and Description of Frozen Prepositional Phrases through a Corpus-Oriented Study
Milena Garrão, Violeta Quental, Nuno Caminada, and Eckhard Bick

CorrefSum: Referencial Cohesion Recovery in Extractive Summaries
Patrícia Nunes Gonçalves, Renata Vieira, and Lucia Helena Machado Rino

Answering Portuguese Questions
Luís Fernando Costa and Luís Miguel Cabral

XisQuê: An Online QA Service for Portuguese
António Branco, Lino Rodrigues, João Silva, and Sara Silveira

Using Semantic Prototypes for Discourse Status Classification
Sandra Collovini, Luiz Carlos Ribeiro Jr., Patricia Nunes Gonçalves, Vinicius Muller, and Renata Vieira

Using System Expectations to Manage User Interactions
Filipe M. Martins, Ana Mendes, Joana Paulo Pardal, Nuno J. Mamede, and João P. Neto

Speech and Language Processing

Adaptive Modeling and High Quality Spectral Estimation for Speech Enhancement
Luís Coelho and Daniela Braga

On the Voiceless Aspirated Stops in Brazilian Portuguese
Mariane Antero Alves, Izabel Christine Seara, Fernando Santana Pacheco, Simone Klein, and Rui Seara

Comparison of Phonetic Segmentation Tools for European Portuguese
Luís Figueira and Luís C. Oliveira

Spoltech and OGI-22 Baseline Systems for Speech Recognition in Brazilian Portuguese
Nelson Neto, Patrick Silva, Aldebaro Klautau, and Andre Adami

Development of a Speech Recognizer with the Tecnovoz Database
José Lopes, Cláudio Neves, Arlindo Veiga, Alexandre Maciel, Carla Lopes, Fernando Perdigão, and Luís Sá

Dynamic Language Modeling for the European Portuguese
Ciro Martins, António Teixeira, and João Neto

An Approach to Natural Language Equation Reading in Digital Talking Books
Carlos Juzarte Rolo and António Joaquim Serralheiro

Topic Segmentation in a Media Watch System
Rui Amaral and Isabel Trancoso

Author Index
