Lecture Notes in Artificial Intelligence 5190
Edited by R. Goebel, J. Siekmann, and W. Wahlster
Subseries of Lecture Notes in Computer Science
António Teixeira, Vera Lúcia Strube de Lima, Luís Caldas de Oliveira, Paulo Quaresma (Eds.)
Computational Processing of the Portuguese Language
8th International Conference, PROPOR 2008
Aveiro, Portugal, September 8-10, 2008
Proceedings
Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany
Volume Editors
António Teixeira
Universidade de Aveiro, Dep. de Electrónica, Telecomunicações e Informática, and Instituto de Engenharia Electrónica e Telemática de Aveiro (IEETA)
3810-193 Aveiro, Portugal
E-mail: [email protected]
Vera Lúcia Strube de Lima
Pontifícia Universidade Católica do Rio Grande do Sul, Faculdade de Informática, Grupo PLN
90619-900 Porto Alegre, RS, Brazil
E-mail: [email protected]
Luís Caldas de Oliveira
Universidade Técnica de Lisboa, and INESC-ID, L2F
1000 Lisboa, Portugal
E-mail: [email protected]
Paulo Quaresma
Universidade de Évora, Departamento de Informática
7000-671 Évora, Portugal
E-mail: [email protected]
Library of Congress Control Number: 2008933855
CR Subject Classification (1998): H.3.1, H.5.2, I.2.1, I.2.7
LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN 0302-9743
ISBN-10 3-540-85979-9 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-85979-6 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media springer.com
© Springer-Verlag Berlin Heidelberg 2008 Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12513574 06/3180 5 4 3 2 1 0
Preface
The International Conference on Computational Processing of Portuguese, formerly the Workshop on Computational Processing of the Portuguese Language (PROPOR), is the main event in Natural Language Processing that focuses on Portuguese and on the theoretical and technological issues specific to this language. The meeting has been a rich forum for the exchange of ideas and the building of partnerships among the research communities dedicated to the automated processing of Portuguese.
This year’s PROPOR, the first one to adopt the International Conference label, followed workshops held in Lisbon, Portugal (1993), Curitiba, Brazil (1996), Porto Alegre, Brazil (1998), Évora, Portugal (1999), Atibaia, Brazil (2000), Faro, Portugal (2003) and Itatiaia, Brazil (2006).
The establishment of a steering committee (PROPOR Committee) and of an international program committee, the adoption of high-standard refereeing procedures, and the support of the prestigious international associations ACL and ISCA demonstrate the steady development of the field and of its scientific community.
A total of 63 papers were submitted to PROPOR 2008. Each submission received a careful, triple-blind review by members of the program committee or by additional reviewers. All those who contributed are listed on the following pages. The reviewing process led to the selection of 21 regular papers for oral presentation and 16 short papers for poster sessions.
The conference and this book were structured around the following main topics: Speech Analysis; Ontologies, Semantics and Anaphora Resolution; Speech Synthesis; Machine Learning Applied to Natural Language Processing; Speech Recognition and Applications; and Natural Language Processing Tools and Applications. Short papers and the related posters were organized according to the two main areas of PROPOR: Natural Language Processing and Speech Technology.
This year’s PROPOR introduced two important novelties: the two main areas of the conference were more evenly represented, and a special session was dedicated to Applications of Portuguese Speech and Language Technologies. The special session, promoted by the Microsoft Language Development Center (MLDC), gave university and industrial communities working on Portuguese natural language processing and speech technology an opportunity to report their most recent products, systems, resources and tools for Portuguese. Two satellite events were also organized in association with PROPOR: the Second HAREM Workshop, on Named Entity Recognition in Portuguese, and the workshop “Ten years of Linguateca”.
We thank all members of our technical program committee and the additional reviewers, as listed on the following pages.
We are especially grateful to our invited speakers, Tanja Schultz (University of Karlsruhe and CMU) and Chris Quirk (Microsoft), for their invaluable contribution, which undoubtedly increased both the interest in and the quality of the conference.
We are indebted to the PROPOR 2008 secretary, Anabela Viegas, for all her support.
We would like to publicly acknowledge the institutions and companies without which this conference would not have been possible: Universidade de Aveiro, Institute of Electronics and Telematics Engineering of Aveiro (IEETA), Association for Computational Linguistics (ACL), International Speech Communication Association (ISCA), ISCA Special Interest Group on Iberian Languages (SIG-IL), Fundação para a Ciência e a Tecnologia (FCT), Microsoft, Springer, !UZ Technologies, DESIGNEED and Grande Hotel da Curia.
June 2008
António Teixeira
Vera Lúcia Strube de Lima
Luís Caldas de Oliveira
Paulo Quaresma
Organization
Conference Chair
António Teixeira DETI/IEETA, Universidade de Aveiro, Portugal
Program Co-chairs
Vera Lúcia Strube de Lima Pontifícia Universidade Católica do Rio Grande do Sul, Brazil
Luís Caldas de Oliveira L2F/INESC-ID, IST, Portugal
Publication Chair
Paulo Quaresma Universidade de Évora, Portugal
Program Committee
Alexandre Agustini Pontifícia Universidade Católica do Rio Grande do Sul, Brazil
Sandra Aluisio Universidade de São Paulo, Brazil
Amália Andrade CLUL, Universidade de Lisboa, Portugal
Jorge Baptista Universidade do Algarve, Portugal
Plínio Barbosa Universidade Estadual de Campinas, Brazil
Dante Barone Universidade Federal do Rio Grande do Sul, Brazil
Steven Bird University of Melbourne, Australia
Antonio Bonafonte Universitat Politècnica de Catalunya, Spain
António Branco Universidade de Lisboa, Portugal
Luís Caldas de Oliveira INESC-ID/IST, Portugal
Nick Campbell NiCT/ATR, Japan
Diamantino Caseiro INESC-ID, Portugal
Berthold Crysmann Bonn University, Germany
Gaël Dias Universidade da Beira Interior, Portugal
Bento Dias da Silva Universidade Estadual Paulista, Brazil
Marcelo Finger IME-USP, Brazil
Diamantino Freitas Faculdade de Engenharia, Universidade do Porto, Portugal
Pablo Gamallo Universidade de Santiago de Compostela, Spain
Caroline Hagège Xerox Research Centre Europe, France
Julia Hirschberg Columbia University, USA
Isabel Hub Faria Universidade de Lisboa, Portugal
Tracy Holloway King Palo Alto Research Center, USA
Eric Laporte Université Paris-Est Marne-la-Vallée, France
Gabriel Lopes Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Portugal
Saturnino Luz Trinity College Dublin, Ireland
Lúcia Machado Rino Dep. de Computação, Universidade Federal de São Carlos, Brazil
Sandra Madureira Pontifícia Universidade Católica de São Paulo, Brazil
Belinda Maia Faculdade de Letras, Universidade do Porto, Portugal
Ranniery Maia ATR Spoken Language Communication Labs, Japan
Nuno Mamede INESC-ID/IST, Portugal
Jean-Luc Minel MoDyCo, CNRS, France
Climent Nadeu Universitat Politècnica de Catalunya, Spain
João Neto INESC-ID/IST, Portugal
Viviane Moreira Orengo Universidade Federal do Rio Grande do Sul, Brazil
Manuel Palomar Universidad de Alicante, Spain
Fernando Perdigão Universidade de Coimbra, Portugal
Carlos Prolo Pontifícia Universidade Católica do Rio Grande do Sul, Brazil
Paulo Quaresma Universidade de Évora, Portugal
Violeta Quental Pontifícia Universidade Católica do Rio de Janeiro, Brazil
Elisabete Ranchhod Universidade de Lisboa, Portugal
Fernando Gil Resende Jr. Universidade Federal do Rio de Janeiro, Brazil
António Ribeiro IPSC, Italy
Irene Rodrigues Departamento de Informática, Universidade de Évora, Portugal
Solange Rossato University of Grenoble 3, France
Diana Santos SINTEF, Norway
Luís Seabra Lopes DETI/IEETA, Universidade de Aveiro, Portugal
António Serralheiro INESC-ID and Academia Militar, Portugal
Vera Strube de Lima Pontifícia Universidade Católica do Rio Grande do Sul, Brazil
António Teixeira DETI/IEETA, Universidade de Aveiro, Portugal
Ana Maria Tramunt Ibaños Pontifícia Universidade Católica do Rio Grande do Sul, Brazil
Isabel Trancoso INESC-ID/IST, Portugal
João Veloso Universidade do Porto, Portugal
Renata Vieira UNISINOS, Brazil
Aline Villavicencio Universidade Federal do Rio Grande do Sul, Brazil
Fábio Violaro Universidade Estadual de Campinas, Brazil
Maria das Graças Volpe Nunes Universidade de São Paulo, Brazil
Dina Wonsever Universidad de la República, Uruguay
Nestor Yoma Universidad de Chile, Chile
Additional Reviewers
Petra Wagner Bonn University, Germany
Luísa Coheur INESC-ID, Portugal
José Adrián Rodríguez Fonollosa Universitat Politècnica de Catalunya, Spain
Thiago Pardo Universidade de São Paulo, Brazil
Table of Contents
Speech Analysis
Event Detection by HMM, SVM and ANN: A Comparative Study . . . . 1
Carla Lopes and Fernando Perdigão
Frication and Voicing Classification . . . . 11
Luis M.T. Jesus and Philip J.B. Jackson
A Spoken Dialog System Speech Interface Based on a Microphone Array . . . . 21
Gustavo Esteves Coelho, António Joaquim Serralheiro, and João Paulo Neto
Ontologies, Semantics and Anaphora Resolution
PAPEL: A Dictionary-Based Lexical Ontology for Portuguese . . . . 31
Hugo Gonçalo Oliveira, Diana Santos, Paulo Gomes, and Nuno Seco
Comparing Window and Syntax Based Strategies for Semantic Extraction . . . . 41
Pablo Gamallo Otero
The Mitkov Algorithm for Anaphora Resolution in Portuguese . . . . 51
Amanda Rocha Chaves and Lucia Helena Machado Rino
Semantic Similarity, Ontologies and the Portuguese Language: A Close Look at the Subject . . . . 61
Juliano Baldez de Freitas, Vera Lúcia Strube de Lima, and Josiane Fontoura dos Anjos Brandolt
Speech Synthesis
Boundary Refining Aiming at Speech Synthesis Applications . . . . 71
Monique V. Nicodem, Sandra G. Kafka, Rui Seara Jr., and Rui Seara
Evolutionary-Based Design of a Brazilian Portuguese Recording Script for a Concatenative Synthesis System . . . . 81
Monique Vitório Nicodem, Izabel Christine Seara, Daiana dos Anjos, Rui Seara Jr., and Rui Seara
DIXI – A Generic Text-to-Speech System for European Portuguese . . . . 91
Sérgio Paulo, Luís C. Oliveira, Carlos Mendes, Luís Figueira, Renato Cassaca, Céu Viana, and Helena Moniz
European Portuguese Articulatory Based Text-to-Speech: First Results . . . . 101
António Teixeira, Catarina Oliveira, and Plínio Barbosa
Machine Learning Applied to Natural Language Processing
Statistical Machine Translation of Broadcast News from Spanish to Portuguese . . . . 112
Raquel Sánchez Martínez, João Paulo da Silva Neto, and Diamantino António Caseiro
Combining Multiple Features for Automatic Text Summarization through Machine Learning . . . . 122
Daniel Saraiva Leite and Lucia Helena Machado Rino
Some Experiments on Clustering Similar Sentences of Texts in Portuguese . . . . 133
Eloize Rossi Marques Seno and Maria das Graças Volpe Nunes
Portuguese Part-of-Speech Tagging Using Entropy Guided Transformation Learning . . . . 143
Cícero Nogueira dos Santos, Ruy L. Milidiú, and Raúl P. Rentería
Learning Coreference Resolution for Portuguese Texts . . . . 153
José Guilherme C. de Souza, Patricia Nunes Gonçalves, and Renata Vieira
Speech Recognition and Applications
Domain Adaptation of a Broadcast News Transcription System for the Portuguese Parliament . . . . 163
Luís Neves, Ciro Martins, Hugo Meinedo, and João Neto
Automatic Classification and Transcription of Telephone Speech in Radio Broadcast Data . . . . 172
Alberto Abad, Hugo Meinedo, and João Neto
A Platform of Distributed Speech Recognition for the European Portuguese Language . . . . 182
João Miranda and João P. Neto
Natural Language Processing Tools and Applications
Supporting e-Learning with Language Technology for Portuguese . . . . 192
Mariana Avelãs, António Branco, Rosa Del Gaudio, and Pedro Martins
ParaMT: A Paraphraser for Machine Translation . . . . 202
Anabela Barreiro
POSTERS
Natural Language Processing
Second HAREM: New Challenges and Old Wisdom . . . . 212
Diana Santos, Cláudia Freitas, Hugo Gonçalo Oliveira, and Paula Carvalho
Floresta Sintá(c)tica: Bigger, Thicker and Easier . . . . 216
Cláudia Freitas, Paulo Rocha, and Eckhard Bick
The Identification and Description of Frozen Prepositional Phrases through a Corpus-Oriented Study . . . . 220
Milena Garrão, Violeta Quental, Nuno Caminada, and Eckhard Bick
CorrefSum: Referencial Cohesion Recovery in Extractive Summaries . . . . 224
Patrícia Nunes Gonçalves, Renata Vieira, and Lucia Helena Machado Rino
Answering Portuguese Questions . . . . 228
Luís Fernando Costa and Luís Miguel Cabral
XisQuê: An Online QA Service for Portuguese . . . . 232
António Branco, Lino Rodrigues, João Silva, and Sara Silveira
Using Semantic Prototypes for Discourse Status Classification . . . . 236
Sandra Collovini, Luiz Carlos Ribeiro Jr., Patricia Nunes Gonçalves, Vinicius Muller, and Renata Vieira
Using System Expectations to Manage User Interactions . . . . 240
Filipe M. Martins, Ana Mendes, Joana Paulo Pardal, Nuno J. Mamede, and João P. Neto
Speech and Language Processing
Adaptive Modeling and High Quality Spectral Estimation for Speech Enhancement . . . . 244
Luís Coelho and Daniela Braga
On the Voiceless Aspirated Stops in Brazilian Portuguese . . . . 248
Mariane Antero Alves, Izabel Christine Seara, Fernando Santana Pacheco, Simone Klein, and Rui Seara
Comparison of Phonetic Segmentation Tools for European Portuguese . . . . 252
Luís Figueira and Luís C. Oliveira
Spoltech and OGI-22 Baseline Systems for Speech Recognition in Brazilian Portuguese . . . . 256
Nelson Neto, Patrick Silva, Aldebaro Klautau, and Andre Adami
Development of a Speech Recognizer with the Tecnovoz Database . . . . 260
José Lopes, Cláudio Neves, Arlindo Veiga, Alexandre Maciel, Carla Lopes, Fernando Perdigão, and Luís Sá
Dynamic Language Modeling for the European Portuguese . . . . 264
Ciro Martins, António Teixeira, and João Neto
An Approach to Natural Language Equation Reading in Digital Talking Books . . . . 268
Carlos Juzarte Rolo and António Joaquim Serralheiro
Topic Segmentation in a Media Watch System . . . . 272
Rui Amaral and Isabel Trancoso
Author Index . . . . 277