
Fourth Conference on Computational Natural Language Learning and the Second Learning Language in Logic Workshop


Proceedings of the

Fourth Conference on

Computational Natural Language Learning

and of the

Second Learning Language in Logic Workshop

Held in cooperation with ICGI-2000


Order additional copies from:

Association for Computational Linguistics

75 Paterson Street

New Brunswick, NJ 08901 USA

+1-732-342-9100 phone


Preface

The joint Second Learning Language in Logic (LLL-2000) Workshop and Fourth Conference on Computational Natural Language Learning (CoNLL-2000) took place September 13-14, 2000, at the Instituto Superior Técnico in Lisbon, Portugal. Both events were co-organized with the 5th International Colloquium on Grammatical Inference (ICGI-2000).

This volume contains the papers presented during this joint event. More information is available on-line from

http://www.lri.fr/~cn/LLL-2000/ and http://lcg-www.uia.ac.be/conll2000/

We would like to thank all the authors for submitting their papers and thus making these proceedings possible. Special thanks go to the members of the program committees, whose great work contributed to the high quality of these proceedings. We also wish to extend our gratitude to the invited speakers for presenting their views on innovative results in Natural Language Processing and Machine Learning.

We are also grateful to the Local Chair Arlindo Oliveira, the members of the Organizing Committee, Ana Fred and Ana T. Freitas, and all other individuals who helped in the organization of this event.

Finally, we would like to thank the sponsors of LLL-2000 and CoNLL-2000 for their generous financial and moral support: the Network of Excellence in Inductive Logic Programming (ILPNet2), the Network of Excellence in Machine Learning (MLNet3), the Computational Linguistics in Flanders research community (CLIF), and SIGNLL (ACL's SIG on Natural Language Learning).

Claire Cardie, Walter Daelemans, Claire Nédellec, Erik Tjong Kim Sang


SPONSORS:

CLIF (Computational Linguistics in Flanders)

ILPNet2 (Network of Excellence in Inductive Logic Programming)

MLNet3 (Network of Excellence in Machine Learning)

SIGNLL (ACL's SIG for Natural Language Learning)

INVITED SPEAKERS:

Jörg-Uwe Kietz

Dan Roth

ORGANIZERS:

Claire Cardie (CoNLL)

Walter Daelemans (CoNLL)

Claire Nédellec (LLL)

Erik Tjong Kim Sang (CoNLL)

LOCAL ARRANGEMENTS CHAIR:

Arlindo Oliveira

CoNLL PROGRAM COMMITTEE:

Thorsten Brants (Universität des Saarlandes)

James Cussens (University of York)

Raymond Mooney (University of Texas at Austin)

John Nerbonne (University of Groningen)

Miles Osborne (University of Edinburgh)

David Powers (Flinders University)

Ronan Reilly (University College Dublin)

Antal van den Bosch (Tilburg University)


LLL PROGRAM COMMITTEE:

Pieter Adriaans (Syllogic and University of Amsterdam, the Netherlands)

Roberto Basili (University of Roma, Italy)

Gilles Bisson (INRIA, Grenoble, France)

Henrik Boström (University of Stockholm, Sweden)

Gosse Bouma (University of Groningen, the Netherlands)

James Cussens (University of York, United Kingdom)

Tomaz Erjavec (Institute Jozef Stefan, Slovenia)

Daniel Kayser (LIPN, Université Paris-Nord, France)

Suresh Manandhar (University of York, United Kingdom)

Guenter Neumann (DFKI, Saarbrücken, Germany)

Steve Pulman (University of Cambridge, United Kingdom)

Christer Samuelsson (Xerox Research Center Europe, Grenoble, France)

Stefan Wrobel (University of Magdeburg, Germany)

FURTHER INFORMATION:

CoNLL and SIGNLL

Walter Daelemans

CNTS Language Technology Group

University of Antwerp (UIA)

Universiteitsplein 1 (building A)

B-2610 Antwerpen, Belgium

e-mail: [email protected]

LLL

Claire Nédellec

Laboratoire de Recherche en informatique (LRI)

UMR 8623 CNRS

Bat 490, Université Paris-Sud

F-91405 Orsay cedex, France

e-mail: [email protected]


Table of Contents

CoNLL-2000 Invited Paper

Learning in Natural Language: Theory and Algorithmic Approaches

Dan Roth ... 1

CoNLL-2000 Papers

Corpus-Based Grammar Specialization

Nicola Cancedda and Christer Samuelsson ... 7

Pronunciation by Analogy in Normal and Impaired Readers

R.I. Damper and Y. Marchand ... 13

The Role of Algorithm Bias vs Information Source in Learning Algorithms for Morphosyntactic Disambiguation

Guy De Pauw and Walter Daelemans ... 19

Increasing our Ignorance of Language: Identifying Language Structure in an Unknown 'Signal'

John Elliott, Eric Atwell and Bill Whyte ... 25

A Comparison between Supervised Learning Algorithms for Word Sense Disambiguation

Gerard Escudero, Lluis Màrquez and German Rigau ... 31

Incorporating Position Information into a Maximum Entropy/Minimum Divergence Translation Model

George Foster ... 37

Memory-Based Learning for Article Generation

Guido Minnen, Francis Bond and Ann Copestake ... 43

Overfitting Avoidance for Stochastic Modeling of Attribute-Value Grammars

Tony Mullen and Miles Osborne ... 49

Learning Distributed Linguistic Classes

Stephan Raaijmakers ... 55

Modeling the Effect of Cross-Language Ambiguity on Human Syntax Acquisition

William Gregory Sakas ... 61

Knowledge-Free Induction of Morphology Using Latent Semantic Analysis

Patrick Schone and Daniel Jurafsky ... 67

Using Induced Rules as Complex Features in Memory-Based Language Learning

Antal van den Bosch ... 73


CoNLL-2000 Short Papers

Using Perfect Sampling in Parameter Estimation of a Whole Sentence Maximum Entropy Language Model

F. Amaya and J.M. Benedi ... 79

Experiments on Unsupervised Learning for Extracting Relevant Fragments from Spoken Dialog Corpus

Konstantin Biatov ... 83

Generating Synthetic Speech Prosody with Lazy Learning in Tree Structures

Laurent Blin and Laurent Miclet ... 87

Inducing Syntactic Categories by Context Distribution Clustering

Alexander Clark ... 91

ALLiS: a Symbolic Learning System for Natural Language Learning

Hervé Déjean ... 95

Combining Text and Heuristics for Cost-Sensitive Spam Filtering

Jose M. Gómez Hidalgo and Enrique Puertas Sanz ... 99

Genetic Algorithms for Feature Relevance Assignment in Memory-Based Language Processing

Anne Kool, Walter Daelemans and Jakub Zavrel ... 103

Shallow Parsing by Inferencing with Classifiers

Vasin Punyakanok and Dan Roth ... 107

Minimal Commitment and Full Lexical Disambiguation: Balancing Rules and Hidden Markov Models

Patrick Ruch, Robert Baud, Pierrette Bouillon and Gilbert Robert ... 111

Learning IE Rules for a Set of Related Concepts

J. Turmo and H. Rodríguez ... 115

A Default First Order Family Weight Determination Procedure for WPDV Models

Hans van Halteren ... 119

A Comparison of PCFG Models

Jose Luis Verdú-Mas, Jorge Calera-Rubio and Rafael C. Carrasco ... 123


CoNLL-2000 Shared Task Papers

Introduction to the CoNLL-2000 Shared Task: Chunking

Erik F. Tjong Kim Sang and Sabine Buchholz ... 127

Learning Syntactic Structures with XML

Hervé Déjean ... 133

A Context Sensitive Maximum Likelihood Approach to Chunking

Christer Johansson ... 136

Chunking with Maximum Entropy Models

Rob Koeling ... 139

Use of Support Vector Learning for Chunk Identification

Taku Kudoh and Yuji Matsumoto ... 142

Shallow Parsing as Part-of-Speech Tagging

Miles Osborne ... 145

Improving Chunking by Means of Lexical-Contextual Information in Statistical Language Models

Ferran Pla, Antonio Molina and Natividad Prieto ... 148

Text Chunking by System Combination

Erik F. Tjong Kim Sang ... 151

Chunking with WPDV Models

Hans van Halteren ... 154

Single-Classifier Memory-Based Phrase Chunking

Jorn Veenstra and Antal van den Bosch ... 157

Phrase Parsing with Rule Sequence Processors: an Application to the Shared CoNLL Task

Marc Vilain and David Day ... 160

Hybrid Text Chunking

GuoDong Zhou, Jian Su and TongGuan Tey ... 163


LLL-2000 Invited Paper

Extracting a Domain-Specific Ontology from a Corporate Intranet

Jörg-Uwe Kietz, Raphael Volz and Alexander Maedche ... 167

LLL-2000 Papers

Learning from a Substructural Perspective

Pieter Adriaans and Erik de Haas ... 176

Incorporating Linguistics Constraints into Inductive Logic Programming

James Cussens and Stephen Pulman ... 184

Learning from Parsed Sentences with INTHELEX

F. Esposito, S. Ferilli, N. Fanizzi and G. Semeraro ... 194

Inductive Logic Programming for Corpus-Based Acquisition of Semantic Lexicons

Pascale Sébillot, Pierrette Bouillon and Cécile Fabre ... 199

The Acquisition of Word Order by a Computational Learning System

Aline Villavicencio ... 209

Recognition and Tagging of Compound Verb Groups in Czech

Eva Žáčková, Luboš Popelínský and Miloš Nepil ... 219


Fourth Conference on

Computational Natural Language Learning


Preface

CoNLL-2000 is the fourth in a series of meetings organized by SIGNLL, the ACL's SIG on Natural Language Learning. Previous meetings were organized in Madrid, Sydney, and Bergen, co-located with different, but always computational linguistics-oriented, events. We are pleased that this time we could combine efforts with the grammar induction and inductive logic programming for language processing communities.

It is the explicit wish of the SIGNLL board to have the CoNLL meeting address all aspects of computational natural language learning, including issues that are not regularly discussed at computational linguistics meetings, such as computational models of human language acquisition, computational models of the origins and evolution of language, and biologically inspired learning methods.

We are thrilled by the quality and quantity of the submissions, which allowed us to set up an intense but rewarding program with one invited talk, 12 long talks, and joint paper sessions with LLL-2000 and ICGI-2000. On top of that, we introduced two innovations: 12 bullet presentations, which are short talks accompanied by a poster presentation, and a shared task session in which 11 authors report on how their machine learning methods performed on our shared task, the identification of syntactic constituents in text (chunking). In this part of the proceedings, you will find 37 papers providing a useful record of all presentations.

You can find out more about SIGNLL and its activities at http://www.aclweb.org/signll/.


Second Learning Language in Logic Workshop


Preface

LLL-2000 is the follow-up to the first LLL workshop, held in 1999 in Bled (Slovenia) and co-located with the International Conference on Machine Learning and the International Conference on Logic Programming. This year LLL was integrated with the Fourth Conference on Computational Natural Language Learning (CoNLL) and the Fifth International Colloquium on Grammatical Inference (ICGI), with which LLL shares strong scientific interests in language learning. Registration for ICGI, CoNLL and LLL was joint, so registrants could move freely between the three events.

As in its first edition, LLL attracted multidisciplinary submissions from three research fields: Natural Language Processing (NLP), Machine Learning and Computational Logic. This demonstrates the growing interest in NLP methods based on Inductive Logic Programming (ILP) or non-classical logics, as well as in hybrid methods. Relational learning increasingly appears complementary to data analysis in many NLP domains. Once again, relational and logic-based learning prove their capacity to learn complex, structured linguistic resources and knowledge, such as ontologies and grammars, from corpora and explicit background knowledge. The scientific program of LLL-2000 consisted of one invited talk by Jörg-Uwe Kietz on ontology acquisition and seven paper presentations. Six of these papers are included here; the paper by Christophe Costa Florencio, accepted for presentation at both LLL and ICGI, has been published in the ICGI proceedings. The joint sessions with ICGI and CoNLL included one invited talk by Dan Roth as well as paper and poster presentations.


Author Index

Adriaans, Pieter ... 176

Amaya, F. ... 79

Atwell, Eric ... 25

Baud, Robert ... 111

Benedi, J.M. ... 79

Biatov, Konstantin ... 83

Blin, Laurent ... 87

Bond, Francis ... 43

Bouillon, Pierrette ... 111, 199

Buchholz, Sabine ... 127

Calera-Rubio, Jorge ... 123

Cancedda, Nicola ... 7

Carrasco, Rafael C. ... 123

Clark, Alexander ... 91

Copestake, Ann ... 43

Cussens, James ... 184

Daelemans, Walter ... 19, 103

Damper, R.I. ... 13

Day, David ... 160

De Haas, Erik ... 176

De Pauw, Guy ... 19

Déjean, Hervé ... 95, 133

Elliott, John ... 25

Escudero, Gerard ... 31

Esposito, F. ... 194

Fabre, Cécile ... 199

Fanizzi, N. ... 194

Ferilli, S. ... 194

Foster, George ... 37

Gómez Hidalgo, Jose M. ... 99

Johansson, Christer ... 136

Jurafsky, Daniel ... 67

Kietz, Jörg-Uwe ... 167

Koeling, Rob ... 139

Kool, Anne ... 103

Kudoh, Taku ... 142

Maedche, Alexander ... 167

Marchand, Y. ... 13

Màrquez, Lluis ... 31

Matsumoto, Yuji ... 142

Miclet, Laurent ... 87

Minnen, Guido ... 43

Molina, Antonio ... 148

Mullen, Tony ... 49

Nepil, Miloš ... 219

Osborne, Miles ... 49, 145

Pla, Ferran ... 148

Popelínský, Luboš ... 219

Prieto, Natividad ... 148

Puertas Sanz, Enrique ... 99

Pulman, Stephen ... 184

Punyakanok, Vasin ... 107

Raaijmakers, Stephan ... 55

Rigau, German ... 31

Robert, Gilbert ... 111

Rodríguez, H. ... 115

Roth, Dan ... 1, 107

Ruch, Patrick ... 111

Sakas, William Gregory ... 61

Samuelsson, Christer ... 7

Schone, Patrick ... 67

Sébillot, Pascale ... 199

Semeraro, G. ... 194

Su, Jian ... 163

Tey, TongGuan ... 163

Tjong Kim Sang, Erik F. ... 127, 151

Turmo, J. ... 115

Van Halteren, Hans ... 119, 154

Van den Bosch, Antal ... 73, 157

Veenstra, Jorn ... 157

Verdú-Mas, Jose Luis ... 123

Vilain, Marc ... 160

Villavicencio, Aline ... 209

Volz, Raphael ... 167

Whyte, Bill ... 25

Žáčková, Eva ... 219

Zavrel, Jakub ... 103

Zhou, GuoDong ... 163
