1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

Proceedings of the

1999 Joint SIGDAT Conference

on

Empirical Methods in Natural

Language Processing

and

Very Large Corpora

Sponsored by

The Association for Computational Linguistics

SIGDAT

LEXIS-NEXIS, a Division of Reed Elsevier, Inc.

Hong Kong University of Science and Technology

Edited by

Pascale Fung

and

Joe Zhou

© 1999, Association for Computational Linguistics

Order additional copies from:

Association for Computational Linguistics

75 Paterson Street, Suite 9

New Brunswick, NJ 08901 USA

+1-732-342-9100 phone

SPONSORS:

The Association for Computational Linguistics (ACL)

SIGDAT (ACL's SIG for Linguistic Data and Corpus-based Approaches to NLP)

LEXIS-NEXIS, a Division of Reed Elsevier, Inc.

Hong Kong University of Science and Technology, Human Language Technology Center

INVITED SPEAKERS:

Kenneth W. Church (AT&T Labs-Research)

Richard Schwartz (BBN Technologies)

ORGANIZERS:

Pascale Fung, Chair

Joe Zhou, Co-chair

PROGRAM COMMITTEE:

Jing-Shin Chang (Behavior Design Corp.)

Ken Church (AT&T Labs-Research)

Ido Dagan (Bar-Ilan University)

Marti Hearst (UC-Berkeley)

Huang, Changning (Microsoft Research China)

Pierre Isabelle (Xerox Research Europe)

Lillian Lee (Cornell University)

David Lewis (AT&T Labs-Research)

Dan Melamed (West Group)

Mehryar Mohri (AT&T Labs-Research)

Masaaki Nagata (NTT)

Richard Sproat (AT&T Labs-Research)

Andreas Stolcke (SRI International)

Ralph Weischedel (BBN)

Dekai Wu (HKUST)

David Yarowsky (Johns Hopkins University)

ADDITIONAL REVIEWERS:

Srinivas Bangalore (AT&T Labs-Research)

Rebecca Bruce (Univ. of North Carolina)

Michael Collins (AT&T Labs-Research)

Gregory Grefenstette (Xerox Research Europe)

Vasileios Hatzivassiloglou (Columbia University)

David Hull (Xerox Research Europe)

Peter Jackson (West Group)

Christian Jacquemin (LIMSI)

Liu, Xiaohu (HKUST)

Sung Hyon Myaeng (Chungnam National Univ.)

Shimei Pan (Columbia University)

Ted Pedersen (Cal Poly)

Roberto Pieraccini (AT&T Labs-Research)

Ellen Riloff (University of Utah)

Hinrich Schütze (Xerox PARC)

Yannis Stylianou (AT&T Labs-Research)

Zhao, Jun (HKUST)

FURTHER INFORMATION:

Pascale Fung

Human Language Technology Center

Department of Electrical and Electronic Engineering

University of Science and Technology (HKUST)

Clear Water Bay, Kowloon

Hong Kong

Email: [email protected]

Joe Zhou

LEXIS-NEXIS, a Division of Reed Elsevier

9555 Springboro Pike

Dayton, OH 45342

USA

CONFERENCE PROGRAM

Monday, June 21

8:45-9:00

Welcome

9:00-9:40

INVITED SPEECH

What's Happened Since the First SIGDAT Meeting?

Kenneth W. Church (AT&T Labs-Research)

9:40-9:50

Short Break

9:50-10:10

Text-Translation Alignment: Three Languages are Better than Two

Michel Simard

10:10-10:30

Mapping Multilingual Hierarchies Using Relaxation Labeling

J. Daudé, L. Padró and G. Rigau

10:30-10:50

Improved Alignment Models for Statistical Machine Translation

Franz Josef Och, Christoph Tillmann, and Hermann Ney

10:50-11:10

Cross-Language Information Retrieval for Technical Documents

Atsushi Fujii and Tetsuya Ishikawa

11:10-11:30

Break

11:30-11:50

Boosting Applied to Tagging and PP Attachment

Steven Abney, Robert E. Schapire and Yoram Singer

11:50-12:10

Applying Extrasentential Context to Maximum Entropy Based Tagging with

a Large Semantic and Syntactic Tagset

Ezra Black, Andrew Finch and Ruiqiang Zhang

12:10-12:30

Improving POS Tagging Using Machine-Learning Techniques

Lluís Màrquez, Horacio Rodríguez, Josep Carmona and Josep Montolio

12:30-14:00

LUNCH

14:00-14:20

Determining the Specificity of Nouns From Text

Sharon A. Caraballo and Eugene Charniak

14:20-14:40

Retrieving Collocations From Korean Text

Seonho Kim, Zooil Yang, Mansuk Song and Jung-Ho Ahn

14:40-15:00

Noun Phrase Coreference as Clustering

Claire Cardie and Kiri Wagstaff

15:00-15:20

Break

15:20-15:40

Language Independent Named Entity Recognition Combining Morphological and

Contextual Evidence

Silviu Cucerzan and David Yarowsky

15:40-16:00

Unsupervised Models for Named Entity Classification

Michael Collins and Yoram Singer

16:00-16:20

Hybrid Disambiguation of Prepositional Phrase Attachment and Interpretation

Sven Hartrumpf

16:20-16:40

HMM Specialization with Selective Lexicalization

Jin-Dong Kim, Sang-Zoo Lee and Hae-Chang Rim

Tuesday, June 22

9:00-9:40

INVITED SPEECH

Why Doesn't Natural Language Come Naturally?

Richard Schwartz (BBN Technologies)

9:40-9:50

Short Break

9:50-10:10

POS Tags and Decision Trees for Language Modeling

Peter A. Heeman

10:10-10:30

An Information-Theoretic Empirical Analysis of Dependency-Based Feature

Types for Word Prediction Models

Dekai Wu, Zhao Jun and Sui Zhifang

10:30-10:50

Word Informativeness and Automatic Pitch Accent Modeling

Shimei Pan and Kathleen McKeown

10:50-11:10

Learning Discourse Relations with Active Data Selection

Tadashi Nomoto and Yuji Matsumoto

11:10-11:30

Break

11:30-11:50

A Learning Approach to Shallow Parsing

Marcia Muñoz, Vasin Punyakanok, Dan Roth and Dav Zimak

11:50-12:10

Guiding a Well-Founded Parser with Corpus Statistics

Amon Seagull and Lenhart Schubert

12:10-12:30

Exploiting Diversity in Natural Language Processing: Combining Parsers

John Henderson and Eric Brill

12:30-14:00

LUNCH

14:00-15:10

Panel Discussion

The Future of Language Technologies: Research, Development and Marketing

Ken Church (AT&T), Pierre Isabelle (Xerox Europe), Roberto Pieraccini (AT&T),

John Rausch (Lexis-Nexis), Keh-Yih Su (Behavior Design Corp.), Raphael Wong (Intel)

15:10-15:20

Short Break

15:20-15:40

Lexical Ambiguity and Information Retrieval Revisited

Julio Gonzalo, Anselmo Peñas and Felisa Verdejo

15:40-16:00

Detecting Text Similarity over Short Passages: Exploring Linguistic Feature

Combinations via Machine Learning

Vasileios Hatzivassiloglou, Judith L. Klavans and Eleazar Eskin

16:00-16:20

16:20-16:40

Automated Construction of Weighted String Similarity Measures

Jörg Tiedemann

TABLE OF CONTENTS

What's Happened Since the First SIGDAT Meeting?

(INVITED TALK)

Kenneth Ward Church . . . 1

Text-Translation Alignment: Three Languages are Better than Two

Michel Simard . . . 2

Mapping Multilingual Hierarchies Using Relaxation Labeling

J. Daudé, L. Padró and G. Rigau . . . 12

Improved Alignment Models for Statistical Machine Translation

Franz Josef Och, Christoph Tillmann, and Hermann Ney . . . 20

Cross-Language Information Retrieval for Technical Documents

Atsushi Fujii and Tetsuya Ishikawa . . . 29

Boosting Applied to Tagging and PP Attachment

Steven Abney, Robert E. Schapire and Yoram Singer . . . 38

Applying Extrasentential Context to Maximum Entropy Based Tagging With A Large

Semantic And Syntactic Tagset

Ezra Black, Andrew Finch and Ruiqiang Zhang . . . 46

Improving POS Tagging Using Machine-Learning Techniques

Lluís Màrquez, Horacio Rodríguez, Josep Carmona and Josep Montolio . . . 53

Determining the Specificity of Nouns From Text

Sharon A. Caraballo and Eugene Charniak . . . 63

Retrieving Collocations From Korean Text

Seonho Kim, Zooil Yang, Mansuk Song and Jung-Ho Ahn . . . 71

Noun Phrase Coreference as Clustering

Claire Cardie and Kiri Wagstaff . . . 82

Language Independent Named Entity Recognition Combining Morphological and

Contextual Evidence

Silviu Cucerzan and David Yarowsky . . . 90

Unsupervised Models for Named Entity Classification

Michael Collins and Yoram Singer . . . 100

Hybrid Disambiguation of Prepositional Phrase Attachment and Interpretation

Sven Hartrumpf . . . 111

HMM Specialization with Selective Lexicalization

Jin-Dong Kim, Sang-Zoo Lee and Hae-Chang Rim . . . 121

Why Doesn't Natural Language Come Naturally?

(INVITED TALK)

Richard Schwartz . . . 128

POS Tags and Decision Trees for Language Modeling

Peter A. Heeman . . . 129

An Information-Theoretic Empirical Analysis of Dependency-Based Feature Types

for Word Prediction Models

Dekai Wu, Zhao Jun and Sui Zhifang . . . 138

Word Informativeness and Automatic Pitch Accent Modeling

Shimei Pan and Kathleen McKeown . . . 148

Learning Discourse Relations with Active Data Selection

Tadashi Nomoto and Yuji Matsumoto . . . 158

A Learning Approach to Shallow Parsing

Marcia Muñoz, Vasin Punyakanok, Dan Roth and Dav Zimak . . . 168

Guiding a Well-Founded Parser with Corpus Statistics

Amon Seagull and Lenhart Schubert . . . 179

Exploiting Diversity in Natural Language Processing: Combining Parsers

John Henderson and Eric Brill . . . 187

Lexical Ambiguity and Information Retrieval Revisited

Julio Gonzalo, Anselmo Peñas and Felisa Verdejo . . . 195

Detecting Text Similarity over Short Passages: Exploring Linguistic Feature

Combinations via Machine Learning

Vasileios Hatzivassiloglou, Judith L. Klavans and Eleazar Eskin . . . 203

Automated Construction of Weighted String Similarity Measures

Jörg Tiedemann . . . 213

Taking the Load Off the Conference Chairs: Towards a Digital Paper Routing Assistant

David Yarowsky and Radu Florian . . . 220

PP-attachment: A Committee Machine Approach

Martha A. Alegre, Josep M. Sopena and Agusti Lloberas . . . 231

Cascaded Grammatical Relation Assignment

Sabine Buchholz, Jorn Veenstra and Walter Daelemans . . . 239

Automatically Merging Lexicons that have Incompatible Part-of-Speech Categories

Daniel Ka-Leung Chan and Dekai Wu . . . 247

An Iterative Approach to Estimating Frequencies over a Semantic Hierarchy

Stephen Clark and David Weir . . . 258

Using Subcategorization to Resolve Verb Class Ambiguity

Maria Lapata and Chris Brew . . . 266

Improving Brill's POS Tagger for an Agglutinative Language

Beáta Megyesi . . . 275

Corpus-Based Learning for Noun Phrase Coreference Resolution

Wee Meng Soon, Hwee Tou Ng and Chung Yong Lim . . . 285

Corpus-Based Approach for Nominal Compound Analysis for Korean Based on

Linguistic and Statistical Information

Juntae Yoon, Key-Sun Choi and Mansuk Song . . . 292

AUTHOR INDEX

Steven Abney . . . 38

Jung-Ho Ahn . . . 71

Martha A. Alegre . . . 231

Ezra Black . . . 46

Chris Brew . . . 266

Eric Brill . . . 187

Sabine Buchholz . . . 239

Sharon A. Caraballo . . . 63

Claire Cardie . . . 82

Josep Carmona . . . 53

Daniel Ka-Leung Chan . . . 247

Eugene Charniak . . . 63

Key-Sun Choi . . . 292

Kenneth Ward Church . . . 1

Stephen Clark . . . 258

Michael Collins . . . 100

Silviu Cucerzan . . . 90

Walter Daelemans . . . 239

J. Daudé . . . 12

Eleazar Eskin . . . 203

Andrew Finch . . . 46

Radu Florian . . . 220

Atsushi Fujii . . . 29

Julio Gonzalo . . . 195

Sven Hartrumpf . . . 111

Vasileios Hatzivassiloglou . . . 203

Peter A. Heeman . . . 129

John Henderson . . . 187

Tetsuya Ishikawa . . . 29

Jin-Dong Kim . . . 121

Seonho Kim . . . 71

Judith L. Klavans . . . 203

Maria Lapata . . . 266

Sang-Zoo Lee . . . 121

Chung Yong Lim . . . 285

Agusti Lloberas . . . 231

Lluís Màrquez . . . 53

Yuji Matsumoto . . . 158

Kathleen McKeown . . . 148

Beáta Megyesi . . . 275

Josep Montolio . . . 53

Marcia Muñoz . . . 168

Hermann Ney . . . 20

Hwee Tou Ng . . . 285

Tadashi Nomoto . . . 158

Franz Josef Och . . . 20

L. Padró . . . 12

Shimei Pan . . . 148

Anselmo Peñas . . . 195

Vasin Punyakanok . . . 168

G. Rigau . . . 12

Hae-Chang Rim . . . 121

Horacio Rodríguez . . . 53

Dan Roth . . . 168

Robert E. Schapire . . . 38

Lenhart Schubert . . . 179

Richard Schwartz . . . 128

Amon Seagull . . . 179

Michel Simard . . . 2

Yoram Singer . . . 38, 100

Mansuk Song . . . 71, 292

Wee Meng Soon . . . 285

Josep M. Sopena . . . 231

Sui Zhifang . . . 138

Jörg Tiedemann . . . 213

Christoph Tillmann . . . 20

Jorn Veenstra . . . 239

Felisa Verdejo . . . 195

Kiri Wagstaff . . . 82

David Weir . . . 258

Dekai Wu . . . 138, 247

Zooil Yang . . . 71

David Yarowsky . . . 90, 220

Juntae Yoon . . . 292

Ruiqiang Zhang . . . 46

Zhao Jun . . . 138

Dav Zimak . . . 168
