Proceedings of the
1999 Joint SIGDAT Conference
on
Empirical Methods in Natural
Language Processing
and
Very Large Corpora
Sponsored by
The Association for Computational Linguistics
SIGDAT
LEXIS-NEXIS, a Division of Reed Elsevier, Inc.
Hong Kong University of Science and Technology
Edited by
Pascale Fung
and
Joe Zhou
© 1999, Association for Computational Linguistics
Order additional copies from:
Association for Computational Linguistics
75 Paterson Street, Suite 9
New Brunswick, NJ 08901 USA
+1-732-342-9100 phone
SPONSORS:
The Association for Computational Linguistics (ACL)
SIGDAT (ACL's SIG for Linguistic Data and Corpus-based Approaches to NLP)
LEXIS-NEXIS, a Division of Reed Elsevier, Inc.
Hong Kong University of Science and Technology, Human Language Technology Center
INVITED SPEAKERS:
Kenneth W. Church (AT&T Labs-Research)
Richard Schwartz (BBN Technologies)
ORGANIZERS:
Pascale Fung, Chair
Joe Zhou, Co-chair
PROGRAM COMMITTEE:
Jing-Shin Chang (Behavior Design Corp.)
Ken Church (AT&T Labs-Research)
Ido Dagan (Bar-Ilan University)
Marti Hearst (UC-Berkeley)
Huang, Changning (Microsoft Research China)
Pierre Isabelle (Xerox Research Europe)
Lillian Lee (Cornell University)
David Lewis (AT&T Labs-Research)
Dan Melamed (West Group)
Mehryar Mohri (AT&T Labs-Research)
Masaaki Nagata (NTT)
Richard Sproat (AT&T Labs-Research)
Andreas Stolcke (SRI International)
Ralph Weischedel (BBN)
Dekai Wu (HKUST)
David Yarowsky (Johns Hopkins University)
ADDITIONAL REVIEWERS:
Srinivas Bangalore (AT&T Labs-Research)
Rebecca Bruce (Univ. of North Carolina)
Michael Collins (AT&T Labs-Research)
Gregory Grefenstette (Xerox Research Europe)
Vasileios Hatzivassiloglou (Columbia University)
David Hull (Xerox Research Europe)
Peter Jackson (West Group)
Christian Jacquemin (LIMSI)
Liu, Xiaohu (HKUST)
Sung Hyon Myaeng (Chungnam National Univ.)
Shimei Pan (Columbia University)
Ted Pederson (Cal Poly)
Roberto Pieraccini (AT&T Labs-Research)
Ellen Riloff (University of Utah)
Hinrich Schütze (Xerox PARC)
Yannis Stylianou (AT&T Labs-Research)
Zhao, Jun (HKUST)
FURTHER INFORMATION:
Pascale Fung
Human Language Technology Center
Department of Electrical and Electronic Engineering
University of Science and Technology (HKUST)
Clear Water Bay, Kowloon
Hong Kong
Email: [email protected]
Joe Zhou
LEXIS-NEXIS, a Division of Reed Elsevier
9555 Springboro Pike
Dayton, OH 45342
USA
CONFERENCE PROGRAM
Monday, June 21
8:45-9:00
Welcome
9:00-9:40
INVITED SPEECH
What's Happened Since the First SIGDAT Meeting?
Kenneth W. Church (AT&T Labs-Research)
9:40-9:50
Short Break
9:50-10:10
Text-Translation Alignment: Three Languages are Better than Two
Michel Simard
10:10-10:30
Mapping Multilingual Hierarchies Using Relaxation Labeling
J. Daudé, L. Padró and G. Rigau
10:30-10:50
Improved Alignment Models for Statistical Machine Translation
Franz Josef Och, Christoph Tillmann, and Hermann Ney
10:50-11:10
Cross-Language Information Retrieval for Technical Documents
Atsushi Fujii and Tetsuya Ishikawa
11:10-11:30
Break
11:30-11:50
Boosting Applied to Tagging and PP Attachment
Steven Abney, Robert E. Schapire and Yoram Singer
11:50-12:10
Applying Extrasentential Context to Maximum Entropy Based Tagging with
a Large Semantic and Syntactic Tagset
Ezra Black, Andrew Finch and Ruiqiang Zhang
12:10-12:30
Improving POS Tagging Using Machine-Learning Techniques
Lluís Màrquez, Horacio Rodríguez, Josep Carmona and Josep Montolio
12:30-14:00
LUNCH
14:00-14:20
Determining the Specificity of Nouns From Text
Sharon A. Caraballo and Eugene Charniak
14:20-14:40
Retrieving Collocations From Korean Text
Seonho Kim, Zooil Yang, Mansuk Song and Jung-Ho Ahn
14:40-15:00
Noun Phrase Coreference as Clustering
Claire Cardie and Kiri Wagstaff
15:00-15:20
Break
15:20-15:40
Language Independent Named Entity Recognition Combining Morphological and
Contextual Evidence
Silviu Cucerzan and David Yarowsky
15:40-16:00
Unsupervised Models for Named Entity Classification
Michael Collins and Yoram Singer
16:00-16:20
Hybrid Disambiguation of Prepositional Phrase Attachment and Interpretation
Sven Hartrumpf
16:20-16:40
HMM Specialization with Selective Lexicalization
Jin-Dong Kim, Sang-Zoo Lee and Hae-Chang Rim
Tuesday, June 22
9:00-9:40
INVITED SPEECH
Why Doesn't Natural Language Come Naturally?
Richard Schwartz (BBN Technologies)
9:40-9:50
Short Break
9:50-10:10
POS Tags and Decision Trees for Language Modeling
Peter A. Heeman
10:10-10:30
An Information-Theoretic Empirical Analysis of Dependency-Based Feature
Types for Word Prediction Models
Dekai Wu, Zhao Jun and Sui Zhifang
10:30-10:50
Word Informativeness and Automatic Pitch Accent Modeling
Shimei Pan and Kathleen McKeown
10:50-11:10
Learning Discourse Relations with Active Data Selection
Tadashi Nomoto and Yuji Matsumoto
11:10-11:30
Break
11:30-11:50
A Learning Approach to Shallow Parsing
Marcia Muñoz, Vasin Punyakanok, Dan Roth and Dav Zimak
11:50-12:10
Guiding a Well-Founded Parser with Corpus Statistics
Amon Seagull and Lenhart Schubert
12:10-12:30
Exploiting Diversity in Natural Language Processing: Combining Parsers
John Henderson and Eric Brill
12:30-14:00
LUNCH
14:00-15:10
Panel Discussion
The Future of Language Technologies: Research, Development and Marketing
Ken Church (AT&T), Pierre Isabelle (Xerox Europe), Roberto Pieraccini (AT&T),
John Rausch (Lexis-Nexis), Keh-Yih Su (Behavior Design Corp.), Raphael Wong (Intel)
15:10-15:20
Short Break
15:20-15:40
Lexical Ambiguity and Information Retrieval Revisited
Julio Gonzalo, Anselmo Peñas and Felisa Verdejo
15:40-16:00
Detecting Text Similarity over Short Passages: Exploring Linguistic Feature
Combinations via Machine Learning
Vasileios Hatzivassiloglou, Judith L. Klavans and Eleazar Eskin
16:00-16:20
16:20-16:40
Automated Construction of Weighted String Similarity Measures
Jörg Tiedemann
TABLE OF CONTENTS
What's Happened Since the First SIGDAT Meeting?
(INVITED TALK) Kenneth Ward Church . . . 1
Text-Translation Alignment: Three Languages are Better than Two
Michel Simard . . . 2
Mapping Multilingual Hierarchies Using Relaxation Labeling
J. Daudé, L. Padró and G. Rigau . . . 12
Improved Alignment Models for Statistical Machine Translation
Franz Josef Och, Christoph Tillmann, and Hermann Ney . . . 20
Cross-Language Information Retrieval for Technical Documents
Atsushi Fujii and Tetsuya Ishikawa . . . 29
Boosting Applied to Tagging and PP Attachment
Steven Abney, Robert E. Schapire and Yoram Singer . . . 38
Applying Extrasentential Context to Maximum Entropy Based Tagging With A Large
Semantic And Syntactic Tagset
Ezra Black, Andrew Finch and Ruiqiang Zhang . . . 46
Improving POS Tagging Using Machine-Learning Techniques
Lluís Màrquez, Horacio Rodríguez, Josep Carmona and Josep Montolio . . . 53
Determining the Specificity of Nouns From Text
Sharon A. Caraballo and Eugene Charniak . . . 63
Retrieving Collocations From Korean Text
Seonho Kim, Zooil Yang, Mansuk Song and Jung-Ho Ahn . . . 71
Noun Phrase Coreference as Clustering
Claire Cardie and Kiri Wagstaff . . . 82
Language Independent Named Entity Recognition Combining Morphological and
Contextual Evidence
Silviu Cucerzan and David Yarowsky . . . 90
Unsupervised Models for Named Entity Classification
Michael Collins and Yoram Singer . . . 100
Hybrid Disambiguation of Prepositional Phrase Attachment and Interpretation
Sven Hartrumpf . . . 111
HMM Specialization with Selective Lexicalization
Jin-Dong Kim, Sang-Zoo Lee and Hae-Chang Rim . . . 121
Why Doesn't Natural Language Come Naturally?
(INVITED TALK) Richard Schwartz . . . 128
POS Tags and Decision Trees for Language Modeling
Peter A. Heeman . . . 129
An Information-Theoretic Empirical Analysis of Dependency-Based Feature Types
for Word Prediction Models
Dekai Wu, Zhao Jun and Sui Zhifang . . . 138
Word Informativeness and Automatic Pitch Accent Modeling
Shimei Pan and Kathleen McKeown . . . 148
Learning Discourse Relations with Active Data Selection
Tadashi Nomoto and Yuji Matsumoto . . . 158
A Learning Approach to Shallow Parsing
Marcia Muñoz, Vasin Punyakanok, Dan Roth and Dav Zimak . . . 168
Guiding a Well-Founded Parser with Corpus Statistics
Amon Seagull and Lenhart Schubert . . . 179
Exploiting Diversity in Natural Language Processing: Combining Parsers
John Henderson and Eric Brill . . . 187
Lexical Ambiguity and Information Retrieval Revisited
Julio Gonzalo, Anselmo Peñas and Felisa Verdejo . . . 195
Detecting Text Similarity over Short Passages: Exploring Linguistic Feature
Combinations via Machine Learning
Vasileios Hatzivassiloglou, Judith L. Klavans and Eleazar Eskin . . . 203
Automated Construction of Weighted String Similarity Measures
Jörg Tiedemann . . . 213
Taking the Load Off the Conference Chairs: Towards a Digital Paper Routing Assistant
David Yarowsky and Radu Florian . . . 220
PP-attachment: A Committee Machine Approach
Martha A. Alegre, Josep M. Sopena and Agusti Lloberas . . . 231
Cascaded Grammatical Relation Assignment
Sabine Buchholz, Jorn Veenstra and Walter Daelemans . . . 239
Automatically Merging Lexicons that have Incompatible Part-of-Speech Categories
Daniel Ka-Leung Chan and Dekai Wu . . . 247
An Iterative Approach to Estimating Frequencies over a Semantic Hierarchy
Stephen Clark and David Weir . . . 258
Using Subcategorization to Resolve Verb Class Ambiguity
Maria Lapata and Chris Brew . . . 266
Improving Brill's POS Tagger for an Agglutinative Language
Beáta Megyesi . . . 275
Corpus-Based Learning for Noun Phrase Coreference Resolution
Wee Meng Soon, Hwee Tou Ng and Chung Yong Lim . . . 285
Corpus-Based Approach for Nominal Compound Analysis for Korean Based on
Linguistic and Statistical Information
Juntae Yoon, Key-Sun Choi and Mansuk Song . . . 292
AUTHOR INDEX
Steven Abney . . . 38
Jung-Ho Ahn . . . 71
Martha A. Alegre . . . 231
Ezra Black . . . 46
Chris Brew . . . 266
Eric Brill . . . 187
Sabine Buchholz . . . 239
Sharon A. Caraballo . . . 63
Claire Cardie . . . 82
Josep Carmona . . . 53
Daniel Ka-Leung Chan . . . 247
Eugene Charniak . . . 63
Key-Sun Choi . . . 292
Kenneth Ward Church . . . 1
Stephen Clark . . . 258
Michael Collins . . . 100
Silviu Cucerzan . . . 90
Walter Daelemans . . . 239
J. Daudé . . . 12
Eleazar Eskin . . . 203
Andrew Finch . . . 46
Radu Florian . . . 220
Atsushi Fujii . . . 29
Julio Gonzalo . . . 195
Sven Hartrumpf . . . 111
Vasileios Hatzivassiloglou . . . 203
Peter A. Heeman . . . 129
John Henderson . . . 187
Tetsuya Ishikawa . . . 29
Jin-Dong Kim . . . 121
Seonho Kim . . . 71
Judith L. Klavans . . . 203
Maria Lapata . . . 266
Sang-Zoo Lee . . . 121
Chung Yong Lim . . . 285
Agusti Lloberas . . . 231
Lluís Màrquez . . . 53
Yuji Matsumoto . . . 158
Kathleen McKeown . . . 148
Beáta Megyesi . . . 275
Josep Montolio . . . 53
Marcia Muñoz . . . 168
Hermann Ney . . . 20
Hwee Tou Ng . . . 285
Tadashi Nomoto . . . 158
Franz Josef Och . . . 20
L. Padró . . . 12
Shimei Pan . . . 148
Anselmo Peñas . . . 195
Vasin Punyakanok . . . 168
G. Rigau . . . 12
Hae-Chang Rim . . . 121
Horacio Rodríguez . . . 53
Dan Roth . . . 168
Robert E. Schapire . . . 38
Lenhart Schubert . . . 179
Richard Schwartz . . . 128
Amon Seagull . . . 179
Michel Simard . . . 2
Yoram Singer . . . 38, 100
Mansuk Song . . . 71, 292
Wee Meng Soon . . . 285
Josep M. Sopena . . . 231
Sui Zhifang . . . 138
Jörg Tiedemann . . . 213
Christoph Tillmann . . . 20
Jorn Veenstra . . . 239
Felisa Verdejo . . . 195
Kiri Wagstaff . . . 82
David Weir . . . 258
Dekai Wu . . . 138, 247
Zooil Yang . . . 71
David Yarowsky . . . 90, 220
Juntae Yoon . . . 292
Ruiqiang Zhang . . . 46
Zhao Jun . . . 138
Dav Zimak . . . 168