Sixth Workshop on Very Large Corpora


Proceedings of the

Sixth Workshop

on

Very Large Corpora

Sponsored by

The Association for Computational Linguistics

ACL's SIGDAT

West Group

Edited by Eugene Charniak

15th-16th August 1998

Université de Montréal

Montreal, Quebec, Canada


© 1998 Université de Montréal.

Current ACL members may order copies of this and other ACL-related proceedings from:

Association for Computational Linguistics (ACL)
75 Paterson Street, Suite 9
New Brunswick, NJ 08901 USA
Tel: +1-732-342-9100
Fax: +1-732-873-0014
rasmusse@cs.rutgers.edu

All other orders (from libraries, institutions and other individuals) should be addressed to:

Morgan Kaufmann Publishers
340 Pine Street, 6th Floor
San Francisco, CA 94104 USA
Tel: +1-650-392-2665, ext. 231
Fax: +1-650-982-2665


SPONSORS:

The Association for Computational Linguistics

SIGDAT, ACL's Special Interest Group for Linguistic Data and Corpus-based Approaches to NLP

West Group

INVITED SPEAKERS:

Jan Pedersen

Elissa Newport

CONFERENCE CHAIR:

Eugene Charniak

PROGRAM COMMITTEE:

Steven Abney

Eric Brill

Ted Briscoe

Rebecca Bruce

Claire Cardie

Bob Carpenter

Glen Carroll

Ken Church

Michael Collins

Joshua Goodman

Vasilis Hatzivassiloglou

Mark Johnson

Andrew Kehler

John Lafferty

Lillian Lee

Christopher Manning

Dan Melamed

Scott Miller

Raymond Mooney

James Pustejovsky

Lance Ramshaw

Adwait Ratnaparkhi

Ellen Riloff

Hinrich Schütze

Ralph Weischedel

Janyce Wiebe

Dekai Wu

David Yarowsky

FURTHER INFORMATION:

Eugene Charniak

Department of Computer Science

115 Waterman Street

Brown University

Box 1910, Providence RI 02912

USA


Programme

Saturday August 15, 1998

Session 1

9:00-9:25 Bayesian Stratified Sampling to Assess Corpus Utility
Judith Hochberg, Clint Scovel, Timothy Thomas and Sam Hall
Los Alamos National Laboratory

9:25-9:50 Encoding Linguistic Corpora
Nancy Ide
Vassar College

9:50-10:15 Using a Probabilistic Translation Model for Cross-Language Information Retrieval
Jian-Yun Nie, Pierre Isabelle, Pierre Plamondon and George Foster
Université de Montréal

10:15-10:40 Using Suffix Arrays to Compute Term Frequency and Document Frequency for All Substrings in a Corpus
Mikio Yamamoto and Kenneth W. Church
University of Tsukuba and AT&T Labs-Research

10:40-11:15 Break

Session 2

11:15-11:40 Semantic Tagging using a Probabilistic Context Free Grammar
Michael Collins and Scott Miller
University of Pennsylvania and BBN Technologies

11:40-12:05 An Empirical Approach to Conceptual Case Frame Acquisition
Ellen Riloff and Mark Schmelzenbach
University of Utah

12:05-12:30 Semantic Lexicon Acquisition for Learning Natural Language Interfaces
Cynthia A. Thompson and Raymond J. Mooney
University of Texas

12:30-2:00 Lunch

2:00-2:55 Invited Talk: The Role of NLP in an Internet Search Engine
Jan Pedersen
Director of Advanced Technology, Infoseek

Session 3

2:55-3:20 The Effect of Topological Structure on Hierarchical Text Categorization
Stephen D'Alessio, Keitha Murray, Robert Schiaffino and Aaron Kershenbaum
Iona College and Polytechnic University

3:20-3:45 Refining the Automatic Identification of Conceptual Relations in Large-scale Corpora
Alex Collier, Mike Pacey and Antoinette Renouf
University of Liverpool

3:45-4:20 Break

Session 4

4:20-4:45 Generalized Unknown Morpheme Guessing for Hybrid POS Tagging of Korean
Jeongwon Cha, Geunbae Lee and Jong-Hyeok Lee
Pohang University

4:45-5:10 Language Identification With Confidence Limits
David Elworthy
Canon Research Centre Europe

5:10-5:35 Aligning Tagged Bitexts
Raquel Martínez, Joseba Abaitua and Arantza Casillas
Universidad Complutense de Madrid, Universidad de Deusto, and Universidad de Alcalá de Henares

5:35-6:00 Towards Unsupervised Extraction of Verb Paradigms from Large Corpora
Cornelia H. Parkes, Alexander M. Malek and Mitchell P. Marcus
University of Pennsylvania

Sunday August 16, 1998

Session 5

9:00-9:25 Can Subcategorisation Probabilities Help a Statistical Parser?
John Carroll, Guido Minnen and Ted Briscoe
University of Sussex and University of Cambridge

9:25-9:50 Edge-Based Best-First Chart Parsing
Eugene Charniak, Sharon Goldwater and Mark Johnson
Brown University

9:50-10:15 What Grammars tell us about Corpora: the Case of Reduced Relative Clauses
Paola Merlo and Suzanne Stevenson
University of Pennsylvania and Rutgers University

10:15-10:40 A Maximum-Entropy Partial Parser for Unrestricted Text
Wojciech Skut and Thorsten Brants
Universität des Saarlandes

10:40-11:15 Break

Session 6

11:15-11:40 Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition
Andrew Borthwick, John Sterling, Eugene Agichtein and Ralph Grishman
New York University

11:40-12:05 A Statistical Approach to Anaphora Resolution
Niyu Ge, John Hale and Eugene Charniak
Brown University

12:05-12:30 A Decision Tree Method for Finding and Classifying Names in Japanese Texts
Satoshi Sekine, Ralph Grishman and Hiroyuki Shinnou
New York University and Ibaraki University

12:30-2:00 Lunch

2:00-2:55 Invited Talk: Statistical Language Learning in Biological Devices
Elissa Newport
George Eastman Professor of Brain and Cognitive Sciences, University of Rochester

Session 7

2:55-3:20 POS Tagging versus Classes in Language Modeling
Peter A. Heeman
Oregon Graduate Institute

3:20-3:45 Automatic Acquisition of Phrase Grammars for Stochastic Language Modeling
Giuseppe Riccardi and Srinivas Bangalore
AT&T Labs-Research

3:45-4:20 Break

Session 8

4:20-4:45 Linear Segmentation and Segment Significance
Min-Yen Kan, Judith L. Klavans and Kathleen R. McKeown
Columbia University

4:45-5:10 Improving Summarization through Rhetorical Parsing Tuning
Daniel Marcu
University of Southern California

5:10-5:35 Discourse Parsing: A Decision Tree Approach
Tadashi Nomoto and Yuji Matsumoto
Hitachi Ltd. and Nara Institute of Science and Technology

5:35-6:00 Mapping Collocational Properties into Machine Learning Features
Janyce M. Wiebe, Kenneth J. McKeever, and Rebecca F. Bruce
New Mexico State University, and Southern Methodist University


Table of Contents

Programme . . . ii

Table of Contents . . . v

Author Index . . . vii

Workshop Papers

Judith Hochberg, Clint Scovel, Timothy Thomas and Sam Hall
Bayesian Stratified Sampling to Assess Corpus Utility . . . 1

Nancy Ide
Encoding Linguistic Corpora . . . 9

Jian-Yun Nie, Pierre Isabelle, Pierre Plamondon and George Foster
Using a Probabilistic Translation Model for Cross-Language Information Retrieval . . . 18

Mikio Yamamoto and Kenneth W. Church
Using Suffix Arrays to Compute Term Frequency and Document Frequency for All Substrings in a Corpus . . . 28

Michael Collins and Scott Miller
Semantic Tagging using a Probabilistic Context Free Grammar . . . 38

Ellen Riloff and Mark Schmelzenbach
An Empirical Approach to Conceptual Case Frame Acquisition . . . 49

Cynthia A. Thompson and Raymond J. Mooney
Semantic Lexicon Acquisition for Learning Natural Language Interfaces . . . 57

Stephen D'Alessio, Keitha Murray, Robert Schiaffino and Aaron Kershenbaum
The Effect of Topological Structure on Hierarchical Text Categorization . . . 66

Alex Collier, Mike Pacey and Antoinette Renouf
Refining the Automatic Identification of Conceptual Relations in Large-scale Corpora . . . 76

Jeongwon Cha, Geunbae Lee and Jong-Hyeok Lee
Generalized Unknown Morpheme Guessing for Hybrid POS Tagging of Korean . . . 85

David Elworthy
Language Identification with Confidence Limits . . . 94

Raquel Martinez, Joseba Abaitua and Arantza Casillas
Aligning Tagged Bitexts . . . 102

Cornelia H. Parkes, Alexander M. Malek and Mitchell P. Marcus
Towards Unsupervised Extraction of Verb Paradigms from Large Corpora . . . 110

John Carroll, Guido Minnen and Ted Briscoe
Can Subcategorisation Probabilities Help a Statistical Parser? . . . 118

Eugene Charniak, Sharon Goldwater and Mark Johnson
Edge-Based Best-First Chart Parsing . . . 127

Paola Merlo and Suzanne Stevenson
What Grammars tell us about Corpora: the Case of Reduced Relative Clauses . . . 134

Wojciech Skut and Thorsten Brants
A Maximum-Entropy Partial Parser for Unrestricted Text . . . 143

Andrew Borthwick, John Sterling, Eugene Agichtein and Ralph Grishman
Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition . . . 152

Niyu Ge, John Hale and Eugene Charniak
A Statistical Approach to Anaphora Resolution . . . 161

Satoshi Sekine, Ralph Grishman and Hiroyuki Shinnou
A Decision Tree Method for Finding and Classifying Names in Japanese Texts . . . 171


Peter A. Heeman
POS Tagging versus Classes in Language Modeling . . . 179

Giuseppe Riccardi and Srinivas Bangalore
Automatic Acquisition of Phrase Grammars for Stochastic Language Modeling . . . 188

Min-Yen Kan, Judith L. Klavans and Kathleen R. McKeown
Linear Segmentation and Segment Significance . . . 197

Daniel Marcu
Improving Summarization through Rhetorical Parsing Tuning . . . 206

Tadashi Nomoto and Yuji Matsumoto
Discourse Parsing: A Decision Tree Approach . . . 216

Janyce M. Wiebe, Kenneth J. McKeever and Rebecca F. Bruce
Mapping Collocational Properties into Machine Learning Features . . . 225


Author Index

Abaitua, J. . . . 102
Agichtein, E. . . . 152
Bangalore, S. . . . 188
Borthwick, A. . . . 152
Brants, T. . . . 143
Briscoe, T. . . . 118
Bruce, R. F. . . . 225
Carroll, J. . . . 118
Casillas, A. . . . 102
Cha, J. . . . 85
Charniak, E. . . . 127, 161
Church, K. W. . . . 28
Collier, A. . . . 76
Collins, M. . . . 38
D'Alessio, S. . . . 66
Elworthy, D. . . . 94
Foster, G. . . . 18
Ge, N. . . . 161
Goldwater, S. . . . 127
Grishman, R. . . . 152, 171
Hale, J. . . . 161
Hall, S. . . . 1
Heeman, P. A. . . . 179
Hochberg, J. . . . 1
Ide, N. . . . 9
Isabelle, P. . . . 18
Johnson, M. . . . 127
Kan, M.-Y. . . . 197
Kershenbaum, A. . . . 66
Klavans, J. L. . . . 197
Lee, G. . . . 85
Lee, J.-H. . . . 85
Malek, A. M. . . . 110
Marcu, D. . . . 206
Marcus, M. P. . . . 110
Martinez, R. . . . 102
Matsumoto, Y. . . . 216
McKeever, K. J. . . . 225
McKeown, K. R. . . . 197
Merlo, P. . . . 134
Miller, S. . . . 38
Minnen, G. . . . 118
Mooney, R. J. . . . 57
Murray, K. . . . 66
Nie, J.-Y. . . . 18
Nomoto, T. . . . 216
Pacey, M. . . . 76
Parkes, C. H. . . . 110
Plamondon, P. . . . 18
Renouf, A. . . . 76
Riccardi, G. . . . 188
Riloff, E. . . . 49
Schiaffino, R. . . . 66
Schmelzenbach, M. . . . 49
Scovel, C. . . . 1
Sekine, S. . . . 171
Shinnou, H. . . . 171
Skut, W. . . . 143
Sterling, J. . . . 152
Stevenson, S. . . . 134
Thomas, T. . . . 1
Thompson, C. A. . . . 57
Wiebe, J. M. . . . 225
Yamamoto, M. . . . 28
