Proceedings of the
Third SIGHAN Workshop
on Chinese Language Processing
Held in cooperation with ACL-2004
Order copies of this and other ACL proceedings from:
Association for Computational Linguistics (ACL)
73 Landmark Center
East Stroudsburg, PA 18301
USA
ORGANIZERS:
Chairs: Qin Lu & Oliver Streiter
Proceedings: Qin Lu & Oliver Streiter
PROGRAM COMMITTEE:
Andi Wu - Microsoft, USA
Changning Huang - Microsoft, China
Chu-Ren Huang - Academia Sinica, Taiwan
Joyce Chai - Michigan State Univ., USA
Keh-Jiann Chen - Academia Sinica, Taiwan
Li Wenjie - The Hong Kong Polytechnic University, Hong Kong
Martha Palmer - Univ. of Pennsylvania, USA
Nianwen Xue - Univ. of Pennsylvania, USA
Oliver Streiter - EURAC, Italy
Qiang Zhou - Tsinghua University, China
Qing Ma - Ryukoku University, Japan
Qin Lu - The Hong Kong Polytechnic University, Hong Kong
Sui Zhifang - Peking University, China
Sun Maosong - Tsinghua University, China
Tom Emerson - Basis Technology Corp, USA
FURTHER INFORMATION:
Dr. Qin Lu
Department of Computing,
The Hong Kong Polytechnic University,
Hung Hom, Kowloon, Hong Kong
Technical Program Schedule
Sunday, July 25
8:45-8:50
Welcome
8:50-9:10
Segmentation of Chinese Long Sentences Using Commas
Meixun Jin, Mi-Young Kim, Dongil Kim and Jong-Hyeok Lee
9:15-9:35
A Preliminary Study on Probabilistic Models for Chinese Abbreviations
Jing-Shin Chang and Yu-Tso Lai
9:40-10:00
Document Re-ranking Based on Global and Local Terms
Lingpeng Yang, Donghong Ji and Li Tang
Coffee Break
10:30-10:50
Chinese Chunking with Another Type of Spec
Hongqiao Li, Changning Huang, Jianfeng Gao and Xiaozhong Fan
10:55-11:15
Chinese Word Segmentation by Classification of Characters
Chooi-Ling Goh, Masayuki Asahara and Yuji Matsumoto
11:20-11:40
Automated Alignment and Extraction of Bilingual Domain Ontology for
Medical Domain Web Search
Jui-Feng Yeh, Chung-Hsien Wu, Ming-Jun Chen and Liang-Chih Yu
11:45-12:05
Using Synonym Relations in Chinese Collocation Extraction
Wanyin Li, Qin Lu and Ruifeng Xu
Lunch Break
13:50-14:10
Combining Prosodic and Text Features for Segmentation of Mandarin
Broadcast News
Gina-Anne Levow
14:15-14:35
Automatic Semantic Role Assignment for a Tree Structure
Jia-Ming You and Keh-Jiann Chen
14:40-15:00
A Large-Scale Semantic Structure for Chinese Sentences
Li Tang, Donghong Ji and Lingpeng Yang
15:05-15:25
Aligning Bilingual Corpora Using Sentences Location Information
Weigang Li, Ting Liu, Zhen Wang and Sheng Li
Poster Session
15:50-17:10
An Integrated Method for Chinese Unknown Word Extraction
Zhiyong Luo and Rou Song
15:50-17:10
Adaptive Compression-based Approach for Chinese Pinyin Input
JinHu Huang and David Powers
15:50-17:10
Character-Sense Association and Compounding Template Similarity:
Automatic Semantic Classification
Chao-Jan Chen
15:50-17:10
Combining Neural Networks and Statistics for Chinese Word Sense
Disambiguation
Zhimao Lu, Ting Liu and Sheng Li
15:50-17:10
A Statistical Model for Hangeul-Hanja Conversion in Terminology Domain
Jin-Xia Huang, Sun-Mee Bae and Key-Sun Choi
15:50-17:10
Chinese Term Extraction from Web Pages Based on Compound Term
Productivity
Hiroshi Nakagawa, Hiroyuki Kojima and Akira Maeda
15:50-17:10
The Construction of A Chinese Shallow Treebank
Ruifeng Xu, Qin Lu, Yin Li and Wanyin Li
15:50-17:10
Do We Need Chinese Word Segmentation for Statistical Machine
Translation?
Jia Xu, Richard Zens and Hermann Ney
15:50-17:10
A New Chinese Natural Language Understanding Architecture Based on
Multilayer Search Mechanism
Wanxiang Che, Ting Liu and Sheng Li
15:50-17:10
A Semi-Supervised Approach to Build Annotated Corpus for Chinese
Named Entity Recognition
Xiaoshan Fang, Jianfeng Gao and Huanye Sheng
15:50-17:10
An Enhanced Model for Chinese Word Segmentation and Part-of-Speech
Tagging
Feng Jiang, Hui Liu, Yuquan Chen and Ruzhan Lu
SIGHAN Meeting
17:20-18:20
Organizational Meeting