Bilingual sentence alignment based on punctuation statistics and lexicon

Citations: 0
Authors
Chuang, TC
Wu, JC
Lin, T
Shei, WC
Chang, JS
Affiliations
[1] Vanung Univ, Dept Comp Sci, Chungli 320, Taiwan
[2] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu 300, Taiwan
[3] Natl Chiao Tung Univ, Dept Telecommun, Hsinchu 300, Taiwan
Source
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a new method of aligning bilingual parallel texts based on punctuation statistics and lexical information. It is demonstrated that punctuation statistics are an effective means of achieving good results. The task of sentence alignment of bilingual texts written in disparate language pairs like English and Chinese is reportedly more difficult. We examine the feasibility of using punctuation for high-accuracy sentence alignment. An encouraging precision rate is demonstrated in aligning sentences in bilingual parallel corpora based solely on punctuation statistics. Improved results were obtained when both punctuation statistics and lexical information were employed. We have experimented with an implementation of the proposed method on the parallel corpora of Sinorama Magazine and Records of the Hong Kong Legislative Council, with satisfactory results.
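The abstract does not spell out the model, so the following is only a minimal sketch of how punctuation counts alone could drive sentence alignment. It is not the authors' implementation. The punctuation inventory `PUNCT`, the squared-difference cost in `pair_cost`, the fixed penalty for beads other than 1-1, and the function names are all assumptions made for illustration; the dynamic program is the familiar length-based scheme with punctuation counts substituted for sentence length.

```python
import math

# Assumed punctuation inventory (ASCII plus a few full-width CJK marks).
# The set used in the paper is not given in this record.
PUNCT = set(",.;:!?") | set(",。;:!?、")

def punct_count(sentence):
    """Count the punctuation marks in one sentence."""
    return sum(1 for ch in sentence if ch in PUNCT)

def pair_cost(src_group, tgt_group):
    """Cost of aligning a group of source sentences with a group of target
    sentences. Assumption: squared difference of punctuation counts, plus a
    fixed penalty of 1.0 for any bead other than 1-1."""
    a = sum(punct_count(s) for s in src_group)
    b = sum(punct_count(t) for t in tgt_group)
    penalty = 0.0 if (len(src_group), len(tgt_group)) == (1, 1) else 1.0
    return (a - b) ** 2 + penalty

def align(src, tgt):
    """Dynamic-programming alignment over the beads 1-1, 1-2, 2-1, 1-0, 0-1.
    Returns a list of (source indices, target indices) pairs."""
    beads = [(1, 1), (1, 2), (2, 1), (1, 0), (0, 1)]
    n, m = len(src), len(tgt)
    best = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == math.inf:
                continue
            for di, dj in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cost = best[i][j] + pair_cost(src[i:ni], tgt[j:nj])
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from the end of both texts.
    result = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        result.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    result.reverse()
    return result

english = ["Hello, world.", "How are you?", "Fine, thanks; and you?"]
chinese = ["你好,世界。", "你好吗?", "很好,谢谢;你呢?"]
alignment = align(english, chinese)
```

With these toy sentences the punctuation counts match pairwise, so the cheapest path consists of three 1-1 beads. Adding lexical information, as the abstract describes, would amount to folding a dictionary-based term into `pair_cost`; how the paper weights the two signals is not stated here.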
Pages: 224-232
Page count: 9
Related papers
50 in total
  • [1] Sentence alignment based on the text length between punctuation marks
    Fattah, Mohamed Abdel
    Ren, Fuji
    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2008, 11 (04): : 445 - 465
  • [2] Adaptive bilingual sentence alignment
    Chuang, TC
    You, GN
    Chang, JS
    MACHINE TRANSLATION: FROM RESEARCH TO REAL USERS, 2002, 2499 : 21 - 30
  • [3] Bilingual dictionary based sentence alignment for Chinese English bitext
    Zhao, TJ
    Yang, MY
    Qian, LP
    Fang, GL
    ADVANCES IN MULTIMODAL INTERFACES - ICMI 2000, PROCEEDINGS, 2000, 1948 : 253 - 259
  • [4] Fast and accurate sentence alignment of bilingual corpora
    Moore, RC
    MACHINE TRANSLATION: FROM RESEARCH TO REAL USERS, 2002, 2499 : 135 - 144
  • [5] Bilingual sentence alignment: balancing robustness and accuracy
    Simard, M.
    Plamondon, P.
    Machine Translation, 1998, 13 (01): : 59 - 80
  • [7] The Acquisition and Sentence Alignment for Academic Bilingual Resources Based on Web Paper Libraries
    Sun, Yueheng
    Men, Rui
    Ni, Weijie
    2009 INTERNATIONAL CONFERENCE ON RESEARCH CHALLENGES IN COMPUTER SCIENCE, ICRCCS 2009, 2009, : 45 - 48
  • [8] Graph-Based Bilingual Sentence Alignment from Large Scale Web Pages
    Zhu, Yihe
    Wang, Haofen
    Ouyang, Xixiu
    Yu, Yong
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, 2011, 6716 : 209 - 216
  • [9] Bilingual Word Embedding with Sentence Combination CNN for 1-to-N Sentence Alignment
    Ren, Xinyuan
    Fu, Xiangling
    Zhou, Xuesi
    Liu, Chunsheng
    Gao, Songfeng
    Peng, Lei
    2020 4TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2020, 2020, : 119 - 124
  • [10] Experiments with a PPM Compression-Based Method for English-Chinese Bilingual Sentence Alignment
    Liu, Wei
    Chang, Zhipeng
    Teahan, William J.
    STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2014, 2014, 8791 : 70 - 81