Parallel Sentence Extraction from Comparable Corpora with Neural Network Features

Cited by: 0
Authors
Chu, Chenhui [1]
Dabre, Raj [2]
Kurohashi, Sadao [2]
Affiliations
[1] Japan Sci & Technol Agcy, Kawaguchi, Saitama, Japan
[2] Kyoto Univ, Grad Sch Informat, Kyoto, Japan
Keywords
Parallel Sentence Extraction; Comparable Corpora; Neural Network;
DOI
Not available
Chinese Library Classification number
H [Language, Writing];
Subject classification code
05;
Abstract
Parallel corpora are crucial for machine translation (MT); however, they are quite scarce for most language pairs and domains. As comparable corpora are far more widely available, many studies have been conducted to extract parallel sentences from them for MT. In this paper, we exploit the neural network features acquired from neural MT for parallel sentence extraction. We observe significant improvements in both sentence extraction accuracy and MT performance.
Pages: 2931-2935
Page count: 5
Related papers
50 in total
  • [1] A Multilingual Dataset for Evaluating Parallel Sentence Extraction from Comparable Corpora
    Zweigenbaum, Pierre
    Sharoff, Serge
    Rapp, Reinhard
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 3828 - 3833
  • [2] Parallel Sentence Alignment from Biomedical Comparable Corpora
    Cardon, Remi
    Grabar, Natalia
    [J]. DIGITAL PERSONALIZED HEALTH AND MEDICINE, 2020, 270 : 362 - 366
  • [3] Improved machine translation performance via parallel sentence extraction from comparable corpora
    Munteanu, DS
    Fraser, A
    Marcu, D
    [J]. HLT-NAACL 2004: HUMAN LANGUAGE TECHNOLOGY CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE MAIN CONFERENCE, 2004, : 265 - 272
  • [4] Parallel sentence generation from comparable corpora for improved SMT
    Rauf, Sadaf Abdul
    Schwenk, Holger
    [J]. MACHINE TRANSLATION, 2011, 25 (04) : 341 - 375
  • [5] PEXACC: A Parallel Sentence Mining Algorithm from Comparable Corpora
    Ion, Radu
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 2181 - 2188
  • [6] A Holistic Approach to Bilingual Sentence Fragment Extraction from Comparable Corpora
    Khademian, Mahdi
    Taghipour, Kaveh
    Mansour, Saab
    Khadivi, Shahram
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 4073 - 4079
  • [7] Integrated Parallel Sentence and Fragment Extraction from Comparable Corpora: A Case Study on Chinese-Japanese Wikipedia
    Chu, Chenhui
    Nakazawa, Toshiaki
    Kurohashi, Sadao
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2016, 15 (02)
  • [8] Sentence alignment for monolingual comparable corpora
    Barzilay, R
    Elhadad, N
    [J]. PROCEEDINGS OF THE 2003 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2003, : 25 - 32
  • [9] Extracting Parallel Phrases from Comparable Corpora
    Zhang, Jiexin
    Cao, Hailong
    Zhao, Tiejun
    [J]. PROCEEDINGS OF THE 2014 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2014), 2014, : 166 - 169
  • [10] NOISY-PARALLEL AND COMPARABLE CORPORA FILTERING METHODOLOGY FOR THE EXTRACTION OF BI-LINGUAL EQUIVALENT DATA AT SENTENCE LEVEL
    Wolk, Krzysztof
    [J]. COMPUTER SCIENCE-AGH, 2015, 16 (02) : 169 - 184