Improved machine translation performance via parallel sentence extraction from comparable corpora

Cited by: 0
Authors
Munteanu, DS [1]
Fraser, A [1]
Marcu, D [1]
Affiliation
[1] University of Southern California, Information Sciences Institute, Marina del Rey, CA 90292, USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a novel method for discovering parallel sentences in comparable corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether or not they are translations of each other. Using this approach, we extract parallel data from large, Gigaword, Arabic and English newspaper corpora. We evaluate the quality of the extracted data by showing that it improves the performance of a baseline statistical machine translation system.
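The classification step described in the abstract can be illustrated with a minimal sketch. With two classes, a maximum entropy classifier is equivalent to logistic regression, which is what the code below trains. Everything specific here is an assumption made for illustration and does not come from the paper: the toy Spanish-English lexicon `LEXICON`, the two features (length ratio and lexicon coverage), the training pairs in `TRAIN`, and the function names. The authors' actual feature set and data are not given in this record.

```python
import math

# Hypothetical toy bilingual lexicon (source word -> possible target words).
# This is illustrative only; it is not the resource used in the paper.
LEXICON = {
    "la": {"the"}, "el": {"the"},
    "casa": {"house", "home"}, "roja": {"red"},
    "es": {"is"}, "perro": {"dog"}, "corre": {"runs"},
}

def features(src, tgt):
    """Return [bias, length ratio, share of source words translated in tgt]."""
    s, t = src.lower().split(), tgt.lower().split()
    tgt_words = set(t)
    covered = sum(1 for w in s if LEXICON.get(w, set()) & tgt_words)
    ratio = min(len(s), len(t)) / max(len(s), len(t))
    return [1.0, ratio, covered / len(s)]

def prob_parallel(weights, x):
    """Probability that the pair is parallel under the logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=2000, lr=0.5):
    """Fit the weights by stochastic gradient ascent on the log-likelihood."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (src, tgt), label in examples:
            x = features(src, tgt)
            err = label - prob_parallel(weights, x)
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
    return weights

# Assumed toy training set: label 1 = translations, 0 = not translations.
TRAIN = [
    (("la casa es roja", "the house is red"), 1),
    (("el perro corre", "the dog runs"), 1),
    (("la casa es roja", "the dog runs"), 0),
    (("el perro corre", "stock markets fell sharply on monday"), 0),
]

weights = train(TRAIN)
for (src, tgt), label in TRAIN:
    p = prob_parallel(weights, features(src, tgt))
    print(f"{src!r} / {tgt!r}: label={label}, p(parallel)={p:.3f}")
```

In an extraction setting, candidate sentence pairs drawn from comparable documents would be scored this way and kept only when the probability exceeds a chosen threshold; the threshold value and the candidate selection step are left out of this sketch.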
Pages: 265-272
Page count: 8