Back-Translation Sampling by Targeting Difficult Words in Neural Machine Translation

Cited by: 0
Authors
Fadaee, Marzieh [1]
Monz, Christof [1]
Affiliations
[1] Univ Amsterdam, Informat Inst, Sci Pk 904, NL-1098 XH Amsterdam, Netherlands
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Neural Machine Translation has achieved state-of-the-art performance for several language pairs using a combination of parallel and synthetic data. Synthetic data is often generated by back-translating sentences randomly sampled from monolingual data using a reverse translation model. While back-translation has been shown to be very effective in many cases, it is not entirely clear why. In this work, we explore different aspects of back-translation, and show that words with high prediction loss during training benefit most from the addition of synthetic data. We introduce several variations of sampling strategies targeting difficult-to-predict words using prediction losses and frequencies of words. In addition, we also target the contexts of difficult words and sample sentences that are similar in context. Experimental results for the WMT news translation task show that our method improves translation quality by up to 1.7 and 1.2 BLEU points over back-translation using random sampling for German→English and English→German, respectively.
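The idea of sampling monolingual sentences that contain difficult-to-predict words can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how losses are aggregated or how sentences are chosen, so the per-word mean losses, the `threshold`, the `max_per_word` cap, and both function names are assumptions made for illustration. The sketch also uses prediction loss only and leaves out the frequency-based and context-based variants the abstract mentions.

```python
import random
from collections import Counter


def difficult_words(mean_loss, threshold):
    """Return the words whose mean prediction loss exceeds `threshold`.

    `mean_loss` is a hypothetical dict mapping a target word to its
    average training loss; how it is computed is assumed, not sourced.
    """
    return {word for word, loss in mean_loss.items() if loss > threshold}


def sample_targeted(monolingual, hard_words, max_per_word, seed=0):
    """Select sentences that contain at least one difficult word.

    Sentences are visited in a shuffled order. A sentence is kept if it
    contains a difficult word that has been covered fewer than
    `max_per_word` times so far (an assumed cap to avoid oversampling
    a single word).
    """
    rng = random.Random(seed)
    order = list(range(len(monolingual)))
    rng.shuffle(order)

    covered = Counter()
    selected = []
    for i in order:
        tokens = monolingual[i].split()
        hits = {t for t in tokens if t in hard_words and covered[t] < max_per_word}
        if hits:
            selected.append(i)
            covered.update(hits)
    # Return the chosen sentences in their original corpus order.
    return [monolingual[i] for i in sorted(selected)]


# Toy data, invented purely for illustration.
mean_loss = {"the": 0.4, "cat": 1.1, "sat": 0.9, "quokka": 6.2, "ameliorate": 5.5}
monolingual = [
    "the cat sat",
    "a quokka sat",
    "we ameliorate things",
    "the quokka ran",
]

hard = difficult_words(mean_loss, threshold=5.0)
selected = sample_targeted(monolingual, hard, max_per_word=1)
```

In a back-translation pipeline, the selected sentences would then be translated by the reverse model to produce synthetic parallel data, in place of a uniformly random sample.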
Pages: 436-446
Number of pages: 11
Related papers
50 in total
  • [1] Iterative Back-Translation for Neural Machine Translation
    Vu Cong Duy Hoang
    Koehn, Philipp
    Haffari, Gholamreza
    Cohn, Trevor
    [J]. NEURAL MACHINE TRANSLATION AND GENERATION, 2018, : 18 - 24
  • [2] Generalizing Back-Translation in Neural Machine Translation
    Graca, Miguel
    Kim, Yunsu
    Schamper, Julian
    Khadivi, Shahram
    Ney, Hermann
    [J]. FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), VOL 1: RESEARCH PAPERS, 2019, : 45 - 52
  • [3] Neural Machine Translation Based on Back-Translation for Multilingual Translation Evaluation Task
    Lai, Siyu
    Yang, Yueting
    Xu, Jin'an
    Chen, Yufeng
    Huang, Hui
    [J]. MACHINE TRANSLATION, CCMT 2020, 2020, 1328 : 132 - 141
  • [4] Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation
    Wu, Jiawei
    Wang, Xin
    Wang, William Yang
    [J]. 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 1173 - 1183
  • [5] On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation
    Liu, Xuebo
    Wang, Longyue
    Wong, Derek F.
    Ding, Liang
    Chao, Lidia S.
    Shi, Shuming
    Tu, Zhaopeng
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 2900 - 2907
  • [6] On The Evaluation of Machine Translation Systems Trained With Back-Translation
    Edunov, Sergey
    Ott, Myle
    Ranzato, Marc'Aurelio
    Auli, Michael
    [J]. 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 2836 - 2846
  • [7] Back-translation in Translation Teaching
    刘聪
    [J]. 读与写(教育教学刊), 2018, 15 (10) : 3 - 3
  • [8] A Joint Back-Translation and Transfer Learning Method for Low-Resource Neural Machine Translation
    Luo, Gong-Xu
    Yang, Ya-Ting
    Dong, Rui
    Chen, Yan-Hong
    Zhang, Wen-Bo
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [9] Enhancement of English-Bengali Machine Translation Leveraging Back-Translation
    Mondal, Subrota Kumar
    Wang, Chengwei
    Chen, Yijun
    Cheng, Yuning
    Huang, Yanbo
    Dai, Hong-Ning
    Kabir, H. M. Dipu
    [J]. APPLIED SCIENCES-BASEL, 2024, 14 (15):
  • [10] Evaluation of the Validity of Back-Translation as a Method of Assessing the Accuracy of Machine Translation
    Miyabe, Mai
    Yoshino, Takashi
    [J]. 2015 INTERNATIONAL CONFERENCE ON CULTURE AND COMPUTING (CULTURE COMPUTING), 2015, : 145 - 150