Generalizing Back-Translation in Neural Machine Translation

Cited by: 0
Authors
Graca, Miguel [1, 3]
Kim, Yunsu [1]
Schamper, Julian [1, 3]
Khadivi, Shahram [2]
Ney, Hermann [1]
Affiliations
[1] RWTH Aachen University, Human Language Technology and Pattern Recognition Group, Aachen, Germany
[2] eBay Inc, Aachen, Germany
[3] DeepL GmbH, Cologne, Germany
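Funding
European Research Council
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract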
Back-translation, that is, data augmentation by translating target-side monolingual data, is a crucial component of modern neural machine translation (NMT). In this work, we reformulate back-translation in the scope of cross-entropy optimization of an NMT model, clarifying its underlying mathematical assumptions and approximations beyond its heuristic usage. Our formulation covers broader synthetic data generation schemes, including sampling from a target-to-source NMT model. With this formulation, we point out fundamental problems of the sampling-based approaches and propose to remedy them by (i) disabling label smoothing for the target-to-source model and (ii) sampling from a restricted search space. Our statements are investigated on the WMT 2018 German ↔ English news translation task.
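The two remedies named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the record does not say how the search space is restricted, so top-k truncation is used here purely as an assumed stand-in, and the function names (`smooth_labels`, `restrict_to_top_k`, `sample_token`) and the smoothing variant (spreading `epsilon` uniformly over the non-target tokens) are hypothetical choices made for illustration.

```python
import random


def smooth_labels(target_index, vocab_size, epsilon=0.1):
    """Label-smoothed training target (one common variant, assumed here).

    The correct token gets 1 - epsilon; the remaining epsilon is spread
    uniformly over all other tokens. A model trained on such targets is
    encouraged to keep some probability on every token, plausible or not.
    """
    off_value = epsilon / (vocab_size - 1)
    dist = [off_value] * vocab_size
    dist[target_index] = 1.0 - epsilon
    return dist


def restrict_to_top_k(probs, k):
    """Keep only the k most probable tokens and renormalize (assumed scheme)."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    restricted = [0.0] * len(probs)
    for i in top:
        restricted[i] = probs[i] / mass
    return restricted


def sample_token(probs, rng):
    """Draw one token index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy next-token distribution of a target-to-source model: two plausible
    # tokens plus a low-probability tail.
    next_token_probs = [0.5, 0.3, 0.1, 0.05, 0.05]
    restricted = restrict_to_top_k(next_token_probs, k=2)
    # Unrestricted sampling can pick tail tokens; restricted sampling cannot.
    print([sample_token(next_token_probs, rng) for _ in range(10)])
    print([sample_token(restricted, rng) for _ in range(10)])
```

In this toy setting, disabling label smoothing corresponds to training with `epsilon=0`, which removes the explicit incentive to put probability on the tail, while the restriction step keeps the sampler from drawing tail tokens that remain.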
Pages: 45-52
Page count: 8