Dynamic Data Selection and Weighting for Iterative Back-Translation

Cited by: 0
Authors
Dou, Zi-Yi [1]
Anastasopoulos, Antonios [2]
Neubig, Graham [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[2] George Mason Univ, Dept Comp Sci, Fairfax, VA USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Back-translation has proven to be an effective method to utilize monolingual data in neural machine translation (NMT), and iteratively conducting back-translation can further improve the model performance. Selecting which monolingual data to back-translate is crucial, as we require that the resulting synthetic data are of high quality and reflect the target domain. To achieve these two goals, data selection and weighting strategies have been proposed, with a common practice being to select samples close to the target domain but also dissimilar to the average general-domain text. In this paper, we provide insights into this commonly used approach and generalize it to a dynamic curriculum learning strategy, which is applied to iterative back-translation models. In addition, we propose weighting strategies based on both the current quality of the sentence and its improvement over the previous iteration. We evaluate our models on domain adaptation, low-resource, and high-resource MT settings and on two language pairs. Experimental results demonstrate that our methods achieve improvements of up to 1.8 BLEU points over competitive baselines.
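As a rough illustration of the selection criterion described in the abstract (prefer monolingual sentences that look in-domain but unlike average general-domain text), the sketch below computes a cross-entropy difference score, keeps a growing fraction of the best-scoring sentences per iteration, and mixes current quality with improvement over the previous iteration when weighting synthetic pairs. The function names, the weighting combination, and the curriculum schedule are illustrative assumptions, not the authors' exact formulation.

```python
def selection_score(nll_in_domain, nll_general):
    """Cross-entropy difference score: lower means the sentence resembles
    the target domain while differing from average general-domain text.
    Per-sentence negative log-likelihoods are assumed to come from an
    in-domain and a general-domain language model (hypothetical inputs)."""
    return nll_in_domain - nll_general


def curriculum_subset(scored_sentences, keep_fraction):
    """Keep the best-scoring fraction of monolingual sentences for the
    current back-translation iteration; growing keep_fraction across
    iterations yields a simple curriculum over the monolingual data."""
    k = max(1, int(len(scored_sentences) * keep_fraction))
    return sorted(scored_sentences, key=lambda pair: pair[1])[:k]


def sample_weight(quality_now, quality_prev, alpha=0.5):
    """Hypothetical weighting that combines the current quality of a
    synthetic sentence pair with its improvement since the last iteration."""
    return alpha * quality_now + (1 - alpha) * max(0.0, quality_now - quality_prev)


# Example usage with made-up (sentence, score) pairs and quality values.
sentences = [("sent A", 2.1), ("sent B", -0.3), ("sent C", 0.8)]
print(curriculum_subset(sentences, keep_fraction=0.66))   # -> [('sent B', -0.3)]
print(sample_weight(quality_now=0.7, quality_prev=0.5))   # -> 0.45
```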
Pages: 5894-5904
Number of pages: 11
Related papers
50 records in total
  • [1] Improving Back-Translation with Iterative Filtering and Data Selection for Sinhala-English NMT
    Epaliyana, Koshiya
    Ranathunga, Surangika
    Jayasena, Sanath
    [J]. MORATUWA ENGINEERING RESEARCH CONFERENCE (MERCON 2021) / 7TH INTERNATIONAL MULTIDISCIPLINARY ENGINEERING RESEARCH CONFERENCE, 2021, : 438 - 443
  • [2] Iterative Back-Translation for Neural Machine Translation
    Vu Cong Duy Hoang
    Koehn, Philipp
    Haffari, Gholamreza
    Cohn, Trevor
    [J]. NEURAL MACHINE TRANSLATION AND GENERATION, 2018, : 18 - 24
  • [3] Iterative Domain-Repaired Back-Translation
    Wei, Hao-Ran
    Zhang, Zhirui
    Chen, Boxing
    Luo, Weihua
    [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 5884 - 5893
  • [4] Revisiting Iterative Back-Translation from the Perspective of Compositional Generalization
    Guo, Yinuo
    Zhu, Hualei
    Lin, Zeqi
    Chen, Bei
    Lou, Jian-Guang
    Zhang, Dongmei
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7601 - 7609
  • [5] Back-translation in Translation Teaching
    刘聪
    [J]. 读与写(教育教学刊), 2018, 15 (10) : 3 - 3
  • [6] Tagged Back-Translation
    Caswell, Isaac
    Chelba, Ciprian
    Grangier, David
    [J]. FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), VOL 1: RESEARCH PAPERS, 2019, : 53 - 63
  • [7] Improving Sign Language Translation with Monolingual Data by Sign Back-Translation
    Zhou, Hao
    Zhou, Wengang
    Qi, Weizhen
    Pu, Junfu
    Li, Houqiang
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 1316 - 1325
  • [8] Understanding Back-Translation at Scale
    Edunov, Sergey
    Ott, Myle
    Auli, Michael
    Grangier, David
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 489 - 500
  • [9] EXPLICITATION AND IMPLICITATION IN BACK-TRANSLATION
    Makkos, Aniko
    Robin, Edina
    [J]. CURRENT TRENDS IN TRANSLATION TEACHING AND LEARNING E, 2014, 1 : 151 - 182
  • [10] Generalizing Back-Translation in Neural Machine Translation
    Graca, Miguel
    Kim, Yunsu
    Schamper, Julian
    Khadivi, Shahram
    Ney, Hermann
    [J]. FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), VOL 1: RESEARCH PAPERS, 2019, : 45 - 52