Data Augmentation for Low-Resource Neural Machine Translation

被引:200
|
作者
Fadaee, Marzieh [1 ]
Bisazza, Arianna [1 ]
Monz, Christof [1 ]
机构
[1] Univ Amsterdam, Inst Informat, Sci Pk 904, NL-1098 XH Amsterdam, Netherlands
关键词
D O I
10.18653/v1/P17-2090
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts. Experimental results on simulated low-resource settings show that our method improves translation quality by up to 2.9 BLEU points over the baseline and up to 3.2 BLEU over back-translation.
引用
收藏
页码:567 / 573
页数:7
相关论文
共 50 条
  • [1] A Diverse Data Augmentation Strategy for Low-Resource Neural Machine Translation
    Li, Yu
    Li, Xiao
    Yang, Yating
    Dong, Rui
    [J]. INFORMATION, 2020, 11 (05)
  • [2] A Bilingual Templates Data Augmentation Method for Low-Resource Neural Machine Translation
    Li, Fuxue
    Liu, Beibei
    Yan, Hong
    Shao, Mingzhi
    Xie, Peijun
    Li, Jiarui
    Chi, Chuncheng
    [J]. ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024, 2024, 14877 : 40 - 51
  • [3] STA: An efficient data augmentation method for low-resource neural machine translation
    Li, Fuxue
    Chi, Chuncheng
    Yan, Hong
    Liu, Beibei
    Shao, Mingzhi
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (01) : 121 - 132
  • [4] Generalized Data Augmentation for Low-Resource Translation
    Xia, Mengzhou
    Kong, Xiang
    Anastasopoulos, Antonios
    Neubig, Graham
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 5786 - 5796
  • [5] Rethinking Data Augmentation for Low-Resource Neural Machine Translation: A Multi-Task Learning Approach
    Sanchez-Cartagena, Victor M.
    Espla-Gomis, Miquel
    Antonio Perez-Ortiz, Juan
    Sanchez-Martinez, Felipe
    [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 8502 - 8516
  • [6] A Survey on Low-Resource Neural Machine Translation
    Wang, Rui
    Tan, Xu
    Luo, Renqian
    Qin, Tao
    Liu, Tie-Yan
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 4636 - 4643
  • [7] A Survey on Low-resource Neural Machine Translation
    Li, Hong-Zheng
    Feng, Chong
    Huang, He-Yan
    [J]. Zidonghua Xuebao/Acta Automatica Sinica, 2021, 47 (06): : 1217 - 1231
  • [8] Transformers for Low-resource Neural Machine Translation
    Gezmu, Andargachew Mekonnen
    Nuernberger, Andreas
    [J]. ICAART: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 1, 2022, : 459 - 466
  • [9] Rethinking the Exploitation of Monolingual Data for Low-Resource Neural Machine Translation
    Pang, Jianhui
    Yang, Baosong
    Wong, Derek Fai
    Wan, Yu
    Liu, Dayiheng
    Chao, Lidia Sam
    Xie, Jun
    [J]. COMPUTATIONAL LINGUISTICS, 2023, 50 (01) : 25 - 47
  • [10] Low-Resource Neural Machine Translation with Neural Episodic Control
    Wu, Nier
    Hou, Hongxu
    Sun, Shuo
    Zheng, Wei
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,