STA: An efficient data augmentation method for low-resource neural machine translation

Cited by: 2
Authors
Li, Fuxue [1 ,2 ]
Chi, Chuncheng [3 ]
Yan, Hong [2 ]
Liu, Beibei [3 ]
Shao, Mingzhi [3 ]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China
[2] Yingkou Inst Technol, Coll Elect Engn, Yingkou, Peoples R China
[3] Shenyang Univ Chem Technol, Shenyang, Peoples R China
Keywords
Data augmentation; neural machine translation; sentence trunk; mixture; concatenation
DOI
10.3233/JIFS-230682
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Transformer-based models have achieved state-of-the-art performance in neural machine translation (NMT). However, they rely on the availability of copious parallel corpora, and for low-resource language pairs the amount of parallel data is insufficient, resulting in poor translation quality. To alleviate this issue, this paper proposes an efficient data augmentation (DA) method named STA. First, pseudo-parallel sentence pairs are generated by translating sentence trunks with a target-to-source NMT model. Then, two strategies are introduced for merging the original data and the pseudo-parallel corpus to augment the training set. Experimental results on simulated and real low-resource translation tasks show that the proposed method improves translation quality over a strong baseline and also outperforms other data augmentation methods. Moreover, STA can further improve translation quality when combined with back-translation on extra monolingual data.
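The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `extract_trunk` is a crude hypothetical stand-in for the paper's sentence-trunk extraction, and `translate_to_source` is a placeholder for a trained target-to-source NMT model. The two merging strategies loosely mirror the "mixture" and "concatenation" keywords.

```python
import random

def extract_trunk(sentence):
    # Hypothetical trunk extractor: keep every other word as a crude
    # stand-in for selecting the sentence trunk; the paper's actual
    # extraction method is not given in this record.
    words = sentence.split()
    return " ".join(words[::2]) if len(words) > 2 else sentence

def translate_to_source(target_trunk):
    # Placeholder for the target-to-source NMT model; a real system
    # would run beam search with a trained Transformer here.
    return "[src] " + target_trunk

def generate_pseudo_pairs(target_sentences):
    # Build pseudo-parallel pairs by back-translating sentence trunks:
    # (synthetic source, original target).
    return [(translate_to_source(extract_trunk(t)), t) for t in target_sentences]

def merge_mixture(original_pairs, pseudo_pairs, seed=0):
    # Mixture strategy: shuffle original and pseudo pairs together.
    merged = original_pairs + pseudo_pairs
    random.Random(seed).shuffle(merged)
    return merged

def merge_concatenation(original_pairs, pseudo_pairs):
    # Concatenation strategy: append pseudo pairs after the originals.
    return original_pairs + pseudo_pairs

if __name__ == "__main__":
    original = [("ein Beispiel", "an example")]
    pseudo = generate_pseudo_pairs(["another training sentence for the model"])
    train_set = merge_concatenation(original, pseudo)
    print(len(train_set))
```

Either merged set would then be fed to standard NMT training; the record does not specify how the two strategies are weighted or combined.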
Pages: 121-132 (12 pages)