Self-Training for Unsupervised Neural Machine Translation in Unbalanced Training Data Scenarios

Cited by: 0
Authors
Sun, Haipeng [1 ,2 ]
Wang, Rui [3 ]
Chen, Kehai [4 ]
Utiyama, Masao [4 ]
Sumita, Eiichiro [4 ]
Zhao, Tiejun [1 ]
Affiliations
[1] Harbin Inst Technol, Harbin, Peoples R China
[2] JD AI Res, Beijing, Peoples R China
[3] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[4] Natl Inst Informat & Commun Technol NICT, Kyoto, Japan
DOI: not available
CLC number: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Unsupervised neural machine translation (UNMT), which relies solely on massive monolingual corpora, has achieved remarkable results on several translation tasks. However, in real-world scenarios, massive monolingual corpora do not exist for some extremely low-resource languages, such as Estonian, and UNMT systems usually perform poorly when there is no adequate training corpus for one of the languages. In this paper, we first define and analyze the unbalanced training data scenario for UNMT. Based on this scenario, we propose UNMT self-training mechanisms to train a robust UNMT system and improve its performance in this case. Experimental results on several language pairs show that the proposed methods substantially outperform conventional UNMT systems.
Pages: 3975-3981 (7 pages)