Refining Low-Resource Unsupervised Translation by Language Disentanglement of Multilingual Model

Cited by: 0
Authors
Nguyen, Xuan-Phi [1, 3]
Joty, Shafiq [1, 2]
Kui, Wu [3]
Aw, Ai Ti [3]
Affiliations
[1] Nanyang Technological University, Singapore, Singapore
[2] Salesforce Research, Palo Alto, CA, USA
[3] A*STAR Singapore, Institute for Infocomm Research (I2R), Singapore, Singapore
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Much recent work on unsupervised machine translation (UMT) implies that competent unsupervised translation of low-resource and unrelated languages, such as Nepali or Sinhala, is only possible if the model is trained in a massive multilingual environment, where these low-resource languages are mixed with high-resource counterparts. Nonetheless, while the high-resource languages greatly help kick-start the target low-resource translation tasks, the language discrepancy between them may hinder further improvement. In this work, we propose a simple refinement procedure to separate languages from a pre-trained multilingual UMT model so that it focuses only on the target low-resource task. Our method achieves the state of the art in the fully unsupervised translation tasks of English to Nepali, Sinhala, Gujarati, Latvian, Estonian and Kazakh, with BLEU score gains of 3.5, 3.5, 3.3, 4.1, 4.2, and 3.3, respectively. Our codebase is available at github.com/nxphi47/refine_unsup_multilingual_mt.
Pages: 13