Refining Low-Resource Unsupervised Translation by Language Disentanglement of Multilingual Model

Authors
Nguyen, Xuan-Phi [1, 3]
Joty, Shafiq [1, 2]
Kui, Wu [3]
Aw, Ai Ti [3]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Salesforce Res, Palo Alto, CA USA
[3] ASTAR Singapore, Inst Infocomm Res I2R, Singapore, Singapore
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Numerous recent works on unsupervised machine translation (UMT) imply that competent unsupervised translations of low-resource and unrelated languages, such as Nepali or Sinhala, are only possible if the model is trained in a massive multilingual environment, where these low-resource languages are mixed with high-resource counterparts. Nonetheless, while the high-resource languages greatly help kick-start the target low-resource translation tasks, the language discrepancy between them may hinder their further improvement. In this work, we propose a simple refinement procedure to separate languages from a pre-trained multilingual UMT model for it to focus on only the target low-resource task. Our method achieves the state of the art in the fully unsupervised translation tasks of English to Nepali, Sinhala, Gujarati, Latvian, Estonian and Kazakh, with BLEU score gains of 3.5, 3.5, 3.3, 4.1, 4.2, and 3.3, respectively. Our codebase is available at github.com/nxphi47/refine_unsup_multilingual_mt.
Pages: 13