Improved Unsupervised Neural Machine Translation with Semantically Weighted Back Translation for Morphologically Rich and Low Resource Languages

Cited by: 5
Authors
Chauhan, Shweta [1]
Saxena, Shefali [1]
Daniel, Philemon [1]
Affiliations
[1] Natl Inst Technol, Dept Elect & Commun, Hamirpur 177005, Himachal Pradesh, India
Keywords
Back translation; Neural machine translation; Evaluation metrics; Semantic analysis; Automatic evaluation
DOI
10.1007/s11063-021-10702-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Back-translation is an effective way to exploit monolingual data and improve the performance of neural machine translation models, and conducting it iteratively can improve the translation model further. In back-translation, pseudo sentence pairs are generated to train the translation systems with a reconstruction loss; however, not all pseudo sentence pairs are of good quality, and poor pairs can severely degrade the performance of neural machine translation systems. This paper proposes an approach to unsupervised neural machine translation that uses weighted back-translation as part of the training process, giving more weight to good pseudo-parallel sentence pairs. The weight is calculated as the round-trip semantic similarity score of each pseudo-parallel sentence pair. This overcomes the limitation of earlier approaches based on lexical metrics, especially for morphologically rich languages. Experimental results show an improvement of up to around 0.7% BLEU score over the baseline paper for morphologically rich languages (English-Hindi, English-Tamil, and English-Telugu) and 0.3% BLEU score for the low-resource Hindi-Kangri language pair.
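The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which semantic similarity measure is used or how the weight enters the loss. The sketch assumes that sentence embeddings for the original sentence and its round-trip translation are already available from some sentence encoder, uses cosine similarity as the score, and normalizes the weighted loss by the sum of the weights. All function names here are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def round_trip_weight(src_vec: Sequence[float], round_trip_vec: Sequence[float]) -> float:
    """Weight for one pseudo-parallel pair.

    Assumption: the weight is the cosine similarity between the embedding of the
    original sentence and the embedding of its round-trip translation, with
    negative similarities clipped to 0 so that a weight never flips the loss sign.
    """
    return max(0.0, cosine_similarity(src_vec, round_trip_vec))


def weighted_back_translation_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of per-sentence reconstruction losses.

    Assumption: normalizing by the sum of the weights is an illustrative choice,
    not something stated in the abstract.
    """
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0.0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / total_weight
```

Under these assumptions, a pseudo pair whose round-trip translation drifts away from the original sentence receives a weight near 0 and contributes little to the batch loss, while a faithful pair keeps a weight near 1.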
Pages: 1707-1726
Number of pages: 20
Source: Neural Processing Letters, 2022, 54: 1707-1726