The neural machine translation models for the low-resource Kazakh-English language pair

Cited by: 7
Authors
Karyukin, Vladislav [1 ]
Rakhimova, Diana [1 ,2 ]
Karibayeva, Aidana [1 ]
Turganbayeva, Aliya [1 ]
Turarbek, Asem [1 ]
Affiliations
[1] Al Farabi Kazakh Natl Univ, Dept Informat Syst, Alma Ata, Kazakhstan
[2] Inst Informat & Computat Technol, Alma Ata, Kazakhstan
Keywords
Neural machine translation; Forward translation; Backward translation; Seq2Seq; RNN; BRNN; Transformer; OpenNMT; English; Kazakh;
DOI
10.7717/peerj-cs.1224
CLC number
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The development of the machine translation field has been driven by people's need to communicate globally by automatically translating words, sentences, and texts from one language into another. Neural machine translation has become one of the most significant approaches in recent years. It requires large parallel corpora, which are not available for low-resource languages such as Kazakh, making it difficult for neural machine translation models to achieve high performance. This article explores existing methods for dealing with low-resource languages by artificially increasing the size of the corpora and thereby improving the performance of Kazakh-English machine translation models: forward translation, backward translation, and transfer learning. The Sequence-to-Sequence (recurrent neural network and bidirectional recurrent neural network) and Transformer neural machine translation architectures, with their features and specifications, are then considered for the experiments on training models with parallel corpora. The experimental part focuses on building models for high-quality translation of formal social, political, and scientific texts by generating synthetic parallel sentences from existing monolingual Kazakh data with the forward translation approach and combining them with parallel corpora parsed from official government websites. The resulting corpus of 380,000 parallel Kazakh-English sentences is used to train the recurrent neural network (RNN), bidirectional recurrent neural network (BRNN), and Transformer models of the OpenNMT framework. The quality of the trained models is evaluated with the BLEU, WER, and TER metrics, and sample translations are also analyzed. The RNN and BRNN models produced more precise translations than the Transformer model.
The Byte-Pair Encoding (BPE) tokenization technique yielded better metric scores and translations than word-level tokenization. The bidirectional recurrent neural network with Byte-Pair Encoding showed the best performance, with a BLEU of 0.49, a WER of 0.51, and a TER of 0.45.
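The Byte-Pair Encoding tokenization mentioned above can be illustrated with a minimal sketch of the standard BPE merge-learning loop (iteratively merging the most frequent adjacent symbol pair). This is a generic illustration, not the paper's implementation; the toy corpus below is illustrative and unrelated to the Kazakh-English data:

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every free-standing occurrence of the pair into one symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    replacement = "".join(pair)
    return {pattern.sub(replacement, word): freq for word, freq in vocab.items()}

def learn_bpe(vocab, num_merges):
    """Learn a list of merge operations from a space-segmented vocabulary."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

# Toy corpus: each word is split into characters, </w> marks the word end.
toy_vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
             "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, segmented = learn_bpe(toy_vocab, 3)
print(merges)  # learns ('e','s'), then ('es','t'), then ('est','</w>')
```

In practice this is handled by a library such as subword-nmt or SentencePiece rather than hand-rolled code; the point of the sketch is that frequent subword units (here "est") become single tokens, shrinking the vocabulary that the NMT model must learn.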
Pages: 20