Transformer-based Machine Translation for Low-resourced Languages embedded with Language Identification

Cited by: 5
Authors
Sefara, Tshephisho J. [1]
Zwane, Skhumbuzo G. [2]
Gama, Nelisiwe [3]
Sibisi, Hlawulani [4]
Senoamadi, Phillemon N. [5]
Marivate, Vukosi [6]
Affiliations
[1] CSIR, Next Generat Enterprises & Inst, Pretoria, South Africa
[2] Univ Zululand, Dept Comp Sci, Richards Bay, South Africa
[3] Univ Witwatersrand, Sch Comp Sci & Appl Math, Johannesburg, South Africa
[4] Univ Johannesburg, Dept Comp Sci, Johannesburg, South Africa
[5] Univ Zululand, Dept Math, Richards Bay, South Africa
[6] Univ Pretoria, Dept Comp Sci, Pretoria, South Africa
Keywords
machine translation; low-resourced languages; neural network; language identification;
DOI
10.1109/ICTAS50802.2021.9394996
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recent research on the development of machine translation (MT) models has resulted in state-of-the-art performance for many resourced European languages. However, there has been little focus on applying these MT services to low-resourced languages. This paper presents the development of neural machine translation (NMT) for low-resourced languages of South Africa. Two MT models, JoeyNMT and a transformer NMT with self-attention, are trained and evaluated using the BLEU score. The transformer NMT with self-attention obtained state-of-the-art performance on isiNdebele, SiSwati, Setswana, Tshivenda, isiXhosa, and Sepedi, while JoeyNMT performed well on isiZulu. The MT models are embedded with a language identification (LID) model that presets the language for the translation models. The LID models are trained using logistic regression and multinomial naive Bayes (MNB). The MNB classifier obtained an accuracy of 99%, outperforming logistic regression, which obtained the lower accuracy of 97%.
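The abstract describes an LID model that presets the language for the translation models. The following is a minimal sketch of that idea, not the authors' implementation: a small multinomial naive Bayes classifier over character trigrams, written in plain Python, whose prediction selects a translator. The class `NaiveBayesLID`, the function `route`, the language codes, the placeholder translators, and the four training phrases are all assumptions made for illustration; the abstract does not state which features or data the paper used.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams (space-padded, lowercased)."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLID:
    """Toy multinomial naive Bayes language identifier (illustrative only)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n                          # n-gram length (assumed, not from the paper)
        self.alpha = alpha                  # Laplace smoothing constant
        self.counts = defaultdict(Counter)  # per-language n-gram counts
        self.doc_counts = Counter()         # number of training texts per language
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        # n-grams never seen in training carry no evidence, so they are skipped.
        grams = [g for g in char_ngrams(text, self.n) if g in self.vocab]
        best_label, best_score = None, -math.inf
        for label, gram_counts in self.counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(gram_counts.values()) + self.alpha * len(self.vocab)
            for gram in grams:
                score += math.log((gram_counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def route(text, lid, translators):
    """Use the LID prediction to pick which translation model handles the text."""
    return translators[lid.predict(text)](text)


# Toy training phrases (common greetings and thanks), not the paper's data.
texts = [
    "sawubona unjani", "ngiyabonga kakhulu",     # isiZulu
    "dumela o tsogile jang", "ke a leboga thata",  # Setswana
]
labels = ["zul", "zul", "tsn", "tsn"]
lid = NaiveBayesLID().fit(texts, labels)

# Placeholder translators standing in for trained NMT models (hypothetical).
translators = {
    "zul": lambda s: f"[zul->en] {s}",
    "tsn": lambda s: f"[tsn->en] {s}",
}

print(route("ngiyabonga", lid, translators))
```

A real system would train on far more text per language and would replace the placeholder lambdas with calls to the trained translation models.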
Pages: 127-132
Number of pages: 6