Benchmarking Neural and Statistical Machine Translation on Low-Resource African Languages

Authors
Duh, Kevin [1]
McNamee, Paul [1]
Post, Matt [1]
Thompson, Brian [1]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
Keywords
machine translation; low-resource languages; evaluation
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Research in machine translation (MT) is developing at a rapid pace. However, most work in the community has focused on languages for which large amounts of digital resources are available. In this study, we benchmark state-of-the-art statistical and neural machine translation systems on two African languages that lack large amounts of resources: Somali and Swahili. These languages are of social importance and serve as test beds for developing technologies that perform reasonably well despite the low-resource constraint. Our findings suggest that statistical machine translation (SMT) and neural machine translation (NMT) can perform similarly in low-resource scenarios, but neural systems require more careful tuning to match performance. We also investigate how to exploit additional data, such as bilingual text harvested from the web or user dictionaries; we find that NMT performance can improve significantly with these additional data. Finally, we survey the landscape of machine translation resources for the languages of Africa and suggest promising directions for future research.
Pages: 2667-2675
Page count: 9