Benchmarking Neural and Statistical Machine Translation on Low-Resource African Languages

被引:0
|
作者
Duh, Kevin [1 ]
McNamee, Paul [1 ]
Post, Matt [1 ]
Thompson, Brian [1 ]
机构
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
关键词
machine translation; low-resource languages; evaluation;
D O I
暂无
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
Research in machine translation (MT) is developing at a rapid pace. However, most work in the community has focused on languages where large amounts of digital resources are available. In this study, we benchmark state of the art statistical and neural machine translation systems on two African languages which do not have large amounts of resources: Somali and Swahili. These languages are of social importance and serve as test-beds for developing technologies that perform reasonably well despite the low-resource constraint. Our findings suggest that statistical machine translation (SMT) and neural machine translation (NMT) can perform similarly in low-resource scenarios, but neural systems require more careful tuning to match performance. We also investigate how to exploit additional data, such as bilingual text harvested from the web, or user dictionaries; we find that NMT can significantly improve in performance with the use of these additional data. Finally, we survey the landscape of machine translation resources for the languages of Africa and provide some suggestions for promising future research directions.
引用
收藏
页码:2667 / 2675
页数:9
相关论文
共 50 条
  • [21] Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof⇆French
    Dione, Cheikh M. Bamba
    Lo, Alla
    Nguer, Elhadji Mamadou
    Ba, Sileye Oumar
    [J]. LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6654 - 6661
  • [22] Low-resource Neural Machine Translation: Methods and Trends
    Shi, Shumin
    Wu, Xing
    Su, Rihai
    Huang, Heyan
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (05)
  • [23] Recent advances of low-resource neural machine translation
    Haque, Rejwanul
    Liu, Chao-Hong
    Way, Andy
    [J]. MACHINE TRANSLATION, 2021, 35 (04) : 451 - 474
  • [24] Data Augmentation for Low-Resource Neural Machine Translation
    Fadaee, Marzieh
    Bisazza, Arianna
    Monz, Christof
    [J]. PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 2, 2017, : 567 - 573
  • [25] Neural Machine Translation for Low-Resource Languages from a Chinese-centric Perspective: A Survey
    Zhang, Jinyi
    Su, Ke
    Li, Haowei
    Mao, Jiannan
    Tian, Ye
    Wen, Feng
    Guo, Chong
    Matsumoto, Tadahiro
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (06)
  • [26] Neural Machine Translation Advised by Statistical Machine Translation: The Case of Farsi-Spanish Bilingually Low-Resource Scenario
    Ahmadnia, Benyamin
    Kordjamshidi, Parisa
    Haffari, Gholamreza
    [J]. 2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2018, : 1209 - 1213
  • [27] Performance Comparison of Statistical vs. Neural-Based Translation System on Low-Resource Languages
    Datta, Goutam
    Joshi, Nisheeth
    Gupta, Kusum
    [J]. INTERNATIONAL JOURNAL ON SMART SENSING AND INTELLIGENT SYSTEMS, 2023, 16 (01):
  • [28] A Strategy for Referential Problem in Low-Resource Neural Machine Translation
    Ji, Yatu
    Shi, Lei
    Su, Yila
    Ren, Qing-dao-er-ji
    Wu, Nier
    Wang, Hongbin
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2021, PT V, 2021, 12895 : 321 - 332
  • [29] Meta-Learning for Low-Resource Neural Machine Translation
    Gu, Jiatao
    Wang, Yong
    Chen, Yun
    Cho, Kyunghyun
    Li, Victor O. K.
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 3622 - 3631
  • [30] Low-Resource Neural Machine Translation: A Systematic Literature Review
    Yazar, Bilge Kagan
    Sahin, Durmus Ozkan
    Kilic, Erdal
    [J]. IEEE ACCESS, 2023, 11 : 131775 - 131813