Benchmarking Neural and Statistical Machine Translation on Low-Resource African Languages

被引:0
|
作者
Duh, Kevin [1 ]
McNamee, Paul [1 ]
Post, Matt [1 ]
Thompson, Brian [1 ]
机构
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
关键词
machine translation; low-resource languages; evaluation;
D O I
暂无
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
Research in machine translation (MT) is developing at a rapid pace. However, most work in the community has focused on languages where large amounts of digital resources are available. In this study, we benchmark state of the art statistical and neural machine translation systems on two African languages which do not have large amounts of resources: Somali and Swahili. These languages are of social importance and serve as test-beds for developing technologies that perform reasonably well despite the low-resource constraint. Our findings suggest that statistical machine translation (SMT) and neural machine translation (NMT) can perform similarly in low-resource scenarios, but neural systems require more careful tuning to match performance. We also investigate how to exploit additional data, such as bilingual text harvested from the web, or user dictionaries; we find that NMT can significantly improve in performance with the use of these additional data. Finally, we survey the landscape of machine translation resources for the languages of Africa and provide some suggestions for promising future research directions.
引用
收藏
页码:2667 / 2675
页数:9
相关论文
共 50 条
  • [1] Neural Machine Translation for Low-resource Languages: A Survey
    Ranathunga, Surangika
    Lee, En-Shiun Annie
    Skenduli, Marjana Prifti
    Shekhar, Ravi
    Alam, Mehreen
    Kaur, Rishemjit
    [J]. ACM COMPUTING SURVEYS, 2023, 55 (11)
  • [2] Machine Translation in Low-Resource Languages by an Adversarial Neural Network
    Sun, Mengtao
    Wang, Hao
    Pasquine, Mark
    Hameed, Ibrahim A.
    [J]. APPLIED SCIENCES-BASEL, 2021, 11 (22):
  • [3] Extremely low-resource neural machine translation for Asian languages
    Rubino, Raphael
    Marie, Benjamin
    Dabre, Raj
    Fujita, Atushi
    Utiyama, Masao
    Sumita, Eiichiro
    [J]. MACHINE TRANSLATION, 2020, 34 (04) : 347 - 382
  • [4] Neural Machine Translation of Low-Resource and Similar Languages with Backtranslation
    Przystupa, Michael
    Abdul-Mageed, Muhammad
    [J]. FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), VOL 3: SHARED TASK PAPERS, DAY 2, 2019, : 224 - 235
  • [5] Towards a Low-Resource Neural Machine Translation for Indigenous Languages in Canada
    Ngoc Tan Le
    Sadat, Fatiha
    [J]. TRAITEMENT AUTOMATIQUE DES LANGUES, 2021, 62 (03): : 39 - 63
  • [6] An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource Languages
    Mueller, Aaron
    Nicolai, Garrett
    McCarthy, Arya D.
    Lewis, Dylan
    Wu, Winston
    Yarowsky, David
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 3710 - 3718
  • [7] Neural machine translation for low-resource languages without parallel corpora
    Karakanta, Alina
    Dehdari, Jon
    van Genabith, Josef
    [J]. MACHINE TRANSLATION, 2018, 32 (1-2) : 167 - 189
  • [8] Efficient Neural Machine Translation for Low-Resource Languages via Exploiting Related Languages
    Goyal, Vikrant
    Kumar, Sourav
    Sharma, Dipti Misra
    [J]. 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020): STUDENT RESEARCH WORKSHOP, 2020, : 162 - 168
  • [9] DRA: dynamic routing attention for neural machine translation with low-resource languages
    Wang, Zhenhan
    Song, Ran
    Yu, Zhengtao
    Mao, Cunli
    Gao, Shengxiang
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024,
  • [10] OCR Improves Machine Translation for Low-Resource Languages
    Ignat, Oana
    Maillard, Jean
    Chaudhary, Vishrav
    Guzman, Francisco
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 1164 - 1174