Extended Parallel Corpus for Amharic-English Machine Translation

Authors
Gezmu, Andargachew Mekonnen [1]
Nuernberger, Andreas [1]
Bati, Tesfaye Bayu [1]
Affiliations
[1] Otto von Guericke Univ, Univ Pl 2, Magdeburg, Germany
Keywords
Statistical Machine Translation; Neural Machine Translation; Less-Resourced Language
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
This paper describes the acquisition, preprocessing, segmentation, and alignment of an Amharic-English parallel corpus. The corpus is intended to support machine translation for Amharic, a low-resource language, and we have released it freely for research purposes. We also trained baseline statistical and neural machine translation models on the corpus. In the experiments, we additionally used a large monolingual corpus, both for the language model of the statistical system and for back-translation in the neural system. In automatic evaluation, the neural models outperform the statistical models by approximately six to seven Bilingual Evaluation Understudy (BLEU) points. Among the neural models, the subword models outperform the word-based models by three to four BLEU points. Two other automatic evaluation metrics, Translation Edit Rate on Character Level and Better Evaluation as Ranking, reflect corresponding differences among the trained models.
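
To make the "BLEU points" in the abstract concrete, the following is a minimal sketch of a corpus-level BLEU score on a 0-100 scale. It is an illustration only, not the evaluation tooling used in the paper: it assumes whitespace tokenization, a single reference per hypothesis, and no smoothing, and the names `ngrams` and `corpus_bleu` are hypothetical helpers introduced here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis, no smoothing.

    Assumption: sentences are already tokenized and joined by spaces.
    """
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    print(corpus_bleu(["the cat sat on the mat"], refs))
    print(corpus_bleu(["the cat sat on the mat today"], refs))
```

On this scale, a gap of six to seven points between two systems is the difference between their two `corpus_bleu` values computed on the same test set. Published scores are normally produced with a standardized tool, so values from this sketch are not comparable to the figures reported in the paper.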
Pages: 6644-6653
Page count: 10