Parallel Corpora Preparation for English-Amharic Machine Translation

Cited by: 2
Authors
Biadgligne, Yohanens [1]
Smaili, Kamel [2]
Affiliations
[1] Bahir Dar Inst Technol, Bahir Dar, Ethiopia
[2] Loria Univ Lorraine, Nancy, France
Keywords
Amharic language; Machine translation; SMT; NMT; Parallel corpus; BLEU
DOI
10.1007/978-3-030-85030-2_37
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we describe the development of an English-Amharic parallel corpus and the Machine Translation (MT) experiments conducted on it. Two sets of experiments were carried out: Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). Measured with the Bilingual Evaluation Understudy (BLEU) metric, SMT and NMT scored 26.47 and 32.44, respectively. The corpus was collected from the Internet using automatic and semi-automatic techniques. The harvested corpus covers the religion, law, and news domains. The final corpus comprises 225,304 parallel sentences and will be shared freely with the community. To our knowledge, this is the largest parallel corpus for the Amharic language to date.
Pages: 443-455
Page count: 13