Parallel Corpora Preparation for English-Amharic Machine Translation

Cited by: 2
Authors
Biadgligne, Yohanens [1]
Smaili, Kamel [2]
Affiliations
[1] Bahir Dar Inst Technol, Bahir Dar, Ethiopia
[2] Loria Univ Lorraine, Nancy, France
Keywords
Amharic language; Machine translation; SMT; NMT; Parallel corpus; BLEU
DOI
10.1007/978-3-030-85030-2_37
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we describe the development of an English-Amharic parallel corpus and the Machine Translation (MT) experiments conducted on it. Two kinds of experiments were carried out: Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). Measured with the bilingual evaluation understudy (BLEU) metric, SMT and NMT score 26.47 and 32.44, respectively. The corpus was collected from the Internet using automatic and semi-automatic techniques. The harvested corpus covers the domains of religion, law, and news. The final corpus comprises 225,304 parallel sentences and will be shared freely with the community. To our knowledge, this is the largest parallel corpus for the Amharic language to date.
Pages: 443-455
Number of pages: 13