The Impact of Word Segmentation Techniques on Neural and Statistical Machine Translation: English-Arabic Case

Cited by: 0
Authors
Berrichi, Safae [1 ]
Mazroui, Azzeddine [1 ]
Affiliations
[1] Mohammed First Univ, Fac Sci, Dept Comp Sci, Oujda, Morocco
Keywords
Machine translation; Morphological segmentation; Sub-word segmentation; Statistical approach; Neural approach;
DOI
10.1007/978-3-030-90633-7_38
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper addresses machine translation between English and Arabic. The task is difficult because of the morphological richness of Arabic and the lack of large parallel corpora. To address these issues, we examine the impact of word segmentation (sub-word and morphological segmentation) on machine translation performance. We test both the statistical approach and the neural approach, which has been widely adopted in recent years owing to its promising results. In our experiments, carried out in the English-to-Arabic direction on the United Nations parallel corpus, applying morphological segmentation to the target language proved very beneficial, whereas sub-word segmentation had no significant impact on either the neural or the statistical models.
Pages: 454-462
Page count: 9
Related papers
50 in total
  • [31] Gender of cited authors: A problem for the English-Arabic translation of scholarly research
    Hamdan, Jihad M.
    Natour, Yaser S.
    BABEL-REVUE INTERNATIONALE DE LA TRADUCTION-INTERNATIONAL JOURNAL OF TRANSLATION, 2014, 60 (03): : 265 - 280
  • [32] English-Basque Statistical and Neural Machine Translation
    Unanue, Inigo Jauregi
    Garmendia Arratibel, Lierni
    Borzeshi, Ehsan Zare
    Piccardi, Massimo
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 880 - 885
  • [33] Probabilistic neural network based English-Arabic sentence alignment
    Fattah, MA
    Ren, F
    Kuroiwa, S
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2006, 3878 : 97 - 100
  • [34] Morphology-Inspired Word Segmentation for Neural Machine Translation
    Zuters, Janis
    Strazds, Gus
    Leonova, Viktorija
    DATABASES AND INFORMATION SYSTEMS X (DB&IS 2018), 2019, 315 : 225 - 239
  • [35] Optimal Word Segmentation for Neural Machine Translation into Dravidian Languages
    Dhar, Prajit
    Bisazza, Arianna
    van Noord, Gertjan
    WAT 2021: THE 8TH WORKSHOP ON ASIAN TRANSLATION, 2021, : 181 - 190
  • [36] Exploring Segmentation Approaches for Neural Machine Translation of Code-Switched Egyptian Arabic-English Text
    Gaser, Marwa
    Mager, Manuel
    Hamed, Injy
    Habash, Nizar
    Abdennadher, Slim
    Vu, Ngoc Thang
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 3523 - 3538
  • [37] Inversion transduction grammar coverage of Arabic-English word alignment for tree-structured statistical machine translation
    Wu, Dekai
    Carpuat, Marine
    Shen, Khai
    2006 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, 2006, : 234 - +
  • [38] A comparison of segmentation methods and extended lexicon models for Arabic statistical machine translation
    Hasan, Sasa
    Mansour, Saab
    Ney, Hermann
    MACHINE TRANSLATION, 2012, 26 (1-2) : 47 - 65
  • [39] Evaluating Arabic to English Machine Translation
    Hadla, Laith S.
    Hailat, Taghreed M.
    Al-Kabi, Mohammed N.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (11) : 68 - 73
  • [40] Word Reordering Approaches for Bangla-English Statistical Machine Translation
    Roy, Maxim
    Popowich, Fred
    ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2010, 6085 : 282 - 285