The Impact of Word Segmentation Techniques on Neural and Statistical Machine Translation: English-Arabic Case

Cited by: 0
Authors
Berrichi, Safae [1]
Mazroui, Azzeddine [1]
Affiliations
[1] Mohammed First Univ, Fac Sci, Dept Comp Sci, Oujda, Morocco
Keywords
Machine translation; Morphological segmentation; Sub-word segmentation; Statistical approach; Neural approach
DOI
10.1007/978-3-030-90633-7_38
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper deals with machine translation between English and Arabic. The task is difficult because of the morphological richness of Arabic and the lack of large parallel corpora. To address these issues, we examine the impact of word segmentation (sub-word and morphological segmentation) on machine translation performance. We test both the statistical approach and the neural approach, which has been widely adopted in recent years owing to its promising results. In our experiments, carried out in the English-to-Arabic direction and based on the United Nations parallel corpus, we show that applying morphological segmentation to the target language is very beneficial, whereas sub-word segmentation has no significant impact on either the neural or the statistical models.
Pages: 454-462
Number of pages: 9
Related papers
50 in total
  • [1] Word Agreement and Ordering in English-Arabic Machine Translation
    Abu Shquier, Mohammed M.
    Sembok, Tengku Mohd T.
    INTERNATIONAL SYMPOSIUM OF INFORMATION TECHNOLOGY 2008, VOLS 1-4, PROCEEDINGS: COGNITIVE INFORMATICS: BRIDGING NATURAL AND ARTIFICIAL KNOWLEDGE, 2008, : 644 - +
  • [2] English-Arabic Statistical Machine Translation: State of the Art
    Ebrahim, Sara
    Hegazy, Doaa
    Mostafa, Mostafa G. M.
    El-Beltagy, Samhaa R.
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT I, 2015, 9041 : 520 - 533
  • [3] Benefits of morphosyntactic features on English-Arabic Statistical Machine Translation
    Berrichi, Safae
    Mazroui, Azzeddine
    2018 IEEE 5TH INTERNATIONAL CONGRESS ON INFORMATION SCIENCE AND TECHNOLOGY (IEEE CIST'18), 2018, : 244 - 248
  • [4] Orthographic and morphological processing for English-Arabic statistical machine translation
    El Kholy, Ahmed
    Habash, Nizar
    MACHINE TRANSLATION, 2012, 26 (1-2) : 25 - 45
  • [5] Improving English-Arabic statistical machine translation with morpho-syntactic and semantic word class
    Khemakhem I.T.
    Jamoussi S.
    Hamadou A.B.
    International Journal of Intelligent Systems Technologies and Applications, 2020, 19 (02) : 172 - 190
  • [6] Detecting and Integrating Multiword Expression into English-Arabic Statistical Machine Translation
    Ebrahim, Sara
    Hegazy, Doaa
    Mostafa, Mostafa Gadal-Haqq M.
    El-Beltagy, Samhaa R.
    ARABIC COMPUTATIONAL LINGUISTICS (ACLING 2017), 2017, 117 : 111 - 118
  • [7] SVO word order errors in English-Arabic translation
    Al-Jarf, Reima-Sado
    META, 2007, 52 (02) : 299 - 308
  • [8] The impact of Arabic morphological segmentation on broad-coverage English-to-Arabic statistical machine translation
    Al-Haj, Hassan
    Lavie, Alon
    MACHINE TRANSLATION, 2012, 26 (1-2) : 3 - 24
  • [9] Addressing Limited Vocabulary and Long Sentences Constraints in English-Arabic Neural Machine Translation
    Berrichi, Safae
    Mazroui, Azzeddine
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2021, 46 (09) : 8245 - 8259
  • [10] Bi-text Alignment of Movie Subtitles for Spoken English-Arabic Statistical Machine Translation
    Al-Obaidli, Fahad
    Cox, Stephen
    Nakov, Preslav
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, (CICLING 2016), PT II, 2018, 9624 : 127 - 139