Addressing Limited Vocabulary and Long Sentences Constraints in English-Arabic Neural Machine Translation

Cited by: 7
Authors
Berrichi, Safae [1]
Mazroui, Azzeddine [1]
Affiliations
[1] Mohamed First Univ, Dept Comp Sci, Fac Sci, Oujda, Morocco
Keywords
Neural machine translation; Factored models; Arabic morphology; Sentence segmentation
DOI
10.1007/s13369-020-05328-2
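Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract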
Neural Machine Translation (NMT) has attracted growing interest in recent years for its promising performance compared to traditional approaches such as Statistical Machine Translation (SMT). However, its performance degrades when it is applied to languages with different structures, such as the (English, Arabic) pair studied in this work. Indeed, the limited vocabulary size required by NMT models reduces the vocabulary coverage rate for Arabic, a language well known for its morphological richness. Likewise, long sentences present an additional challenge to NMT systems, which perform less well on longer sentences than on shorter ones. In this paper, we present a series of experiments to mitigate the effects of these constraints. To address the problem of out-of-vocabulary words, we integrated morphosyntactic features into factored NMT models as an output factor, namely stem, lemma, POS, root, and pattern. We have also developed two techniques for segmenting long sentences into smaller sub-sentences. The first uses a list of lexical markers that we have collected as segmentation points, and the second integrates into the NMT model the parallel phrases extracted by an SMT system. The experiments carried out on the English-Arabic pair show that the proposed approaches considerably improve the translation quality compared to the basic NMT system.
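As a rough illustration of the first segmentation technique mentioned in the abstract (splitting long sentences at lexical markers), the following is a minimal sketch and not the authors' implementation. The marker list `MARKERS`, the function name `segment_at_markers`, and the `max_len` threshold are all assumptions made for this example; the abstract does not give the actual marker list or any length cutoff.

```python
# Hypothetical marker list for illustration only; the paper's collected
# list of lexical markers is not given in the abstract.
MARKERS = {"because", "although", "however", "which", "while"}


def segment_at_markers(sentence, markers=MARKERS, max_len=10):
    """Split a sentence into sub-sentences at lexical markers.

    Sentences with at most `max_len` whitespace tokens are returned
    unchanged. Longer ones are cut immediately before each marker,
    so every marker starts a new sub-sentence.
    """
    tokens = sentence.split()
    if len(tokens) <= max_len:
        return [sentence]

    segments, current = [], []
    for token in tokens:
        # Compare case-insensitively and ignore attached punctuation.
        word = token.lower().strip(",.;:")
        if word in markers and current:
            segments.append(" ".join(current))
            current = []
        current.append(token)
    if current:
        segments.append(" ".join(current))
    return segments


if __name__ == "__main__":
    text = "I stayed home because it rained although I wanted to go"
    for part in segment_at_markers(text, max_len=5):
        print(part)
```

In a pipeline of this kind, each sub-sentence would presumably be translated separately and the outputs concatenated; how the authors recombine the translations is not stated in the abstract.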
Pages: 8245-8259
Page count: 15
Related papers
50 in total
  • [1] Addressing Limited Vocabulary and Long Sentences Constraints in English–Arabic Neural Machine Translation
    Safae Berrichi
    Azzeddine Mazroui
    [J]. Arabian Journal for Science and Engineering, 2021, 46 : 8245 - 8259
  • [2] SUDAN ARABIC: AN ENGLISH-ARABIC VOCABULARY
    Davies, R.
    [J]. SUDAN NOTES AND RECORDS, 1925, 8 : 220 - 223
  • [3] English-Arabic Statistical Machine Translation: State of the Art
    Ebrahim, Sara
    Hegazy, Doaa
    Mostafa, Mostafa G. M.
    El-Beltagy, Samhaa R.
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT I, 2015, 9041 : 520 - 533
  • [4] Word Agreement and Ordering in English-Arabic Machine Translation
    Abu Shquier, Mohammed M.
    Sembok, Tengku Mohd T.
    [J]. INTERNATIONAL SYMPOSIUM OF INFORMATION TECHNOLOGY 2008, VOLS 1-4, PROCEEDINGS: COGNITIVE INFORMATICS: BRIDGING NATURAL AND ARTIFICIAL KNOWLEDGE, 2008, : 644 - +
  • [5] The Impact of Word Segmentation Techniques on Neural and Statistical Machine Translation: English-Arabic Case
    Berrichi, Safae
    Mazroui, Azzeddine
    [J]. ADVANCED INTELLIGENT SYSTEMS FOR SUSTAINABLE DEVELOPMENT (AI2SD'2020), VOL 1, 2022, 1417 : 454 - 462
  • [6] Benefits of morphosyntactic features on English-Arabic Statistical Machine Translation
    Berrichi, Safae
    Mazroui, Azzeddine
    [J]. 2018 IEEE 5TH INTERNATIONAL CONGRESS ON INFORMATION SCIENCE AND TECHNOLOGY (IEEE CIST'18), 2018, : 244 - 248
  • [7] Orthographic and morphological processing for English-Arabic statistical machine translation
    El Kholy, Ahmed
    Habash, Nizar
    [J]. MACHINE TRANSLATION, 2012, 26 (1-2) : 25 - 45
  • [8] English-Arabic Hybrid Machine Translation System using EBMT and Translation Memory
    Ehab, Rana
    Gadallah, Mahmoud
    Amer, Eslam
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (01) : 195 - 203
  • [9] Detecting and Integrating Multiword Expression into English-Arabic Statistical Machine Translation
    Ebrahim, Sara
    Hegazy, Doaa
    Mostafa, Mostafa Gadal-Haqq M.
    El-Beltagy, Samhaa R.
    [J]. ARABIC COMPUTATIONAL LINGUISTICS (ACLING 2017), 2017, 117 : 111 - 118
  • [10] Asymmetry of gender markedness in English-Arabic translation
    Al-Qinai, J
    [J]. THEORETICAL LINGUISTICS, 1999, 25 (01) : 75 - 96