Addressing Limited Vocabulary and Long Sentences Constraints in English-Arabic Neural Machine Translation

Cited by: 7
Authors
Berrichi, Safae [1]
Mazroui, Azzeddine [1]
Affiliations
[1] Mohamed First Univ, Dept Comp Sci, Fac Sci, Oujda, Morocco
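Keywords
Neural machine translation; Factored models; Arabic morphology; Sentence segmentation
DOI
10.1007/s13369-020-05328-2
Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract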
Neural Machine Translation (NMT) has attracted growing interest in recent years because of its promising performance compared with traditional approaches such as Statistical Machine Translation (SMT). However, its performance degrades when it is applied to languages with different structures, such as the English-Arabic pair studied in this work. The limited vocabulary size that NMT models require reduces vocabulary coverage for Arabic, a language well known for its morphological richness. Long sentences pose an additional challenge, because NMT systems translate them less well than shorter ones. In this paper, we report a series of experiments aimed at mitigating the effects of these constraints. To address the problem of out-of-vocabulary words, we integrated morphosyntactic features into factored NMT models as output factors, namely stem, lemma, POS, root, and pattern. We also developed two techniques for segmenting long sentences into shorter sub-sentences. The first uses a list of lexical markers that we collected as segmentation points, and the second integrates into the NMT model the parallel phrases extracted by an SMT system. Experiments on the English-Arabic pair show that the proposed approaches considerably improve translation quality compared with the baseline NMT system.
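The first segmentation technique, splitting at lexical markers, can be illustrated with a minimal sketch. This is not the authors' implementation: the marker list `MARKERS`, the length threshold `max_len`, and the function name `segment_at_markers` are assumptions made for illustration, since the abstract does not give the collected marker list or the length at which a sentence counts as long.

```python
# Hypothetical marker list; the list collected by the authors is not
# given in the abstract.
MARKERS = ["because", "although", "however", "which", "while"]


def segment_at_markers(sentence, markers=MARKERS, max_len=10):
    """Split a long sentence into sub-sentences, cutting before each marker.

    Sentences with at most `max_len` whitespace tokens are returned whole.
    `max_len` is an assumed threshold, not a value from the paper.
    """
    tokens = sentence.split()
    if len(tokens) <= max_len:
        return [sentence]

    marker_set = {m.lower() for m in markers}
    segments, current = [], []
    for tok in tokens:
        # Compare without surrounding punctuation, case-insensitively.
        word = tok.strip(",.;:").lower()
        if word in marker_set and current:
            segments.append(" ".join(current))
            current = []
        current.append(tok)
    if current:
        segments.append(" ".join(current))
    return segments


if __name__ == "__main__":
    text = ("The system translates short sentences well, however it "
            "struggles with long inputs because context is lost")
    for part in segment_at_markers(text):
        print(part)
```

In a pipeline of this kind, each sub-sentence would be translated separately and the outputs concatenated in order. How the authors recombine the translated segments is not stated in the abstract.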
Pages: 8245-8259
Number of pages: 15