Addressing Limited Vocabulary and Long Sentences Constraints in English-Arabic Neural Machine Translation

Times cited: 7
Authors
Berrichi, Safae [1]
Mazroui, Azzeddine [1]
Affiliation
[1] Mohamed First Univ, Dept Comp Sci, Fac Sci, Oujda, Morocco
Keywords
Neural machine translation; Factored models; Arabic morphology; Sentence segmentation
DOI
10.1007/s13369-020-05328-2
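Chinese Library Classification (CLC) numbers
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09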
Abstract
Neural Machine Translation (NMT) has attracted growing interest in recent years because of its promising performance compared with traditional approaches such as Statistical Machine Translation (SMT). However, its performance degrades when it is applied to language pairs with different structures, such as the English-Arabic pair studied in this work. In particular, the limited vocabulary size that NMT models require lowers the vocabulary coverage rate for Arabic, a language well known for its morphological richness. Long sentences pose an additional challenge, since NMT systems translate long sentences less well than short ones. In this paper, we present a series of experiments aimed at mitigating the effects of these constraints. To address the problem of out-of-vocabulary words, we integrated morphosyntactic features, namely stem, lemma, POS, root, and pattern, into factored NMT models as an output factor. We also developed two techniques for segmenting long sentences into shorter sub-sentences. The first uses a list of lexical markers that we collected as segmentation points; the second integrates the parallel phrases extracted by an SMT system into the NMT model. Experiments on the English-Arabic pair show that the proposed approaches considerably improve translation quality over the baseline NMT system.
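The abstract does not give the authors' marker list or their exact splitting rules. As a rough illustration only, the Python sketch below shows one way marker-based segmentation of a long sentence could work. It assumes a whitespace-tokenized source sentence, a hypothetical marker list (`MARKERS`), hypothetical thresholds (`max_len`, `min_len`), and a `translate` callable standing in for any NMT system. It is not the authors' implementation.

```python
from typing import Callable, List, Sequence

# HYPOTHETICAL marker list for illustration; not the list collected by the authors.
MARKERS = {"and", "but", "because", "which", "while", "although"}


def segment(tokens: Sequence[str], markers=MARKERS,
            max_len: int = 20, min_len: int = 4) -> List[List[str]]:
    """Split a tokenized sentence into sub-sentences just before lexical markers.

    Sentences of at most max_len tokens are returned unchanged (hypothetical
    threshold). A new segment starts at a marker only if the current segment
    already holds at least min_len tokens, which avoids tiny fragments.
    """
    if len(tokens) <= max_len:
        return [list(tokens)]
    segments: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok.lower() in markers and len(current) >= min_len:
            segments.append(current)
            current = []
        current.append(tok)
    if current:
        segments.append(current)
    # Merge a too-short trailing fragment back into the previous segment.
    if len(segments) > 1 and len(segments[-1]) < min_len:
        tail = segments.pop()
        segments[-1].extend(tail)
    return segments


def translate_long(sentence: str, translate: Callable[[str], str], **kwargs) -> str:
    """Translate each segment independently and join the outputs in order."""
    pieces = segment(sentence.split(), **kwargs)
    return " ".join(translate(" ".join(p)) for p in pieces)


if __name__ == "__main__":
    s = ("the committee approved the budget because the members agreed "
         "and the chair signed the final report today")
    # Splits before "because" and before "and", giving three segments.
    print(segment(s.split(), max_len=10))
```

Splitting only before a marker and enforcing a minimum fragment length is just one simple policy; the rules used in the paper may differ, and the SMT-phrase technique mentioned in the abstract is not covered by this sketch.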
Pages: 8245–8259
Page count: 15