Addressing data sparsity for neural machine translation between morphologically rich languages

Cited by: 4
Authors
Garcia-Martinez, Mercedes [1 ]
Aransa, Walid [1 ]
Bougares, Fethi [1 ]
Barrault, Loic [1 ]
Affiliations
[1] Le Mans Univ, LIUM, Le Mans, France
Keywords
Neural machine translation; Factored models; Deep learning;
DOI
10.1007/s10590-019-09242-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Translating between morphologically rich languages is still challenging for current machine translation systems. In this paper, we experiment with various neural machine translation (NMT) architectures to address the data sparsity problem caused by data availability (quantity), domain shift and the languages involved (Arabic and French). We show that the Factored NMT (FNMT) model, which uses linguistically motivated factors, is able to outperform standard NMT systems using subword units by more than 1 BLEU point even when a large quantity of data is available. Our work shows the benefits of applying linguistic factors in NMT when faced with low- and high-resource conditions.
Pages: 1 / 20
Page count: 20
Related papers
50 in total
  • [1] Neural Machine Translation for Morphologically Rich Languages with Improved Sub-word Units and Synthetic Data
    Pinnis, Marcis
    Krislauks, Rihards
    Deksne, Daiga
    Miks, Toms
    [J]. TEXT, SPEECH, AND DIALOGUE, TSD 2017, 2017, 10415 : 237 - 245
  • [2] Translating Between Morphologically Rich Languages: An Arabic-to-Turkish Machine Translation System
    El-Kahlout, Ilknur Durgar
    Bektas, Emre
    Erdem, Naime Seyma
    Kaya, Hamza
    [J]. FOURTH ARABIC NATURAL LANGUAGE PROCESSING WORKSHOP (WANLP 2019), 2019, : 158 - 166
  • [3] Improved Unsupervised Neural Machine Translation with Semantically Weighted Back Translation for Morphologically Rich and Low Resource Languages
    Chauhan, Shweta
    Saxena, Shefali
    Daniel, Philemon
    [J]. NEURAL PROCESSING LETTERS, 2022, 54 (03) : 1707 - 1726
  • [5] Statistical Machine Translation from and into Morphologically Rich and Low Resourced Languages
    Pushpananda, Randil
    Weerasinghe, Ruvan
    Niranjan, Mahesan
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT I, 2015, 9041 : 545 - 556
  • [6] Using POS information for statistical machine translation into morphologically rich languages
    Ueffing, N
    Ney, H
    [J]. EACL 2003: 10TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, 2003, : 347 - 354
  • [7] Improving Adversarial Neural Machine Translation for Morphologically Rich Language
    Mi, Chenggang
    Xie, Lei
    Zhang, Yanning
    [J]. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2020, 4 (04) : 417 - 426
  • [8] End-to-End Lexically Constrained Machine Translation for Morphologically Rich Languages
    Jon, Josef
    Aires, Joao Paulo
    Varis, Dusan
    Bojar, Ondrej
    [J]. 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 4019 - 4033
  • [9] Compositional Representation of Morphologically-Rich Input for Neural Machine Translation
    Ataman, Duygu
    Federico, Marcello
    [J]. PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2, 2018, : 305 - 311
  • [10] On the Sparsity of Neural Machine Translation Models
    Wang, Yong
    Wang, Longyue
    Li, Victor O. K.
    Tu, Zhaopeng
    [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 1060 - 1066