Unsupervised Extraction of Partial Translations for Neural Machine Translation

Authors
Marie, Benjamin [1]
Fujita, Atsushi [1]
Affiliations
[1] Natl Inst Informat & Commun Technol, 3-5 Hikaridai, Seika, Kyoto 6190289, Japan
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In neural machine translation (NMT), monolingual data are usually exploited through so-called back-translation: sentences in the target language are translated into the source language to synthesize new parallel data. While this method provides more training data to better model the target language, on the source side it only exploits translations that the NMT system is already able to generate using a model trained on existing parallel data. In this work, we assume that new translation knowledge can be extracted from monolingual data without relying at all on existing parallel data. We propose a new algorithm for extracting from monolingual data what we call partial translations: pairs of source and target sentences that contain sequences of tokens that are translations of each other. Our algorithm is fully unsupervised and takes only source and target monolingual data as input. Our empirical evaluation shows that partial translations can be used in combination with back-translation to further improve NMT models. Furthermore, while partial translations are particularly useful for low-resource language pairs, they can also be successfully exploited in resource-rich scenarios to improve translation quality.
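The back-translation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: `back_translate`, `toy_target_to_source`, and `TOY_LEXICON` are hypothetical names, and the word-by-word lookup is only a stand-in for a trained target-to-source NMT model. The sketch covers back-translation only, not the partial-translation extraction algorithm the paper proposes.

```python
from typing import Callable, List, Tuple


def back_translate(
    target_sentences: List[str],
    translate_to_source: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from target-side monolingual data.

    Each genuine target sentence is paired with a machine-generated source
    sentence, so the target side of the synthetic corpus stays human-written.
    """
    return [(translate_to_source(t), t) for t in target_sentences]


# Hypothetical toy lexicon standing in for a trained target-to-source model
# (here the target language is French and the source language is English).
TOY_LEXICON = {"le": "the", "chat": "cat", "noir": "black"}


def toy_target_to_source(sentence: str) -> str:
    # Word-by-word lookup; unknown tokens are copied through unchanged.
    return " ".join(TOY_LEXICON.get(tok, tok) for tok in sentence.split())


synthetic_pairs = back_translate(["le chat noir"], toy_target_to_source)
print(synthetic_pairs)
```

The synthetic source side can only contain what the stand-in model already produces, which mirrors the limitation the abstract points out: back-translation adds no translation knowledge beyond what the existing model has learned.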
Pages: 3834-3844
Page count: 11