A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation

Cited by: 1
Authors
Vazquez, Raul [1 ]
Raganato, Alessandro [1 ]
Creutz, Mathias [1 ]
Tiedemann, Jorg [1 ]
Affiliations
[1] Univ Helsinki, Dept Digital Humanities, Helsinki, Finland
Funding
Academy of Finland; European Research Council
Keywords
Computational linguistics; Semantics; Benchmarking; Classification (of information); Computer-aided language translation
DOI
10.1162/coli_a_00377
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Neural machine translation has considerably improved the quality of automatic translations by learning good representations of input sentences. In this article, we explore a multilingual translation model capable of producing fixed-size sentence representations by incorporating an intermediate crosslingual shared layer, which we refer to as attention bridge. This layer exploits the semantics from each language and develops into a language-agnostic meaning representation that can be efficiently used for transfer learning. We systematically study the impact of the size of the attention bridge and the effect of including additional languages in the model. In contrast to related previous work, we demonstrate that there is no conflict between translation performance and the use of sentence representations in downstream tasks. In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also push the accuracy of trainable classification tasks. Nevertheless, shorter representations lead to increased compression that is beneficial in non-trainable similarity tasks. Similarly, we show that trainable downstream tasks benefit from multilingual models, whereas additional language signals do not improve performance in non-trainable benchmarks. This is an important insight that helps to properly design models for specific applications. Finally, we also include an in-depth analysis of the proposed attention bridge and its ability to encode linguistic properties. We carefully analyze the information that is captured by individual attention heads and identify interesting patterns that explain the performance of specific settings in linguistic probing tasks.
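The attention bridge described in the abstract is a fixed number of inner-attention heads that pool a variable-length sequence of encoder states into a fixed-size sentence matrix. Below is a minimal sketch of such a structured self-attention layer in the style of Lin et al. (2017), on which this line of work builds; the class name AttentionBridge and all dimensions are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (assumed names/dimensions) of an inner-attention bridge:
# a fixed set of attention heads pools variable-length encoder states
# into a fixed-size, length-independent sentence representation.
import torch
import torch.nn as nn


class AttentionBridge(nn.Module):
    """Structured self-attention over encoder states.

    Maps H of shape (batch, seq_len, d_model) to a fixed-size matrix
    of shape (batch, n_heads, d_model), independent of sentence length.
    """

    def __init__(self, d_model: int, d_hidden: int, n_heads: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_hidden, bias=False)  # W1: d_model -> d_hidden
        self.w2 = nn.Linear(d_hidden, n_heads, bias=False)  # W2: d_hidden -> n_heads

    def forward(self, enc_states: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # enc_states: (batch, seq_len, d_model); mask: (batch, seq_len), True = real token
        scores = self.w2(torch.tanh(self.w1(enc_states)))  # (batch, seq_len, n_heads)
        scores = scores.masked_fill(~mask.unsqueeze(-1), float("-inf"))
        attn = torch.softmax(scores, dim=1)                # normalize over time steps
        return attn.transpose(1, 2) @ enc_states          # (batch, n_heads, d_model)


# Usage: 10 heads over 512-dim states yield a 10 x 512 representation
# for any input length.
bridge = AttentionBridge(d_model=512, d_hidden=1024, n_heads=10)
h = torch.randn(2, 7, 512)                 # toy encoder output, seq_len = 7
m = torch.ones(2, 7, dtype=torch.bool)     # no padding in this toy batch
print(bridge(h, m).shape)                  # torch.Size([2, 10, 512])
```

Because each head's softmax normalizes over time steps, the output size depends only on n_heads and d_model, which is what makes the representation reusable across languages and in downstream tasks, and why the abstract can study the size of the bridge (the number of heads) as an independent variable.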
Pages: 387-424 (38 pages)
Related Papers (50 in total)
  • [21] Explicit Sentence Compression for Neural Machine Translation
    Li, Zuchao
    Wang, Rui
    Chen, Kehai
    Utiyama, Masao
    Sumita, Eiichiro
    Zhang, Zhuosheng
    Zhao, Hai
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8311 - 8318
  • [22] Long Sentence Preprocessing in Neural Machine Translation
    Ha Nguyen Tien
    Huyen Nguyen Thi Minh
    2019 IEEE - RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF), 2019, : 301 - 306
  • [23] Multilingual Mix: Example Interpolation Improves Multilingual Neural Machine Translation
    Cheng, Yong
    Bapna, Ankur
    Firat, Orhan
    Cao, Yuan
    Wang, Pidong
    Macherey, Wolfgang
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 4092 - 4102
  • [24] Do Multilingual Neural Machine Translation Models Contain Language Pair Specific Attention Heads?
    Kim, Zae Myung
    Besacier, Laurent
    Nikoulina, Vassilina
    Schwab, Didier
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 2832 - 2841
  • [25] Enhancing Zero-Shot Translation in Multilingual Neural Machine Translation: Focusing on Obtaining Location-Agnostic Representations
    Zhang, Jiarui
    Huang, Heyan
    Hu, Yue
    Guo, Ping
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT VII, 2024, 15022 : 194 - 208
  • [26] Learning Bilingual Sentence Representations for Quality Estimation of Machine Translation
    Zhu, Junguo
    Yang, Muyun
    Li, Sheng
    Zhao, Tiejun
    MACHINE TRANSLATION, 2016, 668 : 35 - 42
  • [27] Multilingual Machine Translation : An Analytical Study
    Phadke, Madhura Mandar
    Devane, Satish R.
    2017 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICICCS), 2017, : 881 - 884
  • [28] Recurrent Attention for Neural Machine Translation
    Zeng, Jiali
    Wu, Shuangzhi
    Yin, Yongjing
    Jiang, Yufan
    Li, Mu
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3216 - 3225
  • [29] Neural Machine Translation with Deep Attention
    Zhang, Biao
    Xiong, Deyi
    Su, Jinsong
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (01) : 154 - 163
  • [30] Multilingual Unsupervised Neural Machine Translation with Denoising Adapters
    Ustun, Ahmet
    Berard, Alexandre
    Besacier, Laurent
    Galle, Matthias
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 6650 - 6662