A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation

Cited by: 1
Authors:
Vazquez, Raul [1]
Raganato, Alessandro [1]
Creutz, Mathias [1]
Tiedemann, Jorg [1]
Institutions:
[1] Univ Helsinki, Dept Digital Humanities, Helsinki, Finland
Funding:
Academy of Finland; European Research Council
Keywords:
Computational linguistics - Semantics - Benchmarking - Classification (of information) - Computer aided language translation;
DOI:
10.1162/coli_a_00377
CLC Number:
TP18 [Artificial Intelligence Theory]
Subject Classification Codes:
081104; 0812; 0835; 1405
Abstract:
Neural machine translation has considerably improved the quality of automatic translations by learning good representations of input sentences. In this article, we explore a multilingual translation model capable of producing fixed-size sentence representations by incorporating an intermediate cross-lingual shared layer, which we refer to as the attention bridge. This layer exploits the semantics from each language and develops into a language-agnostic meaning representation that can be efficiently used for transfer learning. We systematically study the impact of the size of the attention bridge and the effect of including additional languages in the model. In contrast to related previous work, we demonstrate that there is no conflict between translation performance and the use of sentence representations in downstream tasks. In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also increase the accuracy of trainable classification tasks. Conversely, shorter representations lead to increased compression that is beneficial in non-trainable similarity tasks. Similarly, we show that trainable downstream tasks benefit from multilingual models, whereas additional language signals do not improve performance in non-trainable benchmarks. This is an important insight that helps to properly design models for specific applications. Finally, we also include an in-depth analysis of the proposed attention bridge and its ability to encode linguistic properties. We carefully analyze the information that is captured by individual attention heads and identify interesting patterns that explain the performance of specific settings in linguistic probing tasks.
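The fixed-size representation described in the abstract can be illustrated with a minimal sketch: a structured inner-attention layer computes a fixed number of attention heads over a variable-length sequence of encoder states, so the output size depends only on the number of heads, not on sentence length. The weight shapes and names below (`W1`, `W2`, `da`, `k`) are illustrative assumptions, not the authors' exact parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_bridge(H, W1, W2):
    """Inner-attention pooling of encoder states into a fixed-size matrix.

    H  : (n, d)  encoder hidden states for an n-token sentence
    W1 : (da, d) first projection of the attention MLP
    W2 : (k, da) one row per attention head
    Returns M (k, d), the fixed-size sentence representation,
    and A (k, n), the per-head attention weights over tokens.
    """
    A = softmax(W2 @ np.tanh(W1 @ H.T), axis=-1)  # (k, n) attention weights
    M = A @ H                                     # (k, d) weighted sums of states
    return M, A

rng = np.random.default_rng(0)
n, d, da, k = 7, 16, 8, 4          # sentence length, state dim, attn dim, heads
H = rng.normal(size=(n, d))
W1 = rng.normal(size=(da, d))
W2 = rng.normal(size=(k, da))
M, A = attention_bridge(H, W1, W2)
print(M.shape)          # (4, 16) -- independent of sentence length n
print(A.sum(axis=-1))   # each head's weights sum to 1
```

Because `M` always has shape `(k, d)`, the same bridge can sit between encoders and decoders for any language pair, which is what makes the layer usable both for translation and as a sentence embedding for downstream tasks.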
Pages: 387-424 (38 pages)
Related Papers (50 total):
  • [31] Open and Competitive Multilingual Neural Machine Translation in Production
    Tattar, Andre
    Purason, Taido
    Kuulmets, Hele-Andra
    Luhtaru, Agnes
    Ratsep, Liisa
    Tars, Maali
    Pinnis, Marcis
    Bergmanis, Toms
    Fishel, Mark
    BALTIC JOURNAL OF MODERN COMPUTING, 2022, 10 (03): 422 - 434
  • [32] Multi-way, multilingual neural machine translation
    Firat, Orhan
    Cho, Kyunghyun
    Sankaran, Baskaran
    Vural, Fatos T. Yarman
    Bengio, Yoshua
    COMPUTER SPEECH AND LANGUAGE, 2017, 45 : 236 - 252
  • [33] Language relatedness evaluation for multilingual neural machine translation
    Mi, Chenggang
    Xie, Shaoliang
    NEUROCOMPUTING, 2024, 570
  • [34] Synchronous Interactive Decoding for Multilingual Neural Machine Translation
    He, Hao
    Wang, Qian
    Yu, Zhipeng
    Zhao, Yang
    Zhang, Jiajun
    Zong, Chengqing
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 12981 - 12988
  • [35] Multilingual Neural Machine Translation for Indic to Indic Languages
    Das, Sudhansu Bala
    Panda, Divyajyoti
    Mishra, Tapas Kumar
    Patra, Bidyut Kr.
    Ekbal, Asif
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (05)
  • [36] Paraphrases as Foreign Languages in Multilingual Neural Machine Translation
    Zhou, Zhong
    Sperber, Matthias
    Waibel, Alex
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019): STUDENT RESEARCH WORKSHOP, 2019, : 113 - 122
  • [37] Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation
    Sun, Haipeng
    Wang, Rui
    Chen, Kehai
    Utiyama, Masao
    Sumita, Eiichiro
    Zhao, Tiejun
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 3525 - 3535
  • [38] Recursive Annotations for Attention-Based Neural Machine Translation
    Ye, Shaolin
    Guo, Wu
    2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 164 - 167
  • [39] Syntax-Based Attention Masking for Neural Machine Translation
    McDonald, Colin
    Chiang, David
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 47 - 52
  • [40] From bilingual to multilingual neural-based machine translation by incremental training
    Escolano, Carlos
    Costa-Jussa, Marta R.
    Fonollosa, Jose A. R.
    JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2021, 72 (02) : 190 - 203