Cross Aggregation of Multi-head Attention for Neural Machine Translation

Cited by: 2
Authors
Cao, Juncheng [1 ,2 ,3 ]
Zhao, Hai [1 ,2 ,3 ]
Yu, Kai [1 ,2 ,3 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
[2] Shanghai Jiao Tong Univ, Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai, Peoples R China
[3] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Machine translation; Attention mechanism; Information aggregation;
DOI
10.1007/978-3-030-32233-5_30
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
The Transformer encoder, built on the key design of self-attention, has become the state-of-the-art model for neural machine translation. Multi-head attention in the self-attention network (SAN) plays a significant role in extracting information about the input from different representation subspaces for each pair of tokens. However, the information captured by a token on a specific head, explicitly represented by the attention weights, is computed independently of the other heads and tokens, so it does not take the global structure into account. Moreover, since the SAN does not use an RNN-like recurrent structure, its ability to model relative position and sequential information is weakened. In this paper, we propose a method named Cross Aggregation, based on an iterative routing-by-agreement algorithm, to alleviate these problems. Experimental results on machine translation tasks show that our method helps the model significantly outperform the strong Transformer baseline.
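The abstract describes aggregating multi-head attention outputs with an iterative routing-by-agreement procedure, in the spirit of the capsule-style routing used in related work [7]. The paper's exact Cross Aggregation algorithm is not given here, so the following is only a minimal sketch of dynamic routing over the per-head, per-token outputs; the function names, the number of output capsules, the identity "vote" transform, and the use of PyTorch are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def squash(s, eps=1e-8):
    # Standard capsule squashing non-linearity: shrink short vectors toward 0,
    # keep long vectors close to unit length while preserving direction.
    norm_sq = (s ** 2).sum(dim=-1, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

def route_by_agreement(head_outputs, num_capsules, iterations=3):
    """Aggregate per-head representations into output capsules by iterative routing.

    head_outputs: [batch, seq_len, num_heads, d_head], one vector per head and token.
    Returns:      [batch, seq_len, num_capsules, d_head]
    """
    b, t, h, d = head_outputs.shape
    # Each head sends the same "vote" to every output capsule; a real model would
    # apply a learned per-(head, capsule) linear transform here instead.
    votes = head_outputs.unsqueeze(3).expand(b, t, h, num_capsules, d)
    logits = torch.zeros(b, t, h, num_capsules, device=head_outputs.device)
    v = None
    for _ in range(iterations):
        c = F.softmax(logits, dim=-1)                 # coupling coefficients per head
        s = (c.unsqueeze(-1) * votes).sum(dim=2)      # weighted sum over heads
        v = squash(s)                                 # [b, t, num_capsules, d]
        # Raise the logit of a (head, capsule) pair when the head's vote agrees
        # with the capsule it helped produce.
        logits = logits + (votes * v.unsqueeze(2)).sum(dim=-1)
    return v
```

Under this sketch, the output capsules for each token would be concatenated and projected back to the model dimension where the standard Transformer concatenates its heads; the actual Cross Aggregation method of the paper may organize this step differently.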
Pages: 380-392
Number of pages: 13
Related Papers
50 records in total
  • [1] Multi-Head Attention for End-to-End Neural Machine Translation
    Fung, Ivan
    Mak, Brian
    [J]. 2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 250 - 254
  • [2] Gaussian Multi-head Attention for Simultaneous Machine Translation
    Zhang, Shaolei
    Feng, Yang
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 3019 - 3030
  • [3] Multi-Head Highly Parallelized LSTM Decoder for Neural Machine Translation
    Xu, Hongfei
    Liu, Qiuhui
    van Genabith, Josef
    Xiong, Deyi
    Zhang, Meng
    [J]. 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 273 - 282
  • [4] Enlivening Redundant Heads in Multi-head Self-attention for Machine Translation
    Zhang, Tianfu
    Huang, Heyan
    Feng, Chong
    Cao, Longbing
    [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3238 - 3248
  • [5] A Reverse Positional Encoding Multi-Head Attention-Based Neural Machine Translation Model for Arabic Dialects
    Baniata, Laith H.
    Kang, Sangwoo
    Ampomah, Isaac K. E.
    [J]. MATHEMATICS, 2022, 10 (19)
  • [6] Generating Diverse Translation by Manipulating Multi-Head Attention
    Sun, Zewei
    Huang, Shujian
    Wei, Hao-Ran
    Dai, Xin-yu
    Chen, Jiajun
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8976 - 8983
  • [7] Information Aggregation for Multi-Head Attention with Routing-by-Agreement
    Li, Jian
    Yang, Baosong
    Dou, Zi-Yi
    Wang, Xing
    Lyu, Michael R.
    Tu, Zhaopeng
    [J]. 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 3566 - 3575
  • [8] On the diversity of multi-head attention
    Li, Jian
    Wang, Xing
    Tu, Zhaopeng
    Lyu, Michael R.
    [J]. NEUROCOMPUTING, 2021, 454 : 14 - 24
  • [9] Neural News Recommendation with Multi-Head Self-Attention
    Wu, Chuhan
    Wu, Fangzhao
    Ge, Suyu
    Qi, Tao
    Huang, Yongfeng
    Xie, Xing
    [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 6389 - 6394
  • [10] Dynamic Attention Aggregation with BERT for Neural Machine Translation
    Zhang, JiaRui
    Li, HongZheng
    Shi, ShuMin
    Huang, HeYan
    Hu, Yue
    Wei, XiangPeng
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,