Cross Aggregation of Multi-head Attention for Neural Machine Translation

Cited by: 2
Authors
Cao, Juncheng [1 ,2 ,3 ]
Zhao, Hai [1 ,2 ,3 ]
Yu, Kai [1 ,2 ,3 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
[2] Shanghai Jiao Tong Univ, Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai, Peoples R China
[3] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Machine translation; Attention mechanism; Information aggregation;
DOI
10.1007/978-3-030-32233-5_30
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
The Transformer encoder, built on the key design of self-attention, has become the state-of-the-art model for neural machine translation. Multi-head attention in the self-attention network (SAN) plays a significant role in extracting information about the input from different representation subspaces for each pair of tokens. However, the information captured by a token on a specific head, explicitly represented by the attention weights, is computed independently of the other heads and tokens, so it does not take the global structure into account. Moreover, since the SAN does not use an RNN-like recurrent structure, its ability to model relative position and sequential information is weakened. In this paper, we propose a method named Cross Aggregation, based on an iterative routing-by-agreement algorithm, to alleviate these problems. Experimental results on machine translation tasks show that our method helps the model significantly outperform the strong Transformer baseline.
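The abstract describes aggregating multi-head attention outputs with an iterative routing-by-agreement procedure, in the spirit of the capsule-style routing used in related work [7]. The paper's exact Cross Aggregation algorithm is not given here, so the following is only a minimal sketch of dynamic routing over the per-head, per-token outputs; the function names, the number of output capsules, the identity "vote" transform, and the use of PyTorch are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def squash(s, eps=1e-8):
    # Standard capsule squashing non-linearity: shrink short vectors toward 0,
    # keep long vectors close to unit length while preserving direction.
    norm_sq = (s ** 2).sum(dim=-1, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

def route_by_agreement(head_outputs, num_capsules, iterations=3):
    """Aggregate per-head representations into output capsules by iterative routing.

    head_outputs: [batch, seq_len, num_heads, d_head], one vector per head and token.
    Returns:      [batch, seq_len, num_capsules, d_head]
    """
    b, t, h, d = head_outputs.shape
    # Each head sends the same "vote" to every output capsule; a real model would
    # apply a learned per-(head, capsule) linear transform here instead.
    votes = head_outputs.unsqueeze(3).expand(b, t, h, num_capsules, d)
    logits = torch.zeros(b, t, h, num_capsules, device=head_outputs.device)
    v = None
    for _ in range(iterations):
        c = F.softmax(logits, dim=-1)                 # coupling coefficients per head
        s = (c.unsqueeze(-1) * votes).sum(dim=2)      # weighted sum over heads
        v = squash(s)                                 # [b, t, num_capsules, d]
        # Raise the logit of a (head, capsule) pair when the head's vote agrees
        # with the capsule it helped produce.
        logits = logits + (votes * v.unsqueeze(2)).sum(dim=-1)
    return v
```

Under this sketch, the output capsules for each token would be concatenated and projected back to the model dimension where the standard Transformer concatenates its heads; the actual Cross Aggregation method of the paper may organize this step differently.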
Pages: 380-392
Number of pages: 13
Related Papers
50 records in total
  • [1] Multi-Head Attention for End-to-End Neural Machine Translation
    Fung, Ivan
    Mak, Brian
    [J]. 2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 250 - 254
  • [2] Gaussian Multi-head Attention for Simultaneous Machine Translation
    Zhang, Shaolei
    Feng, Yang
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 3019 - 3030
  • [3] Multi-Head Highly Parallelized LSTM Decoder for Neural Machine Translation
    Xu, Hongfei
    Liu, Qiuhui
    van Genabith, Josef
    Xiong, Deyi
    Zhang, Meng
    [J]. 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 273 - 282
  • [4] Enlivening Redundant Heads in Multi-head Self-attention for Machine Translation
    Zhang, Tianfu
    Huang, Heyan
    Feng, Chong
    Cao, Longbing
    [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3238 - 3248
  • [5] A Reverse Positional Encoding Multi-Head Attention-Based Neural Machine Translation Model for Arabic Dialects
    Baniata, Laith H.
    Kang, Sangwoo
    Ampomah, Isaac K. E.
    [J]. MATHEMATICS, 2022, 10 (19)
  • [6] Generating Diverse Translation by Manipulating Multi-Head Attention
    Sun, Zewei
    Huang, Shujian
    Wei, Hao-Ran
    Dai, Xin-yu
    Chen, Jiajun
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8976 - 8983
  • [7] Information Aggregation for Multi-Head Attention with Routing-by-Agreement
    Li, Jian
    Yang, Baosong
    Dou, Zi-Yi
    Wang, Xing
    Lyu, Michael R.
    Tu, Zhaopeng
    [J]. 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 3566 - 3575
  • [8] On the diversity of multi-head attention
    Li, Jian
    Wang, Xing
    Tu, Zhaopeng
    Lyu, Michael R.
    [J]. NEUROCOMPUTING, 2021, 454 : 14 - 24
  • [9] Neural News Recommendation with Multi-Head Self-Attention
    Wu, Chuhan
    Wu, Fangzhao
    Ge, Suyu
    Qi, Tao
    Huang, Yongfeng
    Xie, Xing
    [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 6389 - 6394
  • [10] Dynamic Attention Aggregation with BERT for Neural Machine Translation
    Zhang, JiaRui
    Li, HongZheng
    Shi, ShuMin
    Huang, HeYan
    Hu, Yue
    Wei, XiangPeng
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,