Enlivening Redundant Heads in Multi-head Self-attention for Machine Translation

Cited by: 0

Authors:

Zhang, Tianfu [1,2]

Huang, Heyan [1,2]

Feng, Chong [1,3]

Cao, Longbing [4]

Affiliations:

[1] Beijing Inst Technol, Beijing, Peoples R China

[2] Key Lab MIIT, Intelligent Informat Proc & Contents Comp, Beijing, Peoples R China

[3] BIT, Southeast Informat Technol Res Inst, Beijing, Peoples R China

[4] Univ Technol Sydney, Ultimo, Australia

Source:

2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021) | 2021

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multi-head self-attention has recently attracted enormous interest owing to its specialized functions, highly parallelizable computation, and flexible extensibility. However, recent empirical studies show that some self-attention heads contribute little and can be pruned as redundant. This work takes a novel perspective: identifying redundant heads and then vitalizing them. We propose a redundant head enlivening (RHE) method that precisely identifies redundant heads and then vitalizes their potential by learning syntactic relations and prior knowledge in text, without sacrificing the roles of important heads. Two novel syntax-enhanced attention (SEA) mechanisms, a dependency mask bias and a relative local-phrasal position bias, are introduced to revise self-attention distributions for syntactic enhancement in machine translation. The importance of individual heads is dynamically evaluated during redundant head identification, and SEA is then applied to vitalize the redundant heads while maintaining the strength of important heads. Experimental results on WMT14 and WMT16 English-to-German and English-to-Czech machine translation validate the effectiveness of RHE.
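
Illustrative sketch (not from the paper): the abstract does not give the SEA formulas, so the code below only shows the general idea of adding a dependency-derived bias to the attention logits of selected heads before the softmax. The function names, the `parents` encoding of the dependency tree, the hard `penalty` value, and the boolean `is_redundant` flags are all assumptions made for illustration; the paper's actual bias definition and its dynamic head-importance scoring may differ.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def dependency_mask_bias(parents, penalty=-1e9):
    # parents[i] is the index of token i's syntactic parent, or -1 for the root.
    # Assumed form of the bias: 0 for a token itself and for direct
    # parent/child pairs, a large negative penalty everywhere else.
    n = len(parents)
    bias = [[penalty] * n for _ in range(n)]
    for i, p in enumerate(parents):
        bias[i][i] = 0.0
        if p >= 0:
            bias[i][p] = 0.0
            bias[p][i] = 0.0
    return bias

def biased_attention(scores, bias, is_redundant):
    # scores: one n x n matrix of raw attention logits per head.
    # The bias is added only to heads flagged as redundant (hypothetical
    # flags); heads flagged as important keep their original distribution.
    out = []
    for head_scores, redundant in zip(scores, is_redundant):
        rows = []
        for i, row in enumerate(head_scores):
            if redundant:
                row = [s + b for s, b in zip(row, bias[i])]
            rows.append(softmax(row))
        out.append(rows)
    return out

# Toy example: 3 tokens, tokens 0 and 2 both attach to token 1 (the root).
parents = [1, -1, 1]
scores = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]  # two heads
attn = biased_attention(scores, dependency_mask_bias(parents), [False, True])
```

In this toy setup the first head is left untouched, while the second head can attend only along dependency arcs. A soft (finite) penalty could be used instead of a hard mask if syntactically unrelated tokens should remain reachable.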


Pages: 3238-3248

Page count: 11
