Enlivening Redundant Heads in Multi-head Self-attention for Machine Translation

Cited: 0
Authors
Zhang, Tianfu [1 ,2 ]
Huang, Heyan [1 ,2 ]
Feng, Chong [1 ,3 ]
Cao, Longbing [4 ]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Key Lab MIIT, Intelligent Informat Proc & Contents Comp, Beijing, Peoples R China
[3] BIT, Southeast Informat Technol Res Inst, Beijing, Peoples R China
[4] Univ Technol Sydney, Ultimo, Australia
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Multi-head self-attention has recently attracted enormous interest owing to its specialized functions, highly parallelizable computation, and flexible extensibility. However, recent empirical studies show that some self-attention heads contribute little and can be pruned as redundant. This work takes the novel perspective of identifying and then vitalizing redundant heads. We propose a redundant head enlivening (RHE) method that precisely identifies redundant heads and then realizes their potential by learning syntactic relations and prior knowledge in text, without sacrificing the roles of important heads. Two novel syntax-enhanced attention (SEA) mechanisms, a dependency mask bias and a relative local-phrasal position bias, are introduced to revise self-attention distributions for syntactic enhancement in machine translation. The importance of individual heads is evaluated dynamically during redundant head identification, and SEA is then applied to vitalize the redundant heads while maintaining the strength of the important ones. Experimental results on the WMT14 and WMT16 English→German and English→Czech machine translation tasks validate the effectiveness of RHE.
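To make the pipeline sketched in the abstract concrete, the NumPy snippet below illustrates the general idea only: a simple per-head confidence score stands in for the paper's dynamic head-importance evaluation, and an additive dependency-style bias on the attention logits stands in for the syntax-enhanced attention applied to redundant heads while important heads are left untouched. The function names, the confidence proxy, the threshold, and the banded bias are illustrative assumptions, not the actual RHE/SEA formulation.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def head_confidence(attn):
    # attn: (heads, tgt_len, src_len); proxy importance = mean of each
    # query position's maximum attention weight, per head (an assumption,
    # not the paper's importance measure).
    return attn.max(axis=-1).mean(axis=-1)  # shape: (heads,)

def attention_with_sea(Q, K, V, dep_bias, conf_threshold=0.4):
    # Q, K, V: (heads, seq_len, d_k). dep_bias: (seq_len, seq_len) additive
    # bias built from a dependency mask (0 where tokens are syntactically
    # related, a large negative value elsewhere) -- a stand-in for SEA.
    h, n, d_k = Q.shape
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)  # (heads, n, n)
    plain_attn = softmax(scores)
    conf = head_confidence(plain_attn)
    out = np.empty_like(V)
    for i in range(h):
        if conf[i] < conf_threshold:
            # Low-importance ("redundant") head: revise its distribution
            # with the syntactic bias.
            attn_i = softmax(scores[i] + dep_bias)
        else:
            # Important head: keep its learned distribution untouched.
            attn_i = plain_attn[i]
        out[i] = attn_i @ V[i]
    return out, conf

# Toy usage: random projections and a banded bias standing in for a
# dependency mask.
rng = np.random.default_rng(0)
heads, n, d_k = 4, 6, 8
Q, K, V = (rng.standard_normal((heads, n, d_k)) for _ in range(3))
idx = np.arange(n)
dep_bias = np.where(np.abs(idx[:, None] - idx[None, :]) <= 1, 0.0, -1e9)
out, conf = attention_with_sea(Q, K, V, dep_bias)
print("per-head confidence:", np.round(conf, 3))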
Pages: 3238-3248
Page count: 11
Related Papers
(10 of 50 shown)
  • [1] Adaptive Pruning for Multi-Head Self-Attention
    Messaoud, Walid
    Trabelsi, Rim
    Cabani, Adnane
    Abdelkefi, Fatma
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2023, PT II, 2023, 14126 : 48 - 57
  • [2] Neural News Recommendation with Multi-Head Self-Attention
    Wu, Chuhan
    Wu, Fangzhao
    Ge, Suyu
    Qi, Tao
    Huang, Yongfeng
    Xie, Xing
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 6389 - 6394
  • [3] Gaussian Multi-head Attention for Simultaneous Machine Translation
    Zhang, Shaolei
    Feng, Yang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 3019 - 3030
  • [4] A small object detection architecture with concatenated detection heads and multi-head mixed self-attention mechanism
    Mu, Jianhong
    Su, Qinghua
    Wang, Xiyu
    Liang, Wenhui
    Xu, Sheng
    Wan, Kaizheng
JOURNAL OF REAL-TIME IMAGE PROCESSING, 2024, 21 (06)
  • [5] Masked multi-head self-attention for causal speech enhancement
    Nicolson, Aaron
    Paliwal, Kuldip K.
    SPEECH COMMUNICATION, 2020, 125 : 80 - 96
  • [6] Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
    Voita, Elena
    Talbot, David
    Moiseev, Fedor
    Sennrich, Rico
    Titov, Ivan
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 5797 - 5808
  • [7] Multi-modal multi-head self-attention for medical VQA
    Joshi, Vasudha
    Mitra, Pabitra
    Bose, Supratik
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (14) : 42585 - 42608
  • [8] Multi-head enhanced self-attention network for novelty detection
    Zhang, Yingying
    Gong, Yuxin
    Zhu, Haogang
    Bai, Xiao
    Tang, Wenzhong
    PATTERN RECOGNITION, 2020, 107
  • [9] Epilepsy detection based on multi-head self-attention mechanism
    Ru, Yandong
    An, Gaoyang
    Wei, Zheng
    Chen, Hongming
PLOS ONE, 2024, 19 (06)
  • [10] Neural Linguistic Steganalysis via Multi-Head Self-Attention
    Jiao, Sai-Mei
    Wang, Hai-feng
    Zhang, Kun
    Hu, Ya-qi
JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING, 2021, 2021