Adaptive Pruning for Multi-Head Self-Attention

Cited by: 0
Authors
Messaoud, Walid [1 ]
Trabelsi, Rim [2 ]
Cabani, Adnane [3 ]
Abdelkefi, Fatma [1 ]
Affiliations
[1] Carthage Univ, Supcom Lab MEDIATRON, Ariana, Tunisia
[2] Univ Gabes, Natl Engn Sch Gabes, Hatem Bettaher IResCoMath Res Unit, Gabes, Tunisia
[3] Normandie Univ, UNIROUEN, ESIGELEC, IRSEEM, Rouen, France
Source
ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2023, PT II | 2023, Vol. 14126
Keywords
Object detection; Multi-head attention mechanism; Head pruning
DOI
10.1007/978-3-031-42508-0_5
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper presents an adaptive pruning approach to compress Multi-Head Self-Attention (MHSA) models. The main aim is to suppress redundant attention heads, which incur high computational cost, without substantially degrading performance. Through head pruning, we obtain more flexible and efficient models for object detection tasks. Specifically, we propose to enhance the architectures of two state-of-the-art MHSA-based models: Bottleneck Transformers (BoTNet) and Attention Augmented Convolutional Networks (AACN). Our approach alternates between the escalation and the ablation of heads, removing the least productive ones. We suggest exploiting two heads rather than four for BoTNet, and four heads rather than eight for AACN. Our experiments on the ImageNet and Pascal VOC datasets show that our lightweight architectures are more efficient than the original heavier ones, reaching comparable performance with faster convergence during training, which allows easier transfer and deployment.
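The abstract describes the method only at a high level (score the heads, ablate the least productive ones, keep two for BoTNet and four for AACN). The following is a minimal, hypothetical NumPy sketch of that general idea, not the authors' implementation: the per-head output-norm importance score, the function names, and all parameters are assumptions introduced purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mhsa(x, Wq, Wk, Wv, heads):
    """Self-attention restricted to the given head indices.

    x: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: (num_heads, d_model, d_head) per-head projections.
    heads: indices of the heads to run (pruned heads are simply skipped).
    Returns the concatenated head outputs and one score per head.
    """
    outputs, scores = [], []
    for h in heads:
        q, k, v = x @ Wq[h], x @ Wk[h], x @ Wv[h]
        attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))   # (seq_len, seq_len)
        out = attn @ v                                   # (seq_len, d_head)
        outputs.append(out)
        # Hypothetical importance proxy: heads whose outputs carry little
        # energy are treated as "less productive" pruning candidates.
        scores.append(np.linalg.norm(out))
    return np.concatenate(outputs, axis=-1), np.array(scores)

rng = np.random.default_rng(0)
seq_len, d_model, num_heads, d_head = 16, 64, 4, 16
x = rng.standard_normal((seq_len, d_model))
Wq = rng.standard_normal((num_heads, d_model, d_head)) * 0.1
Wk = rng.standard_normal((num_heads, d_model, d_head)) * 0.1
Wv = rng.standard_normal((num_heads, d_model, d_head)) * 0.1

# "Escalation" pass: run all four heads and score them.
_, scores = mhsa(x, Wq, Wk, Wv, heads=range(num_heads))

# "Ablation" pass: keep only the two highest-scoring heads, mirroring the
# two-head BoTNet configuration reported in the abstract.
kept = np.argsort(scores)[-2:]
pruned_out, _ = mhsa(x, Wq, Wk, Wv, kept)
print("kept heads:", kept, "output shape:", pruned_out.shape)
```

In a full model the kept heads' projection weights would be re-assembled into a smaller attention layer and fine-tuned; the abstract's reported efficiency gains come from that reduced head count, not from the toy scoring rule used here.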
Pages: 48-57
Number of pages: 10
Related Papers (50 in total)
  • [1] Neural News Recommendation with Multi-Head Self-Attention
    Wu, Chuhan
    Wu, Fangzhao
    Ge, Suyu
    Qi, Tao
    Huang, Yongfeng
    Xie, Xing
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 6389 - 6394
  • [2] Masked multi-head self-attention for causal speech enhancement
    Nicolson, Aaron
    Paliwal, Kuldip K.
    SPEECH COMMUNICATION, 2020, 125 : 80 - 96
  • [3] Multi-modal multi-head self-attention for medical VQA
    Joshi, Vasudha
    Mitra, Pabitra
    Bose, Supratik
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (14) : 42585 - 42608
  • [4] An adaptive multi-head self-attention coupled with attention filtered LSTM for advanced scene text recognition
    Selvam, Prabu
    Kumar, S. N.
    Kannadhasan, S.
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2025,
  • [5] Multi-head enhanced self-attention network for novelty detection
    Zhang, Yingying
    Gong, Yuxin
    Zhu, Haogang
    Bai, Xiao
    Tang, Wenzhong
    PATTERN RECOGNITION, 2020, 107
  • [6] Neural Linguistic Steganalysis via Multi-Head Self-Attention
    Jiao, Sai-Mei
    Wang, Hai-feng
    Zhang, Kun
    Hu, Ya-qi
    JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING, 2021, 2021
  • [7] Epilepsy detection based on multi-head self-attention mechanism
    Ru, Yandong
    An, Gaoyang
    Wei, Zheng
    Chen, Hongming
    PLOS ONE, 2024, 19 (06):
  • [8] Personalized News Recommendation with CNN and Multi-Head Self-Attention
    Li, Aibin
    He, Tingnian
    Guo, Yi
    Li, Zhuoran
    Rong, Yixuan
    Liu, Guoqi
    2022 IEEE 13TH ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE (UEMCON), 2022, : 102 - 108
  • [9] Personalized multi-head self-attention network for news recommendation
    Zheng, Cong
    Song, Yixuan
    NEURAL NETWORKS, 2025, 181