A generic shared attention mechanism for various backbone neural networks

Cited by: 1
Authors
Huang, Zhongzhan [1]
Liang, Senwei [2]
Liang, Mingfu [3]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou 510275, Peoples R China
[2] Purdue Univ, W Lafayette, IN 47906 USA
[3] Northwestern Univ, Evanston, IL 60201 USA
Funding
National Natural Science Foundation of China
Keywords
Layer-wise shared attention mechanism; Parameter sharing; Dense-and-implicit connection; Stable training;
DOI
10.1016/j.neucom.2024.128697
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The self-attention mechanism is crucial for enhancing the performance of various backbone neural networks. However, current methods add self-attention modules (SAMs) to each network layer without fully utilizing their potential, resulting in suboptimal performance and higher parameter consumption as network depth increases. In this paper, we reveal an inherent phenomenon: SAMs produce highly correlated attention maps across layers, with an average Pearson correlation coefficient of 0.85. Inspired by this observation, we propose Dense-and-Implicit Attention (DIA), which shares SAMs across layers and uses a long short-term memory module to calibrate and connect these correlated attention maps, improving parameter efficiency. This design is also consistent with the dynamical-systems view of neural networks. Extensive experiments show that DIA consistently enhances various backbones such as ResNet, Transformer, and UNet in tasks including image classification, object detection, and image generation with diffusion models. Our analysis indicates that DIA's effectiveness stems from its dense inter-layer information connections, which are absent in conventional mechanisms and which stabilize training and provide a regularization effect. These insights advance the understanding of attention mechanisms, support their optimization, and pave the way for future developments across diverse neural networks.
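A minimal sketch of the layer-wise shared attention idea described in the abstract is given below, assuming a PyTorch-style implementation. The class name SharedDIAttention, the SE-style squeeze-and-excite statistics, and all parameter names are illustrative assumptions rather than the authors' published code; the sketch only shows the core idea of one attention module and one LSTM cell reused by every layer, with the LSTM hidden state linking the correlated attention maps across layers.

```python
# Hypothetical sketch of the DIA-style shared attention mechanism.
# Names and structure are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn


class SharedDIAttention(nn.Module):
    """One instance is created per network and reused by all layers."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)        # global context per channel
        self.excite = nn.Sequential(                  # SE-style bottleneck
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.lstm = nn.LSTMCell(channels, channels)   # calibrates maps across layers
        self.state = None                             # (h, c) carried layer to layer

    def reset(self):
        """Call once at the start of each forward pass of the backbone."""
        self.state = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        s = self.squeeze(x).view(b, c)                # raw per-layer attention statistics
        s = self.excite(s)
        if self.state is None:
            self.state = (torch.zeros_like(s), torch.zeros_like(s))
        h, cell = self.lstm(s, self.state)            # dense inter-layer connection
        self.state = (h, cell)
        w = torch.sigmoid(h).view(b, c, 1, 1)         # calibrated attention weights
        return x * w                                  # rescale the layer's features
```

In a ResNet-style backbone, such a module would be constructed once, reset at the beginning of each forward pass, and applied to the output features of every block, so the attention parameters are shared across layers while the recurrent state densely connects them.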
Pages: 14
Related Papers
50 records in total
  • [31] EPILEPTIC SPIKE DETECTION BY RECURRENT NEURAL NETWORKS WITH SELF-ATTENTION MECHANISM
    Fukumori, Kosuke
    Yoshida, Noboru
    Sugano, Hidenori
    Nakajima, Madoka
    Tanaka, Toshihisa
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 1406 - 1410
  • [32] Holistic Graph Neural Networks based on a global-based attention mechanism
    Rassil, Asmaa
    Chougrad, Hiba
    Zouaki, Hamid
    KNOWLEDGE-BASED SYSTEMS, 2022, 240
  • [33] Pruning Convolutional Neural Networks with an Attention Mechanism for Remote Sensing Image Classification
    Zhang, Shuo
    Wu, Gengshen
    Gu, Junhua
    Han, Jungong
    ELECTRONICS, 2020, 9 (08) : 1 - 19
  • [34] Deep neural networks with attention mechanism for monocular depth estimation on embedded devices
    Liu, Siping
    Tu, Xiaohan
    Xu, Cheng
    Li, Renfa
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2022, 131 : 137 - 150
  • [35] Algorithm for Skeleton Action Recognition by Integrating Attention Mechanism and Convolutional Neural Networks
    Liu, Jianhua
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (08) : 604 - 613
  • [36] Personalized Graph Neural Networks With Attention Mechanism for Session-Aware Recommendation
    Zhang, Mengqi
    Wu, Shu
    Gao, Meng
    Jiang, Xin
    Xu, Ke
    Wang, Liang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (08) : 3946 - 3957
  • [37] A REGULARIZED ATTENTION MECHANISM FOR GRAPH ATTENTION NETWORKS
    Shanthamallu, Uday Shankar
    Thiagarajan, Jayaraman J.
    Spanias, Andreas
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 3372 - 3376
  • [38] Fully shared convolutional neural networks
    Lu, Yao
    Lu, Guangming
    Li, Jinxing
    Zhang, Zheng
    Xu, Yuanrong
    NEURAL COMPUTING AND APPLICATIONS, 2021, 33: 8635 - 8648
  • [39] Fully shared convolutional neural networks
    Lu, Yao
    Lu, Guangming
    Li, Jinxing
    Zhang, Zheng
    Xu, Yuanrong
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (14): 8635 - 8648
  • [40] Highly shared Convolutional Neural Networks
    Lu, Yao
    Lu, Guangming
    Zhou, Yicong
    Li, Jinxing
    Xu, Yuanrong
    Zhang, David
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 175