A generic shared attention mechanism for various backbone neural networks

Cited by: 1
Authors
Huang, Zhongzhan [1]
Liang, Senwei [2]
Liang, Mingfu [3]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou 510275, Peoples R China
[2] Purdue Univ, W Lafayette, IN 47906 USA
[3] Northwestern Univ, Evanston, IL 60201 USA
Funding
National Natural Science Foundation of China
Keywords
Layer-wise shared attention mechanism; Parameter sharing; Dense-and-implicit connection; Stable training;
DOI
10.1016/j.neucom.2024.128697
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The self-attention mechanism is crucial for enhancing the performance of various backbone neural networks. However, current methods add self-attention modules (SAMs) to each network layer without fully utilizing their potential, resulting in suboptimal performance and higher parameter consumption as network depth increases. In this paper, we reveal an inherent phenomenon: SAMs produce highly correlated attention maps across layers, with an average Pearson correlation coefficient of 0.85. Inspired by this observation, we propose Dense-and-Implicit Attention (DIA), which shares SAMs across layers and uses a long short-term memory module to calibrate and connect these correlated attention maps, improving parameter efficiency. This design is also consistent with the dynamical-systems view of neural networks. Extensive experiments show that DIA consistently enhances various backbones such as ResNet, Transformer, and UNet in tasks including image classification, object detection, and image generation with diffusion models. Our analysis indicates that DIA's effectiveness stems from its dense inter-layer information connections, which are absent in conventional mechanisms and which stabilize training and provide a regularization effect. These insights advance the understanding of attention mechanisms, support their optimization, and pave the way for future developments across diverse neural networks.
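A minimal sketch of the layer-wise shared attention idea described in the abstract is given below, assuming a PyTorch-style implementation. The class name SharedDIAttention, the SE-style squeeze-and-excite statistics, and all parameter names are illustrative assumptions rather than the authors' published code; the sketch only shows the core idea of one attention module and one LSTM cell reused by every layer, with the LSTM hidden state linking the correlated attention maps across layers.

```python
# Hypothetical sketch of the DIA-style shared attention mechanism.
# Names and structure are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn


class SharedDIAttention(nn.Module):
    """One instance is created per network and reused by all layers."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)        # global context per channel
        self.excite = nn.Sequential(                  # SE-style bottleneck
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.lstm = nn.LSTMCell(channels, channels)   # calibrates maps across layers
        self.state = None                             # (h, c) carried layer to layer

    def reset(self):
        """Call once at the start of each forward pass of the backbone."""
        self.state = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        s = self.squeeze(x).view(b, c)                # raw per-layer attention statistics
        s = self.excite(s)
        if self.state is None:
            self.state = (torch.zeros_like(s), torch.zeros_like(s))
        h, cell = self.lstm(s, self.state)            # dense inter-layer connection
        self.state = (h, cell)
        w = torch.sigmoid(h).view(b, c, 1, 1)         # calibrated attention weights
        return x * w                                  # rescale the layer's features
```

In a ResNet-style backbone, such a module would be constructed once, reset at the beginning of each forward pass, and applied to the output features of every block, so the attention parameters are shared across layers while the recurrent state densely connects them.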
Pages: 14
Related Papers
50 records in total
  • [31] EPILEPTIC SPIKE DETECTION BY RECURRENT NEURAL NETWORKS WITH SELF-ATTENTION MECHANISM
    Fukumori, Kosuke
    Yoshida, Noboru
    Sugano, Hidenori
    Nakajima, Madoka
    Tanaka, Toshihisa
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 1406 - 1410
  • [32] Holistic Graph Neural Networks based on a global-based attention mechanism
    Rassil, Asmaa
    Chougrad, Hiba
    Zouaki, Hamid
    KNOWLEDGE-BASED SYSTEMS, 2022, 240
  • [33] Pruning Convolutional Neural Networks with an Attention Mechanism for Remote Sensing Image Classification
    Zhang, Shuo
    Wu, Gengshen
    Gu, Junhua
    Han, Jungong
    ELECTRONICS, 2020, 9 (08) : 1 - 19
  • [34] Deep neural networks with attention mechanism for monocular depth estimation on embedded devices
    Liu, Siping
    Tu, Xiaohan
    Xu, Cheng
    Li, Renfa
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2022, 131 : 137 - 150
  • [35] Algorithm for Skeleton Action Recognition by Integrating Attention Mechanism and Convolutional Neural Networks
    Liu, Jianhua
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (08) : 604 - 613
  • [36] Personalized Graph Neural Networks With Attention Mechanism for Session-Aware Recommendation
    Zhang, Mengqi
    Wu, Shu
    Gao, Meng
    Jiang, Xin
    Xu, Ke
    Wang, Liang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (08) : 3946 - 3957
  • [37] A REGULARIZED ATTENTION MECHANISM FOR GRAPH ATTENTION NETWORKS
    Shanthamallu, Uday Shankar
    Thiagarajan, Jayaraman J.
    Spanias, Andreas
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 3372 - 3376
  • [38] Fully shared convolutional neural networks
    Lu, Yao
    Lu, Guangming
    Li, Jinxing
    Zhang, Zheng
    Xu, Yuanrong
    NEURAL COMPUTING AND APPLICATIONS, 2021, 33: 8635 - 8648
  • [39] Fully shared convolutional neural networks
    Lu, Yao
    Lu, Guangming
    Li, Jinxing
    Zhang, Zheng
    Xu, Yuanrong
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (14): 8635 - 8648
  • [40] Highly shared Convolutional Neural Networks
    Lu, Yao
    Lu, Guangming
    Zhou, Yicong
    Li, Jinxing
    Xu, Yuanrong
    Zhang, David
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 175