Spatial Pyramid Attention for Deep Convolutional Neural Networks

Cited by: 24

Authors:

Ma, Xu [1,2]

Guo, Jingda [1]

Sansom, Andrew [3]

McGuire, Mara [4]

Kalaani, Andrew [5]

Chen, Qi [1]

Tang, Sihai [1]

Yang, Qing [1]

Fu, Song [1]

Affiliations:

[1] Univ North Texas, Dept Comp Sci & Engn, Denton, TX 76203 USA

[2] Nanjing Forestry Univ, Coll Informat Sci & Technol, Nanjing 210037, Peoples R China

[3] Univ North Texas, Dept Math, Denton, TX 76203 USA

[4] Texas A&M Univ Corpus Christi, Dept Math & Stat, Corpus Christi, TX 78412 USA

[5] Georgia Southern Univ, Dept Elect & Comp Engn, Statesboro, GA 30458 USA

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2021 / Vol. 23

Funding:

National Science Foundation (USA);

Keywords:

Object detection; Feature extraction; Convolutional codes; Computer architecture; Benchmark testing; Topology; Task analysis; Attention mechanism; convolutional neural network; image classification; object detection; spatial pyramid structure; structural regularization; structural information;

DOI:

10.1109/TMM.2021.3068576

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Attention mechanisms have shown great success in computer vision. However, the commonly used global average pooling in some implementations aggregates a three-dimensional feature map to a one-dimensional attention map, leading to a significant loss of structural information in attention learning. In this article, we present a novel Spatial Pyramid Attention Network (SPANet), which exploits structural information and channel relationships for better feature representation. SPANet enhances a base network by adding Spatial Pyramid Attention (SPA) blocks laterally. By rethinking the self-attention mechanism design, we further present three topology structures of attention path connection for SPANet. They can be flexibly applied to various CNN architectures. SPANet is conceptually simple but practically powerful. It uses both structural regularization and structural information to achieve better learning capability. We have comprehensively evaluated the performance of SPANet on four benchmark datasets for different visual tasks. The experimental results show that SPANet significantly improves recognition accuracy without adding much computation overhead. Using SPANet, we achieve an improvement of 1.6% in top-1 classification accuracy on the ImageNet 2012 benchmark based on ResNet50, and SPANet outperforms SENet and other attention methods. SPANet also improves object detection performance by a clear margin with negligible additional computation overhead. When applying SPANet to RetinaNet based on the ResNet50 backbone, we improve the performance of the baseline model by 2.3 mAP, and the enhanced model outperforms SENet and GCNet by 1.1 mAP and 1.7 mAP, respectively. The code of SPANet is publicly available at https://github.com/13952522076/SPANet_TMM
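
The abstract's point about global average pooling can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names are hypothetical, the pyramid levels (1, 2, 4) are an assumption not stated in the abstract, and the learned layers that would turn the pooled descriptor into attention weights are omitted. It only shows that two feature maps with the same per-channel mean but different spatial layout collapse to the same global-average descriptor, while a multi-scale pooled descriptor keeps them apart.

```python
def adaptive_avg_pool(channel, bins):
    """Average a 2-D channel (list of rows) into a bins x bins grid, row-major."""
    h, w = len(channel), len(channel[0])
    pooled = []
    for i in range(bins):
        # Bin edges use the floor/ceil rule common to adaptive pooling.
        r0, r1 = (i * h) // bins, -((-(i + 1) * h) // bins)
        for j in range(bins):
            c0, c1 = (j * w) // bins, -((-(j + 1) * w) // bins)
            cells = [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            pooled.append(sum(cells) / len(cells))
    return pooled


def global_avg_pool(feature_map):
    """Collapse each channel of a C x H x W feature map to one number."""
    return [adaptive_avg_pool(ch, 1)[0] for ch in feature_map]


def pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Concatenate pooled grids at several scales (levels are an assumption)."""
    descriptor = []
    for bins in levels:
        for ch in feature_map:
            descriptor.extend(adaptive_avg_pool(ch, bins))
    return descriptor


# Two single-channel 4 x 4 maps: activation on the left half vs. the right half.
left = [[[1, 1, 0, 0] for _ in range(4)]]
right = [[[0, 0, 1, 1] for _ in range(4)]]

print(global_avg_pool(left), global_avg_pool(right))  # identical descriptors
print(pyramid_pool(left)[:5])   # level-1 value followed by the 2 x 2 grid
print(pyramid_pool(right)[:5])  # differs from the line above in the 2 x 2 grid
```

With one channel and levels (1, 2, 4), the pyramid descriptor has 1 + 4 + 16 = 21 entries instead of 1, which is the extra structural information such a block would have to work with.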

Pages: 3048 - 3058

Page count: 11
