Spatial Pyramid Attention for Deep Convolutional Neural Networks

Cited by: 24
Authors
Ma, Xu [1 ,2 ]
Guo, Jingda [1 ]
Sansom, Andrew [3 ]
McGuire, Mara [4 ]
Kalaani, Andrew [5 ]
Chen, Qi [1 ]
Tang, Sihai [1 ]
Yang, Qing [1 ]
Fu, Song [1 ]
Affiliations
[1] Univ North Texas, Dept Comp Sci & Engn, Denton, TX 76203 USA
[2] Nanjing Forestry Univ, Coll Informat Sci & Technol, Nanjing 210037, Peoples R China
[3] Univ North Texas, Dept Math, Denton, TX 76203 USA
[4] Texas A&M Univ Corpus Christi, Dept Math & Stat, Corpus Christi, TX 78412 USA
[5] Georgia Southern Univ, Dept Elect & Comp Engn, Statesboro, GA 30458 USA
Funding
U.S. National Science Foundation;
Keywords
Object detection; Feature extraction; Convolutional codes; Computer architecture; Benchmark testing; Topology; Task analysis; Attention mechanism; convolutional neural network; image classification; object detection; spatial pyramid structure; structural regularization; structural information;
DOI
10.1109/TMM.2021.3068576
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Attention mechanisms have shown great success in computer vision. However, the global average pooling commonly used in some implementations aggregates a three-dimensional feature map into a one-dimensional attention map, leading to a significant loss of structural information during attention learning. In this article, we present a novel Spatial Pyramid Attention Network (SPANet), which exploits structural information and channel relationships for better feature representation. SPANet enhances a base network by adding Spatial Pyramid Attention (SPA) blocks laterally. By rethinking the design of the self-attention mechanism, we further present three topology structures of attention path connection for SPANet. They can be flexibly applied to various CNN architectures. SPANet is conceptually simple but practically powerful. It uses both structural regularization and structural information to achieve better learning capability. We have comprehensively evaluated the performance of SPANet on four benchmark datasets for different visual tasks. The experimental results show that SPANet significantly improves recognition accuracy without adding much computational overhead. Using SPANet, we achieve an improvement of 1.6% in top-1 classification accuracy on the ImageNet 2012 benchmark based on ResNet50, and SPANet outperforms SENet and other attention methods. SPANet also improves object detection performance by a clear margin with negligible additional computational overhead. When applying SPANet to RetinaNet with a ResNet50 backbone, we improve the performance of the baseline model by 2.3 mAP, and the enhanced model outperforms SENet and GCNet by 1.1 mAP and 1.7 mAP, respectively. The code of SPANet is publicly available at https://github.com/13952522076/SPANet_TMM
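The contrast the abstract draws between global average pooling and a spatial pyramid can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the pyramid levels (1, 2, 4), and the toy feature maps are assumptions made for illustration, and the learned layers that would turn the pooled descriptor into attention weights are omitted.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def adaptive_avg_pool(channel, out_size):
    """Average-pool one H x W channel (a list of rows) to out_size x out_size."""
    h, w = len(channel), len(channel[0])
    pooled = []
    for i in range(out_size):
        r0, r1 = (i * h) // out_size, ceil_div((i + 1) * h, out_size)
        row = []
        for j in range(out_size):
            c0, c1 = (j * w) // out_size, ceil_div((j + 1) * w, out_size)
            cells = [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def global_avg_pool(feature_map):
    """Reduce a C x H x W feature map to C values (one mean per channel)."""
    return [adaptive_avg_pool(ch, 1)[0][0] for ch in feature_map]


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Concatenate pooled grids of several sizes (levels are an assumption)."""
    descriptor = []
    for size in levels:
        for ch in feature_map:
            for row in adaptive_avg_pool(ch, size):
                descriptor.extend(row)
    return descriptor


# Two hypothetical single-channel 4 x 4 maps with the same mean
# but different spatial layout.
top_heavy = [[[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]]
left_heavy = [[[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]]

gap_top = global_avg_pool(top_heavy)
gap_left = global_avg_pool(left_heavy)
spp_top = spatial_pyramid_pool(top_heavy)
spp_left = spatial_pyramid_pool(left_heavy)

print(gap_top == gap_left)   # global pooling cannot tell the maps apart
print(spp_top == spp_left)   # the pyramid descriptor can
```

With levels (1, 2, 4), each channel contributes 1 + 4 + 16 = 21 values instead of one, which is how the coarser and finer grids retain some layout information that a single global mean discards.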
Pages: 3048-3058
Page count: 11
Related papers
50 in total
  • [21] Spatial Pyramid-based Wavelet Embedding Deep Convolutional Neural Network for Semantic Segmentation
    Liu, Jin
    Liu, Yazhou
    Sun, Quansen
    PATTERN RECOGNITION, ACPR 2021, PT I, 2022, 13188 : 326 - 337
  • [22] Deep Convolutional Neural Networks
    Gonzalez, Rafael C.
    IEEE SIGNAL PROCESSING MAGAZINE, 2018, 35 (06) : 79 - 87
  • [23] An Attention Module for Convolutional Neural Networks
    Zhu, Baozhou
    Hofstee, Peter
    Lee, Jinho
    Al-Ars, Zaid
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT I, 2021, 12891 : 167 - 178
  • [24] Reparameterized attention for convolutional neural networks
    Wu, Yiming
    Li, Ruixiang
    Yu, Yunlong
    Li, Xi
    PATTERN RECOGNITION LETTERS, 2022, 164 : 89 - 95
  • [25] How deep convolutional neural networks lose spatial information with training
    Tomasini, Umberto M.
    Petrini, Leonardo
    Cagnetta, Francesco
    Wyart, Matthieu
    MACHINE LEARNING-SCIENCE AND TECHNOLOGY, 2023, 4 (04):
  • [26] A study on denoising with deep convolutional neural networks in spatial heterodyne spectroscopy
    Luo, Wei
    Ye, Song
    Zhang, Ziyang
    Liu, Shuang
    Xiong, Wei
    Wang, Xinqiang
    Li, Shu
    Wang, Fangyuan
    Dong, Baijun
    JOURNAL OF QUANTITATIVE SPECTROSCOPY & RADIATIVE TRANSFER, 2024, 316
  • [27] Temporally Adaptive Common Spatial Patterns with Deep Convolutional Neural Networks
    Mousavi, Mahta
    de Sa, Virginia R.
    2019 41ST ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2019, : 4533 - 4536
  • [28] Multiple Trajectory Prediction with Deep Temporal and Spatial Convolutional Neural Networks
    Strohbeck, Jan
    Belagiannis, Vasileios
    Mueller, Johannes
    Schreiber, Marcel
    Herrmann, Martin
    Wolf, Daniel
    Buchholz, Michael
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 1992 - 1998
  • [29] SPATIAL AND CHANNEL ATTENTION BASED CONVOLUTIONAL NEURAL NETWORKS FOR MODELING NOISY SPEECH
    Xu, Sirui
    Fosler-Lussier, Eric
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6625 - 6629
  • [30] Spatial and Channel Dimensions Attention Feature Transfer for Better Convolutional Neural Networks
    Tang, Jialiang
    Liu, Mingjin
    Jiang, Ning
    Yu, Wenxin
    Yang, Changzheng
    2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,