Spatial Pyramid Attention for Deep Convolutional Neural Networks

Cited by: 24
Authors
Ma, Xu [1 ,2 ]
Guo, Jingda [1 ]
Sansom, Andrew [3 ]
McGuire, Mara [4 ]
Kalaani, Andrew [5 ]
Chen, Qi [1 ]
Tang, Sihai [1 ]
Yang, Qing [1 ]
Fu, Song [1 ]
Affiliations
[1] Univ North Texas, Dept Comp Sci & Engn, Denton, TX 76203 USA
[2] Nanjing Forestry Univ, Coll Informat Sci & Technol, Nanjing 210037, Peoples R China
[3] Univ North Texas, Dept Math, Denton, TX 76203 USA
[4] Texas A&M Univ Corpus Christi, Dept Math & Stat, Corpus Christi, TX 78412 USA
[5] Georgia Southern Univ, Dept Elect & Comp Engn, Statesboro, GA 30458 USA
Funding
U.S. National Science Foundation;
Keywords
Object detection; Feature extraction; Convolutional codes; Computer architecture; Benchmark testing; Topology; Task analysis; Attention mechanism; convolutional neural network; image classification; object detection; spatial pyramid structure; structural regularization; structural information;
DOI
10.1109/TMM.2021.3068576
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
Attention mechanisms have shown great success in computer vision. However, the global average pooling commonly used in some implementations aggregates a three-dimensional feature map into a one-dimensional attention map, leading to a significant loss of structural information in attention learning. In this article, we present a novel Spatial Pyramid Attention Network (SPANet), which exploits both structural information and channel relationships for better feature representation. SPANet enhances a base network by adding Spatial Pyramid Attention (SPA) blocks laterally. By rethinking the design of the self-attention mechanism, we further present three topology structures of attention path connection for SPANet, which can be flexibly applied to various CNN architectures. SPANet is conceptually simple but practically powerful: it uses both structural regularization and structural information to achieve better learning capability. We have comprehensively evaluated SPANet on four benchmark datasets for different visual tasks. The experimental results show that SPANet significantly improves recognition accuracy without adding much computation overhead. Using SPANet, we achieve an improvement of 1.6% in top-1 classification accuracy on the ImageNet 2012 benchmark based on ResNet50, and SPANet outperforms SENet and other attention methods. SPANet also improves object detection performance by a clear margin with negligible additional computation overhead. When applying SPANet to RetinaNet with a ResNet50 backbone, we improve the baseline model by 2.3 mAP, and the enhanced model outperforms SENet and GCNet by 1.1 mAP and 1.7 mAP, respectively. The code of SPANet is publicly available at https://github.com/13952522076/SPANet_TMM
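The contrast the abstract draws between global average pooling and a spatial pyramid descriptor can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation (which is in the repository linked above): the pyramid levels (1, 2, 4), the two-layer gating network, the sigmoid gate, and all function names (`adaptive_avg_pool`, `pyramid_descriptor`, `spa_gate`) are hypothetical choices made here, since the abstract does not specify them.

```python
import numpy as np


def adaptive_avg_pool(x, out_size):
    """Average-pool a (C, H, W) feature map down to (C, out_size, out_size)."""
    c, h, w = x.shape
    out = np.empty((c, out_size, out_size), dtype=x.dtype)
    for i in range(out_size):
        # Bin edges: floor for the start index, ceil for the end index.
        r0 = (i * h) // out_size
        r1 = -((-(i + 1) * h) // out_size)
        for j in range(out_size):
            c0 = (j * w) // out_size
            c1 = -((-(j + 1) * w) // out_size)
            out[:, i, j] = x[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out


def pyramid_descriptor(x, levels=(1, 2, 4)):
    """Concatenate pooled maps from several grid sizes into one vector.

    Level 1 is exactly global average pooling; the finer levels keep
    coarse spatial layout that level 1 alone discards.
    The levels (1, 2, 4) are an assumption for illustration.
    """
    return np.concatenate([adaptive_avg_pool(x, s).reshape(-1) for s in levels])


def spa_gate(x, w1, w2, levels=(1, 2, 4)):
    """Rescale channels of x with a gate computed from the pyramid descriptor.

    The ReLU hidden layer and sigmoid output are assumed, SE-style choices.
    """
    d = pyramid_descriptor(x, levels)
    hidden = np.maximum(w1 @ d, 0.0)             # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid, one value per channel
    return x * gate[:, None, None]


rng = np.random.default_rng(0)
channels, hidden_dim = 8, 4
x = rng.standard_normal((channels, 16, 16))
desc_len = channels * (1 + 4 + 16)               # 21 pooled values per channel
w1 = 0.1 * rng.standard_normal((hidden_dim, desc_len))
w2 = 0.1 * rng.standard_normal((channels, hidden_dim))

y = spa_gate(x, w1, w2)
print(pyramid_descriptor(x).shape, y.shape)
```

With global average pooling alone, each channel contributes a single number to the attention path; in this sketch each channel contributes 21, so the gate can depend on where activations occur as well as on how strong they are.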
Pages: 3048 - 3058
Page count: 11
Related papers
50 in total
  • [1] Spatial Channel Attention for Deep Convolutional Neural Networks
    Liu, Tonglai
    Luo, Ronghai
    Xu, Longqin
    Feng, Dachun
    Cao, Liang
    Liu, Shuangyin
    Guo, Jianjun
    [J]. MATHEMATICS, 2022, 10 (10)
  • [2] Deep Pyramid Convolutional Neural Networks for Text Categorization
    Johnson, Rie
    Zhang, Tong
    [J]. PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 562 - 570
  • [3] Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [J]. COMPUTER VISION - ECCV 2014, PT III, 2014, 8691 : 346 - 361
  • [4] Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2015, 37 (09) : 1904 - 1916
  • [5] Spatial Decomposition and Aggregation for Attention in Convolutional Neural Networks
    Zhu, Meng
    Min, Weidong
    Xiang, Hongyue
    Zha, Cheng
    Huang, Zheng
    Li, Longfei
    Fu, Qiyan
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2024, 38 (01)
  • [6] Spatial pyramid pooling in deep convolutional networks for automatic tuberculosis diagnosis
    Msonda P.
    Uymaz S.A.
    Karaaǧaç S.S.
    [J]. Uymaz, Sait Ali (sauymaz@ktun.edu.tr), 1600, International Information and Engineering Technology Association (37): : 1075 - 1084
  • [7] Spatial Pyramid Pooling in Deep Convolutional Networks for Automatic Tuberculosis Diagnosis
    Msonda, Pike
    Uymaz, Sait Ali
    Karaagac, Seda Sogukpinar
    [J]. TRAITEMENT DU SIGNAL, 2020, 37 (06) : 1075 - 1084
  • [8] TEXT DETECTION BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH SPATIAL PYRAMID POOLING
    Zhu, Rui
    Mao, Xiao-Jiao
    Zhu, Qi-Hai
    Li, Ning
    Yang, Yu-Bin
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 1032 - 1036
  • [9] A New Cyclic Spatial Attention Module for Convolutional Neural Networks
    Li Daihui
    Zeng Shangyou
    Li Wenhui
    Yang Lei
    [J]. 2019 IEEE 11TH INTERNATIONAL CONFERENCE ON COMMUNICATION SOFTWARE AND NETWORKS (ICCSN 2019), 2019, : 607 - 611
  • [10] Handwritten Word Image Categorization with Convolutional Neural Networks and Spatial Pyramid Pooling
    Ignacio Toledo, J.
    Sudholt, Sebastian
    Fornes, Alicia
    Cucurull, Jordi
    Fink, Gernot A.
    Llados, Josep
    [J]. STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, S+SSPR 2016, 2016, 10029 : 543 - 552