Dynamic Parallel Pyramid Networks for Scene Recognition

Cited by: 4
Authors
Liu, Kai [1]
Moon, Seungbin [1]
Affiliations
[1] Sejong Univ, Dept Comp Engn, Seoul 05006, South Korea
Keywords
Convolution; Kernel; Radio frequency; Task analysis; Spatial resolution; Image recognition; Feature extraction; Convolutional neural networks (CNNs); dynamic networks; feature pyramid; scene recognition; VISUAL-ATTENTION; MODEL
DOI
10.1109/TNNLS.2021.3129227
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Scene recognition is considered a challenging image recognition task, mainly because a given scene contains multiscale information spanning the global layout and local objects. Recent convolutional neural networks (CNNs) that can learn multiscale features have achieved remarkable progress in scene recognition. However, they have two limitations: 1) the receptive field (RF) size is fixed even though a scene may have large variations in scale and 2) they are computationally and memory intensive, partly because of their multiscale representations. To address these limitations, we propose a lightweight dynamic scene recognition approach based on a novel architectural unit, the dynamic parallel pyramid (DPP) block, which adaptively selects the RF size based on multiscale information from the input with respect to the channel dimensions. We encode multiscale features by applying different convolutional (CONV) kernels to different channels of the input tensor and then dynamically merge their outputs using a group attention mechanism, followed by channel shuffling, to generate the parallel feature pyramid. DPP can be easily incorporated into existing CNNs to develop new deep models, called DPP networks (DPP-Nets). Extensive experiments on large-scale scene image datasets, namely Places365-Standard, Places365-Challenge, the Massachusetts Institute of Technology (MIT) Indoor67, and SUN397, confirmed that the proposed method provides significant performance improvement compared with current state-of-the-art (SOTA) approaches. We also verified its general applicability through compelling results with the lightweight models MobileNetV2 and ShuffleNetV2 on ImageNet-1k and on the small, object-centric benchmarks CIFAR-10 and CIFAR-100.
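The abstract describes the DPP block only at a high level, so the following is a rough NumPy sketch of the general idea rather than the authors' implementation. It splits the input channels into groups, filters each group with a different kernel size, reweights the groups with a simple attention score, and shuffles the channels. The function names, the kernel sizes (1, 3, 5), the random stand-in weights, and the attention form (global average pooling followed by a softmax over groups) are all assumptions made for illustration.

```python
import numpy as np


def conv2d_same(x, kernel):
    """Filter every channel of x (C, H, W) with one 2-D kernel, zero-padded to keep H x W."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros(x.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            out += kernel[i, j] * xp[:, i:i + h, j:j + w]
    return out


def channel_shuffle(x, groups):
    """Interleave channels across groups so later layers see every scale."""
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def dpp_block_sketch(x, kernel_sizes=(1, 3, 5), rng=None):
    """Hypothetical DPP-like block: per-group kernels, group attention, channel shuffle."""
    rng = np.random.default_rng(0) if rng is None else rng
    groups = len(kernel_sizes)
    assert x.shape[0] % groups == 0, "channels must divide evenly into groups"

    # 1) Apply a different kernel size to each channel group (multiscale encoding).
    outs = []
    for part, k in zip(np.split(x, groups, axis=0), kernel_sizes):
        kernel = rng.standard_normal((k, k)) / k  # stand-in for learned weights
        outs.append(conv2d_same(part, kernel))

    # 2) Group attention (assumed form): one score per group from global average
    #    pooling, normalized with a softmax so the weights depend on the input.
    weights = softmax(np.array([o.mean() for o in outs]))

    # 3) Merge the reweighted groups and shuffle channels across them.
    merged = np.concatenate([w * o for w, o in zip(weights, outs)], axis=0)
    return channel_shuffle(merged, groups), weights


if __name__ == "__main__":
    x = np.random.default_rng(1).standard_normal((6, 8, 8))
    y, attn = dpp_block_sketch(x)
    print(y.shape, attn)
```

Because the attention weights are computed from the input itself, the relative contribution of each kernel size changes per image, which is the sense in which such a block can be called dynamic. A trained model would replace the random kernels with learned parameters.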
Pages: 6591-6601
Number of pages: 11
Related papers
50 in total
  • [1] Temporal Residual Networks for Dynamic Scene Recognition
    Feichtenhofer, Christoph
    Pinz, Axel
    Wildes, Richard P.
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 7435 - 7444
  • [2] A Pattern Recognition Method Using Parallel Pyramid Neural Networks
    Yuan, Xue
    Wu, Xiaojin
    Wei, Xueye
    2011 INTERNATIONAL CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND AUTOMATION (CCCA 2011), VOL I, 2010, : 284 - 287
  • [3] Attention Pyramid Module for Scene Recognition
    Qiao, Zhinan
    Yuan, Xiaohui
    Zhuang, Chengyuan
    Meyarian, Abolfazl
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 7521 - 7528
  • [4] Improved spatial pyramid matching for scene recognition
    Xie, Lin
    Lee, Feifei
    Liu, Li
    Yin, Zhong
    Yan, Yan
    Wang, Weidong
    Zhao, Junjie
    Chen, Qiu
    PATTERN RECOGNITION, 2018, 82 : 118 - 129
  • [5] Attentive Temporal Pyramid Network for Dynamic Scene Classification
    Huang, Yuanjun
    Cao, Xianbin
    Zhen, Xiantong
    Han, Jungong
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8497 - 8504
  • [6] Pyramid Coding for Functional Scene Element Recognition in Video Scenes
    Swears, Eran
    Hoogs, Anthony
    Boyer, Kim
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 345 - 352
  • [7] Scene text detection via decoupled feature pyramid networks
    Liang, Min
    Hou, Jie-Bo
    Zhu, Xiaobin
    Yang, Chun
    Qin, Jingyan
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2022, 25 (3) : 163 - 175
  • [8] Scene text detection via decoupled feature pyramid networks
    Min Liang
    Jie-Bo Hou
    Xiaobin Zhu
    Chun Yang
    Jingyan Qin
    International Journal on Document Analysis and Recognition (IJDAR), 2022, 25 : 163 - 175
  • [9] Semantic Information Supplementary Pyramid Network for Dynamic Scene Deblurring
    Liu, Yiming
    Luo, Yifei
    Huang, Wenzhuo
    Qiao, Ying
    Li, Junhui
    Xu, Dahong
    Luo, Duqiang
    IEEE ACCESS, 2020, 8 : 188587 - 188599
  • [10] USING PYRAMID OF HISTOGRAM OF ORIENTED GRADIENTS ON NATURAL SCENE TEXT RECOGNITION
    Tan, Zhi Rong
    Tian, Shangxuan
    Tan, Chew Lim
    2014 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2014, : 2629 - 2633