Dynamic Parallel Pyramid Networks for Scene Recognition

Cited by: 4
Authors
Liu, Kai [1]
Moon, Seungbin [1]
Affiliations
[1] Sejong Univ, Dept Comp Engn, Seoul 05006, South Korea
Keywords
Convolution; Kernel; Radio frequency; Task analysis; Spatial resolution; Image recognition; Feature extraction; Convolutional neural networks (CNNs); dynamic networks; feature pyramid; scene recognition; VISUAL-ATTENTION; MODEL
DOI
10.1109/TNNLS.2021.3129227
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scene recognition is considered a challenging image recognition task, mainly because a given scene contains multiscale information from both its global layout and its local objects. Recent convolutional neural networks (CNNs) that can learn multiscale features have achieved remarkable progress in scene recognition. However, they have two limitations: 1) the receptive field (RF) size is fixed even though a scene may have large-scale variations and 2) they are computationally and memory intensive, partially due to the representation of multiple scales. To address these limitations, we propose a lightweight dynamic scene recognition approach based on a novel architectural unit, namely, a dynamic parallel pyramid (DPP) block, that can adaptively select the RF size based on multiscale information from the input regarding channel dimensions. We encode multiscale features by applying different convolutional (CONV) kernels to different input tensor channels and then dynamically merge their outputs using a group attention mechanism followed by channel shuffling to generate the parallel feature pyramid. DPP can be easily incorporated into existing CNNs to develop new deep models, called DPP networks (DPP-Nets). Extensive experiments on the large-scale scene image datasets Places365-Standard, Places365-Challenge, Massachusetts Institute of Technology (MIT) Indoor67, and SUN397 confirmed that the proposed method provides a significant performance improvement over current state-of-the-art (SOTA) approaches. We also verified its general applicability through compelling results with the lightweight models MobileNetV2 and ShuffleNetV2 on ImageNet-1k and on the small object-centric benchmarks CIFAR-10 and CIFAR-100.
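The pipeline described in the abstract (split channels into groups, apply a different kernel size to each group, reweight the groups with attention, then shuffle channels) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes 1-D signals instead of 2-D feature maps, fixed box filters instead of learned CONV kernels, and a parameter-free softmax over pooled group descriptors instead of a learned group attention module. All function names (`split_channels`, `conv1d_same`, `channel_shuffle`, `dpp_like_block`) are hypothetical.

```python
import math


def split_channels(channels, groups):
    """Split a list of channels into `groups` equal, contiguous groups."""
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    size = len(channels) // groups
    return [channels[i * size:(i + 1) * size] for i in range(groups)]


def conv1d_same(signal, kernel_size):
    """Zero-padded box filter (odd kernel_size); stand-in for a learned kernel."""
    pad = kernel_size // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(padded[i:i + kernel_size]) / kernel_size
            for i in range(len(signal))]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def channel_shuffle(channels, groups):
    """Interleave channels across groups (ShuffleNet-style shuffle)."""
    size = len(channels) // groups
    return [channels[g * size + i] for i in range(size) for g in range(groups)]


def dpp_like_block(channels, kernel_sizes=(1, 3, 5)):
    """Toy multiscale block: one kernel size (i.e. one RF size) per group."""
    groups = split_channels(channels, len(kernel_sizes))
    branches = [[conv1d_same(ch, k) for ch in grp]
                for grp, k in zip(groups, kernel_sizes)]
    # Assumption: one scalar descriptor per group via global average pooling.
    descriptors = [sum(sum(ch) for ch in br) / sum(len(ch) for ch in br)
                   for br in branches]
    weights = softmax(descriptors)  # assumed, non-learned group attention
    scaled = [[v * w for v in ch]
              for br, w in zip(branches, weights) for ch in br]
    return channel_shuffle(scaled, len(kernel_sizes)), weights
```

After the shuffle, neighbouring output channels come from branches with different kernel sizes, which is the sense in which the scales are mixed along the channel dimension in this sketch.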
Pages: 6591-6601
Page count: 11
Related papers
50 in total
  • [21] DEEP NEURAL NETWORKS FOR AUDIO SCENE RECOGNITION
    Petetin, Yohan
    Laroche, Cyrille
    Mayoue, Aurelien
    2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 125 - 129
  • [22] Pyramid Scene Parsing Network
    Zhao, Hengshuang
    Shi, Jianping
    Qi, Xiaojuan
    Wang, Xiaogang
    Jia, Jiaya
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 6230 - 6239
  • [23] Acoustic Scene Classification Using Spatial Pyramid Pooling With Convolutional Neural Networks
    Basbug, Ahmet Melih
    Sert, Mustafa
    2019 13TH IEEE INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2019, : 128 - 131
  • [24] PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition
    Qiao, Zhi
    Zhou, Yu
    Wei, Jin
    Wang, Wei
    Zhang, Yuan
    Jiang, Ning
    Wang, Hongbin
    Wang, Weiping
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2046 - 2055
  • [25] Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    COMPUTER VISION - ECCV 2014, PT III, 2014, 8691 : 346 - 361
  • [26] Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2015, 37 (09) : 1904 - 1916
  • [27] Parallel neural networks for speech recognition
    Lee, BJ
    1997 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, 1997, : 2093 - 2097
  • [28] Dynamic receptive field adaptation for scene text recognition
    Tian, Shu
    Zhu, Kang-Xi
    Qin, Hai-Bo
    Yang, Chun
    PATTERN RECOGNITION LETTERS, 2024, 178 : 55 - 61
  • [29] Spacetime Forests with Complementary Features for Dynamic Scene Recognition
    Feichtenhofer, Christoph
    Pinz, Axel
    Wildes, Richard P.
    PROCEEDINGS OF THE BRITISH MACHINE VISION CONFERENCE 2013, 2013,
  • [30] A Trajectory-Based Method for Dynamic Scene Recognition
    Peng, Xiaoming
    Bouzerdoum, Abdesselam
    Phung, Son Lam
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2021, 35 (10)