Dynamic Parallel Pyramid Networks for Scene Recognition

Cited by: 4

Authors:

Liu, Kai [1]

Moon, Seungbin [1]

Affiliations:

[1] Sejong Univ, Dept Comp Engn, Seoul 05006, South Korea

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2023, Vol. 34, No. 9

Keywords:

Convolution; Kernel; Radio frequency; Task analysis; Spatial resolution; Image recognition; Feature extraction; Convolutional neural networks (CNNs); dynamic networks; feature pyramid; scene recognition; VISUAL-ATTENTION; MODEL;

DOI:

10.1109/TNNLS.2021.3129227

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Scene recognition is considered a challenging task of image recognition, mainly due to the presence of multiscale information of global layout and local objects in a given scene. Recent convolutional neural networks (CNNs) that can learn multiscale features have achieved remarkable progress in scene recognition. They have two limitations: 1) the receptive field (RF) size is fixed even though a scene may have large-scale variations and 2) they are computing and memory intensive, partially due to the representation of multiscales. To address these limitations, we propose a lightweight dynamic scene recognition approach based on a novel architectural unit, namely, a dynamic parallel pyramid (DPP) block, that can adaptively select RF size based on multiscale information from the input regarding channel dimensions. We encode multiscale features by applying different convolutional (CONV) kernels on different input tensor channels and then dynamically merge their output using a group attention mechanism followed by channel shuffling to generate the parallel feature pyramid. DPP can be easily incorporated with existing CNNs to develop new deep models, called DPP networks (DPP-Nets). Extensive experiments on large-scale scene image datasets, Places365 standard, Places365 challenge, the Massachusetts Institute of Technology (MIT) Indoor67, and Sun397 confirmed that the proposed method provides significant performance improvement compared with current state-of-the-art (SOTA) approaches. We also verified general applicability from compelling results on lightweight models of MobileNetV2 and ShuffleNetV2 on ImageNet-1k and small object centralized benchmarks on CIFAR-10 and CIFAR-100.
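The pipeline the abstract describes (different kernels on different channel groups, an attention-weighted merge, then channel shuffling) can be illustrated with a toy sketch. This is not the authors' implementation: it uses 1-D signals instead of image tensors, fixed mean filters in place of learned CONV kernels, and a parameter-free softmax over globally averaged group responses in place of the learned group attention. The names `box_filter`, `channel_shuffle`, and `dpp_sketch` are hypothetical and exist only for this example.

```python
import math


def box_filter(signal, k):
    """Zero-padded mean filter of odd width k (stand-in for a learned k-wide kernel)."""
    half = k // 2
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[j] for j in range(i - half, i + half + 1) if 0 <= j < n]
        out.append(sum(window) / k)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def channel_shuffle(channels, groups):
    """Interleave channels across groups (ShuffleNet-style reshape and transpose)."""
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


def dpp_sketch(channels, kernel_sizes=(1, 3)):
    """Toy DPP-like block: per-group filtering, attention merge, channel shuffle.

    Assumption: attention here is a softmax over each group's global average,
    not the learned group attention used in the paper.
    """
    groups = len(kernel_sizes)
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by number of kernels")
    per_group = len(channels) // groups

    branch_outputs, descriptors = [], []
    for g, k in enumerate(kernel_sizes):
        # Each channel group sees a different receptive-field size.
        group = channels[g * per_group:(g + 1) * per_group]
        filtered = [box_filter(ch, k) for ch in group]
        branch_outputs.append(filtered)
        # Global average pooling gives one scalar descriptor per group.
        values = [v for ch in filtered for v in ch]
        descriptors.append(sum(values) / len(values))

    # Input-dependent weights decide how much each scale contributes.
    weights = softmax(descriptors)
    merged = [
        [w * v for v in ch]
        for w, branch in zip(weights, branch_outputs)
        for ch in branch
    ]
    # Shuffling mixes scales so later layers see every receptive field.
    return channel_shuffle(merged, groups), weights
```

Calling `dpp_sketch` on four channels with `kernel_sizes=(1, 3)` returns four shuffled channels plus two attention weights that sum to 1. In a real network the filters and the attention would be trained, and the input would be a 4-D image tensor.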


Pages: 6591-6601

Page count: 11
