FOSNet: An End-to-End Trainable Deep Neural Network for Scene Recognition

Cited: 34
Authors
Seong, Hongje [1]
Hyun, Junhyuk [1]
Kim, Euntai [1]
Affiliations
[1] Yonsei Univ, Sch Elect & Elect Engn, Seoul 03722, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Scene recognition; convolutional neural network; fusion network; scene coherence; end-to-end trainable;
DOI
10.1109/ACCESS.2020.2989863
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Scene recognition is an image recognition problem aimed at predicting the category of the place at which an image was taken. In this paper, a new scene recognition method using a convolutional neural network (CNN) is proposed. The proposed method is based on the fusion of the object and scene information in a given image, and the CNN framework is named FOSNet (fusion of object and scene network). To combine the object and scene information effectively, a new fusion framework named correlative context gating (CCG) is proposed. In addition, a new loss named the scene coherence loss (SCL) is developed to train FOSNet and to improve scene recognition performance. The proposed SCL is based on the idea that the scene class does not change over the image. The proposed FOSNet was evaluated on the three most popular scene recognition datasets, and state-of-the-art performance was obtained on two of them: 60.14% on Places 2 and 90.30% on MIT Indoor 67. The second-highest performance, 77.28%, was obtained on SUN 397.
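The scene coherence idea described in the abstract (the scene class is the same everywhere in the image) can be sketched as a spatial-consistency penalty on per-location class predictions from a convolutional feature map. The snippet below is a hypothetical illustration of that idea, not the paper's exact SCL formulation; the function name `scene_coherence_loss` and the squared-difference penalty between neighboring cells are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scene_coherence_loss(logits):
    """Hypothetical sketch of a scene-coherence-style loss.

    logits: array of shape (H, W, C) holding per-location class scores
    from a conv feature map. Disagreement between horizontally and
    vertically adjacent spatial predictions is penalized, following the
    idea that the scene class does not change over the image. The exact
    loss used in FOSNet may differ.
    """
    p = softmax(logits)  # (H, W, C) per-cell class distributions
    # Squared difference between vertically / horizontally adjacent cells.
    dv = ((p[1:, :, :] - p[:-1, :, :]) ** 2).sum()
    dh = ((p[:, 1:, :] - p[:, :-1, :]) ** 2).sum()
    h, w, _ = logits.shape
    n_pairs = (h - 1) * w + h * (w - 1)  # number of neighbor pairs
    return (dv + dh) / max(n_pairs, 1)
```

Under this sketch, a feature map whose every spatial location predicts the same distribution incurs zero loss, while spatially inconsistent predictions are penalized.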
Pages: 82066-82077
Page count: 12
Related Articles
50 records
  • [1] Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework
    Busta, Michal
    Neumann, Lukas
    Matas, Jiri
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 2223 - 2231
  • [2] An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition
    Shi, Baoguang
    Bai, Xiang
    Yao, Cong
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (11) : 2298 - 2304
  • [3] AXNet: ApproXimate computing using an end-to-end trainable neural network
    Peng, Zhenghao
    Chen, Xuyang
    Xu, Chengwen
    Jing, Naifeng
    Liang, Xiaoyao
    Lu, Cewu
    Jiang, Li
    [J]. 2018 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD) DIGEST OF TECHNICAL PAPERS, 2018,
  • [5] End-to-end Trainable Deep Neural Network for Robotic Grasp Detection and Semantic Segmentation from RGB
    Ainetter, Stefan
    Fraundorfer, Friedrich
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 13452 - 13458
  • [6] Trainable Dynamic Subsampling for End-to-End Speech Recognition
    Zhang, Shucong
    Loweimi, Erfan
    Xu, Yumo
    Bell, Peter
    Renals, Steve
    [J]. INTERSPEECH 2019, 2019, : 1413 - 1417
  • [7] End-to-end video subtitle recognition via a deep Residual Neural Network
    Yan, Hongyu
    Xu, Xin
    [J]. PATTERN RECOGNITION LETTERS, 2020, 131 : 368 - 375
  • [8] An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment
    Li, Zhenyu
    Zhou, Aiguo
    Shen, Yong
    [J]. SENSORS, 2020, 20 (06)
  • [9] End-to-End Scene Text Recognition
    Wang, Kai
    Babenko, Boris
    Belongie, Serge
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2011, : 1457 - 1464
  • [10] End-to-end trainable network for superpixel and image segmentation
    Wang, Kai
    Li, Liang
    Zhang, Jiawan
    [J]. Pattern Recognition Letters, 2020, 140 : 135 - 142