Fusing Object Semantics and Deep Appearance Features for Scene Recognition

Cited by: 36
Authors
Sun, Ning [1]
Li, Wenli [2]
Liu, Jixin [1]
Han, Guang [1]
Wu, Cong [1]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Engn Res Ctr Wideband Wireless Commun Technol, Minist Educ, Nanjing 210003, Jiangsu, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Coll Commun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China
Keywords
Comprehensive representation; contextual feature; object semantics; scene recognition; CLASSIFICATION; SCALE;
DOI
10.1109/TCSVT.2018.2848543
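Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract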
Scene images generally exhibit large intra-class variety and high inter-class similarity because of complicated appearances, subtle differences, and ambiguous categorization. Hence, it is difficult to achieve satisfactory accuracy with a single representation. To address this issue, we present a comprehensive representation for scene recognition that fuses deep features extracted from three discriminative views: object semantics, global appearance, and contextual appearance. These views provide diverse and complementary features. The object semantics representation of the scene image, denoted as spatial-layout-maintained object semantics features, is extracted from the output of a deep-learning-based multi-class detector by using spatial Fisher vectors, which simultaneously encode the category and layout information of objects. A multi-direction long short-term memory-based model is built to represent the contextual information of the scene image, and the activation of the fully connected layer of a convolutional neural network is used to represent the global appearance of the scene image. These three kinds of deep features are then fused to draw a final conclusion for scene recognition. Extensive experiments are conducted to evaluate the proposed comprehensive representation on three benchmark scene image databases. The results show that the three deep features strongly complement each other and are effective in improving recognition performance after fusion. The proposed method achieves scene recognition accuracies of 89.51% on the MIT67 database, 78.93% on the SUN397 database, and 57.27% on the Places365 database, which exceed the accuracies obtained by the latest reported deep-learning-based scene recognition methods.
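The abstract says the three deep features are fused but does not state the fusion operator. As a rough illustration only, the sketch below assumes simple feature-level fusion: each view is L2-normalized and the three vectors are concatenated. The function names, the feature dimensions, and the random stand-in vectors are all hypothetical and are not taken from the paper.

```python
import numpy as np


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm so no single view dominates by magnitude."""
    v = np.asarray(v, dtype=np.float64)
    return v / (np.linalg.norm(v) + eps)


def fuse_features(object_semantics, global_appearance, contextual_appearance):
    """Concatenate the three L2-normalized views into one representation.

    Assumption: concatenation stands in for the unspecified fusion step.
    """
    views = (object_semantics, global_appearance, contextual_appearance)
    return np.concatenate([l2_normalize(v) for v in views])


# Random stand-ins for the three views (dimensions are arbitrary placeholders).
rng = np.random.default_rng(0)
slmos = rng.normal(size=128)         # object semantics features
global_feat = rng.normal(size=256)   # CNN fully connected layer activation
context_feat = rng.normal(size=64)   # multi-direction LSTM features

fused = fuse_features(slmos, global_feat, context_feat)
print(fused.shape)  # (448,)
```

The fused vector would then be passed to a classifier of choice; normalizing each view first keeps views of different dimensionality on a comparable scale.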
Pages: 1715 - 1728
Page count: 14
Related papers
50 items in total
  • [41] Fusing Deep Learned and Hand-Crafted Features of Appearance, Shape, and Dynamics for Automatic Pain Estimation
    Egede, Joy
    Valstar, Michel
    Martinez, Brais
    [J]. 2017 12TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2017), 2017, : 689 - 696
  • [42] Scene Recognition with Sequential Object Context
    Wang, Yuelian
    Pan, Wei
    [J]. COMPUTER VISION, PT III, 2017, 773 : 108 - 119
  • [43] The role of object recognition in scene segmentation
    Bravo, MJ
    Farid, H
    [J]. INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2000, 41 (04) : S724 - S724
  • [44] Resnet Features and Optimization Enabled Deep Learning for Indoor Object Detection and Object Recognition
    Anandh, N.
    Gopinath, M. P.
    [J]. CYBERNETICS AND SYSTEMS, 2022, 55 (08) : 2280 - 2307
  • [45] Incorporating Scene Context and Object Layout into Appearance Modeling
    Izadinia, Hamid
    Sadeghi, Fereshteh
    Farhadi, Ali
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 232 - 239
  • [46] A new approach for small sample face recognition with pose variation by fusing Gabor encoding features and deep features
    Guofeng Zou
    Guixia Fu
    Mingliang Gao
    Jinfeng Pan
    Zheng Liu
    [J]. Multimedia Tools and Applications, 2020, 79 : 23571 - 23598
  • [47] Using appearance and context for outdoor scene object classification
    Bosch, A
    Muñoz, X
    Martí, J
    [J]. 2005 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), VOLS 1-5, 2005, : 1901 - 1904
  • [48] A new approach for small sample face recognition with pose variation by fusing Gabor encoding features and deep features
    Zou, Guofeng
    Fu, Guixia
    Gao, Mingliang
    Pan, Jinfeng
    Liu, Zheng
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (31-32) : 23571 - 23598
  • [49] Learning appearance models for object recognition
    Pope, Arthur R.
    Lowe, David G.
    [J]. Lecture Notes in Computer Science, 1144
  • [50] Local appearance for robust object recognition
    Jugessur, D
    Dudek, G
    [J]. IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, PROCEEDINGS, VOL I, 2000, : 834 - 839