Fusing Object Semantics and Deep Appearance Features for Scene Recognition

Cited by: 36
Authors
Sun, Ning [1 ]
Li, Wenli [2 ]
Liu, Jixin [1 ]
Han, Guang [1 ]
Wu, Cong [1 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Engn Res Ctr Wideband Wireless Commun Technol, Minist Educ, Nanjing 210003, Jiangsu, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Coll Commun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China
Keywords
Comprehensive representation; contextual feature; object semantics; scene recognition; CLASSIFICATION; SCALE;
DOI
10.1109/TCSVT.2018.2848543
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Scene images generally exhibit large intra-class variety and high inter-class similarity because of complicated appearances, subtle differences, and ambiguous categorization. Hence, it is difficult to achieve satisfactory accuracy using a single representation. To address this issue, we present a comprehensive representation for scene recognition that fuses deep features extracted from three discriminative views: object semantics, global appearance, and contextual appearance. These views provide diverse and complementary features. The object semantics representation of the scene image, denoted by spatial-layout-maintained object semantics features, is extracted from the output of a deep-learning-based multi-class detector using spatial Fisher vectors, which simultaneously encode the category and layout information of objects. A multi-direction long short-term memory-based model is built to represent the contextual information of the scene image, and the activation of the fully connected layer of a convolutional neural network is used to represent the global appearance of the scene image. These three kinds of deep features are then fused to draw a final conclusion for scene recognition. Extensive experiments are conducted to evaluate the proposed comprehensive representation on three benchmark scene image databases. The results show that the three deep features complement each other strongly and effectively improve recognition performance after fusion. The proposed method achieves scene recognition accuracies of 89.51% on the MIT67 database, 78.93% on the SUN397 database, and 57.27% on the Places365 database, respectively, surpassing the accuracies obtained by the latest reported deep-learning-based scene recognition methods.
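The late-fusion idea described in the abstract can be sketched minimally as follows. This is an illustrative assumption, not the paper's exact scheme: it softmax-normalizes per-view class scores (object semantics, global appearance, contextual appearance) and averages them, whereas the actual fusion rule and feature dimensions are those of the published method.

```python
# Hypothetical sketch of score-level fusion over three views.
# The dimensions, toy scores, and the averaging rule are illustrative
# assumptions, not the paper's exact fusion method.
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_views(view_scores):
    """Average the softmax-normalized scores from each view and
    return the index of the winning scene class."""
    probs = [softmax(v) for v in view_scores]
    n_classes = len(probs[0])
    fused = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__)

# Example: three views voting over three hypothetical scene classes.
object_semantics = [2.0, 0.5, 0.1]   # e.g., detector-derived scores
global_appearance = [1.5, 0.7, 0.2]  # e.g., CNN fully connected scores
contextual = [0.4, 2.2, 0.3]         # e.g., multi-direction LSTM scores
print(fuse_views([object_semantics, global_appearance, contextual]))
```

Two confident views outvote one dissenting view here, which is the complementarity the abstract describes: each view can correct the others' mistakes.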
Pages: 1715-1728
Page count: 14
Related Papers
50 records
  • [1] Fusing Attention Features and Contextual Information for Scene Recognition
    Peng, Yuqing
    Liu, Xianzi
    Wang, Chenxi
    Xiao, Tengfei
    Li, Tiejun
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (03)
  • [2] Learning and Fusing Multimodal Deep Features for Acoustic Scene Categorization
    Yin, Yifang
    Shah, Rajiv Ratn
    Zimmermann, Roger
    [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 1892 - 1900
  • [3] Fusing Facial Shape and Appearance Based Features For Robust Face Recognition
    Essa, Almabrok
    Asari, Vijayan
    [J]. 2017 IEEE NATIONAL AEROSPACE AND ELECTRONICS CONFERENCE (NAECON), 2017, : 7 - 10
  • [4] Heterogeneous bag-of-features for object/scene recognition
    Nanni, Loris
    Lumini, Alessandra
    [J]. APPLIED SOFT COMPUTING, 2013, 13 (04) : 2171 - 2178
  • [5] Multimodal Multiobject Tracking by Fusing Deep Appearance Features and Motion Information
    Zhang, Liwei
    Lai, Jiahong
    Zhang, Zenghui
    Deng, Zhen
    He, Bingwei
    He, Yucheng
    [J]. COMPLEXITY, 2020, 2020
  • [6] HPAT indexing for fast object/scene recognition based on local appearance
    Shao, H
    Svoboda, T
    Tuytelaars, T
    Van Gool, L
    [J]. IMAGE AND VIDEO RETRIEVAL, PROCEEDINGS, 2003, 2728 : 71 - +
  • [7] Fusing dynamic deep learned features and handcrafted features for facial expression recognition
    Fan, Xijian
    Tjahjadi, Tardi
    [J]. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2019, 65
  • [8] Object Recognition in Inferotemporal Cortex: From Visual Features to Semantics
    Tanaka, Keiji
    [J]. I-PERCEPTION, 2017, 8 : 3 - 3
  • [9] FUSING DEEP LOCAL AND GLOBAL FEATURES FOR REMOTE SENSING IMAGE SCENE CLASSIFICATION
    Yan, Keli
    Mei, Shaohui
    Ma, Mingyang
    Yan, Feng
    [J]. 2019 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2019), 2019, : 3029 - 3032
  • [10] Fusing Deep Features by Kernel Collaborative Representation for Remote Sensing Scene Classification
    Chen, Xiaoning
    Ma, Mingyang
    Li, Yong
    Cheng, Wei
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14 : 12429 - 12439