Fusing Object Semantics and Deep Appearance Features for Scene Recognition

Cited by: 36

Authors:

Sun, Ning [1]

Li, Wenli [2]

Liu, Jixin [1]

Han, Guang [1]

Wu, Cong [1]

Affiliations:

[1] Nanjing Univ Posts & Telecommun, Engn Res Ctr Wideband Wireless Commun Technol, Minist Educ, Nanjing 210003, Jiangsu, Peoples R China

[2] Nanjing Univ Posts & Telecommun, Coll Commun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2019 / Vol. 29 / Issue 06

Keywords:

Comprehensive representation; contextual feature; object semantics; scene recognition; CLASSIFICATION; SCALE;

DOI:

10.1109/TCSVT.2018.2848543

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Scene images generally exhibit large intra-class variety and high inter-class similarity because of complicated appearances, subtle differences, and ambiguous categorization. Hence, it is difficult to achieve satisfactory accuracy with a single representation. To address this issue, we present a comprehensive representation for scene recognition that fuses deep features extracted from three discriminative views: object semantics, global appearance, and contextual appearance. These views provide diverse and complementary features. The object semantics representation of the scene image, denoted by spatial-layout-maintained object semantics features, is extracted from the output of a deep-learning-based multi-class detector by using spatial Fisher vectors, which simultaneously encode the category and layout information of objects. A multi-direction long short-term memory-based model is built to represent the contextual information of the scene image, and the activation of the fully connected layer of a convolutional neural network is used to represent the global appearance of the scene image. These three kinds of deep features are then fused to reach a final decision for scene recognition. Extensive experiments are conducted to evaluate the proposed comprehensive representation on three benchmark scene image databases. The results show that the three deep features strongly complement each other and are effective in improving recognition performance after fusion. The proposed method achieves scene recognition accuracies of 89.51% on the MIT67 database, 78.93% on the SUN397 database, and 57.27% on the Places365 database, which exceed the accuracies obtained by the latest reported deep-learning-based scene recognition methods.
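
The abstract does not state how the three kinds of deep features are combined. As a rough illustration only, the sketch below assumes a simple feature-level fusion: each view is L2-normalized separately and the results are concatenated. The function names, the feature dimensions, and the random toy data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row vector to unit L2 norm (eps guards against zero rows)."""
    norm = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norm, eps)

def fuse_views(object_semantics, global_appearance, contextual_appearance):
    """Assumed fusion rule: normalize each view, then concatenate per image."""
    views = [object_semantics, global_appearance, contextual_appearance]
    return np.concatenate([l2_normalize(v) for v in views], axis=1)

# Hypothetical toy features for 4 images; the dimensions are arbitrary.
rng = np.random.default_rng(0)
obj = rng.normal(size=(4, 8))    # stand-in for object semantics features
glob = rng.normal(size=(4, 16))  # stand-in for CNN fully connected activations
ctx = rng.normal(size=(4, 12))   # stand-in for LSTM-based contextual features

fused = fuse_views(obj, glob, ctx)
print(fused.shape)  # (4, 36)
```

Normalizing each view before concatenation keeps one view from dominating the fused vector purely because of its scale or dimensionality. A classifier trained on the fused vectors would then produce the scene label.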

Pages: 1715 - 1728

Page count: 14
