Learning Semantic Feature Map for Visual Content Recognition

Cited by: 4
Authors
Zhao, Rui-Wei [1 ]
Wu, Zuxuan [2 ]
Li, Jianguo [3 ]
Jiang, Yu-Gang [1 ]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Sch Comp Sci, Shanghai, Peoples R China
[2] Univ Maryland, College Pk, MD 20742 USA
[3] Intel Labs China, Beijing, Peoples R China
Keywords
image representation; contextual fusion; image classification; video classification; LATE FUSION;
DOI
10.1145/3123266.3123379
CLC number
TP301 [Theory, Methods];
Subject category code
081202;
Abstract
The spatial relationships among objects provide rich clues to object context for visual recognition. In this paper, we propose to learn a Semantic Feature Map (SFM) with deep neural networks to model spatial object context for better understanding of image and video content. Specifically, we first extract high-level semantic object features from the input image with convolutional neural networks for every object proposal, and organize them into the designed SFM so that spatial information among objects is preserved. To fully exploit the spatial relationships among objects, we employ either Fully Convolutional Networks (FCN) or Long Short-Term Memory (LSTM) on top of the SFM for final recognition. For better training, we also introduce a multi-task learning framework that trains the model in an end-to-end manner; it combines an overall image classification loss with a grid labeling loss that predicts the object label at each SFM grid cell. Extensive experiments are conducted to verify the effectiveness of the proposed approach. For image classification, very promising results are obtained on the Pascal VOC 2007/2012 and MS-COCO benchmarks. We also directly transfer the SFM learned in the image domain to the video classification task. The results on the CCV benchmark demonstrate the robustness and generalization capability of the proposed approach.
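The core data-structure idea in the abstract, organizing per-proposal CNN features into a spatial grid so that object layout is preserved, can be sketched as follows. This is a minimal illustrative NumPy sketch, not the authors' implementation: the function name `build_sfm`, the normalized box coordinates, and the per-cell max-pooling rule are all assumptions made for the example.

```python
import numpy as np

def build_sfm(boxes, feats, grid=8, img_size=1.0):
    """Scatter per-proposal features onto a grid x grid spatial map.

    boxes: list of (x1, y1, x2, y2) in [0, img_size] coordinates.
    feats: (N, D) array of one feature vector per proposal.
    Cells covered by several proposals are max-pooled, so each grid
    location keeps the strongest semantic response at that position.
    """
    d = feats.shape[1]
    sfm = np.zeros((grid, grid, d))
    cell = img_size / grid
    for (x1, y1, x2, y2), f in zip(boxes, feats):
        # Grid-cell range spanned by this proposal's bounding box.
        c1, r1 = int(x1 // cell), int(y1 // cell)
        c2 = min(int(np.ceil(x2 / cell)), grid)
        r2 = min(int(np.ceil(y2 / cell)), grid)
        sfm[r1:r2, c1:c2] = np.maximum(sfm[r1:r2, c1:c2], f)
    return sfm
```

A recognition head (the FCN or LSTM mentioned in the abstract) would then consume this `(grid, grid, D)` tensor, and the grid labeling loss would supervise the predicted label at each of its cells.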
Pages: 1291-1299
Page count: 9
Related Articles
50 records
  • [31] Semantic-aware visual attributes learning for zero-shot recognition
    Xie, Yurui
    Song, Tiecheng
    Li, Wei
    [J]. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 74
  • [32] Spatial attention based visual semantic learning for action recognition in still images
    Zheng, Yunpeng
    Zheng, Xiangtao
    Lu, Xiaoqiang
    Wu, Siyuan
    [J]. NEUROCOMPUTING, 2020, 413 : 383 - 396
  • [35] Vical: Visual cognitive architecture for concepts learning to understanding semantic image content
    Ali Y.M.B.
    [J]. Advances in Intelligent and Soft Computing, 2010, 84 : 15 - 30
  • [36] Visual-semantic network: a visual and semantic enhanced model for gesture recognition
    Yizhe Wang
    Congqi Cao
    Yanning Zhang
    [J]. Visual Intelligence, 1 (1):
  • [37] Feature activation during word recognition: action, visual, and associative-semantic priming effects
    Lam, Kevin J. Y.
    Dijkstra, Ton
    Rueschemeyer, Shirley-Ann
    [J]. FRONTIERS IN PSYCHOLOGY, 2015, 6
  • [38] A Self-Learning Map-Seeking Circuit For Visual Object Recognition
    Shukla, Rohit
    Lipasti, Mikko
    [J]. 2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015,
  • [39] Biologically Inspired Model for Visual Cognition Achieving Unsupervised Episodic and Semantic Feature Learning
    Qiao, Hong
    Li, Yinlin
    Li, Fengfu
    Xi, Xuanyang
    Wu, Wei
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2016, 46 (10) : 2335 - 2347
  • [40] Scene recognition by semantic visual words
    Farahzadeh, Elahe
    Cham, Tat-Jen
    Sluzek, Andrzej
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2015, 9 (08) : 1935 - 1944