Learning Semantic Feature Map for Visual Content Recognition

Cited by: 4
Authors
Zhao, Rui-Wei [1]
Wu, Zuxuan [2]
Li, Jianguo [3]
Jiang, Yu-Gang [1]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Sch Comp Sci, Shanghai, Peoples R China
[2] Univ Maryland, College Pk, MD 20742 USA
[3] Intel Labs China, Beijing, Peoples R China
Keywords
image representation; contextual fusion; image classification; video classification; late fusion
DOI
10.1145/3123266.3123379
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
The spatial relationships among objects provide rich clues to object contexts for visual recognition. In this paper, we propose to learn a Semantic Feature Map (SFM) with deep neural networks to model spatial object contexts for better understanding of image and video content. Specifically, we first extract high-level semantic object features from the input image with convolutional neural networks for every object proposal, and organize them into the designed SFM so that the spatial information among objects is preserved. To fully exploit the spatial relationships among objects, we employ either Fully Convolutional Networks (FCN) or Long Short-Term Memory (LSTM) on top of the SFM for final recognition. For better training, we also introduce a multi-task learning framework to train the model in an end-to-end manner. It is composed of an overall image classification loss and a grid labeling loss, which predicts the object label at each SFM grid cell. Extensive experiments are conducted to verify the effectiveness of the proposed approach. For image classification, promising results are obtained on the Pascal VOC 2007/2012 and MS-COCO benchmarks. We also directly transfer the SFM learned on the image domain to the video classification task. The results on the CCV benchmark demonstrate the robustness and generalization capability of the proposed approach.
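As a rough illustration of the idea of organizing per-proposal features so that spatial layout is preserved, the following is a minimal sketch and not the authors' implementation. The function name `build_feature_map`, the grid size, the rule of assigning each proposal to the cell containing its box center, and the element-wise max used to merge proposals that share a cell are all assumptions made for this example; the abstract does not specify these details.

```python
import numpy as np


def build_feature_map(boxes, feats, image_size, grid=4):
    """Place per-proposal features on a grid x grid map (hypothetical layout).

    boxes: (N, 4) array of [x1, y1, x2, y2] in pixels.
    feats: (N, D) array of non-negative per-proposal feature vectors.
    image_size: (width, height) of the input image.
    """
    width, height = image_size
    sfm = np.zeros((grid, grid, feats.shape[1]), dtype=feats.dtype)

    # Assumption: a proposal is assigned to the cell containing its box center.
    cx = (boxes[:, 0] + boxes[:, 2]) / 2.0
    cy = (boxes[:, 1] + boxes[:, 3]) / 2.0
    cols = np.clip((cx / width * grid).astype(int), 0, grid - 1)
    rows = np.clip((cy / height * grid).astype(int), 0, grid - 1)

    for r, c, f in zip(rows, cols, feats):
        # Assumption: element-wise max merges proposals that share a cell.
        # This relies on features being non-negative (e.g. class scores).
        sfm[r, c] = np.maximum(sfm[r, c], f)
    return sfm


if __name__ == "__main__":
    boxes = np.array([[0, 0, 10, 10], [0, 0, 20, 20], [90, 90, 100, 100]], dtype=float)
    feats = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
    sfm = build_feature_map(boxes, feats, image_size=(100, 100), grid=4)
    print(sfm.shape)
```

The resulting array has one feature vector per grid cell, so a convolutional or recurrent model placed on top of it can, in principle, see which object evidence lies next to which. Under the multi-task setup described in the abstract, such a model would be trained with an image-level classification loss plus a per-cell labeling loss.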
Pages: 1291-1299
Number of pages: 9
Related papers
50 in total
  • [1] Visual Content Recognition by Exploiting Semantic Feature Map with Attention and Multi-task Learning
    Zhao, Rui-Wei
    Zhang, Qi
    Wu, Zuxuan
    Li, Jianguo
    Jiang, Yu-Gang
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2019, 15 (01)
  • [2] Learning Navigational Visual Representations with Semantic Map Supervision
    Hong, Yicong
    Zhou, Yang
    Zhang, Ruiyi
    Dernoncourt, Franck
    Bui, Trung
    Gould, Stephen
    Tan, Hao
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023 : 3032 - 3044
  • [3] Semantic Reinforced Attention Learning for Visual Place Recognition
    Peng, Guohao
    Yue, Yufeng
    Zhang, Jun
    Wu, Zhenyu
    Tang, Xiaoyu
    Wang, Danwei
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021 : 13415 - 13422
  • [4] Learning Semantic Visual Dictionaries: A new Method For Local Feature Encoding
    Shuai, Bing
    Zuo, Zhen
    Wang, Gang
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2015 : 901 - 905
  • [5] Hepatic Lesion Recognition Based on Deep Visual Feature Learning
    Zhai, Shengqing
    Ou, Wenbo
    Yang, Yusi
    Lin, Lan
    [J]. 2019 PHOTONICS & ELECTROMAGNETICS RESEARCH SYMPOSIUM - FALL (PIERS - FALL), 2019 : 1744 - 1748
  • [6] Distributed learning of deep feature embeddings for visual recognition tasks
    Bhattacharjee, B.
    Hill, M. L.
    Wu, H.
    Chandakkar, P. S.
    Smith, J. R.
    Wegman, M. N.
    [J]. IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 2017, 61 (4-5)
  • [7] Unsupervised Feature Learning for Visual Place Recognition in Changing Environments
    Zhao, Dongye
    Si, Bailu
    Tang, Fengzhen
    [J]. 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019
  • [8] Discriminative feature learning from big data for visual recognition
    Jiang, Zhuolin
    Lin, Zhe
    Ling, Haibin
    Porikli, Fatih
    Shao, Ling
    Turaga, Pavan
    [J]. PATTERN RECOGNITION, 2015, 48 (10) : 2961 - 2963
  • [9] A traffic state recognition model based on feature map and deep learning
    Wang, Chun
    Zhang, Weihua
    Wu, Cong
    Hu, Heng
    Ding, Heng
    Zhu, Wenjia
    [J]. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2022, 607
  • [10] The presence of semantic content in a visual recognition memory task reduces the severity of neglect
    Moreh, Elior
    Zohary, Ehud
    Orlov, Tanya
    [J]. NEUROPSYCHOLOGIA, 2021, 157