Recognizing actions in images by fusing multiple body structure cues

Cited by: 13
Authors
Li, Yang [1]
Li, Kan [1]
Wang, Xinxin [1]
Affiliation
[1] Beijing Inst Technol, 5 South Zhongguancun St, Beijing 100081, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
Image-based action recognition; Convolutional neural network; Body structure cues;
DOI
10.1016/j.patcog.2020.107341
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Although Convolutional Neural Networks (CNNs) have brought substantial improvements to many computer vision tasks, there remains room for improvement in image-based action recognition because of their limited ability to exploit body structure information. In this work, we propose a unified deep model that explicitly explores body structure information and fuses multiple body structure cues for robust action recognition in images. To fully explore the body structure information, we design the Body Structure Exploration sub-network. It generates two novel body structure cues, Structural Body Parts and Limb Angle Descriptor, which capture the structure information of human bodies from the global and local perspectives, respectively. We then design the Action Classification sub-network to fuse the predictions from multiple body structure cues and obtain precise results. Moreover, we integrate the two sub-networks into a unified model by sharing the bottom convolutional layers, which improves computational efficiency in both the training and testing stages. We comprehensively evaluate our network on two challenging image-based human action datasets, Pascal VOC 2012 Action and Stanford40. Our approach achieves 93.5% and 93.8% mAP, respectively, outperforming all recent approaches in this field. (C) 2020 Elsevier Ltd. All rights reserved.
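The abstract does not say how the Action Classification sub-network fuses the predictions from the body structure cues. As a rough illustration of the general idea only, the sketch below uses a weighted average of per-cue class probabilities (late fusion), which is an assumed technique and not necessarily the paper's method. The function names, the cue keys, the weights, and the scores are all hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(cue_scores, weights=None):
    """Late fusion: weighted average of per-cue class probabilities.

    cue_scores: dict mapping a cue name to a list of raw class scores.
    weights: optional dict of non-negative per-cue weights
             (hypothetical; uniform weights are used when omitted).
    """
    names = list(cue_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    num_classes = len(cue_scores[names[0]])
    fused = [0.0] * num_classes
    for name in names:
        probs = softmax(cue_scores[name])
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p / total_weight
    return fused


# Made-up scores for three action classes from the two cues named in the abstract.
example = {
    "structural_body_parts": [1.5, 1.0, 0.2],
    "limb_angle_descriptor": [0.3, 2.2, 0.1],
}
fused = fuse_predictions(example)
# Index of the class with the highest fused probability.
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Averaging probabilities rather than raw scores keeps each cue on a comparable scale, so one cue with large-magnitude scores does not dominate the fused result.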
Pages: 12