Recognizing actions in images by fusing multiple body structure cues

Cited by: 13
Authors
Li, Yang [1]
Li, Kan [1]
Wang, Xinxin [1]
Affiliations
[1] Beijing Inst Technol, 5 South Zhongguancun St, Beijing 100081, Peoples R China
Funding
Beijing Natural Science Foundation
Keywords
Image-based action recognition; Convolutional neural network; Body structure cues
DOI
10.1016/j.patcog.2020.107341
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although Convolutional Neural Networks (CNNs) have brought substantial improvements to many computer vision tasks, there remains room for improvement in image-based action recognition because of the limited capability to exploit body structure information. In this work, we propose a unified deep model that explicitly explores body structure information and fuses multiple body structure cues for robust action recognition in images. To fully explore the body structure information, we design the Body Structure Exploration sub-network. It generates two novel body structure cues, Structural Body Parts and Limb Angle Descriptor, which capture the structure information of human bodies from the global and local perspectives, respectively. We then design the Action Classification sub-network, which fuses the predictions from the multiple body structure cues to obtain precise results. Moreover, we integrate the two sub-networks into a unified model by sharing the bottom convolutional layers, which improves computational efficiency in both the training and testing stages. We comprehensively evaluate our network on two challenging image-based human action datasets, Pascal VOC 2012 Action and Stanford40. Our approach achieves 93.5% and 93.8% mAP, respectively, outperforming all recent approaches in this field. (C) 2020 Elsevier Ltd. All rights reserved.
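The abstract names a local angle-based cue and a fusion of per-cue predictions but does not give their definitions. The Python sketch below is only one plausible reading, not the authors' method: `limb_angle` and `fuse_predictions` are hypothetical helpers, the joint-angle formula is an assumed stand-in for the Limb Angle Descriptor, and the weighted averaging of softmax scores is an assumed late-fusion rule.

```python
import math


def limb_angle(a, b, c):
    """Angle in degrees at joint b between limbs b->a and b->c.

    Assumption: a, b, c are 2D keypoints (x, y), e.g. shoulder, elbow, wrist.
    The paper's actual Limb Angle Descriptor may be defined differently.
    """
    ax, ay = a[0] - b[0], a[1] - b[1]
    cx, cy = c[0] - b[0], c[1] - b[1]
    cross = ax * cy - ay * cx
    dot = ax * cx + ay * cy
    return math.degrees(math.atan2(abs(cross), dot))


def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(cue_scores, weights=None):
    """Weighted average of per-cue class probabilities (assumed fusion rule).

    cue_scores: dict mapping a cue name to its list of raw class scores.
    weights: optional dict of non-negative cue weights; defaults to uniform.
    """
    names = list(cue_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    num_classes = len(cue_scores[names[0]])
    fused = [0.0] * num_classes
    for name in names:
        probs = softmax(cue_scores[name])
        if len(probs) != num_classes:
            raise ValueError("all cues must score the same set of classes")
        for i, p in enumerate(probs):
            fused[i] += weights[name] / total_weight * p
    return fused


if __name__ == "__main__":
    # Hypothetical keypoints and scores, for illustration only.
    print(limb_angle((0.0, 1.0), (0.0, 0.0), (1.0, 0.0)))
    print(fuse_predictions({
        "structural_body_parts": [2.0, 0.5, 0.1],
        "limb_angle_descriptor": [1.0, 1.5, 0.2],
    }))
```

Averaging probabilities rather than raw scores keeps cues with different score ranges comparable; a learned fusion layer would be another option.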
Pages: 12