Recognizing actions in images by fusing multiple body structure cues

Cited: 13
Authors
Li, Yang [1]
Li, Kan [1]
Wang, Xinxin [1]
Institutions
[1] Beijing Inst Technol, 5 South Zhongguancun St, Beijing 100081, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
Image-based action recognition; Convolutional neural network; Body structure cues;
DOI
10.1016/j.patcog.2020.107341
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although Convolutional Neural Networks (CNNs) have brought substantial improvements to many computer vision tasks, there remains room for improvement in image-based action recognition because of their limited ability to exploit body structure information. In this work, we propose a unified deep model that explicitly explores body structure information and fuses multiple body structure cues for robust action recognition in images. To fully explore body structure information, we design the Body Structure Exploration sub-network. It generates two novel body structure cues, Structural Body Parts and the Limb Angle Descriptor, which capture the structure of human bodies from the global and local perspectives, respectively. We then design the Action Classification sub-network, which fuses the predictions from the multiple body structure cues to obtain precise results. Moreover, we integrate the two sub-networks into a unified model by sharing the bottom convolutional layers, which improves computational efficiency in both the training and testing stages. We comprehensively evaluate our network on two challenging image-based human action datasets, Pascal VOC 2012 Action and Stanford40. Our approach achieves 93.5% and 93.8% mAP, respectively, outperforming all recent approaches in this field. (C) 2020 Elsevier Ltd. All rights reserved.
Pages: 12
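
The abstract describes a shared-backbone design: bottom convolutional layers feed a Body Structure Exploration sub-network (producing a global Structural Body Parts cue and a local Limb Angle Descriptor cue), and an Action Classification sub-network fuses the per-cue predictions. Below is a minimal sketch of that high-level layout, assuming PyTorch; the backbone choice, layer sizes, branch definitions, and score-averaging fusion are illustrative assumptions, not details reported in the paper.

    # Minimal sketch of the shared-backbone, two-cue layout described in the
    # abstract. All names and hyperparameters here are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    class TwoCueActionNet(nn.Module):
        def __init__(self, num_actions):
            super().__init__()
            # Shared bottom convolutional layers (a ResNet-50 trunk is an assumption).
            trunk = models.resnet50(weights=None)
            self.backbone = nn.Sequential(*list(trunk.children())[:-2])
            # Stand-ins for the two body structure cues: one branch for the
            # global cue (Structural Body Parts), one for the local cue
            # (Limb Angle Descriptor).
            self.global_branch = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(2048, 512), nn.ReLU())
            self.local_branch = nn.Sequential(
                nn.AdaptiveMaxPool2d(1), nn.Flatten(), nn.Linear(2048, 512), nn.ReLU())
            # One classifier per cue; their predictions are fused afterwards.
            self.global_head = nn.Linear(512, num_actions)
            self.local_head = nn.Linear(512, num_actions)

        def forward(self, images):
            feats = self.backbone(images)                   # shared conv features
            g = self.global_head(self.global_branch(feats))
            l = self.local_head(self.local_branch(feats))
            return (g + l) / 2                              # averaged per-cue scores

    model = TwoCueActionNet(num_actions=40)    # Stanford40 has 40 action classes
    scores = model(torch.randn(2, 3, 224, 224))
    print(scores.shape)                        # torch.Size([2, 40])

Sharing the backbone between both cue branches mirrors the efficiency benefit stated in the abstract: the costly convolutional features are computed once per image in both training and testing.
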
Related Papers
50 items in total
• [41] Zhang, Hong-Bo; Li, Shao-Zi; Chen, Shu-Yuan; Su, Song-Zhi; Lin, Xian-Ming; Cao, Dong-Lin. Locating and recognizing multiple human actions by searching for maximum score subsequences. Signal, Image and Video Processing, 2015, 9: 705-714
• [42] Sapp, Benjamin; Chaudhry, Rizwan; Yu, Xiaodong; Singh, Gautam; Perera, Ian; Ferraro, Francis; Tzoukermann, Evelyne; Kosecka, Jana; Neumann, Jan. Recognizing Manipulation Actions in Arts and Crafts Shows using Domain-Specific Visual and Textual Cues. 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011
• [43] Myers, Blake A.; Jaggernauth, Lucas; Metz, Thomas M.; Hill, Matthew Q.; Gandi, Veda Nandan; Castillo, Carlos D.; O'Toole, Alice J. Recognizing People by Body Shape Using Deep Networks of Images and Words. 2023 IEEE International Joint Conference on Biometrics (IJCB), 2023
• [44] Wu, Xiaojun; Song, Zhan; Yu, Gang; Zheng, Feng. A Novel Multiple Cues based Image Fusing Algorithm for High Dynamic Range Image Generation. Fourth International Conference on Machine Vision (ICMV 2011): Machine Vision, Image Processing, and Pattern Analysis, 2012, 8349
• [45] Sener, Fadime; Samet, Nermin; Duygulu, Pinar; Ikizler-Cinbis, Nazli. Recognizing Human Actions From Noisy Videos via Multiple Instance Learning. 2013 21st Signal Processing and Communications Applications Conference (SIU), 2013
• [46] Lu, Wei-Lwun; Okuma, Kenji; Little, James J. Tracking and recognizing actions of multiple hockey players using the boosted particle filter. Image and Vision Computing, 2009, 27(1-2): 189-205
• [47] Pehlivan, Selen; Duygulu, Pinar. A new pose-based representation for recognizing actions from multiple cameras. Computer Vision and Image Understanding, 2011, 115(02): 140-151
• [49] Dhassi, Younes; Aarab, Abdellah. Visual tracking based on adaptive interacting multiple model particle filter by fusing multiples cues. Multimedia Tools and Applications, 2018, 77(20): 26259-26292