Combining multiple deep cues for action recognition

Cited by: 4
Authors
Wang, Ruiqi [1]
Wu, Xinxiao [1]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci, Beijing Lab Intelligent Informat Technol, Beijing 100081, Peoples R China
Keywords
Action recognition; Multiple deep cues; l(p)-norm multiple kernel learning
DOI
10.1007/s11042-018-6509-0
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose a novel deep-learning-based framework that fuses multiple cues (action motion, objects, and scenes) for complex action recognition. Because deep features have achieved promising results, three deep representations are extracted to capture both the temporal and the contextual information of actions. For the action cue, we first apply a deep detection model to detect persons frame by frame and then feed the deep representations of the detected persons into a Gated Recurrent Unit (GRU) model to generate the action features. Unlike existing deep action features, our feature is capable of modeling the global dynamics of long human motion. The scene and object cues are likewise represented by deep features pooled over all frames of a video. Moreover, we introduce an l(p)-norm multiple kernel learning method that combines the multiple deep representations of a video to learn robust action classifiers by capturing the contextual relationships between action, object, and scene. Extensive experiments on two real-world action datasets (UCF101 and HMDB51) demonstrate the effectiveness of our method.
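The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy feature values, the choice of mean pooling and linear kernels, the uniform starting weights, and all function names are assumptions made for illustration. The action features stand in for GRU outputs, and the kernel weights are only rescaled to unit l(p) norm here rather than learned jointly with a classifier, as a full multiple kernel learning method would do.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mean_pool(frames: List[Vector]) -> Vector:
    """Average per-frame feature vectors into one video-level vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def linear_kernel(feats: List[Vector]) -> Matrix:
    """Gram matrix of dot products between video-level features."""
    return [[sum(a * b for a, b in zip(x, y)) for y in feats] for x in feats]


def lp_normalize(weights: Vector, p: float) -> Vector:
    """Rescale non-negative weights so that their l_p norm equals 1."""
    norm = sum(w ** p for w in weights) ** (1.0 / p)
    return [w / norm for w in weights]


def combine_kernels(kernels: List[Matrix], weights: Vector) -> Matrix:
    """Weighted sum of base kernels: K = sum_m beta_m * K_m."""
    n = len(kernels[0])
    return [
        [sum(b * k[i][j] for b, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Toy data for two videos (all values are made up).
# Action cue: one vector per video, standing in for a GRU's output.
action_feats = [[1.0, 0.0], [0.0, 1.0]]
# Scene and object cues: per-frame features, pooled over all frames.
scene_frames = [[[1.0, 1.0], [3.0, 1.0]], [[0.0, 2.0], [0.0, 4.0]]]
object_frames = [[[2.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 3.0]]]

scene_feats = [mean_pool(f) for f in scene_frames]
object_feats = [mean_pool(f) for f in object_frames]

# One base kernel per cue, combined with weights on the l_p unit sphere.
kernels = [linear_kernel(f) for f in (action_feats, scene_feats, object_feats)]
beta = lp_normalize([1.0, 1.0, 1.0], p=2.0)
K = combine_kernels(kernels, beta)
```

The combined matrix `K` could then be passed to any kernel classifier that accepts a precomputed Gram matrix. Smaller values of `p` push the weights toward sparse cue selection, while larger values keep the weights closer to uniform.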
Pages: 9933-9950
Page count: 18