Combining multiple deep cues for action recognition

Cited by: 4
Authors
Wang, Ruiqi [1]
Wu, Xinxiao [1]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci, Beijing Lab Intelligent Informat Technol, Beijing 100081, Peoples R China
Keywords
Action recognition; Multiple deep cues; ℓp-norm multiple kernel learning
DOI
10.1007/s11042-018-6509-0
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose a novel deep-learning-based framework that fuses multiple cues (action motion, objects, and scenes) for complex action recognition. Because deep features have achieved promising results, we extract three deep representations that capture both the temporal and the contextual information of actions. For the action cue, we first apply a deep detection model to detect persons frame by frame, and then feed the deep representations of the detected persons into a Gated Recurrent Unit (GRU) model to generate the action features. Unlike existing deep action features, our feature can model the global dynamics of long human motions. The scene and object cues are represented by deep features pooled over all frames of a video. Moreover, we introduce an ℓp-norm multiple kernel learning method that combines the multiple deep representations of a video and learns robust action classifiers by capturing the contextual relationships among actions, objects, and scenes. Extensive experiments on two real-world action datasets, UCF101 and HMDB51, demonstrate the effectiveness of our method.
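The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`mean_pool`, `linear_kernel`, `lp_normalize`, `combine_kernels`), the choice of average pooling and linear kernels, and the fixed example weights are all assumptions made for illustration. In an actual ℓp-norm multiple kernel learning method the kernel weights are learned jointly with the classifier; here they are only rescaled to unit ℓp norm and used to form a weighted sum of per-cue kernels.

```python
# Illustrative sketch only: pooling, kernels, and weights are assumptions,
# not the method of the paper.

def mean_pool(frame_features):
    """Average per-frame feature vectors into one video-level vector."""
    n_frames = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[k] for f in frame_features) / n_frames for k in range(dim)]


def linear_kernel(videos):
    """Gram matrix of inner products between video-level feature vectors."""
    return [[sum(a * b for a, b in zip(x, y)) for y in videos] for x in videos]


def lp_normalize(weights, p):
    """Rescale non-negative kernel weights so that their l_p norm equals 1."""
    if p < 1:
        raise ValueError("p must be >= 1")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    norm = sum(w ** p for w in weights) ** (1.0 / p)
    if norm == 0:
        raise ValueError("at least one weight must be positive")
    return [w / norm for w in weights]


def combine_kernels(kernels, weights):
    """Weighted sum of Gram matrices, one matrix per cue."""
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for gram, weight in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * gram[i][j]
    return combined


if __name__ == "__main__":
    # Two toy videos, each with two frames of 2-D "scene" features.
    scene = [mean_pool([[1.0, 0.0], [3.0, 0.0]]),
             mean_pool([[0.0, 2.0], [0.0, 2.0]])]
    # Toy video-level "action" features (in the paper these come from a GRU).
    action = [[1.0, 1.0], [0.0, 1.0]]

    weights = lp_normalize([3.0, 4.0], p=2)
    fused = combine_kernels([linear_kernel(scene), linear_kernel(action)], weights)
    print(weights)
    print(fused)
```

The fused Gram matrix could then be passed to any kernel classifier that accepts a precomputed kernel. Choosing p close to 1 pushes toward sparse cue weights, while larger p spreads weight more evenly across cues.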
Pages: 9933-9950
Page count: 18
Related papers
50 in total
  • [1] Combining multiple deep cues for action recognition
    Ruiqi Wang
    Xinxiao Wu
    [J]. Multimedia Tools and Applications, 2019, 78 : 9933 - 9950
  • [2] Combining Multiple Sources of Knowledge in Deep CNNs for Action Recognition
    Park, Eunbyung
    Han, Xufeng
    Berg, Tamara L.
    Berg, Alexander C.
    [J]. 2016 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2016), 2016,
  • [3] A Deep Model Combining Structural Features and Context Cues for Action Recognition in Static Images
    Wang, Xinxin
    Li, Kan
    Li, Yang
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2017), PT VI, 2017, 10639 : 622 - 632
  • [4] Deep Fusion of Multiple Semantic Cues for Complex Event Recognition
    Zhang, Xishan
    Zhang, Hanwang
    Zhang, Yongdong
    Yang, Yang
    Wang, Meng
    Luan, Huanbo
    Li, Jintao
    Chua, Tat-Seng
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2016, 25 (03) : 1033 - 1046
  • [5] Deep multiple aggregation networks for action recognition
    Ahmed Mazari
    Hichem Sahbi
    [J]. International Journal of Multimedia Information Retrieval, 2024, 13
  • [6] Deep multiple aggregation networks for action recognition
    Mazari, Ahmed
    Sahbi, Hichem
    [J]. INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2024, 13 (01)
  • [7] Video Action Recognition by Combining Spatial-Temporal Cues with Graph Convolutional Networks
    Li, Tao
    Xiong, Wenjun
    Zhang, Zheng
    Pei, Lishen
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2023,
  • [9] Object recognition with multiple cues
    Reno, AL
    Booth, DM
    [J]. MELECON 2000: INFORMATION TECHNOLOGY AND ELECTROTECHNOLOGY FOR THE MEDITERRANEAN COUNTRIES, VOLS 1-3, PROCEEDINGS, 2000, : 538 - 541
  • [10] Object, Scene and Actions: Combining Multiple Features for Human Action Recognition
    Ikizler-Cinbis, Nazli
    Sclaroff, Stan
    [J]. COMPUTER VISION-ECCV 2010, PT I, 2010, 6311 : 494 - 507