A Key Volume Mining Deep Framework for Action Recognition

Cited by: 161
Authors
Zhu, Wangjiang [1]
Hu, Jie [2]
Sun, Gang [2]
Cao, Xudong [2]
Qiao, Yu [3]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] SenseTime Grp Ltd, Hong Kong, Hong Kong, Peoples R China
[3] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
DOI
10.1109/CVPR.2016.219
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, deep learning approaches have demonstrated remarkable progress in action recognition in videos. Most existing deep frameworks treat every volume, i.e., spatial-temporal video clip, equally and directly assign the video label to all volumes sampled from it. However, within a video, discriminative actions may occur sparsely in a few key volumes, and most other volumes are irrelevant to the labeled action category. Training with a large proportion of irrelevant volumes will hurt performance. To address this issue, we propose a key volume mining deep framework to identify key volumes and conduct classification simultaneously. Specifically, our framework is optimized in an alternating way that is integrated into the forward and backward stages of Stochastic Gradient Descent (SGD). In the forward pass, our network mines key volumes for each action class. In the backward pass, it updates network parameters with the help of these mined key volumes. In addition, we propose "Stochastic out" to model key volumes from multiple modalities, and an effective yet simple "unsupervised key volume proposal" method for high-quality volume sampling. Our experiments show that action recognition performance can be significantly improved by mining key volumes, and we achieve state-of-the-art performance on HMDB51 and UCF101 (93.1%).
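The alternating forward/backward scheme described in the abstract can be illustrated with a minimal sketch. It assumes that "mining" means taking, for each class, the volume with the highest score (a max-over-volumes pooling), followed by a softmax cross-entropy loss; the abstract does not specify these details, so the function names (`mine_key_volumes`, `backward`) and the toy scores below are hypothetical and are not the authors' implementation.

```python
import math

def mine_key_volumes(scores):
    """Forward pass (assumed): for each class, pick the volume with the top score.

    scores: list of K volumes, each a list of C class scores.
    Returns (pooled, key_idx): per-class max score and the index of the
    volume that produced it.
    """
    num_classes = len(scores[0])
    pooled, key_idx = [], []
    for c in range(num_classes):
        k = max(range(len(scores)), key=lambda i: scores[i][c])
        key_idx.append(k)
        pooled.append(scores[k][c])
    return pooled, key_idx

def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def backward(scores, key_idx, pooled, label):
    """Backward pass (assumed): cross-entropy gradient w.r.t. volume scores.

    Only the mined key volumes receive a nonzero gradient; all other
    volumes are ignored for this update.
    """
    probs = softmax(pooled)
    grad = [[0.0] * len(pooled) for _ in scores]
    for c, k in enumerate(key_idx):
        grad[k][c] = probs[c] - (1.0 if c == label else 0.0)
    return grad

# Toy example: 3 volumes sampled from one video, 3 action classes.
scores = [[0.1, 0.2, 0.0],
          [2.0, 0.1, 0.3],
          [0.0, 0.4, 0.1]]
pooled, key_idx = mine_key_volumes(scores)
grad = backward(scores, key_idx, pooled, label=0)
```

In this toy run the first volume is never selected as a key volume for any class, so its gradient row stays at zero, which is the sense in which irrelevant volumes stop influencing training under this assumption.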
Pages: 1991-1999
Page count: 9