A Key Volume Mining Deep Framework for Action Recognition

Cited by: 161
Authors
Zhu, Wangjiang [1]
Hu, Jie [2]
Sun, Gang [2]
Cao, Xudong [2]
Qiao, Yu [3]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] SenseTime Grp Ltd, Hong Kong, Hong Kong, Peoples R China
[3] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
DOI
10.1109/CVPR.2016.219
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, deep learning approaches have demonstrated remarkable progress in action recognition in videos. Most existing deep frameworks treat every volume, i.e., spatial-temporal video clip, equally, and directly assign a video label to all volumes sampled from it. However, within a video, discriminative actions may occur sparsely in a few key volumes, and most other volumes are irrelevant to the labeled action category. Training with a large proportion of irrelevant volumes will hurt performance. To address this issue, we propose a key volume mining deep framework to identify key volumes and conduct classification simultaneously. Specifically, our framework is optimized in an alternating way that is integrated into the forward and backward stages of Stochastic Gradient Descent (SGD). In the forward pass, our network mines key volumes for each action class. In the backward pass, it updates network parameters with the help of these mined key volumes. In addition, we propose "Stochastic out" to model key volumes from multi-modalities, and an effective yet simple "unsupervised key volume proposal" method for high quality volume sampling. Our experiments show that action recognition performance can be significantly improved by mining key volumes, and we achieve state-of-the-art performance on HMDB51 and UCF101 (93.1%).
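The alternating forward/backward scheme described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, for illustration only, that "mining" means taking the highest-scoring volume per class (a max over volumes) and that the loss is softmax cross-entropy. The function names mine_key_volumes and backward_to_volumes are hypothetical, and the scores stand in for the outputs of a network that is not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mine_key_volumes(volume_scores):
    """Forward step (assumed form): for each class, select the volume
    with the highest score for that class.

    volume_scores[k][c] is the score of volume k for class c.
    Returns (pooled_scores, key_indices), one entry per class.
    """
    num_classes = len(volume_scores[0])
    pooled, key_indices = [], []
    for c in range(num_classes):
        column = [row[c] for row in volume_scores]
        k_best = max(range(len(column)), key=column.__getitem__)
        key_indices.append(k_best)
        pooled.append(column[k_best])
    return pooled, key_indices


def backward_to_volumes(volume_scores, label):
    """Backward step (assumed form): gradient of softmax cross-entropy
    with respect to every volume score. Only the mined key volume of
    each class receives a nonzero gradient; other volumes get zero.
    """
    pooled, key_indices = mine_key_volumes(volume_scores)
    probs = softmax(pooled)
    grads = [[0.0] * len(pooled) for _ in volume_scores]
    for c, k in enumerate(key_indices):
        grads[k][c] = probs[c] - (1.0 if c == label else 0.0)
    return grads


if __name__ == "__main__":
    # Three volumes from one video, two action classes (made-up scores).
    scores = [[0.1, 2.0], [3.0, 0.5], [0.2, 0.3]]
    print(mine_key_volumes(scores))
    print(backward_to_volumes(scores, label=0))
```

Under these assumptions, a volume that is never the top scorer for any class contributes nothing to the parameter update, which is one way irrelevant volumes could be kept from diluting training.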
Pages: 1991-1999
Page count: 9