Spatio-Temporal Attention-Based LSTM Networks for 3D Action Recognition and Detection

Cited by: 182
Authors
Song, Sijie [1]
Lan, Cuiling [2]
Xing, Junliang [4]
Zeng, Wenjun [2, 3]
Liu, Jiaying [1]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, Beijing 100080, Peoples R China
[2] Microsoft Res Asia, Beijing 100080, Peoples R China
[3] Microsoft Res Asia, Senior Leadership Team, Beijing 100080, Peoples R China
[4] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100080, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Spatial attention; temporal attention; action recognition; action detection; skeleton data; MOTION; MODEL;
DOI
10.1109/TIP.2018.2818328
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human action analytics has attracted considerable attention in computer vision for decades. It is important to extract discriminative spatio-temporal features to model the spatial and temporal evolutions of different actions. In this paper, we propose a spatial and temporal attention model to explore the spatial and temporal discriminative features for human action recognition and detection from skeleton data. We build our networks on recurrent neural networks with long short-term memory units. The learned model is capable of selectively focusing on discriminative joints of skeletons within each input frame and paying different levels of attention to the outputs of different frames. To ensure effective training of the network for action recognition, we propose a regularized cross-entropy loss to drive the learning process and develop a joint training strategy accordingly. Moreover, based on temporal attention, we develop a method to generate temporal action proposals for action detection. We evaluate the proposed method on the SBU Kinect Interaction data set, the NTU RGB+D data set, and the PKU-MMD data set. Experimental results demonstrate the effectiveness of our proposed model on both action recognition and action detection.
Pages: 3459-3471
Page count: 13
Related papers
50 in total
  • [1] Spatio-Temporal Attention Networks for Action Recognition and Detection
    Li, Jun
    Liu, Xianglong
    Zhang, Wenxuan
    Zhang, Mingyuan
    Song, Jingkuan
    Sebe, Nicu
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (11) : 2990 - 3001
  • [2] Attention-based Spatio-Temporal Graphic LSTM for EEG Emotion Recognition
    Li, Xiaoxu
    Zheng, Wenming
    Zong, Yuan
    Chang, Hongli
    Lu, Cheng
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [3] Spatio-temporal deformable 3D ConvNets with attention for action recognition
    Li, Jun
    Liu, Xianglong
    Zhang, Mingyuan
    Wang, Deqing
    [J]. PATTERN RECOGNITION, 2020, 98
  • [4] Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition
    Liu, Jun
    Shahroudy, Amir
    Xu, Dong
    Wang, Gang
    [J]. COMPUTER VISION - ECCV 2016, PT III, 2016, 9907 : 816 - 833
  • [5] Spatio-temporal attention on manifold space for 3D human action recognition
    Chongyang Ding
    Kai Liu
    Fei Cheng
    Evgeny Belyaev
    [J]. Applied Intelligence, 2021, 51 : 560 - 570
  • [6] Spatio-temporal attention on manifold space for 3D human action recognition
    Ding, Chongyang
    Liu, Kai
    Cheng, Fei
    Belyaev, Evgeny
    [J]. APPLIED INTELLIGENCE, 2021, 51 (01) : 560 - 570
  • [7] Global Spatio-Temporal Attention for Action Recognition Based on 3D Human Skeleton Data
    Han, Yun
    Chung, Sheng-Luen
    Xiao, Qiang
    Lin, Wei You
    Su, Shun-Feng
    [J]. IEEE ACCESS, 2020, 8 : 88604 - 88616
  • [8] Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition
    Hara, Kensho
    Kataoka, Hirokatsu
    Satoh, Yutaka
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017), 2017, : 3154 - 3160
  • [9] Spatio-Temporal 3D Action Recognition with Hierarchical Self-Attention Mechanism
    Araei, Soheil
    Nadian-Ghomsheh, Ali
    [J]. 2021 26TH INTERNATIONAL COMPUTER CONFERENCE, COMPUTER SOCIETY OF IRAN (CSICC), 2021,
  • [10] Unified Spatio-Temporal Attention Networks for Action Recognition in Videos
    Li, Dong
    Yao, Ting
    Duan, Ling-Yu
    Mei, Tao
    Rui, Yong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (02) : 416 - 428