Two-stream spatial-temporal neural networks for pose-based action recognition

Cited by: 2
Authors
Wang, Zixuan [1 ]
Zhu, Aichun [1 ,2 ]
Hu, Fangqiang [1 ]
Wu, Qianyu [1 ]
Li, Yifeng [1 ]
Affiliations
[1] Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing, Peoples R China
[2] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou, Jiangsu, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
action recognition; pose estimation; convolutional neural network; long short-term memory;
DOI
10.1117/1.JEI.29.4.043025
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline codes
0808; 0809;
Abstract
With recent advances in human pose estimation and human skeleton capture systems, pose-based action recognition has drawn considerable attention among researchers. Most existing action recognition methods are based on convolutional neural networks and long short-term memory and achieve outstanding performance, but they lack the ability to explicitly exploit the rich spatial-temporal information between the skeletons in a behavior, which limits the accuracy of action recognition. To address this issue, two-stream spatial-temporal neural networks for pose-based action recognition are introduced. First, the pose features extracted from the raw video are processed by an action modeling module. Then, the temporal information and the spatial information, in the form of relative speed and relative distance, are fed into the temporal neural network and the spatial neural network, respectively. Afterward, the outputs of the two-stream networks are fused for better action recognition. Finally, comprehensive experiments on the SUB-JHMDB, SYSU, MPII-Cooking, and NTU RGB+D datasets demonstrate the effectiveness of the proposed model. (C) 2020 SPIE and IS&T
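The abstract's spatial and temporal stream inputs (relative distance within a frame, relative speed across frames) can be sketched as follows. This is a minimal illustration assuming 2-D joint coordinates; the function names and the toy poses are illustrative assumptions, not the authors' exact formulation.

```python
# Sketch of the two kinds of pose features described in the abstract:
# relative distance between joints within one frame (spatial-stream input)
# and relative speed of each joint across frames (temporal-stream input).
# The feature definitions here are plausible assumptions for illustration.
import math
from itertools import combinations

def relative_distances(frame):
    """Pairwise Euclidean distances between 2-D joint coordinates."""
    return [math.dist(a, b) for a, b in combinations(frame, 2)]

def relative_speeds(prev_frame, frame, dt=1.0):
    """Per-joint displacement between consecutive frames, divided by dt."""
    return [math.dist(p, q) / dt for p, q in zip(prev_frame, frame)]

# Toy 3-joint pose over two consecutive frames (hypothetical data)
f0 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
f1 = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0)]

spatial_input = relative_distances(f1)    # fed to the spatial stream
temporal_input = relative_speeds(f0, f1)  # fed to the temporal stream
```

In the paper's pipeline these feature sequences would then pass through the two networks, whose outputs are fused for the final prediction.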
Pages: 16
Related papers
50 records in total
  • [21] Two-stream spatiotemporal networks for skeleton action recognition
    Wang, Lei
    Zhang, Jianwei
    Yang, Shanmin
    Gu, Song
    IET IMAGE PROCESSING, 2023, 17 (11) : 3358 - 3370
  • [22] Human Activities Recognition Based on Two-stream NonLocal Spatial Temporal Residual Convolution Neural Network
    Qian H.
    Chen S.
    Huangfu X.
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2024, 46 (03): : 1100 - 1108
  • [23] Two-Stream Adaptive Weight Convolutional Neural Network Based on Spatial Attention for Human Action Recognition
    Chen, Guanzhou
    Yao, Lu
    Xu, Jingting
    Liu, Qianxi
    Chen, Shengyong
    INTELLIGENT ROBOTICS AND APPLICATIONS (ICIRA 2022), PT IV, 2022, 13458 : 319 - 330
  • [24] Direction-guided two-stream convolutional neural networks for skeleton-based action recognition
    Benyue Su
    Peng Zhang
    Manzhen Sun
    Min Sheng
    Soft Computing, 2023, 27 : 11833 - 11842
  • [25] Direction-guided two-stream convolutional neural networks for skeleton-based action recognition
    Su, Benyue
    Zhang, Peng
    Sun, Manzhen
    Sheng, Min
    SOFT COMPUTING, 2023, 27 (16) : 11833 - 11842
  • [26] Human Action Recognition based on Two-Stream Ind Recurrent Neural Network
    Ge Penghua
    Zhi Min
    TENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING (ICGIP 2018), 2019, 11069
  • [27] Two-Stream Collaborative Learning With Spatial-Temporal Attention for Video Classification
    Peng, Yuxin
    Zhao, Yunzhen
    Zhang, Junchao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2019, 29 (03) : 773 - 786
  • [28] Two-Stream Convolutional Neural Network for Video Action Recognition
    Qiao, Han
    Liu, Shuang
    Xu, Qingzhen
    Liu, Shouqiang
    Yang, Wanggan
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2021, 15 (10): : 3668 - 3684
  • [29] Human action recognition using two-stream attention based LSTM networks
    Dai, Cheng
    Liu, Xingang
    Lai, Jinfeng
    APPLIED SOFT COMPUTING, 2020, 86
  • [30] Human Action Recognition by Fusion of Convolutional Neural Networks and spatial-temporal Information
    Li, Weisheng
    Ding, Yahui
    8TH INTERNATIONAL CONFERENCE ON INTERNET MULTIMEDIA COMPUTING AND SERVICE (ICIMCS2016), 2016, : 255 - 259