Multi-scale spatial–temporal convolutional neural network for skeleton-based action recognition

Cited by: 0
Authors
Qin Cheng
Jun Cheng
Ziliang Ren
Qieshi Zhang
Jianming Liu
Affiliations
[1] Guilin University of Electronic Technology, School of Electronic Engineering and Automation
[2] Chinese Academy of Sciences, CAS Key Laboratory of Human
[3] The Chinese University of Hong Kong, Machine Intelligence
[4] Guilin University of Electronic Technology, Synergy Systems, Shenzhen Institute of Advanced Technology
[5] Dongguan University of Technology, School of Computer Science and Information Security
Keywords
Computer vision; Deep learning; Action recognition; Skeleton-based; Multi-scale convolution; Less computation
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Skeleton data convey significant information for action recognition because they are robust against cluttered backgrounds and illumination variation. Owing to their limited ability to extract spatial–temporal features from skeleton data, methods based on convolutional neural networks (CNNs) or recurrent neural networks have been inferior in recognition accuracy in recent years. A series of methods based on graph convolutional networks (GCNs) have achieved remarkable performance and gradually become dominant. However, the computational cost of GCN-based methods is heavy, with several works exceeding 100 GFLOPs. This is at odds with the highly condensed nature of skeleton data. In this paper, a novel multi-scale spatial–temporal convolutional (MSST) module is proposed to exploit the implicit complementary advantages of spatial–temporal representations at different scales. Instead of converting skeleton data into pseudo-images, as some previous CNN-based methods do, or using complex graph convolution, we make full use of multi-scale convolutions along the temporal and spatial dimensions to capture comprehensive dependencies among skeleton joints. Building on the MSST module, a multi-scale spatial–temporal convolutional neural network (MSSTNet) is proposed to capture high-level spatial–temporal semantic features for action recognition. Unlike previous methods, which boost performance at the cost of computation, MSSTNet can be easily implemented with a light model size and fast inference. Moreover, MSSTNet is used in a four-stream framework to fuse data of different modalities, which notably improves recognition accuracy. On the NTU RGB+D 60, NTU RGB+D 120, UAV-Human and Northwestern-UCLA datasets, the proposed MSSTNet achieves competitive performance with much less computational cost than state-of-the-art methods.
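As a rough illustration of the multi-scale idea described in the abstract, the sketch below applies 1D convolutions with several kernel sizes along the temporal axis and along the joint axis of a small frames-by-joints grid, and collects each branch as a separate feature map. This is not the authors' MSST module: the kernel sizes (1, 3, 5), the averaging kernels that stand in for learned weights, the zero padding, and all function names are assumptions made for illustration only.

```python
from typing import List, Sequence

# grid[t][v] holds one coordinate of joint v at frame t (hypothetical layout).
Grid = List[List[float]]


def conv1d_same(seq: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Zero-padded 1D correlation that keeps the input length (odd kernel size)."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(k)) for i in range(len(seq))]


def transpose(grid: Grid) -> Grid:
    """Swap the frame and joint axes."""
    return [list(col) for col in zip(*grid)]


def conv_over_joints(grid: Grid, k: int) -> Grid:
    """Convolve each frame along the joint axis with a size-k averaging kernel."""
    kernel = [1.0 / k] * k  # assumption: stands in for learned weights
    return [conv1d_same(row, kernel) for row in grid]


def conv_over_time(grid: Grid, k: int) -> Grid:
    """Convolve each joint trajectory along the temporal axis."""
    return transpose(conv_over_joints(transpose(grid), k))


def multi_scale_features(grid: Grid, scales: Sequence[int] = (1, 3, 5)) -> List[Grid]:
    """Return one temporal and one spatial feature map per scale."""
    features = []
    for k in scales:
        features.append(conv_over_time(grid, k))
        features.append(conv_over_joints(grid, k))
    return features


# Toy input: 6 frames, 4 joints, one coordinate per joint.
skeleton = [[float(t + v) for v in range(4)] for t in range(6)]
features = multi_scale_features(skeleton)
```

Each entry of `features` has the same frames-by-joints shape as the input, so the branches could be stacked as channels before a further layer; in a real network the kernels would be learned rather than fixed averages.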
Pages: 1303-1315
Number of pages: 12
Related papers
50 in total
  • [1] Multi-scale spatial-temporal convolutional neural network for skeleton-based action recognition
    Cheng, Qin
    Cheng, Jun
    Ren, Ziliang
    Zhang, Qieshi
    Liu, Jianming
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (03) : 1303 - 1315
  • [2] Multi-Scale Spatial Temporal Graph Convolutional Network for Skeleton-Based Action Recognition
    Chen, Zhan
    Li, Sicheng
    Yang, Bing
    Li, Qinghan
    Liu, Hong
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1113 - 1122
  • [3] Multi-Scale Spatial Temporal Graph Neural Network for Skeleton-Based Action Recognition
    Feng, Dong
    Wu, ZhongCheng
    Zhang, Jun
    Ren, TingTing
    [J]. IEEE ACCESS, 2021, 9 : 58256 - 58265
  • [4] Multi-scale skeleton simplification graph convolutional network for skeleton-based action recognition
    Fan, Zhang
    Ding, Chongyang
    Kai, Liu
    Liu, Hongjin
    [J]. IET COMPUTER VISION, 2024,
  • [5] Multi-Scale Structural Graph Convolutional Network for Skeleton-Based Action Recognition
    Jang, Sungjun
    Lee, Heansung
    Kim, Woo Jin
    Lee, Jungho
    Woo, Sungmin
    Lee, Sangyoun
    [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34 (08) : 7244 - 7258
  • [6] Multi-scale Dilated Attention Graph Convolutional Network for Skeleton-Based Action Recognition
    Shu, Yang
    Li, Wanggen
    Li, Doudou
    Gao, Kun
    Jie, Biao
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 16 - 28
  • [7] Multi-Scale Adaptive Aggregate Graph Convolutional Network for Skeleton-Based Action Recognition
    Zheng, Zhiyun
    Wang, Yizhou
    Zhang, Xingjin
    Wang, Junfeng
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (03):
  • [8] MTT: Multi-Scale Temporal Transformer for Skeleton-Based Action Recognition
    Kong, Jun
    Bian, Yuhang
    Jiang, Min
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 528 - 532
  • [9] Lighter and faster: A multi-scale adaptive graph convolutional network for skeleton-based action recognition
    Jiang, Yuanjian
    Deng, Hongmin
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 132
  • [10] Spatial Graph Convolutional and Temporal Involution Network for Skeleton-based Action Recognition
    Wan, Huifan
    Pan, Guanghui
    Chen, Yu
    Ding, Danni
    Zou, Maoyang
    [J]. PROCEEDINGS OF ACM TURING AWARD CELEBRATION CONFERENCE, ACM TURC 2021, 2021, : 204 - 209