MTT: Multi-Scale Temporal Transformer for Skeleton-Based Action Recognition

Cited by: 29
Authors
Kong, Jun [1]
Bian, Yuhang [2]
Jiang, Min [2]
Affiliations
[1] Jiangnan Univ, Minist Educ, Key Lab Adv Proc Control Light Ind, Wuxi 214122, Jiangsu, Peoples R China
[2] Jiangnan Univ, Jiangsu Prov Engn Lab Pattern Recognit & Computat, Wuxi 214122, Jiangsu, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Feature extraction; Transformers; Kernel; Skeleton; Data mining; Task analysis; Convolution; Skeleton-based action recognition; transformer; lateral connection; multi-scale temporal embedding;
DOI
10.1109/LSP.2022.3142675
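Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;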
Abstract
In skeleton-based action recognition, long-term temporal dependencies are important cues in sequential skeleton data. Yet state-of-the-art methods rarely capture long-term temporal information because of their limited receptive fields. Meanwhile, most recent multi-branch methods consider only different input modalities and ignore information at different temporal scales. To address these issues, in this letter we propose a multi-scale temporal transformer (MTT) for skeleton-based action recognition. First, the raw skeleton data are embedded by graph convolutional network (GCN) blocks and multi-scale temporal embedding modules (MT-EMs), which are arranged as multiple branches to extract features at various temporal scales. Second, we introduce transformer encoders (TEs) to integrate the embeddings and model long-term temporal patterns. Moreover, we propose a task-oriented lateral connection (LaC) to align semantic hierarchies: LaC distributes input embeddings to the downstream TEs according to their semantic levels. Finally, classification heads aggregate the outputs of the TEs and predict the action categories. Experiments demonstrate the efficiency and generality of the proposed method, which achieves state-of-the-art performance on three large-scale datasets: NTU-RGBD 60, NTU-RGBD 120 and Kinetics-Skeleton 400.
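The multi-branch, multi-scale idea in the abstract can be illustrated with a toy sketch in plain Python. It is not the authors' implementation and rests on several assumptions: each branch simply average-pools per-frame feature vectors over non-overlapping windows of a fixed length (a hypothetical stand-in for an MT-EM, whose internals the abstract does not specify); single-head self-attention with identity query/key/value projections stands in for a learned transformer encoder; and averaging the pooled branch outputs stands in for the classification heads. GCN blocks, learned weights and the lateral connection (LaC) are omitted. The function names temporal_embed, self_attention and multi_scale_features are hypothetical.

```python
# Hypothetical toy sketch of multi-scale temporal branches + self-attention.
# NOT the MTT authors' implementation: no GCN, no learned weights, no LaC.
import math
from typing import List, Sequence

Vector = List[float]


def temporal_embed(frames: List[Vector], scale: int) -> List[Vector]:
    """Average-pool per-frame features over non-overlapping windows of
    `scale` frames (assumed stand-in for one MT-EM branch)."""
    tokens = []
    for start in range(0, len(frames), scale):
        window = frames[start:start + scale]  # last window may be shorter
        dim = len(window[0])
        tokens.append([sum(f[d] for f in window) / len(window) for d in range(dim)])
    return tokens


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with identity Q/K/V
    projections: every output token is a weighted average of ALL tokens,
    so the receptive field spans the whole sequence."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average the tokens of one branch into a single feature vector."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def multi_scale_features(frames: List[Vector], scales: Sequence[int] = (1, 2, 4)) -> Vector:
    """Run one branch per temporal scale, then average the pooled branch
    outputs (assumed stand-in for aggregation by classification heads)."""
    branch_outputs = [mean_pool(self_attention(temporal_embed(frames, s))) for s in scales]
    dim = len(branch_outputs[0])
    return [sum(b[d] for b in branch_outputs) / len(branch_outputs) for d in range(dim)]


if __name__ == "__main__":
    # 8 frames, each with a toy 2-D feature: [frame index, constant 1.0]
    frames = [[float(t), 1.0] for t in range(8)]
    print(temporal_embed(frames, 4))  # [[1.5, 1.0], [5.5, 1.0]]
    print(multi_scale_features(frames))
```

Because each attention output is a weighted average over all tokens in its branch, the receptive field covers the whole sequence regardless of its length, which is the limitation of earlier methods that the abstract points to. Coarser scales produce fewer tokens, so attention over those branches is also cheaper.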
Pages: 528-532
Number of pages: 5
Related papers
50 in total
  • [1] Multi-scale spatial–temporal convolutional neural network for skeleton-based action recognition
    Cheng, Qin
    Cheng, Jun
    Ren, Ziliang
    Zhang, Qieshi
    Liu, Jianming
    [J]. Pattern Analysis and Applications, 2023, 26 (3) : 1303 - 1315
  • [2] Multi-Scale Spatial Temporal Graph Convolutional Network for Skeleton-Based Action Recognition
    Chen, Zhan
    Li, Sicheng
    Yang, Bing
    Li, Qinghan
    Liu, Hong
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1113 - 1122
  • [3] Multi-Scale Spatial Temporal Graph Neural Network for Skeleton-Based Action Recognition
    Feng, Dong
    Wu, ZhongCheng
    Zhang, Jun
    Ren, TingTing
    [J]. IEEE ACCESS, 2021, 9 : 58256 - 58265
  • [4] Multi-scale skeleton simplification graph convolutional network for skeleton-based action recognition
    Fan, Zhang
    Ding, Chongyang
    Kai, Liu
    Liu, Hongjin
    [J]. IET COMPUTER VISION, 2024
  • [5] STDM-transformer: Space-time dual multi-scale transformer network for skeleton-based action recognition
    Zhao, Zhifu
    Chen, Ziwei
    Li, Jianan
    Xie, Xuemei
    Chen, Kai
    Wang, Xiaotian
    Shi, Guangming
    [J]. NEUROCOMPUTING, 2024, 563
  • [6] SPATIO-TEMPORAL MULTI-SCALE SOFT QUANTIZATION LEARNING FOR SKELETON-BASED HUMAN ACTION RECOGNITION
    Yang, Jianyu
    Zhu, Chen
    Yuan, Junsong
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1078 - 1083
  • [7] Multi-scale spatio-temporal network for skeleton-based gait recognition
    He, Dongzhi
    Xue, Yongle
    Li, Yunyu
    Sun, Zhijie
    Xiao, Xingmei
    Wang, Jin
    [J]. AI COMMUNICATIONS, 2023, 36 (04) : 297 - 310
  • [8] Multi-Scale Adaptive Graph Convolution Network for Skeleton-Based Action Recognition
    Hu, Huangshui
    Fang, Yue
    Han, Mei
    Qi, Xingshuo
    [J]. IEEE ACCESS, 2024, 12 : 16868 - 16880
  • [9] Multi-Scale Structural Graph Convolutional Network for Skeleton-Based Action Recognition
    Jang, Sungjun
    Lee, Heansung
    Kim, Woo Jin
    Lee, Jungho
    Woo, Sungmin
    Lee, Sangyoun
    [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34 (08) : 7244 - 7258