Multi-Scale Adaptive Skeleton Transformer for action recognition

Cited by: 1
Authors
Wang, Xiaotian [1 ]
Chen, Kai [2 ]
Zhao, Zhifu [1 ]
Shi, Guangming [1 ,3 ]
Xie, Xuemei [1 ]
Jiang, Xiang [2 ]
Yang, Yifan [2 ]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Xian 710000, Peoples R China
[2] Xidian Univ, Cyberspace Inst Technol, Guangzhou 510300, Peoples R China
[3] Pengcheng Lab, Shenzhen 518000, Peoples R China
Keywords
Skeleton-based action recognition; Transformer; Position encoding; Multi-scale representation; NETWORK;
DOI
10.1016/j.cviu.2024.104229
CLC number
TP18 [theory of artificial intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transformer has demonstrated remarkable performance in various computer vision tasks. However, its potential has not been fully explored in skeleton-based action recognition. On one hand, existing methods primarily utilize a fixed function or a pre-learned matrix to encode position information, overlooking sample-specific position information. On the other hand, these approaches focus on single-scale spatial relationships, neglecting discriminative fine-grained and coarse-grained spatial features. To address these issues, we propose a Multi-Scale Adaptive Skeleton Transformer (MSAST), comprising an Adaptive Skeleton Position Encoding Module (ASPEM), a Multi-Scale Embedding Module (MSEM), and an Adaptive Relative Location Module (ARLM). ASPEM decouples spatial and temporal information in the position-encoding procedure, capturing the inherent dependencies of skeleton sequences. ASPEM is also designed to be dependent on the input tokens, so it can learn sample-specific position information. The MSEM employs multi-scale pooling to generate multi-scale tokens that contain multi-grained features. The spatial transformer then captures multi-scale relations to resolve the subtle differences between similar actions. As a further contribution, ARLM is presented to mine suitable location information for better recognition performance. Extensive experiments on three benchmark datasets demonstrate that the proposed model achieves Top-1 accuracy of 94.9%/97.5% on NTU-60 C-Sub/C-View, 88.7%/91.6% on NTU-120 X-Sub/X-Set, and 97.4% on NW-UCLA.
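As a rough illustration of the two mechanisms the abstract describes, token-dependent position encoding and multi-scale pooling over joints, the following NumPy sketch is a minimal interpretation only. The function names, the linear projection `W_pos`, the uniform grouping of adjacent joints, and the scale set `(1, 2, 5)` are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def adaptive_position_encoding(tokens, W_pos):
    """Sample-specific position encoding: the encoding is computed from the
    token content itself (here, a simple linear projection) rather than from
    a fixed function or a pre-learned, input-independent matrix."""
    return tokens + tokens @ W_pos

def multi_scale_tokens(tokens, scales=(1, 2, 5)):
    """Pool groups of joints at several scales to build fine- and
    coarse-grained tokens, then concatenate them along the token axis.
    tokens: (V, C) array of per-joint features for one frame."""
    V, C = tokens.shape
    out = []
    for s in scales:
        # Average-pool consecutive groups of s joints into coarse tokens;
        # joints beyond the last full group are dropped for simplicity.
        trimmed = tokens[: (V // s) * s].reshape(V // s, s, C)
        out.append(trimmed.mean(axis=1))
    return np.concatenate(out, axis=0)

# Illustrative sizes: 25 joints (as in NTU skeletons), 8 feature channels.
V, C = 25, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((V, C))          # per-joint token features
W_pos = rng.standard_normal((C, C)) * 0.01
pe = adaptive_position_encoding(x, W_pos)
ms = multi_scale_tokens(x)               # 25 + 12 + 5 = 42 multi-scale tokens
```

In the sketch, scale 1 keeps the original per-joint tokens (fine-grained), while larger scales yield progressively coarser body-level tokens; a spatial transformer would then attend over the concatenated token set to capture multi-scale relations.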
Pages: 9
Related papers
50 records in total
  • [1] MTT: Multi-Scale Temporal Transformer for Skeleton-Based Action Recognition
    Kong, Jun
    Bian, Yuhang
    Jiang, Min
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 528 - 532
  • [2] Multi-scale skeleton adaptive weighted GCN for skeleton-based human action recognition in IoT
    Xu Weiyao
    Wu Muqing
    Zhu Jie
    Zhao Min
    APPLIED SOFT COMPUTING, 2021, 104
  • [3] Multi-Scale Adaptive Graph Convolution Network for Skeleton-Based Action Recognition
    Hu, Huangshui
    Fang, Yue
    Han, Mei
    Qi, Xingshuo
    IEEE ACCESS, 2024, 12 : 16868 - 16880
  • [4] Gated Multi-Scale Transformer for Temporal Action Localization
    Yang, Jin
    Wei, Ping
    Ren, Ziyang
    Zheng, Nanning
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 5705 - 5717
  • [5] Adaptive Multi-Scale Transformer Tracker for Satellite Videos
    Zhang, Xin
    Jiao, Licheng
    Li, Lingling
    Liu, Xu
    Liu, Fang
    Ma, Wenping
    Yang, Shuyuan
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [6] Multi-Scale Adaptive Aggregate Graph Convolutional Network for Skeleton-Based Action Recognition
    Zheng, Zhiyun
    Wang, Yizhou
    Zhang, Xingjin
    Wang, Junfeng
    APPLIED SCIENCES-BASEL, 2022, 12 (03):
  • [7] Adaptive Multi-Scale Difference Graph Convolution Network for Skeleton-Based Action Recognition
    Wang, Xiaojuan
    Gan, Ziliang
    Jin, Lei
    Xiao, Yabo
    He, Mingshu
    ELECTRONICS, 2023, 12 (13)
  • [8] STDM-transformer: Space-time dual multi-scale transformer network for skeleton-based action recognition
    Zhao, Zhifu
    Chen, Ziwei
    Li, Jianan
    Xie, Xuemei
    Chen, Kai
    Wang, Xiaotian
    Shi, Guangming
    NEUROCOMPUTING, 2024, 563
  • [9] MALT: Multi-scale Action Learning Transformer for Online Action Detection
    Xie, Liping (lpxie@seu.edu.cn), 1600, Institute of Electrical and Electronics Engineers Inc.
  • [10] Hierarchical adaptive multi-scale hypergraph attention convolution network for skeleton-based action recognition
    Yang, Honghong
    Wang, Sai
    Jiang, Lu
    Su, Yuping
    Zhang, Yumei
    APPLIED SOFT COMPUTING, 2025, 172