Multi-Scale Adaptive Skeleton Transformer for action recognition

Cited by: 1
Authors
Wang, Xiaotian [1 ]
Chen, Kai [2 ]
Zhao, Zhifu [1 ]
Shi, Guangming [1 ,3 ]
Xie, Xuemei [1 ]
Jiang, Xiang [2 ]
Yang, Yifan [2 ]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Xian 710000, Peoples R China
[2] Xidian Univ, Cyberspace Inst Technol, Guangzhou 510300, Peoples R China
[3] Pengcheng Lab, Shenzhen 518000, Peoples R China
Keywords
Skeleton-based action recognition; Transformer; Position encoding; Multi-scale representation; Network
DOI
10.1016/j.cviu.2024.104229
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The Transformer has demonstrated remarkable performance in various computer vision tasks, but its potential is not fully explored in skeleton-based action recognition. On the one hand, existing methods primarily use a fixed function or a pre-learned matrix to encode position information, overlooking sample-specific position information. On the other hand, these approaches focus on single-scale spatial relationships, neglecting discriminative fine-grained and coarse-grained spatial features. To address these issues, we propose a Multi-Scale Adaptive Skeleton Transformer (MSAST), comprising an Adaptive Skeleton Position Encoding Module (ASPEM), a Multi-Scale Embedding Module (MSEM), and an Adaptive Relative Location Module (ARLM). ASPEM decouples spatial and temporal information in the position encoding procedure to capture the inherent dependencies of skeleton sequences; it is also conditioned on the input tokens, so it can learn sample-specific position information. MSEM employs multi-scale pooling to generate multi-scale tokens that carry multi-grained features, and the spatial transformer then captures multi-scale relations to distinguish subtle differences between actions. As a further contribution, ARLM is presented to mine suitable location information for better recognition performance. Extensive experiments on three benchmark datasets demonstrate that the proposed model achieves Top-1 accuracy of 94.9%/97.5% on NTU-60 C-Sub/C-View, 88.7%/91.6% on NTU-120 X-Sub/X-Set, and 97.4% on NW-UCLA.
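The two core ideas described in the abstract can be illustrated with a minimal PyTorch sketch: a position encoding computed from the input tokens themselves (sample-specific, in the spirit of ASPEM) and multi-scale tokens obtained by pooling joint tokens at several scales (in the spirit of MSEM). The module names, layer sizes, and pooling scales below are illustrative assumptions for exposition, not the authors' implementation.

# Minimal sketch of sample-specific position encoding and multi-scale tokens.
# AdaptivePositionEncoding and MultiScaleTokens are hypothetical names; the
# hidden sizes and pooling scales are assumptions, not the paper's settings.
import torch
import torch.nn as nn


class AdaptivePositionEncoding(nn.Module):
    """Position encoding conditioned on the input tokens (sample-specific)."""

    def __init__(self, dim: int):
        super().__init__()
        # Small MLP that maps each token to its own positional offset.
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim); the encoding depends on x itself,
        # unlike a fixed sinusoid or a pre-learned position table.
        return x + self.proj(x)


class MultiScaleTokens(nn.Module):
    """Pool joint tokens at several scales and concatenate along the token axis."""

    def __init__(self, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_joints, dim)
        tokens = []
        for s in self.scales:
            if s == 1:
                tokens.append(x)  # fine-grained: keep every joint token
            else:
                # coarse-grained: average-pool groups of s joints
                pooled = nn.functional.avg_pool1d(
                    x.transpose(1, 2), kernel_size=s, stride=s
                )
                tokens.append(pooled.transpose(1, 2))
        return torch.cat(tokens, dim=1)  # multi-scale token sequence


if __name__ == "__main__":
    # Toy check: 25 joints (as in NTU skeletons), 64-dim token features.
    x = torch.randn(2, 25, 64)
    x = AdaptivePositionEncoding(64)(x)
    tokens = MultiScaleTokens()(x)
    print(tokens.shape)  # (2, 25 + 12 + 6, 64) = (2, 43, 64)

In this toy setup the 25 fine-grained joint tokens are concatenated with 12 and 6 pooled tokens, giving a 43-token multi-scale sequence per sample that a spatial transformer could then attend over.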
Pages: 9
Related papers (50 in total)
  • [21] Multi-scale spatiotemporal topology unveiled: enhancing skeleton-based action recognition
    Chen, Hongwei
    Wang, Jianpeng
    Chen, Zexi
    JOURNAL OF SUPERCOMPUTING, 2025, 81 (01):
  • [22] Multi-Scale Structural Graph Convolutional Network for Skeleton-Based Action Recognition
    Jang, Sungjun
    Lee, Heansung
    Kim, Woo Jin
    Lee, Jungho
    Woo, Sungmin
    Lee, Sangyoun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7244 - 7258
  • [23] MSA-GCN: Exploiting Multi-Scale Temporal Dynamics With Adaptive Graph Convolution for Skeleton-Based Action Recognition
    Alowonou, Kowovi Comivi
    Han, Ji-Hyeong
    IEEE ACCESS, 2024, 12 : 193552 - 193563
  • [24] MUSIQ: Multi-scale Image Quality Transformer
    Ke, Junjie
    Wang, Qifei
    Wang, Yilin
    Milanfar, Peyman
    Yang, Feng
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 5128 - 5137
  • [25] Multi-scale Dilated Attention Graph Convolutional Network for Skeleton-Based Action Recognition
    Shu, Yang
    Li, Wanggen
    Li, Doudou
    Gao, Kun
    Jie, Biao
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 16 - 28
  • [26] Multi-scale spatial–temporal convolutional neural network for skeleton-based action recognition
    Cheng, Qin
    Cheng, Jun
    Ren, Ziliang
    Zhang, Qieshi
    Liu, Jianming
    PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (3) : 1303 - 1315
  • [27] Multi-scale sampling attention graph convolutional networks for skeleton-based action recognition
    Tian, Haoyu
    Zhang, Yipeng
    Wu, Hanbo
    Ma, Xin
    Li, Yibin
    NEUROCOMPUTING, 2024, 597
  • [28] Structure-Aware Multi-scale Hierarchical Graph Convolutional Network for Skeleton Action Recognition
    He, Changxiang
    Liu, Shuting
    Zhao, Ying
    Qin, Xiaofei
    Zeng, Jiayuan
    Zhang, Xuedian
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT III, 2021, 12893 : 293 - 304
  • [29] Multi-Scale Mixed Dense Graph Convolution Network for Skeleton-Based Action Recognition
    Xia, Hailun
    Gao, Xinkai
    IEEE ACCESS, 2021, 9 (09) : 36475 - 36484
  • [30] Multi-Scale Spatial Temporal Graph Neural Network for Skeleton-Based Action Recognition
    Feng, Dong
    Wu, ZhongCheng
    Zhang, Jun
    Ren, TingTing
    IEEE ACCESS, 2021, 9 : 58256 - 58265