Multi-Scale Adaptive Skeleton Transformer for action recognition

Cited: 1
Authors
Wang, Xiaotian [1 ]
Chen, Kai [2 ]
Zhao, Zhifu [1 ]
Shi, Guangming [1 ,3 ]
Xie, Xuemei [1 ]
Jiang, Xiang [2 ]
Yang, Yifan [2 ]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Xian 710000, Peoples R China
[2] Xidian Univ, Cyberspace Inst Technol, Guangzhou 510300, Peoples R China
[3] Pengcheng Lab, Shenzhen 518000, Peoples R China
Keywords
Skeleton-based action recognition; Transformer; Position encoding; Multi-scale representation; NETWORK;
DOI
10.1016/j.cviu.2024.104229
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transformer has demonstrated remarkable performance in various computer vision tasks. However, its potential has not been fully explored in skeleton-based action recognition. On one hand, existing methods primarily utilize a fixed function or a pre-learned matrix to encode position information, overlooking sample-specific position information. On the other hand, these approaches focus on single-scale spatial relationships, neglecting discriminative fine-grained and coarse-grained spatial features. To address these issues, we propose a Multi-Scale Adaptive Skeleton Transformer (MSAST), comprising an Adaptive Skeleton Position Encoding Module (ASPEM), a Multi-Scale Embedding Module (MSEM), and an Adaptive Relative Location Module (ARLM). ASPEM decouples spatial-temporal information in the position encoding procedure, capturing the inherent dependencies of skeleton sequences. ASPEM is also designed to be dependent on the input tokens, so it can learn sample-specific position information. MSEM employs multi-scale pooling to generate multi-scale tokens that contain multi-grained features; the spatial transformer then captures multi-scale relations to address the subtle differences between various actions. A further contribution of this paper is ARLM, which mines suitable location information for better recognition performance. Extensive experiments on three benchmark datasets demonstrate that the proposed model achieves Top-1 accuracy of 94.9%/97.5% on NTU-60 C-Sub/C-View, 88.7%/91.6% on NTU-120 X-Sub/X-Set, and 97.4% on NW-UCLA, respectively.
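The multi-scale embedding idea in the abstract can be illustrated with a minimal NumPy sketch: joint features are average-pooled at several scales so that fine-grained (per-joint) and coarse-grained (grouped-joint) tokens coexist in one sequence. This is only an assumed illustration of the general technique, not the authors' MSEM implementation; the function name `multi_scale_tokens` and the scale choices are hypothetical.

```python
import numpy as np

def multi_scale_tokens(x, scales=(1, 2, 5)):
    """Build multi-grained tokens by pooling joints at several scales.

    x: array of shape (T, J, C) -- T frames, J joints, C channels.
    Scale s groups consecutive joints into J // s segments by average
    pooling; scale 1 keeps the original per-joint tokens.
    Returns an array of shape (T, sum(J // s for s in scales), C).
    """
    T, J, C = x.shape
    tokens = []
    for s in scales:
        g = J // s                                   # groups at this scale
        pooled = x[:, :g * s].reshape(T, g, s, C).mean(axis=2)
        tokens.append(pooled)
    return np.concatenate(tokens, axis=1)

# Example: NTU skeletons have 25 joints, so scales (1, 2, 5)
# yield 25 + 12 + 5 = 42 tokens per frame.
x = np.random.default_rng(0).normal(size=(4, 25, 64))
tokens = multi_scale_tokens(x)
print(tokens.shape)  # (4, 42, 64)
```

The concatenated token sequence would then feed a spatial transformer, letting attention relate fine and coarse body parts within one layer.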
Pages: 9
Related Papers
50 records in total
  • [31] Channel attention and multi-scale graph neural networks for skeleton-based action recognition
    Dang, Ronghao
    Liu, Chengju
    Liu, Ming
    Chen, Qijun
    AI COMMUNICATIONS, 2022, 35 (03) : 187 - 205
  • [32] Skeleton-weighted and multi-scale temporal-driven network for video action recognition
    Xu, Ziqi
    Zhang, Jie
    Zhang, Peng
    Ding, Pengfei
    JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (06)
  • [33] Multi-Scale Spatial Temporal Graph Convolutional Network for Skeleton-Based Action Recognition
    Chen, Zhan
    Li, Sicheng
    Yang, Bing
    Li, Qinghan
    Liu, Hong
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1113 - 1122
  • [34] Multi-Scale Bidirectional FCN for Object Skeleton Extraction
    Yang, Fan
    Li, Xin
    Cheng, Hong
    Guo, Yuxiao
    Chen, Leiting
    Li, Jianping
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 7461 - 7468
  • [35] Multi-scale Adaptive Dehazing Network
    Chen, Shuxin
    Chen, Yizi
    Qu, Yanyun
    Huang, Jingying
    Hong, Ming
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 2051 - 2059
  • [36] Transformer guided self-adaptive network for multi-scale skin lesion image segmentation
    Xin, Chao
    Liu, Zhifang
    Ma, Yizhao
    Wang, Dianchen
    Zhang, Jing
    Li, Lingzhi
    Zhou, Qiongyan
    Xu, Suling
    Zhang, Yingying
    COMPUTERS IN BIOLOGY AND MEDICINE, 2024, 169
  • [37] AMSformer: A Transformer for Grain Storage Temperature Prediction Using Adaptive Multi-Scale Feature Fusion
    Zhang, Qinghui
    Zhang, Weixiang
    Huang, Quanzhen
    Wan, Chenxia
    Li, Zhihui
    AGRICULTURE-BASEL, 2025, 15 (01):
  • [38] SKELETON BASED ACTION RECOGNITION USING TRANSLATION-SCALE INVARIANT IMAGE MAPPING AND MULTI-SCALE DEEP CNN
    Li, Bo
    Dai, Yuchao
    Cheng, Xuelian
    Chen, Huahui
    Lin, Yi
    He, Mingyi
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2017,
  • [39] Multi-scale spatial-temporal convolutional neural network for skeleton-based action recognition
    Cheng, Qin
    Cheng, Jun
    Ren, Ziliang
    Zhang, Qieshi
    Liu, Jianming
    PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (03) : 1303 - 1315
  • [40] Multi-scale motion contrastive learning for self-supervised skeleton-based action recognition
    Wu, Yushan
    Xu, Zengmin
    Yuan, Mengwei
    Tang, Tianchi
    Meng, Ruxing
    Wang, Zhongyuan
    MULTIMEDIA SYSTEMS, 2024, 30 (05)