Interactive Spatiotemporal Token Attention Network for Skeleton-based General Interactive Action Recognition

Cited by: 3
Authors
Wen, Yuhang [1]
Tang, Zixuan [1]
Pang, Yunsheng [2]
Ding, Beichen [1]
Liu, Mengyuan [3]
Affiliations
[1] Sun Yat Sen Univ, Shenzhen 518107, Peoples R China
[2] Tencent Technol Shenzhen Co Ltd, Shenzhen, Peoples R China
[3] Peking Univ, Shenzhen Grad Sch, Key Lab Machine Percept, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DATASET
DOI
10.1109/IROS55552.2023.10342472
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognizing interactive actions plays an important role in human-robot interaction and collaboration. Previous methods use late fusion and co-attention mechanisms to capture interactive relations, but these either have limited learning capability or adapt inefficiently to more interacting entities. Because they assume that the priors of each entity are already known, they also lack evaluation in a more general setting that addresses the diversity of subjects. To address these problems, we propose an Interactive Spatiotemporal Token Attention Network (ISTA-Net), which simultaneously models spatial, temporal, and interactive relations. Specifically, our network contains a tokenizer to partition Interactive Spatiotemporal Tokens (ISTs), which are a unified way to represent the motions of multiple diverse entities. By extending the entity dimension, ISTs provide better interactive representations. To learn jointly along the three dimensions of ISTs, multi-head self-attention blocks integrated with 3D convolutions are designed to capture inter-token correlations. Because a strict entity ordering is usually irrelevant for recognizing interactive actions, Entity Rearrangement is proposed to eliminate the ordering in ISTs for interchangeable entities. Extensive experiments on four datasets verify the effectiveness of ISTA-Net, which outperforms state-of-the-art methods. Our code is publicly available at https://github.com/Necolizer/ISTA-Net.
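The idea that entity order should not matter for interchangeable entities can be illustrated with a small sketch. This is not the ISTA-Net implementation: the function name `rearrange_entities`, the nested-list data layout, and the use of a uniformly random permutation are all assumptions made for illustration; the abstract does not specify how Entity Rearrangement is carried out.

```python
import random


def rearrange_entities(sample, rng):
    """Return the entities of `sample` in a random order, plus that order.

    Hypothetical layout (an assumption, not from the paper):
    sample[e][t][j] is the (x, y, z) coordinate of joint j of
    entity e at frame t.
    """
    order = list(range(len(sample)))
    rng.shuffle(order)  # random permutation of the entity indices
    return [sample[e] for e in order], order


# Toy input: 2 entities, 2 frames, 2 joints, 3D coordinates.
sample = [
    [[(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)], [(0.0, 0.1, 0.0), (0.1, 0.1, 0.0)]],
    [[(1.0, 0.0, 0.0), (1.1, 0.0, 0.0)], [(1.0, 0.1, 0.0), (1.1, 0.1, 0.0)]],
]

rng = random.Random(0)  # seeded only to make the sketch repeatable
shuffled, order = rearrange_entities(sample, rng)
```

Only the entity axis is permuted here; the frame and joint axes of each entity are left untouched, so every entity's motion is preserved while its position in the input changes.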
Pages: 7886-7892
Number of pages: 7