Interactive Spatiotemporal Token Attention Network for Skeleton-based General Interactive Action Recognition

Cited by: 3
Authors
Wen, Yuhang [1 ]
Tang, Zixuan [1 ]
Pang, Yunsheng [2 ]
Ding, Beichen [1 ]
Liu, Mengyuan [3 ]
Affiliations
[1] Sun Yat Sen Univ, Shenzhen 518107, Peoples R China
[2] Tencent Technol Shenzhen Co Ltd, Shenzhen, Peoples R China
[3] Peking Univ, Shenzhen Grad Sch, Key Lab Machine Percept, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DATASET;
DOI
10.1109/IROS55552.2023.10342472
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognizing interactive actions plays an important role in human-robot interaction and collaboration. Previous methods use late fusion or co-attention mechanisms to capture interactive relations, but these either have limited learning capability or adapt inefficiently to a larger number of interacting entities. Because they assume that priors for each entity are already known, they also lack evaluation in a more general setting that addresses the diversity of subjects. To address these problems, we propose an Interactive Spatiotemporal Token Attention Network (ISTA-Net), which simultaneously models spatial, temporal, and interactive relations. Specifically, our network contains a tokenizer that partitions Interactive Spatiotemporal Tokens (ISTs), a unified way to represent the motions of multiple diverse entities. By extending the entity dimension, ISTs provide better interactive representations. To learn jointly along the three dimensions of ISTs, multi-head self-attention blocks integrated with 3D convolutions are designed to capture inter-token correlations. When modeling these correlations, a strict entity ordering is usually irrelevant to recognizing interactive actions. To this end, Entity Rearrangement is proposed to eliminate orderliness in ISTs for interchangeable entities. Extensive experiments on four datasets verify the effectiveness of ISTA-Net, which outperforms state-of-the-art methods. Our code is publicly available at https://github.com/Necolizer/ISTA-Net.
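To make the abstract's two core ideas concrete, the following is a minimal illustrative sketch (not the authors' implementation; see their repository for that) of how a skeleton clip shaped [entities][frames][joints][channels] might be partitioned into Interactive Spatiotemporal Tokens that span all entities, and how Entity Rearrangement amounts to permuting the interchangeable entity axis. The function names, window scheme, and nested-list layout are all assumptions made for this sketch.

```python
import random

def partition_ists(clip, t_win, j_win):
    """Split a clip into Interactive Spatiotemporal Tokens (illustrative).

    Each token spans ALL entities (the extended entity dimension),
    t_win consecutive frames, and j_win consecutive joints, so a single
    token jointly carries spatial, temporal, and interactive context.
    """
    n_frames = len(clip[0])
    n_joints = len(clip[0][0])
    tokens = []
    for t0 in range(0, n_frames, t_win):
        for j0 in range(0, n_joints, j_win):
            # One IST: all entities x one frame window x one joint window.
            token = [
                [frame[j0:j0 + j_win] for frame in entity[t0:t0 + t_win]]
                for entity in clip
            ]
            tokens.append(token)
    return tokens

def rearrange_entities(clip, order):
    """Entity Rearrangement: permute interchangeable entities."""
    return [clip[i] for i in order]

# Toy clip: 2 entities, 4 frames, 6 joints, 3 channels per joint.
random.seed(0)
clip = [[[[random.random() for _ in range(3)] for _ in range(6)]
         for _ in range(4)] for _ in range(2)]

tokens = partition_ists(clip, t_win=2, j_win=3)
print(len(tokens))  # (4/2) * (6/3) = 4 tokens

# Swapping the two entities only permutes the entity axis of each token;
# the unordered set of per-entity sub-tokens is unchanged, which is why a
# strict entity ordering carries no information for recognition.
swapped = partition_ists(rearrange_entities(clip, [1, 0]), 2, 3)
assert all(t[0] == s[1] and t[1] == s[0] for t, s in zip(tokens, swapped))
```

In the actual network these tokens would be embedded and fed to multi-head self-attention blocks; the sketch only shows the tokenization geometry and the permutation symmetry that Entity Rearrangement exploits.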
Pages: 7886-7892
Page count: 7
Related Papers
50 records
  • [1] Skeleton-Based Mutual Action Recognition Using Interactive Skeleton Graph and Joint Attention
    Jia, Xiangze
    Zhang, Ji
    Wang, Zhen
    Luo, Yonglong
    Chen, Fulong
    Yang, Gaoming
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2022, PT II, 2022, 13427 : 110 - 116
  • [2] A Spatiotemporal Fusion Network For Skeleton-Based Action Recognition
    Bao, Wenxia
    Wang, Junyi
    Yang, Xianjun
    Chen, Hemu
    2024 3RD INTERNATIONAL CONFERENCE ON IMAGE PROCESSING AND MEDIA COMPUTING, ICIPMC 2024, 2024, : 347 - 352
  • [3] Improving skeleton-based action recognition with interactive object information
    Wen, Hao
    Lu, Ziqian
    Shen, Fengli
    Lu, Zhe-Ming
    Cui, Jialin
    International Journal of Multimedia Information Retrieval, 2025, 14 (1)
  • [4] Interactive two-stream graph neural network for skeleton-based action recognition
    Yang, Dun
    Zhou, Qing
    Wen, Ju
    JOURNAL OF ELECTRONIC IMAGING, 2021, 30 (03)
  • [5] Sequence Segmentation Attention Network for Skeleton-Based Action Recognition
    Zhang, Yujie
    Cai, Haibin
    ELECTRONICS, 2023, 12 (07)
  • [6] SpatioTemporal focus for skeleton-based action recognition
    Wu, Liyu
    Zhang, Can
    Zou, Yuexian
    PATTERN RECOGNITION, 2023, 136
  • [7] Spatiotemporal Graph Autoencoder Network for Skeleton-Based Human Action Recognition
    Abduljalil, Hosam
    Elhayek, Ahmed
    Marish Ali, Abdullah
    Alsolami, Fawaz
    AI, 2024, 5 (03) : 1695 - 1708
  • [8] An efficient self-attention network for skeleton-based action recognition
    Qin, Xiaofei
    Cai, Rui
    Yu, Jiabin
    He, Changxiang
    Zhang, Xuedian
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [9] Self-Attention Network for Skeleton-based Human Action Recognition
    Cho, Sangwoo
    Maqbool, Muhammad Hasan
    Liu, Fei
    Foroosh, Hassan
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 624 - 633