SlowFast Multimodality Compensation Fusion Swin Transformer Networks for RGB-D Action Recognition

Cited by: 2
Authors
Xiao, Xiongjiang [1]
Ren, Ziliang [1]
Li, Huan [1]
Wei, Wenhong [1]
Yang, Zhiyong [2]
Yang, Huaide [3]
Affiliations
[1] Dongguan Univ Technol, Sch Comp Sci & Technol, Dongguan 523820, Peoples R China
[2] Yantai Inst Technol, Sch Artificial Intelligence, Yantai 264003, Peoples R China
[3] Dongguan Polytech, Sch Elect Informat, Dongguan 523109, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
action recognition; multimodality compensation; SlowFast pathways; swin transformer; dual-stream; NEURAL-NETWORKS; REPRESENTATION;
DOI
10.3390/math11092115
Chinese Library Classification
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
RGB-D-based technology combines the advantages of RGB and depth sequences and can effectively recognize human actions in different environments. However, it is difficult for the two modalities to learn spatio-temporal information from each other. To enhance the information exchange between modalities, we introduce a SlowFast multimodality compensation block (SFMCB) designed to extract compensation features. Concretely, the SFMCB fuses features from two independent pathways with different frame rates into a single convolutional neural network to achieve performance gains for the model. Furthermore, we explore two fusion schemes for combining the features from the two pathways. To facilitate the learning of features from the independent pathways, multiple loss functions are utilized for joint optimization. To evaluate the proposed architecture, we conducted experiments on four challenging datasets: NTU RGB+D 60, NTU RGB+D 120, THU-READ, and PKU-MMD. The experimental results demonstrate the effectiveness of the proposed model, which uses the SFMCB mechanism to capture complementary features of multimodal inputs.
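The abstract mentions two pathways with different frame rates and joint optimization with multiple loss functions, but gives no sampling rates, loss forms, or weights. The sketch below is therefore only an illustration under assumptions: the strides (8 and 2), the use of cross-entropy for each pathway, and the equal loss weights are hypothetical choices, and the function names `sample_indices` and `joint_loss` are invented for this example rather than taken from the paper.

```python
import math

def sample_indices(num_frames, stride):
    """Pick every `stride`-th frame index (large stride = slow pathway)."""
    return list(range(0, num_frames, stride))

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[label])

def joint_loss(pathway_logits, label, weights=None):
    """Weighted sum of per-pathway losses (equal weights are an assumption)."""
    if weights is None:
        weights = [1.0] * len(pathway_logits)
    return sum(w * cross_entropy(logits, label)
               for w, logits in zip(weights, pathway_logits))

# Hypothetical 32-frame clip sampled at two assumed frame rates.
slow_idx = sample_indices(32, 8)   # sparse frames for the slow pathway
fast_idx = sample_indices(32, 2)   # dense frames for the fast pathway

# Hypothetical class scores from two pathways for a 3-class problem.
loss = joint_loss([[2.0, 0.5, 0.1], [1.5, 0.7, 0.2]], label=0)
```

Summing the per-pathway losses means each pathway receives its own gradient signal while a single scalar is minimized; whether the paper weights or combines its losses this way is not stated in the abstract.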
Pages: 19