Dual-stream cross-modality fusion transformer for RGB-D action recognition

Cited by: 22
Authors
Liu, Zhen [1 ,2 ]
Cheng, Jun [1 ]
Liu, Libo [1 ,2 ]
Ren, Ziliang [1 ,3 ]
Zhang, Qieshi [1 ]
Song, Chengqun [1 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Guangdong Prov Key Lab Robot & Intelligent Syst, Shenzhen 518055, Peoples R China
[2] Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Dongguan Univ Technol, Sch Sci & Technol, Dongguan 523808, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; Multimodal fusion; Transformer; ConvNets; NEURAL-NETWORKS;
DOI
10.1016/j.knosys.2022.109741
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
RGB-D-based action recognition can achieve accurate and robust performance thanks to the rich complementary information between modalities, and thus has many application scenarios. However, existing works either combine multiple modalities by late fusion or learn multimodal representations with simple feature-level fusion methods, which fail to effectively utilize complementary semantic information and to model interactions between unimodal features. In this paper, we design a self-attention-based modal enhancement module (MEM) and a cross-attention-based modal interaction module (MIM) to enhance and fuse RGB and depth features. Moreover, a novel bottleneck excitation feed-forward block (BEF) is proposed to enhance the expressive ability of the model with few extra parameters and little computational overhead. By integrating these two modules with BEFs, one basic fusion layer of the cross-modality fusion transformer is obtained. We apply the transformer on top of dual-stream convolutional neural networks (ConvNets) to build a dual-stream cross-modality fusion transformer (DSCMT) for RGB-D action recognition. Extensive experiments on the NTU RGB+D 120, PKU-MMD, and THU-READ datasets verify the effectiveness and superiority of the DSCMT. Furthermore, our DSCMT still yields considerable improvements when the convolutional backbones are changed or when it is applied to different multimodal combinations, indicating its universality and scalability. The code is available at https://github.com/liuzwin98/DSCMT. (c) 2022 Published by Elsevier B.V.
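The cross-attention-based modal interaction described in the abstract can be illustrated with a minimal NumPy sketch: tokens from one modality form the queries while tokens from the other modality supply the keys and values, and each stream is enhanced with the attended features of its counterpart before fusion. This is an illustrative simplification only; the function names, token counts, and dimensions below are invented for the example, and the paper's actual MIM uses learned projections, multi-head attention, and the BEF blocks described above.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats):
    """Queries from one modality attend to tokens of the other modality."""
    d_k = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d_k)  # (Nq, Nc)
    weights = softmax(scores, axis=-1)                     # rows sum to 1
    return weights @ context_feats                         # (Nq, d)

rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 64))    # 8 RGB tokens, 64-dim each (toy sizes)
depth = rng.standard_normal((8, 64))  # 8 depth tokens, 64-dim each

# Modal interaction: each stream is enhanced with cross-attended
# features from the other modality, then the two streams are fused.
rgb_enhanced = rgb + cross_attention(rgb, depth)
depth_enhanced = depth + cross_attention(depth, rgb)
fused = np.concatenate([rgb_enhanced, depth_enhanced], axis=-1)
print(fused.shape)  # (8, 128)
```

The residual connections (`rgb + ...`) mirror the standard transformer layer structure, so each modality retains its own features while gaining complementary context from the other stream.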
Pages: 11