TAG-fusion: Two-stage attention guided multi-modal fusion network for semantic segmentation

Authors: Zhang, Zhizhou; Wang, Wenwu; Zhu, Lei; Tang, Zhibin
Keywords: Information fusion - Semantics
DOI: 10.1016/j.dsp.2024.104807

Abstract
In current research, leveraging auxiliary modalities, such as depth or point cloud information, to improve RGB semantic segmentation has shown significant potential. However, existing methods mainly use convolutional modules to aggregate features from auxiliary modalities and therefore do not sufficiently exploit long-range dependencies. Moreover, they typically rely on a single fusion strategy. In this paper, we propose a transformer-based multi-modal fusion framework that makes better use of auxiliary modalities to improve semantic segmentation. Specifically, we employ a dual-stream architecture that extracts features from the RGB and auxiliary modalities separately, and we combine early fusion with deep feature fusion. At each layer, we introduce mixed attention mechanisms that use features from the other modality to guide and enhance the current modality's features before they are passed to the next stage of feature extraction. Once features have been extracted from both modalities, an enhanced cross-attention mechanism lets them interact, and channel fusion then produces the final semantic features. In addition, we supervise the RGB stream, the auxiliary stream, and the fusion stream separately to help the network learn representations for each modality. Experimental results show that our framework achieves superior performance across diverse auxiliary modalities, including state-of-the-art results on the NYU Depth V2, SUN-RGBD, DELIVER and MFNet datasets. © 2024 Elsevier Inc.
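The abstract gives no implementation details, so the following is only a minimal NumPy sketch, under our own assumptions, of two ideas it names: cross-attention interaction between RGB and auxiliary features followed by channel fusion, and separate supervision of the RGB, auxiliary, and fused streams. Single-head attention, residual connections, "channel fusion" as concatenation plus a linear projection, per-stream linear classifiers, and equal loss weights are all assumptions rather than TAG-fusion's actual design, and every function and parameter name here (`cross_attention`, `fuse`, `total_loss`, `params`, `heads`) is hypothetical.

```python
import numpy as np

# Hypothetical sketch (not the authors' code): single-head cross-attention
# fusion of two modality streams plus separate supervision of each stream.

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax.
    z = x - x.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def cross_attention(q_feat, kv_feat, w_q, w_k, w_v):
    """Tokens of one modality (queries) attend to ALL tokens of the other
    modality (keys/values), so every position can use global context.
    q_feat: (N, C), kv_feat: (M, C), w_q/w_k: (C, D), w_v: (C, C)."""
    q, k, v = q_feat @ w_q, kv_feat @ w_k, kv_feat @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (N, M) scaled similarities
    attn = np.exp(log_softmax(scores, axis=-1))    # each row sums to 1
    return attn @ v                                # (N, C)

def fuse(rgb, aux, params):
    """Bidirectional interaction, then channel fusion.
    Assumes both streams have the same N tokens (same spatial resolution)."""
    rgb_enh = rgb + cross_attention(rgb, aux, *params["rgb_from_aux"])  # residual (assumed)
    aux_enh = aux + cross_attention(aux, rgb, *params["aux_from_rgb"])
    stacked = np.concatenate([rgb_enh, aux_enh], axis=-1)  # (N, 2C) channel concat
    return stacked @ params["w_fuse"]                      # project back to (N, C)

def cross_entropy(logits, labels):
    # Mean per-token cross-entropy; logits (N, K), labels (N,) integer class ids.
    log_p = log_softmax(logits, axis=-1)
    return -log_p[np.arange(len(labels)), labels].mean()

def total_loss(rgb_logits, aux_logits, fused_logits, labels, weights=(1.0, 1.0, 1.0)):
    # Separate supervision of the three streams; equal weights are an assumption.
    w_rgb, w_aux, w_fus = weights
    return (w_rgb * cross_entropy(rgb_logits, labels)
            + w_aux * cross_entropy(aux_logits, labels)
            + w_fus * cross_entropy(fused_logits, labels))

# Usage with random toy data: 16 tokens per modality, 8 channels, 5 classes.
rng = np.random.default_rng(0)
N, C, K = 16, 8, 5
rgb, aux = rng.standard_normal((N, C)), rng.standard_normal((N, C))

def make_qkv():
    # Hypothetical query/key/value projection matrices for one direction.
    return [0.1 * rng.standard_normal((C, C)) for _ in range(3)]

params = {"rgb_from_aux": make_qkv(), "aux_from_rgb": make_qkv(),
          "w_fuse": 0.1 * rng.standard_normal((2 * C, C))}
fused = fuse(rgb, aux, params)
print(fused.shape)  # (16, 8)

# Hypothetical per-stream linear classifiers producing per-token class logits.
heads = {name: 0.1 * rng.standard_normal((C, K)) for name in ("rgb", "aux", "fused")}
labels = rng.integers(0, K, size=N)
loss = total_loss(rgb @ heads["rgb"], aux @ heads["aux"], fused @ heads["fused"], labels)
print(float(loss))
```

Concatenation followed by a linear projection is just one simple reading of "channel fusion"; a channel-attention gate over the concatenated features would be an equally plausible interpretation of the abstract.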