MambaSOD: Dual Mamba-driven cross-modal fusion network for RGB-D Salient Object Detection

Authors
Zhan, Yue [2]
Zeng, Zhihong [1, 3]
Liu, Haijun [3]
Tan, Xiaoheng [3]
Tian, Yinli [4]
Affiliations
[1] Institute of Interdisciplinary Studies, Guangdong Polytechnic Normal University, Guangzhou, China
[2] Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong
[3] School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China
[4] School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
Funding
National Natural Science Foundation of China
Keywords
Modal analysis - Object detection - Object recognition
DOI
10.1016/j.neucom.2025.129718
Abstract
RGB-D Salient Object Detection (SOD) aims to accurately locate the most visually conspicuous regions in an image. Many conventional models rely heavily on CNNs and overlook long-range contextual dependencies; subsequent transformer-based models address this issue to some extent but introduce quadratic computational complexity. Moreover, incorporating spatial information from depth maps has proven effective for this task, and the primary challenge is how to effectively fuse the complementary information from the RGB and depth modalities. Recent advances in Mamba, particularly its ability to model long-range dependencies with linear complexity, have motivated our exploration of its potential for RGB-D SOD. In this paper, we propose a dual Mamba-driven cross-modal fusion network for RGB-D SOD, named MambaSOD, which leverages Mamba's long-range dependency modeling capability. Specifically, we employ a dual Mamba-driven feature extractor to process the RGB and depth inputs and obtain features with global contextual information. We then design a cross-modal fusion Mamba to perform modality-specific feature enhancement and to model the inter-modal correlation between the RGB and depth features. To the best of our knowledge, this work is an innovative attempt to explore the potential of pure Mamba in the RGB-D SOD task, offering a novel perspective. Extensive experiments on seven prevailing datasets demonstrate our method's superiority over eighteen state-of-the-art RGB-D SOD models. The source code will be released at https://github.com/YueZhan721/MambaSOD. © 2025 Elsevier B.V.
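The contrast the abstract draws between quadratic and linear cost can be illustrated with a toy example. The sketch below is not the MambaSOD implementation: it is a scalar, fixed-parameter state-space recurrence, whereas Mamba uses input-dependent (selective) parameters over multi-channel features. The names `ssm_scan` and `attention_pair_count` are hypothetical and chosen only for this illustration.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Scalar linear state-space recurrence, one step per input element.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    Assumption: a, b, c are fixed scalars here; Mamba makes them depend
    on the input, which this toy version does not model.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # the state carries information from all earlier inputs
        ys.append(c * h)
    return ys


def attention_pair_count(length):
    """Number of query-key pairs that full self-attention scores."""
    return length * length


# An impulse at position 0 still influences every later output,
# although the scan visits each element only once.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
outputs = ssm_scan(impulse, a=0.5)
print(outputs)

# For a sequence of 1024 tokens, the scan takes 1024 steps,
# while full self-attention scores 1024 * 1024 pairs.
print(attention_pair_count(1024))
```

The single loop is what gives the recurrence its linear cost in sequence length, while the decaying state `h` is what lets distant inputs still affect later outputs.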