Three-stream RGB-D salient object detection network based on cross-level and cross-modal dual-attention fusion

Authors
Meng, Lingbing [1]
Yuan, Mengya [1]
Shi, Xuehan [1]
Liu, Qingqing [1]
Cheng, Fei [1, 2, 4]
Li, Lingli [3]
Affiliations
[1] Sch Anhui Inst Informat Technol, Wuhu, Peoples R China
[2] Sch Hangzhou Dianzi Univ, Hangzhou, Peoples R China
[3] Sch Heilongjiang Univ, Harbin, Peoples R China
[4] Sch Anhui Inst Informat Technol, Wuhu 241199, Peoples R China
Keywords
computer vision; cross-modal fusion; depth map; dual-attention fusion; images; salient object detection; three-stream model; refinement network; segmentation; image
DOI
10.1049/ipr2.12862
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Effectively integrating RGB and depth-map features to improve RGB-D salient object detection (SOD) has attracted significant research interest. Existing dual-stream models fuse high-level features or transfer depth features to RGB features unidirectionally; however, they cannot fully exploit the differences between the two modalities. Furthermore, background information in the image can cause parts of the salient object to be swallowed by the background in the generated saliency map. Herein, a three-stream RGB-D SOD method based on cross-layer and cross-modal dual-attention (CMDA) fusion is proposed. In the encoding stage, the CMDA fusion module fuses RGB and depth features layer by layer. The merged interactive features produced by this module are used to extract richer features of salient objects, capture both the commonality and the complementarity of the fused features, and achieve effective cross-modal fusion. For the decoding stage, a cross-level feature fusion module is proposed that introduces global context features into the up-sampling process, which reduces the extent to which salient objects are swallowed by the background and helps detect salient areas accurately. The features of the three branches are trained simultaneously in an end-to-end manner. Experimental results demonstrate that the proposed method outperforms other methods on multiple evaluation metrics across four datasets. Furthermore, the authors visualize the precision-recall curves, F-measure curves, and saliency maps, which indicate that the detection quality of the proposed method is superior to that of other methods. During testing, the model ran at 14 frames per second (FPS).
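The abstract does not give the internals of the CMDA fusion module, so the following is only a rough sketch of what a cross-modal dual-attention fusion step could look like, not the authors' implementation. It assumes, purely for illustration, that "dual attention" means a channel attention and a spatial attention, that each modality's attention is applied to the other modality, and that no learned weights are involved. The function names `channel_attention`, `spatial_attention`, and `dual_attention_fuse` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat):
    """Hypothetical channel attention: global average pool, then sigmoid.

    feat has shape (C, H, W); the result has shape (C, 1, 1).
    A real module would normally add learned layers; none are used here.
    """
    return sigmoid(feat.mean(axis=(1, 2), keepdims=True))


def spatial_attention(feat):
    """Hypothetical spatial attention: channel-wise mean, then sigmoid.

    feat has shape (C, H, W); the result has shape (1, H, W).
    """
    return sigmoid(feat.mean(axis=0, keepdims=True))


def dual_attention_fuse(rgb, depth):
    """Fuse two same-shaped feature maps with cross-applied attention.

    Assumption: the depth-derived attention re-weights the RGB features
    and the RGB-derived attention re-weights the depth features, and the
    two re-weighted maps are summed. This is an illustrative choice,
    not the scheme described in the paper.
    """
    if rgb.shape != depth.shape:
        raise ValueError("rgb and depth features must have the same shape")
    rgb_enhanced = rgb * channel_attention(depth) * spatial_attention(depth)
    depth_enhanced = depth * channel_attention(rgb) * spatial_attention(rgb)
    return rgb_enhanced + depth_enhanced


# Toy usage: two random feature maps with 4 channels and 8x8 resolution.
rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((4, 8, 8))
depth_feat = rng.standard_normal((4, 8, 8))
fused_feat = dual_attention_fuse(rgb_feat, depth_feat)
```

In a layer-by-layer encoder such as the one described in the abstract, a step like this would be applied once per encoder level, with the fused map feeding a third stream; how the paper wires that third stream is not stated in the abstract.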
Pages: 3292-3308
Number of pages: 17