Lightweight Multi-modal Representation Learning for RGB Salient Object Detection

Cited by: 1
Authors
Xiao, Yun [1, 2, 4]
Huang, Yameng [3]
Li, Chenglong [1, 2, 4]
Liu, Lei [3]
Zhou, Aiwu [3]
Tang, Jin [3]
Affiliations
[1] Information Materials and Intelligent Sensing Laboratory of Anhui Province, Hefei 230601, People's Republic of China
[2] Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Hefei 230601, People's Republic of China
[3] Anhui University, School of Computer Science and Technology, Hefei 230601, People's Republic of China
[4] Anhui University, School of Artificial Intelligence, Hefei 230601, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Salient object detection; Depth estimation; Lightweight network; Multi-modal representation learning; Network
DOI
10.1007/s12559-023-10148-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Salient object detection (SOD) often faces challenges such as complex backgrounds and low appearance contrast. Depth information, which reflects the geometric shape of an object's surface, can supplement visible (RGB) information and has received increasing interest in SOD. However, depth sensors work only under limited conditions and ranges (e.g., at most 4-5 m in indoor scenes), and their imaging quality is usually low. We therefore design a lightweight network that infers depth features at low computational cost: it needs only a few parameters to capture depth-specific features effectively by fusing high-level features from the RGB modality. Because both the RGB features and the inferred depth features may contain noise, we design a fusion network, comprising a self-attention-based feature interaction module and a foreground-background enhancement module, to fuse RGB and depth features adaptively. In addition, we introduce a multi-scale fusion module built from dilated convolutions with different dilation rates to exploit both local and global context cues. Experimental results on five benchmark datasets show that our approach significantly outperforms state-of-the-art RGB-D SOD methods and performs comparably to state-of-the-art RGB SOD methods. These results on multiple RGB-D and RGB SOD datasets indicate that our multi-modal representation learning method can overcome the imaging limitations of single-modality data for RGB salient object detection.
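The multi-scale fusion idea described in the abstract can be illustrated with a small NumPy sketch. The abstract does not give kernel sizes, dilation rates, or how the branches are combined, so this example assumes single-channel inputs, 3x3 kernels, dilation rates 1, 2, and 4, and plain averaging of the branch outputs. The function names (`effective_kernel_size`, `dilated_conv2d`, `multi_scale_fusion`) are hypothetical, and this is not the authors' implementation. A k x k kernel with dilation rate d covers a window of d(k-1)+1 pixels per side, so the three assumed branches see 3x3, 5x5, and 9x9 neighbourhoods with the same number of weights.

```python
import numpy as np


def effective_kernel_size(k: int, dilation: int) -> int:
    """Side length of the window covered by a k x k kernel with the given dilation rate."""
    return dilation * (k - 1) + 1


def dilated_conv2d(x: np.ndarray, kernel: np.ndarray, dilation: int) -> np.ndarray:
    """Single-channel 2-D dilated convolution (cross-correlation, as in CNN layers)
    with zero 'same' padding, so the output keeps the height and width of x."""
    k = kernel.shape[0]  # assumes a square kernel with odd side length
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)  # zero padding on all four sides
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # Each kernel tap reads the input shifted by a multiple of the dilation rate.
            out += kernel[i, j] * xp[i * dilation:i * dilation + h,
                                     j * dilation:j * dilation + w]
    return out


def multi_scale_fusion(x: np.ndarray, kernels, dilations) -> np.ndarray:
    """Hypothetical multi-scale block: parallel dilated branches, averaged.
    Small dilation rates keep local detail; large ones add wider context."""
    branches = [dilated_conv2d(x, kern, d) for kern, d in zip(kernels, dilations)]
    return np.mean(branches, axis=0)


if __name__ == "__main__":
    dilations = (1, 2, 4)  # assumed rates; the abstract does not specify them
    print([effective_kernel_size(3, d) for d in dilations])  # [3, 5, 9]

    # An impulse input makes each branch's sampling pattern visible.
    x = np.zeros((9, 9))
    x[4, 4] = 1.0
    kernels = [np.ones((3, 3)) for _ in dilations]
    fused = multi_scale_fusion(x, kernels, dilations)
    print(np.count_nonzero(fused))  # 25 positions reached by at least one branch
```

In a trained network the kernels would be learned, and the branches would more typically be combined by a learned operation (e.g., concatenation followed by a 1x1 convolution); fixed kernels and averaging are used here only to keep the sketch short.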
Pages: 1868-1883
Number of pages: 16