Lightweight Multi-modal Representation Learning for RGB Salient Object Detection

Cited by: 1
Authors
Xiao, Yun [1 ,2 ,4 ]
Huang, Yameng [3 ]
Li, Chenglong [1 ,2 ,4 ]
Liu, Lei [3 ]
Zhou, Aiwu [3 ]
Tang, Jin [3 ]
Affiliations
[1] Informat Mat & Intelligent Sensing Lab Anhui Prov, Hefei 230601, Peoples R China
[2] Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230601, Peoples R China
[3] Anhui Univ, Sch Comp Sci & Technol, Hefei 230601, Peoples R China
[4] Anhui Univ, Sch Artificial Intelligence, Hefei 230601, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Salient object detection; Depth estimation; Lightweight network; Multi-modal representation learning
DOI
10.1007/s12559-023-10148-1
CLC number
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The task of salient object detection (SOD) often faces challenges such as complex backgrounds and low appearance contrast. Depth information, which reflects the geometric shape of an object's surface, can supplement visible information and has received increasing interest in SOD. However, depth sensors work only under limited conditions and within a limited range (e.g., at most 4-5 m in indoor scenes), and their imaging quality is usually low. We design a lightweight network that infers depth features while reducing computational complexity: it needs only a few parameters to effectively capture depth-specific features by fusing high-level features from the RGB modality. Because both the RGB features and the inferred depth features may contain noise, we design a fusion network, consisting of a self-attention-based feature interaction module and a foreground-background enhancement module, to fuse RGB and depth features adaptively. In addition, we introduce a multi-scale fusion module with different dilated convolutions to leverage useful local and global context cues. Experimental results on five benchmark datasets show that our approach significantly outperforms state-of-the-art RGBD SOD methods and performs comparably to state-of-the-art RGB SOD methods, illustrating that our multi-modal representation learning can cope with the imaging limitations of single-modality data for RGB salient object detection.
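The abstract names three concrete fusion components: a self-attention-based feature interaction module between RGB and inferred depth features, a foreground-background enhancement module, and a multi-scale fusion module built from dilated convolutions. The following is a minimal PyTorch sketch of how such modules could be realized; all class names, channel widths, head counts, dilation rates, and fusion formulas are illustrative assumptions, not the authors' published architecture.

# Minimal PyTorch sketch (illustrative assumptions, not the paper's exact design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureInteraction(nn.Module):
    # Cross-modal interaction: RGB features attend to inferred depth features.
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, rgb, depth):
        b, c, h, w = rgb.shape
        q = rgb.flatten(2).transpose(1, 2)     # (B, HW, C) queries from RGB
        kv = depth.flatten(2).transpose(1, 2)  # (B, HW, C) keys/values from depth
        fused, _ = self.attn(q, kv, kv)        # attention downweights noisy depth
        fused = self.norm(fused + q)           # residual keeps RGB evidence
        return fused.transpose(1, 2).reshape(b, c, h, w)

class ForegroundBackgroundEnhancement(nn.Module):
    # Uses a coarse saliency map to sharpen foreground and suppress background.
    def __init__(self, channels):
        super().__init__()
        self.pred = nn.Conv2d(channels, 1, 1)  # coarse saliency head
        self.refine = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        sal = torch.sigmoid(self.pred(x))      # (B, 1, H, W) coarse saliency
        fg, bg = x * sal, x * (1.0 - sal)      # split features by saliency
        return self.refine(fg - bg) + x        # enhance contrast, keep residual

class MultiScaleFusion(nn.Module):
    # Parallel dilated convolutions gather local and global context cues.
    def __init__(self, channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations)
        self.project = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        ctx = torch.cat([F.relu(b(x)) for b in self.branches], dim=1)
        return self.project(ctx)

if __name__ == "__main__":
    rgb = torch.randn(2, 64, 20, 20)    # toy high-level RGB feature map
    depth = torch.randn(2, 64, 20, 20)  # toy inferred depth feature map
    x = FeatureInteraction(64)(rgb, depth)
    x = ForegroundBackgroundEnhancement(64)(x)
    print(MultiScaleFusion(64)(x).shape)  # torch.Size([2, 64, 20, 20])

In this sketch, RGB tokens query the depth tokens so unreliable depth responses receive low attention weights, the foreground-background module contrast-enhances features around a coarse saliency prediction, and the dilated branches aggregate context at several receptive-field sizes before a 1x1 projection, mirroring the local-global cue aggregation the abstract describes.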
Pages: 1868-1883
Number of pages: 16
Related papers
50 records in total
  • [41] Aggregate interactive learning for RGB-D salient object detection
    Wu, Jingyu
    Sun, Fuming
    Xu, Rui
    Meng, Jie
    Wang, Fasheng
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 195
  • [42] RGB-D salient object detection with asymmetric cross-modal fusion
    Yu M.
    Xing Z.-H.
    Liu Y.
    Kongzhi yu Juece/Control and Decision, 2023, 38 (09): 2487-2495
  • [43] Multi-Task and Multi-Modal Learning for RGB Dynamic Gesture Recognition
    Fan, Dinghao
    Lu, Hengjie
    Xu, Shugong
    Cao, Shan
    IEEE SENSORS JOURNAL, 2021, 21 (23) : 27026 - 27036
  • [44] Multi-modal deep learning networks for RGB-D pavement waste detection and recognition
    Li, Yangke
    Zhang, Xinman
    WASTE MANAGEMENT, 2024, 177 : 125 - 134
  • [45] Deep Multi-modal Object Detection for Autonomous Driving
    Ennajar, Amal
    Khouja, Nadia
    Boutteau, Remi
    Tlili, Fethi
    2021 18TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD), 2021: 7-11
  • [46] Fast Multi-Modal Unified Sparse Representation Learning
    Verma, Mridula
    Shukla, Kaushal Kumar
    PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017: 448-452
  • [47] Multi-modal Representation Learning for Successive POI Recommendation
    Li, Lishan
    Liu, Ying
    Wu, Jianping
    He, Lin
    Ren, Gang
    ASIAN CONFERENCE ON MACHINE LEARNING, VOL 101, 2019, 101 : 441 - 456
  • [48] Joint Representation Learning for Multi-Modal Transportation Recommendation
    Liu, Hao
    Li, Ting
    Hu, Renjun
    Fu, Yanjie
    Gu, Jingjing
    Xiong, Hui
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019: 1036-1043
  • [49] Deep contrastive representation learning for multi-modal clustering
    Lu, Yang
    Li, Qin
    Zhang, Xiangdong
    Gao, Quanxue
    NEUROCOMPUTING, 2024, 581
  • [50] Supervised Multi-modal Dictionary Learning for Clothing Representation
    Zhao, Qilu
    Wang, Jiayan
    Li, Zongmin
    PROCEEDINGS OF THE FIFTEENTH IAPR INTERNATIONAL CONFERENCE ON MACHINE VISION APPLICATIONS - MVA2017, 2017, : 51 - 54