Attention-aware Cross-modal Cross-level Fusion Network for RGB-D Salient Object Detection

Cited by: 0

Authors:

Chen, Hao [1]

Li, You-Fu [1,2]

Su, Dan [1]

Affiliations:

[1] City Univ Hong Kong, Dept Mech Engn, Kowloon, Hong Kong, Peoples R China

[2] City Univ Hong Kong, Shenzhen Res Inst, Hong Kong, Peoples R China

Source:

2018 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS) | 2018

Keywords:

MODEL;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Convolutional neural networks have achieved wide success in RGB saliency detection. Recently, the advent of RGB-D sensors such as Kinect has provided additional geometric saliency cues. However, the key challenge for RGB-D salient object detection, namely how to fuse RGB and depth information sufficiently, remains under-studied. Previous works mainly follow a two-stream architecture and combine RGB and depth features or decisions at an early or late stage. The multi-modal fusion stage is performed by directly concatenating the features from the two modalities without selection. In this work, we address this question by proposing a novel network built on one insight: a selection module is significantly helpful for a more informative and sufficient cross-modal, cross-level combination. To this end, we introduce a top-down RGB-D fusion network that integrates an attention-aware cross-modal cross-level fusion block at each level to select discriminative features from each level and each modality. Extensive experiments on public datasets show that the proposed network addresses the key problems in RGB-D fusion and achieves state-of-the-art performance on RGB-D salient object detection.
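The contrast the abstract draws between plain concatenation and concatenation followed by selection can be illustrated with a minimal sketch. The abstract does not specify the form of the attention block, so the code below assumes a simple channel-wise gating in the style of squeeze-and-excitation as a stand-in; it is not the authors' implementation. All function names are hypothetical, and random weights stand in for learned parameters.

```python
import math
import random


def global_avg_pool(feature):
    """Collapse each channel (a flat list of spatial values) to its mean."""
    return [sum(channel) / len(channel) for channel in feature]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gates(descriptor, weights, bias):
    """One fully connected layer plus sigmoid: one gate in (0, 1) per channel."""
    return [
        sigmoid(sum(w * d for w, d in zip(row, descriptor)) + b)
        for row, b in zip(weights, bias)
    ]


def attention_fuse(rgb, depth, higher, weights, bias):
    """Concatenate RGB, depth, and higher-level channels, then reweight each
    channel by its gate (assumed selection step, not the paper's exact block)."""
    stacked = rgb + depth + higher  # channel-wise concatenation, no selection yet
    gates = channel_gates(global_avg_pool(stacked), weights, bias)
    fused = [[g * v for v in channel] for g, channel in zip(gates, stacked)]
    return fused, stacked, gates


def random_feature(channels, positions):
    """Toy feature map: `channels` lists of `positions` values in [0, 1)."""
    return [[random.random() for _ in range(positions)] for _ in range(channels)]


random.seed(0)
rgb = random_feature(2, 4)     # features from the RGB stream at one level
depth = random_feature(2, 4)   # features from the depth stream at the same level
higher = random_feature(2, 4)  # features passed down from the level above

n = len(rgb) + len(depth) + len(higher)
weights = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
bias = [0.0] * n  # placeholder for learned biases

fused, stacked, gates = attention_fuse(rgb, depth, higher, weights, bias)
print([round(g, 3) for g in gates])
```

In this sketch, `stacked` corresponds to fusion by direct concatenation, while `fused` scales every channel by a gate computed from all modalities and levels, so less informative channels can be suppressed before the next level of the top-down path.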


Pages: 6821-6826

Number of pages: 6

