RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM

Times cited: 0
Authors
Song, Ziying [1, 2]
Zhang, Guoxing [3]
Liu, Lin [1, 2]
Yang, Lei [4]
Xu, Shaoqing [5]
Jia, Caiyan [1, 2]
Jia, Feiyang [1, 2]
Wang, Li [6]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp Sci & Technol, Beijing, Peoples R China
[2] Beijing Key Lab Traff Data Anal & Min, Beijing, Peoples R China
[3] Hebei Univ Sci & Technol, Shijiazhuang, Hebei, Peoples R China
[4] Tsinghua Univ, Beijing, Peoples R China
[5] Univ Macau, Macau, Peoples R China
[6] Beijing Inst Technol, Beijing, Peoples R China
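Funding
National Key Research and Development Program of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification code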
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-modal 3D object detectors are dedicated to exploring secure and reliable perception systems for autonomous driving (AD). Although they achieve state-of-the-art (SOTA) performance on clean benchmark datasets, they tend to overlook the complexity and harsh conditions of real-world environments. The emergence of visual foundation models (VFMs) presents both opportunities and challenges for improving the robustness and generalization of multi-modal 3D object detection in AD. Therefore, we propose RoboFusion, a robust framework that leverages VFMs like SAM to tackle out-of-distribution (OOD) noise scenarios. We first adapt the original SAM to AD scenarios, yielding SAM-AD. To align SAM or SAM-AD with multi-modal methods, we then introduce AD-FPN for upsampling the image features extracted by SAM. We employ wavelet decomposition to denoise the depth-guided images, further reducing noise and weather interference. Finally, we employ self-attention mechanisms to adaptively reweight the fused features, enhancing informative features while suppressing excess noise. In summary, RoboFusion significantly reduces noise by leveraging the generalization and robustness of VFMs, thereby enhancing the resilience of multi-modal 3D object detection. Consequently, RoboFusion achieves SOTA performance in noisy scenarios, as demonstrated by the KITTI-C and nuScenes-C benchmarks. Code is available at https://github.com/adept-thu/RoboFusion.
Pages: 1272 - 1280
Number of pages: 9
Related papers
50 in total
  • [21] MLF3D: Multi-Level Fusion for Multi-Modal 3D Object Detection
    Jiang, Han
    Wang, Jianbin
    Xiao, Jianru
    Zhao, Yanan
    Chen, Wanqing
    Ren, Yilong
    Yu, Haiyang
    2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM, IEEE IV 2024, 2024, : 1588 - 1593
  • [22] ActiveAnno3D-An Active Learning Framework for Multi-Modal 3D Object Detection
    Ghita, Ahmed
    Antoniussen, Bjork
    Zimmer, Walter
    Greer, Ross
    Cress, Christian
    Mogelmose, Andreas
    Trivedi, Mohan M.
    Knoll, Alois C.
    2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM, IEEE IV 2024, 2024, : 1699 - 1706
  • [23] Artifacts Mapping: Multi-Modal Semantic Mapping for Object Detection and 3D Localization
    Rollo, Federico
    Raiola, Gennaro
    Zunino, Andrea
    Tsagarakis, Nikolaos
    Ajoudani, Arash
    2023 EUROPEAN CONFERENCE ON MOBILE ROBOTS, ECMR, 2023, : 90 - 97
  • [24] Multi-Modal Fusion Based on Depth Adaptive Mechanism for 3D Object Detection
    Liu, Zhanwen
    Cheng, Juanru
    Fan, Jin
    Lin, Shan
    Wang, Yang
    Zhao, Xiangmo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 707 - 717
  • [25] Height-Adaptive Deformable Multi-Modal Fusion for 3D Object Detection
    Li, Jiahao
    Chen, Lingshan
    Li, Zhen
    IEEE ACCESS, 2025, 13 : 52385 - 52396
  • [26] Frustum FusionNet: Amodal 3D Object Detection with Multi-Modal Feature Fusion
    Zuo, Liangyu
    Li, Yaochen
    Han, Mengtao
    Li, Qiao
    Liu, Yuehu
    2021 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2021, : 2746 - 2751
  • [27] Enhancing 3D object detection through multi-modal fusion for cooperative perception
    Xia, Bin
    Zhou, Jun
    Kong, Fanyu
    You, Yuhe
    Yang, Jiarui
    Lin, Lin
    ALEXANDRIA ENGINEERING JOURNAL, 2024, 104 : 46 - 55
  • [28] Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection
    Huang, Linyan
    Li, Zhiqi
    Sima, Chonghao
    Wang, Wenhai
    Wang, Jingdong
    Qiao, Yu
    Li, Hongyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [29] TransFusion: Multi-Modal Robust Fusion for 3D Object Detection in Foggy Weather Based on Spatial Vision Transformer
    Zhang, Cheng
    Wang, Hai
    Cai, Yingfeng
    Chen, Long
    Li, Yicheng
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (09) : 10652 - 10666
  • [30] MMDistill: Multi-Modal BEV Distillation Framework for Multi-View 3D Object Detection
    Jiao, Tianzhe
    Chen, Yuming
    Zhang, Zhe
    Guo, Chaopeng
    Song, Jie
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 81 (03) : 4307 - 4325