RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM

Cited by: 0
Authors
Song, Ziying [1, 2]
Zhang, Guoxing [3]
Liu, Lin [1, 2]
Yang, Lei [4]
Xu, Shaoqing [5]
Jia, Caiyan [1, 2]
Jia, Feiyang [1, 2]
Wang, Li [6]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp Sci & Technol, Beijing, Peoples R China
[2] Beijing Key Lab Traff Data Anal & Min, Beijing, Peoples R China
[3] Hebei Univ Sci & Technol, Shijiazhuang, Hebei, Peoples R China
[4] Tsinghua Univ, Beijing, Peoples R China
[5] Univ Macau, Macau, Peoples R China
[6] Beijing Inst Technol, Beijing, Peoples R China
Funding
National Key Research and Development Program;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-modal 3D object detectors are dedicated to building secure and reliable perception systems for autonomous driving (AD). Although they achieve state-of-the-art (SOTA) performance on clean benchmark datasets, they tend to overlook the complexity and harsh conditions of real-world environments. The emergence of visual foundation models (VFMs) presents both opportunities and challenges for improving the robustness and generalization of multi-modal 3D object detection in AD. We therefore propose RoboFusion, a robust framework that leverages VFMs such as SAM to tackle out-of-distribution (OOD) noise scenarios. We first adapt the original SAM to AD scenarios, producing SAM-AD. To align SAM or SAM-AD with multi-modal methods, we then introduce AD-FPN to upsample the image features extracted by SAM. We employ wavelet decomposition to denoise the depth-guided images, further reducing noise and weather interference. Finally, we employ self-attention mechanisms to adaptively reweight the fused features, enhancing informative features while suppressing excess noise. In summary, RoboFusion significantly reduces noise by leveraging the generalization and robustness of VFMs, thereby enhancing the resilience of multi-modal 3D object detection. Consequently, RoboFusion achieves SOTA performance in noisy scenarios, as demonstrated on the KITTI-C and nuScenes-C benchmarks. Code is available at https://github.com/adept-thu/RoboFusion.
Pages: 1272-1280
Number of pages: 9
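Illustrative note on the abstract's wavelet step: the abstract says wavelet decomposition is used to denoise depth-guided image features, but this record gives no implementation details. The sketch below only shows the general idea on a toy 1D signal, assuming a single-level Haar transform with soft thresholding of the detail coefficients. The function names (`haar_forward`, `haar_inverse`, `soft_threshold`, `haar_denoise`) and the threshold values are hypothetical and are not taken from the RoboFusion code.

```python
import math


def haar_forward(signal):
    """One-level Haar transform: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, rebuilding each pair of samples."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold` (soft thresholding)."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def haar_denoise(signal, threshold):
    """Toy denoiser (assumption): shrink only the high-frequency detail band."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, threshold))


if __name__ == "__main__":
    noisy = [1.0, 3.0, 5.0, 5.0]
    print(haar_denoise(noisy, 0.5))
```

With a threshold of zero the signal is reconstructed unchanged; with a very large threshold every detail coefficient is zeroed and each pair of samples collapses to its mean. A real pipeline would more likely apply a multi-level 2D transform to feature maps, which this sketch does not attempt.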