TransFusion: Multi-Modal Robust Fusion for 3D Object Detection in Foggy Weather Based on Spatial Vision Transformer

Authors
Zhang, Cheng [1]
Wang, Hai [1]
Cai, Yingfeng [2]
Chen, Long [2]
Li, Yicheng [2]
Affiliations
[1] Jiangsu University, School of Automotive and Traffic Engineering, Zhenjiang 212013, People's Republic of China
[2] Jiangsu University, Automotive Engineering Research Institute, Zhenjiang 212013, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
3D object detection; multi-modal data fusion; intelligent vehicle; attention mechanism; spatial vision transformer; temporal-spatial memory fusion; radar; LiDAR
DOI
10.1109/TITS.2024.3420432
Chinese Library Classification (CLC) number
TU [Architectural Science]
Discipline classification code
0813
Abstract
A practical approach to achieving comprehensive perception of the surrounding environment is to use a multi-modal fusion method based on various types of vehicular sensors. In clear weather, cameras and LiDAR provide high-resolution images and point clouds that can be used for 3D object detection. In foggy weather, however, light propagation is affected by fog in the air, so both images and point clouds become distorted to varying degrees, which makes accurate detection in adverse weather challenging. Compared with cameras and LiDAR, radar has strong penetrating power and is not affected by fog. This paper therefore proposes a novel two-stage detection framework, "TransFusion", which fuses LiDAR and radar data to address environment perception in foggy weather. The framework consists of a Multi-modal Rotate Region Proposal Network (MM-RRPN) and a Multi-modal Refine Network (MM-RFN). Specifically, a Spatial Vision Transformer (SVT) and a Cross-Modal Attention Mechanism (CMAM) are introduced in the MM-RRPN to improve the robustness of the algorithm in foggy weather. A Temporal-Spatial Memory Fusion (TSMF) module in the MM-RFN fuses spatial-temporal prior information. In addition, a Multi-branches Combination Loss function (MC-Loss) is designed to efficiently supervise network training. Extensive experiments were conducted on the Oxford Radar RobotCar (ORR) dataset. The results show that the proposed algorithm performs well in both foggy and clear weather; in foggy weather in particular, TransFusion achieves 85.31 mAP, outperforming all competing approaches evaluated. The demo is available at: https://youtu.be/ugjIYHLgn98.
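The abstract names a Cross-Modal Attention Mechanism (CMAM) but does not describe its internals. As a rough illustration only, the sketch below assumes a generic scaled dot-product cross-attention in which LiDAR feature tokens act as queries and radar feature tokens act as keys and values. The function names, tensor shapes, and random projection matrices are hypothetical and are not taken from the paper; the actual CMAM design may differ substantially.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(lidar_feats, radar_feats, w_q, w_k, w_v):
    """Hypothetical cross-modal attention: LiDAR tokens attend to radar tokens.

    lidar_feats: (N_l, C) LiDAR feature tokens, used as queries (assumption)
    radar_feats: (N_r, C) radar feature tokens, used as keys/values (assumption)
    w_q, w_k, w_v: (C, D) projection matrices (illustrative, not from the paper)
    Returns the fused (N_l, D) features and the (N_l, N_r) attention weights.
    """
    q = lidar_feats @ w_q
    k = radar_feats @ w_k
    v = radar_feats @ w_v
    # Scaled dot-product scores between every LiDAR token and every radar token.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Each LiDAR token gets a probability distribution over the radar tokens.
    attn = softmax(scores, axis=-1)
    # Weighted sum of radar values, one fused vector per LiDAR token.
    return attn @ v, attn


# Usage with random placeholder features (not real sensor data).
rng = np.random.default_rng(0)
lidar = rng.normal(size=(4, 8))   # 4 hypothetical LiDAR tokens, 8 channels
radar = rng.normal(size=(6, 8))   # 6 hypothetical radar tokens, 8 channels
w_q, w_k, w_v = (rng.normal(size=(8, 16)) for _ in range(3))
fused, attn = cross_modal_attention(lidar, radar, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Letting LiDAR query radar is one plausible choice when radar is treated as the fog-robust modality that supplements degraded point clouds; the reverse direction, or a symmetric two-way attention, would be equally valid sketches.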
Pages: 10652-10666
Page count: 15