TransFusion: Multi-Modal Robust Fusion for 3D Object Detection in Foggy Weather Based on Spatial Vision Transformer

Cited by: 0
Authors
Zhang, Cheng [1]
Wang, Hai [1]
Cai, Yingfeng [2]
Chen, Long [2]
Li, Yicheng [2]
Affiliations
[1] Jiangsu Univ, Sch Automot & Traff Engn, Zhenjiang 212013, Peoples R China
[2] Jiangsu Univ, Automot Engn Res Inst, Zhenjiang 212013, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
3D object detection; multi-modal data fusion; intelligent vehicle; attention mechanism; spatial vision transformer; temporal-spatial memory fusion; radar; LiDAR;
DOI
10.1109/TITS.2024.3420432
Chinese Library Classification number
TU [Building Science];
Discipline classification code
0813;
Abstract
A practical approach to comprehensive perception of the surrounding environment is multi-modal fusion based on various types of vehicular sensors. In clear weather, the camera and LiDAR provide high-resolution images and point clouds that can be used for 3D object detection. In foggy weather, however, fog affects the propagation of light, so both images and point clouds become distorted to varying degrees, which makes accurate detection in adverse weather challenging. Compared with cameras and LiDAR, radar has strong penetrating power and is not affected by fog. This paper therefore proposes a novel two-stage detection framework called "TransFusion", which leverages LiDAR and radar fusion to address environment perception in foggy weather. The proposed framework is composed of a Multi-modal Rotate Region Proposal Network (MM-RRPN) and a Multi-modal Refine Network (MM-RFN). Specifically, a Spatial Vision Transformer (SVT) and a Cross-Modal Attention Mechanism (CMAM) are introduced in the MM-RRPN to improve the robustness of the algorithm in foggy weather. Furthermore, a Temporal-Spatial Memory Fusion (TSMF) module in the MM-RFN is employed to fuse spatial-temporal prior information. In addition, a Multi-branches Combination Loss function (MC-Loss) is designed to efficiently supervise the learning of the network. Extensive experiments were conducted on the Oxford Radar RobotCar (ORR) dataset. The experimental results show that the proposed algorithm achieves excellent performance in both foggy and clear weather. In foggy weather in particular, the proposed TransFusion achieves 85.31 mAP, outperforming all other competing approaches. The demo is available at: https://youtu.be/ugjIYHLgn98.
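The abstract names a cross-modal attention mechanism but does not describe its internals. The following is a minimal sketch of generic scaled dot-product cross-attention, in which LiDAR features act as queries over radar features. It is not the paper's CMAM: the function name `cross_modal_attention`, the projection matrices, the residual connection, and all shapes are assumptions chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(lidar_feats, radar_feats, w_q, w_k, w_v):
    """Generic cross-attention (hypothetical, not the paper's CMAM).

    lidar_feats: (N_l, C) features from the LiDAR branch (queries)
    radar_feats: (N_r, C) features from the radar branch (keys, values)
    w_q, w_k, w_v: (C, C) projection matrices (assumed square so the
                   residual addition below is shape-compatible)
    """
    q = lidar_feats @ w_q
    k = radar_feats @ w_k
    v = radar_feats @ w_v
    # Each LiDAR token scores every radar token.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # rows sum to 1
    # Residual connection: LiDAR features enriched with radar context.
    fused = lidar_feats + weights @ v
    return fused, weights


rng = np.random.default_rng(0)
C = 8  # illustrative channel width
lidar = rng.normal(size=(5, C))
radar = rng.normal(size=(3, C))
w_q, w_k, w_v = (rng.normal(size=(C, C)) * 0.1 for _ in range(3))

fused, weights = cross_modal_attention(lidar, radar, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In this sketch each LiDAR token receives a weighted sum of radar values, so a LiDAR feature degraded by fog can still draw on radar evidence; how the published model realizes this is not stated in the abstract.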
Pages: 10652-10666
Page count: 15
Related papers
50 in total
  • [1] Research on 3D Object Detection Method Based on Multi-Modal Fusion
    Tian, Feng
    Zong, Neili
    Liu, Fang
    Lu, Yuanyuan
    Liu, Chao
    Jiang, Wenwen
    Zhao, Ling
    Han, Yuxiang
    Computer Engineering and Applications, 2024, 60 (13) : 113 - 123
  • [2] Multi-Modal Fusion Based on Depth Adaptive Mechanism for 3D Object Detection
    Liu, Zhanwen
    Cheng, Juanru
    Fan, Jin
    Lin, Shan
    Wang, Yang
    Zhao, Xiangmo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 707 - 717
  • [3] ObjectFusion: Multi-modal 3D Object Detection with Object-Centric Fusion
    Cai, Qi
    Pan, Yingwei
    Yao, Ting
    Ngo, Chong-Wah
    Mei, Tao
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 18021 - 18030
  • [4] Multi-modal Data Analysis and Fusion for Robust Object Detection in 2D/3D Sensing
    Schierl, Jonathan
    Graehling, Quinn
    Aspiras, Theus
    Asari, Vijay
    Van Rynbach, Andre
    Rabb, Dave
    2020 IEEE APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP (AIPR): TRUSTED COMPUTING, PRIVACY, AND SECURING MULTIMEDIA, 2020,
  • [5] Deep multi-scale and multi-modal fusion for 3D object detection
    Guo, Rui
    Li, Deng
    Han, Yahong
    PATTERN RECOGNITION LETTERS, 2021, 151 : 236 - 242
  • [6] Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection
    Li, Xin
    Shi, Botian
    Hou, Yuenan
    Wu, Xingjiao
    Ma, Tianlong
    Li, Yikang
    He, Liang
    COMPUTER VISION, ECCV 2022, PT XXXVIII, 2022, 13698 : 691 - 707
  • [7] Multi-modal feature fusion for 3D object detection in the production workshop
    Hou, Rui
    Chen, Guangzhu
    Han, Yinhe
    Tang, Zaizuo
    Ru, Qingjun
    APPLIED SOFT COMPUTING, 2022, 115
  • [8] Deformable Feature Fusion Network for Multi-Modal 3D Object Detection
    Guo, Kun
    Gan, Tong
    Ding, Zhao
    Ling, Qiang
    2024 3RD INTERNATIONAL CONFERENCE ON ROBOTICS, ARTIFICIAL INTELLIGENCE AND INTELLIGENT CONTROL, RAIIC 2024, 2024, : 363 - 367
  • [9] Multi-modal information fusion for LiDAR-based 3D object detection framework
    Ruixin Ma
    Yong Yin
    Jing Chen
    Rihao Chang
    Multimedia Tools and Applications, 2024, 83 : 7995 - 8012
  • [10] Multi-modal information fusion for LiDAR-based 3D object detection framework
    Ma, Ruixin
    Yin, Yong
    Chen, Jing
    Chang, Rihao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (03) : 7995 - 8012