TransFusion: Multi-Modal Robust Fusion for 3D Object Detection in Foggy Weather Based on Spatial Vision Transformer

Cited by: 0
Authors
Zhang, Cheng [1 ]
Wang, Hai [1 ]
Cai, Yingfeng [2 ]
Chen, Long [2 ]
Li, Yicheng [2 ]
Affiliations
[1] Jiangsu Univ, Sch Automot & Traff Engn, Zhenjiang 212013, Peoples R China
[2] Jiangsu Univ, Automot Engn Res Inst, Zhenjiang 212013, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
3D object detection; multi-modal data fusion; intelligent vehicle; attention mechanism; spatial vision transformer; temporal-spatial memory fusion; radar; LiDAR;
DOI
10.1109/TITS.2024.3420432
Chinese Library Classification (CLC)
TU [Building Science];
Discipline Code
0813;
Abstract
A practical approach to realizing comprehensive perception of the surrounding environment is to use a multi-modal fusion method based on various types of vehicular sensors. In clear weather, the camera and LiDAR can provide high-resolution images and point clouds that can be utilized for 3D object detection. However, in foggy weather, the propagation of light is affected by the fog in the air, so both images and point clouds become distorted to varying degrees. Thus, it is challenging to achieve accurate detection in adverse weather conditions. Compared to cameras and LiDAR, Radar possesses strong penetrating power and is not affected by fog. Therefore, this paper proposes a novel two-stage detection framework called "TransFusion", which leverages LiDAR and Radar fusion to solve the problem of environment perception in foggy weather. The proposed framework is composed of a Multi-modal Rotate Region Proposal Network (MM-RRPN) and a Multi-modal Refine Network (MM-RFN). Specifically, a Spatial Vision Transformer (SVT) and a Cross-Modal Attention Mechanism (CMAM) are introduced in the MM-RRPN to improve the robustness of the algorithm in foggy weather. Furthermore, the Temporal-Spatial Memory Fusion (TSMF) module in MM-RFN is employed to fuse spatial-temporal prior information. In addition, the Multi-branches Combination Loss function (MC-Loss) is designed to efficiently supervise the learning of the network. Extensive experiments were conducted on the Oxford Radar RobotCar (ORR) dataset. The experimental results show that the proposed algorithm has excellent performance in both foggy and clear weather. Especially in foggy weather, the proposed TransFusion achieves 85.31 mAP, outperforming all other competing approaches. The demo is available at: https://youtu.be/ugjIYHLgn98.
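The record gives no implementation details for the paper's CMAM, so the following is only an illustrative sketch of generic cross-modal attention between two sensor feature sets (LiDAR queries attending to Radar keys/values, with a residual connection); all function and variable names here are hypothetical, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(lidar_feat, radar_feat):
    """Generic cross-modal attention sketch (not the paper's CMAM).

    lidar_feat: (N, d) array of LiDAR feature vectors, used as queries.
    radar_feat: (M, d) array of Radar feature vectors, used as keys/values.
    Returns (N, d) LiDAR features augmented with attended Radar context.
    """
    d = lidar_feat.shape[-1]
    # Scaled dot-product scores between every LiDAR query and Radar key.
    scores = lidar_feat @ radar_feat.T / np.sqrt(d)   # (N, M)
    attn = softmax(scores, axis=-1)                   # rows sum to 1
    # Residual fusion: each LiDAR feature keeps its own information
    # and adds a weighted combination of Radar features.
    return lidar_feat + attn @ radar_feat             # (N, d)
```

In practice such a block would sit inside a learned network (with projection matrices for queries, keys, and values); this sketch only shows the attention-weighted fusion pattern the abstract alludes to.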
Pages: 10652 - 10666
Page count: 15
Related Papers
(50 records)
  • [31] Wang, Yingjie; Mao, Qiuyu; Zhu, Hanqi; Deng, Jiajun; Zhang, Yu; Ji, Jianmin; Li, Houqiang; Zhang, Yanyong. Multi-Modal 3D Object Detection in Autonomous Driving: A Survey. International Journal of Computer Vision, 2023, 131: 2122 - 2152
  • [32] Song, Peipei; Zhang, Jing; Koniusz, Piotr; Barnes, Nick. Multi-Modal Transformer for RGB-D Salient Object Detection. 2022 IEEE International Conference on Image Processing (ICIP), 2022: 2466 - 2470
  • [33] Zhang, Qiang; Shi, Qin; Cheng, Teng; Zhang, Junning; Chen, Jiong. VPC-VoxelNet: Multi-Modal Fusion 3D Object Detection Networks Based on Virtual Point Clouds. International Journal of Multimedia Information Retrieval, 2025, 14 (01)
  • [34] Xie, Guotao; Chen, Zhiyuan; Gao, Ming; Hu, Manjiang; Qin, Xiaohui. PPF-Det: Point-Pixel Fusion for Multi-Modal 3D Object Detection. IEEE Transactions on Intelligent Transportation Systems, 2024, 25 (06): 5598 - 5611
  • [35] Wan, Weiwei; Lu, Feng; Wu, Zepei; Harada, Kensuke. Teaching Robots to Do Object Assembly Using Multi-Modal 3D Vision. Neurocomputing, 2017, 259: 85 - 93
  • [36] Liu, Wenbing; Wang, Haibo; Gao, Quanxue; Zhu, Zhaorui. Multi-Modal Object Detection via Transformer Network. IET Image Processing, 2023, 17 (12): 3541 - 3550
  • [37] Chen, Zehui; Li, Zhenyu; Zhang, Shiquan; Fang, Liangji; Jiang, Qinhong; Zhao, Feng. Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection. Computer Vision, ECCV 2022, Part VIII, 2022, 13668: 628 - 644
  • [38] Khamsehashari, Razieh; Schill, Kerstin. Improving Deep Multi-Modal 3D Object Detection for Autonomous Driving. 2021 7th International Conference on Automation, Robotics and Applications (ICARA 2021), 2021: 263 - 267
  • [39] Wang, Li; Zhang, Xinyu; Song, Ziying; Bi, Jiangfeng; Zhang, Guoxin; Wei, Haiyue; Tang, Liyao; Yang, Lei; Li, Jun; Jia, Caiyan; Zhao, Lijun. Multi-Modal 3D Object Detection in Autonomous Driving: A Survey and Taxonomy. IEEE Transactions on Intelligent Vehicles, 2023, 8 (07): 3781 - 3798
  • [40] Ni, Peizhou; Li, Xu; Xu, Wang; Kong, Dong; Hu, Yue; Wei, Kun. Robust 3D Semantic Segmentation Based on Multi-Phase Multi-Modal Fusion for Intelligent Vehicles. IEEE Transactions on Intelligent Vehicles, 2024, 9 (01): 1602 - 1614