TransFusion: Multi-Modal Robust Fusion for 3D Object Detection in Foggy Weather Based on Spatial Vision Transformer

Cited by: 0

Authors:

Zhang, Cheng [1]

Wang, Hai [1]

Cai, Yingfeng [2]

Chen, Long [2]

Li, Yicheng [2]

Affiliations:

[1] Jiangsu University, School of Automotive and Traffic Engineering, Zhenjiang 212013, People's Republic of China

[2] Jiangsu University, Automotive Engineering Research Institute, Zhenjiang 212013, People's Republic of China

Source:

IEEE Transactions on Intelligent Transportation Systems | 2024, Vol. 25, No. 9

Funding:

National Natural Science Foundation of China;

Keywords:

3D object detection; multi-modal data fusion; intelligent vehicle; attention mechanism; spatial vision transformer; temporal-spatial memory fusion; radar; LiDAR;

DOI:

10.1109/TITS.2024.3420432

Chinese Library Classification:

TU [Building Science];

Discipline classification code:

0813;

Abstract:

A practical approach to comprehensive perception of the surrounding environment is multi-modal fusion across several types of vehicular sensors. In clear weather, cameras and LiDAR provide high-resolution images and point clouds that can be used for 3D object detection. In foggy weather, however, fog in the air affects the propagation of light, so both images and point clouds become distorted to varying degrees, which makes accurate detection in adverse weather challenging. Compared with cameras and LiDAR, radar has strong penetrating power and is not affected by fog. This paper therefore proposes a novel two-stage detection framework called "TransFusion", which fuses LiDAR and radar data to address environment perception in foggy weather. The framework consists of a Multi-modal Rotate Region Proposal Network (MM-RRPN) and a Multi-modal Refine Network (MM-RFN). Specifically, a Spatial Vision Transformer (SVT) and a Cross-Modal Attention Mechanism (CMAM) are introduced in the MM-RRPN to improve the robustness of the algorithm in foggy weather. A Temporal-Spatial Memory Fusion (TSMF) module in the MM-RFN fuses the spatial-temporal prior information. In addition, a Multi-branches Combination Loss function (MC-Loss) is designed to supervise the learning of the network efficiently. Extensive experiments were conducted on the Oxford Radar RobotCar (ORR) dataset. The results show that the proposed algorithm performs well in both foggy and clear weather. In foggy weather in particular, TransFusion achieves 85.31 mAP, outperforming all other competing approaches. The demo is available at: https://youtu.be/ugjIYHLgn98.


Pages: 10652-10666

Page count: 15

50 items in total

[1] Research on 3D Object Detection Method Based on Multi-Modal Fusion
Tian, Feng
Zong, Neili
Liu, Fang
Lu, Yuanyuan
Liu, Chao
Jiang, Wenwen
Zhao, Ling
Han, Yuxiang
Computer Engineering and Applications, 2024, 60 (13) : 113 - 123
[2] Multi-Modal Fusion Based on Depth Adaptive Mechanism for 3D Object Detection
Liu, Zhanwen
Cheng, Juanru
Fan, Jin
Lin, Shan
Wang, Yang
Zhao, Xiangmo
IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 707 - 717
[3] ObjectFusion: Multi-modal 3D Object Detection with Object-Centric Fusion
Cai, Qi
Pan, Yingwei
Yao, Ting
Ngo, Chong-Wah
Mei, Tao
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 18021 - 18030
[4] Multi-modal Data Analysis and Fusion for Robust Object Detection in 2D/3D Sensing
Schierl, Jonathan
Graehling, Quinn
Aspiras, Theus
Asari, Vijay
Van Rynbach, Andre
Rabb, Dave
2020 IEEE APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP (AIPR): TRUSTED COMPUTING, PRIVACY, AND SECURING MULTIMEDIA, 2020,
[5] Deep multi-scale and multi-modal fusion for 3D object detection
Guo, Rui
Li, Deng
Han, Yahong
PATTERN RECOGNITION LETTERS, 2021, 151 : 236 - 242
[6] Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection
Li, Xin
Shi, Botian
Hou, Yuenan
Wu, Xingjiao
Ma, Tianlong
Li, Yikang
He, Liang
COMPUTER VISION, ECCV 2022, PT XXXVIII, 2022, 13698 : 691 - 707
[7] Multi-modal feature fusion for 3D object detection in the production workshop
Hou, Rui
Chen, Guangzhu
Han, Yinhe
Tang, Zaizuo
Ru, Qingjun
APPLIED SOFT COMPUTING, 2022, 115
[8] Deformable Feature Fusion Network for Multi-Modal 3D Object Detection
Guo, Kun
Gan, Tong
Ding, Zhao
Ling, Qiang
2024 3RD INTERNATIONAL CONFERENCE ON ROBOTICS, ARTIFICIAL INTELLIGENCE AND INTELLIGENT CONTROL, RAIIC 2024, 2024, : 363 - 367
[9] Multi-modal information fusion for LiDAR-based 3D object detection framework
Ma, Ruixin
Yin, Yong
Chen, Jing
Chang, Rihao
MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (03) : 7995 - 8012
