Fusion4DAL: Offline Multi-modal 3D Object Detection for 4D Auto-labeling

Cited by: 0
Authors
Yang, Zhiyuan [1 ]
Wang, Xuekuan [2 ]
Zhang, Wei [2 ]
Tan, Xiao [2 ]
Lu, Jincheng [2 ]
Wang, Jingdong [2 ]
Ding, Errui [2 ]
Zhao, Cairong [1 ]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Baidu Inc, Dept Comp Vis Technol VIS, Beijing 100085, Peoples R China
Keywords
Offline 3D object detection; Multi-modal mixed feature fusion module; Global point attention; Virtual point loss;
DOI
10.1007/s11263-025-02370-1
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Integrating LiDAR and camera information has been a widely adopted approach for 3D object detection in autonomous driving. Nevertheless, the potential of multi-modal fusion remains largely unexplored in the realm of offline 4D detection. We experimentally find that this stems from two causes: (1) the sparsity of point clouds makes it challenging to extract long-term image features, resulting in information loss; (2) some LiDAR points may be occluded in the image, leading to incorrect image features. To tackle these problems, we propose a simple yet effective offline multi-modal 3D object detection method, named Fusion4DAL, for 4D auto-labeling with long-term multi-modal sequences. Specifically, to address the sparsity of points within objects, we propose a multi-modal mixed feature fusion module (MMFF). In the MMFF module, we introduce virtual points based on a dense 3D grid and combine them with real LiDAR points. The mixed points are then utilized to extract dense point-level image features, thereby enhancing multi-modal feature fusion without being constrained by the sparse real LiDAR points. As for the occluded LiDAR points, we leverage the occlusion relationships among objects to enforce depth consistency between LiDAR points and their corresponding depth feature maps, thus filtering out erroneous image features. In addition, we define a virtual point loss (VP Loss) to distinguish different types of mixed points and preserve the geometric shape of objects. Furthermore, to enlarge the long-term receptive field and capture finer-grained features, we propose a global point attention decoder with a box-level self-attention module and a global point attention module. Finally, comprehensive experiments show that Fusion4DAL outperforms state-of-the-art offline 3D detection methods on the nuScenes and Waymo datasets.
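The MMFF idea described above — densifying each object with grid-based virtual points, mixing them with real LiDAR points, and projecting the mixture into the image to gather dense point-level features — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, the axis-aligned box grid, and the nearest-neighbour sampling are illustrative assumptions.

```python
import numpy as np

def mix_virtual_points(real_pts, box_min, box_max, grid=4):
    # Densify the object's interior with a regular 3D grid of virtual
    # points; return the mixed points plus a real(1)/virtual(0) flag,
    # which a VP-Loss-style objective could use to tell the types apart.
    axes = [np.linspace(box_min[i], box_max[i], grid) for i in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    virtual = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    pts = np.concatenate([real_pts, virtual], axis=0)
    flags = np.concatenate([np.ones(len(real_pts)), np.zeros(len(virtual))])
    return pts, flags

def sample_image_features(pts, K, feat_map):
    # Project points with pinhole intrinsics K and gather per-point
    # features from a (C, H, W) map via nearest-neighbour sampling.
    # Points behind the camera or outside the image receive zeros.
    C, H, W = feat_map.shape
    uvw = (K @ pts.T).T                      # homogeneous pixel coords
    z = uvw[:, 2]
    out = np.zeros((len(pts), C))
    valid = z > 1e-6
    u = np.round(uvw[valid, 0] / z[valid]).astype(int)
    v = np.round(uvw[valid, 1] / z[valid]).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    idx = np.flatnonzero(valid)[inside]
    out[idx] = feat_map[:, v[inside], u[inside]].T
    return out
```

Because the virtual points fill the box densely, the gathered image features are no longer limited to the few pixels hit by real LiDAR returns.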
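The depth-consistency filtering described above — rejecting image features for LiDAR points that are occluded in the camera view — can likewise be sketched in NumPy under simple assumptions: a pinhole intrinsics matrix, a per-pixel depth map, and a fixed tolerance. The paper's actual formulation may differ.

```python
import numpy as np

def depth_consistency_mask(pts, K, depth_map, tol=0.5):
    # Project each LiDAR point and compare its depth to the depth map
    # at the projected pixel. A point whose LiDAR depth exceeds the
    # map's depth by more than `tol` metres lies behind a nearer
    # surface, so it is treated as occluded and masked out.
    H, W = depth_map.shape
    uvw = (K @ pts.T).T
    z = uvw[:, 2]
    keep = np.zeros(len(pts), dtype=bool)
    valid = z > 1e-6
    u = np.round(uvw[valid, 0] / z[valid]).astype(int)
    v = np.round(uvw[valid, 1] / z[valid]).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    idx = np.flatnonzero(valid)[inside]
    keep[idx] = z[idx] <= depth_map[v[inside], u[inside]] + tol
    return keep
```

Only points passing this mask would contribute image features to the fusion, which prevents an occluded point from inheriting the appearance of the object in front of it.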
Pages: 19