Fusion4DAL: Offline Multi-modal 3D Object Detection for 4D Auto-labeling

Cited: 0
Authors
Yang, Zhiyuan [1 ]
Wang, Xuekuan [2 ]
Zhang, Wei [2 ]
Tan, Xiao [2 ]
Lu, Jincheng [2 ]
Wang, Jingdong [2 ]
Ding, Errui [2 ]
Zhao, Cairong [1 ]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Baidu Inc, Dept Comp Vis Technol VIS, Beijing 100085, Peoples R China
Keywords
Offline 3D object detection; Multi-modal mixed feature fusion module; Global point attention; Virtual point loss;
DOI
10.1007/s11263-025-02370-1
CLC number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Integrating LiDAR and camera information is a widely adopted approach for 3D object detection in autonomous driving. Nevertheless, the potential of multi-modal fusion remains largely unexplored in offline 4D detection. We experimentally trace this gap to two causes: (1) the sparsity of point clouds makes it hard to extract long-term image features, resulting in information loss; (2) some LiDAR points may be occluded in the image, leading to incorrect image features. To tackle these problems, we propose a simple yet effective offline multi-modal 3D object detection method, named Fusion4DAL, for 4D auto-labeling with long-term multi-modal sequences. Specifically, to address the sparsity of points within objects, we propose a multi-modal mixed feature fusion module (MMFF). In the MMFF module, we introduce virtual points based on a dense 3D grid and combine them with real LiDAR points. The mixed points are then used to extract dense point-level image features, enhancing multi-modal feature fusion without being constrained by the sparse real LiDAR points. As for occluded LiDAR points, we leverage the occlusion relationships among objects to enforce depth consistency between LiDAR points and their corresponding depth feature maps, thus filtering out erroneous image features. In addition, we define a virtual point loss (VP Loss) to distinguish the different types of mixed points and preserve the geometric shape of objects. Furthermore, to enlarge the long-term receptive field and capture finer-grained features, we propose a global point attention decoder with a box-level self-attention module and a global point attention module. Finally, comprehensive experiments show that Fusion4DAL outperforms state-of-the-art offline 3D detection methods on the nuScenes and Waymo datasets.
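The two core ideas in the abstract — mixing virtual grid points with real LiDAR points, and discarding points whose projected depth disagrees with the image's depth map — can be sketched in a simplified form. This is not the paper's implementation: the helper names, the axis-aligned box for the virtual grid, and the fixed depth tolerance are all illustrative assumptions.

```python
import numpy as np

def make_mixed_points(real_pts, box_min, box_max, grid=4):
    """Mix real LiDAR points with virtual points sampled on a dense 3D grid
    inside an axis-aligned box (simplified stand-in for the MMFF mixing step).
    Returns the mixed points and real/virtual labels, as VP Loss would need."""
    axes = [np.linspace(box_min[i], box_max[i], grid) for i in range(3)]
    vx, vy, vz = np.meshgrid(*axes, indexing="ij")
    virtual = np.stack([vx, vy, vz], axis=-1).reshape(-1, 3)
    mixed = np.concatenate([real_pts, virtual], axis=0)
    # label: 1 = real LiDAR point, 0 = virtual grid point
    labels = np.concatenate([np.ones(len(real_pts)), np.zeros(len(virtual))])
    return mixed, labels

def filter_occluded(points_cam, depth_map, K, tol=0.5):
    """Keep only points (in camera coordinates) whose depth agrees with the
    depth map at their projected pixel; points occluded by a closer surface
    fail the consistency check and are dropped."""
    uvw = (K @ points_cam.T).T                     # pinhole projection
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)    # pixel coordinates
    h, w = depth_map.shape
    in_img = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    keep = np.zeros(len(points_cam), dtype=bool)
    idx = np.where(in_img)[0]
    map_depth = depth_map[uv[idx, 1], uv[idx, 0]]
    keep[idx] = np.abs(points_cam[idx, 2] - map_depth) < tol
    return keep
```

In this toy version, a point lying well behind the depth recorded in the map at its pixel is treated as occluded and its image feature would be ignored; the paper additionally exploits inter-object occlusion relationships, which this sketch omits.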
Pages: 19