Fusion4DAL: Offline Multi-modal 3D Object Detection for 4D Auto-labeling

Cited: 0
Authors
Yang, Zhiyuan [1 ]
Wang, Xuekuan [2 ]
Zhang, Wei [2 ]
Tan, Xiao [2 ]
Lu, Jincheng [2 ]
Wang, Jingdong [2 ]
Ding, Errui [2 ]
Zhao, Cairong [1 ]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Baidu Inc, Dept Comp Vis Technol VIS, Beijing 100085, Peoples R China
Keywords
Offline 3D object detection; Multi-modal mixed feature fusion module; Global point attention; Virtual point loss
DOI
10.1007/s11263-025-02370-1
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Integrating LiDAR and camera information is a widely adopted approach to 3D object detection in autonomous driving. Nevertheless, the potential of multi-modal fusion remains largely unexplored in offline 4D detection. We experimentally find that this stems from two causes: (1) the sparsity of point clouds makes it difficult to extract long-term image features, resulting in information loss; (2) some LiDAR points may be occluded in the image, leading to incorrect image features. To tackle these problems, we propose a simple yet effective offline multi-modal 3D object detection method, named Fusion4DAL, for 4D auto-labeling with long-term multi-modal sequences. Specifically, to address the sparsity of points within objects, we propose a multi-modal mixed feature fusion (MMFF) module. In the MMFF module, we introduce virtual points based on a dense 3D grid and combine them with real LiDAR points. The mixed points are then used to extract dense point-level image features, enhancing multi-modal feature fusion without being constrained by the sparse real LiDAR points. As for the occluded LiDAR points, we leverage the occlusion relationships among objects to enforce depth consistency between LiDAR points and their corresponding depth feature maps, thus filtering out erroneous image features. In addition, we define a virtual point loss (VP Loss) to distinguish the different types of mixed points and preserve the geometric shape of objects. Furthermore, to enlarge the long-term receptive field and capture finer-grained features, we propose a global point attention decoder with a box-level self-attention module and a global point attention module. Comprehensive experiments show that Fusion4DAL outperforms state-of-the-art offline 3D detection methods on the nuScenes and Waymo datasets.
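As a rough illustration of the mixed-point idea described in the abstract (this is not the authors' released code), the sketch below builds a dense grid of virtual points inside a box proposal, mixes them with real LiDAR points, projects the mixed set into the image to sample point-level features, and masks points whose projected depth disagrees with a depth feature map, standing in for the paper's occlusion-aware depth-consistency check. All function names, tensor shapes, camera parameters, and the depth tolerance are illustrative assumptions.

    # Minimal sketch of MMFF-style mixed points, assuming a pinhole camera and
    # axis-aligned box proposals. Names and shapes are hypothetical.
    import torch
    import torch.nn.functional as F

    def make_virtual_points(box_center, box_size, grid=4):
        """Dense 3D grid of virtual points inside an axis-aligned box proposal."""
        lin = torch.linspace(-0.5, 0.5, grid)
        zz, yy, xx = torch.meshgrid(lin, lin, lin, indexing="ij")
        pts = torch.stack([xx, yy, zz], dim=-1).reshape(-1, 3)  # (grid^3, 3)
        return box_center + pts * box_size                      # scale to box extents

    def gather_point_image_features(points, K, img_feat, depth_map, tol=1.0):
        """Project mixed 3D points with intrinsics K, bilinearly sample image
        features, and zero out points whose projected depth disagrees with the
        depth feature map (a stand-in for the depth-consistency filtering)."""
        uvw = points @ K.T                        # (N, 3) homogeneous pixel coords
        depth = uvw[:, 2].clamp(min=1e-6)
        uv = uvw[:, :2] / depth.unsqueeze(-1)     # (N, 2) pixel coordinates
        H, W = img_feat.shape[-2:]
        gx = uv[:, 0] / (W - 1) * 2 - 1           # normalize to [-1, 1]
        gy = uv[:, 1] / (H - 1) * 2 - 1
        grid = torch.stack([gx, gy], dim=-1).view(1, 1, -1, 2)
        feats = F.grid_sample(img_feat, grid, align_corners=True)  # (1, C, 1, N)
        feats = feats[0, :, 0].T                                   # (N, C)
        ref = F.grid_sample(depth_map, grid, align_corners=True)[0, 0, 0]  # (N,)
        visible = (depth - ref).abs() < tol        # drop likely-occluded points
        return feats * visible.unsqueeze(-1).float(), visible

    # Toy usage: 200 real LiDAR points mixed with a 4x4x4 virtual grid.
    real_pts = torch.randn(200, 3) + torch.tensor([0.0, 0.0, 10.0])
    virt_pts = make_virtual_points(torch.tensor([0.0, 0.0, 10.0]),
                                   torch.tensor([4.0, 2.0, 1.5]))
    mixed = torch.cat([real_pts, virt_pts], dim=0)
    K = torch.tensor([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    img_feat = torch.randn(1, 64, 480, 640)        # dummy image feature map
    depth_map = torch.full((1, 1, 480, 640), 10.0) # dummy depth feature map
    feats, vis = gather_point_image_features(mixed, K, img_feat, depth_map)
    print(feats.shape, vis.float().mean())         # (264, 64), fraction kept

In the paper, the fused point-level features would presumably feed the MMFF module and the VP Loss would separate real from virtual points; the sketch stops at feature gathering and visibility masking.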
Pages: 19
Related Papers (50 in total)
  • [41] Bridging the View Disparity Between Radar and Camera Features for Multi-Modal Fusion 3D Object Detection
    Zhou, Taohua
    Chen, Junjie
    Shi, Yining
    Jiang, Kun
    Yang, Mengmeng
    Yang, Diange
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2023, 8(2): 1523-1535
  • [42] MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection
    Beemelmanns, Till
    Zhang, Quan
    Geller, Christian
    Eckstein, Lutz
    2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM (IEEE IV 2024), 2024: 3255-3261
  • [43] Occlusion-guided multi-modal fusion for vehicle-infrastructure cooperative 3D object detection
    Chu, Huazhen
    Liu, Haizhuang
    Zhuo, Junbao
    Chen, Jiansheng
    Ma, Huimin
    PATTERN RECOGNITION, 2025, 157
  • [44] LXL: LiDAR Excluded Lean 3D Object Detection With 4D Imaging Radar and Camera Fusion
    Xiong, Weiyi
    Liu, Jianan
    Huang, Tao
    Han, Qing-Long
    Xia, Yuxuan
    Zhu, Bing
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9(1): 79-92
  • [45] Towards Robust 3D Object Detection with LiDAR and 4D Radar Fusion in Various Weather Conditions
    Chae, Yujeong
    Kim, Hyeonseong
    Yoon, Kuk-Jin
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 15162-15172
  • [46] LXL: LiDAR Excluded Lean 3D Object Detection with 4D Imaging Radar and Camera Fusion
    Xiong, Weiyi
    Liu, Jianan
    Huang, Tao
    Han, Qing-Long
    Xia, Yuxuan
    Zhu, Bing
    2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM (IEEE IV 2024), 2024: 3142-3142
  • [47] MAFF-Net: Enhancing 3D Object Detection With 4D Radar via Multi-Assist Feature Fusion
    Bi, Xin
    Weng, Caien
    Tong, Panpan
    Fan, Baojie
    Eichberger, Arno
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2025, 10(5): 4284-4291
  • [48] Artifacts Mapping: Multi-Modal Semantic Mapping for Object Detection and 3D Localization
    Rollo, Federico
    Raiola, Gennaro
    Zunino, Andrea
    Tsagarakis, Nikolaos
    Ajoudani, Arash
    2023 EUROPEAN CONFERENCE ON MOBILE ROBOTS (ECMR), 2023: 90-97
  • [49] RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM
    Song, Ziying
    Zhang, Guoxing
    Liu, Lin
    Yang, Lei
    Xu, Shaoqing
    Jia, Caiyan
    Jia, Feiyang
    Wang, Li
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI 2024), 2024: 1272-1280
  • [50] Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection
    Huang, Linyan
    Li, Zhiqi
    Sima, Chonghao
    Wang, Wenhai
    Wang, Jingdong
    Qiao, Yu
    Li, Hongyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023