Fusion4DAL: Offline Multi-modal 3D Object Detection for 4D Auto-labeling

Cited by: 0
Authors
Yang, Zhiyuan [1 ]
Wang, Xuekuan [2 ]
Zhang, Wei [2 ]
Tan, Xiao [2 ]
Lu, Jincheng [2 ]
Wang, Jingdong [2 ]
Ding, Errui [2 ]
Zhao, Cairong [1 ]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Baidu Inc, Dept Comp Vis Technol VIS, Beijing 100085, Peoples R China
Keywords
Offline 3D object detection; Multi-modal mixed feature fusion module; Global point attention; Virtual point loss;
DOI
10.1007/s11263-025-02370-1
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Integrating LiDAR and camera information has been a widely adopted approach for 3D object detection in autonomous driving. Nevertheless, the potential of multi-modal fusion remains largely unexplored in the realm of offline 4D detection. We experimentally find that this stems from two causes: (1) the sparsity of point clouds makes it challenging to extract long-term image features, resulting in information loss; (2) some LiDAR points may be occluded in the image, leading to incorrect image features. To tackle these problems, we propose a simple yet effective offline multi-modal 3D object detection method, named Fusion4DAL, for 4D auto-labeling with long-term multi-modal sequences. Specifically, to address the sparsity of points within objects, we propose a multi-modal mixed feature fusion module (MMFF). In the MMFF module, we introduce virtual points based on a dense 3D grid and combine them with real LiDAR points. The mixed points are then utilized to extract dense point-level image features, thereby enhancing multi-modal feature fusion without being constrained by the sparse real LiDAR points. As for the occluded LiDAR points, we leverage the occlusion relationships among objects to enforce depth consistency between LiDAR points and their corresponding depth feature maps, thus filtering out erroneous image features. In addition, we define a virtual point loss (VP Loss) to distinguish different types of mixed points and preserve the geometric shape of objects. Furthermore, to enlarge the long-term receptive field and capture finer-grained features, we propose a global point attention decoder with a box-level self-attention module and a global point attention module. Finally, comprehensive experiments show that Fusion4DAL outperforms state-of-the-art offline 3D detection methods on the nuScenes and Waymo datasets.
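The MMFF idea described above — densifying each object with grid-based virtual points, mixing them with real LiDAR points, and projecting the mixture into the image to gather dense point-level features — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, the axis-aligned box grid, and the nearest-neighbour sampling are illustrative assumptions.

```python
import numpy as np

def mix_virtual_points(real_pts, box_min, box_max, grid=4):
    # Densify the object's interior with a regular 3D grid of virtual
    # points; return the mixed points plus a real(1)/virtual(0) flag,
    # which a VP-Loss-style objective could use to tell the types apart.
    axes = [np.linspace(box_min[i], box_max[i], grid) for i in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    virtual = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    pts = np.concatenate([real_pts, virtual], axis=0)
    flags = np.concatenate([np.ones(len(real_pts)), np.zeros(len(virtual))])
    return pts, flags

def sample_image_features(pts, K, feat_map):
    # Project points with pinhole intrinsics K and gather per-point
    # features from a (C, H, W) map via nearest-neighbour sampling.
    # Points behind the camera or outside the image receive zeros.
    C, H, W = feat_map.shape
    uvw = (K @ pts.T).T                      # homogeneous pixel coords
    z = uvw[:, 2]
    out = np.zeros((len(pts), C))
    valid = z > 1e-6
    u = np.round(uvw[valid, 0] / z[valid]).astype(int)
    v = np.round(uvw[valid, 1] / z[valid]).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    idx = np.flatnonzero(valid)[inside]
    out[idx] = feat_map[:, v[inside], u[inside]].T
    return out
```

Because the virtual points fill the box densely, the gathered image features are no longer limited to the few pixels hit by real LiDAR returns.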
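The depth-consistency filtering described above — rejecting image features for LiDAR points that are occluded in the camera view — can likewise be sketched in NumPy under simple assumptions: a pinhole intrinsics matrix, a per-pixel depth map, and a fixed tolerance. The paper's actual formulation may differ.

```python
import numpy as np

def depth_consistency_mask(pts, K, depth_map, tol=0.5):
    # Project each LiDAR point and compare its depth to the depth map
    # at the projected pixel. A point whose LiDAR depth exceeds the
    # map's depth by more than `tol` metres lies behind a nearer
    # surface, so it is treated as occluded and masked out.
    H, W = depth_map.shape
    uvw = (K @ pts.T).T
    z = uvw[:, 2]
    keep = np.zeros(len(pts), dtype=bool)
    valid = z > 1e-6
    u = np.round(uvw[valid, 0] / z[valid]).astype(int)
    v = np.round(uvw[valid, 1] / z[valid]).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    idx = np.flatnonzero(valid)[inside]
    keep[idx] = z[idx] <= depth_map[v[inside], u[inside]] + tol
    return keep
```

Only points passing this mask would contribute image features to the fusion, which prevents an occluded point from inheriting the appearance of the object in front of it.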
Pages: 19