Fusion4DAL: Offline Multi-modal 3D Object Detection for 4D Auto-labeling

Cited: 0
Authors
Yang, Zhiyuan [1 ]
Wang, Xuekuan [2 ]
Zhang, Wei [2 ]
Tan, Xiao [2 ]
Lu, Jincheng [2 ]
Wang, Jingdong [2 ]
Ding, Errui [2 ]
Zhao, Cairong [1 ]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Baidu Inc, Dept Comp Vis Technol VIS, Beijing 100085, Peoples R China
Keywords
Offline 3D object detection; Multi-modal mixed feature fusion module; Global point attention; Virtual point loss;
DOI
10.1007/s11263-025-02370-1
CLC number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Integrating LiDAR and camera information is a widely adopted approach for 3D object detection in autonomous driving. Nevertheless, the potential of multi-modal fusion remains largely unexplored in offline 4D detection. We experimentally trace this gap to two causes: (1) the sparsity of point clouds makes it hard to extract long-term image features, resulting in information loss; (2) some LiDAR points may be occluded in the image, leading to incorrect image features. To tackle these problems, we propose a simple yet effective offline multi-modal 3D object detection method, named Fusion4DAL, for 4D auto-labeling with long-term multi-modal sequences. Specifically, to address the sparsity of points within objects, we propose a multi-modal mixed feature fusion module (MMFF). In the MMFF module, we introduce virtual points based on a dense 3D grid and combine them with real LiDAR points. The mixed points are then used to extract dense point-level image features, enhancing multi-modal feature fusion without being constrained by the sparse real LiDAR points. As for occluded LiDAR points, we leverage the occlusion relationships among objects to enforce depth consistency between LiDAR points and their corresponding depth feature maps, thus filtering out erroneous image features. In addition, we define a virtual point loss (VP Loss) to distinguish the different types of mixed points and preserve the geometric shape of objects. Furthermore, to enlarge the long-term receptive field and capture finer-grained features, we propose a global point attention decoder with a box-level self-attention module and a global point attention module. Finally, comprehensive experiments show that Fusion4DAL outperforms state-of-the-art offline 3D detection methods on the nuScenes and Waymo datasets.
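The two core ideas in the abstract — mixing virtual grid points with real LiDAR points, and discarding points whose projected depth disagrees with the image's depth map — can be sketched in a simplified form. This is not the paper's implementation: the helper names, the axis-aligned box for the virtual grid, and the fixed depth tolerance are all illustrative assumptions.

```python
import numpy as np

def make_mixed_points(real_pts, box_min, box_max, grid=4):
    """Mix real LiDAR points with virtual points sampled on a dense 3D grid
    inside an axis-aligned box (simplified stand-in for the MMFF mixing step).
    Returns the mixed points and real/virtual labels, as VP Loss would need."""
    axes = [np.linspace(box_min[i], box_max[i], grid) for i in range(3)]
    vx, vy, vz = np.meshgrid(*axes, indexing="ij")
    virtual = np.stack([vx, vy, vz], axis=-1).reshape(-1, 3)
    mixed = np.concatenate([real_pts, virtual], axis=0)
    # label: 1 = real LiDAR point, 0 = virtual grid point
    labels = np.concatenate([np.ones(len(real_pts)), np.zeros(len(virtual))])
    return mixed, labels

def filter_occluded(points_cam, depth_map, K, tol=0.5):
    """Keep only points (in camera coordinates) whose depth agrees with the
    depth map at their projected pixel; points occluded by a closer surface
    fail the consistency check and are dropped."""
    uvw = (K @ points_cam.T).T                     # pinhole projection
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)    # pixel coordinates
    h, w = depth_map.shape
    in_img = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    keep = np.zeros(len(points_cam), dtype=bool)
    idx = np.where(in_img)[0]
    map_depth = depth_map[uv[idx, 1], uv[idx, 0]]
    keep[idx] = np.abs(points_cam[idx, 2] - map_depth) < tol
    return keep
```

In this toy version, a point lying well behind the depth recorded in the map at its pixel is treated as occluded and its image feature would be ignored; the paper additionally exploits inter-object occlusion relationships, which this sketch omits.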
Pages: 19