Multi-Modal 3D Object Detection by Box Matching

Cited by: 0
Authors
Liu, Zhe [1]
Ye, Xiaoqing [2]
Zou, Zhikang [2]
He, Xinwei [3]
Tan, Xiao [2]
Ding, Errui [2]
Wang, Jingdong [2]
Bai, Xiang [4]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China
[2] Baidu Inc, Beijing 100085, Peoples R China
[3] Huazhong Agr Univ, Coll Informat, Wuhan 430070, Peoples R China
[4] Huazhong Univ Sci & Technol, Sch Software, Wuhan 430074, Peoples R China
Keywords
Three-dimensional displays; Laser radar; Feature extraction; Cameras; Sensors; Proposals; Object detection; Multi-modal; 3D object detection; feature alignment; box matching
DOI
10.1109/TITS.2024.3453963
Chinese Library Classification
TU [Building Science]
Discipline classification code
0813
Abstract
Multi-modal 3D object detection has received growing attention because the information from different sensors, such as LiDAR and cameras, is complementary. Most fusion methods for 3D detection rely on accurate alignment and calibration between 3D point clouds and RGB images. However, this assumption is unreliable in a real-world self-driving system, where the alignment between modalities is easily disturbed by asynchronous sensors and shifted sensor placement. We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection, which offers an alternative route to cross-modal feature alignment: it learns the correspondence at the bounding-box level, removing the dependency on calibration during inference. With the learned assignments between 3D and 2D object proposals, fusion for detection can be performed effectively by combining their ROI features. Extensive experiments on the nuScenes dataset demonstrate that our method is much more robust than existing fusion methods in challenging cases such as asynchronous sensors, misaligned sensor placement, and degraded camera images. We hope that our method can provide a practical solution to these challenging cases and improve safety in real autonomous driving scenarios.
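The abstract describes fusion by matching 3D and 2D object proposals and then combining their ROI features. The sketch below illustrates that general idea only; it is not the FBMNet implementation. In the paper the assignments are learned, whereas here a softmax over cosine similarities stands in for them, and the function names (`cosine`, `softmax`, `match_and_fuse`), the `temperature` parameter, and the concatenation-based fusion are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (plain lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def softmax(scores, temperature=0.1):
    """Numerically stable softmax; a lower temperature gives sharper weights."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_and_fuse(roi_3d, roi_2d, temperature=0.1):
    """Softly assign each 3D proposal to the 2D proposals, then fuse features.

    roi_3d: list of N feature vectors, one per 3D proposal.
    roi_2d: list of M feature vectors, one per 2D proposal.
    Returns (assignments, fused): assignments[i] holds M weights summing to 1,
    fused[i] is the 3D feature concatenated with the weighted 2D feature.
    Note that no camera calibration or projection matrix is used anywhere.
    """
    dim_2d = len(roi_2d[0])
    assignments, fused = [], []
    for f3 in roi_3d:
        # Hypothetical stand-in for a learned matching score.
        weights = softmax([cosine(f3, f2) for f2 in roi_2d], temperature)
        # Aggregate the 2D ROI features under the soft assignment.
        agg = [sum(w * f2[k] for w, f2 in zip(weights, roi_2d))
               for k in range(dim_2d)]
        assignments.append(weights)
        fused.append(list(f3) + agg)
    return assignments, fused


# Toy example: two 3D proposals, three 2D proposals, 2-dimensional features.
roi_3d = [[1.0, 0.0], [0.0, 1.0]]
roi_2d = [[0.0, 2.0], [3.0, 0.0], [1.0, 1.0]]
assignments, fused = match_and_fuse(roi_3d, roi_2d)
best = [row.index(max(row)) for row in assignments]
print(best)
```

In this toy input the first 3D proposal is most similar to the second 2D proposal and the second 3D proposal to the first, so the soft assignment concentrates there. Because the matching operates on proposal features rather than projected coordinates, it does not depend on an extrinsic calibration at inference time, which is the property the abstract emphasizes.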
Pages: 12