VPFNet: Improving 3D Object Detection With Virtual Point Based LiDAR and Stereo Data Fusion

Cited by: 71
Authors
Zhu, Hanqi [1]
Deng, Jiajun [2]
Zhang, Yu [1]
Ji, Jianmin [1]
Mao, Qiuyu [1]
Li, Houqiang [2]
Zhang, Yanyong [1]
Affiliations
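[1] University of Science and Technology of China, School of Computer Science and Technology, Hefei 230027, People's Republic of China
[2] University of Science and Technology of China, Department of Electronic Engineering and Information Science, Hefei 230027, People's Republic of China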
Keywords
3D object detection; multiple sensors; point clouds; stereo images; R-CNN
DOI
10.1109/TMM.2022.3189778
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
It has been well recognized that fusing the complementary information from depth-aware LiDAR point clouds and semantic-rich stereo images would benefit 3D object detection. Nevertheless, it is non-trivial to explore the inherently unnatural interaction between sparse 3D points and dense 2D pixels. To ease this difficulty, recent approaches generally project the 3D points onto the 2D image plane to sample the image data and then aggregate the data at the points. However, these approaches often suffer from the mismatch between the resolution of point clouds and RGB images, leading to sub-optimal performance. Specifically, taking the sparse points as the multi-modal data aggregation locations causes severe information loss for high-resolution images, which in turn undermines the effectiveness of multi-sensor fusion. In this paper, we present VPFNet, a new architecture that cleverly aligns and aggregates the point cloud and image data at the "virtual" points. In particular, with their density lying between that of the 3D points and 2D pixels, the virtual points can nicely bridge the resolution gap between the two sensors, and thus preserve more information for processing. Moreover, we also investigate the data augmentation techniques that can be applied to both point clouds and RGB images, as data augmentation has made a non-negligible contribution to 3D object detectors to date. We have conducted extensive experiments on the KITTI dataset, and have observed good performance compared to the state-of-the-art methods. Remarkably, our VPFNet achieves 83.21% moderate $AP_{3D}$ and 91.86% moderate $AP_{BEV}$ on the KITTI test set. The network design also takes computational efficiency into consideration: we achieve 15 FPS on a single NVIDIA RTX 2080Ti GPU.
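The point-wise sampling step that the abstract attributes to earlier fusion approaches (project each 3D point onto the image plane, then read the image data at that pixel) can be illustrated with a minimal sketch. This is not VPFNet's implementation: it assumes a generic pinhole camera with a 3x4 projection matrix, nearest-pixel lookup, and points already expressed in the camera frame. The function names `project_point` and `sample_image_at_points`, the toy matrix `P`, and the 5x5 image are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def project_point(P: Sequence[Sequence[float]],
                  point: Point3D) -> Optional[Tuple[float, float]]:
    """Project a 3D point (camera frame) to pixel coordinates (u, v).

    P is an assumed 3x4 pinhole projection matrix. Returns None when the
    point lies behind the camera.
    """
    x, y, z = point
    homog = (x, y, z, 1.0)
    u_h, v_h, w = (sum(row[i] * homog[i] for i in range(4)) for row in P)
    if w <= 0.0:
        return None
    return u_h / w, v_h / w


def sample_image_at_points(image: List[List[float]],
                           P: Sequence[Sequence[float]],
                           points: Sequence[Point3D]) -> List[Optional[float]]:
    """Return one image value per 3D point (nearest pixel), or None if the
    point projects outside the image or lies behind the camera."""
    height, width = len(image), len(image[0])
    samples: List[Optional[float]] = []
    for point in points:
        uv = project_point(P, point)
        if uv is None:
            samples.append(None)
            continue
        col, row = int(round(uv[0])), int(round(uv[1]))
        inside = 0 <= row < height and 0 <= col < width
        samples.append(image[row][col] if inside else None)
    return samples


# Toy setup (hypothetical values): focal length 2, principal point (2, 2).
P = [[2.0, 0.0, 2.0, 0.0],
     [0.0, 2.0, 2.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
image = [[10 * r + c for c in range(5)] for r in range(5)]  # 5x5 "image"
points = [(0.0, 0.0, 4.0), (2.0, -2.0, 2.0), (0.0, 0.0, -1.0), (10.0, 0.0, 1.0)]

samples = sample_image_at_points(image, P, points)
print(samples)  # [22, 4, None, None]
```

In this toy case only 2 of the 25 pixels are ever read, which mirrors the resolution mismatch the abstract describes: when the sparse points are the only aggregation locations, most of the image goes unused.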
Pages: 5291-5304
Page count: 14