End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds

Citations: 0
Authors
Zhou, Yin [1]
Sun, Pei [1]
Zhang, Yu [1]
Anguelov, Dragomir [1]
Gao, Jiyang [1]
Ouyang, Tom [1]
Guo, James [1]
Ngiam, Jiquan [2]
Vasudevan, Vijay [2]
Affiliations
[1] Waymo LLC, Mountain View, CA 94043 USA
[2] Google Brain, Mountain View, CA USA
Source
Keywords
Object Detection; Deep Learning; Sensor Fusion;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Recent work on 3D object detection advocates point cloud voxelization in the bird's-eye view, where objects preserve their physical dimensions and are naturally separable. When represented in this view, however, point clouds are sparse and have highly variable point density, which may make it difficult for detectors to detect distant or small objects (pedestrians, traffic signs, etc.). The perspective view, on the other hand, provides dense observations, which could allow more favorable feature encoding for such cases. In this paper, we aim to synergize the bird's-eye view and the perspective view and propose a novel end-to-end multi-view fusion (MVF) algorithm, which can effectively learn to utilize the complementary information from both. Specifically, we introduce dynamic voxelization, which has four merits compared to existing voxelization methods: i) removing the need to pre-allocate a tensor with a fixed size; ii) overcoming the information loss due to stochastic point/voxel dropout; iii) yielding deterministic voxel embeddings and more stable detection outcomes; iv) establishing the bi-directional relationship between points and voxels, which potentially lays a natural foundation for cross-view feature fusion. By employing dynamic voxelization, the proposed feature fusion architecture enables each point to learn to fuse context information from different views. MVF operates on points and can be naturally extended to other approaches using LiDAR point clouds. We evaluate our MVF model extensively on the newly released Waymo Open Dataset and on the KITTI dataset and demonstrate that it significantly improves detection accuracy over the comparable single-view PointPillars baseline.
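The idea of dynamic voxelization described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `dynamic_voxelize` and `voxel_mean`, the parameters `voxel_size` and `range_min`, and the use of a simple per-voxel mean as the aggregation are all assumptions made for illustration. The sketch only shows the properties the abstract names: no fixed-size tensor is pre-allocated, no point or voxel is dropped, the result is deterministic, and the point-to-voxel index can be used in both directions.

```python
import numpy as np

def dynamic_voxelize(points, voxel_size, range_min):
    """Assign every point to a voxel in the (x, y) plane.

    Hypothetical helper: no fixed-size buffer and no point dropout.
    Returns the occupied voxel coordinates and, for each point,
    the row index of its voxel.
    """
    points = np.asarray(points, dtype=np.float64)
    # Integer grid coordinates of each point in the bird's-eye-view plane.
    coords = np.floor((points[:, :2] - range_min) / voxel_size).astype(np.int64)
    # Unique occupied voxels, plus the voxel index of every point.
    voxels, point_to_voxel = np.unique(coords, axis=0, return_inverse=True)
    return voxels, point_to_voxel.reshape(-1)

def voxel_mean(features, point_to_voxel, num_voxels):
    """Average point features per voxel (points -> voxels)."""
    sums = np.zeros((num_voxels, features.shape[1]))
    # Unbuffered scatter-add so that repeated voxel indices accumulate.
    np.add.at(sums, point_to_voxel, features)
    counts = np.bincount(point_to_voxel, minlength=num_voxels)
    return sums / counts[:, None]

# Four points given as (x, y, z); the values are made up for the example.
points = np.array([[0.1, 0.1, 0.0],
                   [0.2, 0.3, 1.0],
                   [1.5, 0.2, 2.0],
                   [0.4, 0.4, 3.0]])
voxels, point_to_voxel = dynamic_voxelize(
    points, voxel_size=np.array([1.0, 1.0]), range_min=np.array([0.0, 0.0]))
voxel_feats = voxel_mean(points, point_to_voxel, len(voxels))
# Voxels -> points: each point gathers the feature of the voxel it falls in.
point_context = voxel_feats[point_to_voxel]
```

Because `point_to_voxel` is kept, voxel-level features can be gathered back to the points, which is the kind of bi-directional mapping the abstract says cross-view fusion can build on. A second mapping computed in the perspective view could be handled the same way.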
Pages: 10