End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds

Cited by: 0

Authors:

Zhou, Yin [1]

Sun, Pei [1]

Zhang, Yu [1]

Anguelov, Dragomir [1]

Gao, Jiyang [1]

Ouyang, Tom [1]

Guo, James [1]

Ngiam, Jiquan [2]

Vasudevan, Vijay [2]

Affiliations:

[1] Waymo LLC, Mountain View, CA 94043 USA

[2] Google Brain, Mountain View, CA USA

Source:

CONFERENCE ON ROBOT LEARNING, VOL 100 | 2019 / Vol. 100

Keywords:

Object Detection; Deep Learning; Sensor Fusion;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

Recent work on 3D object detection advocates point cloud voxelization in bird's-eye view, where objects preserve their physical dimensions and are naturally separable. When represented in this view, however, point clouds are sparse and have highly variable point density, which may make it difficult for detectors to find distant or small objects (pedestrians, traffic signs, etc.). On the other hand, perspective view provides dense observations, which could allow more favorable feature encoding for such cases. In this paper, we aim to synergize the bird's-eye view and the perspective view and propose a novel end-to-end multi-view fusion (MVF) algorithm, which can effectively learn to utilize the complementary information from both. Specifically, we introduce dynamic voxelization, which has four merits compared to existing voxelization methods: i) removing the need to pre-allocate a tensor of fixed size; ii) overcoming the information loss due to stochastic point/voxel dropout; iii) yielding deterministic voxel embeddings and more stable detection outcomes; iv) establishing the bi-directional relationship between points and voxels, which potentially lays a natural foundation for cross-view feature fusion. By employing dynamic voxelization, the proposed feature fusion architecture enables each point to learn to fuse context information from different views. MVF operates on points and can be naturally extended to other approaches using LiDAR point clouds. We evaluate our MVF model extensively on the newly released Waymo Open Dataset and on the KITTI dataset and demonstrate that it significantly improves detection accuracy over the comparable single-view PointPillars baseline.
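To make the idea of dynamic voxelization and per-point cross-view fusion more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. The grid sizes, the spherical binning used for the perspective view, the helper names (`dynamic_voxelize`, `birds_eye_cell`, `perspective_cell`, `voxel_mean_height`, `fuse_views`), and the use of mean height as a stand-in for a learned voxel embedding are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def dynamic_voxelize(points, cell_of):
    """Group every point by the cell it falls in, keeping all points.

    Returns (point_to_voxel, voxel_to_points), the two directions of the
    point/voxel mapping. No fixed-size buffer is allocated and no point or
    voxel is dropped, so the result is deterministic.
    """
    voxel_ids = {}                       # cell key -> voxel index
    voxel_to_points = defaultdict(list)
    point_to_voxel = []
    for i, p in enumerate(points):
        v = voxel_ids.setdefault(cell_of(p), len(voxel_ids))
        point_to_voxel.append(v)
        voxel_to_points[v].append(i)
    return point_to_voxel, dict(voxel_to_points)


def birds_eye_cell(p, size=1.0):
    # Hypothetical Cartesian grid in the x-y plane (cell size is an assumption).
    x, y, _ = p
    return (math.floor(x / size), math.floor(y / size))


def perspective_cell(p, az_bins=8, incl_bins=4):
    # Hypothetical spherical grid over azimuth and inclination (assumed binning).
    x, y, z = p
    az = math.atan2(y, x)                        # in [-pi, pi]
    incl = math.atan2(z, math.hypot(x, y))       # in [-pi/2, pi/2]
    a = min(int((az + math.pi) / (2 * math.pi) * az_bins), az_bins - 1)
    b = min(int((incl + math.pi / 2) / math.pi * incl_bins), incl_bins - 1)
    return (a, b)


def voxel_mean_height(points, voxel_to_points):
    # Stand-in for a learned voxel feature: mean z of the points in each voxel.
    return {v: sum(points[i][2] for i in idx) / len(idx)
            for v, idx in voxel_to_points.items()}


def fuse_views(points):
    """Give each point its own z plus one context feature from each view."""
    fused = [[p[2]] for p in points]
    for cell_of in (birds_eye_cell, perspective_cell):
        p2v, v2p = dynamic_voxelize(points, cell_of)
        feats = voxel_mean_height(points, v2p)
        for i, v in enumerate(p2v):
            fused[i].append(feats[v])            # gather voxel feature back to its point
    return fused


points = [(0.2, 0.3, 0.0), (0.8, 0.1, 2.0), (5.5, 5.5, 1.0)]
p2v, v2p = dynamic_voxelize(points, birds_eye_cell)
fused = fuse_views(points)
```

In this sketch the point-to-voxel list and the voxel-to-points dictionary play the role of the bi-directional relationship mentioned in the abstract: features are aggregated from points into voxels, then gathered back so that each point carries context from both views.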


Pages: 10

