Virtual Sparse Convolution for Multimodal 3D Object Detection

Cited by: 70
|
Authors
Wu, Hai [1 ]
Wen, Chenglu [1 ]
Shi, Shaoshuai [2 ]
Li, Xin [3 ]
Wang, Cheng [1 ]
Affiliations
[1] Xiamen Univ, Xiamen, Peoples R China
[2] Max Planck Inst, Munich, Germany
[3] Texas A&M Univ, College Stn, TX 77843 USA
Funding
National Natural Science Foundation of China;
DOI
10.1109/CVPR52729.2023.02074
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Recently, virtual/pseudo-point-based 3D object detection that seamlessly fuses RGB images and LiDAR data by depth completion has gained great attention. However, virtual points generated from an image are very dense, introducing a huge amount of redundant computation during detection. Meanwhile, noise introduced by inaccurate depth completion significantly degrades detection precision. This paper proposes a fast yet effective backbone, termed VirConvNet, based on a new operator VirConv (Virtual Sparse Convolution), for virtual-point-based 3D object detection. VirConv consists of two key designs: (1) StVD (Stochastic Voxel Discard) and (2) NRConv (Noise-Resistant Sub-manifold Convolution). StVD alleviates the computation problem by discarding large amounts of nearby redundant voxels. NRConv tackles the noise problem by encoding voxel features in both 2D image and 3D LiDAR space. By integrating VirConv, we first develop an efficient pipeline, VirConv-L, based on an early-fusion design. Then, we build a high-precision pipeline, VirConv-T, based on a transformed refinement scheme. Finally, we develop a semi-supervised pipeline, VirConv-S, based on a pseudo-label framework. On the KITTI car 3D detection test leaderboard, our VirConv-L achieves 85% AP with a fast running speed of 56ms. Our VirConv-T and VirConv-S attain a high precision of 86.3% and 87.2% AP, and currently rank 2nd and 1st, respectively. The code is available at https://github.com/hailanyi/VirConv.
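The StVD idea in the abstract (randomly discarding a large fraction of nearby, redundant voxels while keeping all distant ones) can be illustrated with a minimal sketch. Note that the function name, the near-range radius, and the keep ratio below are illustrative assumptions, not the paper's actual design or hyperparameters:

```python
import numpy as np

def stochastic_voxel_discard(voxel_centers, near_radius=30.0, keep_ratio=0.2, seed=None):
    """Illustrative StVD sketch: keep every distant voxel, but retain only a
    random fraction (keep_ratio) of voxels closer than near_radius to the sensor.

    voxel_centers: (N, 3) array of voxel center coordinates in LiDAR frame.
    """
    rng = np.random.default_rng(seed)
    # Distance from the sensor in the ground (x, y) plane.
    dist = np.linalg.norm(voxel_centers[:, :2], axis=1)
    near = dist < near_radius
    # Distant voxels are always kept; nearby ones survive with prob. keep_ratio.
    keep = ~near | (rng.random(len(voxel_centers)) < keep_ratio)
    return voxel_centers[keep]
```

Since virtual points from depth completion are densest close to the sensor, subsampling only the near range cuts most of the redundant computation while leaving the sparse far-range voxels, which matter most for detection recall, untouched.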
Pages: 21653-21662
Page count: 10
Related Papers
50 items in total
  • [21] FSD V2: Improving Fully Sparse 3D Object Detection With Virtual Voxels
    Fan, Lue
    Wang, Feng
    Wang, Naiyan
    Zhang, Zhaoxiang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (02) : 1279 - 1292
  • [22] MonoDCN: Monocular 3D object detection based on dynamic convolution
    Qu, Shenming
    Yang, Xinyu
    Gao, Yiming
    Liang, Shengbin
    PLOS ONE, 2022, 17 (10):
  • [23] 3D object detection based on sparse convolution neural network and feature fusion for autonomous driving in smart cities
    Wang, Lei
    Fan, Xiaoyun
    Chen, Jiahao
    Cheng, Jun
    Tan, Jun
    Ma, Xiaoliang
    SUSTAINABLE CITIES AND SOCIETY, 2020, 54 (54)
  • [24] FusionPainting: Multimodal Fusion with Adaptive Attention for 3D Object Detection
    Xu, Shaoqing
    Zhou, Dingfu
    Fang, Jin
    Yin, Junbo
    Bin, Zhou
    Zhang, Liangjun
    2021 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2021, : 3047 - 3054
  • [25] Real-Time Multimodal 3D Object Detection with Transformers
    Liu, Hengsong
    Duan, Tongle
    WORLD ELECTRIC VEHICLE JOURNAL, 2024, 15 (07):
  • [26] VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
    Chen, Yukang
    Liu, Jianhui
    Zhang, Xiangyu
    Qi, Xiaojuan
    Jia, Jiaya
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 21674 - 21683
  • [27] MVX-Net: Multimodal VoxelNet for 3D Object Detection
    Sindagi, Vishwanath A.
    Zhou, Yin
    Tuzel, Oncel
    2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2019, : 7276 - 7282
  • [28] Multimodal Sparse Features for Object Detection
    Haker, Martin
    Martinetz, Thomas
    Barth, Erhardt
    ARTIFICIAL NEURAL NETWORKS - ICANN 2009, PT II, 2009, 5769 : 923 - 932
  • [29] Sparse2Dense: Learning to Densify 3D Features for 3D Object Detection
    Wang, Tianyu
    Hu, Xiaowei
    Liu, Zhengzhe
    Fu, Chi-Wing
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [30] Monocular 3D Object Detection Utilizing Auxiliary Learning With Deformable Convolution
    Chen, Jiun-Han
    Shieh, Jeng-Lun
    Haq, Muhamad Amirul
    Ruan, Shanq-Jang
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (03) : 2424 - 2436