Virtual Sparse Convolution for Multimodal 3D Object Detection

Cited by: 70
Authors
Wu, Hai [1]
Wen, Chenglu [1]
Shi, Shaoshuai [2]
Li, Xin [3]
Wang, Cheng [1]
Affiliations
[1] Xiamen Univ, Xiamen, Peoples R China
[2] Max Planck Inst, Munich, Germany
[3] Texas A&M Univ, College Stn, TX 77843 USA
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52729.2023.02074
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Recently, virtual/pseudo-point-based 3D object detection, which fuses RGB images and LiDAR data through depth completion, has gained great attention. However, virtual points generated from an image are very dense, introducing a huge amount of redundant computation during detection. Meanwhile, noise introduced by inaccurate depth completion significantly degrades detection precision. This paper proposes a fast yet effective backbone, termed VirConvNet, based on a new operator, VirConv (Virtual Sparse Convolution), for virtual-point-based 3D object detection. VirConv consists of two key designs: (1) StVD (Stochastic Voxel Discard) and (2) NRConv (Noise-Resistant Sub-manifold Convolution). StVD alleviates the computation problem by discarding large amounts of nearby redundant voxels. NRConv tackles the noise problem by encoding voxel features in both 2D image and 3D LiDAR space. By integrating VirConv, we first develop an efficient pipeline, VirConv-L, based on an early fusion design. Then, we build a high-precision pipeline, VirConv-T, based on a transformed refinement scheme. Finally, we develop a semi-supervised pipeline, VirConv-S, based on a pseudo-label framework. On the KITTI car 3D detection test leaderboard, VirConv-L achieves 85% AP with a fast running time of 56 ms. VirConv-T and VirConv-S attain high precision of 86.3% and 87.2% AP, and currently rank 2nd and 1st, respectively. The code is available at https://github.com/hailanyi/VirConv.
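The abstract says only that StVD discards large amounts of nearby redundant voxels. The idea can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the function name `stochastic_voxel_discard`, the `near_range` and `discard_rate` parameters and their default values, and the use of horizontal distance from the sensor to decide what counts as "nearby".

```python
import math
import random


def stochastic_voxel_discard(voxels, near_range=30.0, discard_rate=0.9, seed=None):
    """Randomly drop a fraction of the voxels that lie close to the sensor.

    voxels:       list of (x, y, z) voxel centres, sensor at the origin.
    near_range:   hypothetical distance threshold (metres) for "nearby".
    discard_rate: hypothetical probability of dropping each nearby voxel.
    """
    rng = random.Random(seed)
    kept = []
    for x, y, z in voxels:
        distance = math.hypot(x, y)  # horizontal distance from the sensor
        if distance < near_range and rng.random() < discard_rate:
            continue  # nearby voxel is discarded
        kept.append((x, y, z))  # distant voxels are always kept
    return kept


# Two nearby and two distant voxel centres (made-up values).
voxels = [(5.0, 0.0, 0.0), (10.0, 2.0, 0.5), (50.0, 1.0, 0.0), (70.0, -3.0, 1.0)]
kept = stochastic_voxel_discard(voxels, near_range=30.0, discard_rate=0.9, seed=0)
```

In this sketch distant voxels always survive, so the sparse far-field geometry is untouched while the dense near-field is thinned. The linked repository is the authoritative source for how the authors actually implement StVD.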
Pages: 21653-21662
Page count: 10
Related papers
50 in total
  • [1] PVConvNet: Pixel-Voxel Sparse Convolution for multimodal 3D object detection
    Liu, Huaijin
    Du, Jixiang
    Zhang, Yong
    Zhang, Hongbo
    Zeng, Jiandian
    PATTERN RECOGNITION, 2024, 149
  • [2] DROP SPARSE CONVOLUTION FOR 3D OBJECT DETECTION
    Zhu, Taohong
    Shen, Jun
    Wang, Chali
    Xiong, Huiyuan
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 3185 - 3189
  • [3] Spatial Pruned Sparse Convolution for Efficient 3D Object Detection
    Liu, Jianhui
    Chen, Yukang
    Ye, Xiaoqing
    Tian, Zhuotao
    Tan, Xiao
    Qi, Xiaojuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [4] Multimodal 3D Object Detection Based on Sparse Interaction in Internet of Vehicles
    Li, Hui
    Ge, Tongao
    Bai, Keqiang
    Nie, Gaofeng
    Xu, Lingwei
    Ai, Xiaoxue
    Cao, Song
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2025, 74 (02) : 2174 - 2186
  • [5] MULTI-DIMENSIONAL PRUNED SPARSE CONVOLUTION FOR EFFICIENT 3D OBJECT DETECTION
    Li, Linye
    Yue, Xiaodong
    Xu, Zhikang
    Xie, Shaorong
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 3190 - 3194
  • [6] VirPNet: A Multimodal Virtual Point Generation Network for 3D Object Detection
    Wang, Lin
    Sun, Shiliang
    Zhao, Jing
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 10597 - 10609
  • [7] Super Sparse 3D Object Detection
    Fan, Lue
    Yang, Yuxue
    Wang, Feng
    Wang, Naiyan
    Zhang, Zhaoxiang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (10) : 12490 - 12505
  • [8] Fully Sparse 3D Object Detection
    Fan, Lue
    Wang, Feng
    Wang, Naiyan
    Zhang, Zhaoxiang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [9] Sparse Embedded Convolution Based Dual Feature Aggregation 3D Object Detection Network
    Li, Hai-Sheng
    Lu, Yan-Ling
    NEURAL PROCESSING LETTERS, 2024, 56 (01)
  • [10] Sparse Embedded Convolution Based Dual Feature Aggregation 3D Object Detection Network
    Hai-Sheng Li
    Yan-Ling Lu
    Neural Processing Letters, 56