Optimized voxel transformer for 3D detection with spatial-semantic feature aggregation

Cited by: 3
Authors
Li, Yingfei [1]
Affiliations
[1] University of Toronto, Toronto, ON, Canada
Keywords
Artificial intelligence; 3D object detection; Point cloud; Single stage object detector
DOI
10.1016/j.compeleceng.2023.109023
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose a novel 3D object detection model that combines the strengths of the Voxel Transformer (VoTr) and the Confident IoU-Aware Single-Stage Object Detector (CIA-SSD) to address the challenges of detecting objects in 3D point clouds. Our model adopts VoTr as its backbone, which enables long-range interactions between voxels through a self-attention mechanism. This overcomes a limitation of conventional voxel-based 3D detectors, whose restricted receptive fields prevent them from capturing sufficient contextual information. Our model also integrates the sparse voxel module and the submanifold voxel module, which efficiently process empty and non-empty voxel positions and thereby handle both the natural sparsity of point clouds and the large number of non-empty voxels. Moreover, inspired by the CIA-SSD design, our model incorporates the Spatial-Semantic Feature Aggregation (SSFA) module, which adaptively fuses high-level abstract semantic features with low-level spatial features to support accurate prediction of bounding boxes and classification confidence. Furthermore, building on the IoU-aware confidence rectification module, which improves the alignment between confidence scores and localization accuracy, we devise an Optimized RPN (Region Proposal Network) Detection Head module as a dense head that additionally predicts the IoU loss and improves accuracy. By combining these two state-of-the-art techniques, we provide a precise and efficient solution for 3D object detection in point clouds. We evaluate our model on the KITTI dataset and achieve 76.56% AP3D on the Hard difficulty level.
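The abstract does not give the formula behind IoU-aware confidence rectification. As a minimal sketch, assuming the form commonly associated with CIA-SSD-style detectors, the classification score can be scaled by a power of the predicted IoU so that well-localized boxes rank higher. The function name `rectify_confidence` and the exponent `beta` (including its default value) are illustrative assumptions, not taken from this record.

```python
def rectify_confidence(cls_scores, pred_ious, beta=4.0):
    """Scale each classification score by its predicted IoU raised to beta.

    Assumed form: rectified = cls_score * clamp(pred_iou, 0, 1) ** beta.
    `beta` is a hypothetical hyperparameter chosen for illustration.
    """
    if len(cls_scores) != len(pred_ious):
        raise ValueError("cls_scores and pred_ious must have the same length")
    rectified = []
    for score, iou in zip(cls_scores, pred_ious):
        iou = min(max(iou, 0.0), 1.0)  # keep the predicted IoU in [0, 1]
        rectified.append(score * iou ** beta)
    return rectified


# Two boxes with the same classification score but different predicted IoU:
# the poorly localized box is down-weighted, the well localized one is not.
scores = rectify_confidence([0.9, 0.9], [0.5, 1.0], beta=2.0)
```

With this kind of rescaling, two detections with equal classification confidence are ordered by their predicted localization quality before non-maximum suppression.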
Pages: 10