Optimized voxel transformer for 3D detection with spatial-semantic feature aggregation

Author
Li, Yingfei [1]
Affiliation
[1] Univ Toronto, Toronto, ON, Canada
Keywords
Artificial intelligence; 3D object detection; Point cloud; Single-stage object detector
DOI
10.1016/j.compeleceng.2023.109023
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
In this paper, we propose a 3D object detection model that combines the strengths of the Voxel Transformer (VoTr) and the Confident IoU-Aware Single-Stage Object Detector (CIA-SSD) to address the challenges of detecting objects in 3D point clouds. The model adopts VoTr as its backbone, which enables long-range interactions between voxels through a self-attention mechanism. This overcomes a limitation of conventional voxel-based 3D detectors, which struggle to capture sufficient contextual information because of their restricted receptive fields. The model also integrates VoTr's sparse voxel module and submanifold voxel module, which together process features at empty and non-empty voxel positions efficiently, coping with both the inherent sparsity of the voxel grid and the large number of non-empty voxels. Following the CIA-SSD design, the model further incorporates the Spatial-Semantic Feature Aggregation (SSFA) module, which adaptively fuses high-level abstract semantic features with low-level spatial features, enabling accurate prediction of bounding boxes and classification confidence. Building on CIA-SSD's IoU-aware confidence rectification module, which improves the alignment between confidence scores and localization accuracy, we design an Optimized RPN (Region Proposal Network) Detection Head that serves as a dense head and additionally predicts the IoU loss to further improve accuracy. By combining these two state-of-the-art techniques, the model offers a precise and efficient solution for 3D object detection in point clouds. Evaluated on the KITTI dataset, it achieves an AP3D of 76.56% at the Hard difficulty level.
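The abstract names three mechanisms: voxel self-attention in the VoTr backbone, adaptive spatial/semantic fusion in SSFA, and IoU-aware confidence rectification. The following is a minimal pure-Python sketch of each under stated assumptions; it is not the author's implementation, and all function names are hypothetical. The attention omits VoTr's learned projections, multiple heads and positional encodings, and takes the set of attending voxels as given. The SSFA fusion assumes the per-location branch logits are already computed (in CIA-SSD they come from a convolution on each branch). The rectification form, score multiplied by the clipped predicted IoU raised to a power beta with beta = 4, is an assumption.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def voxel_self_attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention of one query voxel over a set
    of attending voxels. Simplification (assumed): no learned projections or
    positional encodings, and the attending set is passed in, not sampled."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Output is the attention-weighted average of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def ssfa_fuse(spatial: Vector, semantic: Vector,
              spatial_logit: float, semantic_logit: float) -> Vector:
    """Attentional fusion at one BEV location (assumed simplification of SSFA):
    the two branch logits are softmax-normalized into weights summing to 1,
    then used to blend the spatial and semantic feature vectors."""
    w_spa, w_sem = softmax([spatial_logit, semantic_logit])
    return [w_spa * a + w_sem * b for a, b in zip(spatial, semantic)]


def rectify_confidence(cls_score: float, iou_pred: float, beta: float = 4.0) -> float:
    """IoU-aware confidence rectification (assumed form): scale the
    classification score by the predicted IoU, clipped to [0, 1], raised to beta."""
    iou = min(max(iou_pred, 0.0), 1.0)
    return cls_score * iou ** beta


if __name__ == "__main__":
    # Query aligned with the first key, so the output is close to the first value.
    print(voxel_self_attention([1.0, 0.0], [[10.0, 0.0], [0.0, 10.0]],
                               [[1.0, 1.0], [5.0, 5.0]]))
    # Equal logits give a plain average of the two branches: [2.0, 3.0].
    print(ssfa_fuse([1.0, 2.0], [3.0, 4.0], 0.0, 0.0))
    # Same class score, better predicted IoU, higher rectified confidence.
    print(rectify_confidence(0.9, 0.5), rectify_confidence(0.9, 0.9))
```

Because the fusion weights come from a softmax, they always sum to 1, so the network can shift emphasis between fine spatial detail and abstract semantics per location without changing the feature scale. Likewise, rectification leaves a box with a high predicted IoU nearly unchanged while sharply suppressing a confidently classified but poorly localized one.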
Pages: 10