Optimized voxel transformer for 3D detection with spatial-semantic feature aggregation

Cited by: 3
Authors
Li, Yingfei [1]
Affiliation
[1] Univ Toronto, Toronto, ON, Canada
Keywords
Artificial intelligence; 3D object detection; Point cloud; Single-stage object detector
DOI
10.1016/j.compeleceng.2023.109023
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose a 3D object detection model that combines the Voxel Transformer (VoTr) with the Confident IoU-Aware Single-Stage Object Detector (CIA-SSD) to address the challenges of detecting objects in 3D point clouds. The model adopts VoTr as its backbone, which enables long-range interactions between voxels via self-attention. This overcomes a limitation of conventional voxel-based 3D detectors, which struggle to capture sufficient contextual information because of their restricted receptive fields. The model also integrates the sparse voxel module and the submanifold voxel module, which efficiently process empty and non-empty voxel positions, handling the natural sparsity of the voxel grid and its large number of non-empty voxels. Moreover, inspired by the CIA-SSD design, the model incorporates the Spatial-Semantic Feature Aggregation (SSFA) module, which adaptively fuses high-level abstract semantic features with low-level spatial features to support accurate prediction of bounding boxes and classification confidence. Furthermore, building on the IoU-aware confidence rectification module, which refines the alignment between confidence scores and localization accuracy, we devise an Optimized RPN (Region Proposal Network) Detection Head module as a dense head to further predict the IoU loss and improve accuracy. By combining these two state-of-the-art techniques, we provide a precise and efficient solution for 3D object detection in point clouds. We evaluate the model on the KITTI dataset and achieve an AP3D of 76.56% at the Hard difficulty level.
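As a rough illustration of the IoU-aware confidence rectification idea mentioned in the abstract, the sketch below assumes a multiplicative form in which each classification score is scaled by the predicted IoU raised to a power. The function name `rectify_confidence`, the exponent `beta`, and its default value are hypothetical choices for this example and are not taken from the paper; the paper's actual detection head may differ.

```python
def rectify_confidence(cls_scores, iou_preds, beta=4.0):
    """Scale classification scores by predicted IoU raised to beta.

    Assumption: rectified = score * iou ** beta. The names and the
    default beta are illustrative only, not the paper's settings.
    """
    if len(cls_scores) != len(iou_preds):
        raise ValueError("cls_scores and iou_preds must have equal length")
    rectified = []
    for score, iou in zip(cls_scores, iou_preds):
        # Clamp the predicted IoU into [0, 1] before exponentiation.
        iou = min(max(iou, 0.0), 1.0)
        rectified.append(score * iou ** beta)
    return rectified


# Two candidate boxes: the first has the higher raw score but poor
# predicted localization, the second a slightly lower score but a
# much better predicted IoU.
scores = [0.9, 0.8]
ious = [0.5, 0.9]
rectified = rectify_confidence(scores, ious, beta=2.0)

# Index of the box that ranks first after rectification.
best = max(range(len(rectified)), key=lambda i: rectified[i])
```

Under these assumed inputs, the well-localized second box overtakes the first after rectification, which is the kind of alignment between confidence and localization accuracy the abstract refers to.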
Pages: 10