Optimized voxel transformer for 3D detection with spatial-semantic feature aggregation

Cited by: 3
Authors
Li, Yingfei [1]
Affiliation
[1] Univ Toronto, Toronto, ON, Canada
Keywords
Artificial intelligence; 3D object detection; Point cloud; Single stage object detector;
DOI
10.1016/j.compeleceng.2023.109023
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we propose a novel 3D object detection model that combines the strengths of the Voxel Transformer (VoTr) and the Confident IoU-Aware Single-Stage Object Detector (CIA-SSD) to address the challenges of detecting objects in 3D point clouds. The model adopts VoTr as its backbone, which enables long-range interactions between voxels through self-attention. This overcomes a limitation of conventional voxel-based 3D detectors, whose restricted receptive fields prevent them from capturing sufficient contextual information. The model also integrates the sparse voxel module and the submanifold voxel module, which efficiently process empty and non-empty voxel positions and so handle both the natural sparsity of point clouds and the large number of non-empty voxels. Moreover, inspired by the CIA-SSD design, the model incorporates the Spatial-Semantic Feature Aggregation (SSFA) module, which adaptively fuses high-level abstract semantic features with low-level spatial features to support accurate prediction of bounding boxes and classification confidence. Furthermore, building on the IoU-aware confidence rectification module, which refines the alignment between confidence scores and localization accuracy, we devise an Optimized RPN (Region Proposal Network) Detection Head module as a dense head that additionally predicts the IoU loss to improve accuracy. By combining these two state-of-the-art techniques, we provide a precise and efficient solution for 3D object detection in point clouds. We evaluate the model on the KITTI dataset and achieve 76.56% AP3D at the Hard difficulty level.
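The abstract does not give the formula for IoU-aware confidence rectification, so the following is only a minimal sketch under a stated assumption: that the rectified score takes the CIA-SSD-style form of the classification score multiplied by the predicted IoU raised to a power `beta`. The function name `rectify_confidence`, the value `beta=4.0`, and the sample boxes are all hypothetical and chosen for illustration; they are not taken from this paper.

```python
def rectify_confidence(cls_score, pred_iou, beta=4.0):
    """Down-weight a classification score by the predicted IoU.

    Assumed form (not confirmed by the abstract): score * iou ** beta.
    A larger beta penalizes poorly localized boxes more strongly.
    """
    # Clamp the predicted IoU to [0, 1] so the power stays well defined.
    iou = min(max(pred_iou, 0.0), 1.0)
    return cls_score * iou ** beta


# Hypothetical candidates: box_a is confident but poorly localized,
# box_b is slightly less confident but well localized.
candidates = [
    {"name": "box_a", "cls_score": 0.90, "pred_iou": 0.50},
    {"name": "box_b", "cls_score": 0.80, "pred_iou": 0.90},
]

# Rank by rectified confidence, as would be done before NMS.
ranked = sorted(
    candidates,
    key=lambda b: rectify_confidence(b["cls_score"], b["pred_iou"]),
    reverse=True,
)

for box in ranked:
    score = rectify_confidence(box["cls_score"], box["pred_iou"])
    print(box["name"], round(score, 4))
```

With raw classification scores alone, `box_a` would rank first; after rectification the better-localized `box_b` moves ahead, which is the alignment between confidence and localization accuracy that the abstract describes.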
Pages: 10