Optimized voxel transformer for 3D detection with spatial-semantic feature aggregation

Cited by: 3

Authors:

Li, Yingfei [1]

Affiliations:

[1] Univ Toronto, Toronto, ON, Canada

Source:

COMPUTERS & ELECTRICAL ENGINEERING | 2023 / Vol. 112

Keywords:

Artificial intelligence; 3D object detection; Point cloud; Single stage object detector

DOI:

10.1016/j.compeleceng.2023.109023

Chinese Library Classification number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

In this paper, we propose a 3D object detection model that combines the Voxel Transformer (VoTr) with the Confident IoU-Aware Single-Stage Object Detector (CIA-SSD) to address the challenges of detecting objects in 3D point clouds. Our model adopts VoTr as its backbone, which enables long-range interactions between voxels via a self-attention mechanism. This overcomes a limitation of conventional voxel-based 3D detectors, which struggle to capture sufficient contextual information because of their restricted receptive fields. Our model also integrates the sparse voxel module and the submanifold voxel module, which efficiently process empty and non-empty voxel positions and thereby handle the natural sparsity of point clouds and the large number of non-empty voxels. Moreover, inspired by the CIA-SSD design, our model incorporates the Spatial-Semantic Feature Aggregation (SSFA) module, which adaptively fuses high-level abstract semantic features with low-level spatial features to support accurate prediction of bounding boxes and classification confidence. Furthermore, building on the IoU-aware confidence rectification module, which refines the alignment between confidence scores and localization accuracy, we devise an Optimized RPN (Region Proposal Network) Detection Head module as a dense head to further predict the IoU loss and improve accuracy. By combining these two state-of-the-art techniques, we provide a precise and efficient solution for 3D object detection in point clouds. We evaluate our model on the KITTI dataset and achieve 76.56% in terms of AP3D on the Hard difficulty level.
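
The abstract mentions IoU-aware confidence rectification without giving its formula. As a minimal sketch only, the snippet below assumes one commonly described form, in which the classification score is multiplied by the predicted IoU raised to a power. The function name `rectify_confidence`, the exponent `beta`, and the sample scores are hypothetical illustrations and are not taken from the paper.

```python
def rectify_confidence(cls_scores, iou_preds, beta=4.0):
    """Return cls_score * iou_pred**beta for each box.

    Assumed formulation for illustration; beta is a hypothetical
    hyperparameter, not a value reported in the abstract.
    """
    if len(cls_scores) != len(iou_preds):
        raise ValueError("cls_scores and iou_preds must have the same length")
    rectified = []
    for score, iou in zip(cls_scores, iou_preds):
        iou = min(max(iou, 0.0), 1.0)  # clamp the predicted IoU to [0, 1]
        rectified.append(score * iou ** beta)
    return rectified


# Two hypothetical boxes: the first has the higher classification score
# but a poor predicted IoU, so it drops below the second after rectification.
cls_scores = [0.90, 0.85]
iou_preds = [0.50, 0.95]
scores = rectify_confidence(cls_scores, iou_preds)

# Box indices sorted by rectified confidence, best first.
ranking = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
print(ranking)
```

Under this assumption, a box that is confidently classified but poorly localized is ranked lower before non-maximum suppression, which is the alignment between confidence and localization accuracy that the abstract describes.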


Pages: 10

