CasFormer: Cascaded Transformer Based on Dynamic Voxel Pyramid for 3D Object Detection from Point Clouds

Cited by: 0

Authors:

Li, Xinglong [1]

Zhang, Xiaowei [1]

Affiliations:

[1] Qingdao Univ, Sch Comp Sci & Technol, Qingdao, Peoples R China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT III | 2024 / Vol. 14427

Keywords:

3-D object detection; Point clouds; Cascaded network;

DOI:

10.1007/978-981-99-8435-0_24

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, Transformers have been widely applied in 3-D object detection to model global contextual relationships in point cloud collections or for proposal refinement. However, the structural information in 3-D point clouds, especially for distant and small objects, is often incomplete, which makes accurate detection difficult for these methods. To address this issue, we propose a Cascaded Transformer based on Dynamic Voxel Pyramid (called CasFormer) for 3-D object detection from LiDAR point clouds. Specifically, we dynamically spread relevant features from the voxel pyramid based on the sparsity of each region of interest (RoI), capturing richer semantic information for structurally incomplete objects. Furthermore, a cross-stage attention mechanism is employed to cascade the refined results of the Transformer stage by stage, as well as to improve the training convergence of the Transformer. Extensive experiments demonstrate that our CasFormer achieves strong performance on the KITTI dataset and the Waymo Open Dataset. Compared to CT3D, our method improves by 1.12% and 1.27% on the moderate and hard levels of car detection, respectively, on the KITTI online 3-D object detection leaderboard.

Pages: 299-311

Page count: 13
