CasFormer: Cascaded Transformer Based on Dynamic Voxel Pyramid for 3D Object Detection from Point Clouds

Cited by: 0
Authors
Li, Xinglong [1]
Zhang, Xiaowei [1]
Affiliations
[1] Qingdao Univ, Sch Comp Sci & Technol, Qingdao, Peoples R China
Keywords
3-D object detection; Point clouds; Cascaded network;
DOI
10.1007/978-981-99-8435-0_24
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, Transformers have been widely applied in 3-D object detection to model global contextual relationships in point cloud collections or for proposal refinement. However, the structural information in 3-D point clouds, especially for distant and small objects, is often incomplete, which makes accurate detection difficult for these methods. To address this issue, we propose a Cascaded Transformer based on Dynamic Voxel Pyramid (called CasFormer) for 3-D object detection from LiDAR point clouds. Specifically, we dynamically spread relevant features from the voxel pyramid based on the sparsity of each region of interest (RoI), capturing richer semantic information for structurally incomplete objects. Furthermore, a cross-stage attention mechanism is employed to cascade the refined results of the Transformer stage by stage, as well as to improve the training convergence of the Transformer. Extensive experiments demonstrate that our CasFormer achieves progressive performance on the KITTI dataset and the Waymo Open Dataset. Compared to CT3D, our method outperforms it by 1.12% and 1.27% in the moderate and hard levels of car detection, respectively, on the KITTI online 3-D object detection leaderboard.
Pages: 299-311
Number of pages: 13
Related papers
50 in total
  • [1] Planar object detection from 3D point clouds based on pyramid voxel representation
    Hu, Zhaozheng
    Bai, Dongfang
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (22) : 24343 - 24357
  • [2] Planar object detection from 3D point clouds based on pyramid voxel representation
    Zhaozheng Hu
    Dongfang Bai
    [J]. Multimedia Tools and Applications, 2017, 76 : 24343 - 24357
  • [3] DVST: Deformable Voxel Set Transformer for 3D Object Detection from Point Clouds
    Ning, Yaqian
    Cao, Jie
    Bao, Chun
    Hao, Qun
    [J]. REMOTE SENSING, 2023, 15 (23)
  • [4] Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds
    He, Chenhang
    Li, Ruihuang
    Li, Shuai
    Zhang, Lei
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8407 - 8417
  • [5] Voxel Transformer for 3D Object Detection
    Mao, Jiageng
    Xue, Yujing
    Niu, Minzhe
    Bai, Haoyue
    Feng, Jiashi
    Liang, Xiaodan
    Xu, Hang
    Xu, Chunjing
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 3144 - 3153
  • [6] Weakly Supervised Point Clouds Transformer for 3D Object Detection
    Tang, Zuojin
    Sun, Bo
    Ma, Tongwei
    Li, Daosheng
    Xu, Zhenhui
    [J]. 2022 IEEE 25TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2022, : 3948 - 3955
  • [7] Clusterformer: Cluster-based Transformer for 3D Object Detection in Point Clouds
    Pei, Yu
    Zhao, Xian
    Li, Hao
    Ma, Jingyuan
    Zhang, Jingwei
    Pu, Shiliang
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 6641 - 6650
  • [8] HCPVF: Hierarchical Cascaded Point-Voxel Fusion for 3D Object Detection
    Fan, Baojie
    Zhang, Kexin
    Tian, Jiandong
    [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34 (10) : 8997 - 9009
  • [9] SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds
    Sun, Pei
    Tan, Mingxing
    Wang, Weiyue
    Liu, Chenxi
    Xia, Fei
    Leng, Zhaoqi
    Anguelov, Dragomir
    [J]. COMPUTER VISION, ECCV 2022, PT X, 2022, 13670 : 426 - 442
  • [10] Voxel Graph Attention for 3-D Object Detection From Point Clouds
    Lu, Bin
    Sun, Yang
    Yang, Zhenyu
    [J]. IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72