DVST: Deformable Voxel Set Transformer for 3D Object Detection from Point Clouds

Cited by: 0
Authors
Ning, Yaqian [1]
Cao, Jie [1,2]
Bao, Chun [1]
Hao, Qun [1,2,3]
Affiliations
[1] Beijing Inst Technol, Sch Opt & Photon, Beijing 100081, Peoples R China
[2] Beijing Inst Technol, Yangtze Delta Reg Acad, Jiaxing 314003, Peoples R China
[3] Changchun Univ Sci & Technol, Sch Optoelect Engn, Changchun 130022, Peoples R China
Keywords
3D object detection; deformable mechanism; transformer; point clouds
DOI
10.3390/rs15235612
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
The use of transformer backbones in LiDAR point-cloud-based 3D object detection has recently gained significant interest. The larger receptive field of a transformer backbone improves its representation capability but also causes excessive attention to be paid to background regions. To solve this problem, we propose deformable voxel set attention and use it to build a deformable voxel set transformer (DVST) backbone for 3D object detection from point clouds. DVST aims to combine the flexible receptive field of the deformable mechanism with the powerful context modeling capability of the transformer. Specifically, we introduce the deformable mechanism into voxel-based set attention to selectively shift the candidate keys and values of foreground queries toward important regions. An offset generation module learns the offsets of the foreground queries. Furthermore, we present a globally responsive convolutional feed-forward network with a residual connection to capture global feature interactions in hidden space. We validate DVST on the KITTI and Waymo Open datasets by constructing single-stage and two-stage models. The results indicate that DVST improves the average precision of the baseline model while preserving computational efficiency, achieving performance comparable to state-of-the-art methods.
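The core idea in the abstract, shifting the candidate keys and values of foreground queries by learned offsets before attending, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `deformable_attention`, the linear offset generator `w_offset`, the `tanh` bound on offsets, the foreground mask, and the k-nearest-voxel candidate selection are all assumptions chosen for illustration, and the sketch uses plain single-head dot-product attention rather than the paper's voxel-based set attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def deformable_attention(q_feats, q_pos, kv_feats, kv_pos, w_offset, fg_mask, k=4):
    """Toy single-head attention with offsets for foreground queries.

    q_feats: (N, C) query features      q_pos:  (N, 3) query positions
    kv_feats: (M, C) voxel features     kv_pos: (M, 3) voxel positions
    w_offset: (C, 3) hypothetical linear offset generator
    fg_mask: (N,) bool, True for queries treated as foreground
    """
    C = q_feats.shape[1]
    # Assumed offset generator: a bounded linear map of the query feature.
    # Background queries get a zero offset and keep their original position.
    offsets = np.tanh(q_feats @ w_offset) * fg_mask[:, None]
    ref = q_pos + offsets                                   # (N, 3)
    # Candidate keys/values: the k voxels nearest to each shifted position.
    dist = np.linalg.norm(ref[:, None, :] - kv_pos[None, :, :], axis=-1)
    idx = np.argsort(dist, axis=1)[:, :k]                   # (N, k)
    cand = kv_feats[idx]                                    # (N, k, C)
    # Scaled dot-product attention over the candidates only.
    scores = np.einsum("nc,nkc->nk", q_feats, cand) / np.sqrt(C)
    attn = softmax(scores, axis=-1)
    out = np.einsum("nk,nkc->nc", attn, cand)
    return out, offsets, attn

# Small synthetic example (random data, for shape checking only).
rng = np.random.default_rng(0)
N, M, C = 4, 16, 8
q_feats = rng.normal(size=(N, C))
q_pos = rng.uniform(0, 10, size=(N, 3))
kv_feats = rng.normal(size=(M, C))
kv_pos = rng.uniform(0, 10, size=(M, 3))
w_offset = rng.normal(scale=0.1, size=(C, 3))
fg_mask = np.array([True, False, True, False])
out, offsets, attn = deformable_attention(
    q_feats, q_pos, kv_feats, kv_pos, w_offset, fg_mask, k=4
)
```

In this sketch only the foreground queries move their sampling location, so background queries attend to their own neighborhood, which mirrors the abstract's description at a very coarse level.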
Pages: 23