Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds

Cited by: 80
Authors
He, Chenhang [1]
Li, Ruihuang [1]
Li, Shuai [1]
Zhang, Lei [1]
Affiliations
[1] Hong Kong Polytechnic University, Hong Kong, People's Republic of China
DOI
10.1109/CVPR52688.2022.00823
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transformers have demonstrated promising performance in many 2D vision tasks. However, computing self-attention on large-scale point cloud data is cumbersome because a point cloud is a long sequence that is unevenly distributed in 3D space. To address this, existing methods usually either compute self-attention locally by grouping the points into clusters of the same size, or perform convolutional self-attention on a discretized representation. The former results in stochastic point dropout, while the latter typically has narrow attention fields. In this paper, we propose a novel voxel-based architecture, the Voxel Set Transformer (VoxSeT), which detects 3D objects from point clouds by means of set-to-set translation. VoxSeT is built upon a voxel-based set attention (VSA) module, which reduces the self-attention in each voxel to two cross-attentions and models features in a hidden space induced by a group of latent codes. With the VSA module, VoxSeT can handle voxelized point clusters of arbitrary size over a wide range and process them in parallel with linear complexity. VoxSeT combines the high performance of transformers with the efficiency of voxel-based models, and can serve as an alternative to convolutional and point-based backbones. VoxSeT reports competitive results on the KITTI and Waymo detection benchmarks. The source code can be found at https://github.com/skyhehe123/VoxSet.
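The two-cross-attention idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, no multiple heads, and no feed-forward layers, and the names `cross_attention`, `voxel_set_attention`, `points`, `voxel_ids`, and `latent_codes` are hypothetical. It only shows how a small set of k latent codes lets each voxel be processed at a cost proportional to its point count rather than to the square of that count.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(dim)])
    return out


def voxel_set_attention(points, voxel_ids, latent_codes):
    """Illustrative sketch (assumption, not the paper's code).

    points:       n feature vectors of length d
    voxel_ids:    n voxel indices, one per point
    latent_codes: k vectors of length d shared by all voxels
    Returns n updated feature vectors, in the input order.
    """
    # Group point indices by voxel; voxels may hold any number of points.
    groups = defaultdict(list)
    for idx, vid in enumerate(voxel_ids):
        groups[vid].append(idx)

    output = [None] * len(points)
    for members in groups.values():
        feats = [points[i] for i in members]
        # Cross-attention 1: latent codes query the voxel's points,
        # giving k hidden features regardless of the voxel's size.
        hidden = cross_attention(latent_codes, feats, feats)
        # Cross-attention 2: each point queries the k hidden features.
        updated = cross_attention(feats, hidden, hidden)
        for i, row in zip(members, updated):
            output[i] = row
    return output


# Toy input: voxel 0 holds three points, voxel 1 holds one point.
points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
voxel_ids = [0, 0, 0, 1]
latent_codes = [[1.0, 0.0], [0.0, 1.0]]
out = voxel_set_attention(points, voxel_ids, latent_codes)
```

In this sketch each voxel with m points costs on the order of m × k × d operations, so the total grows linearly with the number of points when k is fixed. A voxel containing a single point returns that point's features unchanged, since both attention steps average over identical vectors.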
Pages: 8407-8417
Page count: 11