TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection

Cited by: 7
Authors
Luo, Zhipeng [1 ,3 ]
Zhang, Gongjie [1 ]
Zhou, Changqing [1 ,3 ]
Liu, Tianrui [1 ,3 ]
Lu, Shijian [1 ]
Pan, Liang [2 ]
Affiliations
[1] Nanyang Technological University, Singapore
[2] Nanyang Technological University, S-Lab, Singapore
[3] SenseTime Research, Hong Kong, People's Republic of China
DOI
10.1109/WACV56688.2023.00421
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
3D object detection using point clouds has attracted increasing attention due to its wide applications in autonomous driving and robotics. However, most existing studies focus on single point cloud frames without harnessing the temporal information in point cloud sequences. In this paper, we design TransPillars, a novel transformer-based feature aggregation technique that exploits temporal features of consecutive point cloud frames for multi-frame 3D object detection. TransPillars aggregates spatial-temporal point cloud features from two perspectives. First, it fuses voxel-level features directly from multi-frame feature maps, rather than pooled instance features, to preserve the instance details and contextual information that are essential to accurate object localization. Second, it introduces a hierarchical coarse-to-fine strategy that fuses multi-scale features progressively, so that it effectively captures the motion of moving objects and guides the aggregation of fine features. In addition, a variant of the deformable transformer is introduced to improve the effectiveness of cross-frame feature matching. Extensive experiments show that the proposed TransPillars achieves state-of-the-art performance compared with existing multi-frame detection approaches.
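The coarse-to-fine idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a 1D grid of feature vectors in place of a bird's-eye-view feature map, two frames, two scales, and plain dot-product attention over a small window in place of the paper's deformable transformer variant. All function names (`pool`, `attend`, `aggregate_level`, `coarse_to_fine`, `make_frame`) are hypothetical.

```python
import math

def pool(feats):
    """Halve the resolution by element-wise max over adjacent cell pairs."""
    return [[max(a, b) for a, b in zip(feats[2 * i], feats[2 * i + 1])]
            for i in range(len(feats) // 2)]

def attend(query, keys, positions):
    """Softmax dot-product attention of one query over candidate keys.
    Returns the attended feature and the attention-weighted mean position."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    feat = [sum(w * key[d] for w, key in zip(weights, keys))
            for d in range(len(query))]
    mean_pos = sum(w * p for w, p in zip(weights, positions))
    return feat, mean_pos

def aggregate_level(cur, prev, centers, radius):
    """Each current-frame cell i attends to previous-frame cells within
    `radius` of centers[i]. Returns fused features and estimated offsets."""
    fused, offsets = [], []
    for i, query in enumerate(cur):
        c = min(max(int(round(centers[i])), 0), len(prev) - 1)
        positions = list(range(max(0, c - radius), min(len(prev) - 1, c + radius) + 1))
        feat, mean_pos = attend(query, [prev[p] for p in positions], positions)
        fused.append([a + b for a, b in zip(query, feat)])  # residual fusion
        offsets.append(mean_pos - i)  # where the matching content was found
    return fused, offsets

def coarse_to_fine(cur_fine, prev_fine, radius=1):
    """Match at the coarse scale first, then use the coarse offsets to
    re-center the fine-scale search window."""
    cur_coarse, prev_coarse = pool(cur_fine), pool(prev_fine)
    _, coarse_off = aggregate_level(
        cur_coarse, prev_coarse, list(range(len(cur_coarse))), radius)
    # Fine cell i lies in coarse cell i // 2; a coarse offset spans 2 fine cells.
    centers = [i + 2 * coarse_off[min(i // 2, len(coarse_off) - 1)]
               for i in range(len(cur_fine))]
    return aggregate_level(cur_fine, prev_fine, centers, radius)

def make_frame(length, obj_index):
    """Toy frame: one 'object' cell with a strong feature, zeros elsewhere."""
    return [[5.0, 0.0] if i == obj_index else [0.0, 0.0] for i in range(length)]

prev_frame = make_frame(8, 2)  # object at cell 2 in the previous frame
cur_frame = make_frame(8, 5)   # object has moved to cell 5
fused, offsets = coarse_to_fine(cur_frame, prev_frame, radius=1)
print(round(offsets[5], 3))
```

In this toy setup the object moves three fine cells, which is outside a fine-scale window of radius 1. The coarse pass still finds it, because the same motion is only one cell at half resolution, and the resulting offset steers the fine-scale window to the right place.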
Pages: 4219-4228
Page count: 10