SF3D: SlowFast Temporal 3D Object Detection

Authors
Wang, Renhao [1, 2]
Yu, Zhiding [2]
Lan, Shiyi [2]
Xie, Enze [2, 3]
Chen, Ke [2]
Anandkumar, Anima [2, 4]
Alvarez, Jose M. [2]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] NVIDIA, Santa Clara, CA USA
[3] Univ Hong Kong, Hong Kong, Peoples R China
[4] CALTECH, Pasadena, CA 91125 USA
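Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract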
Leveraging inputs over multiple consecutive frames has been shown to benefit 3D object detection. However, existing approaches often demonstrate unsatisfactory scaling with increasing temporal histories. In this work, we propose SF3D, a late fusion module which addresses this issue by better modeling temporal relationships via a two-stream factorization. Concretely, SF3D operates on an input sequence of consecutive bird's-eye view (BEV) features, which is partitioned into "short-term" and "long-term" frames. A more heavily parameterized short-term branch using adapters and deformable attention aggregates features closer to the current timestep. In parallel, a long-term branch composed of efficiently implemented global convolution layers aggregates a larger window of temporally distant historical features. This two-stream paradigm allows SF3D to effectively consume near-term information, while scaling to efficiently leverage longer historical windows. We show that SF3D works with arbitrary upstream BEV encoders and downstream detectors, achieving improvements over recent state-of-the-art on the Waymo Open and nuScenes benchmarks.
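The two-stream partition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the flattened-list frame representation, the mean used in place of adapters and deformable attention, the single time-spanning kernel used in place of the global convolution layers, and the additive fusion are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Frame = List[float]  # toy stand-in for one flattened BEV feature map


def split_history(frames: Sequence[Frame], num_short: int) -> Tuple[List[Frame], List[Frame]]:
    """Split frames (oldest first) into long-term and short-term groups.

    The `num_short` most recent frames form the short-term group.
    """
    if not 0 < num_short <= len(frames):
        raise ValueError("num_short must be in [1, len(frames)]")
    cut = len(frames) - num_short
    return list(frames[:cut]), list(frames[cut:])


def short_term_branch(frames: Sequence[Frame]) -> Frame:
    """Assumed stand-in for the heavier branch: an element-wise mean."""
    n = len(frames)
    return [sum(values) / n for values in zip(*frames)]


def long_term_branch(frames: Sequence[Frame], kernel: Sequence[float]) -> Frame:
    """Assumed stand-in for the lighter branch.

    One kernel spans the whole long-term window (a convolution over time,
    evaluated at the latest step). kernel[0] weights the newest frame.
    """
    if len(kernel) != len(frames):
        raise ValueError("kernel must cover the whole long-term window")
    out = [0.0] * len(frames[0])
    for weight, frame in zip(kernel, reversed(frames)):
        for i, value in enumerate(frame):
            out[i] += weight * value
    return out


def fuse(frames: Sequence[Frame], num_short: int, kernel: Sequence[float]) -> Frame:
    """Run both branches and combine them by addition (an assumption)."""
    long_frames, short_frames = split_history(frames, num_short)
    short_out = short_term_branch(short_frames)
    if not long_frames:  # no distant history available
        return short_out
    long_out = long_term_branch(long_frames, kernel)
    return [s + l for s, l in zip(short_out, long_out)]


# Four toy frames, oldest first; the last two are treated as short-term.
history = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
fused = fuse(history, num_short=2, kernel=[0.5, 0.25])
```

In this sketch, lengthening the history only adds one weighted sum per extra long-term frame, while the costlier short-term computation stays confined to the most recent frames, which mirrors the scaling argument made in the abstract.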
Pages: 1280-1285
Number of pages: 6