SF3D: SlowFast Temporal 3D Object Detection

Cited by: 0
Authors
Wang, Renhao [1, 2]
Yu, Zhiding [2]
Lan, Shiyi [2]
Xie, Enze [2, 3]
Chen, Ke [2]
Anandkumar, Anima [2, 4]
Alvarez, Jose M. [2]
Affiliations
[1] Tsinghua University, Beijing, People's Republic of China
[2] NVIDIA, Santa Clara, CA, USA
[3] University of Hong Kong, Hong Kong, People's Republic of China
[4] Caltech, Pasadena, CA 91125, USA
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Leveraging inputs over multiple consecutive frames has been shown to benefit 3D object detection. However, existing approaches often demonstrate unsatisfactory scaling with increasing temporal histories. In this work, we propose SF3D, a late fusion module which addresses this issue by better modeling temporal relationships via a two-stream factorization. Concretely, SF3D operates on an input sequence of consecutive bird's-eye view (BEV) features, which is partitioned into "short-term" and "long-term" frames. A more heavily parameterized short-term branch using adapters and deformable attention aggregates features closer to the current timestep. In parallel, a long-term branch composed of efficiently implemented global convolution layers aggregates a larger window of temporally distant historical features. This two-stream paradigm allows SF3D to effectively consume near-term information, while scaling to efficiently leverage longer historical windows. We show that SF3D works with arbitrary upstream BEV encoders and downstream detectors, achieving improvements over recent state-of-the-art on the Waymo Open and nuScenes benchmarks.
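The two-stream split and the long-term branch described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that "efficiently implemented global convolution" can be approximated by an FFT-based causal convolution along the time axis, and it omits the short-term branch (adapters and deformable attention) entirely. The function names `split_history`, `global_temporal_conv`, and `direct_temporal_conv`, the `num_short` parameter, and the tensor shapes are all hypothetical.

```python
import numpy as np

def split_history(bev_seq, num_short):
    """Partition a (T, C, H, W) BEV sequence, ordered oldest to newest,
    into long-term (older) and short-term (most recent) frames."""
    if not 0 < num_short <= bev_seq.shape[0]:
        raise ValueError("num_short must be in [1, T]")
    return bev_seq[:-num_short], bev_seq[-num_short:]

def global_temporal_conv(x, kernel):
    """Causal convolution over time with a kernel as long as the sequence,
    computed via FFT (assumed stand-in for the paper's global convolution).
    x: (T, C, H, W); kernel: (T, C), one temporal filter per channel."""
    T = x.shape[0]
    n = 2 * T  # zero-pad so circular convolution matches linear convolution
    x_f = np.fft.rfft(x, n=n, axis=0)
    k_f = np.fft.rfft(kernel, n=n, axis=0)[:, :, None, None]
    return np.fft.irfft(x_f * k_f, n=n, axis=0)[:T]

def direct_temporal_conv(x, kernel):
    """Quadratic-time reference used only to check the FFT version."""
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for s in range(t + 1):
            out[t] += kernel[s][:, None, None] * x[t - s]
    return out

# Toy input: 8 frames, 4 channels, 6x6 BEV grid (all sizes are illustrative).
rng = np.random.default_rng(0)
bev = rng.standard_normal((8, 4, 6, 6))
long_term, short_term = split_history(bev, num_short=2)
kernel = rng.standard_normal((long_term.shape[0], 4))
long_feat = global_temporal_conv(long_term, kernel)
```

The FFT route costs O(T log T) per channel and spatial location rather than O(T^2), which is one plausible reason a global convolution could scale to longer histories than attention over every frame.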
Pages: 1280-1285
Page count: 6