SF3D: SlowFast Temporal 3D Object Detection

Authors
Wang, Renhao [1, 2]
Yu, Zhiding [2]
Lan, Shiyi [2]
Xie, Enze [2, 3]
Chen, Ke [2]
Anandkumar, Anima [2, 4]
Alvarez, Jose M. [2]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] NVIDIA, Santa Clara, CA USA
[3] Univ Hong Kong, Hong Kong, Peoples R China
[4] CALTECH, Pasadena, CA 91125 USA
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Leveraging inputs over multiple consecutive frames has been shown to benefit 3D object detection. However, existing approaches often demonstrate unsatisfactory scaling with increasing temporal histories. In this work, we propose SF3D, a late fusion module which addresses this issue by better modeling temporal relationships via a two-stream factorization. Concretely, SF3D operates on an input sequence of consecutive bird's-eye view (BEV) features, which is partitioned into "short-term" and "long-term" frames. A more heavily parameterized short-term branch using adapters and deformable attention aggregates features closer to the current timestep. In parallel, a long-term branch composed of efficiently implemented global convolution layers aggregates a larger window of temporally distant historical features. This two-stream paradigm allows SF3D to effectively consume near-term information, while scaling to efficiently leverage longer historical windows. We show that SF3D works with arbitrary upstream BEV encoders and downstream detectors, achieving improvements over recent state-of-the-art on the Waymo Open and nuScenes benchmarks.
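The two-stream factorization described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `partition_history`, `weighted_temporal_sum`, and `two_stream_fuse` are hypothetical, each BEV feature map is flattened to a plain list of floats, a weighted sum over time stands in for both the deformable-attention short-term branch and the global-convolution long-term branch, and the two branch outputs are fused by simple addition.

```python
from typing import List, Sequence, Tuple

Frame = List[float]  # a flattened BEV feature map (illustrative assumption)


def partition_history(
    frames: Sequence[Frame], num_short: int
) -> Tuple[List[Frame], List[Frame]]:
    """Split frames (oldest first, current last) into (long_term, short_term)."""
    if not 0 < num_short <= len(frames):
        raise ValueError("num_short must be between 1 and len(frames)")
    split = len(frames) - num_short
    return list(frames[:split]), list(frames[split:])


def weighted_temporal_sum(frames: Sequence[Frame], weights: Sequence[float]) -> Frame:
    """Aggregate frames over time using one weight per timestep.

    This is a simple stand-in for a learned temporal aggregator, not the
    deformable attention or global convolution layers named in the abstract.
    """
    if len(weights) != len(frames):
        raise ValueError("need exactly one weight per frame")
    dim = len(frames[0])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]


def two_stream_fuse(
    frames: Sequence[Frame],
    num_short: int,
    short_weights: Sequence[float],
    long_weights: Sequence[float],
) -> Frame:
    """Aggregate short-term and long-term frames separately, then add them."""
    long_term, short_term = partition_history(frames, num_short)
    short_out = weighted_temporal_sum(short_term, short_weights)
    if not long_term:  # no distant history available yet
        return short_out
    long_out = weighted_temporal_sum(long_term, long_weights)
    # Fusion by addition is an assumption made for this sketch.
    return [s + l for s, l in zip(short_out, long_out)]


history = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
fused = two_stream_fuse(history, num_short=2,
                        short_weights=[0.5, 0.5], long_weights=[0.5, 0.5])
print(fused)
```

The point of the split is that the short-term branch can afford a heavier operator because it only sees a few frames, while the long-term branch needs an operator whose cost grows slowly with the length of the history.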
Pages: 1280-1285
Page count: 6