SF3D: SlowFast Temporal 3D Object Detection

Cited by: 0

Authors:

Wang, Renhao [1,2]

Yu, Zhiding [2]

Lan, Shiyi [2]

Xie, Enze [2,3]

Chen, Ke [2]

Anandkumar, Anima [2,4]

Alvarez, Jose M. [2]

Affiliations:

[1] Tsinghua University, Beijing, People's Republic of China

[2] NVIDIA, Santa Clara, CA, USA

[3] University of Hong Kong, Hong Kong, People's Republic of China

[4] Caltech, Pasadena, CA 91125, USA

Source:

2024 35th IEEE Intelligent Vehicles Symposium (IEEE IV 2024) | 2024

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Leveraging inputs over multiple consecutive frames has been shown to benefit 3D object detection. However, existing approaches often demonstrate unsatisfactory scaling with increasing temporal histories. In this work, we propose SF3D, a late fusion module which addresses this issue by better modeling temporal relationships via a two-stream factorization. Concretely, SF3D operates on an input sequence of consecutive bird's-eye view (BEV) features, which is partitioned into "short-term" and "long-term" frames. A more heavily parameterized short-term branch using adapters and deformable attention aggregates features closer to the current timestep. In parallel, a long-term branch composed of efficiently implemented global convolution layers aggregates a larger window of temporally distant historical features. This two-stream paradigm allows SF3D to effectively consume near-term information, while scaling to efficiently leverage longer historical windows. We show that SF3D works with arbitrary upstream BEV encoders and downstream detectors, achieving improvements over recent state-of-the-art on the Waymo Open and nuScenes benchmarks.
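
The two-stream split described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`split_history`, `weighted_sum`, `fuse`), the flattened-list representation of a BEV feature map, the uniform average standing in for the short-term branch (adapters plus deformable attention in the paper), the fixed temporal kernel standing in for the long-term global convolution, and the additive fusion are all assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple

Frame = List[float]  # toy stand-in for one flattened BEV feature map


def split_history(frames: Sequence[Frame], num_short: int) -> Tuple[List[Frame], List[Frame]]:
    """Partition frames (ordered oldest first) into (long_term, short_term)."""
    if not 1 <= num_short < len(frames):
        raise ValueError("need at least one short-term and one long-term frame")
    cut = len(frames) - num_short
    return list(frames[:cut]), list(frames[cut:])


def weighted_sum(frames: Sequence[Frame], weights: Sequence[float]) -> Frame:
    """Combine frames element-wise using one scalar weight per frame."""
    if len(frames) != len(weights):
        raise ValueError("one weight per frame is required")
    out = [0.0] * len(frames[0])
    for frame, weight in zip(frames, weights):
        for i, value in enumerate(frame):
            out[i] += weight * value
    return out


def fuse(frames: Sequence[Frame], num_short: int, long_kernel: Sequence[float]) -> Frame:
    """Aggregate both partitions separately, then add the results (assumed fusion)."""
    long_term, short_term = split_history(frames, num_short)
    # Assumption: a uniform average replaces the heavily parameterized short-term branch.
    short_out = weighted_sum(short_term, [1.0 / len(short_term)] * len(short_term))
    # Assumption: a kernel spanning the whole long-term window, evaluated at the
    # current timestep, replaces the global convolution layers.
    long_out = weighted_sum(long_term, long_kernel)
    return [s + l for s, l in zip(short_out, long_out)]


history = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
fused = fuse(history, num_short=2, long_kernel=[0.5, 0.5])
```

Here the two most recent frames go to the short-term path and the two older frames to the long-term path. In this toy version, lengthening the history only lengthens `long_kernel`, which loosely mirrors how the abstract assigns scaling over longer windows to the cheaper branch.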


Pages: 1280-1285

Page count: 6
