Query-based Temporal Fusion with Explicit Motion for 3D Object Detection

Citations: 0

Authors:

Hou, Jinghua [1]

Liu, Zhe [1]

Liang, Dingkang [1]

Zou, Zhikang [2]

Ye, Xiaoqing [2]

Bai, Xiang [1]

Affiliations:

[1] Huazhong Univ Sci & Technol, Wuhan, Hubei, Peoples R China

[2] Baidu Inc, Beijing, Peoples R China

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Effectively utilizing temporal information to improve 3D detection performance is vital for autonomous driving. Existing methods conduct temporal fusion based on either dense BEV features or sparse 3D proposal features. However, the former does not focus on foreground objects, which leads to higher computation cost and sub-optimal performance. The latter relies on time-consuming operations to generate sparse 3D proposal features, and its performance is limited by the quality of the 3D proposals. In this paper, we propose a simple and effective Query-based Temporal Fusion Network (QTNet). The main idea is to exploit the object queries of previous frames to enhance the representation of the current object queries through the proposed Motion-guided Temporal Modeling (MTM) module, which uses the spatial positions of object queries along the temporal dimension to reliably establish their relevance between adjacent frames. Experimental results show that QTNet outperforms BEV-based and proposal-based methods on the nuScenes dataset. In addition, MTM is a plug-and-play module that can be integrated into advanced LiDAR-only or multi-modality 3D detectors, and it even brings new SOTA performance on the nuScenes dataset with negligible computation cost and latency. These experiments demonstrate the effectiveness and generalization ability of our method. The code is available at https://github.com/AlmoonYsl/QTNet.
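
The abstract does not spell out how MTM relates queries across frames, so the following is only a simplified illustration of the general idea and not the authors' implementation. It assumes each previous-frame query carries a 2D center and a velocity estimate, shifts those centers to the current timestamp under a constant-velocity model, and turns center distances into softmax weights for aggregating previous query features. The function name `motion_guided_fusion`, the temperature `tau`, and the residual sum are hypothetical choices, and ego-motion compensation is omitted.

```python
import numpy as np


def motion_guided_fusion(cur_feats, cur_centers, prev_feats, prev_centers,
                         prev_velocity, dt, tau=1.0):
    """Illustrative sketch (hypothetical, not the QTNet code).

    cur_feats:     (N, C) features of current-frame object queries
    cur_centers:   (N, 2) BEV centers of current-frame queries
    prev_feats:    (M, C) features of previous-frame object queries
    prev_centers:  (M, 2) BEV centers of previous-frame queries
    prev_velocity: (M, 2) estimated velocities of previous-frame queries
    dt:            time gap between the two frames, in seconds
    tau:           assumed softmax temperature, in the same unit as the centers
    """
    # Explicit motion: move previous centers to the current timestamp
    # under an assumed constant-velocity model.
    moved = prev_centers + prev_velocity * dt

    # Pairwise Euclidean distances between current centers and
    # motion-compensated previous centers, shape (N, M).
    dist = np.linalg.norm(cur_centers[:, None, :] - moved[None, :, :], axis=-1)

    # Closer pairs receive larger weights; softmax over previous queries.
    logits = -dist / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Residual aggregation of previous-frame features into current queries.
    fused = cur_feats + weights @ prev_feats
    return fused, weights


# Usage: two objects, one moving at 2 m/s along x and one static.
cur_feats = np.zeros((2, 4))
cur_centers = np.array([[1.0, 0.0], [10.0, 10.0]])
prev_feats = np.array([[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]])
prev_centers = np.array([[0.0, 0.0], [10.0, 10.0]])
prev_velocity = np.array([[2.0, 0.0], [0.0, 0.0]])
fused, weights = motion_guided_fusion(
    cur_feats, cur_centers, prev_feats, prev_centers, prev_velocity, dt=0.5
)
```

In this toy setup the motion-compensated center of the first previous query lands on the first current query, so each current query draws almost all of its weight from its own past counterpart. A learned attention layer would normally replace the fixed distance-based softmax used here.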


Pages: 16
