SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection From Multi-view Camera Images With Global Cross-Sensor Attention

Cited by: 7
Authors
Doll, Simon [1, 3]
Schulz, Richard [1]
Schneider, Lukas [1]
Benzin, Viviane [1]
Enzweiler, Markus [2]
Lensch, Hendrik P. A. [3]
Affiliations
[1] Mercedes Benz, Stuttgart, Germany
[2] Esslingen Univ Appl Sci, Stuttgart, Germany
[3] Univ Tubingen, Tubingen, Germany
Source
Keywords
3D object detection; Cross-sensor attention; Autonomous driving
DOI
10.1007/978-3-031-19842-7_14
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Based on the key idea of DETR, this paper introduces an object-centric 3D object detection framework that operates on a limited number of 3D object queries instead of dense bounding box proposals followed by non-maximum suppression. After image feature extraction, a decoder-only transformer architecture is trained on a set-based loss. SpatialDETR infers the classification and bounding box estimates based on attention both spatially within each image and across the different views. To fuse the multi-view information in the attention block, we introduce a novel geometric positional encoding that incorporates the view ray geometry to explicitly consider the extrinsic and intrinsic camera setup. This way, the spatially-aware cross-view attention exploits arbitrary receptive fields to integrate cross-sensor data and therefore global context. Extensive experiments on the nuScenes benchmark demonstrate the potential of global attention and result in state-of-the-art performance. Code is available at https://github.com/cgtuebingen/SpatialDETR.
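The view ray geometry mentioned in the abstract can be illustrated with a minimal sketch. It assumes a standard pinhole camera model and is not taken from the SpatialDETR code base: the function name `pixel_view_ray`, its parameters, and the example rotation are hypothetical. It only shows how a pixel's viewing direction follows from the intrinsics (focal lengths and principal point) and the rotational part of the extrinsics, which is the kind of quantity a geometric positional encoding could build on.

```python
import math


def pixel_view_ray(u, v, fx, fy, cx, cy, cam_to_ref_rotation):
    """Return the unit view ray of pixel (u, v) in a shared reference frame.

    Hypothetical helper, assuming a pinhole camera without distortion.
    cam_to_ref_rotation is a 3x3 rotation matrix given as nested lists.
    """
    # Back-project the pixel through the intrinsics at unit depth.
    d_cam = [(u - cx) / fx, (v - cy) / fy, 1.0]
    # Normalize so that only the direction is kept.
    norm = math.sqrt(sum(c * c for c in d_cam))
    d_cam = [c / norm for c in d_cam]
    # Rotate the direction into the reference frame (extrinsic rotation).
    return [sum(row[j] * d_cam[j] for j in range(3)) for row in cam_to_ref_rotation]


# Illustrative values only, not calibration data from nuScenes.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ROT_Y_90 = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]]

front_ray = pixel_view_ray(800.0, 450.0, 1200.0, 1200.0, 800.0, 450.0, IDENTITY)
side_ray = pixel_view_ray(800.0, 450.0, 1200.0, 1200.0, 800.0, 450.0, ROT_Y_90)
corner_ray = pixel_view_ray(0.0, 0.0, 1200.0, 1200.0, 800.0, 450.0, IDENTITY)
```

Because the rays of all cameras are expressed in one frame, pixels from different views that look in similar directions receive similar geometric descriptors, which is what makes attention across sensors meaningful in this sketch.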
Pages: 230-245
Page count: 16
Related papers
  • [1] A Transformer-based Network for Multi-view 3D Mesh Generation
    Shi, Wuzhen
    Liu, Zhijie
    Li, Yingxiang
    Wen, Yang
    Liu, Yutao
    Proceedings - 2023 IEEE SmartWorld, Ubiquitous Intelligence and Computing, Autonomous and Trusted Vehicles, Scalable Computing and Communications, Digital Twin, Privacy Computing and Data Security, Metaverse, SmartWorld/UIC/ATC/ScalCom/DigitalTwin/PCDS/Metaverse 2023, 2023
  • [2] CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
    Xiong, Kaixin
    Gong, Shi
    Ye, Xiaoqing
    Tan, Xiao
    Wan, Ji
    Ding, Errui
    Wang, Jingdong
    Bai, Xiang
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 21570-21579
  • [3] Transformer-Based Global PointPillars 3D Object Detection Method
    Zhang, Lin
    Meng, Hua
    Yan, Yunbing
    Xu, Xiaowei
    ELECTRONICS, 2023, 12 (14)
  • [4] OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection
    Qi, Zhangyang
    Wang, Jiaqi
    Wu, Xiaoyang
    Zhao, Hengshuang
    2024 INTERNATIONAL CONFERENCE IN 3D VISION, 3DV 2024, 2024: 1188-1197
  • [5] TransCAR: Transformer-based Camera-And-Radar Fusion for 3D Object Detection
    Pang, Su
    Morris, Daniel
    Radha, Hayder
    2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023: 10902-10909
  • [6] Transformer-Based Stereo-Aware 3D Object Detection From Binocular Images
    Sun, Hanqing
    Pang, Yanwei
    Cao, Jiale
    Xie, Jin
    Li, Xuelong
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024: 19675-19687
  • [7] MLOD: A multi-view 3D object detection based on robust feature fusion method
    Deng, Jian
    Czarnecki, Krzysztof
    2019 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2019: 279-284
  • [8] 3DPPE: 3D Point Positional Encoding for Transformer-based Multi-Camera 3D Object Detection
    Shu, Changyong
    Deng, Jiajun
    Yu, Fisher
    Liu, Yifan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023: 3557-3566
  • [9] A Coarse-to-Fine Transformer-Based Network for 3D Reconstruction from Non-Overlapping Multi-View Images
    Shan, Yue
    Xiao, Jun
    Liu, Lupeng
    Wang, Yunbiao
    Yu, Dongbo
    Zhang, Wenniu
    REMOTE SENSING, 2024, 16 (05)
  • [10] Multi-view convolutional vision transformer for 3D object recognition
    Li, Jie
    Liu, Zhao
    Li, Li
    Lin, Junqin
    Yao, Jian
    Tu, Jingmin
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 95