CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection

Authors
Tseng, Ching-Yu [1]
Chen, Yi-Rong [1]
Lee, Hsin-Ying [1]
Wu, Tsung-Han [1]
Chen, Wen-Chin [1]
Hsu, Winston H. [1, 2]
Affiliations
[1] Natl Taiwan Univ, Taipei, Taiwan
[2] Mobile Drive Technol, Amsterdam, Netherlands
DOI
10.1109/ICRA48891.2023.10161451
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
To achieve accurate 3D object detection at low cost for autonomous driving, many multi-camera methods have been proposed, and they solve the occlusion problem of monocular approaches. However, lacking accurate depth estimates, existing multi-camera methods often generate multiple bounding boxes along a ray in the depth direction for difficult small objects such as pedestrians, resulting in extremely low recall. Furthermore, directly adding depth prediction modules to existing multi-camera methods, which are generally built on large network architectures, cannot meet the real-time requirements of self-driving applications. To address these issues, we propose Cross-view and Depth-guided Transformers for 3D Object Detection (CrossDTR). First, our lightweight depth predictor is designed to produce precise object-wise sparse depth maps and low-dimensional depth embeddings without requiring extra depth datasets for supervision. Second, a cross-view depth-guided transformer fuses the depth embeddings with image features from cameras of different views and generates 3D bounding boxes. Extensive experiments demonstrate that our method surpasses existing multi-camera methods by 10 percent in pedestrian detection and by about 3 percent in overall mAP and NDS metrics. Computational analyses also show that our method is 5 times faster than prior approaches. Our code will be made publicly available at https://github.com/sty61010/CrossDTR.
Pages: 4850-4857
Number of pages: 8
Related papers
50 in total
  • [1] MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection
    Zhang, Renrui
    Qiu, Han
    Wang, Tai
    Guo, Ziyu
    Cui, Ziteng
    Qiao, Yu
    Li, Hongsheng
    Gao, Peng
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 9121 - 9132
  • [2] Learning Depth-Guided Convolutions for Monocular 3D Object Detection
    Ding, Mingyu
    Huo, Yuqi
    Yi, Hongwei
    Wang, Zhe
    Shi, Jianping
    Lu, Zhiwu
    Luo, Ping
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 4306 - 4315
  • [3] Revisiting Depth-guided Methods for Monocular 3D Object Detection by Hierarchical Balanced Depth
    Chen, Yi-Rong
    Tseng, Ching-Yu
    Liou, Yi-Syuan
    Wu, Tsung-Han
    Hsu, Winston H.
    CONFERENCE ON ROBOT LEARNING, VOL 229, 2023, 229
  • [4] Depth-Guided Vision Transformer With Normalizing Flows for Monocular 3D Object Detection
    Cong Pan
    Junran Peng
    Zhaoxiang Zhang
    IEEE/CAA Journal of Automatica Sinica, 2024, 11 (03) : 673 - 689
  • [5] Depth-Guided Vision Transformer With Normalizing Flows for Monocular 3D Object Detection
    Pan, Cong
    Peng, Junran
    Zhang, Zhaoxiang
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11 (03) : 673 - 689
  • [6] DGT: Depth-guided RGB-D occluded target detection with transformers
    Kelei Xu
    Chunyan Wang
    Wanzhong Zhao
    Jinqiang Liu
    Applied Intelligence, 2025, 55 (5)
  • [7] Depth-Guided Progressive Network for Object Detection
    Ma, Jia-Wei
    Liang, Min
    Chen, Song-Lu
    Chen, Feng
    Tian, Shu
    Qin, Jingyan
    Yin, Xu-Cheng
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (10) : 19523 - 19533
  • [8] ACDet: Attentive Cross-view Fusion for LiDAR-based 3D Object Detection
    Xu, Jiaolong
    Wang, Guojun
    Zhang, Xiao
    Wan, Guowei
    2022 INTERNATIONAL CONFERENCE ON 3D VISION, 3DV, 2022, : 74 - 83
  • [9] VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention
    Deng, Shengheng
    Liang, Zhihao
    Sun, Lin
    Jia, Kui
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8438 - 8447
  • [10] DAFormer: Depth-aware 3D Object Detection Guided by Camera Model via Transformers
    Gao, Junbin
    Ruan, Hao
    Xu, Bingrong
    Zeng, Zhigang
    2022 IEEE INTERNATIONAL CONFERENCE ON CYBORG AND BIONIC SYSTEMS, CBS, 2022, : 170 - 175