MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

Cited by: 42
Authors
Huang, Kuan-Chih [1]
Wu, Tsung-Han [1]
Su, Hung-Ting [1]
Hsu, Winston H. [1,2]
Affiliations
[1] Natl Taiwan Univ, Taipei, Taiwan
[2] Mobile Drive Technol, New Taipei, Taiwan
DOI
10.1109/CVPR52688.2022.00398
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Monocular 3D object detection is an important yet challenging task in autonomous driving. Some existing methods leverage depth information from an off-the-shelf depth estimator to assist 3D detection, but they suffer from an additional computational burden and achieve limited performance due to inaccurate depth priors. To alleviate this, we propose MonoDTR, a novel end-to-end depth-aware transformer network for monocular 3D object detection. It mainly consists of two components: (1) the Depth-Aware Feature Enhancement (DFE) module, which implicitly learns depth-aware features with auxiliary supervision and without requiring extra computation, and (2) the Depth-Aware Transformer (DTR) module, which globally integrates context- and depth-aware features. Moreover, unlike conventional pixel-wise positional encodings, we introduce a novel depth positional encoding (DPE) to inject depth positional hints into transformers. Our proposed depth-aware modules can be easily plugged into existing image-only monocular 3D object detectors to improve performance. Extensive experiments on the KITTI dataset demonstrate that our approach outperforms previous state-of-the-art monocular-based methods and achieves real-time detection.
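To make the depth positional encoding (DPE) idea from the abstract concrete, the following is a minimal, hypothetical PyTorch sketch of how per-pixel depth hints could replace pixel-coordinate positional encodings in a transformer: discretize a predicted depth map into bins and look up a learned embedding per bin. The bin count, depth range, uniform discretization, and all names (DepthPositionalEncoding, num_bins, max_depth) are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class DepthPositionalEncoding(nn.Module):
    """Hypothetical sketch: one learned embedding per discretized depth bin."""
    def __init__(self, embed_dim: int = 256, num_bins: int = 64, max_depth: float = 60.0):
        super().__init__()
        self.num_bins = num_bins
        self.max_depth = max_depth
        # Learnable embedding vector for each depth bin.
        self.bin_embed = nn.Embedding(num_bins, embed_dim)

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        # depth: (B, H, W) predicted metric depth; uniform binning is an assumption.
        bins = (depth.clamp(0.0, self.max_depth) / self.max_depth * (self.num_bins - 1)).long()
        pe = self.bin_embed(bins)          # (B, H, W, C)
        return pe.permute(0, 3, 1, 2)      # (B, C, H, W), matching feature-map layout

# Usage: add depth hints to feature maps, then flatten into transformer tokens.
dpe = DepthPositionalEncoding()
depth_map = torch.rand(2, 24, 80) * 60.0   # dummy predicted depth in meters
features = torch.randn(2, 256, 24, 80)     # dummy backbone feature map
tokens = (features + dpe(depth_map)).flatten(2).permute(0, 2, 1)  # (B, H*W, C)
```

Encoding depth rather than pixel position means two pixels at similar depths receive similar positional hints regardless of their image coordinates, which is the property the abstract attributes to DPE.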
Pages: 4002-4011
Page count: 10
Related papers (50 in total)
  • [1] MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection. Zhang, Renrui; Qiu, Han; Wang, Tai; Guo, Ziyu; Cui, Ziteng; Qiao, Yu; Li, Hongsheng; Gao, Peng. IEEE/CVF International Conference on Computer Vision (ICCV), 2023: 9121-9132.
  • [2] Task-Aware Monocular Depth Estimation for 3D Object Detection. Wang, Xinlong; Yin, Wei; Kong, Tao; Jiang, Yuning; Li, Lei; Shen, Chunhua. Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020, 34: 12257-12264.
  • [3] DAFormer: Depth-aware 3D Object Detection Guided by Camera Model via Transformers. Gao, Junbin; Ruan, Hao; Xu, Bingrong; Zeng, Zhigang. IEEE International Conference on Cyborg and Bionic Systems (CBS), 2022: 170-175.
  • [4] DAST: Depth-Aware Assessment and Synthesis Transformer for RGB-D Salient Object Detection. Xia, Chenxing; Duan, Songsong; Fang, Xianjin; Ge, Bin; Gao, Xiuju; Cui, Jianhua. PRICAI 2022: Trends in Artificial Intelligence, Part II, 2022, 13630: 473-487.
  • [5] Depth-Guided Vision Transformer With Normalizing Flows for Monocular 3D Object Detection. Pan, Cong; Peng, Junran; Zhang, Zhaoxiang. IEEE/CAA Journal of Automatica Sinica, 2024, 11(03): 673-689.
  • [6] Shape-Aware Monocular 3D Object Detection. Chen, Wei; Zhao, Jie; Zhao, Wan-Lei; Wu, Song-Yuan. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(06): 6416-6424.
  • [7] Monocular 3D Object Detection with Depth from Motion. Wang, Tai; Pang, Jiangmiao; Lin, Dahua. Computer Vision, ECCV 2022, Part IX, 2022, 13669: 386-403.
  • [8] Object-Aware Centroid Voting for Monocular 3D Object Detection. Bao, Wentao; Yu, Qi; Kong, Yu. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020: 2197-2204.
  • [9] Depth-Aware Lightweight Network for RGB-D Salient Object Detection. Ling, Liuyi; Wang, Yiwen; Wang, Chengjun; Xu, Shanyong; Huang, Yourui. IET Image Processing, 2023, 17(08): 2350-2361.