MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer

被引:5
|
作者
Zhou, Yunsong [1 ]
Zhu, Hongzi [1 ]
Liu, Quan [1 ]
Chang, Shan [2 ]
Guo, Minyi [1 ]
机构
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Donghua Univ, Shanghai, Peoples R China
基金
中国国家自然科学基金; 上海市自然科学基金;
关键词
D O I
10.1109/CVPR52729.2023.01678
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Mobile monocular 3D object detection (Mono3D) (e.g., on a vehicle, a drone, or a robot) is an important yet challenging task. Existing transformer-based offline Mono3D models adopt grid-based vision tokens, which is suboptimal when using coarse tokens due to the limited available computational power. In this paper, we propose an online Mono3D framework, called MonoATT, which leverages a novel vision transformer with heterogeneous tokens of varying shapes and sizes to facilitate mobile Mono3D. The core idea of MonoATT is to adaptively assign finer tokens to areas of more significance before utilizing a transformer to enhance Mono3D. To this end, we first use prior knowledge to design a scoring network for selecting the most important areas of the image, and then propose a token clustering and merging network with an attention mechanism to gradually merge tokens around the selected areas in multiple stages. Finally, a pixel-level feature map is reconstructed from heterogeneous tokens before employing a SOTA Mono3D detector as the underlying detection core. Experiment results on the real-world KITTI dataset demonstrate that MonoATT can effectively improve the Mono3D accuracy for both near and far objects and guarantee low latency. MonoATT yields the best performance compared with the state-of-the-art methods by a large margin and is ranked number one on the KITTI 3D benchmark.
引用
收藏
页码:17493 / 17503
页数:11
相关论文
共 50 条
  • [1] MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
    Huang, Kuan-Chih
    Wu, Tsung-Han
    Su, Hung-Ting
    Hsu, Winston H.
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 4002 - 4011
  • [2] Monocular 3D Object Detection for Autonomous Driving Based on Contextual Transformer
    She, Xiangyang
    Yan, Weijia
    Dong, Lihong
    [J]. Computer Engineering and Applications, 2024, 60 (19) : 178 - 189
  • [3] MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection
    Zhang, Renrui
    Qiu, Han
    Wang, Tai
    Guo, Ziyu
    Cui, Ziteng
    Qiao, Yu
    Li, Hongsheng
    Gao, Peng
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 9121 - 9132
  • [4] Efficient Transformer-based 3D Object Detection with Dynamic Token Halting
    Ye, Mao
    Meyer, Gregory P.
    Chai, Yuning
    Liu, Qiang
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 8404 - 8416
  • [5] Aerial Monocular 3D Object Detection
    Hu, Yue
    Fang, Shaoheng
    Xie, Weidi
    Chen, Siheng
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (04) : 1959 - 1966
  • [6] Disentangling Monocular 3D Object Detection
    Simonelli, Andrea
    Bulo, Samuel Rota
    Porzi, Lorenzo
    Lopez-Antequera, Manuel
    Kontschieder, Peter
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 1991 - 1999
  • [7] Depth-Guided Vision Transformer With Normalizing Flows for Monocular 3D Object Detection
    Cong Pan
    Junran Peng
    Zhaoxiang Zhang
    [J]. IEEE/CAA Journal of Automatica Sinica, 2024, 11 (03) : 673 - 689
  • [8] Depth-Guided Vision Transformer With Normalizing Flows for Monocular 3D Object Detection
    Pan, Cong
    Peng, Junran
    Zhang, Zhaoxiang
    [J]. IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11 (03) : 673 - 689
  • [9] Monocular 3D Object Detection for Autonomous Driving
    Chen, Xiaozhi
    Kundu, Kaustav
    Zhang, Ziyu
    Ma, Huimin
    Fidler, Sanja
    Urtasun, Raquel
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 2147 - 2156
  • [10] Dimension Embeddings for Monocular 3D Object Detection
    Zhang, Yunpeng
    Zheng, Wenzhao
    Zhu, Zheng
    Huang, Guan
    Du, Dalong
    Zhou, Jie
    Lu, Jiwen
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 1579 - 1588