MonoPSTR: Monocular 3-D Object Detection With Dynamic Position and Scale-Aware Transformer

Cited by: 0
Authors
Yang, Fan [1 ]
He, Xuan [2 ]
Chen, Wenrui [1 ,3 ]
Zhou, Pengjie [2 ]
Li, Zhiyong [2 ,3 ]
Affiliations
[1] Hunan Univ, Sch Robot, Changsha 410012, Peoples R China
[2] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Peoples R China
[3] Hunan Univ, Natl Engn Res Ctr Robot Visual Percept & Control T, Changsha 410082, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Transformers; Object detection; Decoding; Training; Accuracy; Feature extraction; Autonomous driving; monocular 3-D object detection; robotics; scene understanding; transformer;
DOI
10.1109/TIM.2024.3415231
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Transformer-based approaches have demonstrated outstanding performance in monocular 3-D object detection, which predicts 3-D attributes from a single 2-D image. These transformer-based methods typically rely on visual and depth representations to identify object-related queries. However, the features and locations of the queries must be learned adaptively without any prior knowledge, which often leads to imprecise localization in complex scenes and a prolonged training process. To overcome this limitation, we present MonoPSTR, which employs a dynamic position and scale-aware transformer for monocular 3-D detection. Our approach introduces a dynamically and explicitly position-coded query (DEP-query) and a scale-assisted deformable attention (SDA) module to endow the raw queries with valuable spatial and content cues. Specifically, the DEP-query employs explicit position priors from 3-D projection coordinates to improve the accuracy of query localization, enabling the attention layers in the decoder to avoid noisy background information. The SDA module optimizes the receptive-field learning of queries using the size priors of the corresponding 2-D boxes, so the queries can acquire high-quality visual features. Both the position and size priors require no additional data and are updated in every decoder layer to provide long-term assistance. Extensive experiments show that our model outperforms all existing methods in inference speed, reaching an impressive 62.5 FPS. Moreover, compared with its backbone MonoDETR, MonoPSTR converges roughly twice as fast during training and surpasses its precision by over 1.15% on the well-known KITTI dataset, demonstrating its practical value. The code is available at: https://github.com/yangfan293/MonoPSTR/tree/master/MonoPSTR.
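The core idea of the SDA module as described in the abstract, restricting each query's deformable-attention sampling range with the size prior of its 2-D box, can be sketched in a few lines of NumPy. The function name, tensor shapes, and the half-box scaling rule below are illustrative assumptions for exposition, not the authors' implementation:

```python
import numpy as np

def scale_aware_sampling_points(ref_points, offsets, box_sizes):
    """Sketch of the scale-assisted sampling idea: learned offsets are
    modulated by 2-D box size priors so that each query's receptive
    field roughly matches the extent of its object.

    ref_points: (Q, 2) normalized reference centers (x, y) per query
    offsets:    (Q, P, 2) raw learned offsets per query and sampling point
    box_sizes:  (Q, 2) normalized (w, h) priors from the predicted 2-D boxes
    returns:    (Q, P, 2) sampling locations clipped to the image plane
    """
    # Scale each offset by half the box extent, so an offset in [-1, 1]
    # keeps the sampling point inside the corresponding 2-D box.
    scaled = offsets * (box_sizes[:, None, :] / 2.0)
    points = ref_points[:, None, :] + scaled
    return np.clip(points, 0.0, 1.0)
```

Under these assumptions, a query centered at (0.5, 0.5) with a 0.2 x 0.4 box prior samples only within that box, which is how a size prior could keep the receptive field focused on the object rather than on background.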
Pages: 1-1
Number of pages: 13
Related Papers
50 records in total
  • [1] SSD-MonoDETR: Supervised Scale-Aware Deformable Transformer for Monocular 3D Object Detection
    He, Xuan; Yang, Fan; Yang, Kailun; Lin, Jiacheng; Fu, Haolong; Wang, Meng; Yuan, Jin; Li, Zhiyong
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9 (01): 555-567
  • [2] Scale-Aware Automatic Augmentations for Object Detection With Dynamic Training
    Chen, Yukang; Zhang, Peizhen; Kong, Tao; Li, Yanwei; Zhang, Xiangyu; Qi, Lu; Sun, Jian; Jia, Jiaya
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (02): 2367-2383
  • [3] MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
    Huang, Kuan-Chih; Wu, Tsung-Han; Su, Hung-Ting; Hsu, Winston H.
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 4002-4011
  • [4] Scale-Aware Trident Networks for Object Detection
    Li, Yanghao; Chen, Yuntao; Wang, Naiyan; Zhang, Zhaoxiang
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 6053-6062
  • [5] Scale-aware Automatic Augmentation for Object Detection
    Chen, Yukang; Li, Yanwei; Kong, Tao; Qi, Lu; Chu, Ruihang; Li, Lei; Jia, Jiaya
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2021), 2021: 9558-9567
  • [6] MonoCAPE: Monocular 3D object detection with coordinate-aware position embeddings
    Chen, Wenyu; Chen, Mu; Fang, Jian; Zhao, Huaici; Wang, Guogang
    COMPUTERS & ELECTRICAL ENGINEERING, 2024, 120
  • [7] Scale-aware feature pyramid architecture for marine object detection
    Xu, Fengqiang; Wang, Huibing; Peng, Jinjia; Fu, Xianping
    NEURAL COMPUTING AND APPLICATIONS, 2021, 33: 3637-3653
  • [8] AdaZoom: Towards Scale-Aware Large Scene Object Detection
    Xu, Jingtao; Li, Ya-Li; Wang, Shengjin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25: 4598-4609
  • [9] Scale-Aware Squeeze-and-Excitation for Lightweight Object Detection
    Xu, Zhihua; Hong, Xiaobin; Chen, Tianshui; Yang, Zhijing; Shi, Yukai
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (01): 49-56
  • [10] Scale-aware camera localization in 3D LiDAR maps with a monocular visual odometry
    Sun, Manhui; Yang, Shaowu; Liu, Henzhu
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2019, 30 (3-4)