MonoPSTR: Monocular 3-D Object Detection With Dynamic Position and Scale-Aware Transformer

Citations: 0
Authors
Yang, Fan [1]
He, Xuan [2]
Chen, Wenrui [1, 3]
Zhou, Pengjie [2]
Li, Zhiyong [2, 3]
Affiliations
[1] Hunan Univ, Sch Robot, Changsha 410012, Peoples R China
[2] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Peoples R China
[3] Hunan Univ, Natl Engn Res Ctr Robot Visual Percept & Control T, Changsha 410082, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Transformers; Object detection; Decoding; Training; Accuracy; Feature extraction; Autonomous driving; monocular 3-D object detection; robotics; scene understanding; transformer;
DOI
10.1109/TIM.2024.3415231
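Chinese Library Classification number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code
0808; 0809;
Abstract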
Transformer-based approaches have demonstrated outstanding performance in monocular 3-D object detection, which involves predicting 3-D attributes from a single 2-D image. These methods typically rely on visual and depth representations to identify the queries most relevant to objects. However, the features and locations of queries are expected to be learned adaptively without any prior knowledge, which often leads to imprecise localization in complex scenes and a long training process. To overcome this limitation, we present MonoPSTR, which employs a dynamic position and scale-aware transformer for monocular 3-D detection. Our approach introduces a dynamically and explicitly position-coded query (DEP-query) and a scale-assisted deformable attention (SDA) module to give the raw query valuable spatial and content cues. Specifically, the DEP-query uses explicit position priors from 3-D projection coordinates to improve the accuracy of query localization, thereby enabling the attention layers in the decoder to avoid noisy background information. The SDA module optimizes the receptive field learning of queries using the size priors of the corresponding 2-D boxes, so that the queries can acquire high-quality visual features. Neither the position priors nor the size priors require any additional data, and both are updated in each decoder layer to provide long-term assistance. Extensive experiments show that our model outperforms all existing methods in inference speed, reaching 62.5 FPS. Moreover, compared with its backbone, MonoDETR, MonoPSTR converges roughly twice as fast in training and surpasses its precision by over 1.15% on the KITTI dataset, demonstrating its practical value. The code is available at: https://github.com/yangfan293/MonoPSTR/tree/master/MonoPSTR.
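The idea behind the size prior can be illustrated with a small sketch. This is not the authors' implementation (see the repository above for that); it only shows, under stated assumptions, how a 2-D box size could bound the sampling locations of a deformable-attention query. The function name `scale_offsets`, the normalized [0, 1] image coordinates, and the half-box scaling rule are all hypothetical choices made for illustration. The helper `frame_latency_ms` simply converts a frame rate into a per-frame time budget.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def scale_offsets(ref: Point, box_wh: Tuple[float, float],
                  raw_offsets: List[Point]) -> List[Point]:
    """Hypothetical sketch: place sampling points around a reference point.

    ref         -- query reference point (cx, cy), normalized to [0, 1]
    box_wh      -- assumed 2-D box size prior (w, h), normalized to [0, 1]
    raw_offsets -- unitless offsets, e.g. in [-1, 1], standing in for
                   what a network would predict
    Each offset is scaled by half the box size, so an offset of +/-1
    lands on the box border. Results are clamped to the image.
    """
    cx, cy = ref
    w, h = box_wh
    points = []
    for dx, dy in raw_offsets:
        x = min(max(cx + dx * w / 2.0, 0.0), 1.0)
        y = min(max(cy + dy * h / 2.0, 0.0), 1.0)
        points.append((x, y))
    return points


def frame_latency_ms(fps: float) -> float:
    """Convert frames per second into milliseconds per frame."""
    return 1000.0 / fps


if __name__ == "__main__":
    # A query centered in the image with a box 0.2 wide and 0.4 tall.
    print(scale_offsets((0.5, 0.5), (0.2, 0.4), [(1.0, -1.0), (0.0, 0.0)]))
    # Per-frame budget implied by the reported 62.5 FPS.
    print(frame_latency_ms(62.5))
```

With this scaling rule, a small box keeps the sampled features close to the reference point, while a large box widens the area a query can look at. The reported 62.5 FPS corresponds to a budget of 16 ms per frame.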
Pages: 1 / 1
Page count: 13
Related papers
50 in total
  • [21] Global to Local: A Scale-Aware Network for Remote Sensing Object Detection
    Gao, Tao
    Niu, Qianqian
    Zhang, Jing
    Chen, Ting
    Mei, Shaohui
    Jubair, Ahmad
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [22] SARD: Towards Scale-Aware Rotated Object Detection in Aerial Imagery
    Wang, Yashan
    Zhang, Yue
    Zhang, Yi
    Zhao, Liangjin
    Sun, Xuewen
    Guo, Zhi
    IEEE ACCESS, 2019, 7 : 173855 - 173865
  • [23] Scale-Aware Attention-Based PillarsNet (SAPN) Based 3D Object Detection for Point Cloud
    Song, Xiang
    Zhan, Weiqin
    Che, Xiaoyu
    Jiang, Huilin
    Yang, Biao
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [24] Object Detection for Traffic Scenarios Based on Scale-Aware Label Assignment and Dynamic Class Suppression Loss
    Ma, Yalong
    Xiong, Zhongxia
    Song, Tao
    He, Shan
    Yao, Ziying
    Wu, Xinkai
    CICTP 2023: INNOVATION-EMPOWERED TECHNOLOGY FOR SUSTAINABLE, INTELLIGENT, DECARBONIZED, AND CONNECTED TRANSPORTATION, 2023, : 1061 - 1073
  • [25] An enhanced vision transformer with scale-aware and spatial-aware attention for thighbone fracture detection
    Guan B.
    Yao J.
    Zhang G.
    Neural Computing and Applications, 2024, 36 (19) : 11425 - 11438
  • [26] A Scale-Aware Pyramid Network for Multi-Scale Object Detection in SAR Images
    Tang, Linbo
    Tang, Wei
    Qu, Xin
    Han, Yuqi
    Wang, Wenzheng
    Zhao, Baojun
    REMOTE SENSING, 2022, 14 (04)
  • [27] CubeSLAM: Monocular 3-D Object SLAM
    Yang, Shichao
    Scherer, Sebastian
    IEEE TRANSACTIONS ON ROBOTICS, 2019, 35 (04) : 925 - 938
  • [28] Dynamic graph transformer for 3D object detection
    Ren, Siyuan
    Pan, Xiao
    Zhao, Wenjie
    Nie, Binling
    Han, Bo
    KNOWLEDGE-BASED SYSTEMS, 2023, 259
  • [29] Scale-Aware Regional Collective Feature Enhancement Network for Scene Object Detection
    Li, Yiyao
    Liu, Jin
    Gao, Zhenyu
    NEURAL PROCESSING LETTERS, 2023, 55 (05) : 6289 - 6310
  • [30] Enhancing Monocular 3-D Object Detection Through Data Augmentation Strategies
    Jia, Yisong
    Wang, Jue
    Pan, Huihui
    Sun, Weichao
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73 : 1 - 11