MonoPSTR: Monocular 3-D Object Detection With Dynamic Position and Scale-Aware Transformer

Cited by: 0
Authors
Yang, Fan [1]
He, Xuan [2]
Chen, Wenrui [1, 3]
Zhou, Pengjie [2]
Li, Zhiyong [2, 3]
Affiliations
[1] Hunan Univ, Sch Robot, Changsha 410012, Peoples R China
[2] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Peoples R China
[3] Hunan Univ, Natl Engn Res Ctr Robot Visual Percept & Control T, Changsha 410082, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Three-dimensional displays; Transformers; Object detection; Decoding; Training; Accuracy; Feature extraction; Autonomous driving; monocular 3-D object detection; robotics; scene understanding; transformer
DOI
10.1109/TIM.2024.3415231
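Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract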
Transformer-based approaches have demonstrated outstanding performance in monocular 3-D object detection, which involves predicting 3-D attributes from a single 2-D image. These methods typically rely on visual and depth representations to identify the queries most relevant to objects. However, the features and locations of the queries are expected to be learned adaptively without any prior knowledge, which often leads to imprecise localization in complex scenes and a lengthy training process. To overcome this limitation, we present MonoPSTR, which employs a dynamic position and scale-aware transformer for monocular 3-D detection. Our approach introduces a dynamically and explicitly position-coded query (DEP-query) and a scale-assisted deformable attention (SDA) module to give the raw query valuable spatial and content cues. Specifically, the DEP-query uses explicit position priors from 3-D projection coordinates to improve the accuracy of query localization, thereby enabling the attention layers in the decoder to avoid noisy background information. The SDA module optimizes the receptive field learning of queries using the size priors of the corresponding 2-D boxes, so that the queries acquire high-quality visual features. Neither the position priors nor the size priors require any additional data, and both are updated in each decoder layer to provide long-term assistance. Extensive experiments show that our model outperforms all existing methods in inference speed, reaching 62.5 FPS. Moreover, compared with the backbone MonoDETR, MonoPSTR converges roughly twice as fast in training and exceeds its precision by over 1.15% on the KITTI dataset, demonstrating its practical value. The code is available at: https://github.com/yangfan293/MonoPSTR/tree/master/MonoPSTR.
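As a rough illustration of the size-prior idea described in the abstract, the sketch below scales attention sampling offsets by a query's 2-D box size, so that sampled points stay within a box-sized neighborhood of the query's reference point. It is not taken from the MonoPSTR code. The function name `scale_aware_sampling_points`, the offset convention (offsets in [-1, 1] mapped to half the box width and height), and the use of normalized image coordinates are all assumptions made for illustration.

```python
def scale_aware_sampling_points(ref_point, box_size, offsets):
    """Hypothetical helper, not the authors' implementation.

    ref_point: (x, y) reference point of one query, normalized to [0, 1].
    box_size:  (width, height) of the query's 2-D box, normalized to [0, 1].
    offsets:   list of (dx, dy) raw offsets, assumed to lie in [-1, 1].
    Returns a list of (x, y) sampling locations clipped to the image.
    """
    cx, cy = ref_point
    # Assumption: an offset of +/-1 reaches the edge of the 2-D box.
    half_w = box_size[0] / 2.0
    half_h = box_size[1] / 2.0
    points = []
    for dx, dy in offsets:
        # Scale each offset by the box half-extent, then clip to [0, 1].
        x = min(max(cx + dx * half_w, 0.0), 1.0)
        y = min(max(cy + dy * half_h, 0.0), 1.0)
        points.append((x, y))
    return points


if __name__ == "__main__":
    # One query centered in the image with a box 0.2 wide and 0.4 tall.
    pts = scale_aware_sampling_points(
        ref_point=(0.5, 0.5),
        box_size=(0.2, 0.4),
        offsets=[(1.0, 1.0), (-1.0, 0.0)],
    )
    print(pts)
```

Under these assumptions, a larger box spreads the sampling points over a wider area, while a small box keeps them close to the reference point. This is one simple way a size prior could shape a query's receptive field.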
Pages: 1 / 1
Page count: 13