YOLOPose V2: Understanding and improving transformer-based 6D pose estimation

Cited by: 12
Authors
Periyasamy, Arul Selvam [1 ]
Amini, Arash [1 ]
Tsaturyan, Vladimir [1 ]
Behnke, Sven [1 ]
Affiliations
[1] Univ Bonn, Autonomous Intelligent Syst, Bonn, Germany
Keywords
Vision transformers; Object pose estimation; Object detection; Calibration
DOI
10.1016/j.robot.2023.104490
CLC classification
TP [Automation technology; computer technology]
Subject classification
0812
Abstract
6D object pose estimation is a crucial prerequisite for autonomous robot manipulation applications. The state-of-the-art models for pose estimation are convolutional neural network (CNN)-based. Lately, Transformers, an architecture originally proposed for natural language processing, are achieving state-of-the-art results in many computer vision tasks as well. Equipped with the multi-head self-attention mechanism, Transformers enable simple single-stage end-to-end architectures for learning object detection and 6D object pose estimation jointly. In this work, we propose YOLOPose (short form for You Only Look Once Pose estimation), a Transformer-based multi-object 6D pose estimation method based on keypoint regression, and an improved variant of the YOLOPose model. In contrast to the standard heatmaps for predicting keypoints in an image, we directly regress the keypoints. Additionally, we employ a learnable orientation estimation module to predict the orientation from the keypoints. Along with a separate translation estimation module, our model is end-to-end differentiable. Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods. We analyze the role of object queries in our architecture and reveal that the object queries specialize in detecting objects in specific image regions. Furthermore, we quantify the accuracy trade-off of using datasets of smaller sizes to train our model. © 2023 Elsevier B.V. All rights reserved.
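The abstract describes a learnable orientation estimation module that predicts rotation from regressed keypoints. A common design for such heads is to output a continuous 6D rotation representation (Zhou et al., CVPR 2019) that is mapped to a rotation matrix by Gram-Schmidt orthogonalization; whether YOLOPose V2 uses exactly this parameterization is an assumption here, and the sketch below illustrates only the representation-to-matrix mapping, not the paper's full module.

```python
import numpy as np

def rotation_from_6d(v):
    """Map a 6D vector to a 3x3 rotation matrix via Gram-Schmidt.

    A continuous rotation parameterization often used by learnable
    orientation heads; its use in YOLOPose V2 is an assumption made
    for illustration, not a claim from the paper.
    """
    a1, a2 = v[:3], v[3:]
    # First basis vector: normalize the first half of the output.
    b1 = a1 / np.linalg.norm(a1)
    # Second basis vector: remove the b1 component, then normalize.
    b2 = a2 - np.dot(b1, a2) * b1
    b2 = b2 / np.linalg.norm(b2)
    # Third basis vector: cross product completes the orthonormal frame.
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=1)  # columns are the basis vectors

# Any generic (non-degenerate) 6D output yields a valid rotation matrix,
# which is why this representation suits unconstrained regression.
R = rotation_from_6d(np.array([1.0, 0.2, 0.0, 0.0, 1.0, 0.3]))
```

Because the mapping is continuous and surjective onto SO(3), a network can regress the six numbers freely without quaternion normalization or angle-wrapping issues.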
Pages: 12
Related papers (50 total)
  • [1] YOLOPose: Transformer-Based Multi-object 6D Pose Estimation Using Keypoint Regression
    Amini, Arash
    Periyasamy, Arul Selvam
    Behnke, Sven
    INTELLIGENT AUTONOMOUS SYSTEMS 17, IAS-17, 2023, 577 : 392 - 406
  • [2] A Transformer-based multi-modal fusion network for 6D pose estimation
    Hong, Jia-Xin
    Zhang, Hong-Bo
    Liu, Jing-Hua
    Lei, Qing
    Yang, Li-Jie
    Du, Ji-Xiang
    INFORMATION FUSION, 2024, 105
  • [3] TPSFusion: A Transformer-based pyramid screening fusion network for 6D pose estimation
    Zhu, Jiaqi
    Li, Bin
    Zhao, Xinhua
    IMAGE AND VISION COMPUTING, 2025, 154
  • [4] 6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-Based Instance Representation Learning
    Zou, Lu
    Huang, Zhangjin
    Gu, Naijie
    Wang, Guoping
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 6907 - 6921
  • [5] HFT6D: Multimodal 6D object pose estimation based on hierarchical feature transformer
    An, Yunnan
    Yang, Dedong
    Song, Mengyuan
    MEASUREMENT, 2024, 224
  • [6] DProST: Dynamic Projective Spatial Transformer Network for 6D Pose Estimation
    Park, Jaewoo
    Cho, Nam Ik
    COMPUTER VISION - ECCV 2022, PT VI, 2022, 13666 : 363 - 379
  • [7] CatFormer: Category-Level 6D Object Pose Estimation with Transformer
    Yu, Sheng
    Zhai, Di-Hua
    Xia, Yuanqing
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 6808 - 6816
  • [8] TransPose: 6D object pose estimation with geometry-aware Transformer
    Lin, Xiao
    Wang, Deming
    Zhou, Guangliang
    Liu, Chengju
    Chen, Qijun
    NEUROCOMPUTING, 2024, 589
  • [9] PoET: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose Estimation
    Jantos, Thomas
    Hamdad, Mohamed Amin
    Granig, Wolfgang
    Weiss, Stephan
    Steinbrener, Jan
    CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 1060 - 1070
  • [10] Vision Transformer-based pilot pose estimation
    Wu, Honglan
    Liu, Hao
    Sun, Youchao
Journal of Beijing University of Aeronautics and Astronautics, 2024, 50 (10): 3100 - 3110