YOLOPose V2: Understanding and improving transformer-based 6D pose estimation

Cited by: 12
Authors
Periyasamy, Arul Selvam [1 ]
Amini, Arash [1 ]
Tsaturyan, Vladimir [1 ]
Behnke, Sven [1 ]
Affiliations
[1] Univ Bonn, Autonomous Intelligent Syst, Bonn, Germany
Keywords
Vision transformers; Object pose estimation; Object detection; Calibration
DOI
10.1016/j.robot.2023.104490
CLC classification
TP [Automation technology; computer technology]
Subject classification
0812
Abstract
6D object pose estimation is a crucial prerequisite for autonomous robot manipulation applications. The state-of-the-art models for pose estimation are convolutional neural network (CNN)-based. Lately, Transformers, an architecture originally proposed for natural language processing, are achieving state-of-the-art results in many computer vision tasks as well. Equipped with the multi-head self-attention mechanism, Transformers enable simple single-stage end-to-end architectures for learning object detection and 6D object pose estimation jointly. In this work, we propose YOLOPose (short form for You Only Look Once Pose estimation), a Transformer-based multi-object 6D pose estimation method based on keypoint regression, and an improved variant of the YOLOPose model. In contrast to the standard heatmaps for predicting keypoints in an image, we directly regress the keypoints. Additionally, we employ a learnable orientation estimation module to predict the orientation from the keypoints. Along with a separate translation estimation module, our model is end-to-end differentiable. Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods. We analyze the role of object queries in our architecture and reveal that the object queries specialize in detecting objects in specific image regions. Furthermore, we quantify the accuracy trade-off of using datasets of smaller sizes to train our model. © 2023 Elsevier B.V. All rights reserved.
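The abstract describes a learnable orientation estimation module that predicts rotation from regressed keypoints. A common design for such heads is to output a continuous 6D rotation representation (Zhou et al., CVPR 2019) that is mapped to a rotation matrix by Gram-Schmidt orthogonalization; whether YOLOPose V2 uses exactly this parameterization is an assumption here, and the sketch below illustrates only the representation-to-matrix mapping, not the paper's full module.

```python
import numpy as np

def rotation_from_6d(v):
    """Map a 6D vector to a 3x3 rotation matrix via Gram-Schmidt.

    A continuous rotation parameterization often used by learnable
    orientation heads; its use in YOLOPose V2 is an assumption made
    for illustration, not a claim from the paper.
    """
    a1, a2 = v[:3], v[3:]
    # First basis vector: normalize the first half of the output.
    b1 = a1 / np.linalg.norm(a1)
    # Second basis vector: remove the b1 component, then normalize.
    b2 = a2 - np.dot(b1, a2) * b1
    b2 = b2 / np.linalg.norm(b2)
    # Third basis vector: cross product completes the orthonormal frame.
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=1)  # columns are the basis vectors

# Any generic (non-degenerate) 6D output yields a valid rotation matrix,
# which is why this representation suits unconstrained regression.
R = rotation_from_6d(np.array([1.0, 0.2, 0.0, 0.0, 1.0, 0.3]))
```

Because the mapping is continuous and surjective onto SO(3), a network can regress the six numbers freely without quaternion normalization or angle-wrapping issues.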
Pages: 12
Related papers (50 total)
  • [1] YOLOPose: Transformer-Based Multi-object 6D Pose Estimation Using Keypoint Regression
    Amini, Arash
    Periyasamy, Arul Selvam
    Behnke, Sven
    INTELLIGENT AUTONOMOUS SYSTEMS 17, IAS-17, 2023, 577 : 392 - 406
  • [2] A Transformer-based multi-modal fusion network for 6D pose estimation
    Hong, Jia-Xin
    Zhang, Hong-Bo
    Liu, Jing-Hua
    Lei, Qing
    Yang, Li-Jie
    Du, Ji-Xiang
    INFORMATION FUSION, 2024, 105
  • [3] TPSFusion: A Transformer-based pyramid screening fusion network for 6D pose estimation
    Zhu, Jiaqi
    Li, Bin
    Zhao, Xinhua
    IMAGE AND VISION COMPUTING, 2025, 154
  • [4] 6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-Based Instance Representation Learning
    Zou, Lu
    Huang, Zhangjin
    Gu, Naijie
    Wang, Guoping
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 6907 - 6921
  • [5] HFT6D: Multimodal 6D object pose estimation based on hierarchical feature transformer
    An, Yunnan
    Yang, Dedong
    Song, Mengyuan
    MEASUREMENT, 2024, 224
  • [6] DProST: Dynamic Projective Spatial Transformer Network for 6D Pose Estimation
    Park, Jaewoo
    Cho, Nam Ik
    COMPUTER VISION - ECCV 2022, PT VI, 2022, 13666 : 363 - 379
  • [7] CatFormer: Category-Level 6D Object Pose Estimation with Transformer
    Yu, Sheng
    Zhai, Di-Hua
    Xia, Yuanqing
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 6808 - 6816
  • [8] TransPose: 6D object pose estimation with geometry-aware Transformer
    Lin, Xiao
    Wang, Deming
    Zhou, Guangliang
    Liu, Chengju
    Chen, Qijun
    NEUROCOMPUTING, 2024, 589
  • [9] PoET: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose Estimation
    Jantos, Thomas
    Hamdad, Mohamed Amin
    Granig, Wolfgang
    Weiss, Stephan
    Steinbrener, Jan
    CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 1060 - 1070
  • [10] Vision Transformer-based pilot pose estimation
    Wu, Honglan
    Liu, Hao
    Sun, Youchao
Journal of Beijing University of Aeronautics and Astronautics, 2024, 50 (10): 3100 - 3110