CMT-6D: a lightweight iterative 6DoF pose estimation network based on cross-modal Transformer

Citations: 0

Authors:

Liu, Suyi ^{[1]}

Xu, Fang ^{[2]}

Wu, Chengdong ^{[1]}

Chi, Jianning ^{[1]}

Yu, Xiaosheng ^{[1]}

Wei, Longxing ^{[3]}

Leng, Chuanjiang ^{[1]}

Affiliations:

[1] Northeastern Univ, Fac Robot Sci & Engn, Chuangxin Rd, Shenyang 110167, Liaoning, Peoples R China

[2] Acad Sinica, Shenyang Siasun Robot Automat Co Ltd, Quanyun Rd, Shenyang 110180, Liaoning, Peoples R China

[3] China Aerosp Sci & Ind Corp, Inst 706, Acad 2, Yongding Rd, Beijing 100049, Peoples R China

Source:

VISUAL COMPUTER | 2025 / Vol. 41 / Issue 03

Funding:

National Natural Science Foundation of China;

Keywords:

6D pose estimation; Cross-modal Transformer; Cross-modal key query strategy; 3D keypoint selection; Lightweight pose iterative; 3D OBJECT DETECTION;

DOI:

10.1007/s00371-024-03520-1

Chinese Library Classification:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

6DoF pose estimation has received much attention in recent years. A key challenge is estimating object pose when the target is weakly textured. In this work, we present the cross-modal Transformer (CMT-6D), a Transformer-based network for highly accurate workpiece-level 6D object pose estimation from a single RGBD image. Our main insight is to make the surface texture information of RGB images and the geometric feature information of point clouds complement each other through a cross-modal Transformer, enabling accurate pose estimation for weakly textured targets. Specifically, the framework consists of two parallel Transformer branches, named Point Transformer and Image Transformer. Both branches use a pyramid-structured encoder and a multi-layer perceptron decoder to extract geometric features from point clouds and texture features from RGB images, respectively. A cross-modal key query strategy is then proposed for information exchange between the parallel branches. In addition, at the output representation stage, we design a simple and effective 3D keypoint selection algorithm to address the problem that keypoints are likely to fall in non-salient regions. Finally, to improve the accuracy of pose estimation while meeting real-time requirements, a lightweight iterative pose network based on target feature regression is proposed to correct the error of the initial pose estimate. Extensive experiments on the LineMOD, Occlusion LineMOD, T-LESS, and YCB-Video datasets demonstrate the effectiveness of our method, and comparisons with state-of-the-art methods show that it improves 6D pose estimation performance. Ablation studies and visualizations validate the design of CMT-6D.
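
The information exchange between the point and image branches can be illustrated with generic scaled dot-product cross-attention, in which queries from one modality attend over keys and values from the other. This is only a minimal sketch under stated assumptions, not the paper's cross-modal key query strategy: the function names, feature sizes, random features, and shared projection matrices below are all hypothetical and chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Generic cross-attention: queries come from one modality,
    keys and values from the other (hypothetical helper)."""
    q = query_feats @ w_q                      # (n_query, d)
    k = context_feats @ w_k                    # (n_context, d)
    v = context_feats @ w_v                    # (n_context, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # scaled dot-product scores
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights


# Assumed toy sizes: 128 point tokens, 64 image tokens, 16-dim features.
rng = np.random.default_rng(0)
d_model = 16
point_feats = rng.standard_normal((128, d_model))
image_feats = rng.standard_normal((64, d_model))

# Random projections stand in for learned weights (assumption).
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3))

# Point tokens query image tokens, and image tokens query point tokens.
fused_point, attn_p2i = cross_attention(point_feats, image_feats, w_q, w_k, w_v)
fused_image, attn_i2p = cross_attention(image_feats, point_feats, w_q, w_k, w_v)
```

In this sketch each point token receives a weighted mix of image features and vice versa, which is one common way for a weakly textured region to borrow geometric evidence. A trained network would use separate learned projections per branch and per layer.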


Pages: 2011 - 2027

Page count: 17

50 items in total

[41] A Survey of 6DoF Object Pose Estimation Methods for Different Application Scenarios
Guan, Jian
Hao, Yingming
Wu, Qingxiao
Li, Sicong
Fang, Yingjian
SENSORS, 2024, 24 (04)
[42] A Study on the Impact of Domain Randomization for Monocular Deep 6DoF Pose Estimation
da Cunha, Kelvin B.
Brito, Caio
Valenca, Luas
Simoes, Francisco
Teichrieb, Veronica
2020 33RD SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI 2020), 2020, : 332 - 339
[43] CAD-Model Recognition and 6DOF Pose Estimation Using 3D Cues
Aldoma, Aitor
Vincze, Markus
Blodow, Nico
Gossow, David
Gedikli, Suat
Rusu, Radu Bogdan
Bradski, Gary
2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCV WORKSHOPS), 2011,
[44] 6D-VNet: End-to-end 6DoF Vehicle Pose Estimation from Monocular RGB Images
Wu, Di
Zhuang, Zhaoyong
Xiang, Canqun
Zou, Wenbin
Li, Xia
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 1238 - 1247
[45] 6DOF Pose Estimation of a 3D Rigid Object based on an adaptive model curvature Point Pair Features
Zhao, Xin
Cui, Xining
Fu, Hongyong
2024 10TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND ROBOTIC, ICCAR 2024, 2024, : 53 - 58
[46] img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation
Albiero, Vitor
Chen, Xingyu
Yin, Xi
Pang, Guan
Hassner, Tal
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 7613 - 7623
[47] Triangulate geometric constraint combined with visual-flow fusion network for accurate 6DoF pose estimation
Jiang, Zhihong
Wang, Xin
Huang, Xiao
Li, Hui
IMAGE AND VISION COMPUTING, 2021, 108
[48] 6DOF pose estimation of a 3D rigid object based on edge-enhanced point pair features
Liu, Chenyi
Chen, Fei
Deng, Lu
Yi, Renjiao
Zheng, Lintao
Zhu, Chenyang
Wang, Jia
Xu, Kai
COMPUTATIONAL VISUAL MEDIA, 2024, 10 (01): : 61 - 77
[49] 6DOF pose estimation of a 3D rigid object based on edge-enhanced point pair features
Chenyi Liu
Fei Chen
Lu Deng
Renjiao Yi
Lintao Zheng
Chenyang Zhu
Jia Wang
Kai Xu
Computational Visual Media, 2024, 10 : 61 - 77
[50] Binocular Reconstruction and Monocular 6Dof Pose Estimation for Model Free Robot Grasping
Tang, Conghui
Chen, Wenrui
Peng, Yong
Wang, Yaonan
INTELLIGENT ROBOTICS AND APPLICATIONS, ICIRA 2024, PT II, 2025, 15202 : 160 - 173
