CMT-6D: a lightweight iterative 6DoF pose estimation network based on cross-modal Transformer

Authors
Liu, Suyi [1]
Xu, Fang [2]
Wu, Chengdong [1]
Chi, Jianning [1]
Yu, Xiaosheng [1]
Wei, Longxing [3]
Leng, Chuanjiang [1]
Affiliations
[1] Northeastern Univ, Fac Robot Sci & Engn, Chuangxin Rd, Shenyang 110167, Liaoning, Peoples R China
[2] Acad Sinica, Shenyang Siasun Robot Automat Co Ltd, Quanyun Rd, Shenyang 110180, Liaoning, Peoples R China
[3] China Aerosp Sci & Ind Corp, Inst 706, Acad 2, Yongding Rd, Beijing 100049, Peoples R China
Source
VISUAL COMPUTER | 2025 / Vol. 41 / Issue 03
Funding
National Natural Science Foundation of China;
Keywords
6D pose estimation; Cross-modal Transformer; Cross-modal key query strategy; 3D keypoint selection; Lightweight pose iterative; 3D OBJECT DETECTION;
DOI
10.1007/s00371-024-03520-1
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
6DoF pose estimation has received much attention in recent years. A key challenge is the difficulty of estimating object pose when the target texture is weak. In this work, we present the cross-modal Transformer (CMT-6D), a Transformer-based network suitable for highly accurate workpiece-level object 6D pose estimation from a single RGBD image. Our main insight is to let the surface texture information of RGB images and the geometric feature information of point clouds complement each other through a cross-modal Transformer, enabling accurate estimation of the pose of weakly textured targets. Specifically, the whole framework consists of two parallel Transformer branches, named Point Transformer and Image Transformer. Both branches use a pyramid-structured encoder and a multi-layer-perceptron-structured decoder to extract geometric features of point clouds and texture features of RGB images, respectively. Then, a cross-modal key query strategy is proposed for information exchange between the parallel branches. In addition, at the output representation stage, we design a simple and effective 3D keypoint selection algorithm to solve the problem that keypoints are likely to appear in non-significant regions. Finally, to improve the accuracy of pose estimation and meet real-time requirements, a lightweight iterative pose network based on target feature regression is proposed to correct the error in the initial pose estimate. Extensive experiments demonstrate the effectiveness and superiority of our method on the LineMOD, Occlusion LineMOD, T-LESS, and YCB-Video datasets. Comparisons with state-of-the-art methods show that our method improves 6D pose estimation performance. Ablation studies and visualizations validate the design of CMT-6D.
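The abstract does not spell out how the cross-modal key query strategy works. As a rough illustration of the general idea of exchanging information between a point-cloud branch and an image branch, the sketch below implements generic scaled dot-product cross-attention in NumPy: queries come from one modality, keys and values from the other. This is not the CMT-6D implementation. The function name `cross_attention`, the feature sizes, and the random projection matrices are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Generic cross-attention (hypothetical helper, not from the paper).

    query_feats:   (n_q, dim) features of the modality being updated
    context_feats: (n_c, dim) features of the other modality
    """
    q = query_feats @ w_q        # queries from one modality
    k = context_feats @ w_k      # keys from the other modality
    v = context_feats @ w_v      # values from the other modality
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ v, weights

# Toy sizes, assumed for illustration only.
rng = np.random.default_rng(0)
n_points, n_patches, dim = 6, 4, 8
point_feats = rng.standard_normal((n_points, dim))   # stand-in geometric features
image_feats = rng.standard_normal((n_patches, dim))  # stand-in texture features
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) for _ in range(3))

# Point features query the image features, so each point gathers texture cues.
fused_points, attn = cross_attention(point_feats, image_feats, w_q, w_k, w_v)
```

Swapping the first two arguments gives the opposite direction, where image features query the point features. Whether CMT-6D exchanges information in one direction, both, or in some other form is not stated in the abstract.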
Pages: 2011 - 2027
Number of pages: 17