CMA: Cross-modal attention for 6D object pose estimation

Cited by: 13

Authors:

Zou, Lu [1]

Huang, Zhangjin [1]

Wang, Fangjun [1]

Yang, Zhouwang [1]

Wang, Guoping [2]

Affiliations:

[1] Univ Sci & Technol China, Hefei 230026, Peoples R China

[2] Peking Univ, Beijing 100000, Peoples R China

Source:

COMPUTERS & GRAPHICS-UK | 2021 / Vol. 97

Funding:

National Natural Science Foundation of China;

Keywords:

6D object pose estimation; Cross-modal data fusion; Attention mechanism;

DOI:

10.1016/j.cag.2021.04.018

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Deep learning methods for 6D object pose estimation based on RGB and depth (RGB-D) images have been successfully applied to robotic manipulation and grasping. Among these approaches, the fusion of RGB and depth modalities is one of the most critical issues. Most existing works performed fusion via either simple concatenation or element-wise multiplication of the features generated by these two modalities. Despite achieving impressive progress, such fusion strategies do not explicitly consider the different contributions of RGB and depth modalities, leaving a gap for performance enhancement. In this paper, we present a Cross-Modal Attention (CMA) component for the problem of 6D object pose estimation. With the attention mechanism, features of two different modalities are aggregated adaptively through the attention weights, such that powerful representations from the RGB-D images can be efficiently extracted. Comprehensive experiments on both LINEMOD and YCB-Video datasets demonstrate that the proposed approach achieves state-of-the-art performance. (c) 2021 Elsevier Ltd. All rights reserved.
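
As an illustration only, the contrast the abstract draws between fixed fusion (concatenation) and attention-weighted fusion can be sketched in a few lines of Python. This is not the CMA component from the paper, whose architecture the abstract does not specify. The function names, the scalar-score gating, and the weight vectors `w_rgb` and `w_depth` are all hypothetical stand-ins for learned parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_fuse(rgb_feat, depth_feat):
    """Fixed fusion: concatenate the two feature vectors unchanged."""
    return list(rgb_feat) + list(depth_feat)


def attention_fuse(rgb_feat, depth_feat, w_rgb, w_depth):
    """Adaptive fusion (assumed form, not the paper's design).

    Each modality gets a scalar score from a hypothetical learned
    weight vector; a softmax over the two scores yields attention
    weights that sum to 1, and the fused feature is the weighted sum.
    """
    score_rgb = sum(w * x for w, x in zip(w_rgb, rgb_feat))
    score_depth = sum(w * x for w, x in zip(w_depth, depth_feat))
    a_rgb, a_depth = softmax([score_rgb, score_depth])
    fused = [a_rgb * r + a_depth * d for r, d in zip(rgb_feat, depth_feat)]
    return fused, (a_rgb, a_depth)


# Usage: zero scoring weights give both modalities equal attention,
# so the fused feature is the element-wise mean of the two inputs.
fused, weights = attention_fuse([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0])
```

Unlike concatenation, the weighted sum lets the contribution of each modality vary per input, which is the property the abstract says simple fusion strategies lack.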


Pages: 139-147

Page count: 9

50 items in total

[1] 6D Object Pose Estimation Based on the Attention Mechanism
Zhou, Guanyu
INTERNATIONAL CONFERENCE ON ALGORITHMS, HIGH PERFORMANCE COMPUTING, AND ARTIFICIAL INTELLIGENCE (AHPCAI 2021), 2021, 12156
[2] Cross-modal attention and geometric contextual aggregation network for 6DoF object pose estimation
Guo, Yi
Wang, Fei
Chu, Hao
Wen, Shiguang
NEUROCOMPUTING, 2025, 617
[3] 6D Object Pose Estimation With Color/Geometry Attention Fusion
Yuan, Honglin
Veltkamp, Remco C.
16TH IEEE INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV 2020), 2020, : 529 - 535
[4] Spatial Attention Improves Iterative 6D Object Pose Estimation
Stevsic, Stefan
Hilliges, Otmar
2020 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2020), 2020, : 1070 - 1078
[5] Object 6D Pose Estimation with Non-local Attention
Mei, Jianhan
Ding, Henghui
Jiang, Xudong
TWELFTH INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING (ICDIP 2020), 2020, 11519
[6] A modal fusion network with dual attention mechanism for 6D pose estimation
Wei, Liangrui
Xie, Feifei
Sun, Lin
Chen, Jinpeng
Zhang, Zhipeng
VISUAL COMPUTER, 2024, 40 (10): : 7411 - 7425
[7] On Evaluation of 6D Object Pose Estimation
Hodan, Tomas
Matas, Jiri
Obdrzalek, Stephan
COMPUTER VISION - ECCV 2016 WORKSHOPS, PT III, 2016, 9915 : 606 - 619
[8] Single Shot 6D Object Pose Estimation
Kleeberger, Kilian
Huber, Marco F.
2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 6239 - 6245
[9] BOP: Benchmark for 6D Object Pose Estimation
Hodan, Tomas
Michel, Frank
Brachmann, Eric
Kehl, Wadim
Buch, Anders Glent
Kraft, Dirk
Drost, Bertram
Vidal, Joel
Ihrke, Stephan
Zabulis, Xenophon
Sahin, Caner
Manhardt, Fabian
Tombari, Federico
Kim, Tae-Kyun
Matas, Jiri
Rother, Carsten
COMPUTER VISION - ECCV 2018, PT X, 2018, 11214 : 19 - 35
[10] 6D Object Pose Estimation with Attention Aware Bi-gated Fusion
Wang, Laichao
Lu, Weiding
Tian, Yuan
Guan, Yong
Shao, Zhenzhou
Shi, Zhiping
NEURAL INFORMATION PROCESSING, ICONIP 2023, PT II, 2024, 14448 : 573 - 585
