CMA: Cross-modal attention for 6D object pose estimation

Cited by: 13
Authors
Zou, Lu [1 ]
Huang, Zhangjin [1 ]
Wang, Fangjun [1 ]
Yang, Zhouwang [1 ]
Wang, Guoping [2 ]
Affiliations
[1] Univ Sci & Technol China, Hefei 230026, Peoples R China
[2] Peking Univ, Beijing 100000, Peoples R China
Source
COMPUTERS & GRAPHICS-UK | 2021 / Volume 97
Funding
National Natural Science Foundation of China;
Keywords
6D object pose estimation; Cross-modal data fusion; Attention mechanism;
DOI
10.1016/j.cag.2021.04.018
Chinese Library Classification
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Deep learning methods for 6D object pose estimation based on RGB and depth (RGB-D) images have been successfully applied to robotic manipulation and grasping. Among these approaches, the fusion of RGB and depth modalities is one of the most critical issues. Most existing works performed fusion via either simple concatenation or element-wise multiplication of the features generated by these two modalities. Despite achieving impressive progress, such fusion strategies do not explicitly consider the different contributions of RGB and depth modalities, leaving a gap for performance enhancement. In this paper, we present a Cross-Modal Attention (CMA) component for the problem of 6D object pose estimation. With the attention mechanism, features of two different modalities are aggregated adaptively through the attention weights, such that powerful representations from the RGB-D images can be efficiently extracted. Comprehensive experiments on both LINEMOD and YCB-Video datasets demonstrate that the proposed approach achieves state-of-the-art performance. (c) 2021 Elsevier Ltd. All rights reserved.
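To make the contrast with concatenation or element-wise multiplication concrete, the following is a toy sketch of attention-weighted fusion of two modality features. It is not the CMA component from the paper, whose architecture this record does not describe. The function name `attention_fuse` and the scoring vectors `w_rgb` and `w_depth` are hypothetical; in a trained network such parameters would be learned.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(
    rgb_feat: Sequence[float],
    depth_feat: Sequence[float],
    w_rgb: Sequence[float],    # hypothetical scoring vector for the RGB feature
    w_depth: Sequence[float],  # hypothetical scoring vector for the depth feature
) -> Tuple[List[float], Tuple[float, float]]:
    """Fuse two same-length feature vectors with per-modality attention weights.

    Each modality gets a scalar score; a softmax over the two scores gives
    weights that sum to 1, and the fused feature is the weighted sum.
    """
    scores = [dot(rgb_feat, w_rgb), dot(depth_feat, w_depth)]
    a_rgb, a_depth = softmax(scores)
    fused = [a_rgb * r + a_depth * d for r, d in zip(rgb_feat, depth_feat)]
    return fused, (a_rgb, a_depth)


rgb = [1.0, 0.0, 2.0]
depth = [0.0, 1.0, 0.0]

# Zero scoring vectors give equal scores, so the fusion reduces to a plain average.
fused_equal, weights_equal = attention_fuse(rgb, depth, [0.0] * 3, [0.0] * 3)

# A large RGB score pushes almost all of the weight onto the RGB feature.
fused_rgb, weights_rgb = attention_fuse(rgb, depth, [10.0, 0.0, 0.0], [0.0] * 3)
```

Unlike fixed concatenation or multiplication, the weights here depend on the input features, so the contribution of each modality can vary from sample to sample.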
Pages: 139-147
Page count: 9
Related papers
50 in total
  • [1] 6D Object Pose Estimation Based on the Attention Mechanism
    Zhou, Guanyu
    INTERNATIONAL CONFERENCE ON ALGORITHMS, HIGH PERFORMANCE COMPUTING, AND ARTIFICIAL INTELLIGENCE (AHPCAI 2021), 2021, 12156
  • [2] Cross-modal attention and geometric contextual aggregation network for 6DoF object pose estimation
    Guo, Yi
    Wang, Fei
    Chu, Hao
    Wen, Shiguang
    NEUROCOMPUTING, 2025, 617
  • [3] 6D Object Pose Estimation With Color/Geometry Attention Fusion
    Yuan, Honglin
    Veltkamp, Remco C.
    16TH IEEE INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV 2020), 2020, : 529 - 535
  • [4] Spatial Attention Improves Iterative 6D Object Pose Estimation
    Stevsic, Stefan
    Hilliges, Otmar
    2020 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2020), 2020, : 1070 - 1078
  • [5] Object 6D Pose Estimation with Non-local Attention
    Mei, Jianhan
    Ding, Henghui
    Jiang, Xudong
    TWELFTH INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING (ICDIP 2020), 2020, 11519
  • [6] A modal fusion network with dual attention mechanism for 6D pose estimation
    Wei, Liangrui
    Xie, Feifei
    Sun, Lin
    Chen, Jinpeng
    Zhang, Zhipeng
    VISUAL COMPUTER, 2024, 40 (10): : 7411 - 7425
  • [7] On Evaluation of 6D Object Pose Estimation
    Hodan, Tomas
    Matas, Jiri
    Obdrzalek, Stephan
    COMPUTER VISION - ECCV 2016 WORKSHOPS, PT III, 2016, 9915 : 606 - 619
  • [8] Single Shot 6D Object Pose Estimation
    Kleeberger, Kilian
    Huber, Marco F.
    2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 6239 - 6245
  • [9] BOP: Benchmark for 6D Object Pose Estimation
    Hodan, Tomas
    Michel, Frank
    Brachmann, Eric
    Kehl, Wadim
    Buch, Anders Glent
    Kraft, Dirk
    Drost, Bertram
    Vidal, Joel
    Ihrke, Stephan
    Zabulis, Xenophon
    Sahin, Caner
    Manhardt, Fabian
    Tombari, Federico
    Kim, Tae-Kyun
    Matas, Jiri
    Rother, Carsten
    COMPUTER VISION - ECCV 2018, PT X, 2018, 11214 : 19 - 35
  • [10] 6D Object Pose Estimation with Attention Aware Bi-gated Fusion
    Wang, Laichao
    Lu, Weiding
    Tian, Yuan
    Guan, Yong
    Shao, Zhenzhou
    Shi, Zhiping
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT II, 2024, 14448 : 573 - 585