CSA6D: Channel-Spatial Attention Networks for 6D Object Pose Estimation

被引:5
|
作者
Chen, Tao [1 ]
Gu, Dongbing [1 ]
机构
[1] Univ Essex, Sch Comp Sci & Elect Engn, Colchester CO4 3SQ, Essex, England
关键词
6D object pose estimation; Convolutional neural network; Feature fusion; Attention mechanism;
D O I
10.1007/s12559-021-09966-y
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
6D object pose estimation plays a crucial role in robotic manipulation and grasping tasks. The aim to estimate the 6D object pose from RGB or RGB-D images is to detect objects and estimate their orientations and translations relative to the given canonical models. RGB-D cameras provide two sensory modalities: RGB and depth images, which could benefit the estimation accuracy. But the exploitation of two different modality sources remains a challenging issue. In this paper, inspired by recent works on attention networks that could focus on important regions and ignore unnecessary information, we propose a novel network: Channel-Spatial Attention Network (CSA6D) to estimate the 6D object pose from RGB-D camera. The proposed CSA6D includes a pre-trained 2D network to segment the interested objects from RGB image. Then it uses two separate networks to extract appearance and geometrical features from RGB and depth images for each segmented object. Two feature vectors for each pixel are stacked together as a fusion vector which is refined by an attention module to generate a aggregated feature vector. The attention module includes a channel attention block and a spatial attention block which can effectively leverage the concatenated embeddings into accurate 6D pose prediction on known objects. We evaluate proposed network on two benchmark datasets YCB-Video dataset and LineMod dataset and the results show it can outperform previous state-of-the-art methods under ADD and ADD-S metrics. Also, the attention map demonstrates our proposed network searches for the unique geometry information as the most likely features for pose estimation. From experiments, we conclude that the proposed network can accurately estimate the object pose by effectively leveraging multi-modality features.
引用
收藏
页码:702 / 713
页数:12
相关论文
共 50 条
  • [1] CSA6D: Channel-Spatial Attention Networks for 6D Object Pose Estimation
    Tao Chen
    Dongbing Gu
    [J]. Cognitive Computation, 2022, 14 : 702 - 713
  • [2] Spatial Attention Improves Iterative 6D Object Pose Estimation
    Stevsic, Stefan
    Hilliges, Otmar
    [J]. 2020 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2020), 2020, : 1070 - 1078
  • [3] 6D Object Pose Estimation Based on the Attention Mechanism
    Zhou, Guanyu
    [J]. INTERNATIONAL CONFERENCE ON ALGORITHMS, HIGH PERFORMANCE COMPUTING, AND ARTIFICIAL INTELLIGENCE (AHPCAI 2021), 2021, 12156
  • [4] 6D Object Pose Estimation With Color/Geometry Attention Fusion
    Yuan, Honglin
    Veltkamp, Remco C.
    [J]. 16TH IEEE INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV 2020), 2020, : 529 - 535
  • [5] On Evaluation of 6D Object Pose Estimation
    Hodan, Tomas
    Matas, Jiri
    Obdrzalek, Stephan
    [J]. COMPUTER VISION - ECCV 2016 WORKSHOPS, PT III, 2016, 9915 : 606 - 619
  • [6] Object 6D Pose Estimation with Non-local Attention
    Mei, Jianhan
    Ding, Henghui
    Jiang, Xudong
    [J]. TWELFTH INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING (ICDIP 2020), 2020, 11519
  • [7] CMA: Cross-modal attention for 6D object pose estimation
    Zou, Lu
    Huang, Zhangjin
    Wang, Fangjun
    Yang, Zhouwang
    Wang, Guoping
    [J]. COMPUTERS & GRAPHICS-UK, 2021, 97 : 139 - 147
  • [8] Single Shot 6D Object Pose Estimation
    Kleeberger, Kilian
    Huber, Marco F.
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 6239 - 6245
  • [9] BOP: Benchmark for 6D Object Pose Estimation
    Hodan, Tomas
    Michel, Frank
    Brachmann, Eric
    Kehl, Wadim
    Buch, Anders Glent
    Kraft, Dirk
    Drost, Bertram
    Vidal, Joel
    Ihrke, Stephan
    Zabulis, Xenophon
    Sahin, Caner
    Manhardt, Fabian
    Tombari, Federico
    Kim, Tae-Kyun
    Matas, Jiri
    Rother, Carsten
    [J]. COMPUTER VISION - ECCV 2018, PT X, 2018, 11214 : 19 - 35
  • [10] Survey on 6D Pose Estimation of Rigid Object
    Chen, Jiale
    Zhang, Lijun
    Liu, Yi
    Xu, Chi
    [J]. PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE, 2020, : 7440 - 7445