Attention-Based Grasp Detection With Monocular Depth Estimation

Cited by: 1
|
Authors
Xuan Tan, Phan [1 ]
Hoang, Dinh-Cuong [2 ]
Nguyen, Anh-Nhat [3 ]
Nguyen, Van-Thiep [3 ]
Vu, Van-Duc [3 ]
Nguyen, Thu-Uyen [3 ]
Hoang, Ngoc-Anh [3 ]
Phan, Khanh-Toan [3 ]
Tran, Duc-Thanh [3 ]
Vu, Duy-Quang [3 ]
Ngo, Phuc-Quan [2 ]
Duong, Quang-Tri [2 ]
Ho, Ngoc-Trung [3 ]
Tran, Cong-Trinh [3 ]
Duong, Van-Hiep [3 ]
Mai, Anh-Truong [3 ]
Affiliations
[1] Shibaura Inst Technol, Coll Engn, Tokyo 1358548, Japan
[2] FPT Univ, Greenwich Vietnam, Hanoi 10000, Vietnam
[3] FPT Univ, IT Dept, Hanoi 10000, Vietnam
Keywords
Pose estimation; robot vision systems; intelligent systems; deep learning; supervised learning; machine vision;
DOI
10.1109/ACCESS.2024.3397718
CLC Number
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Grasp detection plays a pivotal role in robotic manipulation, allowing robots to interact with and manipulate objects in their surroundings. Traditionally, this has relied on three-dimensional (3D) point cloud data acquired from specialized depth cameras. However, the limited availability of such sensors in real-world scenarios poses a significant challenge. In many practical applications, robots operate in diverse environments where obtaining high-quality 3D point cloud data may be impractical or impossible. This paper introduces an innovative approach to grasp generation using color images, thereby eliminating the need for dedicated depth sensors. Our method capitalizes on advanced deep learning techniques for depth estimation directly from color images. Instead of relying on conventional depth sensors, our approach computes predicted point clouds based on estimated depth images derived directly from Red-Green-Blue (RGB) input data. To our knowledge, this is the first study to explore the use of predicted depth data for grasp detection, moving away from the traditional dependence on depth sensors. The novelty of this work is the development of a fusion module that seamlessly integrates features extracted from RGB images with those inferred from the predicted point clouds. Additionally, we adapt a voting mechanism from our previous work (VoteGrasp) to enhance robustness to occlusion and generate collision-free grasps. Experimental evaluations conducted on standard datasets validate the effectiveness of our approach, demonstrating its superior performance in generating grasp configurations compared to existing methods. With our proposed method, we achieved a significant 4% improvement in average precision compared to state-of-the-art grasp detection methods. Furthermore, our method demonstrates promising practical viability through real robot grasping experiments, achieving an impressive 84% success rate.
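The abstract's central step is computing a predicted point cloud from a depth image that was itself estimated from an RGB input. The paper does not publish code, but the back-projection it relies on is the standard pinhole-camera model; a minimal sketch (function name, intrinsics, and the toy example are illustrative, not from the paper):

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W, metres) into an N x 3 point
    cloud using pinhole intrinsics (fx, fy: focal lengths in pixels;
    cx, cy: principal point)."""
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop pixels with no valid (estimated) depth.
    return points[points[:, 2] > 0]

# Toy example: a flat surface 1 m in front of the camera.
depth = np.ones((4, 4))
cloud = depth_to_point_cloud(depth, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
```

In the paper's pipeline the `depth` array would come from a learned monocular depth estimator rather than a sensor, after which the predicted cloud is fused with RGB features for grasp voting.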
Pages: 65041-65057
Page count: 17
Related Papers
50 records in total
  • [1] Attention-based context aggregation network for monocular depth estimation
    Chen, Yuru
    Zhao, Haitao
    Hu, Zhengwei
    Peng, Jingchao
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2021, 12 (06) : 1583 - 1596
  • [2] Attention-Based Dense Decoding Network for Monocular Depth Estimation
    Wang, Jianrong
    Zhang, Ge
    Yu, Mei
    Xu, Tianyi
    Luo, Tao
    [J]. IEEE ACCESS, 2020, 8 : 85802 - 85812
  • [4] Online supervised attention-based recurrent depth estimation from monocular video
    Maslov, Dmitrii
    Makarov, Ilya
    [J]. PEERJ COMPUTER SCIENCE, 2020,
  • [6] ATTENTION-BASED SELF-SUPERVISED LEARNING MONOCULAR DEPTH ESTIMATION WITH EDGE REFINEMENT
    Jiang, Chenweinan
    Liu, Haichun
    Li, Lanzhen
    Pan, Changchun
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 3218 - 3222
  • [7] Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation
    Yan, Jiaxing
    Zhao, Hong
    Bu, Penghui
    Jin, YuSheng
    [J]. 2021 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2021), 2021, : 464 - 473
  • [8] Attention-Based Monocular Depth Estimation Considering Global and Local Information in Remote Sensing Images
    Lv, Junwei
    Zhang, Yueting
    Guo, Jiayi
    Zhao, Xin
    Gao, Ming
    Lei, Bin
    [J]. REMOTE SENSING, 2024, 16 (03)
  • [9] Attention-based efficient robot grasp detection network
    Qin, Xiaofei
    Hu, Wenkai
    Xiao, Chen
    He, Changxiang
    Pei, Songwen
    Zhang, Xuedian
    [J]. FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2023, 24 (10) : 1430 - 1444
  • [10] Lightweight monocular absolute depth estimation based on attention mechanism
    Jin, Jiayu
    Tao, Bo
    Qian, Xinbo
    Hu, Jiaxin
    Li, Gongfa
    [J]. JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (02)