Attention-Based Grasp Detection With Monocular Depth Estimation

Times Cited: 1
Authors
Xuan Tan, Phan [1 ]
Hoang, Dinh-Cuong [2 ]
Nguyen, Anh-Nhat [3 ]
Nguyen, Van-Thiep [3 ]
Vu, Van-Duc [3 ]
Nguyen, Thu-Uyen [3 ]
Hoang, Ngoc-Anh [3 ]
Phan, Khanh-Toan [3 ]
Tran, Duc-Thanh [3 ]
Vu, Duy-Quang [3 ]
Ngo, Phuc-Quan [2 ]
Duong, Quang-Tri [2 ]
Ho, Ngoc-Trung [3 ]
Tran, Cong-Trinh [3 ]
Duong, Van-Hiep [3 ]
Mai, Anh-Truong [3 ]
Affiliations
[1] Shibaura Inst Technol, Coll Engn, Tokyo 1358548, Japan
[2] FPT Univ, Greenwich Vietnam, Hanoi 10000, Vietnam
[3] FPT Univ, IT Dept, Hanoi 10000, Vietnam
Keywords
Pose estimation; robot vision systems; intelligent systems; deep learning; supervised learning; machine vision
DOI
10.1109/ACCESS.2024.3397718
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
Grasp detection plays a pivotal role in robotic manipulation, allowing robots to interact with and manipulate objects in their surroundings. Traditionally, it has relied on three-dimensional (3D) point cloud data acquired from specialized depth cameras, but the limited availability of such sensors poses a significant challenge: in many practical applications, robots operate in diverse environments where obtaining high-quality 3D point cloud data is impractical or impossible. This paper introduces an approach to grasp generation that uses color images alone, eliminating the need for dedicated depth sensors. Our method capitalizes on deep learning techniques for depth estimation: rather than reading depth from a sensor, it computes predicted point clouds from depth images estimated directly from Red-Green-Blue (RGB) input. To our knowledge, this is the first study to explore the use of predicted depth data for grasp detection, moving away from the traditional dependence on depth sensors. The key novelty is a fusion module that integrates features extracted from the RGB images with features inferred from the predicted point clouds. Additionally, we adapt the voting mechanism from our previous work (VoteGrasp) to improve robustness to occlusion and to generate collision-free grasps. Experimental evaluations on standard datasets validate the effectiveness of our approach, which improves average precision by 4% over state-of-the-art grasp detection methods. The method also demonstrates practical viability in real robot grasping experiments, achieving an 84% success rate.
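The "predicted point cloud" at the heart of this pipeline is conventionally obtained by back-projecting the estimated depth image through the pinhole camera model; the abstract does not spell this step out, so the following is a minimal sketch under that assumption. The depth array stands in for the monocular network's output, and the intrinsics (fx, fy, cx, cy) are placeholder values that would normally come from camera calibration.

    import numpy as np

    def depth_to_point_cloud(depth, fx, fy, cx, cy):
        # Back-project an H x W depth map (in meters) into an N x 3 point
        # cloud in the camera frame via the pinhole model:
        #   X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth[v, u]
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) * depth / fx
        y = (v - cy) * depth / fy
        points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
        return points[points[:, 2] > 0]  # drop pixels with no valid depth

    # Illustrative usage: a synthetic depth map stands in for the output
    # of the monocular depth estimator described in the abstract.
    estimated_depth = 0.5 + np.abs(np.random.randn(480, 640))
    cloud = depth_to_point_cloud(estimated_depth,
                                 fx=615.0, fy=615.0, cx=320.0, cy=240.0)
    print(cloud.shape)  # -> (307200, 3) when all depths are positive

The resulting cloud would feed the point-cloud branch whose features the paper's fusion module combines with RGB features; the fusion module itself and the VoteGrasp-style voting are specific to the paper and are not reconstructed here.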
Pages: 65041-65057
Page count: 17
Related Papers
50 records in total
• [21] MLDA-Net: Multi-Level Dual Attention-Based Network for Self-Supervised Monocular Depth Estimation. Song, Xibin; Li, Wei; Zhou, Dingfu; Dai, Yuchao; Fang, Jin; Li, Hongdong; Zhang, Liangjun. IEEE Transactions on Image Processing, 2021, 30: 4691-4705.
• [22] Attention-Based Depth Distillation with 3D-Aware Positional Encoding for Monocular 3D Object Detection. Wu, Zizhang; Wu, Yunzhe; Pu, Jian; Li, Xianzhi; Wang, Xiaoquan. Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023, 37(3): 2892-2900.
• [23] Trap Attention: Monocular Depth Estimation with Manual Traps. Ning, Chao; Gan, Hongping. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 5033-5043.
• [24] Unsupervised Monocular Depth Estimation With Channel and Spatial Attention. Wang, Zhuping; Dai, Xinke; Guo, Zhanyu; Huang, Chao; Zhang, Hao. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(6): 7860-7870.
• [25] CATNet: Convolutional Attention and Transformer for Monocular Depth Estimation. Tang, Shuai; Lu, Tongwei; Liu, Xuanxuan; Zhou, Huabing; Zhang, Yanduo. Pattern Recognition, 2024, 145.
• [26] Attention Mechanism Used in Monocular Depth Estimation: An Overview. Li, Yundong; Wei, Xiaokun; Fan, Hanlu. Applied Sciences-Basel, 2023, 13(17).
• [27] Dual-Attention Mechanism for Monocular Depth Estimation. Chiu, Chui-Hong; Astuti, Lia; Lin, Yu-Chen; Hung, Ming-Ku. 2024 16th International Conference on Computer and Automation Engineering (ICCAE), 2024: 456-460.
• [28] Unsupervised Monocular Depth Estimation Based on Dual Attention Mechanism and Depth-Aware Loss. Ye, Xinchen; Zhang, Mingliang; Xu, Rui; Zhong, Wei; Fan, Xin; Liu, Zhu; Zhang, Jiaao. 2019 IEEE International Conference on Multimedia and Expo (ICME), 2019: 169-174.
• [29] Illumination Insensitive Monocular Depth Estimation Based on Scene Object Attention and Depth Map Fusion. Wen, Jing; Ma, Haojiang; Yang, Jie; Zhang, Songsong. Pattern Recognition and Computer Vision (PRCV 2023), Part X, 2024, 14434: 358-370.
• [30] Decoupled and Reparameterized Compound Attention-Based Light Field Depth Estimation Network. Liao, Wan; Bai, Xiaoqi; Zhang, Qian; Cao, Jie; Fu, Haoyu; Wei, Wei; Wang, Bin; Yan, Tao. IEEE Access, 2023, 11: 130119-130130.