GaussianGrasper: 3D Language Gaussian Splatting for Open-Vocabulary Robotic Grasping

Cited by: 0
Authors
Zheng, Yuhang [1, 2]
Chen, Xiangyu [3]
Zheng, Yupeng [4]
Gu, Songen [5]
Yang, Runyi [6]
Jin, Bu [4]
Li, Pengfei [5]
Zhong, Chengliang [5]
Wang, Zengmao [7]
Liu, Lina [8]
Yang, Chao [9]
Wang, Dawei [10]
Chen, Zhen [3]
Long, Xiaoxiao [10]
Wang, Meiqing [1]
Affiliations
[1] Beihang Univ, SMEA, Haidian 100191, Peoples R China
[2] EncoSmart, Haidian 100191, Peoples R China
[3] EncoSmart, Beijing 100083, Peoples R China
[4] Chinese Acad Sci CASIA, Inst Automat, Haidian 100190, Peoples R China
[5] Tsinghua Univ, AIR, Haidian 100190, Peoples R China
[6] Imperial Coll London, London SW7 2AZ, England
[7] Wuhan Univ, Wuhan 430072, Peoples R China
[8] China Mobile Res Inst, Xicheng 100053, Peoples R China
[9] Shanghai AI Lab, Shanghai 200232, Peoples R China
[10] Univ Hong Kong, Hong Kong, Peoples R China
Source
IEEE Robotics and Automation Letters
Funding
National Natural Science Foundation of China
Keywords
Language-guided robotic manipulation; 3D Gaussian splatting; language feature field;
DOI
10.1109/LRA.2024.3432348
Chinese Library Classification (CLC) number
TP24 [Robotics]
Subject classification code
080202; 1405
Abstract
Constructing a 3D scene that can accommodate open-ended language queries is a pivotal pursuit in robotics, as it enables robots to manipulate objects according to human language directives. To this end, some research efforts have been dedicated to developing language-embedded implicit fields. However, implicit fields (e.g., NeRF) are limited by the need to capture images from a large number of viewpoints for reconstruction, and by their inherently inefficient inference. Furthermore, these methods directly distill patch-level 2D features, leading to ambiguous segmentation boundaries. We therefore present GaussianGrasper, which uses 3D Gaussian Splatting (3DGS) to explicitly represent the scene as a set of Gaussian primitives and is capable of real-time rendering. Our approach takes RGB-D images from limited viewpoints as input and uses an Efficient Feature Distillation (EFD) module that employs contrastive learning to efficiently distill 2D language embeddings and constrain the consistency of feature embeddings. With the reconstructed geometry of the Gaussian field, our method enables a pre-trained grasping model to generate collision-free grasp pose candidates. Furthermore, we propose a normal-guided grasp module to select the best grasp pose. Through comprehensive real-world experiments, we demonstrate that GaussianGrasper enables robots to accurately locate and grasp objects according to language instructions, providing a new solution for language-guided grasping tasks.
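The abstract mentions a normal-guided module that selects the best grasp pose but does not give its scoring rule. The Python sketch below illustrates one plausible reading under a stated assumption: a candidate is preferred when its approach direction is most nearly anti-parallel to the surface normal at the grasp point. The function name `select_grasp`, the scoring rule, and the sample vectors are all hypothetical and are not taken from the paper.

```python
import math


def normalize(v):
    """Return v scaled to unit length; reject zero-length input."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / norm for c in v)


def select_grasp(approach_dirs, surface_normal):
    """Pick the candidate whose approach direction opposes the normal most.

    Assumption (not from the paper): the score is the negative cosine
    between the approach direction and the surface normal, so a gripper
    approaching straight against the normal scores 1.0.
    Returns (best_index, best_score).
    """
    if not approach_dirs:
        raise ValueError("no grasp candidates")
    n = normalize(surface_normal)
    best_idx, best_score = -1, -math.inf
    for i, direction in enumerate(approach_dirs):
        d = normalize(direction)
        score = -sum(a * b for a, b in zip(d, n))
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score


# Hypothetical candidates for a surface whose normal points along +z.
candidates = [(0.0, 0.0, -1.0), (1.0, 0.0, 0.0), (0.0, 1.0, -1.0)]
best_idx, best_score = select_grasp(candidates, (0.0, 0.0, 1.0))
```

In this toy case the first candidate approaches straight down onto an upward-facing surface, so it is the one selected. A real system would likely combine such a term with the grasp model's own confidence and collision checks.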
Pages: 7827-7834
Page count: 8
Related papers
50 results in total
  • [1] PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
    Ding, Runyu
    Yang, Jihan
    Xue, Chuhui
    Zhang, Wenqing
    Bai, Song
    Qi, Xiaojuan
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 7010 - 7019
  • [2] Weakly Supervised 3D Open-vocabulary Segmentation
    Liu, Kunhao
    Zhan, Fangneng
    Zhang, Jiahui
    Xu, Muyu
    Yu, Yingchen
    El Saddik, Abdulmotaleb
    Theobalt, Christian
    Xing, Eric
    Lu, Shijian
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] Open-Vocabulary Affordance Detection in 3D Point Clouds
    Toan Nguyen
    Minh Nhat Vu
    An Vuong
    Dzung Nguyen
    Thieu Vo
    Ngan Le
    Anh Nguyen
    2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 5692 - 5698
  • [4] OpenMask3D: Open-Vocabulary 3D Instance Segmentation
    Takmaz, Ayca
    Fedele, Elisabetta
    Sumner, Robert W.
    Pollefeys, Marc
    Tombari, Federico
    Engelmann, Francis
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [5] POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
    Vobecky, Antonin
    Simeoni, Oriane
    Hurych, David
    Gidaris, Spyros
    Bursuc, Andrei
    Perez, Patrick
    Sivic, Josef
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [6] 3D Gaussian Splatting with Deferred Reflection
    Ye, Keyang
    Hou, Qiming
    Zhou, Kun
    PROCEEDINGS OF SIGGRAPH 2024 CONFERENCE PAPERS, 2024,
  • [7] OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data
    Lu, Shiyang
    Chang, Haonan
    Jing, Eric Pu
    Boularias, Abdeslam
    Bekris, Kostas
    CONFERENCE ON ROBOT LEARNING, VOL 229, 2023, 229
  • [8] Recent advances in 3D Gaussian splatting
    Wu, Tong
    Yuan, Yu-Jie
    Zhang, Ling-Xiao
    Yang, Jie
    Cao, Yan-Pei
    Yan, Ling-Qi
    Gao, Lin
    COMPUTATIONAL VISUAL MEDIA, 2024, 10 (04) : 613 - 642
  • [9] TAMC: Textual Alignment and Masked Consistency for Open-Vocabulary 3D Scene Understanding
    Wang, Juan
    Wang, Zhijie
    Miyazaki, Tomo
    Fan, Yaohou
    Omachi, Shinichiro
    Sensors, 2024, 24 (19)
  • [10] Reducing the Memory Footprint of 3D Gaussian Splatting
    Papantonakis, Panagiotis
    Kopanas, Georgios
    Kerbl, Bernhard
    Lanvin, Alexandre
    Drettakis, George
    PROCEEDINGS OF THE ACM ON COMPUTER GRAPHICS AND INTERACTIVE TECHNIQUES, 2024, 7 (01)