GaussianGrasper: 3D Language Gaussian Splatting for Open-Vocabulary Robotic Grasping

Authors
Zheng, Yuhang [1, 2]
Chen, Xiangyu [3]
Zheng, Yupeng [4]
Gu, Songen [5]
Yang, Runyi [6]
Jin, Bu [4]
Li, Pengfei [5]
Zhong, Chengliang [5]
Wang, Zengmao [7]
Liu, Lina [8]
Yang, Chao [9]
Wang, Dawei [10]
Chen, Zhen [3]
Long, Xiaoxiao [10]
Wang, Meiqing [1]
Affiliations
[1] Beihang Univ, SMEA, Haidian 100191, Peoples R China
[2] EncoSmart, Haidian 100191, Peoples R China
[3] EncoSmart, Beijing 100083, Peoples R China
[4] Chinese Acad Sci CASIA, Inst Automat, Haidian 100190, Peoples R China
[5] Tsinghua Univ, AIR, Haidian 100190, Peoples R China
[6] Imperial Coll London, London SW7 2AZ, England
[7] Wuhan Univ, Wuhan 430072, Peoples R China
[8] China Mobile Res Inst, Xicheng 100053, Peoples R China
[9] Shanghai AI Lab, Shanghai 200232, Peoples R China
[10] Univ Hong Kong, Hong Kong, Peoples R China
Source
IEEE Robotics and Automation Letters, 2024
Funding
National Natural Science Foundation of China
Keywords
Language-guided robotic manipulation; 3D Gaussian splatting; language feature field
DOI
10.1109/LRA.2024.3432348
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
Constructing a 3D scene that can accommodate open-ended language queries is a pivotal pursuit in robotics, as it enables robots to manipulate objects according to human language instructions. To this end, several research efforts have developed language-embedded implicit fields. However, implicit fields (e.g., NeRF) are limited by the need for images from a large number of viewpoints for reconstruction, as well as by their inherent inefficiency at inference. Furthermore, these methods directly distill patch-level 2D features, which leads to ambiguous segmentation boundaries. We therefore present GaussianGrasper, which uses 3D Gaussian Splatting (3DGS) to explicitly represent the scene as a set of Gaussian primitives and is capable of real-time rendering. Our approach takes RGB-D images from a limited number of viewpoints as input and uses an Efficient Feature Distillation (EFD) module that employs contrastive learning to efficiently distill 2D language embeddings and to constrain the consistency of feature embeddings. Using the geometry reconstructed in the Gaussian field, our method enables a pre-trained grasping model to generate collision-free grasp pose candidates. We further propose a normal-guided grasp module to select the best grasp pose. Through comprehensive real-world experiments, we demonstrate that GaussianGrasper enables robots to accurately locate and grasp objects according to language instructions, providing a new solution for language-guided grasping tasks.
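The abstract compresses two query-time steps into single sentences: matching a language query against the features stored on the Gaussians, and choosing among grasp candidates with the help of surface normals. The Python sketch below illustrates both steps under stated assumptions. It is not the authors' implementation. It assumes that the distilled per-Gaussian features share an embedding space with a text encoder, so cosine similarity is a meaningful match score; that a pre-trained grasp model has already produced candidates with confidence scores; and that a surface normal can be queried at any point. The names `query_gaussians`, `GraspCandidate`, and `normal_guided_select`, the similarity threshold, and the "approach direction opposes the surface normal" rule are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Vec3 = tuple[float, float, float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def query_gaussians(features: Sequence[Sequence[float]],
                    text_embedding: Sequence[float],
                    threshold: float = 0.5) -> list[int]:
    """Indices of Gaussians whose language feature matches the text query.

    Assumption (hypothetical): each Gaussian stores a feature vector in the
    same embedding space as the text encoder.
    """
    return [i for i, f in enumerate(features)
            if cosine_similarity(f, text_embedding) >= threshold]


@dataclass
class GraspCandidate:
    center: Vec3    # grasp centre on the object surface
    approach: Vec3  # direction the gripper travels to reach the object
    score: float    # confidence from a (hypothetical) pre-trained grasp model


def normal_guided_select(candidates: Sequence[GraspCandidate],
                         normal_at: Callable[[Vec3], Vec3],
                         min_alignment: float = 0.8) -> Optional[GraspCandidate]:
    """Pick the grasp whose approach best opposes the local surface normal.

    Hypothetical heuristic: alignment = -cos(approach, normal) equals 1.0 when
    the gripper moves head-on against an outward-pointing normal. Candidates
    below `min_alignment` are discarded; survivors are ranked by
    score * alignment. Returns None if no candidate survives.
    """
    best, best_value = None, -math.inf
    for cand in candidates:
        alignment = -cosine_similarity(cand.approach, normal_at(cand.center))
        if alignment < min_alignment:
            continue
        value = cand.score * alignment
        if value > best_value:
            best, best_value = cand, value
    return best


if __name__ == "__main__":
    feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
    print(query_gaussians(feats, [1.0, 0.0, 0.0], threshold=0.9))  # [0, 2]

    def up(_p: Vec3) -> Vec3:
        return (0.0, 0.0, 1.0)  # flat top surface, normal points up

    cands = [
        GraspCandidate((0.0, 0.0, 0.0), (0.0, 0.0, -1.0), 0.7),   # head-on
        GraspCandidate((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.95),   # sideways, rejected
        GraspCandidate((0.0, 0.0, 0.0), (0.2, 0.0, -1.0), 0.8),   # slightly tilted
    ]
    print(normal_guided_select(cands, up).score)  # 0.8
```

In this toy run the sideways grasp has the highest raw score but is rejected because its approach is perpendicular to the normal, and the slightly tilted grasp wins because 0.8 × 0.98 exceeds 0.7 × 1.0. Multiplying the model score by the alignment is only one of several reasonable ways to combine the two signals.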
Pages: 7827-7834
Number of pages: 8