Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding

Authors
Shi, Jin-Chuan [1]
Wang, Miao [1, 2]
Duan, Hao-Bin [1]
Guan, Shao-Hua [1]
Affiliations
[1] Beihang University, State Key Laboratory of Virtual Reality Technology and Systems, SCSE, Beijing, People's Republic of China
[2] Zhongguancun Laboratory, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52733.2024.00510
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Open-vocabulary querying in 3D space is challenging but essential for scene understanding tasks such as object localization and segmentation. Language-embedded scene representations have made progress by incorporating language features into 3D spaces. However, their efficacy heavily depends on neural networks that are resource-intensive in training and rendering. Although recent 3D Gaussians offer efficient and high-quality novel view synthesis, directly embedding language features in them leads to prohibitive memory usage and decreased performance. In this work, we introduce Language Embedded 3D Gaussians, a novel scene representation for open-vocabulary query tasks. Instead of embedding high-dimensional raw semantic features on 3D Gaussians, we propose a dedicated quantization scheme that drastically alleviates the memory requirement, and a novel embedding procedure that achieves smoother yet high accuracy query, countering the multi-view feature inconsistencies and the high-frequency inductive bias in point-based representations. Our comprehensive experiments show that our representation achieves the best visual quality and language querying accuracy across current language-embedded representations, while maintaining real-time rendering frame rates on a single desktop GPU. Project page: https://buaavrcg.github.io/LEGaussians/.
Pages: 5333-5343
Page count: 11
Related papers
50 in total
  • [1] PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
    Ding, Runyu
    Yang, Jihan
    Xue, Chuhui
    Zhang, Wenqing
    Bai, Song
    Qi, Xiaojuan
    2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 7010-7019
  • [2] Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
    Li, Ruihuang
    Zhang, Zhengqiang
    He, Chenheng
    Ma, Zhiyuan
    Patel, Vishal M.
    Zhang, Lei
    Computer Vision - ECCV 2024, Pt XLIX, 2025, 15107: 416-434
  • [3] TAMC: Textual Alignment and Masked Consistency for Open-Vocabulary 3D Scene Understanding
    Wang, Juan
    Wang, Zhijie
    Miyazaki, Tomo
    Fan, Yaohou
    Omachi, Shinichiro
    Sensors, 2024, 24 (19)
  • [4] Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs
    Chang, Haonan
    Boyalakuntla, Kowndinya
    Lu, Shiyang
    Cai, Siwei
    Jing, Eric Pu
    Keskar, Shreesh
    Geng, Shijie
    Abbas, Adeeb
    Zhou, Lifeng
    Bekris, Kostas
    Boularias, Abdeslam
    Conference on Robot Learning, Vol 229, 2023, 229
  • [5] GaussianGrasper: 3D Language Gaussian Splatting for Open-Vocabulary Robotic Grasping
    Zheng, Yuhang
    Chen, Xiangyu
    Zheng, Yupeng
    Gu, Songen
    Yang, Runyi
    Jin, Bu
    Li, Pengfei
    Zhong, Chengliang
    Wang, Zengmao
    Liu, Lina
    Yang, Chao
    Wang, Dawei
    Chen, Zhen
    Long, Xiaoxiao
    Wang, Meiqing
    IEEE Robotics and Automation Letters, 2024, 9 (09): 7827-7834
  • [6] Dynamic Open-Vocabulary 3D Scene Graphs for Long-Term Language-Guided Mobile Manipulation
    Yan, Zhijie
    Li, Shufei
    Wang, Zuoxu
    Wu, Lixiu
    Wang, Han
    Zhu, Jun
    Chen, Lijiang
    Liu, Jihong
    IEEE Robotics and Automation Letters, 2025, 10 (05): 4252-4259
  • [7] Weakly Supervised 3D Open-vocabulary Segmentation
    Liu, Kunhao
    Zhan, Fangneng
    Zhang, Jiahui
    Xu, Muyu
    Yu, Yingchen
    El Saddik, Abdulmotaleb
    Theobalt, Christian
    Xing, Eric
    Lu, Shijian
    Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
  • [8] Language-Driven Open-Vocabulary 3D Semantic Segmentation with Knowledge Distillation
    Wu, Yuting
    Han, Xian-Feng
    Xiao, Guoqiang
    2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024), 2024: 3320-3324
  • [9] Open-Vocabulary Affordance Detection in 3D Point Clouds
    Toan Nguyen
    Minh Nhat Vu
    An Vuong
    Dzung Nguyen
    Thieu Vo
    Ngan Le
    Anh Nguyen
    2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023: 5692-5698
  • [10] Search3D: Hierarchical Open-Vocabulary 3D Segmentation
    Takmaz, Ayca
    Delitzas, Alexandros
    Sumner, Robert W.
    Engelmann, Francis
    Wald, Johanna
    Tombari, Federico
    IEEE Robotics and Automation Letters, 2025, 10 (03): 2558-2565