Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding

Cited by: 0

Authors:

Shi, Jin-Chuan ^{[1]}

Wang, Miao ^{[1,2]}

Duan, Hao-Bin ^{[1]}

Guan, Shao-Hua ^{[1]}

Affiliations:

[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, SCSE, Beijing, Peoples R China

[2] Zhongguancun Lab, Beijing, Peoples R China

Source:

2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024 | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

10.1109/CVPR52733.2024.00510

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Open-vocabulary querying in 3D space is challenging but essential for scene understanding tasks such as object localization and segmentation. Language-embedded scene representations have made progress by incorporating language features into 3D spaces. However, their efficacy heavily depends on neural networks that are resource-intensive in training and rendering. Although recent 3D Gaussians offer efficient and high-quality novel view synthesis, directly embedding language features in them leads to prohibitive memory usage and decreased performance. In this work, we introduce Language Embedded 3D Gaussians, a novel scene representation for open-vocabulary query tasks. Instead of embedding high-dimensional raw semantic features on 3D Gaussians, we propose a dedicated quantization scheme that drastically alleviates the memory requirement, and a novel embedding procedure that achieves smoother yet highly accurate querying, countering the multi-view feature inconsistencies and the high-frequency inductive bias in point-based representations. Our comprehensive experiments show that our representation achieves the best visual quality and language querying accuracy among current language-embedded representations, while maintaining real-time rendering frame rates on a single desktop GPU. Project page: https://buaavrcg.github.io/LEGaussians/.

Citation

Pages: 5333-5343

Page count: 11

50 records in total

[1] PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
Ding, Runyu
Yang, Jihan
Xue, Chuhui
Zhang, Wenqing
Bai, Song
Qi, Xiaojuan
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 7010 - 7019
[2] Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
Li, Ruihuang
Zhang, Zhengqiang
He, Chenheng
Ma, Zhiyuan
Patel, Vishal M.
Zhang, Lei
COMPUTER VISION - ECCV 2024, PT XLIX, 2025, 15107 : 416 - 434
[3] TAMC: Textual Alignment and Masked Consistency for Open-Vocabulary 3D Scene Understanding
Wang, Juan
Wang, Zhijie
Miyazaki, Tomo
Fan, Yaohou
Omachi, Shinichiro
SENSORS, 2024, 24 (19)
[4] Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs
Chang, Haonan
Boyalakuntla, Kowndinya
Lu, Shiyang
Cai, Siwei
Jing, Eric Pu
Keskar, Shreesh
Geng, Shijie
Abbas, Adeeb
Zhou, Lifeng
Bekris, Kostas
Boularias, Abdeslam
CONFERENCE ON ROBOT LEARNING, VOL 229, 2023, 229
[5] GaussianGrasper: 3D Language Gaussian Splatting for Open-Vocabulary Robotic Grasping
Zheng, Yuhang
Chen, Xiangyu
Zheng, Yupeng
Gu, Songen
Yang, Runyi
Jin, Bu
Li, Pengfei
Zhong, Chengliang
Wang, Zengmao
Liu, Lina
Yang, Chao
Wang, Dawei
Chen, Zhen
Long, Xiaoxiao
Wang, Meiqing
IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (09): 7827 - 7834
[6] Dynamic Open-Vocabulary 3D Scene Graphs for Long-Term Language-Guided Mobile Manipulation
Yan, Zhijie
Li, Shufei
Wang, Zuoxu
Wu, Lixiu
Wang, Han
Zhu, Jun
Chen, Lijiang
Liu, Jihong
IEEE ROBOTICS AND AUTOMATION LETTERS, 2025, 10 (05): 4252 - 4259
[7] Weakly Supervised 3D Open-vocabulary Segmentation
Liu, Kunhao
Zhan, Fangneng
Zhang, Jiahui
Xu, Muyu
Yu, Yingchen
El Saddik, Abdulmotaleb
Theobalt, Christian
Xing, Eric
Lu, Shijian
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[8] LANGUAGE-DRIVEN OPEN-VOCABULARY 3D SEMANTIC SEGMENTATION WITH KNOWLEDGE DISTILLATION
Wu, Yuting
Han, Xian-Feng
Xiao, Guoqiang
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 3320 - 3324
[9] Open-Vocabulary Affordance Detection in 3D Point Clouds
Toan Nguyen
Minh Nhat Vu
An Vuong
Dzung Nguyen
Thieu Vo
Ngan Le
Anh Nguyen
2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 5692 - 5698
[10] Search3D: Hierarchical Open-Vocabulary 3D Segmentation
Takmaz, Ayca
Delitzas, Alexandros
Sumner, Robert W.
Engelmann, Francis
Wald, Johanna
Tombari, Federico
IEEE ROBOTICS AND AUTOMATION LETTERS, 2025, 10 (03): 2558 - 2565
