Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding

Cited by: 0
Authors
Shi, Jin-Chuan [1]
Wang, Miao [1, 2]
Duan, Hao-Bin [1]
Guan, Shao-Hua [1]
Affiliations
[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, SCSE, Beijing, Peoples R China
[2] Zhongguancun Lab, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52733.2024.00510
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Open-vocabulary querying in 3D space is challenging but essential for scene understanding tasks such as object localization and segmentation. Language-embedded scene representations have made progress by incorporating language features into 3D spaces. However, their efficacy heavily depends on neural networks that are resource-intensive in training and rendering. Although recent 3D Gaussians offer efficient and high-quality novel view synthesis, directly embedding language features in them leads to prohibitive memory usage and decreased performance. In this work, we introduce Language Embedded 3D Gaussians, a novel scene representation for open-vocabulary query tasks. Instead of embedding high-dimensional raw semantic features on 3D Gaussians, we propose a dedicated quantization scheme that drastically alleviates the memory requirement, and a novel embedding procedure that achieves smoother yet high accuracy query, countering the multi-view feature inconsistencies and the high-frequency inductive bias in point-based representations. Our comprehensive experiments show that our representation achieves the best visual quality and language querying accuracy across current language-embedded representations, while maintaining real-time rendering frame rates on a single desktop GPU. Project page: https://buaavrcg.github.io/LEGaussians/.
Pages: 5333-5343
Page count: 11
Related papers
50 in total
  • [21] Data Augmentation by Data Noising for Open-vocabulary Slots in Spoken Language Understanding
    Kim, Hwa-Yeon
    Roh, Yoon-Hyung
    Kim, Young-Kil
    NAACL HLT 2019: THE 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2019, : 97 - 102
  • [22] CLIP-Driven Open-Vocabulary 3D Scene Graph Generation via Cross-Modality Contrastive Learning
    Chen, Lianggangxu
    Wang, Xuejiao
    Lu, Jiale
    Lin, Shaohui
    Wang, Changbo
    He, Gaoqi
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 27863 - 27873
  • [23] OpenScene: 3D Scene Understanding with Open Vocabularies
    Peng, Songyou
    Genova, Kyle
    Jiang, Chiyu Max
    Tagliasacchi, Andrea
    Pollefeys, Marc
    Funkhouser, Thomas
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 815 - 824
  • [24] Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
    Etchegaray, Djamahl
    Huang, Zi
    Harada, Tatsuya
    Luo, Yadan
    COMPUTER VISION - ECCV 2024, PT XL, 2025, 15098 : 133 - 151
  • [25] Open-Vocabulary Object Detection via Scene Graph Discovery
    Shi, Hengcan
    Hayat, Munawar
    Cai, Jianfei
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4012 - 4021
  • [26] Open-Vocabulary Point-Cloud Object Detection without 3D Annotation
    Lu, Yuheng
    Xu, Chenfeng
    Wei, Xiaobao
    Xie, Xiaodong
    Tomizuka, Masayoshi
    Keutzer, Kurt
    Zhang, Shanghang
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 1190 - 1199
  • [27] Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
    Zhu, Xiaoyu
    Zhou, Hao
    Xing, Pengfei
    Zhao, Long
    Xu, Hao
    Liang, Junwei
    Hauptmann, Alexander
    Liu, Ting
    Gallagher, Andrew
    COMPUTER VISION - ECCV 2024, PT XXIX, 2025, 15087 : 357 - 375
  • [28] Open-vocabulary Queryable Scene Representations for Real World Planning
    Chen, Boyuan
    Xia, Fei
    Ichter, Brian
    Rao, Kanishka
    Gopalakrishnan, Keerthana
    Ryoo, Michael S.
    Stone, Austin
    Kappler, Daniel
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023, : 11509 - 11522
  • [29] From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding
    Sun, Li
    Luisier, Florian
    Batmanghelich, Kayhan
    Florencio, Dinei
    Zhang, Cha
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 3605 - 3620
  • [30] A Hybrid Language Model for Open-Vocabulary Thai LVCSR
    Thangthai, Kwanchiva
    Chotimongkol, Ananlada
    Wutiwiwatchai, Chai
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2206 - 2210