OV-NeRF: Open-Vocabulary Neural Radiance Fields With Vision and Language Foundation Models for 3D Semantic Understanding

Cited by: 0
Authors
Liao, Guibiao [1]
Zhou, Kaichen [2]
Bao, Zhenyu [1]
Liu, Kanglin [3]
Li, Qing [3]
Affiliations
[1] Peking Univ, Sch Elect & Comp Engn, Shenzhen 518055, Peoples R China
[2] Univ Oxford, Dept Comp Sci, Oxford OX1 2JD, Oxfordshire, England
[3] Pengcheng Lab, Shenzhen 518066, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Three-dimensional displays; Neural radiance field; Training; Solid modeling; Rendering (computer graphics); Circuits and systems; open-vocabulary; vision and language foundation models; cross-view self-enhancement;
DOI
10.1109/TCSVT.2024.3439737
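Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification codes
0808; 0809
Abstract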
The development of Neural Radiance Fields (NeRFs) has provided a potent representation for encapsulating the geometric and appearance characteristics of 3D scenes. Enhancing the capabilities of NeRFs in open-vocabulary 3D semantic perception tasks has been a recent focus. However, current methods that extract semantics directly from Contrastive Language-Image Pretraining (CLIP) for semantic field learning encounter difficulties because the semantics provided by CLIP are noisy and view-inconsistent. To tackle these limitations, we propose OV-NeRF, which exploits pre-trained vision and language foundation models to enhance semantic field learning through single-view and cross-view strategies. First, from the single-view perspective, we introduce Region Semantic Ranking (RSR) regularization, which leverages 2D mask proposals derived from Segment Anything (SAM) to rectify the noisy semantics of each training view, facilitating accurate semantic field learning. Second, from the cross-view perspective, we propose a Cross-view Self-enhancement (CSE) strategy to address the challenge raised by view-inconsistent semantics. Rather than invariably utilizing the inconsistent 2D semantics from CLIP, CSE leverages the 3D-consistent semantics generated by the well-trained semantic field itself for semantic field training, aiming to reduce ambiguity and enhance overall semantic consistency across different views. Extensive experiments validate that OV-NeRF outperforms current state-of-the-art methods, achieving significant improvements of 20.31% and 18.42% in mIoU on Replica and ScanNet, respectively. Furthermore, our approach exhibits consistently superior results across various CLIP configurations, further verifying its robustness. Code is available at: https://github.com/pcl3dv/OV-NeRF.
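The single-view idea in the abstract, using region masks to clean up noisy per-pixel semantics, can be illustrated with a small sketch. This is not the paper's RSR formulation, which the abstract does not specify; it is a simplified stand-in that sums per-pixel class scores inside each region and assigns the region's best-scoring class to all of its pixels. The function name `region_rectify`, the score layout, and the use of integer region IDs in place of SAM masks are all assumptions made for illustration.

```python
def region_rectify(scores, region_ids):
    """Assign each pixel the best class of its region (illustrative only).

    scores:     list of per-pixel class-score lists (hypothetical CLIP-like scores)
    region_ids: list of per-pixel region IDs standing in for mask proposals;
                None means the pixel is not covered by any region
    """
    # Accumulate class scores per region.
    sums = {}
    for pixel_scores, rid in zip(scores, region_ids):
        if rid is None:
            continue
        if rid not in sums:
            sums[rid] = [0.0] * len(pixel_scores)
        for c, s in enumerate(pixel_scores):
            sums[rid][c] += s

    # The argmax of summed scores equals the argmax of mean scores.
    region_label = {
        rid: max(range(len(total)), key=lambda c: total[c])
        for rid, total in sums.items()
    }

    # Pixels inside a region take the region label; others keep their own argmax.
    labels = []
    for pixel_scores, rid in zip(scores, region_ids):
        if rid is None:
            labels.append(max(range(len(pixel_scores)), key=lambda c: pixel_scores[c]))
        else:
            labels.append(region_label[rid])
    return labels


# Three pixels share region 0; the third is noisy and would be class 1 on its own.
scores = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.3, 0.7]]
region_ids = [0, 0, 0, 1]
rectified = region_rectify(scores, region_ids)
```

In this toy input the noisy third pixel is pulled to the label of its region, while the fourth pixel, alone in its own region, keeps its label. The paper's ranking-based regularization may differ substantially from this averaging scheme.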
Pages: 12923-12936
Page count: 14
Related papers
26 in total
  • [21] FiG-NeRF: Figure-Ground Neural Radiance Fields for 3D Object Category Modelling
    Xie, Christopher
    Park, Keunhong
    Martin-Brualla, Ricardo
    Brown, Matthew
    2021 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2021), 2021, : 962 - 971
  • [22] OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
    Wang, Zhenyu
    Li, Yali
    Liu, Taichi
    Zhao, Hengshuang
    Wang, Shengjin
    COMPUTER VISION - ECCV 2024, PT XLVII, 2025, 15105 : 73 - 89
  • [23] NeRF-MAE: Masked AutoEncoders for Self-supervised 3D Representation Learning for Neural Radiance Fields
    Irshad, Muhammad Zubair
    Zakharov, Sergey
    Guizilini, Vitor
    Gaidon, Adrien
    Kira, Zsolt
    Ambrus, Rares
    COMPUTER VISION - ECCV 2024, PT LXXXVIII, 2025, 15146 : 434 - 453
  • [24] Dynamic Open-Vocabulary 3D Scene Graphs for Long-Term Language-Guided Mobile Manipulation
    Yan, Zhijie
    Li, Shufei
    Wang, Zuoxu
    Wu, Lixiu
    Wang, Han
    Zhu, Jun
    Chen, Lijiang
    Liu, Jihong
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2025, 10 (05): : 4252 - 4259
  • [25] Open-set 3D semantic instance maps for vision language navigation-O3D-SIM
    Nanwani, Laksh
    Gupta, Kumaraditya
    Mathur, Aditya
    Agrawal, Swayam
    Hafez, A. H. Abdul
    Krishna, K. Madhava
    ADVANCED ROBOTICS, 2024, 38 (19-20) : 1378 - 1391
  • [26] Automatic Removal of Non-Architectural Elements in 3D Models of Historic Buildings with Language Embedded Radiance Fields
    Rusnak, Alexander
    Pantoja-Rosero, Bryan G.
    Kaplan, Frederic
    Beyer, Katrin
    HERITAGE, 2024, 7 (06): : 3332 - 3349