Weakly Supervised 3D Open-vocabulary Segmentation

Authors
Liu, Kunhao [1]
Zhan, Fangneng [2]
Zhang, Jiahui [1]
Xu, Muyu [1]
Yu, Yingchen [1]
El Saddik, Abdulmotaleb [3, 5]
Theobalt, Christian [2]
Xing, Eric [4, 5]
Lu, Shijian [1]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Max Planck Inst Informat, Saarbrucken, Germany
[3] Univ Ottawa, Ottawa, ON K1N 6N5, Canada
[4] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[5] MBZUAI, Abu Dhabi, U Arab Emirates
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Open-vocabulary segmentation of 3D scenes is a fundamental function of human perception and thus a crucial objective in computer vision research. However, this task is heavily impeded by the lack of large-scale and diverse 3D open-vocabulary segmentation datasets for training robust and generalizable models. Distilling knowledge from pre-trained 2D open-vocabulary segmentation models helps, but it compromises the open-vocabulary capability because the 2D models are mostly fine-tuned on closed-vocabulary datasets. We tackle the challenges in 3D open-vocabulary segmentation by exploiting the pre-trained foundation models CLIP and DINO in a weakly supervised manner. Specifically, given only the open-vocabulary text descriptions of the objects in a scene, we distill the open-vocabulary multimodal knowledge and object reasoning capability of CLIP and DINO into a neural radiance field (NeRF), which effectively lifts 2D features into view-consistent 3D segmentation. A notable aspect of our approach is that it does not require any manual segmentation annotations for either the foundation models or the distillation process. Extensive experiments show that our method even outperforms fully supervised models trained with segmentation annotations in certain scenes, suggesting that 3D open-vocabulary segmentation can be effectively learned from 2D images and text-image pairs. Code is available at https://github.com/Kunhao-Liu/3D-OVS.
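
As a rough illustration of how text descriptions can drive segmentation, the sketch below assigns each pixel feature to the text description with the highest cosine similarity. This is a generic technique used with CLIP-style embeddings, not the authors' implementation: the function names, the 3-dimensional toy vectors, and the class names are all hypothetical stand-ins, and the NeRF distillation step itself is not shown.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def assign_labels(pixel_features, text_embeddings):
    """For each pixel feature, return the index of the most similar text embedding."""
    labels = []
    for feat in pixel_features:
        scores = [cosine_similarity(feat, t) for t in text_embeddings]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels

# Toy data: 3-D vectors stand in for real text and image embeddings
# (hypothetical values chosen only for illustration).
class_names = ["chair", "table"]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
pixel_features = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.7, 0.6, 0.0]]

labels = assign_labels(pixel_features, text_embeddings)
predicted = [class_names[i] for i in labels]
print(predicted)
```

In a real pipeline the pixel features would come from a rendered feature field and the text embeddings from a text encoder; here both are hard-coded so the example runs with the standard library alone.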
Pages: 24