Weakly Supervised 3D Open-vocabulary Segmentation

被引:0
|
作者
Liu, Kunhao [1 ]
Zhan, Fangneng [2 ]
Zhang, Jiahui [1 ]
Xu, Muyu [1 ]
Yu, Yingchen [1 ]
El Saddik, Abdulmotaleb [3 ,5 ]
Theobalt, Christian [2 ]
Xing, Eric [4 ,5 ]
Lu, Shijian [1 ]
机构
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Max Planck Inst Informat, Saarbrucken, Germany
[3] Univ Ottawa, Ottawa, ON K1N 6N5, Canada
[4] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[5] MBZUAI, Abu Dhabi, U Arab Emirates
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Open-vocabulary segmentation of 3D scenes is a fundamental function of human perception and thus a crucial objective in computer vision research. However, this task is heavily impeded by the lack of large-scale and diverse 3D open-vocabulary segmentation datasets for training robust and generalizable models. Distilling knowledge from pre-trained 2D open-vocabulary segmentation models helps but it compromises the open-vocabulary feature as the 2D models are mostly finetuned with close-vocabulary datasets. We tackle the challenges in 3D open-vocabulary segmentation by exploiting pre-trained foundation models CLIP and DINO in a weakly supervised manner. Specifically, given only the open-vocabulary text descriptions of the objects in a scene, we distill the open-vocabulary multimodal knowledge and object reasoning capability of CLIP and DINO into a neural radiance field (NeRF), which effectively lifts 2D features into view-consistent 3D segmentation. A notable aspect of our approach is that it does not require any manual segmentation annotations for either the foundation models or the distillation process. Extensive experiments show that our method even outperforms fully supervised models trained with segmentation annotations in certain scenes, suggesting that 3D open-vocabulary segmentation can be effectively learned from 2D images and text-image pairs. Code is available at https://github.com/Kunhao-Liu/3D-OVS.
引用
收藏
页数:24
相关论文
共 50 条
  • [1] Weakly Supervised Open-Vocabulary Object Detection
    Lin, Jianghang
    Shen, Yunhang
    Wang, Bingquan
    Lin, Shaohui
    Li, Ke
    Cao, Liujuan
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024, : 3404 - 3412
  • [2] OpenMask3D: Open-Vocabulary 3D Instance Segmentation
    Takmaz, Ayca
    Fedele, Elisabetta
    Sumner, Robert W.
    Pollefeys, Marc
    Tombari, Federico
    Engelmann, Francis
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation
    Zhang, Fei
    Zhou, Tianfei
    Li, Boyang
    He, Hao
    Ma, Chaofan
    Zhang, Tianjiao
    Yao, Jiangchao
    Zhang, Ya
    Wang, Yanfeng
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [4] Open-Vocabulary Affordance Detection in 3D Point Clouds
    Toan Nguyen
    Minh Nhat Vu
    An Vuong
    Dzung Nguyen
    Thieu Vo
    Ngan Le
    Anh Nguyen
    [J]. 2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 5692 - 5698
  • [5] Open-Vocabulary And Multitask Image Segmentation
    Pan, Lihu
    Yang, Yunting
    Wang, Zhengkui
    Shan, Wen
    Yin, Jaili
    [J]. 39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024, 2024, : 1048 - 1049
  • [6] Open-vocabulary Panoptic Segmentation with Embedding Modulation
    Chen, Xi
    Li, Shuang
    Lim, Ser-Nam
    Torralba, Antonio
    Zhao, Hengshuang
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 1141 - 1150
  • [7] POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
    Vobecky, Antonin
    Simeoni, Oriane
    Hurych, David
    Gidaris, Spyros
    Bursuc, Andrei
    Perez, Patrick
    Sivic, Josef
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [8] Towards Open-Vocabulary Video Instance Segmentation
    Wang, Haochen
    Yan, Cilin
    Wang, Shuai
    Jiang, Xiaolong
    Tang, Xu
    Hu, Yao
    Xie, Weidi
    Gavves, Efstratios
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 4034 - 4043
  • [9] Going Denser with Open-Vocabulary Part Segmentation
    Sun, Peize
    Chen, Shoufa
    Zhu, Chenchen
    Xiao, Fanyi
    Luo, Ping
    Xie, Saining
    Yan, Zhicheng
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15407 - 15419
  • [10] Open-vocabulary Object Segmentation with Diffusion Models
    Li, Ziyi
    Zhou, Qinye
    Zhang, Xiaoyun
    Zhang, Ya
    Wang, Yanfeng
    Xie, Weidi
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 7633 - 7642