Panoptic Vision-Language Feature Fields

Cited by: 2
Authors
Chen, Haoran [1]
Blomqvist, Kenneth [1]
Milano, Francesco [1]
Siegwart, Roland [1]
Affiliations
[1] Swiss Fed Inst Technol, Autonomous Syst Lab, CH-8092 Zurich, Switzerland
Funding
EU Horizon 2020;
Keywords
Semantics; Three-dimensional displays; Semantic segmentation; Self-supervised learning; Instance segmentation; Image reconstruction; Computational modeling; Semantic scene understanding; deep learning for visual perception; 3D open vocabulary panoptic segmentation; neural implicit representation;
DOI
10.1109/LRA.2024.3354624
Chinese Library Classification number
TP24 [Robotics];
Discipline classification code
080202 ; 1405 ;
Abstract
Recently, methods have been proposed for 3D open-vocabulary semantic segmentation. Such methods are able to segment scenes into arbitrary classes based on text descriptions provided at runtime. In this letter, we propose, to the best of our knowledge, the first algorithm for open-vocabulary panoptic segmentation in 3D scenes. Our algorithm, Panoptic Vision-Language Feature Fields (PVLFF), learns a semantic feature field of the scene by distilling vision-language features from a pretrained 2D model, and jointly fits an instance feature field through contrastive learning using 2D instance segments on input frames. Despite not being trained on the target classes, our method achieves panoptic segmentation performance similar to the state-of-the-art closed-set 3D systems on the HyperSim, ScanNet and Replica datasets and additionally outperforms current 3D open-vocabulary systems in terms of semantic segmentation. We ablate the components of our method to demonstrate the effectiveness of our model architecture.
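The runtime text querying mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each 3D point already has a feature vector in the same embedding space as the text prompts, and that a point is labeled with the prompt of highest cosine similarity. The function names, class names and toy 3-dimensional vectors below are hypothetical and are not taken from PVLFF; the abstract does not specify how the query is implemented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def label_points(point_features, text_embeddings):
    """Assign each point the class whose text embedding is most similar."""
    labels = []
    for feature in point_features:
        scores = {name: cosine(feature, emb) for name, emb in text_embeddings.items()}
        labels.append(max(scores, key=scores.get))
    return labels


# Toy stand-ins (assumption): real vision-language features are much
# higher-dimensional and would come from a pretrained model.
text_embeddings = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0]}
point_features = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]

labels = label_points(point_features, text_embeddings)
```

Because the class list is only an argument to `label_points`, new classes can be added at query time without retraining, which is the property the abstract refers to as open-vocabulary.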
Pages: 2144-2151
Number of pages: 8