Panoptic Vision-Language Feature Fields

Cited by: 2

Authors:

Chen, Haoran [1]

Blomqvist, Kenneth [1]

Milano, Francesco [1]

Siegwart, Roland [1]

Affiliations:

[1] Swiss Fed Inst Technol, Autonomous Syst Lab, CH-8092 Zurich, Switzerland

Source:

IEEE ROBOTICS AND AUTOMATION LETTERS | 2024 / Vol. 9 / Issue 03

Funding:

European Union Horizon 2020;

Keywords:

Semantics; Three-dimensional displays; Semantic segmentation; Self-supervised learning; Instance segmentation; Image reconstruction; Computational modeling; Semantic scene understanding; deep learning for visual perception; 3D open vocabulary panoptic segmentation; neural implicit representation;

DOI:

10.1109/LRA.2024.3354624

Chinese Library Classification (CLC) number:

TP24 [Robotics];

Discipline classification code:

080202 ; 1405 ;

Abstract:

Recently, methods have been proposed for 3D open-vocabulary semantic segmentation. Such methods are able to segment scenes into arbitrary classes based on text descriptions provided during runtime. In this letter, we propose, to the best of our knowledge, the first algorithm for open-vocabulary panoptic segmentation in 3D scenes. Our algorithm, Panoptic Vision-Language Feature Fields (PVLFF), learns a semantic feature field of the scene by distilling vision-language features from a pretrained 2D model, and jointly fits an instance feature field through contrastive learning using 2D instance segments on input frames. Despite not being trained on the target classes, our method achieves panoptic segmentation performance similar to the state-of-the-art closed-set 3D systems on the HyperSim, ScanNet and Replica datasets and additionally outperforms current 3D open-vocabulary systems in terms of semantic segmentation. We ablate the components of our method to demonstrate the effectiveness of our model architecture.
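
The abstract notes that open-vocabulary methods segment scenes into classes given as text at runtime. One common way to do this, sketched below, is to label each feature vector with the text prompt whose embedding has the highest cosine similarity. This is a minimal illustration and not the PVLFF implementation: the function names (`cosine_similarity`, `assign_labels`) and the toy 3-dimensional vectors are hypothetical stand-ins for real vision-language embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_labels(point_features, text_embeddings):
    """Label each feature vector with its most similar text prompt."""
    labels = []
    for feature in point_features:
        best = max(
            text_embeddings,
            key=lambda name: cosine_similarity(feature, text_embeddings[name]),
        )
        labels.append(best)
    return labels


# Toy vectors (assumption): real systems would use embeddings from a
# pretrained vision-language model instead of hand-written numbers.
text_embeddings = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0]}
point_features = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
labels = assign_labels(point_features, text_embeddings)
```

Because the class list is just a dictionary of prompts, new classes can be added at query time without retraining, which is the property the abstract refers to as open-vocabulary.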

Citation

Pages: 2144-2151

Page count: 8
