Panoptic Vision-Language Feature Fields

Cited by: 2

Authors:

Chen, Haoran [1]

Blomqvist, Kenneth [1]

Milano, Francesco [1]

Siegwart, Roland [1]

Affiliations:

[1] Swiss Fed Inst Technol, Autonomous Syst Lab, CH-8092 Zurich, Switzerland

Source:

IEEE ROBOTICS AND AUTOMATION LETTERS | 2024 / Vol. 9 / No. 3

Funding:

EU Horizon 2020;

Keywords:

Semantics; Three-dimensional displays; Semantic segmentation; Self-supervised learning; Instance segmentation; Image reconstruction; Computational modeling; Semantic scene understanding; deep learning for visual perception; 3D open vocabulary panoptic segmentation; neural implicit representation;

DOI:

10.1109/LRA.2024.3354624

Chinese Library Classification (CLC) number:

TP24 [Robotics];

Discipline classification code:

080202 ; 1405 ;

Abstract:

Recently, methods have been proposed for 3D open-vocabulary semantic segmentation. Such methods can segment scenes into arbitrary classes based on text descriptions provided at runtime. In this letter, we propose what is, to the best of our knowledge, the first algorithm for open-vocabulary panoptic segmentation in 3D scenes. Our algorithm, Panoptic Vision-Language Feature Fields (PVLFF), learns a semantic feature field of the scene by distilling vision-language features from a pretrained 2D model, and jointly fits an instance feature field through contrastive learning using 2D instance segments on input frames. Despite not being trained on the target classes, our method achieves panoptic segmentation performance similar to that of state-of-the-art closed-set 3D systems on the HyperSim, ScanNet and Replica datasets, and additionally outperforms current 3D open-vocabulary systems in terms of semantic segmentation. We ablate the components of our method to demonstrate the effectiveness of our model architecture.
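
The open-vocabulary idea described in the abstract, labelling scene points from text prompts supplied at runtime, can be illustrated with a minimal sketch. This is not the PVLFF implementation: the function names `cosine_similarity` and `assign_labels` are hypothetical, and the three-dimensional toy vectors stand in for the high-dimensional embeddings that a real vision-language model and a learned feature field would produce. The sketch only assumes that each point carries a feature vector living in the same space as the text embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_labels(point_features, text_embeddings):
    """Give each point the class name whose text embedding is most similar to its feature."""
    labels = []
    for feature in point_features:
        best_name = max(
            text_embeddings,
            key=lambda name: cosine_similarity(feature, text_embeddings[name]),
        )
        labels.append(best_name)
    return labels


# Toy stand-ins (assumptions): real embeddings would come from a text encoder
# and from querying the learned semantic feature field at 3D points.
text_embeddings = {
    "chair": [1.0, 0.0, 0.0],
    "table": [0.0, 1.0, 0.0],
}
point_features = [
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
]

labels = assign_labels(point_features, text_embeddings)
print(labels)
```

Because the class names enter only through `text_embeddings`, the set of classes can be changed at query time without retraining, which is the property the abstract refers to as open-vocabulary.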

Citation

Pages: 2144-2151

Page count: 8

