Panoptic Vision-Language Feature Fields

Times cited: 2

Authors:

Chen, Haoran [1]

Blomqvist, Kenneth [1]

Milano, Francesco [1]

Siegwart, Roland [1]

Affiliation:

[1] Swiss Fed Inst Technol, Autonomous Syst Lab, CH-8092 Zurich, Switzerland

Source:

IEEE ROBOTICS AND AUTOMATION LETTERS | 2024 / Vol. 9 / Issue 03

Funding:

European Union Horizon 2020;

Keywords:

Semantics; Three-dimensional displays; Semantic segmentation; Self-supervised learning; Instance segmentation; Image reconstruction; Computational modeling; Semantic scene understanding; deep learning for visual perception; 3D open vocabulary panoptic segmentation; neural implicit representation;

DOI:

10.1109/LRA.2024.3354624

Chinese Library Classification (CLC) number:

TP24 [Robotics technology];

Subject classification code:

080202 ; 1405 ;

Abstract:

Recently, methods have been proposed for 3D open-vocabulary semantic segmentation. Such methods are able to segment scenes into arbitrary classes based on text descriptions provided at runtime. In this letter, we propose, to the best of our knowledge, the first algorithm for open-vocabulary panoptic segmentation in 3D scenes. Our algorithm, Panoptic Vision-Language Feature Fields (PVLFF), learns a semantic feature field of the scene by distilling vision-language features from a pretrained 2D model, and jointly fits an instance feature field through contrastive learning using 2D instance segments on input frames. Despite not being trained on the target classes, our method achieves panoptic segmentation performance similar to state-of-the-art closed-set 3D systems on the HyperSim, ScanNet and Replica datasets, and additionally outperforms current 3D open-vocabulary systems in terms of semantic segmentation. We ablate the components of our method to demonstrate the effectiveness of our model architecture.
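The two mechanisms the abstract names, labelling features against runtime text descriptions and fitting instance features contrastively from 2D segments, can be illustrated with a minimal, dependency-free Python sketch. This is not the PVLFF implementation. The feature vectors and text embeddings are hypothetical stand-ins for features rendered from the two fields and for text-encoder outputs. Labelling by cosine similarity is assumed here as a common way to query distilled vision-language features. The supervised-contrastive form of the instance loss is an illustrative choice, since the abstract does not state the paper's exact loss.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else [float(x) for x in v]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def open_vocab_labels(features, text_embeddings):
    """Assign each semantic feature the index of its most similar text embedding.

    features: hypothetical D-dim vectors rendered from a semantic feature field.
    text_embeddings: hypothetical D-dim vectors, one per class description
    supplied at query time (the class set is not fixed in advance).
    """
    return [
        max(range(len(text_embeddings)), key=lambda k: cosine(f, text_embeddings[k]))
        for f in features
    ]


def contrastive_instance_loss(feats, segment_ids, temperature=0.1):
    """Supervised-contrastive loss over instance features (illustrative assumption).

    Samples sharing a 2D segment id are positives for each other; every other
    sample is in the softmax denominator. Returns the mean over anchors that
    have at least one positive, or 0.0 if none do.
    """
    z = [normalize(f) for f in feats]
    n = len(z)
    total, count = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and segment_ids[j] == segment_ids[i]]
        if not positives:
            continue
        # Temperature-scaled similarities from anchor i to all other samples.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability of the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Hypothetical 2-D embeddings for two prompts, e.g. "chair" and "table".
    text = [[1.0, 0.0], [0.0, 1.0]]
    print(open_vocab_labels([[0.9, 0.1], [0.2, 0.8]], text))  # [0, 1]

    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    good = contrastive_instance_loss(feats, [0, 0, 1, 1])  # ids match clusters
    bad = contrastive_instance_loss(feats, [0, 1, 0, 1])   # ids cut across clusters
    print(good < bad)  # True
```

Because the class set enters only through `text_embeddings`, new class descriptions can be queried without retraining. The instance loss, by contrast, never uses class names: it only asks that samples from the same 2D segment lie closer together than samples from different segments.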

Pages: 2144-2151

Number of pages: 8