Panoptic Vision-Language Feature Fields

Times cited: 2
Authors
Chen, Haoran [1]
Blomqvist, Kenneth [1]
Milano, Francesco [1]
Siegwart, Roland [1]
Affiliation
[1] Swiss Fed Inst Technol, Autonomous Syst Lab, CH-8092 Zurich, Switzerland
Funding
EU Horizon 2020
Keywords
Semantics; Three-dimensional displays; Semantic segmentation; Self-supervised learning; Instance segmentation; Image reconstruction; Computational modeling; Semantic scene understanding; deep learning for visual perception; 3D open vocabulary panoptic segmentation; neural implicit representation;
DOI
10.1109/LRA.2024.3354624
CLC classification number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
Recently, methods have been proposed for 3D open-vocabulary semantic segmentation. Such methods can segment scenes into arbitrary classes based on text descriptions provided at runtime. In this letter, we propose what is, to the best of our knowledge, the first algorithm for open-vocabulary panoptic segmentation in 3D scenes. Our algorithm, Panoptic Vision-Language Feature Fields (PVLFF), learns a semantic feature field of the scene by distilling vision-language features from a pretrained 2D model, and jointly fits an instance feature field through contrastive learning using 2D instance segments on input frames. Despite not being trained on the target classes, our method achieves panoptic segmentation performance similar to that of state-of-the-art closed-set 3D systems on the HyperSim, ScanNet, and Replica datasets, and additionally outperforms current 3D open-vocabulary systems in terms of semantic segmentation. We ablate the components of our method to demonstrate the effectiveness of our model architecture.
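The abstract names two mechanisms: labeling distilled semantic features against text prompts supplied at runtime, and fitting instance features contrastively from 2D instance segments. The Python/NumPy sketch below illustrates both under stated assumptions; it is not the PVLFF implementation. It assumes open-vocabulary labels are obtained by cosine similarity between rendered semantic features and prompt embeddings, and it substitutes a generic supervised contrastive loss for the paper's own instance loss. The function names, the temperature of 0.1, and the toy arrays are hypothetical, and no pretrained model is loaded.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so that dot products equal cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def open_vocab_labels(sem_feats, text_embeds):
    """Label each point with the index of its most similar text prompt (assumed scheme).

    sem_feats:   (N, D) semantic features rendered from a feature field (placeholder data).
    text_embeds: (C, D) embeddings of the runtime class prompts (placeholder data).
    """
    sims = l2_normalize(sem_feats) @ l2_normalize(text_embeds).T  # (N, C) cosine similarities
    return sims.argmax(axis=1)


def instance_contrastive_loss(inst_feats, seg_ids, temperature=0.1):
    """Generic supervised contrastive loss over pixels sampled from one frame.

    inst_feats: (N, D) instance features; seg_ids: (N,) integer array of 2D segment ids.
    Pixels sharing a segment id are pulled together and all others pushed apart.
    Assumes at least one segment contributes two or more samples.
    """
    z = l2_normalize(inst_feats)
    n = len(seg_ids)
    self_mask = np.eye(n, dtype=bool)
    # Pairwise similarity logits; a sample is never compared with itself.
    logits = np.where(self_mask, -np.inf, z @ z.T / temperature)
    # Numerically stable row-wise log-softmax.
    row_max = logits.max(axis=1, keepdims=True)
    log_norm = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_norm
    # Positives: other samples from the same 2D segment.
    pos_mask = (seg_ids[:, None] == seg_ids[None, :]) & ~self_mask
    n_pos = pos_mask.sum(axis=1)
    has_pos = n_pos > 0
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    # Mean negative log-probability of the positives, averaged over anchors that have any.
    return float(np.mean(-pos_log_prob[has_pos] / n_pos[has_pos]))


if __name__ == "__main__":
    feats = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
    prompts = np.array([[0.0, 2.0], [3.0, 0.0]])  # stand-ins for text-encoder outputs
    print(open_vocab_labels(feats, prompts))  # [1 0 1]
```

Because the class prompts only enter through `text_embeds`, the label set can change at query time without retraining, which is the property the abstract refers to as open-vocabulary segmentation. The contrastive loss, in turn, only needs per-frame segment ids, not globally consistent instance labels.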
Pages: 2144-2151
Page count: 8