TAG: Guidance-Free Open-Vocabulary Semantic Segmentation

Authors
Kawano, Yasufumi [1]
Aoki, Yoshimitsu [1]
Affiliations
[1] Keio Univ, Grad Sch Integrated Design Engn, Yokohama, Kanagawa 2238522, Japan
Source
IEEE Access | 2024 / Vol. 12
Keywords
Semantic segmentation; Training; Databases; Annotations; Task analysis; Semantics; Vocabulary; Computer vision; Classification algorithms; open-vocabulary; zero-guidance
DOI
10.1109/ACCESS.2024.3418210
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Semantic segmentation is a crucial task in computer vision, where each pixel in an image is classified into a category. However, traditional methods face significant challenges, including the need for pixel-level annotations and extensive training. Furthermore, because supervised learning uses a limited set of predefined categories, models typically struggle with rare classes and cannot recognize new ones. Unsupervised and open-vocabulary segmentation methods, proposed to tackle these issues, have limitations of their own: unsupervised methods cannot assign specific class labels to clusters, and open-vocabulary methods require user-provided text queries for guidance. In this context, we propose a novel approach, TAG, which achieves Training, Annotation, and Guidance-free open-vocabulary semantic segmentation. TAG utilizes pre-trained models such as CLIP and DINO to segment images into meaningful categories without additional training or dense annotations. It retrieves class labels from an external database, providing flexibility to adapt to new scenarios. TAG achieves state-of-the-art results on PascalVOC, PascalContext, and ADE20K for open-vocabulary segmentation without given class names, including an improvement of +15.3 mIoU on PascalVOC.
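The abstract states that class labels are retrieved from an external database but does not describe the retrieval mechanism. As a minimal sketch of what such a lookup could look like, the example below assumes that each image segment and each database entry is already represented by an embedding in a shared space (for instance, one produced by a CLIP-like encoder) and that the label is chosen by cosine similarity. The function names, the toy database, and the three-dimensional vectors are all hypothetical and are not taken from the paper.

```python
import math


def normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def retrieve_label(segment_embedding, database):
    """Return the database label whose embedding is most similar to the segment.

    `database` maps a label string to its embedding (hypothetical layout).
    """
    return max(
        database,
        key=lambda label: cosine_similarity(segment_embedding, database[label]),
    )


# Toy database with made-up 3-dimensional embeddings (assumption for illustration).
toy_database = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}

# Two made-up segment embeddings, one per image region.
segment_embeddings = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]
predicted_labels = [retrieve_label(e, toy_database) for e in segment_embeddings]
```

Because the label set lives in the database rather than in the model, adding a new category in this sketch only requires adding a new entry to `toy_database`; nothing is retrained.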
Pages: 88322-88331
Number of pages: 10