Open-Vocabulary And Multitask Image Segmentation

Authors
Pan, Lihu [1]
Yang, Yunting [1]
Wang, Zhengkui [2]
Shan, Wen [3]
Yin, Jaili [1]
Affiliations
[1] Taiyuan Univ Sci & Technol, Taiyuan, Peoples R China
[2] Singapore Inst Technol, Infocomm Technol Cluster, Singapore, Singapore
[3] Singapore Univ Social Sci, Singapore, Singapore
Keywords
Image segmentation; Adaptive prompt learning; Image-text fusion; Multitask
DOI
10.1145/3605098.3636192
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Open-vocabulary learning has revolutionized image segmentation, enabling the delineation of arbitrary categories from textual descriptions. While current methods often employ specialized architectures, OVAMTSeg presents a unified framework for Open-Vocabulary and Multitask Image Segmentation. Leveraging adaptive prompt learning, OVAMTSeg captures category-sensitive concepts and remains robust across diverse multitask scenarios. Text prompts capture semantic and contextual features, while cross-attention and cross-modal interactions fuse image and text features. The framework incorporates a transformer-based decoder for dense prediction. Experimental results demonstrate OVAMTSeg's effectiveness: it achieves 47.5 mIoU in referring expression segmentation; 51.6 mIoU on Pascal-VOC with four unseen classes and 46.6 mIoU on Pascal-Context in zero-shot segmentation; and 65.9 mIoU on Pascal-5i and 35.7 mIoU on COCO-20i in one-shot segmentation.
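Note on the metric: every result in the abstract is reported as mIoU (mean intersection over union). This record does not describe the paper's exact evaluation protocol, so the following is only a minimal sketch of the standard definition, assuming flattened per-pixel class labels and averaging over the classes that occur in either the prediction or the ground truth. The function name `mean_iou` and its parameters are illustrative and not taken from the paper.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU in percent over classes present in pred or target.

    Assumption: pred and target are flattened label maps whose values
    lie in the range [0, num_classes).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where pred == target == c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both maps
            ious.append(intersection[c] / union)

    return 100.0 * sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy 4-pixel example with two classes.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is about 58.3. Benchmark implementations often accumulate intersections and unions over the whole dataset before dividing, which can give different numbers from a per-image average.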
Pages: 1048-1049
Page count: 2