CuTCP: Custom Text Generation-based Class-aware Prompt Tuning for visual-language models

Cited by: 0
Authors
Huang, Min [1 ]
Yang, Chen [1 ]
Yu, Xiaoyan [1 ]
Institutions
[1] Zhengzhou Univ Light Ind, Zhengzhou 450001, Peoples R China
Source
SCIENTIFIC REPORTS | 2025, Vol. 15, Issue 01
Keywords
CLIP; CuTCP; Prompt learning; TCP; VLMs;
DOI
10.1038/s41598-025-85838-x
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy, Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject Classification Codes
07 ; 0710 ; 09 ;
Abstract
Visual-language models (VLMs) excel in cross-modal reasoning by synthesizing visual and linguistic features. Recent VLMs adopt prompt learning for fine-tuning, allowing adaptation to various downstream tasks. TCP applies class-aware prompt tuning to improve the generalization of VLMs, yet its reliance on fixed text templates as prior knowledge can limit adaptability to fine-grained category distinctions. To address this, we propose Custom Text Generation-based Class-aware Prompt Tuning (CuTCP). CuTCP leverages large language models to generate descriptive, category-specific prompts, embedding richer semantic information that enhances the model's ability to differentiate between known and unseen categories. Compared with TCP, CuTCP achieves an improvement of 0.74% on new classes and 0.44% on overall harmonic mean, averaged over 11 diverse image datasets. Experimental results demonstrate that CuTCP addresses the limitations of general prompt templates, significantly improving model adaptability and generalization capability, with particularly strong performance in fine-grained classification tasks.
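The abstract only outlines the method at a high level. As a minimal sketch of the core idea (not the authors' implementation), the snippet below assumes CLIP as the backbone, as in TCP, and uses a hypothetical generate_class_description helper to stand in for the LLM-based text generation; it builds category-specific prompt text and encodes it into class-aware textual features with CLIP's text encoder.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

def generate_class_description(class_name: str) -> str:
    """Hypothetical stand-in for the LLM call that CuTCP would use to
    produce a descriptive, category-specific sentence."""
    return f"a bird with plumage, beak shape, and size typical of a {class_name}"

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/16", device=device)

class_names = ["sparrow", "goldfinch"]  # example fine-grained categories
prompts = [
    f"a photo of a {name}, {generate_class_description(name)}."
    for name in class_names
]

with torch.no_grad():
    tokens = clip.tokenize(prompts).to(device)    # (num_classes, 77)
    text_features = model.encode_text(tokens)     # (num_classes, embed_dim)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# In the spirit of class-aware prompt tuning, these text features would serve
# as category-specific textual priors injected into the learnable prompts,
# rather than as the final classifier weights themselves.
```

Here the richer, per-class descriptions replace the single fixed template ("a photo of a {class}") whose limited semantics the abstract identifies as the bottleneck for fine-grained distinctions.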
Pages: 11
Related Papers
7 items in total
  • [1] TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model
    Yao, Hantao
    Zhang, Rui
    Xu, Changsheng
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 23438 - 23448
  • [2] VTPL: Visual and text prompt learning for visual-language models
    Sun, Bo
    Wu, Zhichao
    Zhang, Hao
    He, Jun
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 104
  • [3] SDPT: Synchronous Dual Prompt Tuning for Fusion-Based Visual-Language Pre-trained Models
    Zhou, Yang
    Wu, Yongjian
    Saiyin, Jiya
    Wei, Bingzheng
    Lai, Maode
    Chang, Eric
    Xu, Yan
    COMPUTER VISION - ECCV 2024, PT XLIX, 2025, 15107 : 340 - 356
  • [4] Context-Aware Prompt for Generation-based Event Argument Extraction with Diffusion Models
    Luo, Lei
    Xu, Yajing
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 1717 - 1725
  • [5] Affective Prompt-Tuning-Based Language Model for Semantic-Based Emotional Text Generation
    Gu, Zhaodong
    He, Kejing
    INTERNATIONAL JOURNAL ON SEMANTIC WEB AND INFORMATION SYSTEMS, 2024, 20 (01)
  • [6] Category-instance distillation based on visual-language models for rehearsal-free class incremental learning
    Jin, Weilong
    Wang, Zilei
    Zhang, Yixin
    IET COMPUTER VISION, 2024
  • [7] F-SCP: An automatic prompt generation method for specific classes based on visual language pre-training models
    Han, Baihong
    Jiang, Xiaoyan
    Fang, Zhijun
    Fujita, Hamido
    Gao, Yongbin
    PATTERN RECOGNITION, 2024, 147