Efficient Transfer Learning for Visual Tasks via Continuous Optimization of Prompts

Cited: 3
Authors
Conder, Jonathan [1 ]
Jefferson, Josephine [1 ]
Pages, Nathan [1 ]
Jawed, Khurram [1 ]
Nejati, Alireza [1 ]
Sagar, Mark [1 ]
Affiliations
[1] Soul Machines, Auckland, New Zealand
Keywords
Computer vision; Few-shot; Fine-tuning; Prompt engineering; Prefix-tuning; CLIP; Transformers; Vision transformers; LAND-USE; BENCHMARK; EUROSAT; DATASET
DOI
10.1007/978-3-031-06427-2_25
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Traditional methods for adapting pre-trained vision models to downstream tasks involve fine-tuning some or all of the model's parameters, which entails a trade-off. If too many parameters are fine-tuned, the model may lose the benefits of pre-training, such as the ability to generalize to out-of-distribution data; if too few, it may be unable to adapt effectively to the downstream tasks. In this paper, we propose Visual Prompt Tuning (VPT) as an alternative to fine-tuning for Transformer-based vision models. Our method is closely related to, and inspired by, prefix-tuning of language models [22]. We find that, by adding extra parameters to a pre-trained model, VPT offers performance comparable to fine-tuning the final layer. Moreover, in low-data settings and on specialized tasks such as traffic sign recognition, satellite photo recognition and handwriting classification, VPT improves the performance of Transformer-based vision models.
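The mechanism described in the abstract can be sketched in a few lines: freeze the pre-trained Transformer, prepend a handful of learnable prompt embeddings to the patch-token sequence, and optimize only the prompts and a task head. The PyTorch sketch below is illustrative rather than the authors' implementation; the toy patch embedding and two-layer encoder stand in for a pre-trained ViT, and names such as `PromptTunedViT` and `num_prompts`, as well as the choice to pool the prompt outputs for classification, are assumptions.

```python
# Illustrative sketch of visual prompt tuning (not the paper's code).
# A frozen toy encoder stands in for a pre-trained ViT backbone.
import torch
import torch.nn as nn

class PromptTunedViT(nn.Module):  # hypothetical name
    def __init__(self, backbone, patch_embed, embed_dim, num_prompts, num_classes):
        super().__init__()
        self.patch_embed = patch_embed  # frozen patch projection
        self.backbone = backbone        # frozen Transformer blocks
        # Learnable prompt tokens, prepended to the patch sequence.
        self.prompts = nn.Parameter(0.02 * torch.randn(1, num_prompts, embed_dim))
        self.head = nn.Linear(embed_dim, num_classes)  # trainable task head
        for p in self.patch_embed.parameters():
            p.requires_grad = False     # keep pre-trained weights intact
        for p in self.backbone.parameters():
            p.requires_grad = False

    def forward(self, images):
        tokens = self.patch_embed(images).transpose(1, 2)   # (B, N, D)
        prompts = self.prompts.expand(tokens.size(0), -1, -1)
        x = torch.cat([prompts, tokens], dim=1)             # (B, P + N, D)
        x = self.backbone(x)
        # Pool the prompt positions and classify (one of several options;
        # a class token would work as well).
        return self.head(x[:, : self.prompts.size(1)].mean(dim=1))

# Toy usage: 32x32 images, 8x8 patches, a 2-layer frozen encoder.
embed_dim = 64
patch_embed = nn.Sequential(
    nn.Conv2d(3, embed_dim, kernel_size=8, stride=8),  # (B, D, 4, 4)
    nn.Flatten(2),                                     # (B, D, 16)
)
layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=2)
model = PromptTunedViT(backbone, patch_embed, embed_dim, num_prompts=5, num_classes=10)

# Only the prompts and the head receive gradients.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
logits = model(torch.randn(2, 3, 32, 32))  # -> torch.Size([2, 10])
```

Because the backbone stays frozen, the optimizer state and the per-task checkpoint cover only the prompts and the head, which is part of what makes this style of adaptation attractive in low-data settings.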
Pages: 297-309
Page count: 13