MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models

Citations: 0
Authors:
Monajatipoor, Masoud [1 ]
Li, Liunian Harold [1 ]
Rouhsedaghat, Mozhdeh [1 ]
Yang, Lin F. [1 ]
Chang, Kai-Wei [1 ]
Affiliations:
[1] UCLA, Los Angeles, CA 90024 USA
Keywords: (none listed)
DOI: not available
CLC Number: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Large-scale language models have shown the ability to adapt to a new task by conditioning on a few demonstrations (i.e., in-context learning). However, in the vision-language domain, most large-scale pre-trained vision-language (VL) models do not possess the ability to conduct in-context learning. How can we enable in-context learning for VL models? In this paper, we study an interesting hypothesis: can we transfer the in-context learning ability from the language domain to the VL domain? Specifically, we first meta-train a language model to perform in-context learning on NLP tasks (as in MetaICL); we then transfer this model to VL tasks by attaching a visual encoder. Our experiments suggest that in-context learning ability can indeed be transferred across modalities: our model considerably improves the in-context learning capability on VL tasks and can even significantly compensate for model size. On VQA, OK-VQA, and GQA, our method outperforms the baseline model while having roughly 20 times fewer parameters.
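The transfer recipe described above can be illustrated with a minimal sketch: a frozen, meta-trained language model receives a sequence built from few-shot (image, text) demonstrations, where image features from a visual encoder are mapped into the LM's embedding space by a learned projection. The dimensions, the linear projection, and the function names below are illustrative assumptions, not details specified in the abstract.

```python
import numpy as np

# Hypothetical dimensions for illustration only.
VISUAL_DIM = 512   # assumed output size of the visual encoder
LM_DIM = 768       # assumed embedding size of the meta-trained LM

rng = np.random.default_rng(0)
# A trainable linear map projecting visual features into the LM
# embedding space (stands in for the attached-encoder interface).
W_proj = rng.normal(scale=0.02, size=(VISUAL_DIM, LM_DIM))

def project_image(visual_features: np.ndarray) -> np.ndarray:
    """Map visual-encoder features to LM-embedding-sized tokens."""
    return visual_features @ W_proj

def build_incontext_sequence(demos, query_feats):
    """Concatenate k (image, text) demonstrations with the query image,
    forming the in-context sequence a frozen LM would consume."""
    segments = []
    for img_feats, text_embeds in demos:
        segments.append(project_image(img_feats))  # image tokens
        segments.append(text_embeds)               # answer-text tokens
    segments.append(project_image(query_feats))    # query image tokens
    return np.concatenate(segments, axis=0)

# Two demonstrations: 4 visual tokens + 3 text tokens each; query: 4 tokens.
demos = [(rng.normal(size=(4, VISUAL_DIM)), rng.normal(size=(3, LM_DIM)))
         for _ in range(2)]
query = rng.normal(size=(4, VISUAL_DIM))
prompt = build_incontext_sequence(demos, query)
print(prompt.shape)  # (18, 768): 2*(4+3) demo tokens + 4 query tokens
```

The point of the sketch is that the language model itself is unchanged: only the projection bridging modalities is new, which is why in-context behavior learned on NLP tasks can carry over to VL tasks.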
Pages: 495 - 508 (14 pages)
Related Papers (50 total)
  • [1] Learning to Prompt for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    [J]. International Journal of Computer Vision, 2022, 130 (09) : 2337 - 2348
  • [2] Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective
    Wu, Wenhao
    Sun, Zhun
    Song, Yuxin
    Wang, Jingdong
    Ouyang, Wanli
    [J]. International Journal of Computer Vision, 2024, 132 (02) : 392 - 409
  • [3] SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
    Chen, Yi-Syuan
    Song, Yun-Zhu
    Yeo, Cheng Yu
    Liu, Bei
    Fu, Jianlong
    Shuai, Hong-Han
    [J]. 2023 IEEE/CVF International Conference on Computer Vision (ICCV 2023), 2023 : 15384 - 15396
  • [4] Exploring Vision-Language Models for Imbalanced Learning
    Wang, Y.
    Yu, Z.
    Wang, J.
    Heng, Q.
    Chen, H.
    Ye, W.
    Xie, R.
    Xie, X.
    Zhang, S.
    [J]. International Journal of Computer Vision, 2024, 132 (01) : 224 - 237
  • [5] Conditional Prompt Learning for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    [J]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022 : 16795 - 16804
  • [6] ECO: Ensembling Context Optimization for Vision-Language Models
    Agnolucci, Lorenzo
    Baldrati, Alberto
    Todino, Francesco
    Becattini, Federico
    Bertini, Marco
    Del Bimbo, Alberto
    [J]. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2023 : 2803 - 2807
  • [7] Compositional Kronecker Context Optimization for vision-language models
    Ding, Kun
    Li, Xiaohui
    Yu, Qiang
    Wang, Ying
    Zhang, Haojian
    Xiang, Shiming
    [J]. Neurocomputing, 2024, 608
  • [8] Adaptive In-Context Learning with Large Language Models for Bundle
    Sun, Zhu
    Feng, Kaidong
    Yang, Jie
    Qu, Xinghua
    Fang, Hui
    Ong, Yew-Soon
    Liu, Wenyuan
    [J]. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024), 2024 : 966 - 976