MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models

Cited by: 0
Authors
Monajatipoor, Masoud [1 ]
Li, Liunian Harold [1 ]
Rouhsedaghat, Mozhdeh [1 ]
Yang, Lin F. [1 ]
Chang, Kai-Wei [1 ]
Affiliations
[1] UCLA, Los Angeles, CA 90024 USA
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Large-scale language models have shown the ability to adapt to a new task via conditioning on a few demonstrations (i.e., in-context learning). However, in the vision-language domain, most large-scale pre-trained vision-language (VL) models do not possess the ability to conduct in-context learning. How can we enable in-context learning for VL models? In this paper, we study an interesting hypothesis: can we transfer the in-context learning ability from the language domain to the VL domain? Specifically, we first meta-train a language model to perform in-context learning on NLP tasks (as in MetaICL); we then transfer this model to perform VL tasks by attaching a visual encoder. Our experiments suggest that in-context learning ability can indeed be transferred across modalities: our model considerably improves the in-context learning capability on VL tasks and can even significantly compensate for model size. On VQA, OK-VQA, and GQA, our method outperforms the baseline model while having roughly 20 times fewer parameters.
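The transfer recipe in the abstract — keep a meta-trained language model and attach a visual encoder so images become extra input tokens — can be sketched at the tensor level. This is a minimal illustration under assumptions, not the paper's implementation: all names, dimensions, and the linear projection are hypothetical, and the snippet only shows how projected visual features would be interleaved with text embeddings to form a few-shot (in-context) VL sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions (illustrative, not from the paper).
D_VISION = 512        # visual-encoder feature size
D_LM = 768            # language-model embedding size
N_VISUAL_TOKENS = 2   # each image becomes this many "visual tokens"

# Trainable projection: the only new parameters in this sketch;
# the meta-trained language model itself would stay fixed.
W_proj = rng.standard_normal((D_VISION, N_VISUAL_TOKENS * D_LM)) * 0.02

def visual_prefix(image_features: np.ndarray) -> np.ndarray:
    """Map a (D_VISION,) feature vector to (N_VISUAL_TOKENS, D_LM) embeddings."""
    return (image_features @ W_proj).reshape(N_VISUAL_TOKENS, D_LM)

def build_icl_sequence(demonstrations, query_image_feat, query_text_emb):
    """Concatenate k (image, text) demonstrations plus a query into one
    embedding sequence, as the frozen LM would receive for in-context VQA."""
    parts = []
    for img_feat, text_emb in demonstrations:
        parts.append(visual_prefix(img_feat))  # image as prefix tokens
        parts.append(text_emb)                 # question + answer embeddings
    parts.append(visual_prefix(query_image_feat))
    parts.append(query_text_emb)               # question only; LM predicts answer
    return np.concatenate(parts, axis=0)

# Example: 2 demonstrations of 5 text tokens each, then a 3-token query.
demos = [(rng.standard_normal(D_VISION), rng.standard_normal((5, D_LM)))
         for _ in range(2)]
seq = build_icl_sequence(demos, rng.standard_normal(D_VISION),
                         rng.standard_normal((3, D_LM)))
print(seq.shape)  # 2*(2+5) demo tokens + 2 visual + 3 text query -> (19, 768)
```

Feeding such a sequence directly as input embeddings (bypassing the tokenizer) is the standard way prefix-style VL models reuse a frozen language model; here it merely illustrates why the in-context format carries over once images are mapped into the LM's embedding space.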
Pages: 495-508 (14 pages)
Related papers (50 listed)
  • [21] Task Residual for Tuning Vision-Language Models
    Yu, Tao
    Lu, Zhihe
    Jin, Xin
    Chen, Zhibo
    Wang, Xinchao
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10899 - 10909
  • [22] Perceptual Grouping in Contrastive Vision-Language Models
    Ranasinghe, Kanchana
    McKinzie, Brandon
    Ravi, Sachin
    Yang, Yinfei
    Toshev, Alexander
    Shlens, Jonathon
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 5548 - 5561
  • [23] Adventures of Trustworthy Vision-Language Models: A Survey
    Vatsa, Mayank
    Jain, Anubhooti
    Singh, Richa
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 20, 2024, : 22650 - 22658
  • [24] Equivariant Similarity for Vision-Language Foundation Models
    Wang, Tan
    Lin, Kevin
    Li, Linjie
    Lin, Chung-Ching
    Yang, Zhengyuan
    Zhang, Hanwang
    Liu, Zicheng
    Wang, Lijuan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 11964 - 11974
  • [25] Concept-Guided Prompt Learning for Generalization in Vision-Language Models
    Zhang, Yi
    Zhang, Ce
    Yu, Ke
    Tang, Yushun
    He, Zhihai
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7377 - 7386
  • [26] Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models
    Wang, Yubin
    Jiang, Xinyang
    Cheng, De
    Li, Dongsheng
    Zhao, Cairong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 5749 - 5757
  • [27] LiFT: Transfer Learning in Vision-Language Models for Downstream Adaptation and Generalization
    Li, Jingzheng
    Sun, Hailong
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4678 - 4687
  • [28] Cross-Modal Concept Learning and Inference for Vision-Language Models
    Zhang, Yi
    Zhang, Ce
    Tang, Yushun
    He, Zhihai
    NEUROCOMPUTING, 2024, 583
  • [29] Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning
    Wang, Xinyi
    Zhu, Wanrong
    Saxon, Michael
    Steyvers, Mark
    Wang, William Yang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [30] WINODICT: Probing language models for in-context word acquisition
    Eisenschlos, Julian Martin
    Cole, Jeremy R.
    Liu, Fangyu
    Cohen, William W.
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 94 - 102