Visual In-Context Learning for Large Vision-Language Models

Times cited: 0
Authors
Zhou, Yucheng [1]
Le, Xiang [2]
Wang, Qianning [3]
Shen, Jianbing [1]
Affiliations
[1] Univ Macau, CIS, SKL IOTSC, Taipa, Macao, Peoples R China
[2] Tianjin Univ, Tianjin, Peoples R China
[3] Nanjing Audit Univ, Nanjing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In Large Visual Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities. To overcome these challenges, we introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition. Our approach retrieves images via a "Retrieval & Rerank" paradigm, summarises images with task intent and task-specific visual parsing, and composes language-based demonstrations that reduce the token count and alleviate the cross-modal interaction problem. Experimental evaluations on five visual reasoning datasets demonstrate the effectiveness of our method. Moreover, our extensive experiments leverage information flow analysis to elucidate the effectiveness of our method, and investigate the impact of the length and position of demonstrations for LVLMs. The use of in-context unlearning further shows promise in resetting specific model knowledge without retraining.
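The three stages named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the encoders, the rerank score, or the prompt template, so everything here is an assumption made for illustration. The toy two-dimensional vectors stand in for real embeddings, the functions `retrieve`, `rerank`, and `compose_prompt` and the fields `image_vec` and `summary_vec` are hypothetical names, and how the query image itself is passed to the model is left out.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_image_vec, pool, k):
    """Stage 1 (assumed): keep the k demonstrations whose image embedding is closest to the query."""
    ranked = sorted(pool, key=lambda d: cosine(query_image_vec, d["image_vec"]), reverse=True)
    return ranked[:k]

def rerank(query_text_vec, candidates, n):
    """Stage 2 (assumed): reorder the candidates with a second score, here summary-embedding similarity."""
    ranked = sorted(candidates, key=lambda d: cosine(query_text_vec, d["summary_vec"]), reverse=True)
    return ranked[:n]

def compose_prompt(demos, question):
    """Build language-only demonstrations: each image is replaced by its text summary."""
    parts = [
        f"Image summary: {d['summary']}\nQuestion: {d['question']}\nAnswer: {d['answer']}"
        for d in demos
    ]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

# Toy demonstration pool; vectors and summaries are made up for this sketch.
pool = [
    {"id": "a", "image_vec": [1.0, 0.0], "summary_vec": [0.0, 1.0],
     "summary": "A red car parked on a street.",
     "question": "What colour is the car?", "answer": "red"},
    {"id": "b", "image_vec": [0.9, 0.1], "summary_vec": [1.0, 0.0],
     "summary": "Two dogs playing in a park.",
     "question": "How many dogs are there?", "answer": "two"},
    {"id": "c", "image_vec": [0.0, 1.0], "summary_vec": [1.0, 0.0],
     "summary": "A bowl of fruit on a table.",
     "question": "What is in the bowl?", "answer": "fruit"},
]

candidates = retrieve([1.0, 0.0], pool, k=2)
demos = rerank([1.0, 0.0], candidates, n=2)
prompt = compose_prompt(demos, "How many animals are in the image?")
```

Because the demonstrations carry only short text summaries rather than image tokens, the prompt stays compact, which is the token-count reduction the abstract refers to.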
Pages: 15890-15902
Number of pages: 13