Visual In-Context Learning for Large Vision-Language Models

Cited by: 0
Authors
Zhou, Yucheng [1 ]
Le, Xiang [2 ]
Wang, Qianning [3 ]
Shen, Jianbing [1 ]
Affiliations
[1] Univ Macau, CIS, SKL IOTSC, Taipa, Macao, Peoples R China
[2] Tianjin Univ, Tianjin, Peoples R China
[3] Nanjing Audit Univ, Nanjing, Peoples R China
Keywords
DOI
N/A
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In Large Vision-Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interaction and representation disparities. To overcome these challenges, we introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition. Our approach retrieves images via a "Retrieval & Rerank" paradigm, summarizes each image with respect to the task intent and task-specific visual parsing, and composes language-based demonstrations that reduce the token count and alleviate the cross-modal interaction problem. Experimental evaluations on five visual reasoning datasets demonstrate the effectiveness of our method. Moreover, our extensive experiments leverage information flow analysis to elucidate why the method is effective and to investigate the impact of the length and position of demonstrations in LVLMs. Finally, in-context unlearning shows promise for resetting specific model knowledge without retraining.
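To make the three stages concrete, the following is a minimal, self-contained sketch of a VICL-style pipeline. All model calls are stubbed, and the function names, prompt format, and scoring choices (cosine similarity over precomputed embeddings, a trivial rerank pass) are illustrative assumptions rather than the authors' released implementation.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Demo:
    """A demonstration candidate: an image with its label and a precomputed
    visual embedding (e.g., from a CLIP-style encoder)."""
    image_id: str
    label: str
    embedding: np.ndarray


def embed_query(image_path: str, dim: int = 512) -> np.ndarray:
    """Stub: in practice, encode the query image with a visual encoder."""
    rng = np.random.default_rng(abs(hash(image_path)) % (2**32))
    return rng.standard_normal(dim)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def retrieve_and_rerank(query: np.ndarray, pool: list[Demo],
                        top_k: int = 20, keep: int = 4) -> list[Demo]:
    """Visual Demonstration Retrieval via 'Retrieval & Rerank': a coarse
    similarity search narrows the pool, then a (here trivial) rerank pass
    keeps the most relevant demonstrations."""
    candidates = sorted(pool, key=lambda d: cosine(query, d.embedding),
                        reverse=True)[:top_k]
    # Placeholder rerank: a real system would re-score with a cross-encoder.
    reranked = sorted(candidates, key=lambda d: cosine(query, d.embedding),
                      reverse=True)
    return reranked[:keep]


def summarize_with_intent(image_id: str, task_intent: str) -> str:
    """Stub for Intent-Oriented Image Summarization: an LVLM would be prompted
    to describe only the aspects of the image relevant to the task intent."""
    return f"[summary of {image_id} focused on: {task_intent}]"


def compose_demonstrations(demos: list[Demo], task_intent: str) -> str:
    """Intent-Oriented Demonstration Composition: each retrieved image is
    replaced by its text summary, so the demonstrations carry no image
    tokens, reducing prompt length."""
    blocks = [
        f"Image description: {summarize_with_intent(d.image_id, task_intent)}\n"
        f"Answer: {d.label}"
        for d in demos
    ]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    pool = [Demo(f"img_{i}", label=f"class_{i % 3}",
                 embedding=np.random.default_rng(i).standard_normal(512))
            for i in range(100)]
    query = embed_query("query.jpg")
    demos = retrieve_and_rerank(query, pool)
    prompt = compose_demonstrations(demos, task_intent="identify the animal species")
    print(prompt + "\n\nQuery image: <image>\nAnswer:")
```

In a real system, retrieve_and_rerank would re-score candidates with a cross-modal reranker, and summarize_with_intent would prompt the LVLM itself, so that only the query image enters the model as pixels while all demonstrations arrive as text.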
Pages: 15890-15902
Page count: 13
Related Papers
50 records in total
  • [41] Investigating Compositional Challenges in Vision-Language Models for Visual Grounding
    Zeng, Yunan
    Huang, Yan
    Zhang, Jinjin
    Jie, Zequn
    Chai, Zhenhua
    Wang, Liang
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 14141 - 14151
  • [42] In-Context Impersonation Reveals Large Language Models' Strengths and Biases
    Salewski, Leonard
    Alaniz, Stephan
    Rio-Torto, Isabel
    Schulz, Eric
    Akata, Zeynep
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [43] Unifying Visual and Vision-Language Tracking via Contrastive Learning
    Ma, Yinchao
    Tang, Yuyang
    Yang, Wenfei
    Zhang, Tianzhu
    Zhang, Jinpeng
    Kang, Mengxue
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5, 2024, : 4107 - 4116
  • [44] Iterative Forward Tuning Boosts In-Context Learning in Language Models
    Yang, Jiaxi
    Hui, Binyuan
    Yang, Min
    Wang, Bailin
    Li, Bowen
    Li, Binhua
    Huang, Fei
    Li, Yongbin
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 15460 - 15473
  • [45] Integrating advanced vision-language models for context recognition in risks assessment
    Rodriguez-Juan, Javier
    Ortiz-Perez, David
    Garcia-Rodriguez, Jose
    Tomas, David
    Nalepa, Grzegorz J.
    NEUROCOMPUTING, 2025, 618
  • [46] Vision-Language Models for Vision Tasks: A Survey
    Zhang, Jingyi
    Huang, Jiaxing
    Jin, Sheng
    Lu, Shijian
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5625 - 5644
  • [47] GalLoP: Learning Global and Local Prompts for Vision-Language Models
    Lafon, Marc
    Ramzi, Elias
    Rambour, Clement
    Audebert, Nicolas
    Thome, Nicolas
    COMPUTER VISION - ECCV 2024, PT LXI, 2025, 15119 : 264 - 282
  • [48] Adapting Vision-Language Models via Learning to Inject Knowledge
    Xuan, Shiyu
    Yang, Ming
    Zhang, Shiliang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 5798 - 5809
  • [49] JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models
    Guo, Yuncheng
    Guo, Xiaodong
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 28695 - 28705
  • [50] JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
    Jin, Haibo
    Hu, Leyang
    Li, Xinnuo
    Zhang, Peiyan
    Chen, Chonghan
    Zhuang, Jun
    Wang, Haohan
    arXiv