Visual In-Context Learning for Large Vision-Language Models

Cited by: 0
Authors:
Zhou, Yucheng [1]
Le, Xiang [2]
Wang, Qianning [3]
Shen, Jianbing [1]
Affiliations:
[1] Univ Macau, CIS, SKL IOTSC, Taipa, Macao, Peoples R China
[2] Tianjin Univ, Tianjin, Peoples R China
[3] Nanjing Audit Univ, Nanjing, Peoples R China
Keywords: (none listed)
DOI: N/A
CLC classification: TP18 [Theory of Artificial Intelligence];
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
In Large Vision-Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities. To overcome these challenges, we introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition. Our approach retrieves images via a "Retrieval & Rerank" paradigm, summarizes images with task intent and task-specific visual parsing, and composes language-based demonstrations that reduce token count and alleviate the cross-modal interaction problem. Experimental evaluations on five visual reasoning datasets demonstrate the effectiveness of our method. Moreover, our extensive experiments leverage information-flow analysis to elucidate the effectiveness of our method and investigate the impact of the length and position of demonstrations for LVLMs. The use of in-context unlearning further shows promise in resetting specific model knowledge without retraining.
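The three-stage pipeline named in the abstract (retrieve and rerank candidate demonstrations, summarize them with task intent, then compose language-only demonstrations for the LVLM) can be sketched as follows. This is a minimal illustration under stated assumptions: every function name, the toy bag-of-words embedding, and the length-based reranker are placeholders, not the authors' actual implementation.

```python
# Hypothetical sketch of a VICL-style pipeline; names and scoring are
# illustrative stand-ins, not the paper's method.

def embed(text):
    # Toy bag-of-words embedding: word -> count.
    counts = {}
    for w in text.lower().split():
        counts[w] = counts.get(w, 0) + 1
    return counts

def similarity(a, b):
    # Overlap score between two bag-of-words vectors.
    return sum(min(v, b.get(k, 0)) for k, v in a.items())

def retrieve_and_rerank(query_caption, corpus, k=2):
    # Stage 1 (retrieve): rank corpus entries by caption similarity.
    # Stage 2 (rerank): second pass; summary length is a crude
    # stand-in for a task-intent-aware reranker.
    q = embed(query_caption)
    candidates = sorted(
        corpus,
        key=lambda c: similarity(q, embed(c["caption"])),
        reverse=True,
    )[: 2 * k]
    return sorted(candidates, key=lambda c: len(c["summary"]))[:k]

def compose_demonstrations(query_caption, corpus, intent):
    # Intent-oriented composition: demonstrations become plain text,
    # so the model sees one image plus language-only examples,
    # reducing token count versus interleaved images.
    demos = retrieve_and_rerank(query_caption, corpus)
    lines = [f"Task intent: {intent}"]
    for d in demos:
        lines.append(f"Example: {d['summary']} -> {d['label']}")
    lines.append(f"Query image description: {query_caption}")
    return "\n".join(lines)

corpus = [
    {"caption": "a red apple on a table", "summary": "red apple on table", "label": "apple"},
    {"caption": "a dog running in a park", "summary": "dog in park", "label": "dog"},
    {"caption": "a green apple in a bowl", "summary": "green apple in bowl", "label": "apple"},
]
prompt = compose_demonstrations("a shiny apple on a desk", corpus, "identify the fruit")
```

In a real system the bag-of-words scorer would be replaced by a visual encoder for retrieval and an LVLM call for the intent-oriented summaries; the composed `prompt` string would then be prepended to the query image.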
Pages: 15890-15902 (13 pages)
Related Papers (50 in total)
  • [21] Symbol tuning improves in-context learning in language models
    Wei, Jerry
    Hou, Le
    Lampinen, Andrew
    Chen, Xiangning
    Huang, Da
    Tay, Yi
    Chen, Xinyun
    Lu, Yifeng
    Zhou, Denny
    Ma, Tengyu
    Le, Quoc V.
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 968 - 979
  • [22] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
    Leng, Sicong
    Zhang, Hang
    Chen, Guanzheng
    Li, Xin
    Lu, Shijian
    Miao, Chunyan
    Bing, Lidong
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 13872 - 13882
  • [23] IVTP: Instruction-Guided Visual Token Pruning for Large Vision-Language Models
    Huang, Kai
    Zou, Hao
    Xi, Ye
    Wang, BoChen
    Xie, Zhen
    Yu, Liang
    COMPUTER VISION - ECCV 2024, PT XVII, 2025, 15075 : 214 - 230
  • [24] Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning
    Wang, Xinyi
    Zhu, Wanrong
    Saxon, Michael
    Steyvers, Mark
    Wang, William Yang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [25] Attention Prompting on Image for Large Vision-Language Models
    Yu, Runpeng
    Yu, Weihao
    Wang, Xinchao
    COMPUTER VISION - ECCV 2024, PT XXX, 2025, 15088 : 251 - 268
  • [26] Effectiveness assessment of recent large vision-language models
    Jiang, Yao
    Yan, Xinyu
    Ji, Ge-Peng
    Fu, Keren
    Sun, Meijun
    Xiong, Huan
    Fan, Deng-Ping
    Khan, Fahad Shahbaz
    Visual Intelligence, 2 (1):
  • [27] Evaluating Attribute Comprehension in Large Vision-Language Models
    Zhang, Haiwen
    Yang, Zixi
    Liu, Yuanzhi
    Wang, Xinran
    He, Zheqi
    Liang, Kongming
    Ma, Zhanyu
    PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024, 2025, 15035 : 98 - 113
  • [28] On Evaluating Adversarial Robustness of Large Vision-Language Models
    Zhao, Yunqing
    Pang, Tianyu
    Du, Chao
    Yang, Xiao
    Li, Chongxuan
    Cheung, Ngai-Man
    Lin, Min
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [29] Evaluating Object Hallucination in Large Vision-Language Models
    Li, Yifan
    Du, Yifan
    Zhou, Kun
    Wang, Jinpeng
    Zhao, Wayne Xin
    Wen, Ji-Rong
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 292 - 305
  • [30] Fine-Grained Visual Prompt Learning of Vision-Language Models for Image Recognition
    Sun, Hongbo
    He, Xiangteng
    Zhou, Jiahuan
    Peng, Yuxin
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5828 - 5836