Visual In-Context Learning for Large Vision-Language Models

Cited by: 0
Authors
Zhou, Yucheng [1 ]
Le, Xiang [2 ]
Wang, Qianning [3 ]
Shen, Jianbing [1 ]
Affiliations
[1] Univ Macau, CIS, SKL IOTSC, Taipa, Macao, Peoples R China
[2] Tianjin Univ, Tianjin, Peoples R China
[3] Nanjing Audit Univ, Nanjing, Peoples R China
Keywords: (none listed)
DOI: not available
CLC Classification: TP18 [Theory of Artificial Intelligence]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
In Large Visual Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities. To overcome these challenges, we introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition. Our approach retrieves images via the "Retrieval & Rerank" paradigm, summarizes images with task intent and task-specific visual parsing, and composes language-based demonstrations that reduce token count and alleviate the cross-modal interaction problem. Experimental evaluations on five visual reasoning datasets demonstrate the effectiveness of our method. Moreover, our extensive experiments leverage information flow analysis to elucidate the effectiveness of our method, and investigate the impact of the length and position of demonstrations for LVLMs. The use of in-context unlearning further shows promise in resetting specific model knowledge without retraining.
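The "Retrieval & Rerank" paradigm mentioned in the abstract can be illustrated with a minimal two-stage sketch: a cheap similarity search produces a shortlist of candidate demonstrations, which a second (typically more expensive) scorer then reorders. The embedding vectors, `retrieve_and_rerank` function, and toy corpus below are illustrative stand-ins, not the paper's actual models or data.

```python
# Hypothetical sketch of two-stage "Retrieval & Rerank" demonstration selection.
# Vectors here are placeholder image features; a real system would use a
# learned visual encoder and a cross-modal reranking model.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_and_rerank(query_vec, candidates, k_retrieve=3, k_final=2, rerank_fn=None):
    """Stage 1: coarse retrieval of the top-k_retrieve candidates by cosine
    similarity. Stage 2: rerank that shortlist with a finer-grained scorer
    (falls back to cosine similarity when none is supplied)."""
    shortlist = sorted(
        candidates, key=lambda c: cosine(query_vec, c["vec"]), reverse=True
    )[:k_retrieve]
    scorer = rerank_fn or (lambda c: cosine(query_vec, c["vec"]))
    return sorted(shortlist, key=scorer, reverse=True)[:k_final]

# Toy demonstration pool: image IDs with placeholder feature vectors.
demos = [
    {"id": "img_a", "vec": [1.0, 0.0, 0.0]},
    {"id": "img_b", "vec": [0.9, 0.1, 0.0]},
    {"id": "img_c", "vec": [0.0, 1.0, 0.0]},
    {"id": "img_d", "vec": [0.0, 0.0, 1.0]},
]

picked = retrieve_and_rerank([1.0, 0.05, 0.0], demos)
print([d["id"] for d in picked])  # → ['img_a', 'img_b']
```

The selected demonstrations would then be summarized with the task intent and composed into language-based prompts, per the method's remaining two stages.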
Pages: 15890-15902 (13 pages)