Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding

Cited by: 0
Authors
Wang, Xintong [1 ]
Pan, Jingheng [1 ]
Ding, Liang [2 ]
Biemann, Chris [1 ]
Affiliations
[1] Univ Hamburg, Dept Informat, Hamburg, Germany
[2] Univ Sydney, Sydney, NSW, Australia
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large Vision-Language Models (LVLMs) are increasingly adept at generating contextually detailed and coherent responses from visual inputs. However, their application in multimodal decision-making and open-ended generation is hindered by a notable rate of hallucinations, where the generated text inaccurately represents the visual contents. To address this issue, this paper introduces Instruction Contrastive Decoding (ICD), a novel method designed to reduce hallucinations during LVLM inference. Our approach is inspired by the observation that what we term disturbance instructions significantly exacerbate hallucinations in multimodal fusion modules. ICD contrasts the output distributions obtained under standard instructions with those obtained under disturbance instructions, thereby amplifying alignment uncertainty and effectively subtracting hallucinated concepts from the original distribution. Through comprehensive experiments on discriminative benchmarks (POPE and MME) and a generative benchmark (LLaVA-Bench), we demonstrate that ICD significantly mitigates both object-level and attribute-level hallucinations. Moreover, our method not only addresses hallucinations but also significantly enhances the general perception and recognition capabilities of LVLMs.
Pages: 15840-15853
Page count: 14
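
The abstract describes ICD as contrasting the next-token distribution obtained under a standard instruction with the one obtained under a disturbance instruction. The following is a minimal sketch of such a contrastive decoding step, assuming a HuggingFace-style LVLM whose forward pass accepts pixel_values and input_ids and returns next-token logits; the function name icd_next_token, the lam weight, and the exact combination formula are illustrative assumptions, not taken from the paper's released code.

import torch

@torch.no_grad()
def icd_next_token(model, pixel_values, std_ids, dist_ids, lam=1.0, temperature=1.0):
    # Next-token logits conditioned on the standard instruction.
    logits_std = model(pixel_values=pixel_values, input_ids=std_ids).logits[:, -1, :]
    # Next-token logits conditioned on the disturbance instruction (same image, perturbed prompt).
    logits_dist = model(pixel_values=pixel_values, input_ids=dist_ids).logits[:, -1, :]
    # Contrastive combination (assumed form): boost the standard distribution and
    # subtract what the disturbance instruction makes more likely, i.e. the
    # hallucination-prone concepts.
    contrast = (1.0 + lam) * logits_std - lam * logits_dist
    probs = torch.softmax(contrast / temperature, dim=-1)
    return torch.argmax(probs, dim=-1)  # greedy pick; sampling is equally possible

In practice such a function would be called in a token-by-token generation loop, with the chosen token appended to both the standard and the disturbance instruction sequences before the next step.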