Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding

Cited by: 0
Authors
Wang, Xintong [1 ]
Pan, Jingheng [1 ]
Ding, Liang [2 ]
Biemann, Chris [1 ]
Affiliations
[1] Univ Hamburg, Dept Informat, Hamburg, Germany
[2] Univ Sydney, Sydney, NSW, Australia
Keywords
DOI
(none)
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large Vision-Language Models (LVLMs) are increasingly adept at generating contextually detailed and coherent responses from visual inputs. However, their application in multimodal decision-making and open-ended generation is hindered by a notable rate of hallucinations, where generated text inaccurately represents the visual contents. To address this issue, this paper introduces the Instruction Contrastive Decoding (ICD) method, a novel approach designed to reduce hallucinations during LVLM inference. Our method is inspired by our observation that what we call disturbance instructions significantly exacerbate hallucinations in multimodal fusion modules. ICD contrasts the distributions obtained under standard and disturbance instructions, thereby increasing alignment uncertainty and effectively subtracting hallucinated concepts from the original distribution. Through comprehensive experiments on discriminative benchmarks (POPE and MME) and a generative benchmark (LLaVA-Bench), we demonstrate that ICD significantly mitigates both object-level and attribute-level hallucinations. Moreover, our method not only addresses hallucinations but also significantly enhances the general perception and recognition capabilities of LVLMs.
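The contrastive step the abstract describes, amplifying the standard distribution while subtracting the one produced under a disturbance instruction, can be sketched as below. This is an illustrative sketch only, not the authors' released code: the weighting parameter `alpha` and the exact combination formula `(1 + alpha) * l_std - alpha * l_dist` are assumptions modeled on common contrastive-decoding formulations.

```python
import math

def contrastive_decode(logits_std, logits_dist, alpha=1.0):
    """Contrast two next-token logit vectors and return a probability
    distribution (illustrative sketch; `alpha` is an assumed knob).

    logits_std  -- logits conditioned on the standard instruction
    logits_dist -- logits conditioned on the disturbance instruction
    """
    # Amplify the standard logits and subtract the disturbance logits,
    # so tokens favored mainly under disturbance (hallucinated concepts)
    # are downweighted.
    combined = [(1 + alpha) * s - alpha * d
                for s, d in zip(logits_std, logits_dist)]
    # Numerically stable softmax over the combined logits.
    m = max(combined)
    exps = [math.exp(c - m) for c in combined]
    z = sum(exps)
    return [e / z for e in exps]
```

For example, a token whose logit is high under both instructions is penalized relative to a token that is high only under the standard instruction, which is the intuition behind subtracting the disturbance distribution.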
Pages: 15840-15853
Page count: 14
Related papers
(50 records)
  • [1] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
    Leng, Sicong
    Zhang, Hang
    Chen, Guanzheng
    Li, Xin
    Lu, Shijian
    Miao, Chunyan
    Bing, Lidong
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 13872 - 13882
  • [2] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
    Zhang, Jinrui
    Wang, Teng
    Zhang, Haigang
    Lu, Ping
    Zheng, Feng
    COMPUTER VISION - ECCV 2024, PT XXXVII, 2025, 15095 : 196 - 213
  • [3] Task Bias in Contrastive Vision-Language Models
    Menon, Sachit
    Chandratreya, Ishaan Preetam
    Vondrick, Carl
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (06) : 2026 - 2040
  • [4] Perceptual Grouping in Contrastive Vision-Language Models
    Ranasinghe, Kanchana
    McKinzie, Brandon
    Ravi, Sachin
    Yang, Yinfei
    Toshev, Alexander
    Shlens, Jonathon
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 5548 - 5561
  • [5] Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
    Luo, Gen
    Zhou, Yiyi
    Ren, Tianhe
    Chen, Shengxin
    Sun, Xiaoshuai
    Ji, Rongrong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [6] Contrastive Instruction-Trajectory Learning for Vision-Language Navigation
    Liang, Xiwen
    Zhu, Fengda
    Zhu, Yi
    Lin, Bingqian
    Wang, Bing
    Liang, Xiaodan
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 1592 - 1600
  • [7] Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models
    Wu, Junfei
    Liu, Qiang
    Wang, Ding
    Zhang, Jinghao
    Wu, Shu
    Wang, Liang
    Tan, Tieniu
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 6944 - 6962
  • [8] Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
    Kim, Minchan
    Kim, Minyeong
    Bae, Junik
    Choi, Suhwan
    Kim, Sungkyung
    Chang, Buru
    COMPUTER VISION - ECCV 2024, PT LXXXVI, 2025, 15144 : 236 - 252
  • [9] Text encoders bottleneck compositionality in contrastive vision-language models
    Kamath, Amita
    Hessel, Jack
    Chang, Kai-Wei
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 4933 - 4944
  • [10] Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models
    Ma, Chengcheng
    Liu, Yang
    Deng, Jiankang
    Xie, Lingxi
    Dong, Weiming
    Xu, Changsheng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 4616 - 4629