Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding

Cited by: 0
Authors
Wang, Xintong [1]
Pan, Jingheng [1]
Ding, Liang [2]
Biemann, Chris [1]
Affiliations
[1] Univ Hamburg, Dept Informat, Hamburg, Germany
[2] Univ Sydney, Sydney, NSW, Australia
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large Vision-Language Models (LVLMs) are increasingly adept at generating contextually detailed and coherent responses from visual inputs. However, their application in multimodal decision-making and open-ended generation is hindered by a notable rate of hallucinations, where generated text inaccurately represents the visual contents. To address this issue, this paper introduces the Instruction Contrastive Decoding (ICD) method, a novel approach designed to reduce hallucinations during LVLM inference. Our method is inspired by our observation that what we call disturbance instructions significantly exacerbate hallucinations in multimodal fusion modules. ICD contrasts distributions from standard and instruction disturbance, thereby increasing alignment uncertainty and effectively subtracting hallucinated concepts from the original distribution. Through comprehensive experiments on discriminative benchmarks (POPE and MME) and a generative benchmark (LLaVa-Bench), we demonstrate that ICD significantly mitigates both object-level and attribute-level hallucinations. Moreover, our method not only addresses hallucinations but also significantly enhances the general perception and recognition capabilities of LVLMs.
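The abstract describes ICD as contrasting the output distribution under a standard instruction with the one under a disturbance instruction, but it does not state the exact formula. The sketch below therefore assumes the form commonly used in contrastive decoding, (1 + lam) * standard_logits - lam * disturbed_logits, followed by a softmax. The function names, the weight `lam`, and the toy logits are all hypothetical illustrations and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_distribution(standard_logits, disturbed_logits, lam=1.0):
    """Hypothetical helper: contrast two next-token logit vectors.

    Assumed form (not confirmed by the abstract):
        (1 + lam) * standard - lam * disturbed
    Tokens that the disturbance instruction makes more likely are
    pushed down relative to the standard distribution.
    """
    if len(standard_logits) != len(disturbed_logits):
        raise ValueError("logit vectors must have the same length")
    contrasted = [
        (1.0 + lam) * s - lam * d
        for s, d in zip(standard_logits, disturbed_logits)
    ]
    return softmax(contrasted)


# Toy three-token vocabulary; the logits are invented for illustration.
vocab = ["dog", "cat", "frisbee"]
standard = [2.0, 0.5, 1.8]   # logits under the standard instruction
disturbed = [1.0, 0.5, 2.5]  # logits under the disturbance instruction

baseline_probs = softmax(standard)
icd_probs = contrastive_distribution(standard, disturbed, lam=1.0)

for token, before, after in zip(vocab, baseline_probs, icd_probs):
    print(f"{token}: {before:.3f} -> {after:.3f}")
```

In this toy setup, "frisbee" stands in for a hallucinated concept: the disturbance instruction raises its logit, so the contrast lowers its probability relative to the standard distribution. Setting `lam=0` recovers the standard distribution unchanged.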
Pages: 15840-15853
Page count: 14
Related papers
50 in total
  • [11] IVTP: Instruction-Guided Visual Token Pruning for Large Vision-Language Models
    Huang, Kai
    Zou, Hao
    Xi, Ye
    Wang, BoChen
    Xie, Zhen
    Yu, Liang
    COMPUTER VISION - ECCV 2024, PT XVII, 2025, 15075 : 214 - 230
  • [12] Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Decoding
    Sennrich, Rico
    Vamvas, Jannis
    Mohammadshahi, Alireza
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS, 2024, : 21 - 33
  • [13] Contrastive Decoding Reduces Hallucinations in Large Multilingual Machine Translation Models
    Waldendorf, Jonas
    Haddow, Barry
    Birch, Alexandra
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 2526 - 2539
  • [14] Attention Prompting on Image for Large Vision-Language Models
    Yu, Runpeng
    Yu, Weihao
    Wang, Xinchao
    COMPUTER VISION - ECCV 2024, PT XXX, 2025, 15088 : 251 - 268
  • [15] Effectiveness assessment of recent large vision-language models
    Yao Jiang
    Xinyu Yan
    Ge-Peng Ji
    Keren Fu
    Meijun Sun
    Huan Xiong
    Deng-Ping Fan
    Fahad Shahbaz Khan
    Visual Intelligence, 2 (1):
  • [16] Evaluating Attribute Comprehension in Large Vision-Language Models
    Zhang, Haiwen
    Yang, Zixi
    Liu, Yuanzhi
    Wang, Xinran
    He, Zheqi
    Liang, Kongming
    Ma, Zhanyu
    PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024, 2025, 15035 : 98 - 113
  • [17] On Evaluating Adversarial Robustness of Large Vision-Language Models
    Zhao, Yunqing
    Pang, Tianyu
    Du, Chao
    Yang, Xiao
    Li, Chongxuan
    Cheung, Ngai-Man
    Lin, Min
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [18] Evaluating Object Hallucination in Large Vision-Language Models
    Li, Yifan
    Du, Yifan
    Zhou, Kun
    Wang, Jinpeng
    Zhao, Wayne Xin
    Wen, Ji-Rong
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 292 - 305
  • [19] Detecting and Preventing Hallucinations in Large Vision Language Models
    Gunjal, Anisha
    Yin, Jihan
    Bas, Erhan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18135 - 18143
  • [20] Contrastive Region Guidance: Improving Grounding in Vision-Language Models Without Training
    Wan, David
    Cho, Jaemin
    Stengel-Eskin, Elias
    Bansal, Mohit
    COMPUTER VISION - ECCV 2024, PT LXXIX, 2025, 15137 : 198 - 215