Object Hallucination Detection in Large Vision Language Models via Evidential Conflict

Authors
Liu, Zhekun [1, 2]
Huang, Tao [1, 2]
Wang, Rui [1, 2]
Jing, Liping [1, 2]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp Sci & Technol, Beijing, Peoples R China
[2] Beijing Key Lab Traff Data Anal & Min, Beijing, Peoples R China
Keywords
LVLM; object hallucination; uncertainty quantification; Dempster-Shafer theory
DOI
10.1007/978-3-031-67977-3_7
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Despite their remarkable ability to understand both textual and visual data, large vision-language models (LVLMs) still suffer from hallucination. A prominent form is object hallucination, in which a model inaccurately describes the objects in an image. Current efforts mainly detect such errors either by checking the semantic consistency of outputs across multiple inferences or by evaluating the entropy-based uncertainty of predictions. However, the former is resource-intensive, while the latter is often considered less precise because model predictions are widely recognized to be overconfident. To address these issues, we propose an object hallucination detection method based on evidential conflict. Specifically, we treat the features in the last layer of the transformer decoder as evidence and combine that evidence using Dempster's rule, following the approach presented in [6]. This enables us to detect hallucinations by evaluating the conflict among the pieces of evidence. Preliminary experiments were conducted on a state-of-the-art LVLM, mPLUG-Owl2. The results show that our approach improves on baseline methods, particularly for highly uncertain inputs.
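The abstract's central operation, combining pieces of evidence with Dempster's rule and reading off their conflict, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how decoder features are turned into mass functions, so the two mass functions, the two-answer frame {"yes", "no"}, and the function name dempster_combine below are hypothetical and chosen only for illustration. For two mass functions m1 and m2, the degree of conflict is the total product mass assigned to pairs of focal sets with an empty intersection, and the combined masses are renormalized by one minus that conflict.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset (a focal set)
    to its mass. Returns (combined_mass_function, conflict).
    """
    joint = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Compatible focal sets: their product mass goes to the intersection.
            joint[inter] = joint.get(inter, 0.0) + wa * wb
        else:
            # Incompatible focal sets: their product mass counts as conflict.
            conflict += wa * wb
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("Evidence is totally conflicting; the rule is undefined.")
    combined = {s: w / norm for s, w in joint.items()}
    return combined, conflict


# Hypothetical evidence about the question "is the object present?"
YES, NO = frozenset({"yes"}), frozenset({"no"})
FRAME = YES | NO  # mass on the whole frame expresses ignorance

m1 = {YES: 0.7, FRAME: 0.3}  # illustrative values, not from the paper
m2 = {NO: 0.6, FRAME: 0.4}   # illustrative values, not from the paper

combined, conflict = dempster_combine(m1, m2)
print(f"conflict = {conflict:.2f}")
for focal_set, mass in combined.items():
    print(sorted(focal_set), round(mass, 4))
```

In this toy case one source supports "yes" and the other supports "no", so the conflict is substantial. Under the idea described in the abstract, a high conflict value of this kind would be the signal that a generated object mention may be hallucinated; what threshold to use is not specified here.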
Pages: 58-67
Page count: 10