Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models

Times cited: 0
Authors
Wu, Junfei [1 ,2 ]
Liu, Qiang [1 ,2 ]
Wang, Ding [1 ,2 ]
Zhang, Jinghao [1 ,2 ]
Wu, Shu [1 ,2 ]
Wang, Liang [1 ,2 ]
Tan, Tieniu [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, New Lab Pattern Recognit NLPR, State Key Lab Multimodal Artificial Intelligence, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Nanjing Univ, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Object hallucination has been an Achilles' heel that hinders the broader application of large vision-language models (LVLMs). Object hallucination refers to the phenomenon in which an LVLM claims that non-existent objects appear in the image. To mitigate object hallucinations, instruction tuning and external model-based detection methods have been proposed, but they either require large-scale computational resources or depend on the detection results of external models. Utilizing the LVLM itself to alleviate object hallucinations, however, remains under-explored. In this work, we adopt the intuition that an LVLM tends to respond logically consistently about existent objects but inconsistently about hallucinated ones. We therefore propose a Logical Closed Loop-based framework for Object Hallucination Detection and Mitigation, namely LogicCheckGPT. Specifically, we devise logical consistency probing, which raises logically correlated questions, inquiring about attributes from objects and about objects from attributes. Whether the responses can form a logical closed loop serves as an indicator of object hallucination. As a plug-and-play method, it can be seamlessly applied to all existing LVLMs. Comprehensive experiments on three benchmarks across four LVLMs demonstrate significant improvements brought by our method, indicating its effectiveness and generality.
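To make the probing idea concrete, below is a minimal, hypothetical sketch of logical-closed-loop probing reconstructed from the abstract alone. The helper `query_lvlm(image, question)`, the color-based question templates, and the substring consistency check are illustrative assumptions, not the paper's released implementation.

```python
# Hypothetical sketch of logical-closed-loop probing for object hallucination
# detection, based only on the abstract above. `query_lvlm(image, question)`
# is an assumed helper that asks any LVLM a question about `image` and
# returns its textual answer; it is NOT part of the paper's code.
from typing import Callable, List

QueryFn = Callable[[object, str], str]


def loop_closes(obj: str, image: object, query_lvlm: QueryFn) -> bool:
    """Ask object -> attribute, then attribute -> object, and check whether
    the answers lead back to the original object (the loop 'closes')."""
    # Forward probe: ask about an attribute of the candidate object.
    attribute = query_lvlm(image, f"What color is the {obj} in the image?")
    # Backward probe: ask which object carries that attribute.
    backward_q = f"Which object in the image is {attribute.strip().rstrip('.')}?"
    recovered = query_lvlm(image, backward_q)
    # Consistent answers suggest a real object; inconsistency suggests hallucination.
    return obj.lower() in recovered.lower()


def detect_hallucinations(objects: List[str], image: object,
                          query_lvlm: QueryFn) -> List[str]:
    """Return candidate objects whose answers fail to form a logical closed
    loop, i.e., the likely hallucinated ones."""
    return [obj for obj in objects if not loop_closes(obj, image, query_lvlm)]
```

In the full framework, objects flagged this way would then be removed or rewritten out of the LVLM's response; the sketch above only illustrates the detection step under the stated assumptions.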
Pages: 6944-6962
Number of pages: 19
Related papers
50 records in total
  • [1] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
    Leng, Sicong
    Zhang, Hang
    Chen, Guanzheng
    Li, Xin
Lu, Shijian
    Miao, Chunyan
    Bing, Lidong
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 13872 - 13882
  • [2] Evaluating Object Hallucination in Large Vision-Language Models
    Li, Yifan
    Du, Yifan
    Zhou, Kun
    Wang, Jinpeng
    Zhao, Wayne Xin
    Wen, Ji-Rong
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 292 - 305
  • [3] Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
    Wang, Xintong
    Pan, Jingheng
    Ding, Liang
    Biemann, Chris
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 15840 - 15853
  • [4] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
    Zhang, Jinrui
    Wang, Teng
    Zhang, Haigang
    Lu, Ping
    Zheng, Feng
    COMPUTER VISION - ECCV 2024, PT XXXVII, 2025, 15095 : 196 - 213
  • [5] Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
    Kim, Minchan
    Kim, Minyeong
    Bae, Junik
    Choi, Suhwan
    Kim, Sungkyung
Chang, Buru
    COMPUTER VISION - ECCV 2024, PT LXXXVI, 2025, 15144 : 236 - 252
  • [6] Attention Prompting on Image for Large Vision-Language Models
    Yu, Runpeng
    Yu, Weihao
    Wang, Xinchao
    COMPUTER VISION - ECCV 2024, PT XXX, 2025, 15088 : 251 - 268
  • [7] Effectiveness assessment of recent large vision-language models
Jiang, Yao
Yan, Xinyu
Ji, Ge-Peng
Fu, Keren
Sun, Meijun
Xiong, Huan
Fan, Deng-Ping
Khan, Fahad Shahbaz
    Visual Intelligence, 2 (1):
  • [8] Evaluating Attribute Comprehension in Large Vision-Language Models
    Zhang, Haiwen
    Yang, Zixi
    Liu, Yuanzhi
    Wang, Xinran
    He, Zheqi
    Liang, Kongming
    Ma, Zhanyu
    PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024, 2025, 15035 : 98 - 113
  • [9] On Evaluating Adversarial Robustness of Large Vision-Language Models
    Zhao, Yunqing
    Pang, Tianyu
    Du, Chao
    Yang, Xiao
    Li, Chongxuan
    Cheung, Ngai-Man
    Lin, Min
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [10] Detecting and Preventing Hallucinations in Large Vision Language Models
    Gunjal, Anisha
    Yin, Jihan
    Bas, Erhan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18135 - 18143