Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models

Cited by: 0

Authors:

Wu, Junfei [1,2]

Liu, Qiang [1,2]

Wang, Ding [1,2]

Zhang, Jinghao [1,2]

Wu, Shu [1,2]

Wang, Liang [1,2]

Tan, Tieniu [1,2,3]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, New Lab Pattern Recognit NLPR, State Key Lab Multimodal Artificial Intelligence, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China

[3] Nanjing Univ, Nanjing, Peoples R China

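Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024 | 2024

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes: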

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Object hallucination has been an Achilles' heel that hinders the broader application of large vision-language models (LVLMs). Object hallucination refers to the phenomenon in which LVLMs claim that non-existent objects are present in the image. To mitigate object hallucinations, instruction tuning and external model-based detection methods have been proposed, which either require large-scale computational resources or depend on the detection results of external models. However, using the LVLM itself to alleviate object hallucinations remains under-explored. In this work, we adopt the intuition that the LVLM tends to respond in a logically consistent way for existent objects but inconsistently for hallucinated objects. Therefore, we propose a Logical Closed Loop-based framework for Object Hallucination Detection and Mitigation, namely LogicCheckGPT. Specifically, we devise logical consistency probing to raise questions with logical correlations, inquiring about attributes from objects and vice versa. Whether their responses can form a logical closed loop serves as an indicator of object hallucination. As a plug-and-play method, it can be seamlessly applied to all existing LVLMs. Comprehensive experiments conducted on three benchmarks across four LVLMs have demonstrated significant improvements brought by our method, indicating its effectiveness and generality.
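
The closed-loop idea in the abstract (ask about an object's attributes, then ask which object has each attribute, and check whether the answers lead back to the original object) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dictionary-based stand-in for an LVLM, and the 0.5 threshold are all hypothetical choices made for illustration only.

```python
from typing import Callable, List

# Hypothetical callables standing in for LVLM queries (assumptions, not a real API):
#   ask_attributes(obj) -> attributes the model reports for `obj`
#   ask_object(attr)    -> objects the model says have attribute `attr`
AskAttributes = Callable[[str], List[str]]
AskObject = Callable[[str], List[str]]


def closed_loop_rate(obj: str, ask_attributes: AskAttributes, ask_object: AskObject) -> float:
    """Fraction of object -> attribute -> object probes that return to `obj`."""
    attributes = ask_attributes(obj)
    if not attributes:
        return 0.0
    closed = sum(1 for attr in attributes if obj in ask_object(attr))
    return closed / len(attributes)


def is_hallucinated(obj: str, ask_attributes: AskAttributes, ask_object: AskObject,
                    threshold: float = 0.5) -> bool:
    """Flag `obj` when too few probes close the loop (threshold is an assumed value)."""
    return closed_loop_rate(obj, ask_attributes, ask_object) < threshold


# Toy stand-in for a model: canned answers, purely illustrative.
ATTRS = {"dog": ["brown fur", "red collar"], "frisbee": ["yellow", "round"]}
OWNERS = {"brown fur": ["dog"], "red collar": ["dog"],
          "yellow": ["banana"], "round": ["plate"]}


def toy_ask_attributes(obj: str) -> List[str]:
    return ATTRS.get(obj, [])


def toy_ask_object(attr: str) -> List[str]:
    return OWNERS.get(attr, [])


if __name__ == "__main__":
    for candidate in ("dog", "frisbee"):
        rate = closed_loop_rate(candidate, toy_ask_attributes, toy_ask_object)
        flagged = is_hallucinated(candidate, toy_ask_attributes, toy_ask_object)
        print(candidate, rate, flagged)
```

In the toy data, every attribute reported for "dog" leads back to "dog", so its loop closes, while the attributes reported for "frisbee" lead to other objects, so it is flagged. A real system would replace the two toy functions with prompts to the LVLM and would need its own way of parsing and scoring the answers.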

Pages: 6944-6962

Page count: 19
