Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models

Authors
Wu, Junfei [1, 2]
Liu, Qiang [1, 2]
Wang, Ding [1, 2]
Zhang, Jinghao [1, 2]
Wu, Shu [1, 2]
Wang, Liang [1, 2]
Tan, Tieniu [1, 2, 3]
Affiliations
[1] Chinese Acad Sci, Inst Automat, New Lab Pattern Recognit NLPR, State Key Lab Multimodal Artificial Intelligence, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Nanjing Univ, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Object hallucination has been an Achilles' heel that hinders the broader application of large vision-language models (LVLMs). Object hallucination refers to the phenomenon in which LVLMs claim that non-existent objects are present in the image. To mitigate object hallucinations, instruction tuning and external model-based detection methods have been proposed, which either require large-scale computational resources or depend on the detection results of external models. However, using the LVLM itself to alleviate object hallucinations remains under-explored. In this work, we adopt the intuition that the LVLM tends to respond in a logically consistent way for existent objects but inconsistently for hallucinated objects. Therefore, we propose a Logical Closed Loop-based framework for Object Hallucination Detection and Mitigation, namely LogicCheckGPT. Specifically, we devise logical consistency probing to raise questions with logical correlations, inquiring about attributes from objects and vice versa. Whether the responses can form a logical closed loop serves as an indicator of object hallucination. As a plug-and-play method, it can be seamlessly applied to all existing LVLMs. Comprehensive experiments conducted on three benchmarks across four LVLMs demonstrate significant improvements brought by our method, indicating its effectiveness and generality.
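The probing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `ask` interface, the question templates, the semicolon-separated attribute format, the substring match, and the 0.5 threshold are all assumptions made for illustration, and a canned lookup table stands in for a real LVLM.

```python
from typing import Callable, Dict, List

# Hypothetical interface: maps a text question about one fixed image to a text answer.
AskFn = Callable[[str], str]


def closed_loop_rate(ask: AskFn, obj: str) -> float:
    """Return the fraction of attribute probes whose answer leads back to `obj`."""
    # Object-to-attribute probe: ask the model to describe the object.
    # Assumption: the reply lists attributes separated by semicolons.
    reply = ask(f"Describe the {obj} in the image.")
    attributes: List[str] = [a.strip() for a in reply.split(";") if a.strip()]
    if not attributes:
        return 0.0
    # Attribute-to-object probe: ask which object carries each attribute.
    closed = 0
    for attr in attributes:
        answer = ask(f"Which object in the image is {attr}?")
        # The loop closes when the answer names the original object again
        # (simple substring match, an assumption of this sketch).
        if obj.lower() in answer.lower():
            closed += 1
    return closed / len(attributes)


def is_hallucinated(ask: AskFn, obj: str, threshold: float = 0.5) -> bool:
    """Flag `obj` as likely hallucinated when too few probes close the loop."""
    return closed_loop_rate(ask, obj) < threshold


# Toy stand-in for an LVLM: canned answers for an image that contains a dog
# but no frisbee (illustrative data only).
CANNED: Dict[str, str] = {
    "Describe the dog in the image.": "brown; lying on the sofa",
    "Which object in the image is brown?": "The dog is brown.",
    "Which object in the image is lying on the sofa?": "A dog.",
    "Describe the frisbee in the image.": "red; on the grass",
    "Which object in the image is red?": "The cushion is red.",
    "Which object in the image is on the grass?": "Nothing is on the grass.",
}


def toy_ask(question: str) -> str:
    return CANNED.get(question, "")


if __name__ == "__main__":
    for name in ("dog", "frisbee"):
        print(name, closed_loop_rate(toy_ask, name), is_hallucinated(toy_ask, name))
```

In this toy setup the answers about the dog point back to the dog, so the loop closes, while the answers about the frisbee drift to other objects, so it is flagged. A real system would need a more robust way to extract attributes and to judge whether an answer refers to the original object.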
Pages: 6944-6962
Page count: 19