SELFCHECKGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

Cited by: 0
Authors
Manakul, Potsawee [1 ]
Liusie, Adian [1 ]
Gales, Mark J. F. [1 ]
Affiliations
[1] Univ Cambridge, ALTA Inst, Dept Engn, Cambridge, England
Keywords
AGREEMENT
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Generative Large Language Models (LLMs) such as GPT-3 are capable of generating highly fluent responses to a wide variety of user prompts. However, LLMs are known to hallucinate facts and make non-factual statements, which can undermine trust in their output. Existing fact-checking approaches either require access to the output probability distribution (which may not be available for systems such as ChatGPT) or external databases that are interfaced via separate, often complex, modules. In this work, we propose "SelfCheckGPT", a simple sampling-based approach that can be used to fact-check the responses of black-box models in a zero-resource fashion, i.e. without an external database. SelfCheckGPT leverages the simple idea that if an LLM has knowledge of a given concept, sampled responses are likely to be similar and contain consistent facts. However, for hallucinated facts, stochastically sampled responses are likely to diverge and contradict one another. We investigate this approach by using GPT-3 to generate passages about individuals from the WikiBio dataset, and manually annotate the factuality of the generated passages. We demonstrate that SelfCheckGPT can: i) detect non-factual and factual sentences; and ii) rank passages in terms of factuality. We compare our approach to several baselines and show that it achieves considerably higher AUC-PR scores in sentence-level hallucination detection and higher correlation scores in passage-level factuality assessment than grey-box methods.
Pages: 9004-9017
Page count: 14
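The abstract describes a sampling-based consistency check: sample several responses to the same prompt and flag sentences of the main response that the samples fail to support. Below is a minimal Python sketch of that idea, assuming a simple unigram-overlap measure of support; this is only an illustrative stand-in for the paper's actual scoring variants (e.g. BERTScore, question answering, an n-gram language model), and selfcheck_scores and the example data are hypothetical, not the authors' implementation.

    # Minimal sketch of SelfCheckGPT's sampling-consistency idea.
    # Assumption: unigram overlap as a crude proxy for the paper's
    # stronger consistency measures (BERTScore, QA, n-gram LM, etc.).
    from typing import List
    import re

    def _tokens(text: str) -> set:
        """Lowercase word tokens of a text."""
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def selfcheck_scores(response_sentences: List[str],
                         samples: List[str]) -> List[float]:
        """Return a hallucination score in [0, 1] for each sentence.

        A sentence whose content rarely reappears across the stochastic
        samples scores near 1 (likely hallucinated); a sentence that is
        consistently supported by the samples scores near 0.
        """
        sample_token_sets = [_tokens(s) for s in samples]
        scores = []
        for sentence in response_sentences:
            toks = _tokens(sentence)
            if not toks or not sample_token_sets:
                scores.append(1.0)
                continue
            # Mean fraction of this sentence's tokens found in each sample.
            coverage = sum(len(toks & st) / len(toks)
                           for st in sample_token_sets)
            coverage /= len(sample_token_sets)
            scores.append(1.0 - coverage)
        return scores

    if __name__ == "__main__":
        # In practice the samples come from a black-box LLM queried
        # several times at non-zero temperature; these strings stand in
        # for one main response and three sampled responses.
        main_sentences = [
            "Alan Turing was a British mathematician.",
            "He won the 1962 Nobel Prize in Physics.",  # fabricated fact
        ]
        samples = [
            "Alan Turing was a British mathematician and computer scientist.",
            "Turing, a mathematician from Britain, pioneered computing.",
            "Alan Turing was an English mathematician and logician.",
        ]
        for s, score in zip(main_sentences,
                            selfcheck_scores(main_sentences, samples)):
            print(f"{score:.2f}  {s}")

Running the sketch, the supported first sentence scores low while the fabricated second sentence scores near 1, mirroring the paper's intuition that hallucinated facts diverge across stochastic samples.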
Related Papers (50 items)
  • [21] Robustness of generative AI detection: adversarial attacks on black-box neural text detectors
    Fishchuk, Vitalii
    Braun, Daniel
    International Journal of Speech Technology, 2024, 27 (4) : 861 - 874
  • [22] Black-Box Tuning of Vision-Language Models with Effective Gradient Approximation
    Guo, Zixian
    Wei, Yuxiang
    Liu, Ming
    Ji, Zhilong
    Bai, Jinfeng
    Guo, Yiwen
    Zuo, Wangmeng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 5356 - 5368
  • [23] Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models
    Chen, Yuyan
    Fu, Qiang
    Yuan, Yichen
    Wen, Zhihao
    Fan, Ge
    Liu, Dayiheng
    Zhang, Dongmei
    Li, Zhixu
    Xiao, Yanghua
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 245 - 255
  • [24] Explicitly Constrained Black-Box Optimization With Disconnected Feasible Domains Using Deep Generative Models
    Sakamoto, Naoki
    Sato, Rei
    Fukuchi, Kazuto
    Sakuma, Jun
    Akimoto, Youhei
    IEEE ACCESS, 2022, 10 : 117501 - 117514
  • [25] CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models
    Ormazabal, Aitor
    Artetxe, Mikel
    Agirre, Eneko
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 2961 - 2974
  • [26] Jailbreaking Black Box Large Language Models in Twenty Queries
    Chao, Patrick
    Robey, Alexander
    Dobriban, Edgar
    Hassani, Hamed
    Pappas, George J.
    Wong, Eric
    arXiv, 2023
  • [27] On the black-box explainability of object detection models for safe and trustworthy industrial applications
    Andres, Alain
    Martinez-Seras, Aitor
    Laña, Ibai
    Del Ser, Javier
    RESULTS IN ENGINEERING, 2024, 24
  • [28] PLENARY: Explaining black-box models in natural language through fuzzy linguistic summaries
    Kaczmarek-Majer, Katarzyna
    Casalino, Gabriella
    Castellano, Giovanna
    Dominiak, Monika
    Hryniewicz, Olgierd
    Kaminska, Olga
    Vessio, Gennaro
    Diaz-Rodriguez, Natalia
    INFORMATION SCIENCES, 2022, 614 : 374 - 399
  • [29] Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
    Zhang, Zhihan
    Wang, Shuohang
    Yu, Wenhao
    Xu, Yichong
    Iter, Dan
    Zeng, Qingkai
    Liu, Yang
    Zhu, Chenguang
    Jiang, Meng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 9850 - 9867
  • [30] Object Hallucination Detection in Large Vision Language Models via Evidential Conflict
    Liu, Zhekun
    Huang, Tao
    Wang, Rui
    Jing, Liping
    BELIEF FUNCTIONS: THEORY AND APPLICATIONS, BELIEF 2024, 2024, 14909 : 58 - 67