SELFCHECKGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

Cited by: 0
Authors
Manakul, Potsawee [1 ]
Liusie, Adian [1 ]
Gales, Mark J. F. [1 ]
Affiliations
[1] Univ Cambridge, ALTA Inst, Dept Engn, Cambridge, England
Keywords
AGREEMENT
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Generative Large Language Models (LLMs) such as GPT-3 are capable of generating highly fluent responses to a wide variety of user prompts. However, LLMs are known to hallucinate facts and make non-factual statements which can undermine trust in their output. Existing fact-checking approaches either require access to the output probability distribution (which may not be available for systems such as ChatGPT) or external databases that are interfaced via separate, often complex, modules. In this work, we propose "SelfCheckGPT", a simple sampling-based approach that can be used to fact-check the responses of black-box models in a zero-resource fashion, i.e. without an external database. SelfCheckGPT leverages the simple idea that if an LLM has knowledge of a given concept, sampled responses are likely to be similar and contain consistent facts. However, for hallucinated facts, stochastically sampled responses are likely to diverge and contradict one another. We investigate this approach by using GPT-3 to generate passages about individuals from the WikiBio dataset, and manually annotate the factuality of the generated passages. We demonstrate that SelfCheckGPT can: i) detect non-factual and factual sentences; and ii) rank passages in terms of factuality. We compare our approach to several baselines and show that our approach has considerably higher AUC-PR scores in sentence-level hallucination detection and higher correlation scores in passage-level factuality assessment compared to grey-box methods.
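As a concrete illustration of the sampling-based consistency idea described in the abstract, the sketch below scores each sentence of a main response against several stochastically sampled responses to the same prompt. Note this is not the authors' code: the unigram-overlap scorer is a simplified stand-in for the paper's actual variants (which use BERTScore, question answering, n-gram language models, or NLI), and all names in the sketch are illustrative.

import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter on sentence-final punctuation; a real pipeline
    # would use spaCy or NLTK for robustness.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def support(sentence: str, sample: str) -> float:
    # Fraction of the sentence's tokens that also occur in one sample:
    # a crude proxy for "this sample backs up this sentence".
    tokens = Counter(sentence.lower().split())
    sample_vocab = set(sample.lower().split())
    total = sum(tokens.values())
    if total == 0:
        return 0.0
    return sum(c for t, c in tokens.items() if t in sample_vocab) / total


def selfcheck_scores(response: str, samples: list[str]) -> list[tuple[str, float]]:
    # Inconsistency score per sentence: 1 minus the mean support across
    # samples. Higher = less consistent = more likely hallucinated.
    return [
        (sent, 1.0 - sum(support(sent, s) for s in samples) / max(len(samples), 1))
        for sent in split_sentences(response)
    ]


# Usage: draw one main response plus several stochastic samples
# (temperature > 0) from the same prompt, then rank sentences.
main_response = "John Smith was born in 1970. He won the Nobel Prize in 1999."
samples = [
    "John Smith was born in 1970 and worked as a schoolteacher.",
    "Born in 1970, John Smith spent most of his career teaching.",
]
for sentence, score in selfcheck_scores(main_response, samples):
    print(f"{score:.2f}  {sentence}")

On this toy input the birth-year sentence, which every sample repeats, scores low, while the unsupported Nobel Prize sentence scores high, mirroring the paper's intuition that hallucinated facts fail to recur across samples.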
Pages: 9004-9017
Page count: 14
Related Papers
50 items in total
  • [1] Hallucination Detection for Generative Large Language Models by Bayesian Sequential Estimation
    Wang, Xiaohua
    Yan, Yuliang
    Huang, Longtao
    Zheng, Xiaoqing
    Huang, Xuanjing
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 15361 - 15371
  • [2] Distilling implicit multimodal knowledge into large language models for zero-resource dialogue generation
    Zhang, Bo
    Ma, Hui
    Ding, Jian
    Wang, Jian
    Xu, Bo
    Lin, Hongfei
    INFORMATION FUSION, 2025, 118
  • [3] SqliGPT: Evaluating and Utilizing Large Language Models for Automated SQL Injection Black-Box Detection
    Gui, Zhiwen
    Wang, Enze
    Deng, Binbin
    Zhang, Mingyuan
    Chen, Yitao
    Wei, Shengfei
    Xie, Wei
    Wang, Baosheng
APPLIED SCIENCES-BASEL, 2024, 14 (16)
  • [4] Open Sesame! Universal Black-Box Jailbreaking of Large Language Models
    Lapid, Raz
    Langberg, Ron
    Sipper, Moshe
APPLIED SCIENCES-BASEL, 2024, 14 (16)
  • [5] TrojLLM: A Black-box Trojan Prompt Attack on Large Language Models
    Xue, Jiaqi
    Zheng, Mengxin
    Hua, Ting
    Shen, Yilin
    Liu, Yepeng
    Boloni, Ladislau
    Lou, Qian
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [6] 23 Security Risks in Black-Box Large Language Model Foundation Models
    Mcgraw, Gary
    Bonett, Richie
    Figueroa, Harold
    Mcmahon, Katie
    COMPUTER, 2024, 57 (04) : 160 - 164
  • [7] InferDPT: Privacy-preserving Inference for Black-box Large Language Models
    Tong, Meng
    Chen, Kejiang
    Zhang, Jie
    Qi, Yuang
    Zhang, Weiming
    Yu, Nenghai
    Zhang, Tianwei
    Zhang, Zhikun
arXiv, 2023
  • [8] Spoken Term Detection of Zero-Resource Language using Machine Learning
    Ito, Akinori
    Koizumi, Masatoshi
    2018 INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION TECHNOLOGY (ICIIT 2018), 2018, : 45 - 49
  • [9] SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency
    Zhang, Jiaxin
Li, Zhuohang
    Das, Kamalika
    Malin, Bradley
    Kumar, Sricharan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 15445 - 15458
  • [10] Black-Box Prompt Optimization: Aligning Large Language Models without Model Training
    Cheng, Jiale
    Liu, Xiao
    Zheng, Kehan
    Ke, Pei
    Wang, Hongning
    Dong, Yuxiao
    Tang, Jie
    Huang, Minlie
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 3201 - 3219