SELFCHECKGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

Cited by: 0
Authors
Manakul, Potsawee [1 ]
Liusie, Adian [1 ]
Gales, Mark J. F. [1 ]
Affiliations
[1] Univ Cambridge, ALTA Inst, Dept Engn, Cambridge, England
Keywords
AGREEMENT
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Generative Large Language Models (LLMs) such as GPT-3 are capable of generating highly fluent responses to a wide variety of user prompts. However, LLMs are known to hallucinate facts and make non-factual statements which can undermine trust in their output. Existing fact-checking approaches either require access to the output probability distribution (which may not be available for systems such as ChatGPT) or external databases that are interfaced via separate, often complex, modules. In this work, we propose "SelfCheckGPT", a simple sampling-based approach that can be used to fact-check the responses of black-box models in a zero-resource fashion, i.e. without an external database. SelfCheckGPT leverages the simple idea that if an LLM has knowledge of a given concept, sampled responses are likely to be similar and contain consistent facts. However, for hallucinated facts, stochastically sampled responses are likely to diverge and contradict one another. We investigate this approach by using GPT-3 to generate passages about individuals from the WikiBio dataset, and manually annotate the factuality of the generated passages. We demonstrate that SelfCheckGPT can: i) detect non-factual and factual sentences; and ii) rank passages in terms of factuality. We compare our approach to several baselines and show that our approach has considerably higher AUC-PR scores in sentence-level hallucination detection and higher correlation scores in passage-level factuality assessment compared to grey-box methods.
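As a concrete illustration of the sampling-based consistency idea described in the abstract, the sketch below scores each sentence of a main response against several stochastically sampled responses to the same prompt. Note this is not the authors' code: the unigram-overlap scorer is a simplified stand-in for the paper's actual variants (which use BERTScore, question answering, n-gram language models, or NLI), and all names in the sketch are illustrative.

import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter on sentence-final punctuation; a real pipeline
    # would use spaCy or NLTK for robustness.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def support(sentence: str, sample: str) -> float:
    # Fraction of the sentence's tokens that also occur in one sample:
    # a crude proxy for "this sample backs up this sentence".
    tokens = Counter(sentence.lower().split())
    sample_vocab = set(sample.lower().split())
    total = sum(tokens.values())
    if total == 0:
        return 0.0
    return sum(c for t, c in tokens.items() if t in sample_vocab) / total


def selfcheck_scores(response: str, samples: list[str]) -> list[tuple[str, float]]:
    # Inconsistency score per sentence: 1 minus the mean support across
    # samples. Higher = less consistent = more likely hallucinated.
    return [
        (sent, 1.0 - sum(support(sent, s) for s in samples) / max(len(samples), 1))
        for sent in split_sentences(response)
    ]


# Usage: draw one main response plus several stochastic samples
# (temperature > 0) from the same prompt, then rank sentences.
main_response = "John Smith was born in 1970. He won the Nobel Prize in 1999."
samples = [
    "John Smith was born in 1970 and worked as a schoolteacher.",
    "Born in 1970, John Smith spent most of his career teaching.",
]
for sentence, score in selfcheck_scores(main_response, samples):
    print(f"{score:.2f}  {sentence}")

On this toy input the birth-year sentence, which every sample repeats, scores low, while the unsupported Nobel Prize sentence scores high, mirroring the paper's intuition that hallucinated facts fail to recur across samples.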
Pages: 9004-9017
Page count: 14
Related Papers
50 items in total
  • [1] Hallucination Detection for Generative Large Language Models by Bayesian Sequential Estimation
    Wang, Xiaohua
    Yan, Yuliang
    Huang, Longtao
    Zheng, Xiaoqing
    Huang, Xuanjing
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 15361 - 15371
  • [2] Distilling implicit multimodal knowledge into large language models for zero-resource dialogue generation
    Zhang, Bo
    Ma, Hui
    Ding, Jian
    Wang, Jian
    Xu, Bo
    Lin, Hongfei
    INFORMATION FUSION, 2025, 118
  • [3] SqliGPT: Evaluating and Utilizing Large Language Models for Automated SQL Injection Black-Box Detection
    Gui, Zhiwen
    Wang, Enze
    Deng, Binbin
    Zhang, Mingyuan
    Chen, Yitao
    Wei, Shengfei
    Xie, Wei
    Wang, Baosheng
APPLIED SCIENCES-BASEL, 2024, 14 (16)
  • [4] Open Sesame! Universal Black-Box Jailbreaking of Large Language Models
    Lapid, Raz
    Langberg, Ron
    Sipper, Moshe
APPLIED SCIENCES-BASEL, 2024, 14 (16)
  • [5] TrojLLM: A Black-box Trojan Prompt Attack on Large Language Models
    Xue, Jiaqi
    Zheng, Mengxin
    Hua, Ting
    Shen, Yilin
    Liu, Yepeng
    Boloni, Ladislau
    Lou, Qian
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [6] 23 Security Risks in Black-Box Large Language Model Foundation Models
    Mcgraw, Gary
    Bonett, Richie
    Figueroa, Harold
    Mcmahon, Katie
    COMPUTER, 2024, 57 (04) : 160 - 164
  • [7] InferDPT: Privacy-preserving Inference for Black-box Large Language Models
    Tong, Meng
    Chen, Kejiang
    Zhang, Jie
    Qi, Yuang
    Zhang, Weiming
    Yu, Nenghai
    Zhang, Tianwei
    Zhang, Zhikun
arXiv, 2023
  • [8] Spoken Term Detection of Zero-Resource Language using Machine Learning
    Ito, Akinori
    Koizumi, Masatoshi
    2018 INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION TECHNOLOGY (ICIIT 2018), 2018, : 45 - 49
  • [9] SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency
    Zhang, Jiaxin
Li, Zhuohang
    Das, Kamalika
    Malin, Bradley
    Kumar, Sricharan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 15445 - 15458
  • [10] Black-Box Prompt Optimization: Aligning Large Language Models without Model Training
    Cheng, Jiale
    Liu, Xiao
    Zheng, Kehan
    Ke, Pei
    Wang, Hongning
    Dong, Yuxiao
    Tang, Jie
    Huang, Minlie
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 3201 - 3219