SELFCHECKGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

Cited by: 0
Authors
Manakul, Potsawee [1 ]
Liusie, Adian [1 ]
Gales, Mark J. F. [1 ]
Affiliations
[1] Univ Cambridge, ALTA Inst, Dept Engn, Cambridge, England
Keywords
AGREEMENT
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Generative Large Language Models (LLMs) such as GPT-3 are capable of generating highly fluent responses to a wide variety of user prompts. However, LLMs are known to hallucinate facts and make non-factual statements, which can undermine trust in their output. Existing fact-checking approaches either require access to the output probability distribution (which may not be available for systems such as ChatGPT) or external databases that are interfaced via separate, often complex, modules. In this work, we propose "SelfCheckGPT", a simple sampling-based approach that can be used to fact-check the responses of black-box models in a zero-resource fashion, i.e. without an external database. SelfCheckGPT leverages the simple idea that if an LLM has knowledge of a given concept, sampled responses are likely to be similar and contain consistent facts. However, for hallucinated facts, stochastically sampled responses are likely to diverge and contradict one another. We investigate this approach by using GPT-3 to generate passages about individuals from the WikiBio dataset, and manually annotate the factuality of the generated passages. We demonstrate that SelfCheckGPT can: i) detect non-factual and factual sentences; and ii) rank passages in terms of factuality. We compare our approach to several baselines and show that it achieves considerably higher AUC-PR scores in sentence-level hallucination detection and higher correlation scores in passage-level factuality assessment than grey-box methods.
Pages: 9004-9017
Page count: 14
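The abstract describes a sampling-based consistency check: sample several responses to the same prompt and flag sentences of the main response that the samples fail to support. Below is a minimal Python sketch of that idea, assuming a simple unigram-overlap measure of support; this is only an illustrative stand-in for the paper's actual scoring variants (e.g. BERTScore, question answering, an n-gram language model), and selfcheck_scores and the example data are hypothetical, not the authors' implementation.

    # Minimal sketch of SelfCheckGPT's sampling-consistency idea.
    # Assumption: unigram overlap as a crude proxy for the paper's
    # stronger consistency measures (BERTScore, QA, n-gram LM, etc.).
    from typing import List
    import re

    def _tokens(text: str) -> set:
        """Lowercase word tokens of a text."""
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def selfcheck_scores(response_sentences: List[str],
                         samples: List[str]) -> List[float]:
        """Return a hallucination score in [0, 1] for each sentence.

        A sentence whose content rarely reappears across the stochastic
        samples scores near 1 (likely hallucinated); a sentence that is
        consistently supported by the samples scores near 0.
        """
        sample_token_sets = [_tokens(s) for s in samples]
        scores = []
        for sentence in response_sentences:
            toks = _tokens(sentence)
            if not toks or not sample_token_sets:
                scores.append(1.0)
                continue
            # Mean fraction of this sentence's tokens found in each sample.
            coverage = sum(len(toks & st) / len(toks)
                           for st in sample_token_sets)
            coverage /= len(sample_token_sets)
            scores.append(1.0 - coverage)
        return scores

    if __name__ == "__main__":
        # In practice the samples come from a black-box LLM queried
        # several times at non-zero temperature; these strings stand in
        # for one main response and three sampled responses.
        main_sentences = [
            "Alan Turing was a British mathematician.",
            "He won the 1962 Nobel Prize in Physics.",  # fabricated fact
        ]
        samples = [
            "Alan Turing was a British mathematician and computer scientist.",
            "Turing, a mathematician from Britain, pioneered computing.",
            "Alan Turing was an English mathematician and logician.",
        ]
        for s, score in zip(main_sentences,
                            selfcheck_scores(main_sentences, samples)):
            print(f"{score:.2f}  {s}")

Running the sketch, the supported first sentence scores low while the fabricated second sentence scores near 1, mirroring the paper's intuition that hallucinated facts diverge across stochastic samples.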
Related Papers (50 items)
  • [21] Robustness of generative AI detection: adversarial attacks on black-box neural text detectors
    Fishchuk, Vitalii
    Braun, Daniel
    International Journal of Speech Technology, 2024, 27 (4) : 861 - 874
  • [22] Black-Box Tuning of Vision-Language Models with Effective Gradient Approximation
    Guo, Zixian
    Wei, Yuxiang
    Liu, Ming
    Ji, Zhilong
    Bai, Jinfeng
    Guo, Yiwen
    Zuo, Wangmeng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 5356 - 5368
  • [23] Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models
    Chen, Yuyan
    Fu, Qiang
    Yuan, Yichen
    Wen, Zhihao
    Fan, Ge
    Liu, Dayiheng
    Zhang, Dongmei
    Li, Zhixu
    Xiao, Yanghua
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 245 - 255
  • [24] Explicitly Constrained Black-Box Optimization With Disconnected Feasible Domains Using Deep Generative Models
    Sakamoto, Naoki
    Sato, Rei
    Fukuchi, Kazuto
    Sakuma, Jun
    Akimoto, Youhei
    IEEE ACCESS, 2022, 10 : 117501 - 117514
  • [25] CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models
    Ormazabal, Aitor
    Artetxe, Mikel
    Agirre, Eneko
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 2961 - 2974
  • [26] Jailbreaking Black Box Large Language Models in Twenty Queries
    Chao, Patrick
    Robey, Alexander
    Dobriban, Edgar
    Hassani, Hamed
    Pappas, George J.
    Wong, Eric
    arXiv, 2023
  • [27] On the black-box explainability of object detection models for safe and trustworthy industrial applications
    Andres, Alain
    Martinez-Seras, Aitor
    Laña, Ibai
    Del Ser, Javier
    RESULTS IN ENGINEERING, 2024, 24
  • [28] PLENARY: Explaining black-box models in natural language through fuzzy linguistic summaries
    Kaczmarek-Majer, Katarzyna
    Casalino, Gabriella
    Castellano, Giovanna
    Dominiak, Monika
    Hryniewicz, Olgierd
    Kaminska, Olga
    Vessio, Gennaro
    Diaz-Rodriguez, Natalia
    INFORMATION SCIENCES, 2022, 614 : 374 - 399
  • [29] Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
    Zhang, Zhihan
    Wang, Shuohang
    Yu, Wenhao
    Xu, Yichong
    Iter, Dan
    Zeng, Qingkai
    Liu, Yang
    Zhu, Chenguang
    Jiang, Meng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 9850 - 9867
  • [30] Object Hallucination Detection in Large Vision Language Models via Evidential Conflict
    Liu, Zhekun
    Huang, Tao
    Wang, Rui
    Jing, Liping
    BELIEF FUNCTIONS: THEORY AND APPLICATIONS, BELIEF 2024, 2024, 14909 : 58 - 67