Detecting implicit biases of large language models with Bayesian hypothesis testing

Cited: 0
Authors
Shijing Si [1 ]
Xiaoming Jiang [2 ]
Qinliang Su [6 ]
Lawrence Carin [3 ]
Affiliations
[1] Shanghai International Studies University, School of Economics and Finance
[2] Shanghai International Studies University, Institute of Language Sciences
[3] Sun Yat-sen University, School of Computer Science and Engineering
[4] Guangdong Key Laboratory of Big Data Analysis and Processing, Department of Electronic and Computer Engineering
[5] Duke University, Key Laboratory of Language Sciences and Multilingual Intelligence Applications
[6] Shanghai International Studies University
Keywords
Large language models; Group bias; Fairness; Bayes factor
DOI
10.1038/s41598-025-95825-x
Abstract
Despite the remarkable performance of large language models (LLMs), such as generative pre-trained Transformers (GPTs), across various tasks, they often perpetuate social biases and stereotypes embedded in their training data. In this paper, we introduce a novel framework that reformulates bias detection in LLMs as a hypothesis testing problem, where the null hypothesis $H_0$ represents the absence of implicit bias. Our framework leverages binary-choice questions to measure social bias in both open-source and proprietary LLMs accessible via APIs. We demonstrate the flexibility of our approach by integrating classical statistical methods, such as the exact binomial test, with Bayesian inference using Bayes factors for bias detection and quantification. Extensive experiments are conducted on prominent models, including ChatGPT (GPT-3.5-Turbo), DeepSeek-V3, and Llama-3.1-70B, using publicly available datasets such as BBQ, CrowS-Pairs (in both English and French), and Winogender. Whereas the exact binomial test fails to distinguish between no evidence of bias and evidence of no bias, our results underscore the advantages of Bayes factors, particularly their capacity to quantify evidence for both competing hypotheses and their robustness to small sample sizes. Additionally, our experiments reveal that the bias behavior of LLMs is largely consistent across the English and French versions of the CrowS-Pairs dataset, with subtle differences likely arising from variations in social norms across linguistic and cultural contexts.
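The abstract reduces bias detection to counts from binary-choice questions: under the null hypothesis $H_0$ of no implicit bias, the model picks the stereotyped option with probability p = 0.5. Below is a minimal Python sketch of the two tests named above, assuming k of n answers favor the stereotyped option and a Beta(1, 1) prior on p under the alternative; the counts and function names are illustrative assumptions, not the authors' released code.

# Minimal sketch: exact binomial test and Bayes factor for binary-choice
# bias probes. Assumes k of n answers favor the stereotyped option; the
# counts below are hypothetical, not results from the paper.
from math import lgamma, exp, log
from scipy.stats import binomtest

def log_beta(a, b):
    # log of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bayes_factor_10(k, n, a=1.0, b=1.0):
    # BF10 for H1: p ~ Beta(a, b) against H0: p = 0.5.
    # Marginal likelihood under H1 is C(n, k) * B(k + a, n - k + b) / B(a, b);
    # under H0 it is C(n, k) * 0.5**n, so the binomial coefficient cancels.
    log_bf = log_beta(k + a, n - k + b) - log_beta(a, b) - n * log(0.5)
    return exp(log_bf)

k, n = 70, 100                        # hypothetical counts
print(binomtest(k, n, p=0.5).pvalue)  # exact binomial test against p = 0.5
print(bayes_factor_10(k, n))          # BF10 > 1 favors bias; BF10 < 1 favors H0

Because the Bayes factor compares the marginal likelihoods of both hypotheses, a BF10 well below 1 quantifies evidence of no bias, which, as the abstract notes, the binomial p-value alone cannot express.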
Related Papers
50 records in total
  • [31] The 'Implicit Intelligence' of artificial intelligence. Investigating the potential of large language models in social science research
    Cappelli, Ottorino
    Aliberti, Marco
    Praino, Rodrigo
    POLITICAL RESEARCH EXCHANGE, 2024, 6 (01):
  • [32] Large language models based vulnerability detection: How does it enhance performance?
    Cho Do Xuan
    Dat Bui Quang
    Vinh Dang Quang
    International Journal of Information Security, 2025, 24 (1)
  • [33] Large pre-trained language models contain human-like biases of what is right and wrong to do
    Patrick Schramowski
    Cigdem Turan
    Nico Andersen
    Constantin A. Rothkopf
    Kristian Kersting
    Nature Machine Intelligence, 2022, 4 : 258 - 268
  • [34] Do Large Language Models Show Human-like Biases? Exploring Confidence-Competence Gap in AI
    Singh, Aniket Kumar
    Lamichhane, Bishal
    Devkota, Suman
    Dhakal, Uttam
    Dhakal, Chandra
    INFORMATION, 2024, 15 (02)
  • [35] Large pre-trained language models contain human-like biases of what is right and wrong to do
    Schramowski, Patrick
    Turan, Cigdem
    Andersen, Nico
    Rothkopf, Constantin A.
    Kersting, Kristian
    NATURE MACHINE INTELLIGENCE, 2022, 4 (03) : 258 - 268
  • [36] Conceptual Combination in Large Language Models: Uncovering Implicit Relational Interpretations in Compound Words With Contextualized Word Embeddings
    Ciapparelli, Marco
    Zarbo, Calogero
    Marelli, Marco
    COGNITIVE SCIENCE, 2025, 49 (03)
  • [37] An Implicit Semantic Enhanced Fine-Grained Fake News Detection Method Based on Large Language Models
    Jing K.
    Zheyong X.
    Tong X.
    Yuhao C.
    Xiangwen L.
    Enhong C.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2024, 61 (05): 1250 - 1260
  • [38] Unveiling the Implicit Toxicity in Large Language Models
    Wen, Jiaxin
    Ke, Pei
    Sun, Hao
    Zhang, Zhexin
    Li, Chengfei
    Bai, Jinfeng
    Huang, Minlie
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 1322 - 1338
  • [39] Mitigating Cognitive Biases in Clinical Decision-Making Through Multi-Agent Conversations Using Large Language Models: Simulation Study
    Ke, Yuhe
    Yang, Rui
    Lie, Sui An
    Lim, Taylor Xin Yi
    Ning, Yilin
    Li, Irene
    Abdullah, Hairil Rizal
    Ting, Daniel Shu Wei
    Liu, Nan
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2024, 26
  • [40] Detecting neuropsychiatric fluctuations in Parkinson’s Disease using patients’ own words: the potential of large language models
    Matilde Castelli
    Mario Sousa
    Illner Vojtech
    Michael Single
    Deborah Amstutz
    Marie Elise Maradan-Gachet
    Andreia D. Magalhães
    Ines Debove
    Jan Rusz
    Pablo Martinez-Martin
    Raphael Sznitman
    Paul Krack
    Tobias Nef
    npj Parkinson's Disease, 11 (1)