Detecting implicit biases of large language models with Bayesian hypothesis testing

Cited by: 0
Authors
Shijing Si [1 ]
Xiaoming Jiang [2 ]
Qinliang Su [3]
Lawrence Carin [5]
Affiliations
[1] Shanghai International Studies University,School of Economics and Finance
[2] Shanghai International Studies University,Institute of Language Sciences
[3] Sun Yat-sen University,School of Computer Science and Engineering
[4] Guangdong Key Laboratory of Big Data Analysis and Processing
[5] Duke University,Department of Electrical and Computer Engineering
[6] Shanghai International Studies University,Key Laboratory of Language Sciences and Multilingual Intelligence Applications
Keywords
Large language models; Group bias; Fairness; Bayes factor;
DOI
10.1038/s41598-025-95825-x
Abstract
Despite the remarkable performance of large language models (LLMs), such as generative pre-trained Transformers (GPTs), across various tasks, they often perpetuate social biases and stereotypes embedded in their training data. In this paper, we introduce a novel framework that reformulates bias detection in LLMs as a hypothesis testing problem, where the null hypothesis $H_0$ represents the absence of implicit bias. Our framework leverages binary-choice questions to measure social bias in both open-source and proprietary LLMs accessible via APIs. We demonstrate the flexibility of our approach by integrating classical statistical methods, such as the exact binomial test, with Bayesian inference using Bayes factors for bias detection and quantification. Extensive experiments are conducted on prominent models, including ChatGPT (GPT-3.5-Turbo), DeepSeek-V3, and Llama-3.1-70B, using the publicly available BBQ, CrowS-Pairs (in both English and French), and Winogender datasets. Whereas the exact binomial test cannot distinguish between no evidence of bias and evidence of no bias, our results underscore the advantages of Bayes factors, particularly their capacity to quantify evidence for both competing hypotheses and their robustness to small sample sizes. Additionally, our experiments reveal that the bias behavior of LLMs is largely consistent across the English and French versions of the CrowS-Pairs dataset, with subtle differences likely arising from variations in social norms across linguistic and cultural contexts.
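The two tests named in the abstract can be illustrated with a short sketch. Assuming each bias probe is reduced to a count of k stereotype-consistent choices out of n binary-choice questions, the snippet below runs the exact binomial test against p = 0.5 and computes a Bayes factor BF01 comparing H0: p = 0.5 (no implicit bias) with H1: p ~ Beta(a, b). This is a minimal illustration, not the authors' code; the counts and the uniform Beta(1, 1) prior are assumptions made for the example.

```python
# Minimal sketch of a frequentist and a Bayesian test for one binary-choice bias probe.
# k = number of stereotype-consistent answers out of n probes (illustrative values only).
from math import lgamma, exp, log
from scipy.stats import binomtest

def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bayes_factor_01(k: int, n: int, a: float = 1.0, b: float = 1.0) -> float:
    """BF_01 for k stereotype-consistent choices out of n.

    H0: p = 0.5 exactly; H1: p ~ Beta(a, b).
    Values > 1 favour H0 (no bias); values < 1 favour H1 (bias).
    """
    log_m0 = n * log(0.5)                                  # marginal likelihood under H0 (binomial coefficient cancels in the ratio)
    log_m1 = log_beta(k + a, n - k + b) - log_beta(a, b)   # Beta-Binomial marginal likelihood under H1
    return exp(log_m0 - log_m1)

if __name__ == "__main__":
    k, n = 132, 200  # hypothetical: the model picked the stereotyped answer 132 of 200 times
    print("exact binomial p-value:", binomtest(k, n, p=0.5).pvalue)
    print("BF_01 (H0 over H1):", bayes_factor_01(k, n))
```

With these hypothetical counts both procedures point the same way (small p-value, BF01 well below 1). The contrast described in the abstract appears when k/n is close to 0.5: the binomial test only yields a large p-value, whereas BF01 rises above 1 and can express positive evidence for the no-bias hypothesis.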