Detecting implicit biases of large language models with Bayesian hypothesis testing

Cited: 0
Authors
Shijing Si [1 ]
Xiaoming Jiang [2 ]
Qinliang Su [6 ]
Lawrence Carin [3 ]
Affiliations
[1] Shanghai International Studies University, School of Economics and Finance
[2] Shanghai International Studies University, Institute of Language Sciences
[3] Sun Yat-sen University, School of Computer Science and Engineering
[4] Guangdong Key Laboratory of Big Data Analysis and Processing
[5] Duke University, Department of Electronic and Computer Engineering
[6] Shanghai International Studies University, Key Laboratory of Language Sciences and Multilingual Intelligence Applications
Keywords
Large language models; Group bias; Fairness; Bayes factor
DOI
10.1038/s41598-025-95825-x
Abstract
Despite the remarkable performance of large language models (LLMs), such as generative pre-trained transformers (GPTs), across various tasks, they often perpetuate social biases and stereotypes embedded in their training data. In this paper, we introduce a novel framework that reformulates bias detection in LLMs as a hypothesis testing problem, where the null hypothesis $H_0$ represents the absence of implicit bias. Our framework leverages binary-choice questions to measure social bias in both open-source and proprietary LLMs accessible via APIs. We demonstrate the flexibility of our approach by integrating classical statistical methods, such as the exact binomial test, with Bayesian inference using Bayes factors for bias detection and quantification. Extensive experiments are conducted on prominent models, including ChatGPT (GPT-3.5-Turbo), DeepSeek-V3, and Llama-3.1-70B, using publicly available datasets such as BBQ, CrowS-Pairs (in both English and French), and Winogender. Whereas the exact binomial test fails to distinguish between no evidence of bias and evidence of no bias, our results underscore the advantages of Bayes factors, particularly their capacity to quantify evidence for both competing hypotheses and their robustness to small sample sizes. Additionally, our experiments reveal that the bias behavior of LLMs is largely consistent across the English and French versions of the CrowS-Pairs dataset, with subtle differences likely arising from variations in social norms across linguistic and cultural contexts.
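To make the testing framework concrete, the following is a minimal sketch (not the paper's implementation) of how bias in binary-choice responses could be assessed with both an exact binomial test and a Bayes factor. It assumes each question offers a stereotype-consistent and a stereotype-inconsistent option, a point null of theta = 0.5 under $H_0$ (no implicit bias), and a uniform Beta(1, 1) prior on theta under $H_1$; the function names, prior choice, and example counts are illustrative assumptions, not values from the paper.

```python
import math
from scipy.stats import binom, binomtest
from scipy.special import betaln

def exact_binomial_pvalue(k: int, n: int, p0: float = 0.5) -> float:
    """Two-sided exact binomial test of H0: theta = p0.

    k: number of stereotype-consistent choices out of n binary questions.
    """
    return binomtest(k, n, p=p0, alternative="two-sided").pvalue

def bayes_factor_01(k: int, n: int, a: float = 1.0, b: float = 1.0,
                    p0: float = 0.5) -> float:
    """BF_01 = P(data | H0) / P(data | H1).

    H0: theta = p0 (no bias); H1: theta ~ Beta(a, b).
    Under H1 the marginal likelihood is the beta-binomial mass:
    C(n, k) * B(k + a, n - k + b) / B(a, b).
    """
    log_m0 = binom.logpmf(k, n, p0)
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    log_m1 = log_choose + betaln(k + a, n - k + b) - betaln(a, b)
    return math.exp(log_m0 - log_m1)

# Hypothetical example: a model picks the stereotype-consistent
# option 63 times over 100 binary-choice questions.
k, n = 63, 100
print(exact_binomial_pvalue(k, n))  # small p-value rejects H0, but cannot
                                    # express evidence FOR H0
print(bayes_factor_01(k, n))        # BF_01 < 1 favours H1 (bias);
                                    # BF_01 > 1 favours H0 (no bias)
```

This illustrates the asymmetry the abstract highlights: a large p-value from the exact test only signals no evidence of bias, whereas a Bayes factor well above 1 quantifies positive evidence of no bias, and one well below 1 quantifies evidence of bias.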