Measuring bias in Instruction-Following models with P-AT

被引:0
|
作者
Onorati, Dario [1 ,2 ]
Ruzzetti, Elena Sofia [2 ]
Venditti, Davide [2 ]
Ranaldi, Leonardo [2 ,3 ]
Zanzotto, Fabio Massimo [2 ]
机构
[1] Sapienza Univ Rome, Rome, Italy
[2] Univ Roma Tor Vergata, Rome, Italy
[3] Idiap Res Inst, Martigny, Switzerland
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Instruction-Following Language Models (IFLMs) are promising and versatile tools for solving many downstream, informationseeking tasks. Given their success, there is an urgent need to have a shared resource to determine whether existing and new IFLMs are prone to produce biased language interactions. In this paper, we propose Prompt Association Test (P-AT): a new resource for testing the presence of social biases in IFLMs. P-AT stems from WEAT (Caliskan et al., 2017) and generalizes the notion of measuring social biases to IFLMs. Basically, we cast WEAT word tests in promptized classification tasks, and we associate a metric - the bias score. Our resource consists of 2310 prompts. We then experimented with several families of IFLMs discovering gender and race biases in all the analyzed models. We expect P-AT to be an important tool for quantifying bias across different dimensions and, therefore, for encouraging the creation of fairer IFLMs before their distortions have consequences in the real world.
引用
收藏
页码:8006 / 8034
页数:29
相关论文
共 50 条
  • [31] Measuring and Mitigating Gender Bias in Legal Contextualized Language Models
    Bozdag, Mustafa
    Sevim, Nurullah
    Koc, Aykut
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18 (04)
  • [32] Measuring Political Bias in Large Language Models: What Is Said and How It Is Said
    Bang, Yejin
    Chen, Delong
    Lee, Nayeon
    Fung, Pascale
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 11142 - 11159
  • [33] Measuring Bias in AI Models: An Statistical Approach Introducing N-Sigma
    DeAlcala, Daniel
    Serna, Ignacio
    Morales, Aythami
    Fierrez, Julian
    Ortega-Garcia, Javier
    2023 IEEE 47TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC, 2023, : 1167 - 1172
  • [34] On least-squares bias in the AR(p) models: Bias correction using the bootstrap methods
    Hisashi Tanizaki
    Shigeyuki Hamori
    Yoichi Matsubayashi
    Statistical Papers, 2006, 47 : 109 - 124
  • [35] On least-squares bias in the AR(p) models:: Bias correction using the bootstrap methods
    Tanizaki, H
    Hamori, S
    Matsubayashi, Y
    STATISTICAL PAPERS, 2006, 47 (01) : 109 - 124
  • [36] The process dissociation procedure: Testable models for measuring controlled, automatic, and response bias processes
    VaterrodtPlunnecke, B
    Kruger, T
    Gerdes, H
    Bredenkamp, J
    ZEITSCHRIFT FUR EXPERIMENTELLE PSYCHOLOGIE, 1996, 43 (03): : 483 - 519
  • [37] This Prompt is Measuring <MASK>: Evaluating Bias Evaluation in Language Models
    Goldfarb-Tarrant, Seraphina
    Ungless, Eddie
    Balkir, Esma
    Blodgett, Su Lin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 2209 - 2225
  • [38] Measuring Implicit Bias in ICU Notes Using Word-Embedding Neural Network Models
    Cobert, Julien
    Mills, Hunter
    Lee, Albert
    Gologorskaya, Oksana
    Espejo, Edie
    Jeon, Sun Young
    Boscardin, W. John
    Heintz, Timothy A.
    Kennedy, Christopher J.
    Ashana, Deepshikha C.
    Chapman, Allyson Cook
    Raghunathan, Karthik
    Smith, Alex K.
    Lee, Sei J.
    CHEST, 2024, 165 (06) : 1481 - 1490
  • [39] Counterfactually Measuring and Eliminating Social Bias in Vision-Language Pre-training Models
    Zhang, Yi
    Wang, Junyang
    Sang, Jitao
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4996 - 5004
  • [40] Innovation and Validation of an Assessment Method UsingMolecular Models Following Stereochemistry Instruction in anOrganic Chemistry Course
    Alsfouk, Aisha
    JOURNAL OF CHEMICAL EDUCATION, 2022, 99 (05) : 1900 - 1905