Measuring bias in Instruction-Following models with P-AT

Authors
Onorati, Dario [1, 2]
Ruzzetti, Elena Sofia [2]
Venditti, Davide [2]
Ranaldi, Leonardo [2, 3]
Zanzotto, Fabio Massimo [2]
Affiliations
[1] Sapienza Univ Rome, Rome, Italy
[2] Univ Roma Tor Vergata, Rome, Italy
[3] Idiap Res Inst, Martigny, Switzerland
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Instruction-Following Language Models (IFLMs) are promising and versatile tools for solving many downstream, information-seeking tasks. Given their success, there is an urgent need for a shared resource to determine whether existing and new IFLMs are prone to producing biased language interactions. In this paper, we propose the Prompt Association Test (P-AT), a new resource for testing the presence of social biases in IFLMs. P-AT stems from WEAT (Caliskan et al., 2017) and generalizes the notion of measuring social biases to IFLMs. Specifically, we cast WEAT word tests as prompt-based classification tasks and associate them with a metric, the bias score. Our resource consists of 2310 prompts. We then experimented with several families of IFLMs, discovering gender and race biases in all the analyzed models. We expect P-AT to be an important tool for quantifying bias across different dimensions and, therefore, for encouraging the creation of fairer IFLMs before their distortions have consequences in the real world.
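The idea of casting a word-association test as a prompt-based classification task with an attached score can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the prompt template, the function names `build_prompts` and `bias_score`, the example words, and the scoring rule (stereotyped answers minus anti-stereotyped answers, divided by the number of valid answers) are hypothetical and are not taken from the paper, whose actual prompts and bias score definition are not given in this record.

```python
# Hypothetical template; the actual P-AT prompts are not reproduced in this record.
TEMPLATE = "Is the word '{word}' more related to '{a}' or '{b}'? Answer with one word."


def build_prompts(targets, attr_a, attr_b):
    """Turn each target word into one two-way classification prompt."""
    return [TEMPLATE.format(word=w, a=attr_a, b=attr_b) for w in targets]


def bias_score(answers, stereotyped, attributes):
    """Illustrative score in [-1, 1] (an assumption, not the paper's formula).

    answers:     list of (target_word, model_answer) pairs
    stereotyped: dict mapping each target word to its stereotyped attribute
    attributes:  the two attribute words the model was asked to choose between
    """
    stereo = anti = 0
    for word, answer in answers:
        answer = answer.strip().lower()
        if answer not in attributes:
            continue  # ignore answers that pick neither attribute
        if answer == stereotyped[word]:
            stereo += 1
        else:
            anti += 1
    total = stereo + anti
    return (stereo - anti) / total if total else 0.0


# Toy usage with made-up model answers.
prompts = build_prompts(["John", "Amy"], "career", "family")
answers = [("John", "career"), ("Amy", "family"), ("Amy", "career"), ("John", "Career ")]
score = bias_score(answers, {"John": "career", "Amy": "family"}, ("career", "family"))
```

Under this assumed rule, a score near 1 would mean the model mostly picks the stereotyped attribute, a score near -1 the opposite, and a score near 0 no systematic preference.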
Pages: 8006-8034
Page count: 29