Measuring bias in Instruction-Following models with P-AT

Times cited: 0
Authors
Onorati, Dario [1, 2]
Ruzzetti, Elena Sofia [2]
Venditti, Davide [2]
Ranaldi, Leonardo [2, 3]
Zanzotto, Fabio Massimo [2]
Affiliations
[1] Sapienza Univ Rome, Rome, Italy
[2] Univ Roma Tor Vergata, Rome, Italy
[3] Idiap Res Inst, Martigny, Switzerland
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Instruction-Following Language Models (IFLMs) are promising and versatile tools for solving many downstream, information-seeking tasks. Given their success, there is an urgent need for a shared resource to determine whether existing and new IFLMs are prone to producing biased language interactions. In this paper, we propose the Prompt Association Test (P-AT): a new resource for testing the presence of social biases in IFLMs. P-AT stems from WEAT (Caliskan et al., 2017) and generalizes the notion of measuring social biases to IFLMs. Specifically, we cast WEAT word tests into promptized classification tasks and associate them with a metric, the bias score. Our resource consists of 2310 prompts. We then experimented with several families of IFLMs and found gender and race biases in all the analyzed models. We expect P-AT to be an important tool for quantifying bias across different dimensions and, therefore, for encouraging the creation of fairer IFLMs before their distortions have consequences in the real world.
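Because P-AT stems from WEAT, it may help to see the two ideas side by side. The sketch below implements the standard WEAT effect size of Caliskan et al. (2017) over word vectors, then adds two hypothetical helpers that only illustrate what casting a word test into a classification prompt could look like: `promptize` is an assumed template (not the P-AT prompt wording), and `answer_bias` is an assumed answer-level score (not the paper's bias score definition). The toy vectors, words, and labels are also assumptions for demonstration.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """WEAT s(w, A, B): mean cosine to attribute set A minus mean cosine to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """WEAT effect size for target sets X, Y and attribute sets A, B.

    Uses the sample standard deviation over X ∪ Y; some implementations
    use the population standard deviation instead.
    """
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


def promptize(word, label_a, label_b):
    """HYPOTHETICAL template: one WEAT target word -> binary classification prompt.

    Illustrative wording only; it is not the P-AT prompt text.
    """
    return (
        f"Is the word '{word}' more related to '{label_a}' or '{label_b}'? "
        f"Answer with exactly one of the two options."
    )


def answer_bias(answers_x, answers_y, label_a):
    """HYPOTHETICAL answer-level score in [-1, 1]: share of X prompts answered
    label_a minus share of Y prompts answered label_a. Not the P-AT bias score."""

    def share(answers):
        return sum(answer == label_a for answer in answers) / len(answers)

    return share(answers_x) - share(answers_y)


if __name__ == "__main__":
    # Toy 2-d "embeddings": X leans toward A, Y leans toward B.
    X = [[1.0, 0.0], [0.9, 0.1]]
    Y = [[0.0, 1.0], [0.1, 0.9]]
    A = [[1.0, 0.0]]
    B = [[0.0, 1.0]]
    print(round(weat_effect_size(X, Y, A, B), 3))  # positive: X is closer to A

    print(promptize("John", "career", "family"))
    # Toy model answers for two X prompts and two Y prompts.
    print(answer_bias(["career", "career"], ["family", "career"], "career"))  # 0.5
```

The design choice here is that the embedding-based test needs vector access, while the prompt-based variant only needs the model's textual answers, which is what makes a prompt formulation applicable to instruction-following models served as black boxes.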
Pages: 8006–8034
Number of pages: 29