Measuring bias in Instruction-Following models with P-AT

Authors
Onorati, Dario [1, 2]
Ruzzetti, Elena Sofia [2]
Venditti, Davide [2]
Ranaldi, Leonardo [2, 3]
Zanzotto, Fabio Massimo [2]
Affiliations
[1] Sapienza Univ Rome, Rome, Italy
[2] Univ Roma Tor Vergata, Rome, Italy
[3] Idiap Res Inst, Martigny, Switzerland
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Instruction-Following Language Models (IFLMs) are promising and versatile tools for solving many downstream, information-seeking tasks. Given their success, there is an urgent need for a shared resource to determine whether existing and new IFLMs are prone to produce biased language interactions. In this paper, we propose the Prompt Association Test (P-AT): a new resource for testing the presence of social biases in IFLMs. P-AT stems from WEAT (Caliskan et al., 2017) and generalizes the notion of measuring social biases to IFLMs. Basically, we cast WEAT word tests as promptized classification tasks, and we associate a metric, the bias score, with them. Our resource consists of 2310 prompts. We then experimented with several families of IFLMs, discovering gender and race biases in all the analyzed models. We expect P-AT to be an important tool for quantifying bias across different dimensions and, therefore, for encouraging the creation of fairer IFLMs before their distortions have consequences in the real world.
Pages: 8006-8034
Number of pages: 29