BBQ: A Hand-Built Bias Benchmark for Question Answering

Authors
Parrish, Alicia [1]
Chen, Angelica [2]
Nangia, Nikita [2]
Padmakumar, Vishakh [2]
Phang, Jason [2]
Thompson, Jana [2]
Phu Mon Htut [2]
Bowman, Samuel R. [1, 2, 3]
Affiliations
[1] NYU, Dept Linguist, New York, NY 10003 USA
[2] NYU, Ctr Data Sci, New York, NY 10003 USA
[3] NYU, Dept Comp Sci, New York, NY 10003 USA
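Funding
National Science Foundation (USA)
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract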
It is well documented that NLP models learn social biases, but little work has been done on how these biases manifest in model outputs for applied tasks like question answering (QA). We introduce the Bias Benchmark for QA (BBQ), a dataset of question sets constructed by the authors that highlight attested social biases against people belonging to protected classes along nine social dimensions relevant for U.S. English-speaking contexts. Our task evaluates model responses at two levels: (i) given an under-informative context, we test how strongly responses reflect social biases, and (ii) given an adequately informative context, we test whether the model's biases override a correct answer choice. We find that models often rely on stereotypes when the context is under-informative, meaning the model's outputs consistently reproduce harmful biases in this setting. Though models are more accurate when the context provides an informative answer, they still rely on stereotypes and average up to 3.4 percentage points higher accuracy when the correct answer aligns with a social bias than when it conflicts, with this difference widening to over 5 points on examples targeting gender for most models tested.
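The accuracy comparison described in the abstract (bias-aligned versus bias-conflicting examples in informative contexts) can be illustrated with a minimal sketch. This is not the authors' evaluation code or their bias score: the `Example` record, its `bias_aligned` flag, and the toy data are all hypothetical, chosen only to show how a gap in percentage points might be computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    # Hypothetical record for one model prediction (not the BBQ data format).
    correct: bool        # did the model choose the correct answer?
    bias_aligned: bool   # does the correct answer match the social stereotype?


def accuracy(examples: List[Example]) -> float:
    """Return accuracy as a percentage over the given examples."""
    if not examples:
        raise ValueError("cannot compute accuracy on an empty list")
    return 100.0 * sum(e.correct for e in examples) / len(examples)


def accuracy_gap(examples: List[Example]) -> float:
    """Accuracy on bias-aligned examples minus accuracy on bias-conflicting
    examples, in percentage points. A positive value means the model does
    better when the correct answer agrees with the stereotype."""
    aligned = [e for e in examples if e.bias_aligned]
    conflicting = [e for e in examples if not e.bias_aligned]
    return accuracy(aligned) - accuracy(conflicting)


# Toy data (made up): 9 of 10 aligned examples correct, 8 of 10 conflicting.
toy = (
    [Example(True, True)] * 9 + [Example(False, True)]
    + [Example(True, False)] * 8 + [Example(False, False)] * 2
)
gap = accuracy_gap(toy)
print(f"accuracy gap: {gap:.1f} percentage points")
```

A gap near zero would suggest the model's accuracy does not depend on whether the correct answer matches a stereotype; the abstract reports average gaps of up to 3.4 points, and over 5 points on gender examples for most models tested.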
Pages: 2086-2105
Page count: 20