Unmasking the Mask - Evaluating Social Biases in Masked Language Models

Cited by: 0
Authors
Kaneko, Masahiro [1]
Bollegala, Danushka [2,3]
Affiliations
[1] Tokyo Inst Technol, Tokyo, Japan
[2] Univ Liverpool, Liverpool, Merseyside, England
[3] Amazon, Seattle, WA, USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Masked Language Models (MLMs) have shown superior performance in numerous downstream Natural Language Processing (NLP) tasks. Unfortunately, MLMs also demonstrate significantly worrying levels of social biases. We show that the previously proposed evaluation metrics for quantifying the social biases in MLMs are problematic for the following reasons: (1) the prediction accuracy of the masked tokens itself tends to be low in some MLMs, which leads to unreliable evaluation metrics; (2) masks are not used in most downstream NLP tasks, so predicting the masked token is not directly related to them; and (3) high-frequency words in the training data are masked more often, and this selection bias introduces noise into the test cases. Therefore, we propose All Unmasked Likelihood (AUL), a bias evaluation measure that predicts all tokens in a test case given the MLM embedding of the unmasked input, and AUL with Attention weights (AULA), which evaluates tokens based on their importance in the sentence. Our experimental results show that the proposed bias evaluation measures accurately detect different types of biases in MLMs, whereas, unlike AUL and AULA, previously proposed measures systematically overestimate the measured biases and are heavily influenced by the unmasked tokens in the context.
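The core AUL idea described in the abstract can be sketched as follows: feed the *unmasked* sentence to the MLM once and average the log-likelihood of every input token at its own position. This is a minimal illustration using the Hugging Face `transformers` library; the model choice and function name are illustrative, the paper's reference implementation may differ, and AULA would additionally weight each token's log-likelihood by an attention-based importance score.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Illustrative model choice; any masked LM with a Hugging Face checkpoint works.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def aul_score(sentence: str) -> float:
    """Average log-likelihood of each token given the unmasked input (AUL sketch)."""
    # Encode WITHOUT masking: the whole sentence is seen by the MLM at once.
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits[0]            # (seq_len, vocab_size)
    log_probs = torch.log_softmax(logits, dim=-1)
    ids = enc["input_ids"][0]
    # Log-probability of each input token at its own position,
    # skipping the special [CLS]/[SEP] tokens at the two ends.
    token_lp = log_probs[torch.arange(ids.size(0)), ids][1:-1]
    return token_lp.mean().item()
```

In a bias test, one would compare the scores of a stereotypical and an anti-stereotypical sentence pair; systematically higher likelihood for the stereotypical variant indicates bias in the model.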
Pages: 11954-11962
Number of pages: 9
Related Papers
50 records in total
  • [41] Mask and Cloze: Automatic Open Cloze Question Generation Using a Masked Language Model
    Matsumori, Shoya
    Okuoka, Kohei
    Shibata, Ryoichi
    Inoue, Minami
    Fukuchi, Yosuke
    Imai, Michita
    IEEE ACCESS, 2023, 11: 9835-9850
  • [42] French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English
    Neveol, Aurelie
    Dupont, Yoann
    Bezancon, Julien
    Fort, Karen
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1 (LONG PAPERS), 2022: 8521-8531
  • [43] Context Analysis for Pre-trained Masked Language Models
    Lai, Yi-An
    Lalwani, Garima
    Zhang, Yi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020: 3789-3804
  • [44] Unsupervised Text Style Transfer with Padded Masked Language Models
    Malmi, Eric
    Severyn, Aliaksei
    Rothe, Sascha
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020: 8671-8680
  • [45] Evaluating generative patent language models
    Lee, Jieh-Sheng
    WORLD PATENT INFORMATION, 2023, 72
  • [46] Evaluating Approaches to Personalizing Language Models
    King, Milton
    Cook, Paul
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020: 2461-2469
  • [47] Evaluating Text GANs as Language Models
    Tevet, Guy
    Habib, Gavriel
    Shwartz, Vered
    Berant, Jonathan
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019: 2241-2247
  • [48] Statistically Profiling Biases in Natural Language Reasoning Datasets and Models
    Huang, Shanshan
    Zhu, Kenny Q.
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023: 4521-4530
  • [49] Learning with Enriched Inductive Biases for Vision-Language Models
    Yang, Lingxiao
    Zhang, Ru-Yuan
    Chen, Qi
    Xie, Xiaohua
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025,
  • [50] Large language models display human-like social desirability biases in Big Five personality surveys
    Salecha, Aadesh
    Ireland, Molly E.
    Subrahmanya, Shashanka
    Sedoc, Joao
    Ungar, Lyle H.
    Eichstaedt, Johannes C.
    PNAS NEXUS, 2024, 3 (12)