Towards Understanding and Mitigating Social Biases in Language Models

Citations: 0
Authors
Liang, Paul Pu [1 ]
Wu, Chiyu [1 ]
Morency, Louis-Philippe [1 ]
Salakhutdinov, Ruslan [1 ]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Funding
U.S. National Science Foundation; U.S. National Institutes of Health
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
As machine learning methods are deployed in real-world settings such as healthcare, legal systems, and social science, it is crucial to recognize how they shape social biases and stereotypes in these sensitive decision-making processes. Among such real-world deployments are large-scale pretrained language models (LMs) that can be potentially dangerous in manifesting undesirable representational biases - harmful biases resulting from stereotyping that propagate negative generalizations involving gender, race, religion, and other social constructs. As a step towards improving the fairness of LMs, we carefully define several sources of representational biases before proposing new benchmarks and metrics to measure them. With these tools, we propose steps towards mitigating social biases during text generation. Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information for high-fidelity text generation, thereby pushing forward the performance-fairness Pareto frontier.
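The abstract does not spell out the benchmarks or metrics, but one common family of representational-bias measures compares a language model's continuation probabilities across counterfactual contexts that differ only in a demographic term. The sketch below illustrates that idea with a toy probability table standing in for a real pretrained LM; the `toy_lm` function, the word pairs, and the probability values are all illustrative assumptions, not the paper's actual benchmark or method:

```python
import math

# Toy stand-in for a language model: maps (context, next_word) to a probability.
# A real study would instead query a pretrained LM's conditional token probabilities.
TOY_PROBS = {
    ("he", "doctor"): 0.30, ("he", "nurse"): 0.10,
    ("she", "doctor"): 0.15, ("she", "nurse"): 0.25,
}

def toy_lm(context: str, word: str) -> float:
    """Return P(word | context) under the toy model, with a uniform fallback."""
    return TOY_PROBS.get((context, word), 0.05)

def bias_score(word: str, group_a: str, group_b: str) -> float:
    """Log-probability gap for `word` between two counterfactual contexts.

    Positive values mean the model associates `word` more strongly with
    group_a; zero would indicate no measured association gap for this word.
    """
    return math.log(toy_lm(group_a, word)) - math.log(toy_lm(group_b, word))

if __name__ == "__main__":
    for w in ("doctor", "nurse"):
        print(w, round(bias_score(w, "he", "she"), 3))
```

Under these made-up probabilities the score is positive for "doctor" and negative for "nurse", i.e. the toy model exhibits the stereotyped association; a mitigation step of the kind the abstract describes would aim to push such gaps toward zero without degrading generation quality.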
Pages: 12