Detoxifying Language Models Risks Marginalizing Minority Voices

Cited by: 0
Authors
Xu, Albert [1 ]
Pathak, Eshaan [1 ]
Wallace, Eric [1 ]
Gururangan, Suchin [2 ]
Sap, Maarten [2 ]
Klein, Dan [1 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Univ Washington, Seattle, WA 98195 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Language models (LMs) must be both safe and equitable to be responsibly deployed in practice. With safety in mind, numerous detoxification techniques (e.g., Dathathri et al. 2020; Krause et al. 2020) have been proposed to mitigate toxic LM generations. In this work, we show that these detoxification techniques hurt equity: they decrease the utility of LMs on language used by marginalized groups (e.g., African-American English and minority identity mentions). In particular, we perform automatic and human evaluations of text generation quality when LMs are conditioned on inputs with different dialects and group identifiers. We find that detoxification makes LMs more brittle to distribution shift, especially on language used by marginalized groups. We identify that these failures stem from detoxification methods exploiting spurious correlations in toxicity datasets. Overall, our results highlight the tension between the controllability and distributional robustness of LMs.
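The evaluation described in the abstract conditions LMs on inputs drawn from different dialects and group identifiers and compares generation quality. As a minimal illustrative sketch (not the authors' code), one simple automatic proxy is to compare a model's perplexity on matched dialect samples; the model name, the placeholder texts, and the use of perplexity as the quality measure are assumptions here.

```python
# Illustrative sketch only: compare an LM's perplexity on minority-dialect vs.
# majority-dialect text as a rough proxy for how well it models each variety.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model_name = "gpt2"  # assumed stand-in; a real study would also test detoxified variants
tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

def mean_perplexity(texts):
    """Average per-text perplexity under the model (lower = better fit)."""
    ppls = []
    for text in texts:
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            # Passing labels=input_ids makes the model return the LM cross-entropy loss.
            loss = model(**enc, labels=enc["input_ids"]).loss
        ppls.append(torch.exp(loss).item())
    return sum(ppls) / len(ppls)

# Hypothetical placeholder samples; an actual evaluation would use corpora of
# African-American English vs. White-aligned English, matched for content.
aae_samples = ["he be workin hard every day fr"]
wae_samples = ["he has been working hard every day, for real"]

print("AAE perplexity:", mean_perplexity(aae_samples))
print("WAE perplexity:", mean_perplexity(wae_samples))
```

A larger gap between the two averages after detoxification, relative to the base model, would point to the kind of disproportionate degradation the abstract reports.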
Pages: 2390-2397
Page count: 8
Related Papers
50 records in total
  • [1] Challenges in Detoxifying Language Models
    Welbl, Johannes
    Glaese, Amelia
    Uesato, Jonathan
    Dathathri, Sumanth
    Mellor, John
    Hendricks, Lisa Anne
    Anderson, Kirsty
    Kohli, Pushmeet
    Coppin, Ben
    Huang, Po-Sen
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 2447 - 2469
  • [2] Their Way or No Way: "Whiteness" as Agent for Marginalizing and Silencing Minority Voices In Academic Research and Publication
    Baffoe, Michael
    Asimeng-Boahene, Lewis
    Ogbuagu, Buster C.
    EUROPEAN JOURNAL OF SUSTAINABLE DEVELOPMENT, 2014, 3 (01): 13 - 31
  • [3] Detoxifying Language Models with a Toxic Corpus
    Park, Yoon A.
    Rudzicz, Frank
    PROCEEDINGS OF THE SECOND WORKSHOP ON LANGUAGE TECHNOLOGY FOR EQUALITY, DIVERSITY AND INCLUSION (LTEDI 2022), 2022, : 41 - 46
  • [4] Training Hybrid Language Models by Marginalizing over Segmentations
    Grave, Edouard
    Sukhbaatar, Sainbayar
    Bojanowski, Piotr
    Joulin, Armand
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 1477 - 1482
  • [5] Detoxifying Large Language Models via Knowledge Editing
    Wang, Mengru
    Zhang, Ningyu
    Xu, Ziwen
    Xi, Zekun
    Deng, Shumin
    Yao, Yunzhi
    Zhang, Qishen
    Yang, Linyi
    Wang, Jindong
    Chen, Huajun
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 3093 - 3118
  • [6] Revitalizing Minority Voices: Language Issues in the New Millennium
    Sarmiento, Brenda
    JOURNAL OF LANGUAGE IDENTITY AND EDUCATION, 2018, 17 (03): 198 - 200
  • [7] Self-Detoxifying Language Models via Toxification Reversal
    Leong, Chak Tou
    Cheng, Yi
    Wang, Jiashuo
    Wang, Jian
    Li, Wenjie
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 4433 - 4449
  • [8] Centring and marginalizing: the "soft middle" and Japanese minority education
    Bondy, Christopher
    ASIA PACIFIC JOURNAL OF EDUCATION, 2014, 34 (01) : 93 - 106
  • [9] Marginalizing and conditioning in graphical models
    Koster, JTA
    BERNOULLI, 2002, 8 (06) : 817 - 840
  • [10] MINORITY VOICES
    MORA, P
    EDUCATION FOR A MULTICULTURAL SOCIETY : A NEW AGENDA FOR CONTINUING HIGHER EDUCATION, 1989, : 2 - 10