Detoxifying Language Models with a Toxic Corpus

Cited: 0
Authors
Park, Yoon A. [1 ,2 ]
Rudzicz, Frank [1 ,2 ,3 ]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Vector Inst Artificial Intelligence, Toronto, ON, Canada
[3] Unity Hlth Toronto, Toronto, ON, Canada
Keywords: (none listed)
DOI: (none available)
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Existing studies have investigated the tendency of autoregressive language models to generate text that exhibits undesired biases and toxicity. Various debiasing approaches have been proposed, primarily categorized as data-based or decoding-based. In our study, we investigate an ensemble of the two debiasing paradigms, proposing to use a toxic corpus as an additional resource for reducing toxicity. Our results show that a toxic corpus can indeed substantially reduce the toxicity of the language generation process, complementing existing debiasing methods.
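The abstract's idea of using a toxic corpus at decoding time can be illustrated with a minimal sketch. The assumption here (not stated in the record) is a DExperts-style combination: an "anti-expert" model fine-tuned on the toxic corpus supplies logits that are subtracted from the base model's logits, pushing generation away from tokens the toxic model prefers. All names, the toy vocabulary, and the logit values below are hypothetical; a real system would use actual language-model logits per decoding step.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def detoxified_logits(base, toxic, alpha=2.0):
    """Decoding-time ensemble (hypothetical sketch): subtract a scaled
    copy of the toxic-corpus anti-expert's logits from the base model's
    logits, demoting tokens the toxic model assigns high scores."""
    return [b - alpha * t for b, t in zip(base, toxic)]

# Toy vocabulary of 4 tokens; index 2 stands in for a "toxic" token
# that the anti-expert (fine-tuned on the toxic corpus) strongly prefers.
base_logits  = [1.0, 0.5, 2.0, 0.2]
toxic_logits = [0.1, 0.0, 3.0, 0.1]

p_before = softmax(base_logits)
p_after  = softmax(detoxified_logits(base_logits, toxic_logits))

print(f"P(toxic token) before: {p_before[2]:.3f}, after: {p_after[2]:.3f}")
```

The design choice to subtract (rather than filter outright) keeps the full next-token distribution intact, so fluency is preserved while toxic continuations become much less likely; the scaling factor `alpha` trades off detoxification strength against deviation from the base model.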
Pages: 41-46 (6 pages)
Related Papers (50 records)
  • [1] Challenges in Detoxifying Language Models
    Welbl, Johannes
    Glaese, Amelia
    Uesato, Jonathan
    Dathathri, Sumanth
    Mellor, John
    Hendricks, Lisa Anne
    Anderson, Kirsty
    Kohli, Pushmeet
    Coppin, Ben
    Huang, Po-Sen
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 2447 - 2469
  • [2] Large Language Models are Not Models of Natural Language: They are Corpus Models
    Veres, Csaba
    IEEE ACCESS, 2022, 10 : 61970 - 61979
  • [3] Detoxifying Language Models Risks Marginalizing Minority Voices
    Xu, Albert
    Pathak, Eshaan
    Wallace, Eric
    Gururangan, Suchin
    Sap, Maarten
    Klein, Dan
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 2390 - 2397
  • [4] Detoxifying Large Language Models via Knowledge Editing
    Wang, Mengru
    Zhang, Ningyu
    Xu, Ziwen
    Xi, Zekun
    Deng, Shumin
    Yao, Yunzhi
    Zhang, Qishen
    Yang, Linyi
    Wang, Jindong
    Chen, Huajun
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 3093 - 3118
  • [5] Self-Detoxifying Language Models via Toxification Reversal
    Leong, Chak Tou
    Cheng, Yi
    Wang, Jiashuo
    Wang, Jian
    Li, Wenjie
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 4433 - 4449
  • [6] Detoxifying Large Language Models via Kahneman-Tversky Optimization
    Li, Qingquan
    Du, Wenlong
    Liu, Jin
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT V, NLPCC 2024, 2025, 15363 : 409 - 417
  • [7] The Norwegian Colossal Corpus: A Text Corpus for Training Large Norwegian Language Models
    Kummervold, Per E.
    Wetjen, Freddy
    de la Rosa, Javier
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 3852 - 3860
  • [8] IS DESCRIBING LANGUAGE MERE BUTTERFLY COLLECTION? ON EPISTEMOLOGY, STATISTICAL LANGUAGE MODELS, AND CORPUS
    de Uzeda-Garrao, Milena
    12TH INTERNATIONAL CONFERENCE OF EDUCATION, RESEARCH AND INNOVATION (ICERI2019), 2019, : 10900 - 10903
  • [9] REALTOXICITYPROMPTS: Evaluating Neural Toxic Degeneration in Language Models
    Gehman, Samuel
    Gururangan, Suchin
    Sap, Maarten
    Choi, Yejin
    Smith, Noah A.
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020,
  • [10] Efficient Detection of Toxic Prompts in Large Language Models
    Liu, Yi
    Yu, Junzhe
    Sun, Huijia
    Shi, Ling
    Deng, Gelei
    Chen, Yuqi
    Liu, Yang
    arXiv preprint