Detoxifying Language Models with a Toxic Corpus

Citations: 0
Authors
Park, Yoon A. [1 ,2 ]
Rudzicz, Frank [1 ,2 ,3 ]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Vector Inst Artificial Intelligence, Toronto, ON, Canada
[3] Unity Hlth Toronto, Toronto, ON, Canada
Keywords: (none listed)
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Existing studies have investigated the tendency of autoregressive language models to generate content that exhibits undesired biases and toxicity. Various debiasing approaches have been proposed, primarily categorized as data-based or decoding-based. In this study, we investigate an ensemble of the two debiasing paradigms, proposing to use a toxic corpus as an additional resource for reducing toxicity. Our results show that a toxic corpus can indeed help to substantially reduce the toxicity of the language generation process, complementing existing debiasing methods.
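The abstract describes combining data-based and decoding-based detoxification, with a toxic corpus acting as an extra signal at generation time. A minimal sketch of one common decoding-based scheme of this kind (anti-expert contrastive decoding, where logits from a model fine-tuned on the toxic corpus are subtracted from the base model's logits) is shown below; the function names, the toy vocabulary, and the exact combination rule are illustrative assumptions, not necessarily the paper's published method.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def detoxified_probs(base_logits, toxic_logits, alpha=1.0):
    """Anti-expert decoding sketch: push the next-token distribution away
    from tokens that a model fine-tuned on the toxic corpus prefers.
    `alpha` controls the strength of the steering (an assumed knob)."""
    adjusted = [b - alpha * t for b, t in zip(base_logits, toxic_logits)]
    return softmax(adjusted)

# Toy 4-token vocabulary; token 2 is one the toxic model strongly prefers.
base = [1.0, 0.5, 2.0, 0.2]    # base LM logits (illustrative)
toxic = [0.0, 0.0, 3.0, 0.0]   # toxic-corpus model logits (illustrative)
probs = detoxified_probs(base, toxic, alpha=1.0)
```

With `alpha=1.0`, the token the base model would greedily pick (token 2) is demoted, and the most probable token shifts to a neutral alternative; larger `alpha` steers harder at some cost to fluency.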
Pages: 41-46
Page count: 6
Related Papers
50 records in total
  • [11] Statistical Analysis of Multilingual Text Corpus and Development of Language Models
    Agrawal, Shyam S.
    Abhimanue
    Bansal, Shweta
    Mahajan, Minakshi
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 2436 - 2440
  • [12] Corpus-Steered Query Expansion with Large Language Models
    Lei, Yibin
    Cao, Yu
    Zhou, Tianyi
    Shen, Tao
    Yates, Andrew
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS, 2024, : 393 - 401
  • [13] Exploring the Impact of Corpus Diversity on Financial Pretrained Language Models
    Choe, Jaeyoung
    Noh, Keonwoong
    Kim, Nayeon
    Ahn, Seyun
    Jung, Woohwan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 2101 - 2112
  • [14] Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
    Wang, Boxin
    Ping, Wei
    Xiao, Chaowei
    Xu, Peng
    Patwary, Mostofa
    Shoeybi, Mohammad
    Li, Bo
    Anandkumar, Anima
    Catanzaro, Bryan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [15] A Warm Start and a Clean Crawled Corpus - A Recipe for Good Language Models
    Snaebjarnarson, Vesteinn
    Simonarson, Haukur Barri
    Ragnarsson, Petur Orri
    Ingolfsdottir, Svanhvit Lilja
    Jonsson, Haukur Pall
    Thorsteinsson, Vilhjalmur
    Einarsson, Hafsteinn
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 4356 - 4366
  • [16] Parallel Corpus Filtering via Pre-trained Language Models
    DiDi Labs
    arXiv, 2020,
  • [17] MIL-Decoding: Detoxifying Language Models at Token-Level via Multiple Instance Learning
    Zhang, Xu
    Wan, Xiaojun
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 190 - 202
  • [18] Unveiling Toxic Tendencies of Small Language Models in Unconstrained Generation Tasks
    Chandra, Lakshay
    Susan, Seba
    Kumar, Dhruv
    Kant, Krishan
    10TH INTERNATIONAL CONFERENCE ON ELECTRONICS, COMPUTING AND COMMUNICATION TECHNOLOGIES, CONECCT 2024, 2024,
  • [19] Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models
    Zhang, Jiang
    Wu, Qiong
    Xu, Yiming
    Cao, Cheng
    Du, Zheng
    Psounis, Konstantinos
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 19, 2024, : 21779 - 21787
  • [20] Probing Toxic Content in Large Pre-Trained Language Models
    Ousidhoum, Nedjma
    Zhao, Xinran
    Fang, Tianqing
    Song, Yangqiu
    Yeung, Dit-Yan
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 4262 - 4274