Detoxifying Language Models with a Toxic Corpus

Cited by: 0

Authors:

Park, Yoon A. [1, 2]

Rudzicz, Frank [1, 2, 3]

Affiliations:

[1] Univ Toronto, Toronto, ON, Canada

[2] Vector Inst Artificial Intelligence, Toronto, ON, Canada

[3] Unity Hlth Toronto, Toronto, ON, Canada

Source:

PROCEEDINGS OF THE SECOND WORKSHOP ON LANGUAGE TECHNOLOGY FOR EQUALITY, DIVERSITY AND INCLUSION (LTEDI 2022) | 2022

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Existing studies have investigated the tendency of autoregressive language models to generate contexts that exhibit undesired biases and toxicity. Various debiasing approaches have been proposed, which fall primarily into two categories: data-based and decoding-based. In our study, we investigate an ensemble of the two debiasing paradigms, proposing to use a toxic corpus as an additional resource to reduce toxicity. Our results show that a toxic corpus can indeed help to reduce the toxicity of the language generation process substantially, complementing the existing debiasing methods.


Pages: 41-46

Page count: 6

