Detoxifying Language Models with a Toxic Corpus

Authors
Park, Yoon A. [1, 2]
Rudzicz, Frank [1, 2, 3]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Vector Inst Artificial Intelligence, Toronto, ON, Canada
[3] Unity Hlth Toronto, Toronto, ON, Canada
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing studies have investigated the tendency of autoregressive language models to generate contexts that exhibit undesired biases and toxicity. Various debiasing approaches have been proposed, which fall primarily into two categories: data-based and decoding-based. In our study, we investigate an ensemble of the two debiasing paradigms, proposing to use a toxic corpus as an additional resource to reduce toxicity. Our results show that a toxic corpus can indeed help to substantially reduce the toxicity of the language generation process, complementing existing debiasing methods.
Pages: 41-46
Page count: 6