Unveiling Toxic Tendencies of Small Language Models in Unconstrained Generation Tasks

Times Cited: 0
Authors
Chandra, Lakshay [1 ]
Susan, Seba [1 ]
Kumar, Dhruv [1 ]
Kant, Krishan [1 ]
Affiliations
[1] Delhi Technol Univ, Dept Informat Technol, Delhi 110042, India
Keywords
toxicity analysis; small language models; language generation; deep learning;
DOI
10.1109/CONECCT62155.2024.10677188
Chinese Library Classification (CLC)
TP39 [computer applications];
Discipline Code
081203; 0835;
Abstract
The prevalence of toxicity online presents a significant challenge for platforms and publishers alike. Recent studies of Small Language Models (SLMs) have identified the inherent toxicity that dwells in these models. In this work, we study and benchmark the extent to which SLMs can be prompted to generate toxic language. The following SLMs are evaluated for their toxicity levels: GPT-2 Large, Gemma-2B, Mistral-7B, Falcon-7B, and Llama 2-13B. We take a step closer to understanding the correlation between toxicity and the intrinsic parameters of these state-of-the-art SLMs. Next, we study the efficacy of a basic word-filtering approach to controlled text generation. We then establish a mathematical ground for computing the weighted toxicity of continuations with respect to the toxicity of their prompts, treating toxicity as a fuzzy metric. Finally, we extend our analysis to examine the unexpectedly high toxicity levels of continuations generated from non-toxic prompts.
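The abstract mentions a basic word-filtering baseline for controlled text generation. The record does not give the paper's actual block-list, decoding settings, or checkpoints, so the following is only a minimal illustrative sketch of one common realization of word filtering, using the bad_words_ids argument of Hugging Face Transformers' generate() with gpt2-large standing in for the evaluated GPT-2 Large model.

# Minimal sketch of a word-filtering baseline for controlled generation.
# Assumptions: gpt2-large stands in for the evaluated GPT-2 Large checkpoint,
# and the tiny block-list below is purely illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical block-list; a real study would use a curated toxic lexicon.
blocked_words = ["idiot", "moron"]
# Tokenize with a leading space so the banned ids match mid-sentence word forms.
bad_words_ids = tokenizer(
    [" " + w for w in blocked_words], add_special_tokens=False
).input_ids

prompt = "People who disagree with me are"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=True,
    bad_words_ids=bad_words_ids,  # disallow these token sequences in the continuation
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Each generated continuation could then be scored with a toxicity classifier and weighted against the toxicity of its prompt, in the spirit of the fuzzy-metric formulation the abstract describes; the paper's exact weighting is not reproduced in this record.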
Pages: 6