Unveiling Toxic Tendencies of Small Language Models in Unconstrained Generation Tasks

Cited by: 0
Authors
Chandra, Lakshay [1]
Susan, Seba [1]
Kumar, Dhruv [1]
Kant, Krishan [1]
Affiliations
[1] Delhi Technological University, Department of Information Technology, Delhi 110042, India
Keywords
toxicity analysis; small language models; language generation; deep learning;
DOI
10.1109/CONECCT62155.2024.10677188
CLC number
TP39 [Computer Applications];
Subject classification numbers
081203; 0835;
Abstract
The prevalence of toxicity online presents a significant challenge for platforms and publishers alike. Recent studies on Small Language Models (SLMs) have identified the inherent toxicity that dwells within these models. In this work, we study and benchmark the extent to which SLMs can be prompted to generate toxic language. The following SLMs are evaluated for their toxicity levels: GPT-2 Large, Gemma-2B, Mistral-7B, Falcon-7B, and Llama 2-13B. We take a step toward understanding the correlation between toxicity and the intrinsic parameters of these state-of-the-art SLMs. Next, we study the efficacy of a basic word-filtering approach to controlled text generation. We then establish a mathematical basis for computing the weighted toxicity of continuations with respect to the toxicity of prompts by treating toxicity as a fuzzy metric. Finally, we extend our analysis to examine the unexpected toxicity levels of continuations generated from non-toxic prompts.
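For illustration, a minimal sketch of such an evaluation loop might look as follows. The model choice, the bad-word list, and the decoding settings are assumptions for exposition, and the Detoxify classifier stands in for whichever toxicity scorer the authors used; the word filter is realized here via Hugging Face's `bad_words_ids` option rather than the paper's exact mechanism.

```python
# Illustrative sketch: prompt an SLM, score prompt/continuation toxicity,
# and optionally block a hypothetical list of bad words during decoding.
from transformers import AutoModelForCausalLM, AutoTokenizer
from detoxify import Detoxify

model_name = "gpt2-large"                      # one of the evaluated SLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
scorer = Detoxify("original")                  # off-the-shelf toxicity classifier

BAD_WORDS = ["idiot", "stupid"]                # hypothetical filter list
bad_words_ids = [tokenizer(" " + w, add_special_tokens=False).input_ids
                 for w in BAD_WORDS]

def continue_prompt(prompt, filter_words=False, max_new_tokens=30):
    """Generate a continuation, optionally with a simple word-level filter."""
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.9,
        bad_words_ids=bad_words_ids if filter_words else None,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens, not the prompt.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

def toxicity(text):
    """Toxicity probability in [0, 1], read here as a fuzzy membership degree."""
    return float(scorer.predict(text)["toxicity"])

prompt = "So I told him exactly what I thought of his"
continuation = continue_prompt(prompt)
print(toxicity(prompt), toxicity(continuation), continuation)
```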
Pages: 6
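The weighted-toxicity idea can be made concrete with a small illustrative formulation (an assumption for exposition, not necessarily the paper's definition): with prompt toxicity t_p and continuation toxicity t_c read as fuzzy membership degrees in [0, 1], continuation toxicity that merely echoes an already toxic prompt can be discounted.

```latex
% Illustrative weighting only; the paper's exact formulation may differ.
% t_p, t_c \in [0,1] are fuzzy toxicity degrees of the prompt and continuation.
\[
  T_w(c \mid p) \;=\; t_c \,\bigl(1 - t_p\bigr)
\]
% A toxic continuation of a non-toxic prompt (t_p \approx 0) keeps its full
% score, while toxicity already present in the prompt is down-weighted.
```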