Unveiling Toxic Tendencies of Small Language Models in Unconstrained Generation Tasks

Authors
Chandra, Lakshay [1]
Susan, Seba [1]
Kumar, Dhruv [1]
Kant, Krishan [1]
Affiliations
[1] Delhi Technological University, Department of Information Technology, Delhi 110042, India
Keywords
toxicity analysis; small language models; language generation; deep learning
DOI
10.1109/CONECCT62155.2024.10677188
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The prevalence of toxicity online presents a significant challenge for platforms and publishers alike. Recent studies of Small Language Models (SLMs) have identified toxicity inherent in these models. In this work, we study and benchmark the extent to which SLMs can be prompted to generate toxic language. The following SLMs are evaluated for their toxicity levels: GPT-2 Large, Gemma-2B, Mistral-7B, Falcon-7B, and Llama 2-13B. We take a step toward understanding the correlation between toxicity and the intrinsic parameters of these state-of-the-art SLMs. Next, we study the efficacy of a basic word-filtering approach to controlled text generation. We then establish a mathematical basis for computing the weighted toxicity of continuations with respect to the toxicity of prompts by treating toxicity as a fuzzy metric. Finally, we extend our analysis to examine the unexpected toxicity levels of continuations generated from non-toxic prompts.
Pages: 6