Unveiling Toxic Tendencies of Small Language Models in Unconstrained Generation Tasks

Cited by: 0

Authors:

Chandra, Lakshay [1]

Susan, Seba [1]

Kumar, Dhruv [1]

Kant, Krishan [1]

Affiliations:

[1] Delhi Technol Univ, Dept Informat Technol, Delhi 110042, India

Source:

10TH INTERNATIONAL CONFERENCE ON ELECTRONICS, COMPUTING AND COMMUNICATION TECHNOLOGIES, CONECCT 2024 | 2024

Keywords:

toxicity analysis; small language models; language generation; deep learning

DOI:

10.1109/CONECCT62155.2024.10677188

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

The prevalence of toxicity online presents a significant challenge for platforms and publishers alike. Recent studies conducted on Small Language Models (SLMs) have identified the inherent toxicity that dwells in these models. In this work, we study and benchmark the extent to which SLMs can be prompted to generate toxic language. The following SLMs are evaluated for their toxicity levels: GPT-2 Large, Gemma-2B, Mistral-7B, Falcon-7B, and Llama 2-13B. We take a step toward understanding the correlation between toxicity and the intrinsic parameters of state-of-the-art SLMs. Next, we study the efficacy of a basic word-filtering approach to controlled text generation. We then establish a mathematical basis for computing the weighted toxicity of continuations with respect to the toxicity of prompts by treating toxicity as a fuzzy metric. Finally, we extend our analysis to examine the unexpected toxicity levels of generated continuations when prompted with non-toxic inputs.

Pages: 6
