Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation

被引:0
|
作者
Cantini, Riccardo [1 ]
Cosenza, Giada [1 ]
Orsino, Alessio [1 ]
Talia, Domenico [1 ]
机构
[1] Univ Calabria, Arcavacata Di Rende, CS, Italy
来源
关键词
Large Language Models; Bias; Stereotype; Jailbreak; Adversarial Robustness; Sustainable Artificial Intelligence;
D O I
10.1007/978-3-031-78977-9_4
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Large Language Models (LLMs) have revolutionized artificial intelligence, demonstrating remarkable computational power and linguistic capabilities. However, these models are inherently prone to various biases stemming from their training data. These include selection, linguistic, and confirmation biases, along with common stereotypes related to gender, ethnicity, sexual orientation, religion, socioeconomic status, disability, and age. This study explores the presence of these biases within the responses given by the most recent LLMs, analyzing the impact on their fairness and reliability. We also investigate how known prompt engineering techniques can be exploited to effectively reveal hidden biases of LLMs, testing their adversarial robustness against jailbreak prompts specially crafted for bias elicitation. Extensive experiments are conducted using the most widespread LLMs at different scales, confirming that LLMs can still be manipulated to produce biased or inappropriate responses, despite their advanced capabilities and sophisticated alignment processes. Our findings underscore the importance of enhancing mitigation techniques to address these safety issues, toward a more sustainable and inclusive artificial intelligence.
引用
收藏
页码:52 / 68
页数:17
相关论文
共 50 条
  • [1] Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
    University of Calabria, Italy
    arXiv, 1600,
  • [2] Assessing political bias in large language models
    Rettenberger, Luca
    Reischl, Markus
    Schutera, Mark
    JOURNAL OF COMPUTATIONAL SOCIAL SCIENCE, 2025, 8 (02):
  • [3] TERMS OF EQUALITY - A GUIDE TO BIAS-FREE LANGUAGE
    PICKENS, JE
    PERSONNEL JOURNAL, 1985, 64 (08) : 24 - &
  • [4] Visual Adversarial Examples Jailbreak Aligned Large Language Models
    Princeton University, United States
    Proc. AAAI Conf. Artif. Intell., 19 (21527-21536):
  • [5] Visual Adversarial Examples Jailbreak Aligned Large Language Models
    Qi, Xiangyu
    Huang, Kaixuan
    Panda, Ashwinee
    Henderson, Peter
    Wang, Mengdi
    Mittal, Prateek
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 19, 2024, : 21527 - 21536
  • [6] Bias-free language in research as a tool to prevent ageism
    Aquino, Marcos Paulo Miranda de
    Cristina, Elisangela
    Hernandes, Ramos
    Alfonsi, Maynara do Amaral
    Perracini, Monica
    BRAZILIAN JOURNAL OF PHYSICAL THERAPY, 2025, 29 (03)
  • [7] Don’t Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models
    Yu, Zhiyuan
    Liu, Xiaogeng
    Liang, Shunning
    Cameron, Zach
    Xiao, Chaowei
    Zhang, Ning
    arXiv,
  • [8] Assessing the Risk of Bias in Randomized Clinical Trials With Large Language Models
    Lai, Honghao
    Ge, Long
    Sun, Mingyao
    Pan, Bei
    Huang, Jiajie
    Hou, Liangying
    Yang, Qiuyu
    Liu, Jiayi
    Liu, Jianing
    Ye, Ziying
    Xia, Danni
    Zhao, Weilong
    Wang, Xiaoman
    Liu, Ming
    Talukdar, Jhalok Ronjan
    Tian, Jinhui
    Yang, Kehu
    Estill, Janne
    JAMA NETWORK OPEN, 2024, 7 (05) : E2412687
  • [10] Bias-Free Language: LGBTQ+Clients and the New APA Manual
    Noble, Nicole
    Bradley, Loretta
    Hendricks, Bret
    JOURNAL OF LGBTQ ISSUES IN COUNSELING, 2021, 15 (01): : 128 - 139