Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation

Cited by: 0
Authors
Cantini, Riccardo [1]
Cosenza, Giada [1]
Orsino, Alessio [1]
Talia, Domenico [1]
Affiliations
[1] Univ Calabria, Arcavacata di Rende, CS, Italy
Keywords
Large Language Models; Bias; Stereotype; Jailbreak; Adversarial Robustness; Sustainable Artificial Intelligence
DOI
10.1007/978-3-031-78977-9_4
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large Language Models (LLMs) have revolutionized artificial intelligence, demonstrating remarkable computational power and linguistic capabilities. However, these models are inherently prone to various biases stemming from their training data, including selection, linguistic, and confirmation biases, along with common stereotypes related to gender, ethnicity, sexual orientation, religion, socioeconomic status, disability, and age. This study explores the presence of these biases in the responses of the most recent LLMs, analyzing their impact on model fairness and reliability. We also investigate how known prompt engineering techniques can be exploited to reveal hidden biases, testing the adversarial robustness of LLMs against jailbreak prompts specifically crafted for bias elicitation. Extensive experiments conducted on the most widespread LLMs at different scales confirm that these models can still be manipulated into producing biased or inappropriate responses, despite their advanced capabilities and sophisticated alignment processes. Our findings underscore the importance of stronger mitigation techniques to address these safety issues, toward a more sustainable and inclusive artificial intelligence.
Pages: 52-68
Page count: 17