Jailbreak Attack for Large Language Models: A Survey

Authors
Li, Nan [1]
Ding, Yidong [1]
Jiang, Haoyu [1]
Niu, Jiafei [1]
Yi, Ping [1]
Affiliations
[1] School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Funding
National Natural Science Foundation of China
Keywords
Computational linguistics - Multi agent systems - Natural language processing systems - Network security - Speech processing
DOI
10.7544/issn1000-1239.202330962
Abstract
In recent years, large language models (LLMs) have been widely applied to a range of downstream tasks and have demonstrated remarkable text understanding, generation, and reasoning capabilities across many fields. However, jailbreak attacks are emerging as a new threat to LLMs. Jailbreak attacks can bypass the security mechanisms of LLMs, weaken the influence of safety alignment, and induce harmful outputs from aligned LLMs. The abuse, hijacking, and leakage that jailbreak attacks cause pose serious threats to both dialogue systems and applications built on LLMs. We present a systematic review of recent jailbreak attacks and categorize them into three types based on their underlying mechanism: manually designed attacks, LLM-generated attacks, and optimization-based attacks. We summarize the core principles, implementation methods, and research findings of the relevant studies and trace the evolution of jailbreak attacks on LLMs, offering a reference for future research. We also give a concise overview of existing security measures, introducing techniques from the perspectives of internal defense and external defense that aim to mitigate jailbreak attacks and improve the content security of LLM generation. Finally, we discuss the open challenges and frontier directions in the field of jailbreak attacks on LLMs and examine the potential of multimodal approaches, model editing, and multi-agent methodologies in tackling jailbreak attacks, providing insights and research prospects for advancing LLM security. © 2024 Science Press. All rights reserved.
Pages: 1156-1181