Visual Adversarial Examples Jailbreak Aligned Large Language Models

Cited by: 0
Affiliation: Princeton University, United States
Source: Proc. AAAI Conf. Artif. Intell., 2024, 38 (19): 21527-21536
Keywords: Computational linguistics
Related papers (50 in total; entries [41]-[50] shown below):
  • [41] A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily
    National Key Laboratory for Novel Software Technology, Nanjing University, China
    arXiv
  • [42] Hidden You Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Logic Chain Injection
    The Pennsylvania State University, United States
    arXiv
  • [43] MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
    Deng, Gelei; Liu, Yi; Li, Yuekang; Wang, Kailong; Zhang, Ying; Li, Zefeng; Wang, Haoyu; Zhang, Tianwei; Liu, Yang
    arXiv, 2023
  • [44] CloudLeak: Large-Scale Deep Learning Models Stealing Through Adversarial Examples
    Yu, Honggang; Yang, Kaichen; Zhang, Teng; Tsai, Yun-Yun; Ho, Tsung-Yi; Jin, Yier
    27th Annual Network and Distributed System Security Symposium (NDSS 2020), 2020
  • [45] Adversarial Attacks and Defenses in Large Language Models: Old and New Threats
    Schwinn, Leo; Dobre, David; Günnemann, Stephan; Gidel, Gauthier
    Proceedings on "I Can't Believe It's Not Better: Failure Modes in the Age of Foundation Models" at NeurIPS 2023 Workshops, 2023, 239: 103-117
  • [46] Analyzing the Use of Large Language Models for Content Moderation with ChatGPT Examples
    Franco, Mirko; Gaggi, Ombretta; Palazzi, Claudio E.
    Proceedings of the 2023 Workshop on Open Challenges in Online Social Networks (OASIS 2023) / 34th ACM Conference on Hypertext and Social Media (HT 2023), 2023: 1-8
  • [47] Evolving Interpretable Visual Classifiers with Large Language Models
    Chiquier, Mia; Mall, Utkarsh; Vondrick, Carl
    Computer Vision - ECCV 2024, Part LXIV, 2025, 15122: 183-201
  • [48] Generating transferable adversarial examples based on perceptually-aligned perturbation
    Chen, Hongqiao; Lu, Keda; Wang, Xianmin; Li, Jin
    International Journal of Machine Learning and Cybernetics, 2021, 12 (11): 3295-3307
  • [50] Generating adversarial examples with collaborative generative models
    Xu, Lei; Zhai, Junhai
    International Journal of Information Security, 2024, 23 (02): 1077-1091