Visual Adversarial Examples Jailbreak Aligned Large Language Models

Cited by: 0
Authors
Institution
Princeton University, United States [1]
Source
Proc. AAAI Conf. Artif. Intell., vol. 19, pp. 21527-21536
Keywords
Computational linguistics
DOI
Not available
Related papers
50 in total
  • [21] Lion: Adversarial Distillation of Proprietary Large Language Models
    Jiang, Yuxin
    Chan, Chunkit
    Chen, Mingyang
    Wang, Wei
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, pp. 3134-3154
  • [22] Adversarial examples for generative models
    Kos, Jernej
    Fischer, Ian
    Song, Dawn
    2018 IEEE SYMPOSIUM ON SECURITY AND PRIVACY WORKSHOPS (SPW 2018), 2018, pp. 36-42
  • [23] Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing
    Zhao, Wei
    Li, Zhe
    Li, Yige
    Zhang, Ye
    Sun, Jun
    EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024, 2024, pp. 5094-5109
  • [24] FUZZLLM: A NOVEL AND UNIVERSAL FUZZING FRAMEWORK FOR PROACTIVELY DISCOVERING JAILBREAK VULNERABILITIES IN LARGE LANGUAGE MODELS
    Yao, Dongyu
    Zhang, Jianshu
    Harris, Ian G.
    Carlsson, Marcel
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, pp. 4485-4489
  • [25] Adversarial Examples for Models of Code
    Yefet, Noam
    Alon, Uri
    Yahav, Eran
    PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 2020, 4 (04)
  • [26] Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing
    Zhao, Wei
    Li, Zhe
    Li, Yige
    Zhang, Ye
    Sun, Jun
    arXiv
  • [27] Generating Natural Language Adversarial Examples
    Alzantot, Moustafa
    Sharma, Yash
    Elgohary, Ahmed
    Ho, Bo-Jhang
    Srivastava, Mani B.
    Chang, Kai-Wei
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, pp. 2890-2896
  • [28] Reevaluating Adversarial Examples in Natural Language
    Morris, John X.
    Lifland, Eli
    Lanchantin, Jack
    Ji, Yangfeng
    Qi, Yanjun
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, pp. 3829-3839
  • [29] Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples
    Cisse, Moustapha
    Adi, Yossi
    Neverova, Natalia
    Keshet, Joseph
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [30] Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning
    Chen, Hongge
    Zhang, Huan
    Chen, Pin-Yu
    Yi, Jinfeng
    Hsieh, Cho-Jui
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, pp. 2587-2597