Visual Adversarial Examples Jailbreak Aligned Large Language Models

Cited by: 0
Affiliation: Princeton University, United States
Source: Proc. AAAI Conf. Artif. Intell., Vol. 19, pp. 21527-21536
Keywords: Computational linguistics
DOI: not available
Related papers (50 total)
  • [31] Large Language Models for Code: Security Hardening and Adversarial Testing
    He, Jingxuan
    Vechev, Martin
    PROCEEDINGS OF THE 2023 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, CCS 2023, 2023, : 1865 - 1879
  • [32] On Evaluating Adversarial Robustness of Large Vision-Language Models
    Zhao, Yunqing
    Pang, Tianyu
    Du, Chao
    Yang, Xiao
    Li, Chongxuan
    Cheung, Ngai-Man
    Lin, Min
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [33] Large Language Models are Visual Reasoning Coordinators
    Chen, Liangyu
    Li, Bo
    Shen, Sheng
    Yang, Jingkang
    Li, Chunyuan
    Keutzer, Kurt
    Darrell, Trevor
    Liu, Ziwei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [34] Visual cognition in multimodal large language models
    Schulze Buschoff, Luca M.
    Akata, Elif
    Bethge, Matthias
    Schulz, Eric
    NATURE MACHINE INTELLIGENCE, 2025, 7 (01) : 96 - 106
  • [35] Learning to Retrieve In-Context Examples for Large Language Models
    Wang, Liang
    Yang, Nan
    Wei, Furu
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 1752 - 1767
  • [36] Generation and Validation of Teaching Examples Based on Large Language Models
    He, Qing
    Wang, Yu
    Rao, Gaoqi
    2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024, 2024, : 389 - 395
  • [37] BaThe: Defense against the Jailbreak Attack in Multimodal Large Language Models by Treating Harmful Instruction as Backdoor Trigger
    Chen, Yulin
    Li, Haoran
    Zheng, Zihao
    Song, Yangqiu
    arXiv preprint
  • [38] Efficient Generation of Targeted and Transferable Adversarial Examples for Vision-Language Models via Diffusion Models
    Guo, Qi
    Pang, Shanmin
    Jia, Xiaojun
    Liu, Yang
    Guo, Qing
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2025, 20 : 1333 - 1348
  • [39] Using Adversarial Examples in Natural Language Processing
    Belohlavek, Petr
    Platek, Ondrej
    Zabokrtsky, Zdenek
    Straka, Milan
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 3693 - 3700
  • [40] Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
    Shen, Guobin
    Zhao, Dongcheng
    Dong, Yiting
    He, Xiang
    Zeng, Yi
    arXiv preprint