Efficient Generation of Targeted and Transferable Adversarial Examples for Vision-Language Models via Diffusion Models

被引:0
|
作者
Guo, Qi [1 ,2 ]
Pang, Shanmin [1 ]
Jia, Xiaojun [3 ]
Liu, Yang [3 ]
Guo, Qing [2 ,4 ]
机构
[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian 710049, Peoples R China
[2] Agcy Sci Technol & Res, Ctr Frontier AI Res, Singapore 138632, Singapore
[3] Nanyang Technol Univ, Coll Comp & Data Sci, Singapore 639798, Singapore
[4] Agcy Sci Technol & Res, Inst High Performance Comp, Singapore 138632, Singapore
基金
中国国家自然科学基金; 新加坡国家研究基金会;
关键词
Adversarial attack; visual language models; diffusion models; score matching;
D O I
10.1109/TIFS.2024.3518072
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Adversarial attacks, particularly targeted transfer-based attacks, can be used to assess the adversarial robustness of large visual-language models (VLMs), allowing for a more thorough examination of potential security flaws before deployment. However, previous transfer-based adversarial attacks incur high costs due to high iteration counts and complex method structure. Furthermore, due to the unnaturalness of adversarial semantics, the generated adversarial examples have low transferability. These issues limit the utility of existing methods for assessing robustness. To address these issues, we propose AdvDiffVLM, which uses diffusion models to generate natural, unrestricted and targeted adversarial examples via score matching. Specifically, AdvDiffVLM uses Adaptive Ensemble Gradient Estimation (AEGE) to modify the score during the diffusion model's reverse generation process, ensuring that the produced adversarial examples have natural adversarial targeted semantics, which improves their transferability. Simultaneously, to improve the quality of adversarial examples, we use the GradCAM-guided Mask Generation (GCMG) to disperse adversarial semantics throughout the image rather than concentrating them in a single area. Finally, AdvDiffVLM embeds more target semantics into adversarial examples after multiple iterations. Experimental results show that our method generates adversarial examples 5x to 10x faster than state-of-the-art (SOTA) transfer-based adversarial attacks while maintaining higher quality adversarial examples. Furthermore, compared to previous transfer-based adversarial attacks, the adversarial examples generated by our method have better transferability. Notably, AdvDiffVLM can successfully attack a variety of commercial VLMs in a black-box environment, including GPT-4V. The code is available at https://github.com/gq-max/AdvDiffVLM.
引用
收藏
页码:1333 / 1348
页数:16
相关论文
共 50 条
  • [1] Adversarial Prompt Tuning for Vision-Language Models
    Zhang, Jiaming
    Ma, Xingjun
    Wang, Xin
    Qiu, Lingyu
    Wang, Jiaqi
    Jiang, Yu-Gang
    Sang, Jitao
    COMPUTER VISION - ECCV 2024, PT XLV, 2025, 15103 : 56 - 72
  • [2] On Evaluating Adversarial Robustness of Large Vision-Language Models
    Zhao, Yunqing
    Pang, Tianyu
    Du, Chao
    Yang, Xiao
    Li, Chongxuan
    Cheung, Ngai-Man
    Lin, Min
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] Boosting adversarial transferability in vision-language models via multimodal feature heterogeneity
    Chen, Long
    Chen, Yuling
    Ouyang, Zhi
    Dou, Hui
    Zhang, Yangwen
    Sang, Haiwei
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [4] Transferable Multimodal Attack on Vision-Language Pre-training Models
    Wang, Haodi
    Dong, Kai
    Zhu, Zhilei
    Qin, Haotong
    Liu, Aishan
    Fang, Xiaolin
    Wang, Jiakai
    Liu, Xianglong
    45TH IEEE SYMPOSIUM ON SECURITY AND PRIVACY, SP 2024, 2024, : 1722 - 1740
  • [5] MixPrompt: Enhancing Generalizability and Adversarial Robustness for Vision-Language Models via Prompt Fusion
    Fan, Hao
    Ma, Zhaoyang
    Li, Yong
    Tian, Rui
    Chen, Yunli
    Gao, Chenlong
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IX, ICIC 2024, 2024, 14870 : 328 - 339
  • [6] Universal Adversarial Perturbations for Vision-Language Pre-trained Models
    Zhang, Peng-Fei
    Huang, Zi
    Bai, Guangdong
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 862 - 871
  • [7] VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
    Yin, Ziyi
    Ye, Muchao
    Zhang, Tianrong
    Du, Tianyu
    Zhu, Jinguo
    Liu, Han
    Chen, Jinghui
    Wang, Ting
    Ma, Fenglong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [8] Towards Adversarial Attack on Vision-Language Pre-training Models
    Zhang, Jiaming
    Yi, Qi
    Sang, Jitao
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5005 - 5013
  • [9] FlexAttention for Efficient High-Resolution Vision-Language Models
    Li, Junyan
    Chen, Delin
    Cai, Tianle
    Chen, Peihao
    Hong, Yining
    Chen, Zhenfang
    Shen, Yikang
    Gan, Chuang
    COMPUTER VISION - ECCV 2024, PT XXV, 2025, 15083 : 286 - 302
  • [10] Vision-Language Models for Vision Tasks: A Survey
    Zhang, Jingyi
    Huang, Jiaxing
    Jin, Sheng
    Lu, Shijian
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5625 - 5644