Efficient Generation of Targeted and Transferable Adversarial Examples for Vision-Language Models via Diffusion Models

Cited: 0
Authors
Guo, Qi [1 ,2 ]
Pang, Shanmin [1 ]
Jia, Xiaojun [3 ]
Liu, Yang [3 ]
Guo, Qing [2 ,4 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian 710049, Peoples R China
[2] Agcy Sci Technol & Res, Ctr Frontier AI Res, Singapore 138632, Singapore
[3] Nanyang Technol Univ, Coll Comp & Data Sci, Singapore 639798, Singapore
[4] Agcy Sci Technol & Res, Inst High Performance Comp, Singapore 138632, Singapore
Funding
National Natural Science Foundation of China; National Research Foundation, Singapore;
Keywords
Adversarial attack; visual language models; diffusion models; score matching;
D O I
10.1109/TIFS.2024.3518072
CLC number
TP301 [Theory and Methods];
Discipline code
081202 ;
Abstract
Adversarial attacks, particularly targeted transfer-based attacks, can be used to assess the adversarial robustness of large vision-language models (VLMs), allowing potential security flaws to be examined more thoroughly before deployment. However, previous transfer-based adversarial attacks incur high costs due to large iteration counts and complex method structures. Moreover, because the adversarial semantics appear unnatural, the generated adversarial examples transfer poorly. These issues limit the utility of existing methods for robustness assessment. To address them, we propose AdvDiffVLM, which uses diffusion models to generate natural, unrestricted, and targeted adversarial examples via score matching. Specifically, AdvDiffVLM uses Adaptive Ensemble Gradient Estimation (AEGE) to modify the score during the diffusion model's reverse generation process, ensuring that the produced adversarial examples carry natural targeted adversarial semantics, which improves their transferability. Simultaneously, to improve the quality of the adversarial examples, we use GradCAM-guided Mask Generation (GCMG) to disperse adversarial semantics throughout the image rather than concentrating them in a single area. Finally, AdvDiffVLM embeds more target semantics into the adversarial examples over multiple iterations. Experimental results show that our method generates adversarial examples 5x to 10x faster than state-of-the-art (SOTA) transfer-based adversarial attacks while maintaining higher-quality adversarial examples. Furthermore, the adversarial examples generated by our method transfer better than those of previous transfer-based attacks. Notably, AdvDiffVLM can successfully attack a variety of commercial VLMs in a black-box setting, including GPT-4V. The code is available at https://github.com/gq-max/AdvDiffVLM.
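The abstract's core mechanism, shifting the diffusion model's score with an ensemble-estimated, mask-gated adversarial gradient during reverse generation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the fixed ensemble weights (the real AEGE adapts them), and the Euler-Maruyama-style update are all illustrative assumptions.

```python
import numpy as np

def ensemble_gradient(x, grad_fns, weights=None):
    """Average adversarial gradients from several surrogate models.

    Toy stand-in for Adaptive Ensemble Gradient Estimation (AEGE);
    the adaptive re-weighting of the actual method is omitted here.
    """
    if weights is None:
        weights = np.full(len(grad_fns), 1.0 / len(grad_fns))
    return sum(w * g(x) for w, g in zip(weights, grad_fns))

def guided_reverse_step(x_t, score, adv_grad, mask, scale=1.0,
                        step=0.01, noise=None):
    """One score-guided reverse-diffusion update (Euler-Maruyama style).

    The model's score estimate is shifted by a mask-gated adversarial
    gradient, so target semantics are injected only where the mask
    allows -- loosely mirroring GradCAM-guided Mask Generation (GCMG).
    """
    if noise is None:
        noise = np.random.standard_normal(x_t.shape)
    guided = score + scale * mask * adv_grad   # modified score
    return x_t + step * guided + np.sqrt(2.0 * step) * noise
```

Running this step repeatedly from noise, with `score` from a pretrained denoiser and `adv_grad` pointing toward the target caption's embedding, is the rough shape of the generation loop described in the abstract.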
Pages: 1333-1348
Page count: 16
Related papers
50 records total
  • [21] Diffusion Models for Imperceptible and Transferable Adversarial Attack
    Chen, Jianqi
    Chen, Hao
    Chen, Keyan
    Zhang, Yilan
    Zou, Zhengxia
    Shi, Zhenwei
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (02) : 961 - 977
  • [22] Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
    Luo, Gen
    Zhou, Yiyi
    Ren, Tianhe
    Chen, Shengxin
    Sun, Xiaoshuai
    Ji, Rongrong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [23] Collaboration between clinicians and vision-language models in radiology report generation
    Tanno, Ryutaro
    Barrett, David G. T.
    Sellergren, Andrew
    Ghaisas, Sumedh
    Dathathri, Sumanth
    See, Abigail
    Welbl, Johannes
    Lau, Charles
    Tu, Tao
    Azizi, Shekoofeh
    Singhal, Karan
    Schaekermann, Mike
    May, Rhys
    Lee, Roy
    Man, SiWai
    Mahdavi, Sara
    Ahmed, Zahra
    Matias, Yossi
    Barral, Joelle
    Eslami, S. M. Ali
    Belgrave, Danielle
    Liu, Yun
    Kalidindi, Sreenivasa Raju
    Shetty, Shravya
    Natarajan, Vivek
    Kohli, Pushmeet
    Huang, Po-Sen
    Karthikesalingam, Alan
    Ktena, Ira
    NATURE MEDICINE, 2025, 31 (02) : 599 - 608
  • [24] HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models
    Ning, Shan
    Qiu, Longtian
    Liu, Yongfei
    He, Xuming
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 23507 - 23517
  • [25] Debiasing vision-language models for vision tasks: a survey
    Zhu, Beier
    Zhang, Hanwang
    FRONTIERS OF COMPUTER SCIENCE, 2025, 19 (01)
  • [26] Correctable Landmark Discovery via Large Models for Vision-Language Navigation
    Lin, Bingqian
    Nie, Yunshuang
    Wei, Ziming
    Zhu, Yi
    Xu, Hang
    Ma, Shikui
    Liu, Jianzhuang
    Liang, Xiaodan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 8534 - 8548
  • [27] Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
    Ye, Shuquan
    Xie, Yujia
    Chen, Dongdong
    Xu, Yichong
    Yuan, Lu
    Zhu, Chenguang
    Liao, Jing
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2634 - 2645
  • [28] Transferable adversarial examples can efficiently fool topic models
    Wang, Zhen
    Zheng, Yitao
    Zhu, Hai
    Yang, Chang
    Chen, Tianyi
    COMPUTERS & SECURITY, 2022, 118
  • [29] Unsupervised Prototype Adapter for Vision-Language Models
    Zhang, Yi
    Zhang, Ce
    Hu, Xueting
    He, Zhihai
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 197 - 209
  • [30] Conditional Prompt Learning for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 16795 - 16804