Prompt suffix-attack against text-to-image diffusion models

Cited by: 0
Authors
Xiong, Siyun [1 ]
Du, Yanhui [1 ]
Wang, Zhuohao [2 ]
Sun, Peiqi [1 ]
Affiliations
[1] People's Public Security University of China, Institute of Information and Network Security, Beijing 100038, People's Republic of China
[2] Beihang University, School of Computer Science and Engineering, Beijing 100191, People's Republic of China
Keywords
Text-to-image diffusion models; Prompt suffix attack; Adversarial robustness; CLIP model
DOI
10.1016/j.neucom.2025.129659
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-image diffusion models (T2I DMs) have achieved excellent performance in image generation. However, the adversarial robustness of T2I DMs has not been sufficiently explored. Most existing works focus on guiding T2I DMs to generate not-safe-for-work (NSFW) images, primarily aiming to bypass the models' safety checkers rather than address the adversarial vulnerabilities inherent to the models themselves. Some existing studies have perturbed the content of generated images by appending a number of suffix characters to the text prompt, an approach we call the prompt suffix attack (PSA); however, they lack a systematic exploration of the underlying mechanisms of these attacks. In this work, we conduct a detailed study to better understand such character-level failure modes of the T2I DMs' text encoders. To this end, we investigate the performance of various PSA strategies targeting the text encoder of Stable Diffusion (SD). We incorporate four established approaches, namely Particle Swarm Optimization (PSO), Simulated Annealing (SA), HotFlip, and GPT-based methods. Furthermore, we propose an integrated gradient-free algorithm, G&S, and systematically compare its performance with existing methods using quantitative metrics and visualization techniques. Experimental results demonstrate that G&S achieves significant advantages in character-level perturbation attacks. Subsequently, we use G&S to investigate the mechanism behind the character-level failure modes of SD's text encoder, attributing these vulnerabilities to two key factors: the fragility of contextual encoding and the nonlinear effects within the semantic space. Additionally, through extensive experiments in both gray-box and black-box settings, we demonstrate that this text-encoder vulnerability is pervasive across T2I DMs, offering new insights for future research on the robustness of T2I DMs.
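To make the attack setting concrete, the sketch below illustrates a character-level prompt suffix attack against a CLIP text encoder. It is not the paper's G&S algorithm (nor its PSO, SA, HotFlip, or GPT-based baselines); it is a minimal, gradient-free greedy search, assumed for illustration only, that appends suffix characters one at a time to maximize the cosine distance between the perturbed and clean pooled text embeddings. The model id "openai/clip-vit-large-patch14" (the text encoder used by Stable Diffusion v1.x), the suffix length, and the candidate character set are assumptions, not details from the paper.

import string
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Assumed model id: the CLIP text encoder used by Stable Diffusion v1.x.
MODEL_ID = "openai/clip-vit-large-patch14"
tokenizer = CLIPTokenizer.from_pretrained(MODEL_ID)
text_encoder = CLIPTextModel.from_pretrained(MODEL_ID).eval()


def pooled_embedding(prompt: str) -> torch.Tensor:
    """Return the L2-normalized pooled text embedding of a prompt."""
    tokens = tokenizer(
        prompt,
        padding="max_length",
        max_length=tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        out = text_encoder(tokens.input_ids)
    emb = out.pooler_output[0]
    return emb / emb.norm()


def greedy_suffix_attack(prompt: str, suffix_len: int = 5,
                         charset: str = string.ascii_lowercase + string.punctuation):
    """Greedily append suffix characters that push the embedding away from the clean prompt."""
    clean = pooled_embedding(prompt)
    suffix = ""
    best_dist = 0.0
    for _ in range(suffix_len):
        best_char = None
        for ch in charset:
            cand = pooled_embedding(prompt + " " + suffix + ch)
            dist = 1.0 - torch.dot(clean, cand).item()  # cosine distance to the clean embedding
            if best_char is None or dist > best_dist:
                best_char, best_dist = ch, dist
        suffix += best_char
    return prompt + " " + suffix, best_dist


if __name__ == "__main__":
    adv_prompt, dist = greedy_suffix_attack("a photo of a cat sitting on a sofa")
    print(f"adversarial prompt: {adv_prompt!r}  (cosine distance {dist:.4f})")

In a full attack, the perturbed prompt would then be fed to the diffusion model; a larger embedding distance generally indicates a stronger perturbation of the generated image content, which is the quantity the search above maximizes.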
Pages: 10