Prompt suffix-attack against text-to-image diffusion models

Cited by: 0
Authors
Xiong, Siyun [1 ]
Du, Yanhui [1 ]
Wang, Zhuohao [2 ]
Sun, Peiqi [1 ]
Affiliations
[1] People's Public Security University of China, Institute of Information and Network Security, Beijing 100038, People's Republic of China
[2] Beihang University, School of Computer Science and Engineering, Beijing 100191, People's Republic of China
Keywords
Text-to-image diffusion models; Prompt suffix attack; Adversarial robustness; CLIP model
DOI
10.1016/j.neucom.2025.129659
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-image diffusion models (T2I DMs) have achieved excellent performance in image generation. However, the adversarial robustness of T2I DMs has not been sufficiently explored. Most existing works focus on guiding T2I DMs to generate not-safe-for-work (NSFW) images, primarily aiming to bypass the models' safety checkers rather than address the adversarial vulnerabilities inherent to the models themselves. Some existing studies have perturbed the content of generated images by appending a number of suffix characters to the text prompt, an approach we call the prompt suffix attack (PSA); however, they lack a systematic exploration of the underlying mechanisms of these attacks. In this work, we conduct a detailed study to better understand such character-level failure modes of the T2I DMs' text encoders. To this end, we investigate the performance of various PSA strategies targeting the text encoder of Stable Diffusion (SD). We incorporate four established approaches, namely Particle Swarm Optimization (PSO), Simulated Annealing (SA), HotFlip, and GPT-based methods. Furthermore, we propose an integrated gradient-free algorithm, G&S, and systematically compare its performance with existing methods using quantitative metrics and visualization techniques. Experimental results demonstrate that G&S achieves significant advantages in character-level perturbation attacks. Subsequently, we use G&S to investigate the mechanism behind the character-level failure modes of SD's text encoder, attributing these vulnerabilities to two key factors: the fragility of contextual encoding and the nonlinear effects within the semantic space. Additionally, through extensive experiments in both gray-box and black-box settings, we demonstrate that this text-encoder vulnerability is pervasive across T2I DMs, offering new insights for future research on the robustness of T2I DMs.
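To make the attack setting concrete, the sketch below illustrates a character-level prompt suffix attack against a CLIP text encoder. It is not the paper's G&S algorithm (nor its PSO, SA, HotFlip, or GPT-based baselines); it is a minimal, gradient-free greedy search, assumed for illustration only, that appends suffix characters one at a time to maximize the cosine distance between the perturbed and clean pooled text embeddings. The model id "openai/clip-vit-large-patch14" (the text encoder used by Stable Diffusion v1.x), the suffix length, and the candidate character set are assumptions, not details from the paper.

import string
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Assumed model id: the CLIP text encoder used by Stable Diffusion v1.x.
MODEL_ID = "openai/clip-vit-large-patch14"
tokenizer = CLIPTokenizer.from_pretrained(MODEL_ID)
text_encoder = CLIPTextModel.from_pretrained(MODEL_ID).eval()


def pooled_embedding(prompt: str) -> torch.Tensor:
    """Return the L2-normalized pooled text embedding of a prompt."""
    tokens = tokenizer(
        prompt,
        padding="max_length",
        max_length=tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        out = text_encoder(tokens.input_ids)
    emb = out.pooler_output[0]
    return emb / emb.norm()


def greedy_suffix_attack(prompt: str, suffix_len: int = 5,
                         charset: str = string.ascii_lowercase + string.punctuation):
    """Greedily append suffix characters that push the embedding away from the clean prompt."""
    clean = pooled_embedding(prompt)
    suffix = ""
    best_dist = 0.0
    for _ in range(suffix_len):
        best_char = None
        for ch in charset:
            cand = pooled_embedding(prompt + " " + suffix + ch)
            dist = 1.0 - torch.dot(clean, cand).item()  # cosine distance to the clean embedding
            if best_char is None or dist > best_dist:
                best_char, best_dist = ch, dist
        suffix += best_char
    return prompt + " " + suffix, best_dist


if __name__ == "__main__":
    adv_prompt, dist = greedy_suffix_attack("a photo of a cat sitting on a sofa")
    print(f"adversarial prompt: {adv_prompt!r}  (cosine distance {dist:.4f})")

In a full attack, the perturbed prompt would then be fed to the diffusion model; a larger embedding distance generally indicates a stronger perturbation of the generated image content, which is the quantity the search above maximizes.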
Pages: 10