Prompt suffix-attack against text-to-image diffusion models

Cited by: 0
Authors
Xiong, Siyun [1 ]
Du, Yanhui [1 ]
Wang, Zhuohao [2 ]
Sun, Peiqi [1 ]
Affiliations
[1] Peoples Publ Secur Univ China, Inst Informat & Network Secur, Beijing 100038, Peoples R China
[2] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
Keywords
Text-to-image diffusion models; Prompt suffix attack; Adversarial robustness; CLIP model;
DOI
10.1016/j.neucom.2025.129659
CLC classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-image diffusion models (T2I DMs) have achieved excellent performance in image generation. However, the adversarial robustness of T2I DMs has not been sufficiently explored. Most existing works focus on guiding T2I DMs to generate not-safe-for-work (NSFW) images, primarily aiming to bypass the safety checkers of T2I DMs rather than address the adversarial vulnerabilities inherent to the models themselves. Some existing studies have perturbed the content of the generated images by appending a certain number of suffix characters to the text prompts, an approach we call the prompt suffix attack (PSA); however, they lack a systematic exploration of the underlying mechanisms of these attacks. In this work, we conduct a detailed study to better understand such character-level failure modes of the T2I DMs' text encoders. To this end, we investigate the performance of various PSA strategies targeting the text encoder of Stable Diffusion (SD). We incorporate four established algorithms, namely Particle Swarm Optimization (PSO), Simulated Annealing (SA), HotFlip, and GPT-based approaches. Furthermore, we propose an integrated gradient-free algorithm, G&S, and systematically compare its performance with existing methods using quantitative metrics and visualization techniques. Experimental results demonstrate that G&S achieves significant advantages in character-level perturbation attacks. Subsequently, we use G&S to investigate the mechanism behind the character-level failure modes of SD's text encoder, attributing these vulnerabilities to two key factors: the fragility of contextual encoding and the nonlinear effects within the semantic space. Additionally, through extensive experiments in both gray-box and black-box settings, we demonstrate that this vulnerability in the text encoder is pervasive across T2I DMs, offering new insights for future research on the robustness of T2I DMs.
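
Illustrative note. The sketch below makes the PSA setting concrete on the CLIP text encoder used by Stable Diffusion v1.x: append a short character suffix to a prompt and search for the suffix that pushes the text embedding farthest from the clean prompt's embedding. The plain random search shown here is only an illustrative gradient-free baseline, not the paper's G&S algorithm (nor PSO, SA, HotFlip, or the GPT-based approach); the model ID, suffix length, and iteration budget are assumptions chosen for the example.

# Minimal sketch of a character-level prompt suffix attack (PSA) baseline.
# This is NOT the paper's G&S method; it is a plain random search that only
# illustrates the attack objective: append a few characters so that the CLIP
# text-encoder embedding drifts away from the clean prompt's embedding.
import random
import string

import torch
from transformers import CLIPTokenizer, CLIPTextModel

MODEL_ID = "openai/clip-vit-large-patch14"  # text encoder used by SD v1.x (assumed target)
tokenizer = CLIPTokenizer.from_pretrained(MODEL_ID)
text_encoder = CLIPTextModel.from_pretrained(MODEL_ID).eval()


@torch.no_grad()
def encode(prompt: str) -> torch.Tensor:
    """Return the pooled CLIP text embedding for a prompt."""
    tokens = tokenizer(prompt, padding="max_length", truncation=True,
                       max_length=tokenizer.model_max_length,
                       return_tensors="pt")
    return text_encoder(**tokens).pooler_output.squeeze(0)


def suffix_attack(prompt: str, suffix_len: int = 5, iters: int = 200) -> str:
    """Gradient-free random-search baseline: pick the suffix that maximizes
    the cosine distance between clean and perturbed text embeddings."""
    clean = encode(prompt)
    charset = string.ascii_letters + string.digits + string.punctuation
    best_suffix, best_dist = "", 0.0
    for _ in range(iters):
        suffix = "".join(random.choices(charset, k=suffix_len))
        perturbed = encode(prompt + " " + suffix)
        dist = 1.0 - torch.cosine_similarity(clean, perturbed, dim=0).item()
        if dist > best_dist:
            best_suffix, best_dist = suffix, dist
    return prompt + " " + best_suffix


if __name__ == "__main__":
    # The adversarial prompt can then be fed to a T2I pipeline to check how
    # far the generated image drifts from the clean prompt's content.
    print(suffix_attack("a photo of a cat sitting on a sofa"))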
Pages: 10