Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models

Cited by: 3
Authors
Qu, Yiting [1 ]
Shen, Xinyue [1 ]
He, Xinlei [1 ]
Backes, Michael [1 ]
Zannettou, Savvas [2 ]
Zhang, Yang [1 ]
Affiliations
[1] CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
[2] Delft University of Technology, Delft, Netherlands
Keywords
Text-To-Image Models; Unsafe Images; Hateful Memes
DOI
10.1145/3576915.3616679
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art Text-to-Image models like Stable Diffusion and DALL·E 2 are revolutionizing how people generate visual content. At the same time, society has serious concerns about how adversaries can exploit such models to generate problematic or unsafe images. In this work, we focus on demystifying the generation of unsafe images and hateful memes from Text-to-Image models. We first construct a typology of unsafe images consisting of five categories (sexually explicit, violent, disturbing, hateful, and political). Then, we assess the proportion of unsafe images generated by four advanced Text-to-Image models using four prompt datasets. We find that Text-to-Image models can generate a substantial percentage of unsafe images; across the four models and four prompt datasets, 14.56% of all generated images are unsafe. When comparing the four Text-to-Image models, we find different risk levels, with Stable Diffusion being the most prone to generating unsafe content (18.92% of all generated images are unsafe). Given Stable Diffusion's tendency to generate more unsafe content, we evaluate its potential to generate hateful meme variants if exploited by an adversary to attack a specific individual or community. We employ three image editing methods supported by Stable Diffusion, DreamBooth, Textual Inversion, and SDEdit, to generate the variants. Our evaluation results show that 24% of the images generated with DreamBooth are hateful meme variants that present the features of the original hateful meme and the target individual/community; these generated images are comparable to hateful meme variants collected from the real world. Overall, our results demonstrate that the danger of large-scale generation of unsafe images is imminent. We discuss several mitigation measures, such as curating training data, regulating prompts, and implementing safety filters, and encourage the development of better safeguard tools to prevent unsafe generation.
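The generate-then-check pipeline the abstract describes can be approximated with off-the-shelf tooling. The sketch below is a minimal illustration, not the authors' released code: it assumes the Hugging Face diffusers library and the public runwayml/stable-diffusion-v1-5 checkpoint (the model choice, prompts, and file names are placeholders), reads the pipeline's built-in NSFW safety-checker flags, and then shows an SDEdit-style edit via the img2img pipeline, which is one of the three editing methods the paper evaluates.

    # Minimal sketch (assumes Hugging Face diffusers; not the paper's code).
    import torch
    from PIL import Image
    from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # 1) Text-to-image generation with the built-in safety checker enabled.
    #    The paper draws prompts from four prompt datasets; this single
    #    prompt is a placeholder.
    t2i = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
    out = t2i(["a placeholder prompt"])
    for image, flagged in zip(out.images, out.nsfw_content_detected):
        print("flagged as potentially unsafe" if flagged else "passed the safety checker")

    # 2) SDEdit-style editing: the img2img pipeline noises an input image and
    #    denoises it under a new prompt; `strength` controls how far the
    #    result may drift from the source image.
    i2i = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
    source = Image.open("source_meme.png").convert("RGB").resize((512, 512))  # placeholder file
    variant = i2i(prompt="a placeholder edit prompt", image=source, strength=0.6).images[0]
    variant.save("variant.png")

Note that the built-in safety checker mainly targets sexually explicit content, so its flags are at best a partial proxy for the paper's broader five-category typology (sexually explicit, violent, disturbing, hateful, and political).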
Pages: 3403-3417
Page count: 15