Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models

Cited by: 3
Authors
Qu, Yiting [1 ]
Shen, Xinyue [1 ]
He, Xinlei [1 ]
Backes, Michael [1 ]
Zannettou, Savvas [2 ]
Zhang, Yang [1 ]
Affiliations
[1] CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
[2] Delft University of Technology, Delft, Netherlands
Keywords
Text-To-Image Models; Unsafe Images; Hateful Memes;
DOI
10.1145/3576915.3616679
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art Text-to-Image models like Stable Diffusion and DALL·E 2 are revolutionizing how people generate visual content. At the same time, society has serious concerns about how adversaries can exploit such models to generate problematic or unsafe images. In this work, we focus on demystifying the generation of unsafe images and hateful memes from Text-to-Image models. We first construct a typology of unsafe images consisting of five categories (sexually explicit, violent, disturbing, hateful, and political). Then, we assess the proportion of unsafe images generated by four advanced Text-to-Image models using four prompt datasets. We find that Text-to-Image models can generate a substantial percentage of unsafe images; across four models and four prompt datasets, 14.56% of all generated images are unsafe. When comparing the four Text-to-Image models, we find different risk levels, with Stable Diffusion being the most prone to generating unsafe content (18.92% of all generated images are unsafe). Given Stable Diffusion's tendency to generate more unsafe content, we evaluate its potential to generate hateful meme variants if exploited by an adversary to attack a specific individual or community. We employ three image editing methods, DreamBooth, Textual Inversion, and SDEdit, which are supported by Stable Diffusion to generate variants. Our evaluation result shows that 24% of the generated images using DreamBooth are hateful meme variants that present the features of the original hateful meme and the target individual/community; these generated images are comparable to hateful meme variants collected from the real world. Overall, our results demonstrate that the danger of large-scale generation of unsafe images is imminent. We discuss several mitigating measures, such as curating training data, regulating prompts, and implementing safety filters, and encourage better safeguard tools to be developed to prevent unsafe generation.
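The percentages reported in the abstract (e.g., 14.56% overall, 18.92% for Stable Diffusion) are unsafe-image rates: the fraction of generated images assigned to any of the five unsafe categories. A minimal sketch of that aggregation step is shown below; the `unsafe_rate` helper and the per-image label format are illustrative assumptions, not the paper's actual pipeline, and the classifier that produces the labels is out of scope here.

```python
# Sketch: aggregating per-image safety labels into an unsafe-image rate.
# Each label is either None (safe) or one of the paper's five
# unsafe categories. The helper name and label format are hypothetical.
CATEGORIES = ("sexually explicit", "violent", "disturbing", "hateful", "political")

def unsafe_rate(labels):
    """Return the fraction of images labeled with any unsafe category."""
    if not labels:
        return 0.0
    unsafe = sum(1 for label in labels if label is not None)
    return unsafe / len(labels)

# Toy example: 3 unsafe images out of 8 generated -> rate of 0.375.
labels = [None, "violent", None, None, "hateful", None, "disturbing", None]
rate = unsafe_rate(labels)
```

In the paper's setting, this rate would be computed per model and per prompt dataset, which is how the four models' differing risk levels are compared.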
Pages: 3403-3417 (15 pages)