Localization and Manipulation of Immoral Visual Cues for Safe Text-to-Image Generation

Cited by: 0

Authors:

Park, Seongbeom [1]

Moon, Suhong [2]

Park, Seunghyun [3]

Kim, Jinkyu [1]

Affiliations:

[1] Korea Univ, CSE, Seoul, South Korea

[2] Univ Calif Berkeley, EECS, Berkeley, CA USA

[3] NAVER Cloud AI, Seoul, South Korea

Source:

2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024) | 2024

Keywords:

DOI:

10.1109/WACV57701.2024.00461

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Current text-to-image generation methods produce high-resolution, high-quality images, but they should not produce immoral images that may contain inappropriate content from the perspective of commonsense morality. Conventional approaches, however, often neglect these ethical concerns, and existing solutions are often limited in their ability to ensure moral compatibility. To address this, we propose a novel method with three main capabilities: (1) our model recognizes the degree of visual commonsense immorality of a given generated image, (2) our model localizes the immoral visual (and textual) attributes that make the image visually immoral, and (3) our model manipulates such immoral visual cues into a morally qualifying alternative. We conduct experiments with various text-to-image generation models, including the state-of-the-art Stable Diffusion model, demonstrating the efficacy of our ethical image manipulation approach. Our human study further confirms that our method is indeed able to generate morally satisfying images from immoral ones.


Pages: 4663-4672

Page count: 10
