Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models

Cited: 1
Authors
Zhang, Yasi [1 ]
Yu, Peiyu [1 ]
Wu, Ying Nian [1 ]
Institution
[1] Univ Calif Los Angeles, Dept Stat & Data Sci, Los Angeles, CA 90095 USA
Keywords
Attention Map Alignment; Energy-Based Models; Text-to-Image Diffusion Models;
DOI
10.1007/978-3-031-72946-1_4
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-to-image diffusion models have shown great success in generating high-quality text-guided images. Yet, these models may still fail to semantically align generated images with the provided text prompts, leading to problems like incorrect attribute binding and/or catastrophic object neglect. Given the pervasive object-oriented structure underlying text prompts, we introduce a novel object-conditioned Energy-Based Attention Map Alignment (EBAMA) method to address the aforementioned problems. We show that an object-centric attribute binding loss naturally emerges by approximately maximizing the log-likelihood of a z-parameterized energy-based model with the help of the negative sampling technique. We further propose an object-centric intensity regularizer to prevent excessive shifts of objects' attention towards their attributes. Extensive qualitative and quantitative experiments, including human evaluation, on several challenging benchmarks demonstrate the superior performance of our method over previous strong counterparts. With better-aligned attention maps, our approach shows great promise in further enhancing the text-controlled image editing ability of diffusion models. The code is available at https://github.com/YasminZhang/EBAMA.
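The abstract describes two components: a contrastive-style attribute binding loss derived from negative sampling on an energy-based model, and an intensity regularizer that keeps an object's attention from being overtaken by its attributes. The sketch below is a minimal, hypothetical illustration of that general idea over cross-attention maps, assuming cosine similarity as the (negative) energy and a simple mass-comparison regularizer; function names and specifics are illustrative assumptions, not the authors' exact EBAMA objective (see their repository for the real implementation).

```python
import numpy as np

def attribute_binding_loss(attn_obj, attn_attr, attn_negatives, temperature=1.0):
    """Contrastive binding loss (illustrative): pull an attribute's attention
    map toward its own object's map (positive pair) and push it away from
    other objects' maps (negative samples), in the spirit of negative-sampling
    maximum-likelihood training of an energy-based model."""
    def sim(a, b):
        # Cosine similarity between flattened attention maps.
        a, b = a.ravel(), b.ravel()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    pos = np.exp(sim(attn_obj, attn_attr) / temperature)
    neg = sum(np.exp(sim(a, attn_attr) / temperature) for a in attn_negatives)
    # Negative log-probability of the positive pair among all candidates.
    return -np.log(pos / (pos + neg))

def intensity_regularizer(attn_obj, attn_attr):
    """Penalize the object's total attention mass falling below its
    attribute's, discouraging excessive shift of attention to attributes."""
    return max(0.0, float(attn_attr.sum() - attn_obj.sum()))
```

Both terms would be summed over object–attribute pairs in a prompt and used to update the latent at sampling time; the loss is strictly positive whenever any negative map exists, and shrinks as the attribute's map aligns with its object's.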
Pages: 55 - 71
Page count: 17
Related Papers
50 records in total
  • [31] The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
    Avrahami, Omri
    Hertz, Amir
    Vinker, Yael
    Arar, Moab
    Fruchter, Shlomi
    Fried, Ohad
    Cohen-Or, Daniel
    Lischinski, Dani
    PROCEEDINGS OF SIGGRAPH 2024 CONFERENCE PAPERS, 2024,
  • [32] Exposing fake images generated by text-to-image diffusion models
    Xu, Qiang
    Wang, Hao
    Meng, Laijin
    Mi, Zhongjie
    Yuan, Jianye
    Yan, Hong
    PATTERN RECOGNITION LETTERS, 2023, 176 : 76 - 82
  • [34] Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
    Saharia, Chitwan
    Chan, William
    Saxena, Saurabh
    Li, Lala
    Whang, Jay
    Denton, Emily
    Ghasemipour, Seyed Kamyar Seyed
    Ayan, Burcu Karagol
    Mahdavi, S. Sara
    Gontijo-Lopes, Raphael
    Salimans, Tim
    Ho, Jonathan
    Fleet, David J.
    Norouzi, Mohammad
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [35] Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
    Wu, Xiaoshi
    Hao, Yiming
    Zhang, Manyuan
    Sun, Keqiang
    Huang, Zhaoyang
    Song, Guanglu
    Liu, Yu
    Li, Hongsheng
    COMPUTER VISION - ECCV 2024, PT LXXXIII, 2025, 15141 : 108 - 124
  • [36] Adversarial attacks and defenses on text-to-image diffusion models: A survey
    Zhang, Chenyu
    Hu, Mingwang
    Li, Wenhui
    Wang, Lanjun
    INFORMATION FUSION, 2025, 114
  • [37] Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models
    Gong, Chao
    Chen, Kai
    Wei, Zhipeng
    Chen, Jingjing
    Jiang, Yu-Gang
    COMPUTER VISION - ECCV 2024, PT LIII, 2025, 15111 : 73 - 88
  • [38] DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models
    Ahn, Namhyuk
    Lee, Junsoo
    Lee, Chunggi
    Kim, Kunhee
    Kim, Daesik
    Nam, Seung-Hun
    Hong, Kibeom
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 674 - 681
  • [39] Towards Consistent Video Editing with Text-to-Image Diffusion Models
    Zhang, Zicheng
    Li, Bonan
    Nie, Xuecheng
    Han, Congying
    Guo, Tiande
    Liu, Luoqi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [40] Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion
    Jung, Sanghyun
    Jung, Seohyeon
    Kim, Balhae
    Choi, Moonseok
    Shin, Jinwoo
    Lee, Juho
    COMPUTER VISION - ECCV 2024, PT LXVII, 2025, 15125 : 128 - 145