Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models

Cited: 1
Authors
Zhang, Yasi [1 ]
Yu, Peiyu [1 ]
Wu, Ying Nian [1 ]
Institution
[1] Univ Calif Los Angeles, Dept Stat & Data Sci, Los Angeles, CA 90095 USA
Keywords
Attention Map Alignment; Energy-Based Models; Text-to-Image Diffusion Models;
DOI
10.1007/978-3-031-72946-1_4
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-to-image diffusion models have shown great success in generating high-quality text-guided images. Yet, these models may still fail to semantically align generated images with the provided text prompts, leading to problems like incorrect attribute binding and/or catastrophic object neglect. Given the pervasive object-oriented structure underlying text prompts, we introduce a novel object-conditioned Energy-Based Attention Map Alignment (EBAMA) method to address the aforementioned problems. We show that an object-centric attribute binding loss naturally emerges by approximately maximizing the log-likelihood of a z-parameterized energy-based model with the help of the negative sampling technique. We further propose an object-centric intensity regularizer to prevent excessive shifts of objects' attention towards their attributes. Extensive qualitative and quantitative experiments, including human evaluation, on several challenging benchmarks demonstrate the superior performance of our method over previous strong counterparts. With better-aligned attention maps, our approach shows great promise in further enhancing the text-controlled image editing ability of diffusion models. The code is available at https://github.com/YasminZhang/EBAMA.
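The abstract describes two components: a contrastive-style attribute binding loss derived from negative sampling on an energy-based model, and an intensity regularizer that keeps an object's attention from being overtaken by its attributes. The sketch below is a minimal, hypothetical illustration of that general idea over cross-attention maps, assuming cosine similarity as the (negative) energy and a simple mass-comparison regularizer; function names and specifics are illustrative assumptions, not the authors' exact EBAMA objective (see their repository for the real implementation).

```python
import numpy as np

def attribute_binding_loss(attn_obj, attn_attr, attn_negatives, temperature=1.0):
    """Contrastive binding loss (illustrative): pull an attribute's attention
    map toward its own object's map (positive pair) and push it away from
    other objects' maps (negative samples), in the spirit of negative-sampling
    maximum-likelihood training of an energy-based model."""
    def sim(a, b):
        # Cosine similarity between flattened attention maps.
        a, b = a.ravel(), b.ravel()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    pos = np.exp(sim(attn_obj, attn_attr) / temperature)
    neg = sum(np.exp(sim(a, attn_attr) / temperature) for a in attn_negatives)
    # Negative log-probability of the positive pair among all candidates.
    return -np.log(pos / (pos + neg))

def intensity_regularizer(attn_obj, attn_attr):
    """Penalize the object's total attention mass falling below its
    attribute's, discouraging excessive shift of attention to attributes."""
    return max(0.0, float(attn_attr.sum() - attn_obj.sum()))
```

Both terms would be summed over object–attribute pairs in a prompt and used to update the latent at sampling time; the loss is strictly positive whenever any negative map exists, and shrinks as the attribute's map aligns with its object's.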
Pages: 55 - 71
Page count: 17
Related Papers
50 records in total
  • [31] The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
    Avrahami, Omri
    Hertz, Amir
    Vinker, Yael
    Arar, Moab
    Fruchter, Shlomi
    Fried, Ohad
    Cohen-Or, Daniel
    Lischinski, Dani
    PROCEEDINGS OF SIGGRAPH 2024 CONFERENCE PAPERS, 2024,
  • [32] Exposing fake images generated by text-to-image diffusion models
    Xu, Qiang
    Wang, Hao
    Meng, Laijin
    Mi, Zhongjie
    Yuan, Jianye
    Yan, Hong
    PATTERN RECOGNITION LETTERS, 2023, 176 : 76 - 82
  • [34] Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
    Saharia, Chitwan
    Chan, William
    Saxena, Saurabh
    Li, Lala
    Whang, Jay
    Denton, Emily
    Ghasemipour, Seyed Kamyar Seyed
    Ayan, Burcu Karagol
    Mahdavi, S. Sara
    Gontijo-Lopes, Raphael
    Salimans, Tim
    Ho, Jonathan
    Fleet, David J.
    Norouzi, Mohammad
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [35] Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
    Wu, Xiaoshi
    Hao, Yiming
    Zhang, Manyuan
    Sun, Keqiang
    Huang, Zhaoyang
    Song, Guanglu
    Liu, Yu
    Li, Hongsheng
    COMPUTER VISION - ECCV 2024, PT LXXXIII, 2025, 15141 : 108 - 124
  • [36] Adversarial attacks and defenses on text-to-image diffusion models: A survey
    Zhang, Chenyu
    Hu, Mingwang
    Li, Wenhui
    Wang, Lanjun
    INFORMATION FUSION, 2025, 114
  • [37] Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models
    Gong, Chao
    Chen, Kai
    Wei, Zhipeng
    Chen, Jingjing
    Jiang, Yu-Gang
    COMPUTER VISION - ECCV 2024, PT LIII, 2025, 15111 : 73 - 88
  • [38] DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models
    Ahn, Namhyuk
    Lee, Junsoo
    Lee, Chunggi
    Kim, Kunhee
    Kim, Daesik
    Nam, Seung-Hun
    Hong, Kibeom
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 674 - 681
  • [39] Towards Consistent Video Editing with Text-to-Image Diffusion Models
    Zhang, Zicheng
    Li, Bonan
    Nie, Xuecheng
    Han, Congying
    Guo, Tiande
    Liu, Luoqi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [40] Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion
    Jung, Sanghyun
    Jung, Seohyeon
    Kim, Balhae
    Choi, Moonseok
    Shin, Jinwoo
    Lee, Juho
    COMPUTER VISION - ECCV 2024, PT LXVII, 2025, 15125 : 128 - 145