Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models

Cited by: 1
Authors
Zhang, Yasi [1 ]
Yu, Peiyu [1 ]
Wu, Ying Nian [1 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Stat & Data Sci, Los Angeles, CA 90095 USA
Source
Keywords
Attention Map Alignment; Energy-Based Models; Text-to-Image Diffusion Models;
DOI
10.1007/978-3-031-72946-1_4
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-to-image diffusion models have shown great success in generating high-quality text-guided images. Yet, these models may still fail to semantically align generated images with the provided text prompts, leading to problems like incorrect attribute binding and/or catastrophic object neglect. Given the pervasive object-oriented structure underlying text prompts, we introduce a novel object-conditioned Energy-Based Attention Map Alignment (EBAMA) method to address the aforementioned problems. We show that an object-centric attribute binding loss naturally emerges by approximately maximizing the log-likelihood of a z-parameterized energy-based model with the help of the negative sampling technique. We further propose an object-centric intensity regularizer to prevent excessive shifts of objects' attention towards their attributes. Extensive qualitative and quantitative experiments, including human evaluation, on several challenging benchmarks demonstrate the superior performance of our method over previous strong counterparts. With better aligned attention maps, our approach shows great promise in further enhancing the text-controlled image editing ability of diffusion models. The code is available at https://github.com/YasminZhang/EBAMA.
Pages: 55-71 (17 pages)
Related Papers
50 records total
  • [1] Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models
    Park, Geon Yeong
    Kim, Jeongsol
    Kim, Beomsu
    Lee, Sang Wan
    Ye, Jong Chul
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [2] Temporal Adaptive Attention Map Guidance for Text-to-Image Diffusion Models
    Jung, Sunghoon
    Heo, Yong Seok
ELECTRONICS, 2025, 14 (03)
  • [3] Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models
    Wang, Ruichen
    Chen, Zekang
    Chen, Chen
    Ma, Jian
    Lu, Haonan
    Lin, Xiaodong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 5544 - 5552
  • [4] Debiasing Text-to-Image Diffusion Models
    He, Ruifei
    Xue, Chuhui
    Tan, Haoru
    Zhang, Wenqing
    Yu, Yingchen
    Bai, Song
    Qi, Xiaojuan
    PROCEEDINGS OF THE 1ST ACM MULTIMEDIA WORKSHOP ON MULTI-MODAL MISINFORMATION GOVERNANCE IN THE ERA OF FOUNDATION MODELS, MIS 2024, 2024, : 29 - 36
  • [5] From text to mask: Localizing entities using the attention of text-to-image diffusion models
    Xiao, Changming
    Yang, Qi
    Zhou, Feng
    Zhang, Changshui
    NEUROCOMPUTING, 2024, 610
  • [6] Text-to-Image Generation Method Based on Object Enhancement and Attention Maps
    Huang, Yongsen
    Cai, Xiaodong
    An, Yuefan
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2025, 16 (01) : 961 - 968
  • [7] Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
    Chefer, Hila
    Alaluf, Yuval
    Vinker, Yael
    Wolf, Lior
    Cohen-Or, Daniel
ACM TRANSACTIONS ON GRAPHICS, 2023, 42 (04)
  • [8] Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models
    Zhang, Yang
    Tzun, Teoh Tze
    Hern, Lim Wei
    Kawaguchi, Kenji
    COMPUTER VISION - ECCV 2024, PT LXXXVI, 2025, 15144 : 70 - 86
  • [9] Unveiling and Mitigating Memorization in Text-to-Image Diffusion Models Through Cross Attention
    Ren, Jie
    Liu, Yaxin
    Zhang, Shenglai
    Xu, Han
    Lyu, Lingjuan
    Xing, Yue
    Tang, Jiliang
    COMPUTER VISION - ECCV 2024, PT LXXVII, 2024, 15135 : 340 - 356
  • [10] Localizing Object-level Shape Variations with Text-to-Image Diffusion Models
    Patashnik, Or
    Garibi, Daniel
    Azuri, Idan
    Averbuch-Elor, Hadar
    Cohen-Or, Daniel
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22994 - 23004