MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask

Cited by: 0
Authors
Zhou, Yupeng [1, 2]
Zhou, Daquan [2]
Wang, Yaxing [1]
Feng, Jiashi [2]
Hou, Qibin [1]
Affiliations
[1] Nankai Univ, VCIP, CS, Tianjin 300350, Peoples R China
[2] ByteDance, Singapore, Singapore
Keywords
Diffusion model; Text-to-image generation; Conditional mask
DOI
10.1007/s11263-024-02294-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent advancements in diffusion models have showcased their impressive capacity to generate visually striking images. However, ensuring a close match between the generated image and the given prompt remains a persistent challenge. In this work, we identify that a crucial factor leading to the erroneous generation of objects and their attributes is the inadequate cross-modality relation learning between the prompt and the generated images. To better align the prompt and image content, we advance the cross-attention with an adaptive mask, which is conditioned on the attention maps and the prompt embeddings, to dynamically adjust the contribution of each text token to the image features. This mechanism explicitly diminishes the ambiguity in the semantic information embedding of the text encoder, leading to a boost of text-to-image consistency in the synthesized images. Our method, termed MaskDiffusion, is training-free and hot-pluggable for popular pre-trained diffusion models. When applied to the latent diffusion models, our MaskDiffusion can largely enhance their capability to correctly generate objects and their attributes, with negligible computation overhead compared to the original diffusion models. Our project page is https://github.com/HVision-NKU/MaskDiffusion.
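As a rough illustration of the idea in the abstract, the sketch below re-weights a cross-attention map with a per-position, per-token mask before the text values are aggregated. It is not the authors' algorithm: the function names, the array shapes, and in particular the thresholding rule in `threshold_mask` are assumptions made for this example. In the paper the mask is also conditioned on the prompt embeddings, which this sketch does not model.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def masked_cross_attention(queries, keys, values, mask):
    """Cross-attention whose per-token contributions are scaled by a mask.

    queries: (P, d) image features, one row per spatial position.
    keys, values: (T, d) projections of the T text tokens.
    mask: (P, T) weights in [0, 1]; a hypothetical stand-in for the
          adaptive mask described in the abstract.
    Returns the (P, d) output and the unmasked (P, T) attention map.
    """
    d = queries.shape[-1]
    attn = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    # Scale each token's contribution at each position, then aggregate.
    return (attn * mask) @ values, attn


def threshold_mask(attn, keep_ratio=0.5):
    """Assumed mask rule, for illustration only.

    For every token, keep the positions whose attention reaches at least
    keep_ratio times that token's peak attention; zero out the rest.
    """
    peak = attn.max(axis=0, keepdims=True)
    return (attn >= keep_ratio * peak).astype(attn.dtype)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(6, 4))   # 6 image positions
    k = rng.normal(size=(3, 4))   # 3 text tokens
    v = rng.normal(size=(3, 4))
    _, attn = masked_cross_attention(q, k, v, np.ones((6, 3)))
    out, _ = masked_cross_attention(q, k, v, threshold_mask(attn))
    print(out.shape)
```

With a mask of all ones the function reduces to ordinary cross-attention, which is one way to see why such a mechanism can be added to a pre-trained model without retraining.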
Pages: 2805-2824
Page count: 20
Related papers
50 in total
  • [21] Surgical text-to-image generation
    Nwoye, Chinedu Innocent
    Bose, Rupak
    Elgohary, Kareem
    Arboit, Lorenzo
    Carlino, Giorgio
    Lavanchy, Joel L.
    Mascagni, Pietro
    Padoy, Nicolas
    PATTERN RECOGNITION LETTERS, 2025, 190: 73-80
  • [22] Expressive Text-to-Image Generation with Rich Text
    Ge, Songwei
    Park, Taesung
    Zhu, Jun-Yan
    Huang, Jia-Bin
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023: 7511-7522
  • [23] CT-GAN: A conditional Generative Adversarial Network of transformer architecture for text-to-image
    Zhang, Xin
    Jiao, Wentao
    Wang, Bing
    Tian, Xuedong
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 115
  • [24] DAC-GAN: Dual Auxiliary Consistency Generative Adversarial Network for Text-to-Image Generation
    Wang, Zhiwei
    Yang, Jing
    Cui, Jiajun
    Liu, Jiawei
    Wang, Jiahao
    COMPUTER VISION - ACCV 2022, PT VII, 2023, 13847: 3-19
  • [25] SEMANTICALLY INVARIANT TEXT-TO-IMAGE GENERATION
    Sah, Shagan
    Peri, Dheeraj
    Shringi, Ameya
    Zhang, Chi
    Dominguez, Miguel
    Savakis, Andreas
    Ptucha, Ray
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018: 3783-3787
  • [26] Instance Mask Embedding and Attribute-Adaptive Generative Adversarial Network for Text-to-Image Synthesis
    Ni, Jiancheng
    Zhang, Susu
    Zhou, Zili
    Hou, Jie
    Gao, Feng
    IEEE ACCESS, 2020, 8 (08): 37697-37711
  • [27] Semantics Disentangling for Text-to-Image Generation
    Yin, Guojun
    Liu, Bin
    Sheng, Lu
    Yu, Nenghai
    Wang, Xiaogang
    Shao, Jing
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 2322-2331
  • [28] Text-to-Image Generation for Abstract Concepts
    Liao, Jiayi
    Chen, Xu
    Fu, Qiang
    Du, Lun
    He, Xiangnan
    Wang, Xiang
    Han, Shi
    Zhang, Dongmei
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024: 3360-3368
  • [29] Shifted Diffusion for Text-to-image Generation
    Zhou, Yufan
    Liu, Bingchen
    Zhu, Yizhe
    Yang, Xiao
    Chen, Changyou
    Xu, Jinhui
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 10157-10166
  • [30] Mobile App for Text-to-Image Synthesis
    Kang, Ryan
    Sunil, Athira
    Chen, Min
    MOBILE COMPUTING, APPLICATIONS, AND SERVICES, MOBICASE 2019, 2019, 290: 32-43