DE-GAN: Text-to-image synthesis with dual and efficient fusion model

Cited by: 3
Authors
Jiang, Bin [1 ,2 ]
Zeng, Weiyuan [1 ,2 ]
Yang, Chao [1 ]
Wang, Renjun [1 ]
Zhang, Bolin [1 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Hunan Univ, Key Lab Embedded & Network Comp Hunan Prov, Changsha 410082, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text-to-image synthesis; Generative adversarial network; Cross-modal; Attention mechanism; ATTENTION;
DOI
10.1007/s11042-023-16377-8
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Generating diverse and plausible images conditioned on given captions is an attractive but challenging task. While many existing studies have presented impressive results, text-to-image synthesis still suffers from two problems. (1) Noise is injected only at the very beginning of generation, which hurts the diversity of the final results. (2) Most previous models exploit non-local-like spatial attention mechanisms to introduce fine-grained word-level information into the generation process, which makes them too storage-consuming for mobile and embedded applications. In this paper, we propose a novel Dual and Efficient Fusion Generative Adversarial Network (DE-GAN) to address these issues. To balance the diversity and fidelity of generated images, DE-GAN uses Dual Injection Blocks to inject noise and text embeddings into the model multiple times during generation. In addition, an efficient condition channel attention module is designed to capture the correlations between the text and image modalities and to guide the network in refining image features with as little storage overhead as possible, allowing the model to fit resource-constrained applications. Comprehensive experiments on two benchmark datasets demonstrate that DE-GAN efficiently generates more diverse and photo-realistic images than previous methods.
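The abstract outlines two architectural ideas: repeated injection of noise and text embeddings via Dual Injection Blocks, and a lightweight condition channel attention module for text-guided feature refinement. The PyTorch sketch below illustrates how such components might be assembled; the module names, the affine-modulation scheme, and the dimensions (`text_dim`, `noise_dim`, `reduction`) are illustrative assumptions for exposition, not the authors' released implementation.

```python
# Illustrative sketch only: all layer choices and dimensions are assumptions,
# not taken from the DE-GAN paper or its code release.
import torch
import torch.nn as nn


class ConditionChannelAttention(nn.Module):
    """Channel-wise gating conditioned on the sentence embedding.

    A squeeze-and-excitation-style bottleneck keeps the parameter count
    small, in line with the paper's goal of low storage overhead.
    """

    def __init__(self, channels: int, text_dim: int, reduction: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels + text_dim, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W), text: (B, text_dim)
        pooled = x.mean(dim=(2, 3))                       # squeeze spatial dims
        weights = self.gate(torch.cat([pooled, text], dim=1))
        return x * weights.unsqueeze(-1).unsqueeze(-1)    # per-channel gating


class DualInjectionBlock(nn.Module):
    """Re-injects both noise and the text embedding at one generator stage."""

    def __init__(self, channels: int, text_dim: int, noise_dim: int):
        super().__init__()
        # Predict per-channel scale and shift from the concatenated condition.
        self.affine = nn.Linear(text_dim + noise_dim, channels * 2)
        self.attn = ConditionChannelAttention(channels, text_dim)
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x, text, noise):
        gamma, beta = self.affine(torch.cat([text, noise], dim=1)).chunk(2, dim=1)
        x = x * (1 + gamma.unsqueeze(-1).unsqueeze(-1)) + beta.unsqueeze(-1).unsqueeze(-1)
        x = self.attn(x, text)                            # text-guided channel refinement
        return self.conv(torch.relu(x))


if __name__ == "__main__":
    block = DualInjectionBlock(channels=64, text_dim=256, noise_dim=100)
    feats = torch.randn(4, 64, 32, 32)
    sent = torch.randn(4, 256)
    z = torch.randn(4, 100)
    print(block(feats, sent, z).shape)  # torch.Size([4, 64, 32, 32])
```

Stacking several such blocks across upsampling stages would realize the abstract's idea of injecting noise and text multiple times rather than only at the input; the channel attention replaces spatial (non-local) attention, trading per-pixel word alignment for a much smaller parameter footprint.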
Pages: 23839-23852
Page count: 14
Related papers
50 records in total
  • [1] DE-GAN: Text-to-image synthesis with dual and efficient fusion model
    Jiang, Bin
    Zeng, Weiyuan
    Yang, Chao
    Wang, Renjun
    Zhang, Bolin
    [J]. Multimedia Tools and Applications, 2024, 83 : 23839 - 23852
  • [2] Dual Adversarial Inference for Text-to-Image Synthesis
    Lao, Qicheng
    Havaei, Mohammad
    Pesaranghader, Ahmad
    Dutil, Francis
    Di Jorio, Lisa
    Fevens, Thomas
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 7566 - 7575
  • [3] SF-GAN: Semantic fusion generative adversarial networks for text-to-image synthesis
    Yang, Bing
    Xiang, Xueqin
    Kong, Wanzeng
    Zhang, Jianhai
    Yao, Jinliang
    [J]. Expert Systems with Applications, 2025, 262
  • [4] GMF-GAN: Gradual multi-granularity semantic fusion GAN for text-to-image synthesis
    Jin, Dehu
    Li, Guangju
    Yu, Qi
    Yu, Lan
    Cui, Jia
    Qi, Meng
    [J]. DIGITAL SIGNAL PROCESSING, 2023, 140
  • [5] Efficient Neural Architecture for Text-to-Image Synthesis
    Souza, Douglas M.
    Wehrmann, Jonatas
    Ruiz, Duncan D.
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [6] DMF-GAN: Deep Multimodal Fusion Generative Adversarial Networks for Text-to-Image Synthesis
    Yang, Bing
    Xiang, Xueqin
    Kong, Wanzeng
    Zhang, Jianhai
    Peng, Yong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 6956 - 6967
  • [7] AraBERT and DF-GAN fusion for Arabic text-to-image generation
    Bahani, Mourad
    El Ouaazizi, Aziza
    Maalmi, Khalil
    [J]. Array, 2022, 16
  • [8] Modified GAN with Proposed Feature Set for Text-to-Image Synthesis
    Talasila, Vamsidhar
    Narasingarao, M. R.
    Mohan, V. Murali
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2023, 37 (04)
  • [9] SWF-GAN: A Text-to-Image model based on sentence-word fusion perception
    Liu, Chun
    Hu, Jingsong
    Lin, Hong
    [J]. COMPUTERS & GRAPHICS-UK, 2023, 115 : 500 - 510