DE-GAN: Text-to-image synthesis with dual and efficient fusion model

Cited by: 2
Authors
Jiang, Bin [1 ,2 ]
Zeng, Weiyuan [1 ,2 ]
Yang, Chao [1 ]
Wang, Renjun [1 ]
Zhang, Bolin [1 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Hunan Univ, Key Lab Embedded & Network Comp Hunan Prov, Changsha 410082, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text-to-image synthesis; Generative adversarial network; Cross-modal; Attention mechanism; ATTENTION;
DOI
10.1007/s11042-023-16377-8
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Generating diverse and plausible images conditioned on given captions is an attractive but challenging task. Although many existing studies have presented impressive results, text-to-image synthesis still suffers from two problems. (1) Noise is injected only at the very beginning of generation, which hurts the diversity of the final results. (2) Most previous models exploit non-local-like spatial attention mechanisms to introduce fine-grained word-level information into the generation process, which makes them too storage-intensive for mobile and embedded applications. In this paper, we propose a novel Dual and Efficient Fusion Generative Adversarial Network (DE-GAN) to address these issues. To balance the diversity and fidelity of generated images, DE-GAN uses Dual Injection Blocks to inject noise and text embeddings into the model simultaneously, multiple times during generation. In addition, DE-GAN includes an efficient condition channel attention module that captures the correlations between the text and image modalities and guides the network in refining image features with as little storage overhead as possible, enabling the model to adapt to resource-constrained applications. Comprehensive experiments on two benchmark datasets demonstrate that DE-GAN efficiently generates more diverse and photo-realistic images than previous methods.
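The abstract describes its two mechanisms only at a high level, so the pure-Python sketch below illustrates the general ideas under stated assumptions; it is not DE-GAN's published implementation. It assumes that a dual injection step predicts a per-channel scale and shift from the concatenation of a fresh noise vector and the sentence embedding (an affine-modulation design chosen here for illustration), and that condition channel attention average-pools each channel, appends the sentence embedding, and produces one sigmoid gate per channel (a squeeze-and-excitation-style gate). All function names (`linear`, `random_linear`, `dual_injection`, `condition_channel_attention`), the toy dimensions, and the random weights are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]          # rows of floats
Layer = Tuple[Matrix, Vector]       # (weight: out_dim x in_dim, bias: out_dim)


def linear(x: Vector, layer: Layer) -> Vector:
    """Dense layer y = W x + b applied to a single vector."""
    weight, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def random_linear(in_dim: int, out_dim: int, rng: random.Random, scale: float = 0.1) -> Layer:
    """Small random weights and zero bias; illustrative only, not a trained model."""
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


def dual_injection(features: Matrix, noise: Vector, text: Vector,
                   gamma_layer: Layer, beta_layer: Layer) -> Matrix:
    """Hypothetical dual injection: [noise; text] -> per-channel scale and shift.

    features holds C channels, each a flattened list of H*W activations.
    """
    cond = noise + text                      # concatenate noise and sentence embedding
    gamma = linear(cond, gamma_layer)        # one scale offset per channel
    beta = linear(cond, beta_layer)          # one shift per channel
    return [[(1.0 + g) * v + b for v in channel]
            for channel, g, b in zip(features, gamma, beta)]


def condition_channel_attention(features: Matrix, text: Vector,
                                att_layer: Layer) -> Tuple[Matrix, Vector]:
    """Hypothetical text-conditioned channel gate (squeeze-and-excitation style)."""
    pooled = [sum(channel) / len(channel) for channel in features]   # global average pool
    logits = linear(pooled + text, att_layer)
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]             # sigmoid: C values in (0, 1)
    refined = [[g * v for v in channel] for channel, g in zip(features, gates)]
    return refined, gates


if __name__ == "__main__":
    rng = random.Random(0)
    C, HW, Z, D = 4, 4, 3, 5                 # channels, H*W, noise dim, text dim (toy sizes)
    features = [[rng.gauss(0.0, 1.0) for _ in range(HW)] for _ in range(C)]
    text = [rng.gauss(0.0, 1.0) for _ in range(D)]
    for _ in range(2):                       # inject at several stages, fresh noise each time
        noise = [rng.gauss(0.0, 1.0) for _ in range(Z)]
        # each stage gets its own (randomly initialized) block weights
        features = dual_injection(features, noise, text,
                                  random_linear(Z + D, C, rng), random_linear(Z + D, C, rng))
        features, gates = condition_channel_attention(features, text,
                                                      random_linear(C + D, C, rng))
    print(len(features), len(features[0]), len(gates))   # 4 4 4
```

The point the sketch is meant to show is the storage contrast: the gate vector has C entries regardless of image resolution or caption length, whereas a word-to-region attention map over H×W positions and L words holds H·W·L weights per image.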
Pages: 23839-23852
Page count: 14
Related papers
50 in total
  • [41] ISF-GAN: Imagine, Select, and Fuse with GPT-based Text Enrichment for Text-to-image Synthesis
    Sheng, Yefei
    Tao, Ming
    Wang, Jie
    Bao, Bing-Kun
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (07) : 1 - 17
  • [42] Recurrent Affine Transformation for Text-to-Image Synthesis
    Ye, Senmao
    Wang, Huan
    Tan, Mingkui
    Liu, Fei
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 462 - 473
  • [43] A Comprehensive Pipeline for Complex Text-to-Image Synthesis
    Fang, Fei
    Luo, Fei
    Zhang, Hong-Pan
    Zhou, Hua-Jian
    Chow, Alix L. H.
    Xiao, Chun-Xia
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2020, 35 (03) : 522 - 537
  • [44] Text-to-Image Synthesis via Aesthetic Layout
    Baraheem, Samah Saeed
    Le, Trung-Nghia
    Nguyen, Tam V.
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020 : 4485 - 4487
  • [46] Modality Disentangled Discriminator for Text-to-Image Synthesis
    Feng, Fangxiang
    Niu, Tianrui
    Li, Ruifan
    Wang, Xiaojie
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 2112 - 2124
  • [47] Layout-Bridging Text-to-Image Synthesis
    Liang, Jiadong
    Pei, Wenjie
    Lu, Feng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (12) : 7438 - 7451
  • [48] CookGAN: Causality based Text-to-Image Synthesis
    Zhu, Bin
    Ngo, Chong-Wah
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020 : 5518 - 5526
  • [49] AltDiffusion: A Multilingual Text-to-Image Diffusion Model
    Ye, Fulong
    Liu, Guang
    Wu, Xinya
    Wu, Ledell
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024 : 6648 - 6656
  • [50] KT-GAN: Knowledge-Transfer Generative Adversarial Network for Text-to-Image Synthesis
    Tan, Hongchen
    Liu, Xiuping
    Liu, Meng
    Yin, Baocai
    Li, Xin
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 1275 - 1290