MF-GAN: Multi-conditional Fusion Generative Adversarial Network for Text-to-Image Synthesis

Cited: 4
Authors
Yang, Yuyan [1 ,2 ]
Ni, Xin [1 ,2 ]
Hao, Yanbin [1 ,2 ]
Liu, Chenyu [3 ]
Wang, Wenshan [3 ]
Liu, Yifeng [3 ]
Xie, Haiyong [2 ,4 ]
Affiliations
[1] Univ Sci & Technol China, Hefei 230026, Anhui, Peoples R China
[2] Minist Culture & Tourism, Key Lab Cyberculture Content Cognit & Detect, Hefei 230026, Anhui, Peoples R China
[3] Natl Engn Lab Risk Percept & Prevent NEL RPP, Beijing 100041, Peoples R China
[4] Capital Med Univ, Adv Innovat Ctr Human Brain Protect, Beijing 100069, Peoples R China
Keywords
Text-to-Image; GAN; Triplet loss;
DOI
10.1007/978-3-030-98358-1_4
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The performance of text-to-image synthesis has been significantly boosted by the development of generative adversarial network (GAN) techniques. Current GAN-based methods for text-to-image generation mainly adopt multiple generator-discriminator pairs to exploit coarse- and fine-grained textual content (e.g., words and sentences); however, they only consider the semantic consistency of the text-image pair. One drawback of such a multi-stream structure is that it results in heavyweight models. In comparison, the single-stream counterpart suffers from insufficient use of the text. To alleviate these problems, we propose a Multi-conditional Fusion GAN (MF-GAN) that reaps the benefits of both the multi-stream and single-stream methods. MF-GAN is a single-stream model but exploits both coarse- and fine-grained textual information through a conditional residual block and a dual attention block. More specifically, the sentence and word features are repeatedly fed into different stages of the model to enhance the textual information. Furthermore, we introduce a triplet loss to close the visual gap between a synthesized image and its positive (ground-truth) image and to enlarge the gap to its negative image. To thoroughly verify our method, we conduct extensive experiments on the benchmark CUB and COCO datasets. Experimental results show that the proposed MF-GAN outperforms state-of-the-art methods.
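The triplet loss mentioned in the abstract can be illustrated with the standard margin-based formulation L = max(0, d(a, p) - d(a, n) + margin). This is a minimal sketch only; the feature extractor, the distance function, and the margin value used by MF-GAN are assumptions here, not details taken from the paper.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss: penalize the anchor (e.g., features of
    a synthesized image) when it is not at least `margin` closer to the
    positive (ground-truth) image than to the negative (mismatched) image.
    The margin value 0.2 is an illustrative choice, not the paper's."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Anchor near the positive and far from the negative yields zero loss.
a, p, n = [0.0, 0.0], [0.1, 0.0], [1.0, 1.0]
print(triplet_loss(a, p, n))  # 0.0, since 0.1 - sqrt(2) + 0.2 < 0
```

Minimizing this loss pulls synthesized-image features toward their matching real image while pushing them away from mismatched images, which is the effect the abstract describes.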
Pages: 41-53
Page count: 13
Related Papers
50 records total
  • [41] Multi-scale dual-modal generative adversarial networks for text-to-image synthesis
    Jiang, Bin
    Huang, Yun
    Huang, Wei
    Yang, Chao
    Xu, Fangqiang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (10) : 15061 - 15077
  • [42] Instance Mask Embedding and Attribute-Adaptive Generative Adversarial Network for Text-to-Image Synthesis
    Ni, Jiancheng
    Zhang, Susu
    Zhou, Zili
    Hou, Jie
    Gao, Feng
    IEEE ACCESS, 2020, 8 (08): : 37697 - 37711
  • [44] Generative adversarial text-to-image generation with style image constraint
    Zekang Wang
    Li Liu
    Huaxiang Zhang
    Dongmei Liu
    Yu Song
    Multimedia Systems, 2023, 29 : 3291 - 3303
  • [46] Advancements in adversarial generative text-to-image models: a review
    Zaghloul, Rawan
    Rawashdeh, Enas
    Bani-Ata, Tomader
    IMAGING SCIENCE JOURNAL, 2024
  • [47] Dualattn-GAN: Text to Image Synthesis With Dual Attentional Generative Adversarial Network
    Cai, Yali
    Wang, Xiaoru
    Yu, Zhihong
    Li, Fu
    Xu, Peirong
    Li, Yueli
    Li, Lixian
    IEEE ACCESS, 2019, 7 : 183706 - 183716
  • [48] MAGAN: Multi-attention Generative Adversarial Networks for Text-to-Image Generation
    Jia, Xibin
    Mi, Qing
    Dai, Qi
    PATTERN RECOGNITION AND COMPUTER VISION, PT IV, 2021, 13022 : 312 - 322
  • [49] Text-to-image generation method based on single stage generative adversarial network
    Yang B.
    Na W.
    Xiang X.-Q.
    JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2023, 57 (12): : 2412 - 2420
  • [50] Lightweight dynamic conditional GAN with pyramid attention for text-to-image synthesis
    Gao, Lianli
    Chen, Daiyuan
    Zhao, Zhou
    Shao, Jie
    Shen, Heng Tao
    PATTERN RECOGNITION, 2021, 110