MF-GAN: Multi-conditional Fusion Generative Adversarial Network for Text-to-Image Synthesis

Times Cited: 4
Authors
Yang, Yuyan [1 ,2 ]
Ni, Xin [1 ,2 ]
Hao, Yanbin [1 ,2 ]
Liu, Chenyu [3 ]
Wang, Wenshan [3 ]
Liu, Yifeng [3 ]
Xie, Haiyong [2 ,4 ]
Affiliations
[1] Univ Sci & Technol China, Hefei 230026, Anhui, Peoples R China
[2] Minist Culture & Tourism, Key Lab Cyberculture Content Cognit & Detect, Hefei 230026, Anhui, Peoples R China
[3] Natl Engn Lab Risk Percept & Prevent NEL RPP, Beijing 100041, Peoples R China
[4] Capital Med Univ, Adv Innovat Ctr Human Brain Protect, Beijing 100069, Peoples R China
Source
Keywords
Text-to-Image; GAN; Triplet loss;
DOI
10.1007/978-3-030-98358-1_4
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The performance of text-to-image synthesis has been significantly boosted by the development of generative adversarial network (GAN) techniques. Current GAN-based methods for text-to-image generation mainly adopt multiple generator-discriminator pairs to exploit coarse- and fine-grained textual content (e.g., words and sentences); however, they only consider the semantic consistency between the text-image pair. One drawback of such a multi-stream structure is that it results in many heavyweight models, while the single-stream counterpart suffers from insufficient use of the text. To alleviate these problems, we propose a Multi-conditional Fusion GAN (MF-GAN) that reaps the benefits of both the multi-stream and the single-stream methods. MF-GAN is a single-stream model, yet it exploits both coarse- and fine-grained textual information through a conditional residual block and a dual attention block. More specifically, the sentence and word features are repeatedly fed into different stages of the model to enhance the textual information. Furthermore, we introduce a triplet loss that closes the visual gap between the synthesized image and its positive image and enlarges the gap to its negative image. To thoroughly verify our method, we conduct extensive experiments on the two benchmark CUB and COCO datasets. Experimental results show that the proposed MF-GAN outperforms state-of-the-art methods.
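To give a concrete sense of the triplet term described in the abstract, below is a minimal PyTorch-style sketch of a margin-based triplet loss over image embeddings. The embedding network, the margin value, and the cosine distance metric are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a margin-based triplet loss on image embeddings, following
# the idea in the abstract: pull the synthesized image toward its positive
# (ground-truth) image and push it away from a negative (mismatched) image.
# NOTE: the margin value and the distance metric are assumptions for
# illustration, not necessarily MF-GAN's exact formulation.
import torch
import torch.nn.functional as F


def triplet_loss(anchor_emb: torch.Tensor,
                 positive_emb: torch.Tensor,
                 negative_emb: torch.Tensor,
                 margin: float = 0.2) -> torch.Tensor:
    """anchor_emb:   embeddings of synthesized images, shape (B, D)
    positive_emb: embeddings of matching real images, shape (B, D)
    negative_emb: embeddings of mismatched real images, shape (B, D)
    """
    # Cosine distance = 1 - cosine similarity (assumed metric).
    d_pos = 1.0 - F.cosine_similarity(anchor_emb, positive_emb, dim=-1)
    d_neg = 1.0 - F.cosine_similarity(anchor_emb, negative_emb, dim=-1)
    # Hinge on the distance gap: zero loss once the negative image is at
    # least `margin` farther from the anchor than the positive image.
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()


# Usage sketch: embeddings would come from an image encoder applied to the
# generator output (anchor), the paired real image (positive), and a real
# image drawn from a different caption (negative).
if __name__ == "__main__":
    B, D = 4, 256
    fake = torch.randn(B, D)      # synthesized-image embeddings
    real_pos = torch.randn(B, D)  # ground-truth image embeddings
    real_neg = torch.randn(B, D)  # mismatched image embeddings
    print(triplet_loss(fake, real_pos, real_neg).item())
```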
Pages: 41-53
Number of Pages: 13
Related Papers
(50 records in total)
  • [21] Text-to-image synthesis based on modified deep convolutional generative adversarial network
    Li Y., Zhu M., Ren J., Su X., Zhou X., Yu H.
    Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(08): 1875-1883
  • [22] CD-GAN: Commonsense-Driven Generative Adversarial Network with Hierarchical Refinement for Text-to-Image Synthesis
    Zhang G., Xu N., Yan C., Zheng B., Duan Y., Lv B., Liu A.-A.
    Intelligent Computing, 2023, 2
  • [23] DAC-GAN: Dual Auxiliary Consistency Generative Adversarial Network for Text-to-Image Generation
    Wang Z., Yang J., Cui J., Liu J., Wang J.
    Computer Vision - ACCV 2022, Pt. VII, 2023, 13847: 3-19
  • [24] DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation
    Huang M., Mao Z., Wang P., Wang Q., Zhang Y.
    Proceedings of the 30th ACM International Conference on Multimedia (MM 2022), 2022: 4345-4354
  • [25] A survey of generative adversarial networks and their application in text-to-image synthesis
    Zeng W., Zhu H.-L., Lin C., Xiao Z.-Y.
    Electronic Research Archive, 2023, 31(12): 7142-7181
  • [26] TextControlGAN: Text-to-Image Synthesis with Controllable Generative Adversarial Networks
    Ku H., Lee M.
    Applied Sciences-Basel, 2023, 13(08)
  • [27] A Comparative Study of Generative Adversarial Networks for Text-to-Image Synthesis
    Chopra M., Singh S.K., Sharma A., Gill S.S.
    International Journal of Software Science and Computational Intelligence (IJSSCI), 2022, 14(01)
  • [28] MPS-GAN: A multi-conditional generative adversarial network for simulating input parameters' impact on manufacturing processes
    Ouidadi H., Guo S.
    Journal of Manufacturing Processes, 2024, 131: 1030-1045
  • [29] ResFPA-GAN: Text-to-Image Synthesis with Generative Adversarial Network Based on Residual Block Feature Pyramid Attention
    Sun J., Zhou Y., Zhang B.
    2019 IEEE International Conference on Advanced Robotics and Its Social Impacts (ARSO), 2019: 317-322
  • [30] Siamese conditional generative adversarial network for multi-focus image fusion
    Li H., Qian W., Nie R., Cao J., Xu D.
    Applied Intelligence, 2023, 53: 17492-17507