MF-GAN: Multi-conditional Fusion Generative Adversarial Network for Text-to-Image Synthesis

Cited by: 4
Authors
Yang, Yuyan [1 ,2 ]
Ni, Xin [1 ,2 ]
Hao, Yanbin [1 ,2 ]
Liu, Chenyu [3 ]
Wang, Wenshan [3 ]
Liu, Yifeng [3 ]
Xie, Haiyong [2 ,4 ]
Affiliations
[1] Univ Sci & Technol China, Hefei 230026, Anhui, Peoples R China
[2] Minist Culture & Tourism, Key Lab Cyberculture Content Cognit & Detect, Hefei 230026, Anhui, Peoples R China
[3] Natl Engn Lab Risk Percept & Prevent NEL RPP, Beijing 100041, Peoples R China
[4] Capital Med Univ, Adv Innovat Ctr Human Brain Protect, Beijing 100069, Peoples R China
Keywords
Text-to-Image; GAN; Triplet loss;
DOI
10.1007/978-3-030-98358-1_4
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
The performance of text-to-image synthesis has been significantly boosted by the development of generative adversarial network (GAN) techniques. Current GAN-based methods for text-to-image generation mainly adopt multiple generator-discriminator pairs to exploit coarse- and fine-grained textual content (e.g., sentences and words); however, they consider only the semantic consistency between the text-image pair. One drawback of such a multi-stream structure is that it yields many heavyweight models; the single-stream counterpart, in turn, suffers from insufficient use of the text. To alleviate both problems, we propose a Multi-conditional Fusion GAN (MF-GAN) that reaps the benefits of both the multi-stream and the single-stream methods. MF-GAN is a single-stream model, yet it exploits both coarse- and fine-grained textual information through a conditional residual block and a dual attention block. More specifically, the sentence and word features are repeatedly fed into different model stages for textual information enhancement. Furthermore, we introduce a triplet loss to close the visual gap between the synthesized image and its positive image and to enlarge the gap to its negative image. To thoroughly verify our method, we conduct extensive experiments on the benchmark CUB and COCO datasets. Experimental results show that the proposed MF-GAN outperforms state-of-the-art methods.
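The triplet loss mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vectors, margin value, and squared-Euclidean distance are illustrative assumptions; the idea is only that the synthesized image's features (anchor) are pulled toward the matching real image (positive) and pushed away from a mismatched image (negative).

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on feature vectors (illustrative sketch).

    Encourages d(anchor, positive) + margin <= d(anchor, negative),
    using squared Euclidean distance. Returns 0 when the constraint
    is already satisfied.
    """
    d_pos = np.sum((anchor - positive) ** 2)  # distance to matching image
    d_neg = np.sum((anchor - negative) ** 2)  # distance to mismatched image
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical 2-D features: anchor near its positive, far from its negative.
a = np.array([1.0, 0.0])   # synthesized-image feature (anchor)
p = np.array([1.1, 0.0])   # ground-truth image feature (positive)
n = np.array([0.0, 1.0])   # mismatched image feature (negative)
loss = triplet_loss(a, p, n)  # → 0.0, constraint already satisfied
```

Swapping the positive and negative in this example yields a large loss, which is the gradient signal that would push the generator to produce images closer to their matching ground truth.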
Pages: 41-53
Page count: 13
Related Papers
50 records in total
  • [1] Multi-Conditional Generative Adversarial Network for Text-to-Video Synthesis
    Zhou R.
    Jiang C.
    Xu Q.
    Li Y.
    Zhang C.
    Song Y.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2022, 34(10): 1567-1579
  • [2] CT-GAN: A conditional Generative Adversarial Network of transformer architecture for text-to-image
    Zhang, Xin
    Jiao, Wentao
    Wang, Bing
    Tian, Xuedong
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 115
  • [3] Enhanced Text-to-Image Synthesis Conditional Generative Adversarial Networks
    Tan, Yong Xuan
    Lee, Chin Poo
    Neo, Mai
    Lim, Kian Ming
    Lim, Jit Yan
    IAENG International Journal of Computer Science, 2022, 49(01): 1-7
  • [4] Multi-Semantic Fusion Generative Adversarial Network for Text-to-Image Generation
    Huang, Pingda
    Liu, Yedan
    Fu, Chunjiang
    Zhao, Liang
    2023 IEEE 8TH INTERNATIONAL CONFERENCE ON BIG DATA ANALYTICS, ICBDA, 2023: 159-164
  • [5] SF-GAN: Semantic fusion generative adversarial networks for text-to-image synthesis
    Yang, Bing
    Xiang, Xueqin
    Kong, Wanzeng
    Zhang, Jianhai
    Yao, Jinliang
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 262
  • [6] CF-GAN: cross-domain feature fusion generative adversarial network for text-to-image synthesis
    Zhang, Yubo
    Han, Shuang
    Zhang, Zhongxin
    Wang, Jianyang
    Bi, Hongbo
    VISUAL COMPUTER, 2023, 39(04): 1283-1293
  • [7] DMF-GAN: Deep Multimodal Fusion Generative Adversarial Networks for Text-to-Image Synthesis
    Yang, Bing
    Xiang, Xueqin
    Kong, Wangzeng
    Zhang, Jianhai
    Peng, Yong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 6956-6967
  • [8] Survey About Generative Adversarial Network and Text-to-Image Synthesis
    Lai, Lina
    Mi, Yu
    Zhou, Longlong
    Rao, Jiyong
    Xu, Tianyang
    Song, Xiaoning
    Computer Engineering and Applications, 2023, 59(19): 21-39
  • [9] SAW-GAN: Multi-granularity Text Fusion Generative Adversarial Networks for text-to-image generation
    Jin, Dehu
    Yu, Qi
    Yu, Lan
    Qi, Meng
    KNOWLEDGE-BASED SYSTEMS, 2024, 294