COMIM-GAN: Improved Text-to-Image Generation via Condition Optimization and Mutual Information Maximization

Authors
Zhou, Longlong [1]
Wu, Xiao-Jun [1]
Xu, Tianyang [1]
Affiliations
[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi 214122, Peoples R China
Source
MULTIMEDIA MODELING, MMM 2023, PT I | 2023 / Vol. 13833
Keywords
cGAN; Condition optimization; Mutual information maximization
DOI
10.1007/978-3-031-27077-2_30
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Language-based image generation is a challenging task. Current studies normally employ a conditional generative adversarial network (cGAN) as the model framework and have achieved significant progress. Nonetheless, a close examination of these methods reveals two fundamental issues. First, the discrete linguistic conditions make cGAN training extremely difficult and impair the model's generalization. Second, the conditional discriminator cannot extract semantically consistent features based on the linguistic conditions, which hinders conditional discrimination. To address these issues, we propose a condition optimization and mutual information maximization GAN (COMIM-GAN). Specifically, we design (1) a text condition construction module, which constructs a compact linguistic condition space, and (2) a mutual information loss between images and linguistic conditions, which motivates the discriminator to extract more features associated with the linguistic conditions. Extensive experiments on the CUB-200 and MS-COCO datasets demonstrate that our method is superior to existing methods.
Pages: 385-396
Page count: 12
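
The abstract names a mutual information loss between images and linguistic conditions but does not say which estimator is used. As a rough illustration only, the sketch below uses the InfoNCE contrastive bound, a common stand-in for mutual information maximization. It is an assumption and not the loss from the paper. The function names (`info_nce_loss`, `mi_lower_bound`), the `temperature` parameter, and the toy feature vectors are all hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce_loss(image_feats, text_feats, temperature=0.1):
    """Mean InfoNCE loss over a batch of (image, text) feature pairs.

    Assumption: row i of image_feats matches row i of text_feats, and
    every other text in the batch acts as a negative for image i.
    """
    imgs = [normalize(v) for v in image_feats]
    txts = [normalize(v) for v in text_feats]
    n = len(imgs)
    total = 0.0
    for i in range(n):
        # Cosine similarity of image i to every text, scaled by temperature.
        logits = [dot(imgs[i], txts[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching text as the correct class.
        total += log_denom - logits[i]
    return total / n


def mi_lower_bound(loss, batch_size):
    """InfoNCE bound: I(image; text) >= log(batch_size) - loss."""
    return math.log(batch_size) - loss


# Toy features: matched pairs should give a lower loss than mismatched ones.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
matched_text = [[1.0, 0.0], [0.0, 1.0]]
mismatched_text = [[0.0, 1.0], [1.0, 0.0]]

matched_loss = info_nce_loss(image_feats, matched_text)
mismatched_loss = info_nce_loss(image_feats, mismatched_text)
```

Minimizing this loss raises the lower bound on the mutual information between the two feature sets, which is the general idea behind encouraging a discriminator to keep condition-related features. How COMIM-GAN actually defines and applies its loss is not stated in this record.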