COMIM-GAN: Improved Text-to-Image Generation via Condition Optimization and Mutual Information Maximization

Cited by: 0

Authors:

Zhou, Longlong [1]

Wu, Xiao-Jun [1]

Xu, Tianyang [1]

Affiliation:

[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi 214122, Peoples R China

Source:

MULTIMEDIA MODELING, MMM 2023, PT I | 2023 / Vol. 13833

Keywords:

cGAN; Condition optimization; Mutual information maximization

DOI:

10.1007/978-3-031-27077-2_30

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Language-based image generation is a challenging task. Current studies normally employ a conditional generative adversarial network (cGAN) as the model framework and have achieved significant progress. Nonetheless, a close examination of these methods reveals two fundamental issues. First, the discrete linguistic conditions make training the cGAN extremely difficult and impair its generalization performance. Second, the conditional discriminator cannot extract semantically consistent features based on the linguistic conditions, which hinders conditional discrimination. To address these issues, we propose a condition optimization and mutual information maximization GAN (COMIM-GAN). Specifically, we design (1) a text condition construction module, which constructs a compact linguistic condition space, and (2) a mutual information loss between images and linguistic conditions, which encourages the discriminator to extract more features associated with the linguistic conditions. Extensive experiments on the CUB-200 and MS-COCO datasets demonstrate that our method is superior to existing methods.
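The abstract does not state how the mutual information loss is formulated. As a rough illustration only, the sketch below uses an InfoNCE-style contrastive estimator, which is one common way to maximize a lower bound on the mutual information between paired features. The choice of InfoNCE, the function names `info_nce_loss` and `mi_lower_bound`, the cosine-similarity critic, and the `temperature` parameter are all assumptions made for this example and are not taken from the paper.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce_loss(image_feats, text_feats, temperature=0.1):
    """InfoNCE loss for a batch where image_feats[i] is paired with text_feats[i].

    Hypothetical sketch: each image feature must pick out its own text
    condition among all text conditions in the batch.
    """
    imgs = [_normalize(v) for v in image_feats]
    txts = [_normalize(v) for v in text_feats]
    total = 0.0
    for i, img in enumerate(imgs):
        # Cosine similarity of this image to every text condition in the batch.
        logits = [sum(a * b for a, b in zip(img, txt)) / temperature
                  for txt in txts]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability assigned to the matching pair.
        total += log_denom - logits[i]
    return total / len(imgs)


def mi_lower_bound(image_feats, text_feats, temperature=0.1):
    """InfoNCE bound: I(image; text) >= log(batch size) - InfoNCE loss."""
    return math.log(len(image_feats)) - info_nce_loss(
        image_feats, text_feats, temperature)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched_text = [[1.0, 0.0], [0.0, 1.0]]
    shuffled_text = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce_loss(images, matched_text))   # small loss: pairs agree
    print(info_nce_loss(images, shuffled_text))  # large loss: pairs disagree
```

Minimizing this loss raises the lower bound returned by `mi_lower_bound`, which is capped at the logarithm of the batch size. In a cGAN setting, a term of this kind would typically be added to the discriminator objective so that its image features stay predictive of the text condition.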


Pages: 385-396

Page count: 12
