Semantic Similarity Distance: Towards better text-image consistency metric in text-to-image generation

Cited by: 7
Authors
Tan, Zhaorui [1 ,2 ]
Yang, Xi [1 ]
Ye, Zihan [1 ]
Wang, Qiufeng [1 ]
Yan, Yuyao [1 ]
Nguyen, Anh [2 ]
Huang, Kaizhu [3 ]
Affiliations
[1] Xian Jiaotong Liverpool Univ, Suzhou, Jiangsu, Peoples R China
[2] Univ Liverpool, Liverpool, England
[3] Duke Kunshan Univ, Suzhou, Jiangsu, Peoples R China
Keywords
Text-to-image; Image generation; Generative adversarial networks; Semantic consistency;
DOI
10.1016/j.patcog.2023.109883
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Generating high-quality images from text remains a challenge in visual-language understanding, with text-image consistency being a major concern. In particular, the most popular metric, R-precision, may not accurately reflect text-image consistency, leading to misleading semantics in generated images. Despite its significance, designing a better text-image consistency metric remains surprisingly under-explored in the community. In this paper, we take a step forward and develop a novel CLIP-based metric, Semantic Similarity Distance (SSD), which is both theoretically founded from a distributional viewpoint and empirically verified on benchmark datasets. We also introduce Parallel Deep Fusion Generative Adversarial Networks (PDF-GAN), which use two novel components to mitigate inconsistent semantics and bridge the text-image semantic gap. A series of experiments indicates that, under the guidance of SSD, our PDF-GAN can markedly enhance the consistency between texts and images while preserving acceptable image quality on the CUB and COCO datasets.
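The abstract does not give SSD's exact formulation, but CLIP-based consistency metrics generally build on the cosine similarity between a caption's text embedding and the generated image's embedding. The following pure-Python sketch illustrates only that generic building block, expressed as a distance (lower is more consistent); the embedding vectors are hypothetical stand-ins, and this is not the paper's actual SSD definition.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical low-dimensional embeddings; real CLIP embeddings
# are 512-d or larger and come from the text/image encoders.
text_emb = [0.1, 0.8, 0.3, 0.4]
image_emb = [0.2, 0.7, 0.2, 0.5]

# Distance form: a consistency metric like SSD decreases as the
# generated image matches the text more closely.
semantic_distance = 1.0 - cosine_similarity(text_emb, image_emb)
```

In practice the embeddings would be produced by CLIP's text and image encoders, and a distributional metric such as SSD would aggregate beyond this single pairwise score.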
Pages: 11
Related papers
50 in total
  • [21] Prompt Refinement with Image Pivot for Text-to-Image Generation
    Zhan, Jingtao
    Ai, Qingyao
    Liu, Yiqun
    Pan, Yingwei
    Yao, Ting
    Mao, Jiaxin
    Ma, Shaoping
    Mei, Tao
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 941 - 954
  • [22] QuTI! Quantifying Text-Image Consistency in Multimodal Documents
    Springstein, Matthias
    Mueller-Budack, Eric
    Ewerth, Ralph
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 2575 - 2579
  • [23] Text-image communication, image-text communication
    Münkner, J
    ZEITSCHRIFT FUR GERMANISTIK, 2004, 14 (02): : 454 - 455
  • [24] MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask
    Zhou, Yupeng
    Zhou, Daquan
    Wang, Yaxing
    Feng, Jiashi
    Hou, Qibin
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, : 2805 - 2824
  • [26] Generative adversarial text-to-image generation with style image constraint
    Wang, Zekang
    Liu, Li
    Zhang, Huaxiang
    Liu, Dongmei
    Song, Yu
    MULTIMEDIA SYSTEMS, 2023, 29 (06) : 3291 - 3303
  • [27] Zero-Shot Text-to-Image Generation
    Ramesh, Aditya
    Pavlov, Mikhail
    Goh, Gabriel
    Gray, Scott
    Voss, Chelsea
    Radford, Alec
    Chen, Mark
    Sutskever, Ilya
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [28] Dense Text-to-Image Generation with Attention Modulation
    Kim, Yunji
    Lee, Jiyoung
    Kim, Jin-Hwa
    Ha, Jung-Woo
    Zhu, Jun-Yan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 7667 - 7677
  • [29] MirrorGAN: Learning Text-to-image Generation by Redescription
    Qiao, Tingting
    Zhang, Jing
    Xu, Duanqing
    Tao, Dacheng
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1505 - 1514
  • [30] Visual Programming for Text-to-Image Generation and Evaluation
    Cho, Jaemin
    Zala, Abhay
    Bansal, Mohit
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,