Semantic Similarity Distance: Towards better text-image consistency metric in text-to-image generation

Cited by: 7
Authors
Tan, Zhaorui [1, 2]
Yang, Xi [1]
Ye, Zihan [1]
Wang, Qiufeng [1]
Yan, Yuyao [1]
Nguyen, Anh [2]
Huang, Kaizhu [3]
Affiliations
[1] Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, People's Republic of China
[2] University of Liverpool, Liverpool, England
[3] Duke Kunshan University, Suzhou, Jiangsu, People's Republic of China
Keywords
Text-to-image; Image generation; Generative adversarial networks; Semantic consistency
DOI
10.1016/j.patcog.2023.109883
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Generating high-quality images from text remains a challenge in visual-language understanding, with text-image consistency being a major concern. In particular, the most popular metric, R-precision, may not accurately reflect text-image consistency, which can lead to misleading semantics in generated images. Despite its importance, the design of a better text-image consistency metric remains surprisingly under-explored in the community. In this paper, we take a step forward and develop a novel CLIP-based metric, Semantic Similarity Distance (SSD), which is both theoretically grounded from a distributional viewpoint and empirically verified on benchmark datasets. We also introduce Parallel Deep Fusion Generative Adversarial Networks (PDF-GAN), which use two novel components to mitigate inconsistent semantics and bridge the text-image semantic gap. A series of experiments indicates that, under the guidance of SSD, PDF-GAN markedly improves the consistency between texts and images while preserving acceptable image quality on the CUB and COCO datasets.
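For readers unfamiliar with the baseline metric the abstract criticizes, the following is a minimal sketch of R-precision as it is commonly reported in text-to-image evaluation: each generated image must rank its own caption above a set of randomly drawn mismatched captions (typically 99) by cosine similarity. It is not the SSD metric proposed in the paper, whose definition is not given in this record. The function names `cosine` and `r_precision`, the `num_candidates` parameter, and the use of plain Python lists as precomputed embeddings are assumptions made for illustration; in practice the embeddings would come from an image-text encoder such as CLIP.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def r_precision(image_embs, text_embs, num_candidates=100, seed=0):
    """Fraction of images whose own caption strictly outranks every distractor.

    image_embs[i] and text_embs[i] are assumed to be a matching pair.
    For each image, num_candidates - 1 mismatched captions are sampled
    at random as distractors (hypothetical protocol parameters).
    """
    n = len(image_embs)
    if n != len(text_embs):
        raise ValueError("need one caption embedding per image embedding")
    if num_candidates < 2 or n < num_candidates:
        raise ValueError("need 2 <= num_candidates <= number of pairs")

    rng = random.Random(seed)
    hits = 0
    for i in range(n):
        # Sample mismatched captions from all pairs except the true one.
        others = [j for j in range(n) if j != i]
        distractors = rng.sample(others, num_candidates - 1)
        true_sim = cosine(image_embs[i], text_embs[i])
        best_wrong = max(cosine(image_embs[i], text_embs[j]) for j in distractors)
        # Count a hit only when the true caption is strictly the best match.
        hits += true_sim > best_wrong
    return hits / n
```

Because the score depends only on whether the true caption wins a ranking against random distractors, it says nothing about how close the image and text are in absolute terms, which is one plausible reading of why the abstract considers it an unreliable measure of consistency.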
Pages: 11
Related papers
50 items in total
  • [1] Text-to-Image Generation Method Based on Image-Text Semantic Consistency
    Xue Z.
    Xu Z.
    Lang C.
    Feng S.
    Wang T.
    Li Y.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2023, 60 (09) : 2180 - 2190
  • [2] Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis
    Park, Minho
    Yun, Jooyeol
    Choi, Seunghwan
    Choo, Jaegul
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023 : 7557 - 7566
  • [3] Generative adversarial network based on semantic consistency for text-to-image generation
    Yue Ma
    Li Liu
    Huaxiang Zhang
    Chunjing Wang
    Zekang Wang
    Applied Intelligence, 2023, 53 : 4703 - 4716
  • [4] Generative adversarial network based on semantic consistency for text-to-image generation
    Ma, Yue
    Liu, Li
    Zhang, Huaxiang
    Wang, Chunjing
    Wang, Zekang
    APPLIED INTELLIGENCE, 2023, 53 (04) : 4703 - 4716
  • [5] Semantic Distance Adversarial Learning for Text-to-Image Synthesis
    Yuan, Bowen
    Sheng, Yefei
    Bao, Bing-Kun
    Chen, Yi-Ping Phoebe
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 1255 - 1266
  • [6] DTIA: Disruptive Text-Image Alignment for Countering Text-to-Image Diffusion Model Personalization
    Gao, Ya
    Yang, Jing
    Wu, Minghui
    Zhao, Chenxu
    Su, Anyang
    Song, Jie
    Yu, Zitong
    DATA SCIENCE AND ENGINEERING, 2025, 10 (01) : 12 - 23
  • [7] Expressive Text-to-Image Generation with Rich Text
    Ge, Songwei
    Park, Taesung
    Zhu, Jun-Yan
    Huang, Jia-Bin
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023 : 7511 - 7522
  • [8] Controllable Text-to-Image Generation
    Li, Bowen
    Qi, Xiaojuan
    Lukasiewicz, Thomas
    Torr, Philip H. S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [9] Surgical text-to-image generation
    Nwoye, Chinedu Innocent
    Bose, Rupak
    Elgohary, Kareem
    Arboit, Lorenzo
    Carlino, Giorgio
    Lavanchy, Joel L.
    Mascagni, Pietro
    Padoy, Nicolas
    PATTERN RECOGNITION LETTERS, 2025, 190 : 73 - 80
  • [10] Sequential Semantic Generative Communication for Progressive Text-to-Image Generation
    Nam, Hyelin
    Park, Jihong
    Choi, Jinho
    Kim, Seong-Lyun
    2023 20TH ANNUAL IEEE INTERNATIONAL CONFERENCE ON SENSING, COMMUNICATION, AND NETWORKING, SECON, 2023