Locally controllable network based on visual-linguistic relation alignment for text-to-image generation

Cited by: 0
Authors
Li, Zaike [1 ]
Liu, Li [1 ]
Zhang, Huaxiang [1 ]
Liu, Dongmei [1 ]
Song, Yu [1 ]
Li, Boqun [1 ]
Affiliations
[1] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan 250014, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text-to-image generation; Image-text matching; Generative adversarial network; Local control;
DOI
10.1007/s00530-023-01222-7
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Because existing locally controllable text-to-image generation methods fail to produce satisfactory fine details, a novel locally controllable text-to-image generation network based on visual-linguistic relation alignment is proposed. The goal of the method is to semantically edit and generate images under text guidance. The proposed method explores the relationship between text and image to achieve local control over text-to-image generation. Visual-linguistic matching learns similarity weights between image and text from semantic features, establishing a fine-grained correspondence between local image regions and words. An instance-level optimization function is introduced into the generation process to precisely control the low-similarity weights and combine them with text features to generate new visual attributes. In addition, a local control loss is proposed to preserve the details described by the text and the local regions of the image. Extensive experiments demonstrate that the proposed method achieves superior performance and enables more accurate control over the original image.
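The abstract does not give the matching formulation, but the word-region similarity weighting it describes can be illustrated with a minimal sketch: cosine similarity between word features and region features, with a softmax over regions so each word attends to its best-matching local regions. All names (`word_region_similarity`, the sharpening factor `gamma`, the feature shapes) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def word_region_similarity(words, regions, gamma=5.0):
    """Fine-grained word-to-region matching (illustrative sketch).

    words:   (T, D) word embeddings from a text encoder
    regions: (R, D) local image-region features from an image encoder
    returns: sim  (T, R) cosine similarities,
             attn (T, R) per-word attention weights over regions
    """
    # L2-normalise so the dot product becomes cosine similarity.
    w = words / np.linalg.norm(words, axis=1, keepdims=True)
    r = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    sim = w @ r.T                         # (T, R) cosine similarities
    attn = softmax(gamma * sim, axis=1)   # gamma sharpens the focus
    return sim, attn

rng = np.random.default_rng(0)
sim, attn = word_region_similarity(rng.normal(size=(4, 16)),
                                   rng.normal(size=(6, 16)))
print(attn.shape)        # (4, 6): one weight vector over regions per word
print(attn.sum(axis=1))  # each word's weights sum to 1
```

Low-similarity word-region pairs identified this way are the natural targets for the instance-level optimization the abstract mentions: regions whose weights are low for a given word are the ones whose visual attributes the text has not yet been matched to.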
Pages: 13
Related Papers
50 records total
  • [31] Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image Generation
    Pan, Zhihong
    Zhou, Xin
    Tian, Hao
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 4450 - 4460
  • [32] Text-to-image Generation Model Based on Diffusion Wasserstein Generative Adversarial Networks
    Zhao H.
    Li W.
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2023, 45 (12): : 4371 - 4381
  • [33] Interpolating the Text-to-Image Correspondence Based on Phonetic and Phonological Similarities for Nonword-to-Image Generation
    Matsuhira, Chihaya
    Kastner, Marc A.
    Komamizu, Takahiro
    Hirayama, Takatsugu
    Doman, Keisuke
    Kawanishi, Yasutomo
    Ide, Ichiro
    IEEE ACCESS, 2024, 12 : 41299 - 41316
  • [34] Text-to-image generation method based on self-supervised attention and image features fusion
    Liao, Yonghui
    Zhang, Haitao
    Jin, Haibo
    CHINESE JOURNAL OF LIQUID CRYSTALS AND DISPLAYS, 2024, 39 (02) : 180 - 191
  • [35] Text-to-image synthesis based on modified deep convolutional generative adversarial network
    Li Y.
    Zhu M.
    Ren J.
    Su X.
    Zhou X.
    Yu H.
    Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2023, 49 (08): : 1875 - 1883
  • [36] DAC-GAN: Dual Auxiliary Consistency Generative Adversarial Network for Text-to-Image Generation
    Wang, Zhiwei
    Yang, Jing
    Cui, Jiajun
    Liu, Jiawei
    Wang, Jiahao
    COMPUTER VISION - ACCV 2022, PT VII, 2023, 13847 : 3 - 19
  • [37] Inv-ReVersion: Enhanced Relation Inversion Based on Text-to-Image Diffusion Models
    Zhang, Guangzi
    Qian, Yulin
    Deng, Juntao
    Cai, Xingquan
    APPLIED SCIENCES-BASEL, 2024, 14 (08):
  • [38] DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation
    Huang, Mengqi
    Mao, Zhendong
    Wang, Penghui
    Wang, Quan
    Zhang, Yongdong
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4345 - 4354
  • [39] Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
    Gafni, Oran
    Polyak, Adam
    Ashual, Oron
    Sheynin, Shelly
    Parikh, Devi
    Taigman, Yaniv
    COMPUTER VISION - ECCV 2022, PT XV, 2022, 13675 : 89 - 106
  • [40] Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models
    Zhang, Yasi
    Yu, Peiyu
    Wu, Ying Nian
    COMPUTER VISION - ECCV 2024, PT XLII, 2025, 15100 : 55 - 71