Locally controllable network based on visual-linguistic relation alignment for text-to-image generation

Cited by: 0
Authors
Li, Zaike [1]
Liu, Li [1]
Zhang, Huaxiang [1]
Liu, Dongmei [1]
Song, Yu [1]
Li, Boqun [1]
Affiliations
[1] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan 250014, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text-to-image generation; Image-text matching; Generative adversarial network; Local control
DOI
10.1007/s00530-023-01222-7
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Existing locally controllable text-to-image generation methods do not yet produce satisfactory fine detail. To address this, a novel locally controllable text-to-image generation network based on visual-linguistic relation alignment is proposed. The goal of the method is to edit and generate images semantically under text guidance, and it achieves local control by exploring the relationship between text and image. A visual-linguistic matching module learns similarity weights between image and text from semantic features, establishing a fine-grained correspondence between local image regions and words. An instance-level optimization function is introduced into the generation process to accurately control the low-similarity weights, which are combined with text features to generate new visual attributes. In addition, a local control loss is proposed to preserve the details of the text and of local image regions. Extensive experiments demonstrate the superior performance of the proposed method and show that it enables more accurate control over the original image.
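The word-to-region matching described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the feature shapes, the use of cosine similarity, and the softmax temperature are all assumptions chosen for illustration, and the abstract does not specify how the similarity weights are actually computed.

```python
import numpy as np

def word_region_similarity(words, regions, temperature=1.0):
    """Hypothetical word-to-region alignment weights.

    words:   (T, D) array of word features (assumed shape, nonzero rows)
    regions: (R, D) array of image-region features (assumed shape, nonzero rows)
    returns: (T, R) array of weights; each row sums to 1
    """
    # L2-normalize so the dot product becomes cosine similarity.
    w = words / np.linalg.norm(words, axis=1, keepdims=True)
    r = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    sim = w @ r.T

    # Softmax over regions, shifted by the row maximum for numerical stability.
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy example: one word feature that points in the same direction as the first region.
words = np.array([[1.0, 0.0]])
regions = np.array([[2.0, 0.0], [0.0, 3.0]])
weights = word_region_similarity(words, regions)
```

In this toy case the first region receives the larger weight, since it is the one aligned with the word feature. In a setting like the one the abstract describes, regions whose weights stay low for every word would be the candidates for text-driven modification, though that selection rule is likewise an assumption here.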
Pages: 13