Locally controllable network based on visual-linguistic relation alignment for text-to-image generation

Authors
Li, Zaike [1]
Liu, Li [1]
Zhang, Huaxiang [1]
Liu, Dongmei [1]
Song, Yu [1]
Li, Boqun [1]
Affiliations
[1] Shandong Normal University, School of Information Science and Engineering, Jinan 250014, Shandong, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Text-to-image generation; Image-text matching; Generative adversarial network; Local control
DOI
10.1007/s00530-023-01222-7
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Existing locally controllable text-to-image generation methods do not produce satisfactory detail. To address this, a novel locally controllable text-to-image generation network based on visual-linguistic relation alignment is proposed. The method aims to process and generate images semantically under text guidance, exploiting the relationship between text and image to achieve local control over generation. Visual-linguistic matching learns similarity weights between image and text from semantic features, establishing a fine-grained correspondence between local image regions and words. An instance-level optimization function is introduced into the generation process to accurately control the low-similarity weights and combine them with text features to generate new visual attributes. In addition, a local control loss is proposed to preserve the details of the text and of local image regions. Extensive experiments demonstrate the superior performance of the proposed method, which enables more accurate control over the original image.
Pages: 13
Source
Multimedia Systems, 2024, 30