Locally controllable network based on visual-linguistic relation alignment for text-to-image generation

Cited by: 0

Authors:

Li, Zaike [1]

Liu, Li [1]

Zhang, Huaxiang [1]

Liu, Dongmei [1]

Song, Yu [1]

Li, Boqun [1]

Affiliations:

[1] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan 250014, Shandong, Peoples R China

Source:

MULTIMEDIA SYSTEMS | 2024, Vol. 30, No. 1

Funding:

National Natural Science Foundation of China

Keywords:

Text-to-image generation; Image-text matching; Generative adversarial network; Local control

DOI:

10.1007/s00530-023-01222-7

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Because existing locally controllable text-to-image generation does not achieve satisfactory results in fine detail, a novel locally controllable text-to-image generation network based on visual-linguistic relation alignment is proposed. The goal of the method is to process and generate images semantically under text guidance. The proposed method explores the relationship between text and image to achieve local control of text-to-image generation. Visual-linguistic matching learns similarity weights between image and text from semantic features to achieve fine-grained correspondence between local image regions and words. An instance-level optimization function is introduced into the generation process to accurately control the low-similarity weights and combine them with text features to generate new visual attributes. In addition, a local control loss is proposed to preserve the details of the text and of local image regions. Extensive experiments demonstrate the superior performance of the proposed method, which enables more accurate control of the original image.
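The abstract gives no formulas, so the following is only a minimal sketch of what learning "similarity weights" between words and local image regions could look like. It assumes cosine similarity followed by a softmax over regions, which is a common choice in image-text matching but is not confirmed as the authors' formulation. The function names, the `temperature` parameter, and the toy feature vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # small epsilon avoids division by zero

def word_region_weights(word_feats, region_feats, temperature=0.1):
    """For each word, return a softmax distribution over image regions.

    Assumption: weights come from temperature-scaled cosine similarity.
    This is an illustrative stand-in, not the paper's actual matching module.
    """
    weights = []
    for w in word_feats:
        sims = [cosine(w, r) / temperature for r in region_feats]
        m = max(sims)                          # subtract max for numerical stability
        exps = [math.exp(s - m) for s in sims]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy 2-D features (hypothetical): two words, three image regions.
words = [[1.0, 0.0], [0.0, 1.0]]
regions = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
weights = word_region_weights(words, regions)
```

In this sketch each row of `weights` sums to 1, and a word's largest weight falls on the region whose feature direction is closest to it. Regions with low weight for every word would be the candidates that the abstract's instance-level optimization targets, though how the paper selects them is not stated here.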


Pages: 13
