Locally controllable network based on visual-linguistic relation alignment for text-to-image generation

Times cited: 0

Authors:

Li, Zaike [1]

Liu, Li [1]

Zhang, Huaxiang [1]

Liu, Dongmei [1]

Song, Yu [1]

Li, Boqun [1]

Affiliation:

[1] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan 250014, Shandong, Peoples R China

Source:

MULTIMEDIA SYSTEMS | 2024 / Vol. 30 / Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

Text-to-image generation; Image-text matching; Generative adversarial network; Local control;

DOI:

10.1007/s00530-023-01222-7

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Because existing locally controllable text-to-image generation methods do not achieve satisfactory results in fine detail, a novel locally controllable text-to-image generation network based on visual-linguistic relation alignment is proposed. The method aims to carry out image processing and generation at the semantic level under text guidance, exploiting the relationship between text and image to achieve local control of text-to-image generation. Visual-linguistic matching learns similarity weights between image and text from semantic features, establishing fine-grained correspondences between local image regions and words. An instance-level optimization function is introduced into the generation process to precisely control the low-similarity weights and combine them with text features to generate new visual attributes. In addition, a local control loss is proposed to preserve the details of the text and of local image regions. Extensive experiments demonstrate the superior performance of the proposed method and show that it enables more accurate control over the original image.
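The abstract names three mechanisms only at a high level: word-region similarity weights, editing of low-similarity regions with text features, and a loss that preserves the remaining regions. The following is a minimal Python sketch of how such a pipeline could fit together; it is an illustration under stated assumptions, not the authors' implementation. It assumes region and word features already share one embedding space of equal dimension, and the names `region_word_attention`, `low_similarity_mask`, `fuse_text_features`, and `local_control_loss`, the threshold, the sharpness factor `gamma`, and the L1 form of the loss are all hypothetical choices.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def softmax(scores: Sequence[float], gamma: float) -> List[float]:
    """Numerically stable softmax; gamma is a hypothetical sharpness factor."""
    top = max(scores)
    exps = [math.exp(gamma * (s - top)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def region_word_attention(regions, words, gamma=5.0):
    """Row i holds region i's attention weights over all words (each row sums to 1)."""
    return [softmax([cosine(r, w) for w in words], gamma) for r in regions]


def low_similarity_mask(regions, words, threshold=0.8):
    """Hypothetical rule: 1 = best word match below threshold (edit), 0 = preserve."""
    return [1 if max(cosine(r, w) for w in words) < threshold else 0 for r in regions]


def fuse_text_features(regions, words, mask, gamma=5.0):
    """Add the attention-weighted word context only to masked (low-similarity) regions."""
    attn = region_word_attention(regions, words, gamma)
    fused = []
    for region, weights, m in zip(regions, attn, mask):
        # Weighted sum of word vectors, one component per feature dimension.
        context = [sum(a * w[k] for a, w in zip(weights, words)) for k in range(len(region))]
        fused.append([x + m * c for x, c in zip(region, context)])
    return fused


def local_control_loss(generated, original, mask):
    """Mean L1 distance over preserved regions (mask == 0); 0.0 if none are preserved."""
    kept = [i for i, m in enumerate(mask) if m == 0]
    if not kept:
        return 0.0
    total = sum(sum(abs(g - o) for g, o in zip(generated[i], original[i])) for i in kept)
    return total / len(kept)


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]  # toy region features
    words = [[1.0, 0.0], [0.7, 0.7]]    # toy word features in the same space
    mask = low_similarity_mask(regions, words)
    print(mask)  # [0, 1]
    fused = fuse_text_features(regions, words, mask)
    print(local_control_loss(fused, regions, mask))  # 0.0
```

In this sketch the threshold alone decides which regions receive new text-driven attributes, and the loss compares only the preserved regions, so edited regions are free to change while the rest of the image is held close to the original.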

Pages: 13

Related records: 50 in total

[1] Locally controllable network based on visual–linguistic relation alignment for text-to-image generation
Li, Zaike
Liu, Li
Zhang, Huaxiang
Liu, Dongmei
Song, Yu
Li, Boqun
MULTIMEDIA SYSTEMS, 2024, 30
[2] Visual-Linguistic Alignment and Composition for Image Retrieval with Text Feedback
Li, Dafeng
Zhu, Yingying
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023: 108-113
[3] Controllable Text-to-Image Generation
Li, Bowen
Qi, Xiaojuan
Lukasiewicz, Thomas
Torr, Philip H. S.
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
[4] Visual Programming for Text-to-Image Generation and Evaluation
Cho, Jaemin
Zala, Abhay
Bansal, Mohit
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
[5] Visual question answering based evaluation metrics for text-to-image generation
Miyamoto, Mizuki
Morita, Ryugo
Zhou, Jinjia
2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024
[6] Generative adversarial network based on semantic consistency for text-to-image generation
Ma, Yue
Liu, Li
Zhang, Huaxiang
Wang, Chunjing
Wang, Zekang
APPLIED INTELLIGENCE, 2023, 53 (04): 4703-4716
[7] Triangle-Reward Reinforcement Learning: Visual-Linguistic Semantic Alignment for Image Captioning
Nie, Weizhi
Li, Jiesi
Xu, Ning
Liu, An-An
Li, Xuanya
Zhang, Yongdong
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 4510-4518
[8] Text-to-image generation method based on single stage generative adversarial network
Yang B.
Na W.
Xiang X.-Q.
Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2023, 57 (12): 2412-2420
[9] Text-to-Image Generation Method Based on Image-Text Semantic Consistency
Xue Z.
Xu Z.
Lang C.
Feng S.
Wang T.
Li Y.
Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2023, 60 (09): 2180-2190
