Locally controllable network based on visual-linguistic relation alignment for text-to-image generation

Cited by: 0
Authors
Li, Zaike [1 ]
Liu, Li [1 ]
Zhang, Huaxiang [1 ]
Liu, Dongmei [1 ]
Song, Yu [1 ]
Li, Boqun [1 ]
Affiliations
[1] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan 250014, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text-to-image generation; Image-text matching; Generative adversarial network; Local control;
DOI
10.1007/s00530-023-01222-7
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Because existing locally controllable text-to-image generation methods fail to produce satisfactory fine details, a novel locally controllable text-to-image generation network based on visual-linguistic relation alignment is proposed. The goal of the method is to semantically edit and generate images under text guidance. The proposed method explores the relationship between text and image to achieve local control over text-to-image generation. Visual-linguistic matching learns similarity weights between image and text from semantic features, establishing a fine-grained correspondence between local image regions and words. An instance-level optimization function is introduced into the generation process to precisely control the low-similarity weights and combine them with text features to generate new visual attributes. In addition, a local control loss is proposed to preserve the details described by the text and the local regions of the image. Extensive experiments demonstrate that the proposed method achieves superior performance and enables more accurate control over the original image.
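The abstract does not give the matching formulation, but the word-region similarity weighting it describes can be illustrated with a minimal sketch: cosine similarity between word features and region features, with a softmax over regions so each word attends to its best-matching local regions. All names (`word_region_similarity`, the sharpening factor `gamma`, the feature shapes) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def word_region_similarity(words, regions, gamma=5.0):
    """Fine-grained word-to-region matching (illustrative sketch).

    words:   (T, D) word embeddings from a text encoder
    regions: (R, D) local image-region features from an image encoder
    returns: sim  (T, R) cosine similarities,
             attn (T, R) per-word attention weights over regions
    """
    # L2-normalise so the dot product becomes cosine similarity.
    w = words / np.linalg.norm(words, axis=1, keepdims=True)
    r = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    sim = w @ r.T                         # (T, R) cosine similarities
    attn = softmax(gamma * sim, axis=1)   # gamma sharpens the focus
    return sim, attn

rng = np.random.default_rng(0)
sim, attn = word_region_similarity(rng.normal(size=(4, 16)),
                                   rng.normal(size=(6, 16)))
print(attn.shape)        # (4, 6): one weight vector over regions per word
print(attn.sum(axis=1))  # each word's weights sum to 1
```

Low-similarity word-region pairs identified this way are the natural targets for the instance-level optimization the abstract mentions: regions whose weights are low for a given word are the ones whose visual attributes the text has not yet been matched to.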
Pages: 13
Related Papers
50 records total
  • [31] Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image Generation
    Pan, Zhihong
    Zhou, Xin
    Tian, Hao
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 4450 - 4460
  • [32] Text-to-image Generation Model Based on Diffusion Wasserstein Generative Adversarial Networks
    Zhao H.
    Li W.
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2023, 45 (12): : 4371 - 4381
  • [33] Interpolating the Text-to-Image Correspondence Based on Phonetic and Phonological Similarities for Nonword-to-Image Generation
    Matsuhira, Chihaya
    Kastner, Marc A.
    Komamizu, Takahiro
    Hirayama, Takatsugu
    Doman, Keisuke
    Kawanishi, Yasutomo
    Ide, Ichiro
    IEEE ACCESS, 2024, 12 : 41299 - 41316
  • [34] Text-to-image generation method based on self-supervised attention and image features fusion
    Liao, Yonghui
    Zhang, Haitao
    Jin, Haibo
    CHINESE JOURNAL OF LIQUID CRYSTALS AND DISPLAYS, 2024, 39 (02) : 180 - 191
  • [35] Text-to-image synthesis based on modified deep convolutional generative adversarial network
    Li Y.
    Zhu M.
    Ren J.
    Su X.
    Zhou X.
    Yu H.
    Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2023, 49 (08): : 1875 - 1883
  • [36] DAC-GAN: Dual Auxiliary Consistency Generative Adversarial Network for Text-to-Image Generation
    Wang, Zhiwei
    Yang, Jing
    Cui, Jiajun
    Liu, Jiawei
    Wang, Jiahao
    COMPUTER VISION - ACCV 2022, PT VII, 2023, 13847 : 3 - 19
  • [37] Inv-ReVersion: Enhanced Relation Inversion Based on Text-to-Image Diffusion Models
    Zhang, Guangzi
    Qian, Yulin
    Deng, Juntao
    Cai, Xingquan
    APPLIED SCIENCES-BASEL, 2024, 14 (08):
  • [38] DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation
    Huang, Mengqi
    Mao, Zhendong
    Wang, Penghui
    Wang, Quan
    Zhang, Yongdong
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4345 - 4354
  • [39] Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
    Gafni, Oran
    Polyak, Adam
    Ashual, Oron
    Sheynin, Shelly
    Parikh, Devi
    Taigman, Yaniv
    COMPUTER VISION - ECCV 2022, PT XV, 2022, 13675 : 89 - 106
  • [40] Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models
    Zhang, Yasi
    Yu, Peiyu
    Wu, Ying Nian
    COMPUTER VISION - ECCV 2024, PT XLII, 2025, 15100 : 55 - 71