High-Quality Text-to-Image Generation Using High-Detail Feature-Preserving Network

Times cited: 0

Authors:

Hsu, Wei-Yen [1,2,3]

Lin, Jing-Wen [1]

Affiliations:

[1] Natl Chung Cheng Univ, Dept Informat Management, Chiayi 62102, Taiwan

[2] Natl Chung Cheng Univ, Adv Inst Mfg High Tech Innovat, Chiayi 62102, Taiwan

[3] Natl Chung Cheng Univ, Ctr Innovat Res Aging Society CIRAS, Chiayi 62102, Taiwan

Source:

APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 02

Keywords:

generative adversarial network; text-to-image generation; high detail; feature preservation

DOI:

10.3390/app15020706

Chinese Library Classification (CLC) number:

O6 [Chemistry]

Discipline classification code:

0703

Abstract:

Multistage text-to-image generation algorithms have shown remarkable success. However, the images produced often lack detail and suffer from feature loss. This is because these methods mainly focus on extracting features from images and text, using only conventional residual blocks for post-extraction feature processing. This results in the loss of features, greatly reducing the quality of the generated images and necessitating more resources for feature calculation, which will severely limit the use and application of optical devices such as cameras and smartphones. To address these issues, the novel High-Detail Feature-Preserving Network (HDFpNet) is proposed to effectively generate high-quality, near-realistic images from text descriptions. The initial text-to-image generation (iT2IG) module is used to generate initial feature maps to avoid feature loss. Next, the fast excitation-and-squeeze feature extraction (FESFE) module is proposed to recursively generate high-detail and feature-preserving images with lower computational costs through three steps: channel excitation (CE), fast feature extraction (FFE), and channel squeeze (CS). Finally, the channel attention (CA) mechanism further enriches the feature details. Compared with the state of the art, experimental results obtained on the CUB-Bird and MS-COCO datasets demonstrate that the proposed HDFpNet achieves better performance and visual presentation, especially regarding high-detail images and feature preservation.
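
The abstract names the three FESFE steps (CE, FFE, CS) and a following channel attention (CA) stage but does not say how each is built. The sketch below is one possible reading in plain Python, not the authors' implementation. It assumes CE is a 1x1 convolution that widens the channels, FFE is a per-channel 3x3 convolution with ReLU, CS is a 1x1 convolution that narrows the channels back, and CA is a sigmoid gate on each channel's global average. All function names, weights, and the expansion ratio are hypothetical.

```python
import math
import random


def pointwise(x, weights):
    """1x1 convolution: mix channels at each pixel.

    x is a feature map [C_in][H][W]; weights is [C_out][C_in].
    """
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(row[c] * x[c][i][j] for c in range(len(x))) for j in range(w)]
         for i in range(h)]
        for row in weights
    ]


def depthwise3x3(x, kernels):
    """Per-channel 3x3 convolution with zero padding, followed by ReLU.

    ASSUMPTION: used here as a cheap stand-in for the FFE step.
    """
    h, w = len(x[0]), len(x[0][0])
    out = []
    for channel, k in zip(x, kernels):
        plane = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += k[di + 1][dj + 1] * channel[ii][jj]
                plane[i][j] = max(acc, 0.0)
        out.append(plane)
    return out


def fesfe_block(x, w_excite, k_extract, w_squeeze):
    """Excite -> extract -> squeeze, in the order the abstract lists them."""
    excited = pointwise(x, w_excite)               # CE: C -> r*C channels (assumed)
    extracted = depthwise3x3(excited, k_extract)   # FFE on the widened map (assumed)
    return pointwise(extracted, w_squeeze)         # CS: r*C -> C channels (assumed)


def channel_attention(x):
    """ASSUMPTION: scale each channel by the sigmoid of its global average."""
    out = []
    for channel in x:
        mean = sum(map(sum, channel)) / (len(channel) * len(channel[0]))
        gate = 1.0 / (1.0 + math.exp(-mean))
        out.append([[gate * v for v in row] for row in channel])
    return out


def random_weights(rows, cols, rng):
    """Hypothetical helper: untrained weights, for shape checking only."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]
```

With C input channels and an assumed expansion ratio r, `w_excite` has shape [r*C][C], `k_extract` holds r*C kernels of size 3x3, and `w_squeeze` has shape [C][r*C], so the block returns a map of the same shape as its input and can be applied repeatedly before `channel_attention`.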


Pages: 16

