Prompt-Based Learning for Image Variation Using Single Image Multi-Scale Diffusion Models

Cited by: 0

Authors:

Park, Jiwon [1]

Jeong, Dasol [2]

Lee, Hyebean [2]

Han, Seunghee [2]

Paik, Joonki [1, 2]

Affiliations:

[1] Chung Ang Univ, Dept Artificial Intelligence, Seoul 06974, South Korea

[2] Chung Ang Univ, Dept Image, Seoul 06974, South Korea

Source:

IEEE ACCESS | 2024 / Vol. 12

Funding:

National Research Foundation of Singapore;

Keywords:

Training; Computational modeling; Periodic structures; Diffusion models; Data models; Image synthesis; Adaptation models; Noise reduction; Feature extraction; Context modeling; Single image generation; prompt-based learning; text guided image editing;

DOI:

10.1109/ACCESS.2024.3487215

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In this paper, we propose a multi-scale framework with text-conditioned learning that uses a single image to generate variations of that image and to edit it from text. The approach captures the detailed internal information of a single image, enabling numerous variations while preserving the original features. We integrate the diffusion U-Net structure within the multi-scale framework to capture the quality and internal structure of the image and to produce diverse variations that retain its features. Text-conditioned learning combines text and image information so that text-based editing can be performed from a single image. To this end, we use a pre-trained Bootstrapping Language-Image Pre-training (BLIP) model to generate candidate prompts, and we use the prior knowledge of Contrastive Language-Image Pre-training (CLIP) to select the prompts that most closely match the input image and feed them into training. To improve accuracy during the image editing stage, we design a contrastive loss function that strengthens the relevance between the prompt and the image. Experiments show that the proposed method improves learning between text and images, significantly improves the performance of single-image generative models, and is effective on text-based image editing tasks.
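The prompt-selection and contrastive-loss steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings are toy vectors standing in for CLIP image and text features, the function names `select_prompt` and `contrastive_loss` are hypothetical, and the loss is a generic InfoNCE-style formulation with an assumed temperature of 0.07, since the record does not give the paper's exact loss.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_prompt(image_emb, prompt_embs):
    """Return the index of the candidate prompt most similar to the image,
    together with all similarity scores (hypothetical helper)."""
    scores = [cosine(image_emb, p) for p in prompt_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

def contrastive_loss(image_emb, prompt_embs, positive, temperature=0.07):
    """InfoNCE-style loss (an assumption, not the paper's exact loss):
    small when the positive prompt scores well above the other prompts."""
    logits = [cosine(image_emb, p) / temperature for p in prompt_embs]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[positive]

# Toy stand-ins for one image embedding and three candidate prompt embeddings.
image_emb = [0.9, 0.1, 0.0]
prompt_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

best, scores = select_prompt(image_emb, prompt_embs)
loss = contrastive_loss(image_emb, prompt_embs, best)
print(best, loss)
```

In a real pipeline the candidate prompts would come from a captioning model and the embeddings from a joint image-text encoder; the sketch only shows the ranking and loss arithmetic.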

Pages: 158810-158823

Page count: 14
