Prompt-Based Learning for Image Variation Using Single Image Multi-Scale Diffusion Models

Cited by: 0
Authors
Park, Jiwon [1]
Jeong, Dasol [2]
Lee, Hyebean [2]
Han, Seunghee [2]
Paik, Joonki [1,2]
Affiliations
[1] Chung Ang Univ, Dept Artificial Intelligence, Seoul 06974, South Korea
[2] Chung Ang Univ, Dept Image, Seoul 06974, South Korea
Source
IEEE ACCESS | 2024 / Vol. 12
Funding
National Research Foundation of Singapore;
Keywords
Training; Computational modeling; Periodic structures; Diffusion models; Data models; Image synthesis; Adaptation models; Noise reduction; Feature extraction; Context modeling; Single image generation; prompt-based learning; text guided image editing;
DOI
10.1109/ACCESS.2024.3487215
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
In this paper, we propose a multi-scale framework with text-conditioned learning that uses a single image to generate variations of that image and to edit it from text. The approach captures the detailed internal information of the single image, enabling numerous variations while preserving the original features. To this end, we integrate the diffusion U-Net structure into the multi-scale framework, so that the model accurately captures the quality and internal structure of the image and produces diverse variations that retain the features of the original. Text-conditioned learning, in turn, combines text and image so that text-based editing can be performed effectively from a single image. For editing, we use a pre-trained Bootstrapping Language-Image Pre-training (BLIP) model to generate candidate prompts, and we use the prior knowledge of Contrastive Language-Image Pre-training (CLIP) to select the prompts that most closely match the input image and feed them into training. To improve accuracy during the image editing stage, we design a contrastive loss function that strengthens the relevance between the prompt and the image. Our experiments show that the proposed method improves learning between text and images, is effective on text-based image editing tasks, and significantly improves the performance of single-image generative models, opening new possibilities for text-based image editing.
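The prompt-selection and contrastive-loss steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: the loss is written in a generic InfoNCE style, the three-dimensional vectors are toy stand-ins for CLIP image and text embeddings, and the names `cosine`, `select_prompt`, and `contrastive_loss` are hypothetical helpers, not the authors' code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_prompt(image_emb, prompt_embs):
    """Return the index of the candidate prompt most similar to the image."""
    scores = [cosine(image_emb, p) for p in prompt_embs]
    return max(range(len(scores)), key=scores.__getitem__)

def contrastive_loss(image_emb, prompt_embs, positive, temperature=0.07):
    """InfoNCE-style loss (assumed form): negative log-softmax of the
    image-prompt similarity at index `positive`, other prompts as negatives."""
    logits = [cosine(image_emb, p) / temperature for p in prompt_embs]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[positive]

# Toy embeddings (assumed values) standing in for CLIP outputs.
image_emb = [0.9, 0.1, 0.2]
prompt_embs = [
    [0.1, 0.9, 0.0],  # candidate prompt 0
    [0.8, 0.2, 0.3],  # candidate prompt 1
    [0.0, 0.1, 0.9],  # candidate prompt 2
]

best = select_prompt(image_emb, prompt_embs)
loss_best = contrastive_loss(image_emb, prompt_embs, best)
loss_other = contrastive_loss(image_emb, prompt_embs, 0)
print(best, loss_best < loss_other)
```

In this sketch the loss is lowest when the selected prompt is treated as the positive pair, which is the behavior a prompt-image relevance loss is meant to reward. In a real pipeline the embeddings would come from pre-trained encoders rather than hand-written vectors.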
Pages: 158810-158823
Page count: 14
Related papers
50 in total
  • [1] Prompt-Based Learning for Unpaired Image Captioning
    Zhu, Peipei
    Wang, Xiao
    Zhu, Lin
    Sun, Zhenglong
    Zheng, Wei-Shi
    Wang, Yaowei
    Chen, Changwen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 379 - 393
  • [2] Prompt learning and multi-scale attention for infrared and visible image fusion
    Li, Yanan
    Ji, Qingtao
    Jiao, Shaokang
    INFRARED PHYSICS & TECHNOLOGY, 2025, 145
  • [3] Single image dehazing based on multi-scale segmentation and deep learning
    Yu, Tianhe
    Zhu, Ming
    Chen, Haiming
    MACHINE VISION AND APPLICATIONS, 2022, 33 (02)
  • [4] Single image dehazing based on multi-scale segmentation and deep learning
    Tianhe Yu
    Ming Zhu
    Haiming Chen
    Machine Vision and Applications, 2022, 33
  • [5] Multi-scale transformer with conditioned prompt for image deraining
    Wu, Xianhao
    Chen, Hongming
    Chen, Xiang
    Xu, Guili
    DIGITAL SIGNAL PROCESSING, 2025, 156
  • [6] Multi-scale network for single image deblurring based on ensemble learning module
    Wu W.
    Pan Y.
    Su N.
    Wang J.
    Wu S.
    Xu Z.
    Yu Y.
    Liu Y.
    Multimedia Tools and Applications, 2025, 84 (11) : 9045 - 9064
  • [7] Multi-Scale Deep Residual Learning-Based Single Image Haze Removal via Image Decomposition
    Yeh, Chia-Hung
    Huang, Chih-Hsiang
    Kang, Li-Wei
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 3153 - 3167
  • [8] Hyperspectral Image Reconstruction Using Multi-scale Fusion Learning
    Han, Xian-Hua
    Zheng, Yinqiang
    Chen, Yen-Wei
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (01)
  • [9] Image Annotation using Multi-scale Hypergraph Heat Diffusion Framework
    Murthy, Venkatesh N.
    Sharma, Avinash
    Chari, Visesh
    Manmatha, R.
    ICMR'16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2016, : 299 - 303
  • [10] Single Image Dehazing by Multi-Scale Fusion
    Ancuti, Codruta Orniana
    Ancuti, Cosmin
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2013, 22 (08) : 3271 - 3282