Query-Selected Global Attention for Text guided Image Style Transfer using Diffusion Model

Authors
Hwang, Jungmin [1]
Lee, Won-Sook [1]
Affiliations
[1] Univ Ottawa, Fac Engn, Sch EECS, Ottawa, ON, Canada
Keywords
Diffusion; Style Transfer; Query Selection; Global Attention
DOI
10.1109/CAI59869.2024.00207
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Diffusion models have attracted tremendous interest in image generation, and text-guided methods for manipulating source images have made steady progress. However, style transfer with diffusion models remains an open problem because of the trade-off between style transfer and content preservation. One representative solution is self-supervised contrastive learning, which extracts features from the same location in the source and generated images for every pixel. In some cases, however, certain areas must be preserved because they carry more information from the source image than other areas do. We therefore propose anchoring the areas to be preserved and deliberately selecting features at the anchor points through a query-selected global attention method. This lets our method generate an image that preserves the content of the source while transferring the style, without additional fine-tuning or an auxiliary network. Our diffusion model uses a simple architecture to improve image quality and reduce inference time compared with other diffusion methods. Our experimental results also demonstrate superior performance.
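The abstract does not give implementation details, so the following is only a minimal sketch of how query selection with global attention could feed a contrastive loss at anchor points. Everything in it is an assumption made for illustration: the function names, the choice of lowest attention entropy as the anchor criterion, the InfoNCE-style loss, and the temperature `tau` are hypothetical and are not taken from the paper.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def global_attention(feats):
    """One softmax row per query location, attending over all locations."""
    return [softmax([dot(q, k) for k in feats]) for q in feats]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_anchor_queries(feats, num_anchors):
    """Pick anchor locations and return their attention rows.

    Assumption: anchors are the queries with the lowest attention entropy,
    i.e. the most sharply focused rows. The paper may use another criterion.
    """
    attn = global_attention(feats)
    order = sorted(range(len(feats)), key=lambda i: entropy(attn[i]))
    anchors = sorted(order[:num_anchors])
    return anchors, [attn[i] for i in anchors]

def route(attn_rows, feats):
    """Aggregate features over the whole image with the selected rows."""
    dim = len(feats[0])
    return [[sum(w * f[d] for w, f in zip(row, feats)) for d in range(dim)]
            for row in attn_rows]

def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]

def anchor_contrastive_loss(src_feats, gen_feats, num_anchors, tau=0.07):
    """InfoNCE-style loss (assumed): each generated anchor feature should
    match the source feature at the same anchor and differ from the rest."""
    _, rows = select_anchor_queries(src_feats, num_anchors)
    src = [normalize(v) for v in route(rows, src_feats)]
    gen = [normalize(v) for v in route(rows, gen_feats)]
    loss = 0.0
    for i, g in enumerate(gen):
        logits = [dot(g, s) / tau for s in src]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_sum_exp - logits[i]
    return loss / len(gen)

# Toy input: four spatial locations with two-dimensional features.
source = [[3.0, 0.0], [0.0, 3.0], [0.1, 0.1], [0.1, 0.0]]
anchors, rows = select_anchor_queries(source, num_anchors=2)
loss_same = anchor_contrastive_loss(source, source, num_anchors=2)
```

In this toy input the two high-magnitude locations produce sharply peaked attention rows, so they are the ones chosen as anchors. The attention rows are computed from the source features only and reused for the generated features, so both images are compared at the same anchor points.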
Pages: 1162-1166
Page count: 5