Query-Selected Global Attention for Text guided Image Style Transfer using Diffusion Model

Cited by: 0

Authors:

Hwang, Jungmin [1]

Lee, Won-Sook [1]

Affiliation:

[1] Univ Ottawa, Fac Engn, Sch EECS, Ottawa, ON, Canada

Source:

2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024 | 2024

Keywords:

Diffusion; Style Transfer; Query Selection; Global Attention

DOI:

10.1109/CAI59869.2024.00207

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Diffusion models have attracted tremendous interest in image generation, and text-guided methods for manipulating source images have made steady progress. However, style transfer with diffusion models remains an open problem because of the trade-off between style transfer and content preservation. One representative solution is self-supervised contrastive learning, which extracts features from the same location in the source and generated images for every pixel. In some cases, however, certain areas must be preserved because they carry more information from the source image than other areas. We therefore propose anchoring the areas to be preserved and deliberately selecting features at the anchor points through a query-selected global attention method. This allows our method to generate an image that preserves the content of the source while transferring the style, without additional fine-tuning or an auxiliary network. Our diffusion model uses a simple architecture that improves image quality and reduces inference time compared with other diffusion methods. Our experimental results also demonstrate superior performance.
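The idea of selecting features at anchor points through global attention can be illustrated with a small sketch. This record does not say how the paper chooses its anchors, so the criterion below is an assumption: each location's global attention row is scored by its entropy, and the lowest-entropy (most focused) rows are kept as queries. All function names and the toy features are hypothetical and are not taken from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(features):
    """Attention of every location over all locations (dot-product scores)."""
    scores = [[sum(a * b for a, b in zip(q, k)) for k in features]
              for q in features]
    return [softmax(row) for row in scores]


def entropy(probs):
    """Shannon entropy of one attention row; low entropy = focused attention."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(features, num_queries):
    """ASSUMPTION: anchors are the rows with the lowest attention entropy."""
    attn = global_attention(features)
    ranked = sorted(range(len(features)), key=lambda i: entropy(attn[i]))
    return sorted(ranked[:num_queries]), attn


def attend(attn, values, selected):
    """Aggregate value features using only the selected attention rows."""
    dim = len(values[0])
    return [[sum(attn[i][j] * values[j][d] for j in range(len(values)))
             for d in range(dim)]
            for i in selected]


if __name__ == "__main__":
    # Toy "source image" features: two distinctive locations, two flat ones.
    source_feats = [[4.0, 0.0], [0.0, 4.0], [0.1, 0.1], [0.1, 0.1]]
    # Toy "generated image" features at the same four locations.
    generated_feats = [[3.5, 0.2], [0.3, 3.8], [0.2, 0.1], [0.0, 0.2]]

    anchors, attn = select_queries(source_feats, num_queries=2)
    # The same source-derived attention rows route both feature sets, so the
    # outputs can be compared location by location in a contrastive loss.
    src_out = attend(attn, source_feats, anchors)
    gen_out = attend(attn, generated_feats, anchors)
    print("anchor indices:", anchors)
    print("source features at anchors:", src_out)
    print("generated features at anchors:", gen_out)
```

In this sketch the attention matrix is computed from the source features only and then reused for the generated features, so that paired features at each anchor refer to the same content. That pairing is a design choice of the illustration and may differ from the published method.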

Pages: 1162-1166

Page count: 5
