Text-free diffusion inpainting using reference images for enhanced visual fidelity

Cited by: 0

Authors:

Kim, Beomjo [1]

Sohn, Kyung-Ah [1]

Affiliations:

[1] Ajou Univ, Dept Artificial Intelligence, 206 World Cup Ro, Suwon 16499, Gyeonggi Do, South Korea

Source:

PATTERN RECOGNITION LETTERS | 2024 / Vol. 186

Funding:

National Research Foundation, Singapore;

Keywords:

Diffusion models; Image generation; Image inpainting; Subject-driven generation; Image manipulation;

DOI:

10.1016/j.patrec.2024.10.009

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a novel approach to subject-driven image generation that addresses the limitations of traditional text-to-image diffusion models. Our method generates images using reference images without relying on language-based prompts. We introduce a visual detail preserving module that captures intricate details and textures, addressing overfitting issues associated with limited training samples. The model's performance is further enhanced through a modified classifier-free guidance technique and feature concatenation, enabling the natural positioning and harmonization of subjects within diverse scenes. Quantitative assessments using CLIP, DINO and Quality scores (QS), along with a user study, demonstrate the superior quality of our generated images. Our work highlights the potential of pre-trained models and visual patch embeddings in subject-driven editing, balancing diversity and fidelity in image generation tasks. Our implementation is available at https://github.com/8eomio/Subject-Inpainting.


Pages: 221 - 228

Number of pages: 8

50 items in total

[21] Edge-enhanced error diffusion halftoning using human visual properties
Kwak, Nae-Joung
Ryu, Soung-Pil
Ahn, Jae-Hyeong
2006 International Conference on Hybrid Information Technology, Vol 1, Proceedings, 2006, : 499 - 504
[22] MTBI Identification From Diffusion MR Images Using Bag of Adversarial Visual Features
Minaee, Shervin
Wang, Yao
Aygar, Alp
Chung, Sohae
Wang, Xiuyuan
Lui, Yvonne W.
Fieremans, Els
Flanagan, Steven
Rath, Joseph
IEEE TRANSACTIONS ON MEDICAL IMAGING, 2019, 38 (11) : 2545 - 2555
[23] Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images
Yu, Cuican
Lu, Guansong
Zeng, Yihan
Sun, Jian
Liang, Xiaodan
Li, Huibin
Xu, Zongben
Xu, Songcen
Zhang, Wei
Xu, Hang
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15280 - 15291
[24] Reference-Free Isotropic 3D EM Reconstruction Using Diffusion Models
Lee, Kyungryun
Jeong, Won-Ki
DEEP GENERATIVE MODELS, DGM4MICCAI 2023, 2024, 14533 : 235 - 245
[25] Script-free text line segmentation using interline space model for printed document images
Kim, Minwoo
Oh, Il-Seok
11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011), 2011, : 1354 - 1358
[26] Super-Resolution for Land Surface Temperature Retrieval Images via Cross-Scale Diffusion Model Using Reference Images
Chen, Junqi
Jia, Lijuan
Zhang, Jinchuan
Feng, Yilong
Zhao, Xiaobin
Tao, Ran
REMOTE SENSING, 2024, 16 (08)
[27] Reference-Free Axial Super-Resolution of 3D Microscopy Images Using Implicit Neural Representation with a 2D Diffusion Prior
Lee, Kyungryun
Jeong, Won-Ki
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT VII, 2024, 15007 : 593 - 602
[28] REFERENCE-FREE DESPECKLING OF SYNTHETIC-APERTURE RADAR IMAGES USING A DEEP CONVOLUTIONAL NETWORK
Davis, T.
Jain, V
Ley, A.
D'Hondt, O.
Valade, S.
Hellwich, O.
IGARSS 2020 - 2020 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2020, : 3908 - 3911
[29] No-Reference Quality Assessment for Screen Content Images Using Visual Edge Model and AdaBoosting Neural Network
Yang, Jiachen
Bian, Zilin
Liu, Jiacheng
Jiang, Bin
Lu, Wen
Gao, Xinbo
Song, Houbing
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 6801 - 6814
[30] GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion
Ukarapol, Trapoom
Pruvost, Kevin
arXiv,
