Text-free diffusion inpainting using reference images for enhanced visual fidelity

Cited by: 0
Authors
Kim, Beomjo [1]
Sohn, Kyung-Ah [1]
Affiliations
[1] Ajou Univ, Dept Artificial Intelligence, 206 World Cup Ro, Suwon 16499, Gyeonggi Do, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Diffusion models; Image generation; Image inpainting; Subject-driven generation; Image manipulation;
DOI
10.1016/j.patrec.2024.10.009
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a novel approach to subject-driven image generation that addresses the limitations of traditional text-to-image diffusion models. Our method generates images using reference images without relying on language-based prompts. We introduce a visual detail preserving module that captures intricate details and textures, addressing overfitting issues associated with limited training samples. The model's performance is further enhanced through a modified classifier-free guidance technique and feature concatenation, enabling the natural positioning and harmonization of subjects within diverse scenes. Quantitative assessments using CLIP, DINO and Quality scores (QS), along with a user study, demonstrate the superior quality of our generated images. Our work highlights the potential of pre-trained models and visual patch embeddings in subject-driven editing, balancing diversity and fidelity in image generation tasks. Our implementation is available at https://github.com/8eomio/Subject-Inpainting.
Pages: 221-228
Page count: 8
Related papers
50 in total
  • [1] A text-free training method for generating face images based on Chinese text
    Tong, Songlin
    Liu, Meiling
    Zhou, Jiyun
    Journal of Combinatorial Mathematics and Combinatorial Computing, 2024, 123 : 161 - 177
  • [2] TtfDiffusion: Training-free and text-free image editing in diffusion models with structural and semantic disentanglement
    Yu, Zhenbo
    Jin, Jian
    Zhao, Jinhan
    Fu, Zhenyong
    Yang, Jian
    NEUROCOMPUTING, 2025, 619
  • [3] Repairing and inpainting damaged images using diffusion tensor
    Signal, Image Processing and Pattern Recognition Laboratory, TSIRF , Tunisia
    Int. J. Comput. Sci. Issues, 4 4-3 (150-156):
  • [4] AN AUTOMATIC TEXT-FREE SPEAKER RECOGNITION SYSTEM BASED ON AN ENHANCED ART-2 NEURAL ARCHITECTURE
    ECK, JT
    SHIH, FY
    INFORMATION SCIENCES, 1994, 76 (3-4) : 233 - 253
  • [5] Images Inpainting Quality Evaluation Using Structural Features and Visual Saliency
    Ma, Shuang
    Liu, Jinhe
    ADVANCES IN MULTIMEDIA, 2024, 2024
  • [6] TEXT-FREE NON-PARALLEL MANY-TO-MANY VOICE CONVERSION USING NORMALISING FLOWS
    Merritt, Thomas
    Ezzerg, Abdelhamid
    Bilinski, Piotr
    Proszewska, Magdalena
    Pokora, Kamil
    Barra-Chicote, Roberto
    Korzekwa, Daniel
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6782 - 6786
  • [7] TEXT LOCATION IN SCENE IMAGES USING VISUAL ATTENTION MODEL
    Sun, Qiao-Yu
    Lu, Yue
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2012, 26 (04)
  • [8] Assessing the fidelity of consecutive interpreting The effects of using source versus target text as the reference material
    Han, Chao
    Xiao, Rui
    Su, Wei
    INTERPRETING, 2021, 23 (02) : 245 - 268
  • [9] Iterative Object Localization Algorithm Using Visual Images with a Reference Coordinate
    Park, Kyoung-Su
    Lee, Jinseok
    Stanacevic, Milutin
    Hong, Sangjin
    Cho, We-Duke
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2008, 2008 (1)