PreciseControl: Enhancing Text-to-Image Diffusion Models with Fine-Grained Attribute Control

Cited: 0
Authors
Parihar, Rishubh [1 ]
Sachidanand, V. S. [1 ]
Mani, Sabraswaran [2 ]
Karmali, Tejan [1 ]
Babu, R. Venkatesh [1 ]
Affiliations
[1] IISc Bangalore, Vis & AI Lab, Bengaluru, India
[2] IIT Kharagpur, Kharagpur, W Bengal, India
Keywords: Personalised Image Generation; Fine-grained editing
DOI: 10.1007/978-3-031-73007-8_27
CLC Number: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Recently, we have seen a surge of personalization methods for text-to-image (T2I) diffusion models that learn a concept from a few images. When used for face personalization, existing approaches struggle to achieve convincing inversion with identity preservation and rely on semantic, text-based editing of the generated face. However, finer-grained control is desired for facial attribute editing, and this is difficult to achieve with text prompts alone. In contrast, StyleGAN models learn a rich face prior and enable smooth, fine-grained attribute editing through latent manipulation. This work uses the disentangled W+ space of StyleGANs to condition the T2I model. This approach allows us to precisely manipulate facial attributes, such as smoothly introducing a smile, while preserving the coarse text-based control inherent in T2I models. To condition the T2I model on the W+ space, we train a latent mapper that translates latent codes from W+ into the token embedding space of the T2I model. The proposed approach excels at precise inversion of face images with attribute preservation and enables continuous control for fine-grained attribute editing. Furthermore, our approach readily extends to generating compositions involving multiple individuals. We perform extensive experiments to validate our method for face personalization and fine-grained attribute editing.
Pages: 469-487 (19 pages)
Related Papers (50 total; items [31]-[40] shown)
  • [31] EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models
    Yang, Jingyuan
    Feng, Jiawei
    Huang, Hui
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 6358 - 6368
  • [32] Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models
    Xu, Xingqian
    Guo, Jiayi
    Wang, Zhangyang
    Huang, Gao
    Essa, Irfan
    Shi, Humphrey
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 8682 - 8692
  • [33] Text to Image GANs with RoBERTa and Fine-grained Attention Networks
    Siddharth, M.
    Aarthi, R.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (12) : 947 - 955
  • [34] SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
    Zhong, Shanshan
    Huang, Zhongzhan
    Wen, Wushao
    Qin, Jinghui
    Lin, Liang
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 567 - 578
  • [35] Visual Analytics for Fine-grained Text Classification Models and Datasets
    Battogtokh, M.
    Xing, Y.
    Davidescu, C.
    Abdul-Rahman, A.
    Luck, M.
    Borgo, R.
    COMPUTER GRAPHICS FORUM, 2024, 43 (03)
  • [36] Text-to-Image Diffusion Models are Zero-Shot Classifiers
    Clark, Kevin
    Jaini, Priyank
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [37] Zero-Shot Attribute Attacks on Fine-Grained Recognition Models
    Shafiee, Nasim
    Elhamifar, Ehsan
    COMPUTER VISION - ECCV 2022, PT V, 2022, 13665 : 262 - 282
  • [38] Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
    Zhao, Shihao
    Chen, Dongdong
    Chen, Yen-Chun
    Bao, Jianmin
    Hao, Shaozhe
    Yuan, Lu
    Wong, Kwan-Yee K.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [39] The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
    Avrahami, Omri
    Hertz, Amir
    Vinker, Yael
    Arar, Moab
    Fruchter, Shlomi
    Fried, Ohad
    Cohen-Or, Daniel
    Lischinski, Dani
    PROCEEDINGS OF SIGGRAPH 2024 CONFERENCE PAPERS, 2024,
  • [40] Exposing fake images generated by text-to-image diffusion models
    Xu, Qiang
    Wang, Hao
    Meng, Laijin
    Mi, Zhongjie
    Yuan, Jianye
    Yan, Hong
    PATTERN RECOGNITION LETTERS, 2023, 176 : 76 - 82