Multi-Modal Face Stylization with a Generative Prior

Cited by: 0
Authors
Li, Mengtian [1]
Dong, Yi [2]
Lin, Minxuan [1]
Huang, Haibin [1]
Wan, Pengfei [1]
Ma, Chongyang [1]
Affiliations
[1] Kuaishou Technology, Beijing, People's Republic of China
[2] Tsinghua University, Beijing, People's Republic of China
Keywords
Computing methodologies -> Image processing;
DOI
10.1111/cgf.14952
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
In this work, we introduce a new approach for face stylization. Although existing methods achieve impressive results on this task, there is still room for improvement in generating high-quality artistic faces with diverse styles and accurate facial reconstruction. Our proposed framework, MMFS, supports multi-modal face stylization by leveraging the strengths of StyleGAN and integrating it into an encoder-decoder architecture. Specifically, we use the mid-resolution and high-resolution layers of StyleGAN as the decoder to generate high-quality faces, while aligning its low-resolution layer with the encoder to extract and preserve input facial details. We also introduce a two-stage training strategy: in the first stage, we train the encoder to align its feature maps with StyleGAN and enable faithful reconstruction of input faces; in the second stage, the entire network is fine-tuned with artistic data for stylized face generation. To enable the fine-tuned model to be applied in zero-shot and one-shot stylization tasks, we train an additional mapping network from the large-scale Contrastive Language-Image Pre-training (CLIP) space to a latent w+ space of the fine-tuned StyleGAN. Qualitative and quantitative experiments show that our framework achieves superior performance in both one-shot and zero-shot face stylization tasks, outperforming state-of-the-art methods by a large margin.
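The training schedule described in the abstract can be pictured as a plan for which components receive gradient updates at each step. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the class `MMFSSketch`, the function `configure_stage`, the module names, and the stage labels are all hypothetical. It also assumes that StyleGAN stays frozen while the encoder is trained, and that the encoder and decoder stay frozen while the CLIP-to-w+ mapping network is trained; the abstract does not state either point explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Stand-in for a network component; only tracks whether it is trained."""
    name: str
    trainable: bool = False


@dataclass
class MMFSSketch:
    """Hypothetical container for the three components named in the abstract."""
    encoder: Module = field(default_factory=lambda: Module("encoder"))
    decoder: Module = field(
        default_factory=lambda: Module("stylegan_mid_high_res_layers"))
    clip_mapper: Module = field(
        default_factory=lambda: Module("clip_to_w_plus_mapper"))

    def modules(self):
        return [self.encoder, self.decoder, self.clip_mapper]

    def trainable_names(self):
        return [m.name for m in self.modules() if m.trainable]


# Which modules are updated in each step. The stage labels are assumptions;
# freezing everything not listed is also an assumption of this sketch.
STAGE_PLAN = {
    # Stage 1: train the encoder to align with StyleGAN and reconstruct faces.
    "reconstruction": {"encoder"},
    # Stage 2: fine-tune the entire encoder-decoder network on artistic data.
    "stylization": {"encoder", "stylegan_mid_high_res_layers"},
    # Additional step: train the mapping network from CLIP space to w+ space.
    "clip_mapping": {"clip_to_w_plus_mapper"},
}


def configure_stage(model, stage):
    """Mark modules as trainable for the given stage and return their names."""
    if stage not in STAGE_PLAN:
        raise ValueError(f"unknown stage: {stage!r}")
    for module in model.modules():
        module.trainable = module.name in STAGE_PLAN[stage]
    return model.trainable_names()


if __name__ == "__main__":
    model = MMFSSketch()
    for stage in STAGE_PLAN:
        print(stage, "->", configure_stage(model, stage))
```

In a real framework the `trainable` flag would correspond to enabling or disabling gradient updates for the parameters of each component; the losses, datasets, and network definitions are outside the scope of this sketch.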
Pages: 10
Related papers
50 in total
  • [1] Multi-Modal Face Recognition
    Shen, Haihong
    Ma, Liqun
    Zhang, Qishan
    [J]. 2ND IEEE INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER CONTROL (ICACC 2010), VOL. 5, 2010, : 612 - 616
  • [2] Multi-Modal Face Recognition
    Shen, Haihong
    Ma, Liqun
    Zhang, Qishan
    [J]. 2010 8TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2010, : 720 - 723
  • [3] GANtlitz: Ultra High Resolution Generative Model for Multi-Modal Face Textures
    Gruber, A.
    Collins, E.
    Meka, A.
    Mueller, F.
    Sarkar, K.
    Orts-Escolano, S.
    Prasso, L.
    Busch, J.
    Gross, M.
    Beeler, T.
    [J]. COMPUTER GRAPHICS FORUM, 2024, 43 (02)
  • [4] Discriminative multi-modal deep generative models
    Du, Fang
    Zhang, Jiangshe
    Hu, Junying
    Fei, Rongrong
    [J]. KNOWLEDGE-BASED SYSTEMS, 2019, 173 : 74 - 82
  • [5] Multi-Modal Generative AI with Foundation Models
    Liu, Ziwei
    [J]. PROCEEDINGS OF THE 1ST WORKSHOP ON LARGE GENERATIVE MODELS MEET MULTIMODAL APPLICATIONS, LGM3A 2023, 2023, : 5 - 5
  • [6] Cross-modal generative models for multi-modal plastic sorting
    Neo, Edward R. K.
    Low, Jonathan S. C.
    Goodship, Vannessa
    Coles, Stuart R.
    Debattista, Kurt
    [J]. JOURNAL OF CLEANER PRODUCTION, 2023, 415
  • [7] Face Detection using Multi-modal Features
    Lee, Hyobin
    Kim, Seongwan
    Kim, Sooyeon
    Lee, Sangyoun
    [J]. 2008 INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS, VOLS 1-4, 2008, : 1857 - 1860
  • [8] MULTI-MODAL EAR AND FACE MODELING AND RECOGNITION
    Mahoor, Mohammad H.
    Cadavid, Steven
    Abdel-Mottaleb, Mohamed
    [J]. 2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, 2009, : 4137 - +
  • [9] Multi-Modal Face Presentation Attack Detection
    Institute of Automation, Chinese Academy of Sciences, Guodong, China
    Unknown
    Unknown
    Unknown
    Unknown
    [J]. Synth. Lect. Comput. Vis., 2020, 1 (1-88): : 1 - 88
  • [10] Sejong face database: A multi-modal disguise face database
    Cheema, Usman
    Moon, Seungbin
    [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2021, 208