Text-driven Face Image Generation and Manipulation via Multi-level Residual Mapper

Cited by: 0
Authors
Li Z.-L. [1]
Zhang S.-P. [1]
Liu Y. [1]
Zhang Z.-X. [1]
Zhang W.-G. [1]
Huang Q.-M. [2]
Affiliations
[1] School of Computer Science and Technology, Harbin Institute of Technology, Weihai
[2] School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing
Source
Ruan Jian Xue Bao/Journal of Software | 2023, Vol. 34, No. 05
Keywords
face image generation; face image manipulation; generative adversarial network (GAN); multimodal learning; pre-trained model
DOI
10.13328/j.cnki.jos.006767
Abstract
Although generative adversarial networks (GANs) have achieved great success in face image generation and manipulation, discovering meaningful directions in the latent space of a GAN for manipulating the semantic attributes of faces remains a major challenge in computer vision. Solving it requires a large amount of labeled data and several hours of network fine-tuning. Collecting and annotating such data is difficult, however, because of high technical barriers and labor costs. Recent studies have attempted to overcome the lack of labeled data by using pre-trained models. These efforts have proved capable of accomplishing the task, but the accuracy of the manipulation and the realism of the results do not meet the needs of real face editing scenarios. To address these problems, this study encodes images and text descriptions into a shared latent space by leveraging the joint representation capability of contrastive language-image pre-training (CLIP). With carefully designed network structures and loss functions, the proposed framework accurately recognizes the relevant face attributes and learns a residual mapping network. This network predicts latent code residuals from the image and text description codes, and high-quality image generation and manipulation are then performed through the pre-trained StyleGAN2 model. Extensive experiments demonstrate the superiority of the proposed approach in terms of manipulation accuracy, visual realism, and preservation of irrelevant attributes. © 2023 Chinese Academy of Sciences. All rights reserved.
Pages: 2101-2115
Page count: 14