Survey of deep face manipulation and fake detection

Authors
Xie T. [1, 2]
Yu L. [2, 3]
Luo C. [4, 5]
Xie H. [3]
Zhang Y. [2, 3]
Affiliations
[1] AHU-IAI AI Joint Laboratory, Anhui University, Hefei
[2] Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei
[3] The School of Information Science and Technology, University of Science and Technology of China, Hefei
[4] Department of Electronic Engineering, Tsinghua University, Beijing
[5] Academy of Military Sciences, Beijing
Keywords
deep face forgery detection; deep face manipulation; deep generative model; detection techniques
DOI
10.16511/j.cnki.qhdxxb.2023.21.002
Abstract
[Significance] Deep face manipulation technology generates and manipulates human facial imagery through different strategies, such as identity swapping or face reenactment between a source face and a target face. On the one hand, the rise of deep face manipulation has inspired a series of applications, including video production and advertising. On the other hand, because face manipulation tools are usually open source or packaged as free apps, they lower the barrier to tampering and have led to a proliferation of fake videos. Moreover, when criminals maliciously use face manipulation technology to produce fake news, especially news involving important military and political officials, it can steer and interfere with public opinion, posing a serious threat to national security and social stability. Research on deep face forgery detection is therefore particularly important, and it is necessary to summarize existing research in order to guide the development of deep face manipulation and detection technology rationally.

[Progress] Deep face manipulation technology can currently be divided roughly into four types: identity swapping, face reenactment, face editing, and face synthesis. Deepfakes brought real-world identity swapping to a new level of fidelity. The region-aware face-swapping network supplies the identity information of the source character from both local and global perspectives, making the generated faces more natural. In face reenactment, Wav2Lip uses a pretrained lip-sync model as an expert, encouraging the generator to produce natural and accurate lip movements. In face editing, FENeRF, a 3D-aware generator based on a neural radiance field, aligns semantic, geometric, and texture information in the spatial domain, improving the consistency of generated images across viewpoints while keeping the face editable.
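To make the "lip-sync expert" idea concrete, the following is a minimal sketch of a sync penalty of the kind described in the Wav2Lip paper, as we understand it: the frozen expert embeds a short window of mouth frames and the matching audio window, their cosine similarity is read as a probability of being in sync, and the generator minimizes the mean negative log of that probability. The embedding networks are not shown; the embeddings are plain lists assumed to be non-negative (e.g. ReLU outputs), and the names `sync_probability`, `sync_loss`, and `EPS` are hypothetical.

```python
import math
from typing import Sequence, Tuple

EPS = 1e-8  # hypothetical numerical floor to avoid division by zero and log(0)


def sync_probability(video_emb: Sequence[float], audio_emb: Sequence[float]) -> float:
    """Cosine similarity between a video-window embedding and an audio-window embedding.

    Assumes both embeddings are non-negative, so the result lies in [0, 1]
    and can be treated as the probability that the pair is in sync.
    """
    if len(video_emb) != len(audio_emb):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(v * a for v, a in zip(video_emb, audio_emb))
    norm = math.sqrt(sum(v * v for v in video_emb)) * math.sqrt(sum(a * a for a in audio_emb))
    return dot / max(norm, EPS)


def sync_loss(pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Mean negative log sync probability over a batch of (video, audio) embedding pairs."""
    if not pairs:
        raise ValueError("batch must not be empty")
    losses = [-math.log(max(sync_probability(v, a), EPS)) for v, a in pairs]
    return sum(losses) / len(losses)
```

For example, `sync_loss([([1.0, 0.0], [1.0, 0.0])])` is zero because the embeddings point the same way, while orthogonal embeddings are clamped to `EPS` and give a large penalty. Because the expert stays frozen, in a real training loop this penalty would only push gradients into the generated frames.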
In face synthesis, AnyFace proposes a cross-modal distillation module that aligns language and visual representations, using text to generate more diverse face images. Deep face forgery detection methods can be roughly divided into image-level and video-level methods. Among image-level methods, SBI proposes a self-blending technique that generates realistic fake face images as data augmentation, effectively improving the generalization ability of the model. M2TR proposes a multimodal, multi-scale Transformer that detects local artifacts at different spatial scales of the image, and adds frequency-domain features as auxiliary information to preserve detection ability on highly compressed images. Among video-level methods, RealForensics learns the natural correspondence between the face and the audio in real videos in a self-supervised manner, enhancing the generalization and robustness of the model.

[Conclusions and Prospects] Deep face manipulation and detection technologies are developing rapidly, and the corresponding techniques are continuously being updated and iterated. This survey first reviews deep face manipulation and detection methods and discusses their strengths and weaknesses. It then summarizes common datasets and the evaluation results of different manipulation and detection methods. Finally, it discusses the main challenges of face manipulation and forgery detection and points out possible future research directions. © 2023 Press of Tsinghua University. All rights reserved.
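The self-blending augmentation attributed to SBI above can be illustrated with a minimal sketch: a transformed copy of a real face (the "source") is blended back into the original (the "target") through a soft mask, I_SB = M * I_s + (1 - M) * I_t, and the result is labeled fake. Because source and target come from the same image, the detector must rely on blending-boundary and statistical inconsistencies rather than identity mismatch. This is a simplified stand-in, not the SBI implementation: as we understand it, the original derives the mask from facial landmarks and uses richer color and frequency transforms, whereas here the image is a small grayscale grid in [0, 1], the mask is a fixed soft ellipse, the only transform is a random contrast/brightness change, and `jitter`, `soft_ellipse_mask`, and `self_blend` are hypothetical names.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale image, values in [0, 1]


def jitter(img: Image, gain: float, bias: float) -> Image:
    """Hypothetical stand-in for source-side transforms: contrast/brightness change, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, gain * p + bias)) for p in row] for row in img]


def soft_ellipse_mask(h: int, w: int, feather: float = 0.2) -> Image:
    """Soft elliptical mask in [0, 1]; an assumed substitute for a landmark-based face mask."""
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ry, rx = h / 2, w / 2
    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            d = ((y - cy) / ry) ** 2 + ((x - cx) / rx) ** 2  # d < 1 inside the ellipse
            # 1 well inside, 0 outside, linear ramp near the boundary
            row.append(min(1.0, max(0.0, (1.0 - d) / feather)))
        mask.append(row)
    return mask


def self_blend(img: Image, rng: random.Random) -> Image:
    """Blend a transformed copy of `img` (source) into `img` itself (target) through a soft mask."""
    h, w = len(img), len(img[0])
    source = jitter(img, gain=rng.uniform(0.8, 1.2), bias=rng.uniform(-0.1, 0.1))
    target = img
    mask = soft_ellipse_mask(h, w)
    return [[mask[y][x] * source[y][x] + (1.0 - mask[y][x]) * target[y][x]
             for x in range(w)] for y in range(h)]
```

In a training pipeline built on this idea, each real image would yield a (real, self-blended) pair with labels 0 and 1, so the detector never sees a specific manipulation method's output, which is the intuition behind the improved cross-dataset generalization reported for SBI.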
Pages: 1350-1365
Number of pages: 15
References (56 in total)
  • [1] Deepfakes [EB/OL], GitHub
  • [2] Zao [EB/OL]
  • [3] FaceApp [EB/OL]
  • [4] DOLHANSKY B, BITTON J, PFLAUM B, et al., The deepfake detection challenge (DFDC) dataset
  • [5] MIRSKY Y, LEE W, The creation and detection of deepfakes: A survey [J], ACM Computing Surveys, 54, 1, (2022)
  • [6] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al., Generative adversarial nets [C], Proceedings of the 27th International Conference on Neural Information Processing Systems, pp. 2672-2680, (2014)
  • [7] XU C, ZHANG J N, HUA M, et al., Region-aware face swapping [C], Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7622-7631, (2022)
  • [8] PRAJWAL K R, MUKHOPADHYAY R, NAMBOODIRI V P, et al., A lip sync expert is all you need for speech to lip generation in the wild [C], Proceedings of the 28th ACM International Conference on Multimedia, pp. 484-492, (2020)
  • [9] LIANG B R, PAN Y, GUO Z Z, et al., Expressive talking head generation with granular audio-visual control [C], Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3377-3386, (2022)
  • [10] SCHWARZ K, LIAO Y Y, NIEMEYER M, et al., GRAF: Generative radiance fields for 3D-aware image synthesis [C], Proceedings of the 34th International Conference on Neural Information Processing Systems, (2020)