Parametric Implicit Face Representation for Audio-Driven Facial Reenactment

Cited by: 6
Authors
Huang, Ricong [1]
Lai, Peiwen [1]
Qin, Yipeng [2]
Li, Guanbin [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Cardiff Univ, Cardiff, Wales
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52729.2023.01227
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Audio-driven facial reenactment is a crucial technique with a range of applications in film-making, virtual avatars and video conferencing. Existing works employ either explicit intermediate face representations (e.g., 2D facial landmarks or 3D face models) or implicit ones (e.g., Neural Radiance Fields), and thus suffer from a trade-off between interpretability and expressive power, and hence between the controllability and quality of their results. In this work, we break this trade-off with our novel parametric implicit face representation and propose an audio-driven facial reenactment framework that is both controllable and capable of generating high-quality talking heads. Specifically, our parametric implicit representation parameterizes the implicit representation with interpretable parameters of 3D face models, thereby combining the strengths of both explicit and implicit methods. In addition, we propose several new techniques to improve the three components of our framework: i) incorporating contextual information into the audio-to-expression-parameter encoding; ii) using conditional image synthesis to parameterize the implicit representation, implemented with an innovative tri-plane structure for efficient learning; iii) formulating facial reenactment as a conditional image inpainting problem and proposing a novel data augmentation technique to improve model generalizability. Extensive experiments demonstrate that our method generates more realistic results than previous methods, with greater fidelity to the identities and talking styles of speakers.
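As a concrete illustration of the tri-plane structure mentioned in the abstract, the following is a minimal sketch of a tri-plane feature query in PyTorch: 3D query points are projected onto three orthogonal feature planes and the bilinearly sampled features are concatenated. The function name query_triplane, the plane shapes, and the concatenation-based aggregation are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def query_triplane(planes, points):
    """Sample features for 3D points from three orthogonal feature planes.

    planes: tensor of shape (3, C, H, W) holding the XY, XZ and YZ planes.
    points: tensor of shape (N, 3), coordinates normalized to [-1, 1].
    Returns a tensor of shape (N, 3 * C) with the concatenated features.
    """
    # Project each 3D point onto the three coordinate planes.
    coords = [points[:, [0, 1]], points[:, [0, 2]], points[:, [1, 2]]]
    feats = []
    for plane, uv in zip(planes, coords):
        # grid_sample expects input (B, C, H, W) and grid (B, H_out, W_out, 2).
        grid = uv.view(1, -1, 1, 2)
        sampled = F.grid_sample(plane.unsqueeze(0), grid, align_corners=True)
        feats.append(sampled.view(plane.shape[0], -1).t())  # (N, C) per plane
    # Aggregate by concatenation; summation is another common choice.
    return torch.cat(feats, dim=-1)

# Toy usage: 32-channel planes at 64x64 resolution, 1000 random query points.
planes = torch.randn(3, 32, 64, 64)
points = torch.rand(1000, 3) * 2 - 1
features = query_triplane(planes, points)  # shape (1000, 96)

In the full framework, such sampled features would additionally be parameterized by the interpretable 3D face model coefficients described in the abstract before decoding; the sketch covers only the plane lookup itself.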
Pages: 12759-12768
Page count: 10
Related Papers
50 records in total
  • [21] Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion
    Karras, Tero
    Aila, Timo
    Laine, Samuli
    Herva, Antti
    Lehtinen, Jaakko
    ACM TRANSACTIONS ON GRAPHICS, 2017, 36 (4)
  • [22] Emotion-Aware Audio-Driven Face Animation via Contrastive Feature Disentanglement
    Ren, Xin
    Luo, Juan
    Zhong, Xionghu
    Cai, Minjie
    INTERSPEECH 2023, 2023: 2728-2732
  • [23] EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation
    Tan, Shuai
    Ji, Bin
    Pan, Ye
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 22089-22099
  • [24] Audio-Driven Multimedia Content Authentication as a Service
    Vryzas, Nikolaos
    Katsaounidou, Anastasia
    Kotsakis, Rigas
    Dimoulas, Charalampos
    Kalliris, George
    146TH AES CONVENTION, 2019
  • [25] EAT-Face: Emotion-Controllable Audio-Driven Talking Face Generation via Diffusion Model
    Wang, Haodi
    Jia, Xiaojun
    Cao, Xiaochun
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, FG 2024, 2024
  • [26] Audio-Driven Talking Video Frame Restoration
    Cheng, Harry
    Guo, Yangyang
    Yin, Jianhua
    Chen, Haonan
    Wang, Jiafang
    Nie, Liqiang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 4110-4122
  • [27] Detecting Face2Face Facial Reenactment in Videos
    Kumar, Prabhat
    Vatsa, Mayank
    Singh, Richa
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020: 2578-2586
  • [28] UniTalker: Scaling up Audio-Driven 3D Facial Animation Through A Unified Model
    Fan, Xiangyu
    Li, Jiaqi
    Lin, Zhiqian
    Xiao, Weiye
    Yang, Lei
    COMPUTER VISION - ECCV 2024, PT XLI, 2025, 15099: 204-221
  • [29] Face Reenactment Based on Motion Field Representation
    Zheng, Si
    Chen, Junbin
    Yang, Zhijing
    Chen, Tianshui
    Lu, Yongyi
    ADVANCES IN BRAIN INSPIRED COGNITIVE SYSTEMS, BICS 2023, 2024, 14374: 354-364
  • [30] Touch the Sound: Audio-Driven Tactile Feedback for Audio Mixing Applications
    Merchel, Sebastian
    Altinsoy, M. Ercan
    Stamm, Maik
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2012, 60 (1-2): 47-53