Parametric Implicit Face Representation for Audio-Driven Facial Reenactment

Cited by: 6
Authors
Huang, Ricong [1 ]
Lai, Peiwen [1 ]
Qin, Yipeng [2 ]
Li, Guanbin [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Cardiff Univ, Cardiff, Wales
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1109/CVPR52729.2023.01227
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Audio-driven facial reenactment is a crucial technique with a range of applications in film-making, virtual avatars, and video conferencing. Existing works employ either explicit intermediate face representations (e.g., 2D facial landmarks or 3D face models) or implicit ones (e.g., Neural Radiance Fields), and thus suffer from a trade-off between interpretability and expressive power, and hence between controllability and result quality. In this work, we break this trade-off with our novel parametric implicit face representation and propose an audio-driven facial reenactment framework that is both controllable and capable of generating high-quality talking heads. Specifically, our parametric implicit representation parameterizes the implicit representation with interpretable parameters of 3D face models, thereby taking the best of both explicit and implicit methods. In addition, we propose several new techniques to improve the three components of our framework, including i) incorporating contextual information into the audio-to-expression parameter encoding; ii) using conditional image synthesis to parameterize the implicit representation and implementing it with an innovative tri-plane structure for efficient learning; iii) formulating facial reenactment as a conditional image inpainting problem and proposing a novel data augmentation technique to improve model generalizability. Extensive experiments demonstrate that our method generates more realistic results than previous methods, with greater fidelity to the identities and talking styles of speakers.
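The tri-plane structure mentioned in item (ii) of the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the plane resolution, feature width, and summation-based aggregation below are all assumptions for illustration. The core idea is that a 3D query point is projected onto three axis-aligned feature planes (XY, XZ, YZ), each plane is sampled bilinearly, and the three feature vectors are aggregated before being decoded by a small network.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (R, R, C) feature plane at continuous
    pixel coordinates (u, v), with u, v in [0, R-1]."""
    R = plane.shape[0]
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, R - 1), min(v0 + 1, R - 1)
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * plane[u0, v0]
            + du * (1 - dv) * plane[u1, v0]
            + (1 - du) * dv * plane[u0, v1]
            + du * dv * plane[u1, v1])

def triplane_features(planes, point):
    """Project a 3D point in [-1, 1]^3 onto the XY, XZ, and YZ
    feature planes, sample each bilinearly, and aggregate by
    summation (a common choice; the paper's choice may differ)."""
    x, y, z = point
    R = planes["xy"].shape[0]
    to_px = lambda t: (t + 1.0) * 0.5 * (R - 1)  # map [-1, 1] -> [0, R-1]
    f_xy = bilinear_sample(planes["xy"], to_px(x), to_px(y))
    f_xz = bilinear_sample(planes["xz"], to_px(x), to_px(z))
    f_yz = bilinear_sample(planes["yz"], to_px(y), to_px(z))
    return f_xy + f_xz + f_yz

# Toy usage: three 4x4 planes with 8-channel features per cell.
planes = {k: np.ones((4, 4, 8)) for k in ("xy", "xz", "yz")}
feat = triplane_features(planes, (0.0, 0.0, 0.0))  # shape (8,)
```

The appeal of this layout, as the abstract's phrase "efficient learning" suggests, is that three 2D feature grids are far cheaper to store and optimize than a dense 3D voxel grid of the same resolution, while still giving every 3D point a distinct feature through the combination of its three projections.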
Pages: 12759 - 12768
Page count: 10
Related papers
50 records in total
  • [31] Personalized Audio-Driven 3D Facial Animation via Style-Content Disentanglement
    Chai, Yujin
    Shao, Tianjia
    Weng, Yanlin
    Zhou, Kun
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (03) : 1803 - 1820
  • [32] Audio-Driven Deformation Flow for Effective Lip Reading
    Feng, Dalu
    Yang, Shuang
    Shan, Shiguang
    Chen, Xilin
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 274 - 280
  • [33] ASVFI: AUDIO-DRIVEN SPEAKER VIDEO FRAME INTERPOLATION
    Wang, Qianrui
    Li, Dengshi
    Liao, Liang
    Song, Hao
    Li, Wei
    Xiao, Jing
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 3200 - 3204
  • [34] Audio-driven human body motion analysis and synthesis
    Ofli, F.
    Canton-Ferrer, C.
    Tilmanne, J.
    Demir, Y.
    Bozkurt, E.
    Yemez, Y.
    Erzin, E.
    Tekalp, A. M.
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 2233 - +
  • [35] Face2Face: Real-time facial reenactment
    Thies, Justus
    IT-INFORMATION TECHNOLOGY, 2019, 61 (2-3) : 143 - 146
  • [36] Leveraging Language Models and Audio-Driven Dynamic Facial Motion Synthesis: A New Paradigm in AI-Driven Interview Training
    Garg, Aakash
    Chaudhury, Rohan
    Godbole, Mihir
    Seo, Jinsil Hwaryoung
    ARTIFICIAL INTELLIGENCE IN EDUCATION: POSTERS AND LATE BREAKING RESULTS, WORKSHOPS AND TUTORIALS, INDUSTRY AND INNOVATION TRACKS, PRACTITIONERS, DOCTORAL CONSORTIUM AND BLUE SKY, AIED 2024, PT I, 2024, 2150 : 461 - 468
  • [37] VisemeNet: Audio-Driven Animator-Centric Speech Animation
    Zhou, Yang
    Xu, Zhan
    Landreth, Chris
    Kalogerakis, Evangelos
    Maji, Subhransu
    Singh, Karan
    ACM TRANSACTIONS ON GRAPHICS, 2018, 37 (04)
  • [38] Audio-Driven Co-Speech Gesture Video Generation
    Liu, Xian
    Wu, Qianyi
    Zhou, Hang
    Du, Yuanqi
    Wu, Wayne
    Lin, Dahua
    Liu, Ziwei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [39] Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis
    Song, Linsen
    Wu, Wayne
    Fu, Chaoyou
    Loy, Chen Change
    He, Ran
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (03) : 1247 - 1261
  • [40] Audio-Driven Robot Upper-Body Motion Synthesis
    Ondras, Jan
    Celiktutan, Oya
    Bremner, Paul
    Gunes, Hatice
    IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51 (11) : 5445 - 5454