Parametric Implicit Face Representation for Audio-Driven Facial Reenactment

Cited by: 6
Authors:
Huang, Ricong [1 ]
Lai, Peiwen [1 ]
Qin, Yipeng [2 ]
Li, Guanbin [1 ]
Affiliations:
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Cardiff Univ, Cardiff, Wales
Funding:
National Natural Science Foundation of China
Keywords:
DOI:
10.1109/CVPR52729.2023.01227
CLC number:
TP18 [Theory of Artificial Intelligence]
Discipline codes:
081104; 0812; 0835; 1405
Abstract:
Audio-driven facial reenactment is a crucial technique that has a range of applications in film-making, virtual avatars and video conferences. Existing works either employ explicit intermediate face representations (e.g., 2D facial landmarks or 3D face models) or implicit ones (e.g., Neural Radiance Fields), thus suffering from the trade-offs between interpretability and expressive power, hence between controllability and quality of the results. In this work, we break these trade-offs with our novel parametric implicit face representation and propose a novel audio-driven facial reenactment framework that is both controllable and can generate high-quality talking heads. Specifically, our parametric implicit representation parameterizes the implicit representation with interpretable parameters of 3D face models, thereby taking the best of both explicit and implicit methods. In addition, we propose several new techniques to improve the three components of our framework, including i) incorporating contextual information into the audio-to-expression parameters encoding; ii) using conditional image synthesis to parameterize the implicit representation and implementing it with an innovative tri-plane structure for efficient learning; iii) formulating facial reenactment as a conditional image inpainting problem and proposing a novel data augmentation technique to improve model generalizability. Extensive experiments demonstrate that our method can generate more realistic results than previous methods with greater fidelity to the identities and talking styles of speakers.
Pages: 12759-12768
Page count: 10
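
The abstract mentions an "innovative tri-plane structure for efficient learning" as the device that parameterizes the implicit representation. The record does not include the authors' code, so the following is only a minimal sketch of generic tri-plane feature sampling (the idea popularized by EG3D) in PyTorch, to illustrate what such a structure does. The function name `sample_triplane`, the `(3, C, H, W)` plane layout, and the sum aggregation across planes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def sample_triplane(planes: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:
    """Sample per-point features from three axis-aligned feature planes.

    planes: (3, C, H, W) feature maps for the XY, XZ, and YZ planes.
    xyz:    (N, 3) query points, assumed normalized to [-1, 1].
    Returns (N, C) features, aggregated by summation across the planes.
    """
    # Project each 3D point onto the three canonical 2D planes.
    coords = torch.stack([
        xyz[:, [0, 1]],  # XY plane
        xyz[:, [0, 2]],  # XZ plane
        xyz[:, [1, 2]],  # YZ plane
    ])                   # (3, N, 2)

    # grid_sample expects a (B, H_out, W_out, 2) sampling grid.
    grid = coords.unsqueeze(1)                                   # (3, 1, N, 2)
    feats = F.grid_sample(planes, grid,
                          mode='bilinear', align_corners=False)  # (3, C, 1, N)
    return feats.squeeze(2).sum(dim=0).permute(1, 0)             # (N, C)

# Hypothetical usage: sample features for random query points.
planes = torch.randn(3, 32, 256, 256)       # learned feature planes (assumed shape)
points = torch.rand(1024, 3) * 2 - 1        # query points in [-1, 1]^3
features = sample_triplane(planes, points)  # (1024, 32), fed to a decoder MLP
```

The efficiency argument the abstract alludes to is that storing features on three 2D planes instead of a dense 3D voxel grid cuts memory from O(N^3) to O(N^2) while a small MLP can still decode a feature for any continuous 3D point.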