Parametric Implicit Face Representation for Audio-Driven Facial Reenactment

Cited by: 6
Authors
Huang, Ricong [1 ]
Lai, Peiwen [1 ]
Qin, Yipeng [2 ]
Li, Guanbin [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Cardiff Univ, Cardiff, Wales
Funding
National Natural Science Foundation of China;
DOI
10.1109/CVPR52729.2023.01227
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Audio-driven facial reenactment is a crucial technique with a range of applications in film-making, virtual avatars, and video conferencing. Existing works employ either explicit intermediate face representations (e.g., 2D facial landmarks or 3D face models) or implicit ones (e.g., Neural Radiance Fields), and thus suffer from a trade-off between interpretability and expressive power, and hence between the controllability and the quality of their results. In this work, we break this trade-off with our novel parametric implicit face representation and propose a novel audio-driven facial reenactment framework that is both controllable and capable of generating high-quality talking heads. Specifically, our parametric implicit representation parameterizes the implicit representation with the interpretable parameters of 3D face models, thereby taking the best of both explicit and implicit methods. In addition, we propose several new techniques to improve the three components of our framework: i) incorporating contextual information into the encoding of audio into expression parameters; ii) using conditional image synthesis to parameterize the implicit representation, implemented with an innovative tri-plane structure for efficient learning; and iii) formulating facial reenactment as a conditional image inpainting problem and proposing a novel data augmentation technique to improve model generalizability. Extensive experiments demonstrate that our method generates more realistic results than previous methods, with greater fidelity to the identities and talking styles of speakers.
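The tri-plane structure mentioned in point ii) of the abstract factorizes a 3D feature volume into three axis-aligned 2D feature planes, so a 3D query point is featurized by projecting it onto each plane and aggregating the bilinear samples. The PyTorch sketch below is a minimal illustration of this general technique, not the authors' implementation; the function name sample_triplane, the tensor shapes, the plane/axis convention, and the summation aggregation are all assumptions made for this example.

import torch
import torch.nn.functional as F

def sample_triplane(planes: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
    """Gather per-point features from three axis-aligned feature planes.

    planes: (3, C, H, W) tensor holding the XY, XZ, and YZ planes (assumed order).
    points: (N, 3) tensor of query coordinates normalized to [-1, 1].
    Returns: (N, C) features, summed over the three plane projections.
    """
    # Project each 3D point onto the three canonical planes.
    coords = torch.stack([
        points[:, [0, 1]],  # projection onto the XY plane
        points[:, [0, 2]],  # projection onto the XZ plane
        points[:, [1, 2]],  # projection onto the YZ plane
    ])                      # (3, N, 2)

    # grid_sample expects a (B, H_out, W_out, 2) grid; treat the three
    # planes as a batch and the N points as a 1-by-N sampling grid.
    grid = coords.unsqueeze(1)                        # (3, 1, N, 2)
    feats = F.grid_sample(planes, grid, mode='bilinear',
                          padding_mode='zeros', align_corners=False)
    # feats: (3, C, 1, N) -> sum over planes -> (N, C)
    return feats.squeeze(2).sum(dim=0).permute(1, 0)

Storing the scene on three 2D planes makes memory grow quadratically with resolution rather than cubically, which is why tri-planes are a popular choice for efficient implicit-representation learning; the sampled features would then typically be decoded by a small MLP into the quantities the renderer needs.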
Pages: 12759 - 12768
Page count: 10
Related papers
50 records in total
  • [41] Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing
    Tong, Haonan
    Li, Haopeng
    Du, Hongyang
    Yang, Zhaohui
    Yin, Changchuan
    Niyato, Dusit
    IEEE WIRELESS COMMUNICATIONS LETTERS, 2025, 14 (01) : 93 - 97
  • [42] Speech-driven Face Reenactment for a Video Sequence
    Nakashima, Yuta
    Yasui, Takaaki
    Nguyen, Leon
    Babaguchi, Noboru
    ITE TRANSACTIONS ON MEDIA TECHNOLOGY AND APPLICATIONS, 2020, 8 (01): : 60 - 68
  • [43] Audio2AB: Audio-driven collaborative generation of virtual character animation
    Niu, Lichao
    Xie, Wenjun
    Wang, Dong
    Cao, Zhongrui
    Liu, Xiaoping
    VIRTUAL REALITY AND INTELLIGENT HARDWARE, 2024, 6 (01): : 56 - 70
  • [44] Audio-Driven Violin Performance Animation with Clear Fingering and Bowing
    Hirata, Asuka
    Tanaka, Keitaro
    Hamanaka, Masatoshi
    Morishima, Shigeo
    PROCEEDINGS OF SIGGRAPH 2022 POSTERS, SIGGRAPH 2022, 2022,
  • [45] Audio-driven emotional speech animation for interactive virtual characters
    Charalambous, Constantinos
    Yumak, Zerrin
    van der Stappen, A. Frank
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2019, 30 (3-4)
  • [46] Partial linear regression for audio-driven talking head application
    Hsieh, CK
    Chen, YC
    2005 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1 AND 2, 2005: 281 - 284
  • [48] PADVG: A Simple Baseline of Active Protection for Audio-Driven Video Generation
    Liu, Huan
    Liu, Xiaolong
    Tan, Zichang
    Li, Xiaolong
    Zhao, Yao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (06)
  • [49] Audio-Driven Stylized Gesture Generation with Flow-Based Model
    Ye, Sheng
    Wen, Yu-Hui
    Sun, Yanan
    He, Ying
    Zhang, Ziyang
    Wang, Yaoyuan
    He, Weihua
    Liu, Yong-Jin
    COMPUTER VISION - ECCV 2022, PT V, 2022, 13665 : 712 - 728
  • [50] DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation
    Shen, Shuai
    Zhao, Wenliang
    Meng, Zibin
    Li, Wanhua
    Zhu, Zheng
    Zhou, Jie
    Lu, Jiwen
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 1982 - 1991