Personalized Audio-Driven 3D Facial Animation via Style-Content Disentanglement

Cited by: 2
Authors
Chai, Yujin [1]
Shao, Tianjia [1]
Weng, Yanlin [1]
Zhou, Kun [1]
Affiliations
[1] Zhejiang Univ, State Key Lab CAD & CG, Hangzhou 310058, Zhejiang, Peoples R China
Keywords
Audio-driven animation; facial animation; style learning; style-content disentanglement; facial motion decomposition; PLUS PLUS;
DOI
10.1109/TVCG.2022.3230541
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
We present a learning-based approach for generating 3D facial animations with the motion style of a specific subject from arbitrary audio inputs. The subject style is learned from a video clip (1-2 minutes) either downloaded from the Internet or captured through an ordinary camera. Traditional methods often require many hours of the subject's video to learn a robust audio-driven model and are thus unsuitable for this task. Recent research efforts aim to train a model from video collections of a few subjects but ignore the distinction between the subject style and underlying speech content within facial motions, leading to inaccurate style or articulation. To solve the problem, we propose a novel framework that disentangles subject-specific style and speech content from facial motions. The disentanglement is enabled by two novel training mechanisms. One is two-pass style swapping between two random subjects, and the other is joint training of the decomposition network and audio-to-motion network with a shared decoder. After training, the disentangled style is combined with arbitrary audio inputs to generate stylized audio-driven 3D facial animations. Compared with state-of-the-art methods, our approach achieves better results qualitatively and quantitatively, especially in difficult cases like bilabial plosive and bilabial nasal phonemes.
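The abstract names two-pass style swapping but does not describe its mechanics. The following is only a toy sketch of one plausible reading, not the authors' networks: it assumes a 1-D motion track whose "style" is its mean offset and whose "content" is the zero-mean residual. The functions `encode`, `decode`, `two_pass_swap`, and `cycle_loss` are hypothetical names introduced for illustration. Pass 1 exchanges styles between two subjects; pass 2 re-encodes the swapped tracks and exchanges styles back, so a clean disentanglement should reproduce the originals.

```python
from statistics import fmean


def encode(motion):
    """Split a 1-D motion track into (style, content).

    Toy assumption (not from the paper): style is the mean offset of the
    track, content is the zero-mean residual.
    """
    style = fmean(motion)
    content = [x - style for x in motion]
    return style, content


def decode(style, content):
    """Shared toy decoder: recombine a style offset with a content track."""
    return [c + style for c in content]


def two_pass_swap(motion_a, motion_b):
    """Swap styles between two subjects, then swap them back."""
    style_a, content_a = encode(motion_a)
    style_b, content_b = encode(motion_b)

    # Pass 1: each subject's content is decoded with the other's style.
    swapped_a = decode(style_b, content_a)
    swapped_b = decode(style_a, content_b)

    # Pass 2: re-encode the swapped tracks and exchange styles again.
    style_from_a, content_from_a = encode(swapped_a)  # carries B's style
    style_from_b, content_from_b = encode(swapped_b)  # carries A's style
    recon_a = decode(style_from_b, content_from_a)
    recon_b = decode(style_from_a, content_from_b)
    return swapped_a, swapped_b, recon_a, recon_b


def cycle_loss(original, reconstructed):
    """Mean squared error between a track and its two-pass reconstruction."""
    return fmean([(o - r) ** 2 for o, r in zip(original, reconstructed)])
```

In this additive toy model the cycle loss is zero by construction. In a learned setting, a nonzero value of such a loss would be one signal that style and content are leaking into each other; whether the paper uses a loss of this form is an assumption here.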
Pages: 1803-1820
Page count: 18