Personalized Audio-Driven 3D Facial Animation via Style-Content Disentanglement

Times cited: 2

Authors:

Chai, Yujin [1]

Shao, Tianjia [1]

Weng, Yanlin [1]

Zhou, Kun [1]

Affiliation:

[1] Zhejiang Univ, State Key Lab CAD & CG, Hangzhou 310058, Zhejiang, Peoples R China

Source:

IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS | 2024 / Vol. 30 / No. 03

Keywords:

Audio-driven animation; facial animation; style learning; style-content disentanglement; facial motion decomposition; PLUS PLUS;

DOI:

10.1109/TVCG.2022.3230541

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Subject classification codes:

081202; 0835

Abstract:

We present a learning-based approach for generating 3D facial animations with the motion style of a specific subject from arbitrary audio inputs. The subject style is learned from a video clip (1-2 minutes) either downloaded from the Internet or captured with an ordinary camera. Traditional methods often require many hours of the subject's video to learn a robust audio-driven model and are thus unsuitable for this task. Recent research efforts aim to train a model on video collections of a few subjects but ignore the distinction between the subject style and the underlying speech content within facial motions, leading to inaccurate style or articulation. To solve this problem, we propose a novel framework that disentangles subject-specific style and speech content from facial motions. The disentanglement is enabled by two novel training mechanisms: two-pass style swapping between two random subjects, and joint training of the decomposition network and the audio-to-motion network with a shared decoder. After training, the disentangled style is combined with arbitrary audio inputs to generate stylized audio-driven 3D facial animations. Compared with state-of-the-art methods, our approach achieves better results both qualitatively and quantitatively, especially in difficult cases such as bilabial plosive and bilabial nasal phonemes.
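The abstract does not specify the networks or losses, so the following is only a toy Python sketch, under our own assumptions, of the general idea behind two-pass style swapping: swap style codes between two subjects, swap them back, and penalize any failure to reconstruct the original motions. Everything here is hypothetical: each "motion" is one scalar per frame, the style code is assumed to be a per-subject mean offset, and the hand-written functions `style_encoder`, `content_encoder`, `leaky_content_encoder`, `decoder`, and `two_pass_swap_loss` stand in for learned networks. They are not the paper's method.

```python
from statistics import fmean
from typing import Callable, List

Seq = List[float]  # hypothetical toy motion: one scalar per frame


def style_encoder(x: Seq) -> float:
    """Toy style code: the subject's mean offset over time (assumption)."""
    return fmean(x)


def content_encoder(x: Seq) -> Seq:
    """Toy content code: the motion with the style offset removed (assumption)."""
    s = style_encoder(x)
    return [v - s for v in x]


def leaky_content_encoder(x: Seq) -> Seq:
    """A content encoder that fails to remove style, for comparison."""
    return list(x)


def decoder(content: Seq, style: float) -> Seq:
    """Shared toy decoder: re-applies a style offset to a content sequence."""
    return [c + style for c in content]


def mse(a: Seq, b: Seq) -> float:
    return fmean((x - y) ** 2 for x, y in zip(a, b))


def two_pass_swap_loss(x_a: Seq, x_b: Seq, enc_c: Callable[[Seq], Seq]) -> float:
    # Pass 1: swap styles between the two subjects.
    x_ab = decoder(enc_c(x_a), style_encoder(x_b))  # A's content, B's style
    x_ba = decoder(enc_c(x_b), style_encoder(x_a))  # B's content, A's style
    # Pass 2: swap back; a well-disentangled model should recover both inputs.
    rec_a = decoder(enc_c(x_ab), style_encoder(x_ba))
    rec_b = decoder(enc_c(x_ba), style_encoder(x_ab))
    return mse(rec_a, x_a) + mse(rec_b, x_b)


x_a = [1.0, 2.0, 3.0]    # subject A (style offset 2)
x_b = [10.0, 9.0, 11.0]  # subject B (style offset 10)
print(two_pass_swap_loss(x_a, x_b, content_encoder))        # 0.0
print(two_pass_swap_loss(x_a, x_b, leaky_content_encoder))  # 680.0
```

In this toy setting, the encoder that cleanly separates style from content reconstructs both subjects exactly after the two swaps. The leaky encoder carries style into the content code and is penalized. This illustrates why a swap-and-swap-back objective can encourage disentanglement.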


Pages: 1803-1820

Page count: 18
