Personalized Audio-Driven 3D Facial Animation via Style-Content Disentanglement

Cited by: 2
Authors
Chai, Yujin [1 ]
Shao, Tianjia [1 ]
Weng, Yanlin [1 ]
Zhou, Kun [1 ]
Affiliations
[1] Zhejiang Univ, State Key Lab CAD & CG, Hangzhou 310058, Zhejiang, Peoples R China
Keywords
Audio-driven animation; facial animation; style learning; style-content disentanglement; facial motion decomposition
DOI
10.1109/TVCG.2022.3230541
CLC Classification
TP31 [Computer Software];
Subject Classification Codes
081202 ; 0835 ;
Abstract
We present a learning-based approach for generating 3D facial animations with the motion style of a specific subject from arbitrary audio inputs. The subject style is learned from a short video clip (1-2 minutes), either downloaded from the Internet or captured with an ordinary camera. Traditional methods often require many hours of the subject's video to learn a robust audio-driven model and are thus unsuitable for this task. Recent research efforts train a model from video collections of a few subjects but fail to discriminate between the subject-specific style and the underlying speech content within facial motions, leading to inaccurate style or articulation. To address this, we propose a novel framework that disentangles subject-specific style and speech content from facial motions. The disentanglement is enabled by two novel training mechanisms: two-pass style swapping between two random subjects, and joint training of the decomposition network and the audio-to-motion network with a shared decoder. After training, the disentangled style is combined with arbitrary audio inputs to generate stylized audio-driven 3D facial animations. Compared with state-of-the-art methods, our approach achieves better results both qualitatively and quantitatively, especially in difficult cases such as bilabial plosive and bilabial nasal phonemes.
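The two-pass style-swapping mechanism mentioned in the abstract can be illustrated with a toy additive model. This is only a conceptual sketch under an assumed decomposition (motion = per-subject style offset + zero-mean speech content); the function names and the additive form are illustrative assumptions, not the paper's actual networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(motion):
    """Split a (frames, dims) motion clip into a per-subject style code
    (here crudely approximated by the clip mean) and per-frame speech
    content (the residual around that mean)."""
    style = motion.mean(axis=0)
    content = motion - style
    return style, content

def decode(style, content):
    """Recombine a style code with content frames into a motion clip."""
    return content + style

# Two synthetic subjects speaking different content (content is zero-mean
# so the toy encoder can recover the style offset exactly).
style_a, style_b = rng.normal(size=3), rng.normal(size=3)
content_x = rng.normal(size=(5, 3)); content_x -= content_x.mean(axis=0)
content_y = rng.normal(size=(5, 3)); content_y -= content_y.mean(axis=0)
motion_a = decode(style_a, content_x)
motion_b = decode(style_b, content_y)

# Pass 1: decompose both clips and swap their styles.
sa, ca = encode(motion_a)
sb, cb = encode(motion_b)
swapped_a = decode(sb, ca)   # subject B's style on subject A's content
swapped_b = decode(sa, cb)   # subject A's style on subject B's content

# Pass 2: decompose the swapped clips and swap the styles back.
# A well-disentangled model reconstructs the original motions.
sa2, ca2 = encode(swapped_a)
sb2, cb2 = encode(swapped_b)
recon_a = decode(sb2, ca2)
recon_b = decode(sa2, cb2)

print(np.allclose(recon_a, motion_a), np.allclose(recon_b, motion_b))
```

In this idealized setting the round trip is exact; in the paper's learned setting, the second pass instead supplies a training loss that penalizes any content leaking into the style code (or vice versa).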
Pages: 1803-1820 (18 pages)
Related Papers
(50 total)
  • [1] EmoFace: Audio-driven Emotional 3D Face Animation
    Liu, Chang
    Lin, Qunfen
    Zeng, Zijiao
    Pan, Ye
    2024 IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES, VR 2024, 2024, : 387 - 397
  • [2] Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation
    Fu, Hui
    Wang, Zeqing
    Gong, Ke
    Wang, Keze
    Chen, Tianshui
    Li, Haojie
    Zeng, Haifeng
    Kang, Wenxiong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 1770 - 1777
  • [3] UniTalker: Scaling up Audio-Driven 3D Facial Animation Through A Unified Model
    Fan, Xiangyu
    Li, Jiaqi
    Lin, Zhiqian
    Xiao, Weiye
    Yang, Lei
    COMPUTER VISION - ECCV 2024, PT XLI, 2025, 15099 : 204 - 221
  • [4] Emotion-Aware Audio-Driven Face Animation via Contrastive Feature Disentanglement
    Ren, Xin
    Luo, Juan
    Zhong, Xionghu
    Cai, Minjie
    INTERSPEECH 2023, 2023, : 2728 - 2732
  • [5] Audio-Driven Facial Animation with Deep Learning: A Survey
    Jiang, Diqiong
    Chang, Jian
    You, Lihua
    Bian, Shaojun
    Kosk, Robert
    Maguire, Greg
    INFORMATION, 2024, 15 (11)
  • [6] Multi-Task Audio-Driven Facial Animation
    Kim, Youngsoo
    An, Shounan
    Jo, Youngbak
    Park, Seungje
    Kang, Shindong
    Oh, Insoo
    Kim, Duke Donghyun
    SIGGRAPH '19 - ACM SIGGRAPH 2019 POSTERS, 2019,
  • [7] A Comparative Study of Four 3D Facial Animation Methods: Skeleton, Blendshape, Audio-Driven, and Vision-Based Capture
    Wei, Mingzhu
    Adamo, Nicoletta
    Giri, Nandhini
    Chen, Yingjie
    ARTSIT, INTERACTIVITY AND GAME CREATION, ARTSIT 2022, 2023, 479 : 36 - 50
  • [8] Imitator: Personalized Speech-driven 3D Facial Animation
    Thambiraja, Balamurugan
    Habibie, Ikhsanul
    Aliakbarian, Sadegh
    Cosker, Darren
    Theobalt, Christian
    Thies, Justus
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 20564 - 20574
  • [9] Audio-Driven Lips and Expression on 3D Human Face
    Ma, Le
    Ma, Zhihao
    Meng, Weiliang
    Xu, Shibiao
    Zhang, Xiaopeng
    ADVANCES IN COMPUTER GRAPHICS, CGI 2023, PT II, 2024, 14496 : 15 - 26
  • [10] Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion
    Karras, Tero
    Aila, Timo
    Laine, Samuli
    Herva, Antti
    Lehtinen, Jaakko
    ACM TRANSACTIONS ON GRAPHICS, 2017, 36 (04):