Voice2Face: Audio-driven Facial and Tongue Rig Animations with cVAEs

Cited by: 5
Authors
Aylagas, Monica Villanueva [1 ]
Leon, Hector Anadon [1 ]
Teye, Mattias [1 ]
Tollmar, Konrad [1 ]
Affiliations
[1] SEED Elect Arts EA, Redwood City, CA 94065 USA
Keywords
Deep Learning; Facial animation; Tongue animation; Lip synchronization; Rig animation
DOI
10.1111/cgf.14640
CLC Classification Number
TP31 [Computer Software]
Discipline Classification Codes
081202; 0835
Abstract
We present Voice2Face: a Deep Learning model that generates face and tongue animations directly from recorded speech. Our approach consists of two steps: a conditional Variational Autoencoder generates mesh animations from speech, while a separate module maps the animations to rig controller space. Our contributions include an automated method for speech style control, a method to train a model with data from multiple quality levels, and a method for animating the tongue. Unlike previous works, our model generates animations without speaker-dependent characteristics while allowing speech style control. We demonstrate through a user study that Voice2Face significantly outperforms a comparative state-of-the-art model in terms of perceived animation quality, and our quantitative evaluation suggests that Voice2Face yields more accurate lip closure in speech with bilabials through our speech style optimization. Both evaluations also show that our data quality conditioning scheme outperforms both an unconditioned model and a model trained with a smaller high-quality dataset. Finally, the user study shows a preference for animations including tongue. Results from our model can be seen at .
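The two-step pipeline described in the abstract — a conditional VAE that maps speech features to mesh animation, followed by a separate module that maps the mesh to rig controller space — can be sketched minimally as below. This is an illustrative toy, not the paper's implementation: all dimensions, layer shapes, and function names (`encode`, `decode_mesh`, `mesh_to_rig`) are assumptions, and the random linear maps stand in for trained networks; only the overall structure (condition vector, reparameterization trick, mesh-to-rig stage) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; none of these sizes come from the paper.
AUDIO_DIM, COND_DIM, LATENT_DIM, MESH_DIM, RIG_DIM = 128, 8, 16, 300, 50

def encode(audio_feat, condition):
    """Toy encoder: audio features plus a condition vector (e.g. speech
    style or data-quality label) -> mean and log-variance of the latent."""
    x = np.concatenate([audio_feat, condition])
    w = rng.standard_normal((2 * LATENT_DIM, x.size)) * 0.01  # stand-in weights
    out = w @ x
    return out[:LATENT_DIM], out[LATENT_DIM:]  # mu, log_var

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps (the standard cVAE reparameterization)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode_mesh(z, condition):
    """Toy decoder: latent sample plus condition -> per-frame mesh vertices."""
    x = np.concatenate([z, condition])
    w = rng.standard_normal((MESH_DIM, x.size)) * 0.01
    return w @ x

def mesh_to_rig(mesh):
    """Second stage: map the mesh animation into rig controller space."""
    w = rng.standard_normal((RIG_DIM, MESH_DIM)) * 0.01
    return w @ mesh

audio = rng.standard_normal(AUDIO_DIM)  # stand-in for extracted speech features
cond = rng.standard_normal(COND_DIM)    # stand-in for style / quality condition
mu, log_var = encode(audio, cond)
mesh = decode_mesh(reparameterize(mu, log_var), cond)
rig = mesh_to_rig(mesh)
print(mesh.shape, rig.shape)  # (300,) (50,)
```

Separating the mesh decoder from the rig mapping mirrors the abstract's design: the cVAE can be trained on mesh data from multiple quality levels, while the rig-space module stays specific to a given character rig.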
Pages: 255-265
Page count: 11