Voice2Face: Audio-driven Facial and Tongue Rig Animations with cVAEs

Cited by: 5
Authors
Aylagas, Monica Villanueva [1 ]
Leon, Hector Anadon [1 ]
Teye, Mattias [1 ]
Tollmar, Konrad [1 ]
Affiliations
[1] SEED Elect Arts EA, Redwood City, CA 94065 USA
Keywords
Deep Learning; Facial animation; Tongue animation; Lip synchronization; Rig animation
DOI
10.1111/cgf.14640
CLC Classification Number
TP31 [Computer Software]
Discipline Classification Codes
081202; 0835
Abstract
We present Voice2Face: a Deep Learning model that generates face and tongue animations directly from recorded speech. Our approach consists of two steps: a conditional Variational Autoencoder generates mesh animations from speech, while a separate module maps the animations to rig controller space. Our contributions include an automated method for speech style control, a method to train a model with data from multiple quality levels, and a method for animating the tongue. Unlike previous works, our model generates animations without speaker-dependent characteristics while allowing speech style control. We demonstrate through a user study that Voice2Face significantly outperforms a comparative state-of-the-art model in terms of perceived animation quality, and our quantitative evaluation suggests that Voice2Face yields more accurate lip closure in speech with bilabials through our speech style optimization. Both evaluations also show that our data quality conditioning scheme outperforms both an unconditioned model and a model trained with a smaller high-quality dataset. Finally, the user study shows a preference for animations including tongue. Results from our model can be seen at .
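The two-step pipeline described in the abstract — a conditional VAE that maps speech features to mesh animation, followed by a separate module that maps the mesh to rig controller space — can be sketched minimally as below. This is an illustrative toy, not the paper's implementation: all dimensions, layer shapes, and function names (`encode`, `decode_mesh`, `mesh_to_rig`) are assumptions, and the random linear maps stand in for trained networks; only the overall structure (condition vector, reparameterization trick, mesh-to-rig stage) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; none of these sizes come from the paper.
AUDIO_DIM, COND_DIM, LATENT_DIM, MESH_DIM, RIG_DIM = 128, 8, 16, 300, 50

def encode(audio_feat, condition):
    """Toy encoder: audio features plus a condition vector (e.g. speech
    style or data-quality label) -> mean and log-variance of the latent."""
    x = np.concatenate([audio_feat, condition])
    w = rng.standard_normal((2 * LATENT_DIM, x.size)) * 0.01  # stand-in weights
    out = w @ x
    return out[:LATENT_DIM], out[LATENT_DIM:]  # mu, log_var

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps (the standard cVAE reparameterization)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode_mesh(z, condition):
    """Toy decoder: latent sample plus condition -> per-frame mesh vertices."""
    x = np.concatenate([z, condition])
    w = rng.standard_normal((MESH_DIM, x.size)) * 0.01
    return w @ x

def mesh_to_rig(mesh):
    """Second stage: map the mesh animation into rig controller space."""
    w = rng.standard_normal((RIG_DIM, MESH_DIM)) * 0.01
    return w @ mesh

audio = rng.standard_normal(AUDIO_DIM)  # stand-in for extracted speech features
cond = rng.standard_normal(COND_DIM)    # stand-in for style / quality condition
mu, log_var = encode(audio, cond)
mesh = decode_mesh(reparameterize(mu, log_var), cond)
rig = mesh_to_rig(mesh)
print(mesh.shape, rig.shape)  # (300,) (50,)
```

Separating the mesh decoder from the rig mapping mirrors the abstract's design: the cVAE can be trained on mesh data from multiple quality levels, while the rig-space module stays specific to a given character rig.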
Pages: 255-265
Page count: 11