End-to-end Learning for 3D Facial Animation from Speech

Cited by: 26
Authors
Pham, Hai X. [1]
Wang, Yuting [1]
Pavlovic, Vladimir [1]
Affiliations
[1] Rutgers State Univ, New Brunswick, NJ 08901 USA
DOI
10.1145/3242969.3243017
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We present a deep learning framework for real-time 3D facial animation driven by speech audio. Our deep neural network directly maps an input sequence of speech spectrograms to a series of micro facial action unit intensities that drive a 3D blendshape face model. In particular, our deep model is able to learn latent representations of the time-varying contextual information and affective states within the speech. Hence, at inference our model not only activates the appropriate facial action units to depict different utterance-generating actions, in the form of lip movements, but also, without any assumption, automatically estimates the emotional intensity of the speaker and reproduces the speaker's ever-changing affective states by adjusting the strength of the related facial unit activations. For example, in happy speech the mouth opens wider than normal while other facial units are relaxed, and in a surprised state both eyebrows rise higher. Experiments on diverse audiovisual corpora of different actors, covering a wide range of facial actions and emotional states, show promising results for our approach. Being speaker-independent, our generalized model is readily applicable to various tasks in human-machine interaction and animation.
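As a rough illustration of the final step described in the abstract, the sketch below shows how a sequence of predicted action unit intensities could drive a blendshape face model. The abstract does not give the blendshape formulation, so this assumes the common linear delta form (neutral mesh plus a weighted sum of per-unit vertex offsets). The function name `animate_blendshape`, the array shapes, and the clipping of intensities to [0, 1] are all hypothetical choices made for illustration, not details taken from the paper.

```python
import numpy as np


def animate_blendshape(neutral, deltas, intensities):
    """Hypothetical linear delta-blendshape evaluation (an assumption,
    not the paper's stated formulation).

    neutral:     (V, 3) vertex positions of the neutral face
    deltas:      (K, V, 3) vertex offsets, one set per action unit
    intensities: (T, K) per-frame action unit intensities
    returns:     (T, V, 3) animated vertex positions, one mesh per frame
    """
    # Assumed convention: intensities are activations in [0, 1].
    weights = np.clip(np.asarray(intensities, dtype=float), 0.0, 1.0)
    # Contract over the K action units: (T, K) x (K, V, 3) -> (T, V, 3).
    offsets = np.tensordot(weights, deltas, axes=([1], [0]))
    return neutral[None, :, :] + offsets


# Toy data: 2 vertices, 2 action units, 4 frames.
neutral = np.zeros((2, 3))
deltas = np.stack([np.ones((2, 3)), 2.0 * np.ones((2, 3))])
intensities = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.5], [2.0, 0.0]])
frames = animate_blendshape(neutral, deltas, intensities)
```

In this toy setup a stronger intensity for a unit moves its vertices further from the neutral pose, which mirrors the abstract's description of emotional intensity being expressed by adjusting the strength of facial unit activations.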
Pages: 361-365
Page count: 5