End-to-end Learning for 3D Facial Animation from Speech

Cited by: 26

Authors:

Pham, Hai X. [1]

Wang, Yuting [1]

Pavlovic, Vladimir [1]

Affiliation:

[1] Rutgers State Univ, New Brunswick, NJ 08901 USA

Source:

ICMI'18: PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION | 2018

DOI:

10.1145/3242969.3243017

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

We present a deep learning framework for real-time, speech-driven 3D facial animation. Our deep neural network directly maps an input sequence of speech spectrograms to a sequence of micro facial action unit intensities that drive a 3D blendshape face model. In particular, the model learns latent representations of both the time-varying contextual information and the affective states conveyed in the speech. As a result, at inference it not only activates the facial action units needed to produce each utterance, in the form of lip movements, but also, without any prior assumption, automatically estimates the speaker's emotional intensity and reproduces her changing affective states by adjusting the strength of the related facial unit activations. For example, in happy speech the mouth opens wider than normal while other facial units stay relaxed, and in a surprised state both eyebrows rise higher. Experiments on diverse audiovisual corpora of different actors, spanning a wide range of facial actions and emotional states, show promising results. Because it is speaker-independent, our generalized model is readily applicable to various tasks in human-machine interaction and animation.
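The abstract says the predicted action unit intensities drive a 3D blendshape face model, but it does not state the model's form. The sketch below assumes a standard delta blendshape model in which each action unit corresponds to one target shape B_k, stored as an offset d_k = B_k - B_0 from the neutral mesh B_0, so that one animation frame is v = B_0 + sum_k w_k * d_k. The function names `blend_frame` and `animate`, the one-to-one mapping from action units to blendshapes, and clamping intensities to [0, 1] are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence

Vertex = List[float]  # one (x, y, z) position


def blend_frame(
    neutral: Sequence[Sequence[float]],
    deltas: Sequence[Sequence[Sequence[float]]],
    weights: Sequence[float],
) -> List[Vertex]:
    """Hypothetical helper: evaluate v = B_0 + sum_k w_k * d_k for one frame.

    neutral: neutral mesh B_0, one (x, y, z) per vertex.
    deltas:  one offset mesh d_k = B_k - B_0 per blendshape (assumed one per action unit).
    weights: predicted action unit intensities w_k for this frame.
    """
    if len(deltas) != len(weights):
        raise ValueError("one weight per blendshape is required")
    # Copy the neutral mesh so the caller's data is never mutated.
    out = [list(v) for v in neutral]
    for delta, w in zip(deltas, weights):
        if len(delta) != len(neutral):
            raise ValueError("every blendshape must have the same vertex count as the neutral mesh")
        # Assumption: intensities are clamped to [0, 1] before blending.
        w = min(max(w, 0.0), 1.0)
        for i, dv in enumerate(delta):
            for c in range(3):
                out[i][c] += w * dv[c]
    return out


def animate(neutral, deltas, intensity_sequence):
    """Hypothetical helper: turn a sequence of per-frame intensities into meshes."""
    return [blend_frame(neutral, deltas, frame) for frame in intensity_sequence]
```

In use, a network's per-frame output (one intensity vector per spectrogram window) would be passed as `intensity_sequence`. A linear delta model is chosen here only because it keeps each intensity's effect on the face independent and easy to inspect; the paper's actual face rig may differ.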

Pages: 361-365

Number of pages: 5
