Learning Continuous Facial Actions From Speech for Real-Time Animation

Cited by: 0
Authors
Pham, Hai X. [1 ]
Wang, Yuting [2 ]
Pavlovic, Vladimir [2 ]
Affiliations
[1] Samsung AI Ctr, Cambridge CB1 2JH, England
[2] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08854 USA
Keywords
Three-dimensional displays; Acoustics; Faces; Hidden Markov models; Face recognition; Animation; Solid modeling; Deep learning; speech; emotion; facial action unit; animation; convolutional neural networks; face; driven; emotion; robust
DOI
10.1109/TAFFC.2020.3022017
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Speech conveys not only verbal communication but also emotion, which manifests as facial expressions of the speaker. In this article, we present deep learning frameworks that infer facial expressions directly from speech signals alone. Specifically, the time-varying, contextual, non-linear mapping between the audio stream and micro facial movements is realized by our proposed recurrent neural networks, which drive a 3D blendshape face model in real time. Our models not only activate appropriate facial action units (AUs), defined as 3D expression blendshapes in the FaceWarehouse database, to depict different utterance-generating actions in the form of lip movements, but also, without any prior assumptions, automatically estimate the speaker's emotional intensity and reproduce their ever-changing affective states by adjusting the strength of the related facial unit activations. In the baseline models, conventional handcrafted acoustic features are used to predict facial actions. We further show that it is more advantageous to learn meaningful acoustic feature representations from speech spectrograms with convolutional networks, which subsequently improves the accuracy of facial action synthesis. Experiments on diverse audiovisual corpora of different actors, covering a wide range of facial actions and emotional states, show promising results for our approaches. Being speaker-independent, our generalized models are readily applicable to various tasks in human-machine interaction and animation.
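As a concrete illustration of the pipeline the abstract describes (convolutional feature learning on spectrogram windows, recurrent modeling of temporal context, and regression of per-frame blendshape activations), below is a minimal sketch in PyTorch. It is not the authors' implementation: the layer sizes, spectrogram window shape, and choice of a GRU are assumptions, and the 46-dimensional output merely mirrors the FaceWarehouse expression blendshape count.

# Hypothetical sketch (not the paper's code): a conv net learns acoustic
# features from mel-spectrogram windows (replacing handcrafted features of
# the baselines), a recurrent layer models temporal context, and a linear
# head regresses blendshape (AU) activation weights in [0, 1].
import torch
import torch.nn as nn

class Speech2Blendshape(nn.Module):
    def __init__(self, n_mels=128, frames_per_window=32, n_blendshapes=46):
        super().__init__()
        # Convolutional feature learning over each spectrogram window.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # Recurrent layer captures time-varying context across windows.
        self.rnn = nn.GRU(input_size=64 * 4 * 4, hidden_size=256,
                          batch_first=True)
        # Sigmoid keeps per-frame blendshape weights in [0, 1].
        self.head = nn.Sequential(nn.Linear(256, n_blendshapes), nn.Sigmoid())

    def forward(self, spec_windows):
        # spec_windows: (batch, time, n_mels, frames_per_window)
        b, t = spec_windows.shape[:2]
        x = spec_windows.reshape(b * t, 1, *spec_windows.shape[2:])
        feats = self.conv(x).reshape(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out)  # (batch, time, n_blendshapes)

# The predicted weights w drive a linear blendshape rig:
#   face(t) = B0 + sum_i w_i(t) * (Bi - B0)
# where B0 is the neutral mesh and Bi are expression blendshapes.
model = Speech2Blendshape()
dummy = torch.randn(2, 10, 128, 32)  # 2 clips, 10 spectrogram windows each
weights = model(dummy)               # per-frame AU activations
print(weights.shape)                 # torch.Size([2, 10, 46])

Running the predicted weights through the linear rig above yields a mesh per audio frame, which is what makes real-time driving of the 3D face model possible.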
Pages: 1567-1580
Page count: 14
Related Papers (50 total; 10 shown)
  • [1] Luo, Changwei; Yu, Jun; Wang, Zengfu. Synthesizing Real-Time Speech-Driven Facial Animation. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.
  • [2] Tasli, H. Emrah; den Uyl, Tim M.; Boujut, Hugo; Zaharia, Titus. Real-Time Facial Character Animation. 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Vol. 1, 2015.
  • [3] Weng, Yanlin; Cao, Chen; Hou, Qiming; Zhou, Kun. Real-time facial animation on mobile devices. Graphical Models, 2014, 76: 172-179.
  • [4] Kasarci, Kenan; Bozkurt, Elif; Yemez, Yucel; Erzin, Engin. Real-Time Speech Driven Gesture Animation. 2016 24th Signal Processing and Communication Application Conference (SIU), 2016: 1917-1920.
  • [5] Byun, H. W. Multimedia authoring tool for real-time facial animation. Multimedia Content Analysis and Mining, Proceedings, 2007, 4577: 295+.
  • [6] Dutreve, Ludovic; Meyer, Alexandre; Bouakaz, Sada. Easy acquisition and real-time animation of facial wrinkles. Computer Animation and Virtual Worlds, 2011, 22(2-3): 169-176.
  • [7] Rigiroli, P.; Borghese, N. A. Fuzzy connections in realistic real-time facial animation. Neural Nets WIRN Vietri-01, 2002: 207-211.
  • [8] Jiang, Bo. Supporting Ubiquitous Collaboration with Real-Time Facial Animation. Computer Supported Cooperative Work in Design IV, 2008, 5236: 44-55.
  • [9] Goto, T.; Kshirsagar, S.; Magnenat-Thalmann, N. Automatic face cloning and animation: using real-time facial feature tracking and speech acquisition. IEEE Signal Processing Magazine, 2001, 18(3): 17-25.
  • [10] Websdale, Danny; Taylor, Sarah; Milner, Ben. The Effect of Real-Time Constraints on Automatic Speech Animation. 19th Annual Conference of the International Speech Communication Association (Interspeech 2018), 2018: 2479-2483.