AUDIO2FACE: GENERATING SPEECH/FACE ANIMATION FROM SINGLE AUDIO WITH ATTENTION-BASED BIDIRECTIONAL LSTM NETWORKS

Times cited: 26
Authors
Tian, Guanzhong [1]
Yuan, Yi [2]
Liu, Yong [1]
Affiliations
[1] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou, Zhejiang, Peoples R China
[2] Netease, Fuxi AI Lab, Guangzhou, Guangdong, Peoples R China
Keywords
Animation; Long short-term memory network; Attention mechanism; DRIVEN
DOI
10.1109/ICMEW.2019.00069
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We propose an end-to-end deep learning approach for generating real-time facial animation from audio alone. Specifically, our architecture employs a deep bidirectional long short-term memory network and an attention mechanism to discover latent representations of the time-varying contextual information in speech and to recognize how much each piece of information contributes to a given facial state. As a result, our model can drive different levels of facial movement at inference time and automatically follow the pitch and latent speaking style of the input audio, without additional assumptions or human intervention. Evaluation results show that our method not only generates accurate lip movements from audio but also successfully regresses the speaker's time-varying facial movements.
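As a rough illustration of the attention idea mentioned in the abstract, the sketch below pools a sequence of per-frame hidden vectors into one context vector using softmax weights. It is a minimal, generic example and not the authors' implementation: the abstract does not give the scoring formula, so the dot-product score, the `query` vector, and the function name `attention_pool` are all assumptions made for illustration. The hidden vectors stand in for the outputs of a bidirectional LSTM, which are not computed here.

```python
import math


def attention_pool(hidden_states, query):
    """Softmax-weighted sum of per-frame hidden vectors.

    hidden_states: list of T vectors (stand-ins for BiLSTM outputs).
    query: hypothetical learned vector of the same length, used to
           score each frame with a dot product (an assumption).
    Returns (weights, context).
    """
    # Score each frame by its dot product with the query vector.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    # Numerically stable softmax over the time axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: weighted sum of the hidden vectors.
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context


# Three toy frames with two features each; the last frame scores highest.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_pool(frames, query=[1.0, 1.0])
```

Frames with higher scores receive larger weights, which is one common way a model can emphasize the parts of the audio most relevant to a given output frame.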
Pages: 366-371
Number of pages: 6