AUDIO2FACE: GENERATING SPEECH/FACE ANIMATION FROM SINGLE AUDIO WITH ATTENTION-BASED BIDIRECTIONAL LSTM NETWORKS

Times cited: 26

Authors:

Tian, Guanzhong [1]

Yuan, Yi [2]

Liu, Yong [1]

Affiliations:

[1] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou, Zhejiang, Peoples R China

[2] Netease, Fuxi AI Lab, Guangzhou, Guangdong, Peoples R China

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW) | 2019

Keywords:

Animation; Long short-term memory network; Attention mechanism; DRIVEN;

DOI:

10.1109/ICMEW.2019.00069

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

We propose an end-to-end deep learning approach for generating real-time facial animation from audio alone. Specifically, our architecture employs a deep bidirectional long short-term memory (LSTM) network and an attention mechanism to discover latent representations of the time-varying contextual information in speech and to recognize how much each piece of that information contributes to a given facial state. As a result, our model can drive different levels of facial movement at inference time and automatically follows the pitch and latent speaking style of the input audio, without prior assumptions or further human intervention. Evaluation results show that our method not only generates accurate lip movements from audio but also successfully regresses the speaker's time-varying facial movements.
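The abstract names the building blocks (a bidirectional LSTM plus attention mapping audio to face states) but not their exact configuration. As a rough illustration only, the NumPy sketch below runs a bidirectional LSTM over one window of audio feature frames, pools the concatenated forward/backward hidden states with additive attention, and maps the pooled vector to a face-parameter vector. The attention weights `alpha` show which frames in the window contribute most to the output face state. Everything specific here is a hypothetical choice, not the authors' configuration: the MFCC-like input features, window length, layer sizes, a single layer instead of a deep stack, the form of the attention score, the 51-dimensional output, and all function names. The weights are random, so the output only demonstrates shapes and data flow.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_pass(X, W, U, b, reverse=False):
    """Run a single-layer LSTM over X (T, d_in) and return hidden states (T, H).

    W: (d_in, 4H), U: (H, 4H), b: (4H,). Gate order: input, forget, candidate, output.
    With reverse=True the sequence is read last-to-first (the backward direction).
    """
    T, H = X.shape[0], U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    out = np.zeros((T, H))
    steps = range(T - 1, -1, -1) if reverse else range(T)
    for t in steps:
        z = X[t] @ W + h @ U + b
        i = sigmoid(z[:H])
        f = sigmoid(z[H:2 * H])
        g = np.tanh(z[2 * H:3 * H])
        o = sigmoid(z[3 * H:])
        c = f * c + i * g
        h = o * np.tanh(c)
        out[t] = h
    return out


def attention_pool(Hs, Wa, v):
    """Additive attention (hypothetical form): score each frame, softmax, weighted sum."""
    scores = np.tanh(Hs @ Wa) @ v          # (T,) one relevance score per frame
    scores = scores - scores.max()         # subtract max for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    context = alpha @ Hs                   # (2H,) attention-weighted summary
    return context, alpha


def init_params(d_in, H, d_att, d_out, seed=0):
    """Random weights; a real model would learn these from audio/face training pairs."""
    rng = np.random.default_rng(seed)

    def lstm():
        return (rng.normal(0, 0.1, (d_in, 4 * H)),
                rng.normal(0, 0.1, (H, 4 * H)),
                np.zeros(4 * H))

    return {
        "fwd": lstm(),
        "bwd": lstm(),
        "Wa": rng.normal(0, 0.1, (2 * H, d_att)),
        "v": rng.normal(0, 0.1, d_att),
        "Wo": rng.normal(0, 0.1, (2 * H, d_out)),
        "bo": np.zeros(d_out),
    }


def audio_window_to_face(X, p):
    """Map one window of audio features (T, d_in) to one face-parameter vector (d_out,)."""
    h_f = lstm_pass(X, *p["fwd"])
    h_b = lstm_pass(X, *p["bwd"], reverse=True)
    Hs = np.concatenate([h_f, h_b], axis=1)  # (T, 2H) bidirectional context per frame
    context, alpha = attention_pool(Hs, p["Wa"], p["v"])
    return context @ p["Wo"] + p["bo"], alpha


if __name__ == "__main__":
    # Hypothetical sizes: 32 frames of 13 MFCC-like features -> 51 face parameters.
    params = init_params(d_in=13, H=64, d_att=32, d_out=51)
    window = np.random.default_rng(1).normal(size=(32, 13))
    face, alpha = audio_window_to_face(window, params)
    print(face.shape, alpha.shape, round(float(alpha.sum()), 6))
```

To animate a whole utterance under this sketch, one would slide the window along the audio one frame at a time and emit one face-parameter vector per step; the attention-pooling step is what lets each output frame weight neighbouring audio context unevenly rather than averaging it.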


Pages: 366-371

Number of pages: 6

44 items in total

[21] Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition
Zhao, Ziping
Zheng, Yu
Zhang, Zixing
Wang, Haishuai
Zhao, Yiqin
Li, Chao
[J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 272-276
[22] Audio2DiffuGesture: Generating a diverse co-speech gesture based on a diffusion model
Yao, Hongze
Xu, Yingting
Wu, Weitao
He, Huabin
Ren, Wen
Cai, Zhiming
[J]. ELECTRONIC RESEARCH ARCHIVE, 2024, 32 (09): 5392-5408
[23] Audio-visual speech translation with automatic lip synchronization and face tracking based on 3-D head model
Morishima, S
Ogata, S
Murai, K
Nakamura, S
[J]. 2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002: 2117-2120
[24] FACE LANDMARK-BASED SPEAKER-INDEPENDENT AUDIO-VISUAL SPEECH ENHANCEMENT IN MULTI-TALKER ENVIRONMENTS
Morrone, Giovanni
Pasa, Luca
Tikhanoff, Vadim
Bergamaschi, Sonia
Fadiga, Luciano
Badino, Leonardo
[J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 6900-6904
[25] Audio-visual speech translation with automatic lip synchronization and face tracking based on 3-D head model
Morishima, Shigeo
Ogata, Shin
Murai, Kazumasa
Nakamura, Satoshi
[J]. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2002, 2
[26] One-Shot Talking Face Generation from Single-Speaker Audio-Visual Correlation Learning
Wang, Suzhen
Li, Lincheng
Ding, Yu
Yu, Xin
[J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 2531-2539
[27] Predicting Group-Level Skin Attention to Short Movies from Audio-Based LSTM-Mixture of Experts Models
Kleinlein, Ricardo
Luna Jimenez, Cristina
Manuel Montero, Juan
Callejas, Zoraida
Fernandez-Martinez, Fernando
[J]. INTERSPEECH 2019, 2019: 61-65
[28] Assessing the Risk of Extreme Storm Surges from Tropical Cyclones under Climate Change Using Bidirectional Attention-Based LSTM for Improved Prediction
Ian, Vai-Kei
Tang, Su-Kit
Pau, Giovanni
[J]. ATMOSPHERE, 2023, 14 (12)
[29] An Attention-based Bidirectional LSTM Model for Continuous Cross-Subject Estimation of Knee Joint Angle during Running from sEMG Signals
Zangene, Alireza Rezaie
Samuel, Oluwarotimi Williams
Abbasi, Ali
Nazarpour, Kianoush
McEwan, Alistair A.
Li, Guanglin
[J]. 2023 45TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY, EMBC, 2023
[30] Attention-based bidirectional long short-term memory networks for extracting temporal relationships from clinical discharge summaries
Alfattni, Ghada
Peek, Niels
Nenadic, Goran
[J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2021, 123
