AUDIO2FACE: GENERATING SPEECH/FACE ANIMATION FROM SINGLE AUDIO WITH ATTENTION-BASED BIDIRECTIONAL LSTM NETWORKS

Cited by: 26

Authors:

Tian, Guanzhong [1]

Yuan, Yi [2]

Liu, Yong [1]

Affiliations:

[1] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou, Zhejiang, Peoples R China

[2] Netease, Fuxi AI Lab, Guangzhou, Guangdong, Peoples R China

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW) | 2019

Keywords:

Animation; Long short-term memory network; Attention mechanism; DRIVEN;

DOI:

10.1109/ICMEW.2019.00069

Chinese Library Classification (CLC) number:

TP3 [Computing Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

We propose an end-to-end deep learning approach for generating real-time facial animation from audio alone. Specifically, our architecture employs a deep bidirectional long short-term memory network and an attention mechanism to discover latent representations of time-varying contextual information in the speech and to weigh how much each piece of information contributes to a given facial state. As a result, our model can drive different levels of facial movement at inference time and automatically keep up with the pitch and latent speaking style of the input audio, without additional assumptions or human intervention. Evaluation results show that our method not only generates accurate lip movements from audio but also successfully regresses the speaker's time-varying facial movements.
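The abstract describes an attention mechanism that weighs contextual information across time. The following is only a minimal sketch of that general idea, not the authors' model: it assumes the per-frame hidden states have already been produced by some recurrent encoder (the bidirectional LSTM itself is omitted), and the names `attention_pool`, `hidden_states`, and `query`, along with all numeric values, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Score each time step by its dot product with `query`, normalize the
    scores with softmax, and return (weights, weighted-sum context vector)."""
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Toy stand-in for encoder outputs: 3 time steps, 2 features each (assumed values).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical scoring vector; in a trained model this would be learned.
query = [1.0, 1.0]

weights, context = attention_pool(hidden_states, query)
```

In this toy case the third time step has the largest dot product with `query`, so it receives the largest weight and dominates the pooled context vector. A real system would feed such a context vector into a regression layer that predicts facial animation parameters.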


Pages: 366-371

Page count: 6
