AUDIO2FACE: GENERATING SPEECH/FACE ANIMATION FROM SINGLE AUDIO WITH ATTENTION-BASED BIDIRECTIONAL LSTM NETWORKS

Cited by: 26
Authors
Tian, Guanzhong [1]
Yuan, Yi [2]
Liu, Yong [1]
Affiliations
[1] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou, Zhejiang, Peoples R China
[2] Netease, Fuxi AI Lab, Guangzhou, Guangdong, Peoples R China
Keywords
Animation; Long short-term memory network; Attention mechanism; DRIVEN
DOI
10.1109/ICMEW.2019.00069
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We propose an end-to-end deep learning approach for generating real-time facial animation from audio alone. Specifically, our architecture employs a deep bidirectional long short-term memory network and an attention mechanism to discover latent representations of the time-varying contextual information in speech and to weigh how much each piece of information contributes to a given facial state. As a result, our model can drive different levels of facial movement at inference time and automatically follow the pitch and latent speaking style of the input audio, with no assumptions or further human intervention. Evaluation results show that our method not only generates accurate lip movements from audio but also successfully regresses the speaker's time-varying facial movements.
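The abstract does not spell out the attention formulation, so the following is only a minimal sketch of one common variant: dot-product attention pooling over per-frame hidden vectors, such as those a bidirectional LSTM might emit. The function names, the `query` vector (standing in for a learned parameter), and the toy frame values are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by its similarity to `query` and sum.

    hidden_states: list of T vectors, one per audio frame (assumed to be
                   concatenated forward/backward recurrent outputs).
    query:         hypothetical learned vector with the same dimension.
    Returns (weights, context): one weight per frame and the weighted sum.
    """
    # Dot-product score between each frame vector and the query.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: attention-weighted average of the frame vectors.
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Toy input: three frames with two-dimensional hidden vectors.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, context = attention_pool(frames, query)
```

In this toy case the third frame aligns best with the query, so it receives the largest weight. In a full model, a context vector of this kind would feed a regression layer that predicts facial animation parameters; that layer is omitted here.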
Pages: 366-371
Page count: 6