A Real-time Speech Driven Talking Avatar based on Deep Neural Network

Authors
Zhao, Kai [1]
Wu, Zhiyong [1]
Cai, Lianhong [1]
Affiliations
[1] Tsinghua Univ, Grad Sch Shenzhen, Shenzhen Key Lab Informat Sci & Technol, Tsinghua CUHK Joint Res Ctr Media Sci Technol & S, Shenzhen 518057, Peoples R China
DOI
Not available
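Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract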
This paper describes our initial work in developing a real-time speech-driven talking avatar system with a deep neural network. The input of the system is acoustic speech, and the output is the articulatory movements, synchronized with the input speech, of a 3-dimensional avatar. The mapping from the input acoustic features to the output articulatory features is learned by a deep neural network (DNN). Experiments on the well-known acoustic-articulatory English speech corpus MNGU0 demonstrate that the proposed DNN-based audio-visual mapping method can achieve good performance.
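The acoustic-to-articulatory mapping described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature dimensions, context window, layer sizes, and the helper names `stack_context`, `init_mlp`, and `forward` are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

def stack_context(frames, left=2, right=2):
    """Concatenate each frame with its neighbours; edge frames are repeated."""
    padded = np.pad(frames, ((left, right), (0, 0)), mode="edge")
    n = frames.shape[0]
    return np.hstack([padded[i:i + n] for i in range(left + right + 1)])

def init_mlp(sizes, seed=0):
    """Random (untrained) weights for a fully connected network."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """tanh hidden layers followed by a linear output layer (regression)."""
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b

# Hypothetical dimensions, not taken from the paper.
N_ACOUSTIC = 40   # acoustic features per frame (assumed)
N_ARTIC = 12      # articulatory features per frame (assumed)

frames = np.random.default_rng(1).normal(size=(100, N_ACOUSTIC))
inputs = stack_context(frames)                       # shape (100, 200)
params = init_mlp([inputs.shape[1], 256, 256, N_ARTIC])
trajectory = forward(params, inputs)                 # one output row per input frame
print(trajectory.shape)
```

In a real-time setting, the `right` context would add look-ahead latency of that many frames, so a smaller value (or zero) might be preferred; the paper's actual choice is not stated in the abstract.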
Pages: 4