Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks

Authors
Lee, Seo-Hyun [1]
Lee, Young-Eun [1]
Kim, Soowon [2]
Ko, Byung-Kwan [2]
Kim, Jun-Young [2]
Affiliations
[1] Korea Univ, Dept Brain & Cognit Engn, Seoul, South Korea
[2] Korea Univ, Dept Artificial Intelligence, Seoul, South Korea
Keywords
brain-computer interface; deep neural networks; electroencephalogram; generative adversarial network; imagined speech; speech synthesis; communication; imagery
DOI
10.1109/BCI60775.2024.10480503
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Brain-to-speech technology fuses interdisciplinary applications spanning artificial intelligence, brain-computer interfaces, and speech synthesis. Intention decoding and speech synthesis based on neural representation learning directly connect neural activity to the means of human linguistic communication, which may greatly enhance the naturalness of communication. With recent discoveries in representation learning and the development of speech synthesis technologies, direct translation of brain signals into speech has shown great promise. In particular, when deep generative models are used to generate speech from brain signals, the processed input features and neural speech embeddings given to the neural network play a significant role in overall performance. In this paper, we introduce current brain-to-speech technology and the possibility of speech synthesis from brain signals, which may ultimately facilitate innovation in nonverbal communication. We also perform a comprehensive analysis of the neural features and neural speech embeddings underlying neurophysiological activation during speech, which may play a significant role in speech synthesis work.
Pages: 4