Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks

Cited by: 0
Authors
Lee, Seo-Hyun [1 ]
Lee, Young-Eun [1 ]
Kim, Soowon [2 ]
Ko, Byung-Kwan [2 ]
Kim, Jun-Young [2 ]
Affiliations
[1] Korea Univ, Dept Brain & Cognit Engn, Seoul, South Korea
[2] Korea Univ, Dept Artificial Intelligence, Seoul, South Korea
Keywords
brain-computer interface; deep neural networks; electroencephalogram; generative adversarial network; imagined speech; speech synthesis; COMMUNICATION; IMAGERY;
DOI
10.1109/BCI60775.2024.10480503
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Brain-to-speech technology represents a fusion of interdisciplinary applications encompassing the fields of artificial intelligence, brain-computer interfaces, and speech synthesis. Intention decoding and speech synthesis based on neural representation learning directly connect neural activity to the means of human linguistic communication, which may greatly enhance the naturalness of communication. With recent discoveries in representation learning and the development of speech synthesis technologies, direct translation of brain signals into speech has shown great promise. In particular, the processed input features and neural speech embeddings given to the neural network play a significant role in overall performance when deep generative models are used to generate speech from brain signals. In this paper, we introduce current brain-to-speech technology and the possibility of speech synthesis from brain signals, which may ultimately facilitate innovation in nonverbal communication. We also perform a comprehensive analysis of the neural features and neural speech embeddings underlying neurophysiological activation during speech, which may play a significant role in speech synthesis work.
Pages: 4
Related papers
50 in total
  • [1] Multi-mode Neural Speech Coding Based on Deep Generative Networks
    Xiao, Wei
    Liu, Wenzhe
    Wang, Meng
    Yang, Shan
    Shi, Yupeng
    Kang, Yuyong
    Su, Dan
    Shang, Shidong
    Yu, Dong
    INTERSPEECH 2023, 2023, : 819 - 823
  • [2] Learning and Modeling Unit Embeddings Using Deep Neural Networks for Unit-Selection-Based Mandarin Speech Synthesis
    Zhou, Xiao
    Ling, Zhen-Hua
    Dai, Li-Rong
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2020, 19 (03)
  • [3] A Study on Tailor-Made Speech Synthesis Based on Deep Neural Networks
    Yamada, Shuhei
    Nose, Takashi
    Ito, Akinori
    ADVANCES IN INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, VOL 1, 2017, 63 : 159 - 166
  • [4] Speech bandwidth expansion based on Deep Neural Networks
    Wang, Yingxue
    Zhao, Shenghui
    Liu, Wenbo
    Li, Ming
    Kuang, Jingming
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2593 - 2597
  • [5] Mongolian Speech Recognition Based on Deep Neural Networks
    Zhang, Hui
    Bao, Feilong
    Gao, Guanglai
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA (CCL 2015), 2015, 9427 : 180 - 188
  • [6] Czech Speech Synthesis with Generative Neural Vocoder
    Vit, Jakub
    Hanzlicek, Zdenek
    Matousek, Jindrich
    TEXT, SPEECH, AND DIALOGUE (TSD 2019), 2019, 11697 : 307 - 315
  • [7] STATISTICAL PARAMETRIC SPEECH SYNTHESIS USING DEEP NEURAL NETWORKS
    Zen, Heiga
    Senior, Andrew
    Schuster, Mike
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7962 - 7966
  • [8] Automatic Speech Recognition with Deep Neural Networks for Impaired Speech
    Espana-Bonet, Cristina
    Fonollosa, Jose A. R.
    ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2016, 2016, 10077 : 97 - 107
  • [9] Robust Speech Recognition with Speech Enhanced Deep Neural Networks
    Du, Jun
    Wang, Qing
    Gao, Tian
    Xu, Yong
    Dai, Lirong
    Lee, Chin-Hui
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 616 - 620
  • [10] The Representation of Speech in Deep Neural Networks
    Scharenborg, Odette
    van der Gouw, Nikki
    Larson, Martha
    Marchiori, Elena
    MULTIMEDIA MODELING, MMM 2019, PT II, 2019, 11296 : 194 - 205