Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks

Cited by: 0
Authors
Lee, Seo-Hyun [1 ]
Lee, Young-Eun [1 ]
Kim, Soowon [2 ]
Ko, Byung-Kwan [2 ]
Kim, Jun-Young [2 ]
Affiliations
[1] Korea Univ, Dept Brain & Cognit Engn, Seoul, South Korea
[2] Korea Univ, Dept Artificial Intelligence, Seoul, South Korea
Keywords
brain-computer interface; deep neural networks; electroencephalogram; generative adversarial network; imagined speech; speech synthesis; COMMUNICATION; IMAGERY;
DOI
10.1109/BCI60775.2024.10480503
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Brain-to-speech technology represents a fusion of interdisciplinary applications spanning artificial intelligence, brain-computer interfaces, and speech synthesis. Intention decoding and speech synthesis based on neural representation learning directly connect neural activity to the means of human linguistic communication, which may greatly enhance the naturalness of communication. With recent discoveries in representation learning and the development of speech synthesis technologies, direct translation of brain signals into speech has shown great promise. In particular, the processed input features and neural speech embeddings given to the neural network play a significant role in the overall performance of deep generative models that generate speech from brain signals. In this paper, we introduce current brain-to-speech technology and the possibility of speech synthesis from brain signals, which may ultimately facilitate innovation in nonverbal communication. We also perform a comprehensive analysis of the neural features and neural speech embeddings underlying the neurophysiological activation during speech production, which may play a significant role in speech synthesis work.
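The front end of the pipeline the abstract describes — processed EEG input features projected into a compact neural speech embedding for a downstream generative model — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual method: the channel count, frequency bands, window length, and embedding dimension are assumptions, and the random linear projection stands in for a learned encoder.

```python
import numpy as np

FS = 256                                        # sampling rate (Hz), assumed
BANDS = [(4, 8), (8, 13), (13, 30), (30, 70)]   # theta/alpha/beta/gamma, assumed

def band_power_features(eeg, fs=FS, bands=BANDS):
    """eeg: (channels, samples) -> flat vector of log band powers per channel."""
    freqs = np.fft.rfftfreq(eeg.shape[1], d=1.0 / fs)
    psd = np.abs(np.fft.rfft(eeg, axis=1)) ** 2
    feats = []
    for lo, hi in bands:
        mask = (freqs >= lo) & (freqs < hi)
        # mean power in the band per channel, log-compressed for stability
        feats.append(np.log(psd[:, mask].mean(axis=1) + 1e-12))
    return np.concatenate(feats)

def embed(features, w):
    """Project features to a low-dimensional embedding; in a real
    brain-to-speech system this would be a trained encoder network."""
    return np.tanh(w @ features)

rng = np.random.default_rng(0)
eeg = rng.standard_normal((64, FS * 2))         # 64 channels, 2 s window (simulated)
feats = band_power_features(eeg)                # 64 channels x 4 bands = 256 features
w = rng.standard_normal((32, feats.size)) / np.sqrt(feats.size)
z = embed(feats, w)                             # 32-dim "neural speech embedding"
```

In a full system, embeddings like `z` would condition a deep generative model (e.g., a GAN-based vocoder) that renders the speech waveform.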
Pages: 4