Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks

Cited by: 0
Authors
Lee, Seo-Hyun [1 ]
Lee, Young-Eun [1 ]
Kim, Soowon [2 ]
Ko, Byung-Kwan [2 ]
Kim, Jun-Young [2 ]
Affiliations
[1] Korea Univ, Dept Brain & Cognit Engn, Seoul, South Korea
[2] Korea Univ, Dept Artificial Intelligence, Seoul, South Korea
Keywords
brain-computer interface; deep neural networks; electroencephalogram; generative adversarial network; imagined speech; speech synthesis; COMMUNICATION; IMAGERY;
DOI
10.1109/BCI60775.2024.10480503
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Brain-to-speech technology represents a fusion of interdisciplinary applications spanning artificial intelligence, brain-computer interfaces, and speech synthesis. Intention decoding and speech synthesis based on neural representation learning directly connect neural activity to the means of human linguistic communication, which may greatly enhance the naturalness of communication. With recent discoveries in representation learning and the development of speech synthesis technologies, direct translation of brain signals into speech has shown great promise. In particular, the processed input features and neural speech embeddings given to the neural network play a significant role in the overall performance of deep generative models that generate speech from brain signals. In this paper, we introduce current brain-to-speech technology and the possibility of speech synthesis from brain signals, which may ultimately facilitate innovation in nonverbal communication. We also perform a comprehensive analysis of the neural features and neural speech embeddings underlying the neurophysiological activation during speech production, which may play a significant role in speech synthesis work.
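The front end of the pipeline the abstract describes — processed EEG input features projected into a compact neural speech embedding for a downstream generative model — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual method: the channel count, frequency bands, window length, and embedding dimension are assumptions, and the random linear projection stands in for a learned encoder.

```python
import numpy as np

FS = 256                                        # sampling rate (Hz), assumed
BANDS = [(4, 8), (8, 13), (13, 30), (30, 70)]   # theta/alpha/beta/gamma, assumed

def band_power_features(eeg, fs=FS, bands=BANDS):
    """eeg: (channels, samples) -> flat vector of log band powers per channel."""
    freqs = np.fft.rfftfreq(eeg.shape[1], d=1.0 / fs)
    psd = np.abs(np.fft.rfft(eeg, axis=1)) ** 2
    feats = []
    for lo, hi in bands:
        mask = (freqs >= lo) & (freqs < hi)
        # mean power in the band per channel, log-compressed for stability
        feats.append(np.log(psd[:, mask].mean(axis=1) + 1e-12))
    return np.concatenate(feats)

def embed(features, w):
    """Project features to a low-dimensional embedding; in a real
    brain-to-speech system this would be a trained encoder network."""
    return np.tanh(w @ features)

rng = np.random.default_rng(0)
eeg = rng.standard_normal((64, FS * 2))         # 64 channels, 2 s window (simulated)
feats = band_power_features(eeg)                # 64 channels x 4 bands = 256 features
w = rng.standard_normal((32, feats.size)) / np.sqrt(feats.size)
z = embed(feats, w)                             # 32-dim "neural speech embedding"
```

In a full system, embeddings like `z` would condition a deep generative model (e.g., a GAN-based vocoder) that renders the speech waveform.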
Pages: 4