Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks

Cited by: 0

Authors:

Lee, Seo-Hyun ^{[1]}

Lee, Young-Eun ^{[1]}

Kim, Soowon ^{[2]}

Ko, Byung-Kwan ^{[2]}

Kim, Jun-Young ^{[2]}

Affiliations:

[1] Korea Univ, Dept Brain & Cognit Engn, Seoul, South Korea

[2] Korea Univ, Dept Artificial Intelligence, Seoul, South Korea

Source:

2024 12TH INTERNATIONAL WINTER CONFERENCE ON BRAIN-COMPUTER INTERFACE, BCI 2024 | 2024

Keywords:

brain-computer interface; deep neural networks; electroencephalogram; generative adversarial network; imagined speech; speech synthesis; COMMUNICATION; IMAGERY;

DOI:

10.1109/BCI60775.2024.10480503

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Brain-to-speech technology represents a fusion of interdisciplinary applications encompassing fields of artificial intelligence, brain-computer interfaces, and speech synthesis. Neural representation learning based intention decoding and speech synthesis directly connects the neural activity to the means of human linguistic communication, which may greatly enhance the naturalness of communication. With the current discoveries on representation learning and the development of the speech synthesis technologies, direct translation of brain signals into speech has shown great promise. Especially, the processed input features and neural speech embeddings which are given to the neural network play a significant role in the overall performance when using deep generative models for speech generation from brain signals. In this paper, we introduce the current brain-to-speech technology with the possibility of speech synthesis from brain signals, which may ultimately facilitate innovation in nonverbal communication. Also, we perform comprehensive analysis on the neural features and neural speech embeddings underlying the neurophysiological activation while performing speech, which may play a significant role in the speech synthesis works.


Pages: 4
