Ultrasound-Based Silent Speech Interface Using Convolutional and Recurrent Neural Networks

Cited by: 8
Authors
Moliner Juanpere, Eloi [1 ]
Csapo, Tamas Gabor [2 ,3 ]
Affiliations
[1] UPC Barcelona Sch Telecommun Engn ETSETB, Barcelona, Spain
[2] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary
[3] MTA ELTE Lendulet Lingual Articulat Res Grp, Budapest, Hungary
Keywords
TONGUE;
DOI
10.3813/AAA.919339
CLC Number
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Silent Speech Interface (SSI) is a technology with the goal of synthesizing speech from articulatory motion. A Deep Neural Network-based SSI using ultrasound images of the tongue as input signals and spectral coefficients of a vocoder as target parameters is proposed. Several deep learning models, such as a baseline feed-forward network and combinations of Convolutional and Recurrent Neural Networks, are presented and discussed. A pre-processing step using a Deep Convolutional AutoEncoder was also studied. According to the experimental results, an architecture based on CNN and bidirectional LSTM layers achieved the best objective and subjective results. © 2019 The Author(s). Published by S. Hirzel Verlag / EAA.
Pages: 587-590
Page count: 4
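The abstract describes an architecture that encodes each ultrasound tongue image with a CNN and models the frame sequence with bidirectional LSTM layers before regressing the vocoder's spectral coefficients. Below is a minimal sketch of that kind of CNN + BiLSTM mapping, not the authors' implementation: the image resolution, layer sizes, sequence length, and number of spectral coefficients (n_spectral) are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch of a CNN + bidirectional LSTM mapping ultrasound tongue image
# sequences to vocoder spectral parameters. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class UltrasoundToVocoder(nn.Module):
    def __init__(self, n_spectral=25, hidden=256):
        super().__init__()
        # Per-frame CNN encoder: one grayscale ultrasound image -> feature vector.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),                      # -> 64 * 4 * 4 = 1024 features
        )
        # Bidirectional LSTM over the frame sequence models articulatory dynamics.
        self.lstm = nn.LSTM(input_size=1024, hidden_size=hidden,
                            batch_first=True, bidirectional=True)
        # Linear readout to the vocoder spectral coefficients of each frame.
        self.out = nn.Linear(2 * hidden, n_spectral)

    def forward(self, x):
        # x: (batch, time, 1, H, W) sequence of ultrasound frames
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)   # per-frame features
        seq, _ = self.lstm(feats)                          # temporal context
        return self.out(seq)                               # (batch, time, n_spectral)

if __name__ == "__main__":
    model = UltrasoundToVocoder()
    frames = torch.randn(2, 10, 1, 64, 128)   # dummy batch: 2 clips of 10 frames
    params = model(frames)
    print(params.shape)                        # torch.Size([2, 10, 25])
```

In such a pipeline the predicted coefficients would then drive a vocoder to synthesize the waveform, and the Deep Convolutional AutoEncoder mentioned in the abstract could serve as a pre-trained per-frame feature extractor in place of the CNN sketched above.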
Related Papers
50 records in total
  • [21] Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab and Convolutional Recurrent Neural Networks
    Gao, Yingming
    Birkholz, Peter
    Li, Ya
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1845 - 1858
  • [22] Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces
    Toth, Laszlo
    Gosztolya, Gabor
    Grosz, Tamas
    Marko, Alexandra
    Csapo, Tamas Gabor
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3172 - 3176
  • [23] DNN-based Ultrasound-to-Speech Conversion for a Silent Speech Interface
Csapo, Tamas Gabor
    Grosz, Tamas
    Gosztolya, Gabor
    Toth, Laszlo
    Marko, Alexandra
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3672 - 3676
  • [24] Speech Emotion Recognition Using Multichannel Parallel Convolutional Recurrent Neural Networks based on Gammatone Auditory Filterbank
    Peng, Zhichao
    Zhu, Zhi
    Unoki, Masashi
    Dang, Jianwu
    Akagi, Masato
    [J]. 2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1750 - 1755
  • [25] Speech prediction using recurrent neural networks
    Varoglu, E
    Hacioglu, K
    [J]. ELECTRONICS LETTERS, 1999, 35 (16) : 1353 - 1355
  • [26] Temporal Feedback Convolutional Recurrent Neural Networks for Speech Command Recognition
    Kim, Taejun
    Nam, Juhan
    [J]. PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 437 - 441
  • [27] Rainfall Prediction using Spatial Convolutional Neural Networks and Recurrent Neural Networks
    Lestari, Nadia Dwi Puji
    Djamal, Esmeralda Contessa
[J]. 2022 INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ITS APPLICATIONS (ICODSA), 2022, : 12 - 17
  • [28] Automatic playlist generation using Convolutional Neural Networks and Recurrent Neural Networks
    Irene, Rosilde Tatiana
    Borrelli, Clara
    Zanoni, Massimiliano
    Buccoli, Michele
    Sarti, Augusto
    [J]. 2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
  • [30] CONVOLUTIONAL NEURAL NETWORKS-BASED CONTINUOUS SPEECH RECOGNITION USING RAW SPEECH SIGNAL
    Palaz, Dimitri
    Magimai-Doss, Mathew
    Collobert, Ronan
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4295 - 4299