Ultrasound-Based Silent Speech Interface Using Convolutional and Recurrent Neural Networks

Cited by: 8
Authors
Moliner Juanpere, Eloi [1 ]
Csapo, Tamas Gabor [2 ,3 ]
Affiliations
[1] UPC Barcelona Sch Telecommun Engn ETSETB, Barcelona, Spain
[2] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary
[3] MTA ELTE Lendulet Lingual Articulat Res Grp, Budapest, Hungary
Keywords
TONGUE;
DOI
10.3813/AAA.919339
CLC Number
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Silent Speech Interface (SSI) is a technology with the goal of synthesizing speech from articulatory motion. A Deep Neural Network-based SSI using ultrasound images of the tongue as input signals and spectral coefficients of a vocoder as target parameters is proposed. Several deep learning models, such as a baseline feed-forward network and combinations of Convolutional and Recurrent Neural Networks, are presented and discussed. A pre-processing step using a Deep Convolutional AutoEncoder was also studied. According to the experimental results, an architecture based on CNN and bidirectional LSTM layers achieved the best objective and subjective results. © 2019 The Author(s). Published by S. Hirzel Verlag / EAA.
Pages: 587-590
Page count: 4
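The abstract describes an architecture that encodes each ultrasound tongue image with a CNN and models the frame sequence with bidirectional LSTM layers before regressing the vocoder's spectral coefficients. Below is a minimal sketch of that kind of CNN + BiLSTM mapping, not the authors' implementation: the image resolution, layer sizes, sequence length, and number of spectral coefficients (n_spectral) are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch of a CNN + bidirectional LSTM mapping ultrasound tongue image
# sequences to vocoder spectral parameters. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class UltrasoundToVocoder(nn.Module):
    def __init__(self, n_spectral=25, hidden=256):
        super().__init__()
        # Per-frame CNN encoder: one grayscale ultrasound image -> feature vector.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),                      # -> 64 * 4 * 4 = 1024 features
        )
        # Bidirectional LSTM over the frame sequence models articulatory dynamics.
        self.lstm = nn.LSTM(input_size=1024, hidden_size=hidden,
                            batch_first=True, bidirectional=True)
        # Linear readout to the vocoder spectral coefficients of each frame.
        self.out = nn.Linear(2 * hidden, n_spectral)

    def forward(self, x):
        # x: (batch, time, 1, H, W) sequence of ultrasound frames
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)   # per-frame features
        seq, _ = self.lstm(feats)                          # temporal context
        return self.out(seq)                               # (batch, time, n_spectral)

if __name__ == "__main__":
    model = UltrasoundToVocoder()
    frames = torch.randn(2, 10, 1, 64, 128)   # dummy batch: 2 clips of 10 frames
    params = model(frames)
    print(params.shape)                        # torch.Size([2, 10, 25])
```

In such a pipeline the predicted coefficients would then drive a vocoder to synthesize the waveform, and the Deep Convolutional AutoEncoder mentioned in the abstract could serve as a pre-trained per-frame feature extractor in place of the CNN sketched above.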
Related Papers
50 records in total
  • [21] Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab and Convolutional Recurrent Neural Networks
    Gao, Yingming
    Birkholz, Peter
    Li, Ya
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1845 - 1858
  • [22] Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces
    Toth, Laszlo
    Gosztolya, Gabor
    Grosz, Tamas
    Marko, Alexandra
    Csapo, Tamas Gabor
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3172 - 3176
  • [23] DNN-based Ultrasound-to-Speech Conversion for a Silent Speech Interface
Csapo, Tamas Gabor
    Grosz, Tamas
    Gosztolya, Gabor
    Toth, Laszlo
    Marko, Alexandra
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3672 - 3676
  • [24] Speech Emotion Recognition Using Multichannel Parallel Convolutional Recurrent Neural Networks based on Gammatone Auditory Filterbank
    Peng, Zhichao
    Zhu, Zhi
    Unoki, Masashi
    Dang, Jianwu
    Akagi, Masato
    [J]. 2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1750 - 1755
  • [25] Speech prediction using recurrent neural networks
    Varoglu, E
    Hacioglu, K
    [J]. ELECTRONICS LETTERS, 1999, 35 (16) : 1353 - 1355
  • [26] Temporal Feedback Convolutional Recurrent Neural Networks for Speech Command Recognition
    Kim, Taejun
    Nam, Juhan
    [J]. PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 437 - 441
  • [27] Rainfall Prediction using Spatial Convolutional Neural Networks and Recurrent Neural Networks
    Lestari, Nadia Dwi Puji
    Djamal, Esmeralda Contessa
[J]. 2022 INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ITS APPLICATIONS (ICODSA), 2022, : 12 - 17
  • [28] Automatic playlist generation using Convolutional Neural Networks and Recurrent Neural Networks
    Irene, Rosilde Tatiana
    Borrelli, Clara
    Zanoni, Massimiliano
    Buccoli, Michele
    Sarti, Augusto
    [J]. 2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
  • [30] CONVOLUTIONAL NEURAL NETWORKS-BASED CONTINUOUS SPEECH RECOGNITION USING RAW SPEECH SIGNAL
    Palaz, Dimitri
    Magimai-Doss, Mathew
    Collobert, Ronan
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4295 - 4299