Ultrasound-based Silent Speech Interface Built on a Continuous Vocoder

Cited by: 8
Authors:
Csapo, Tamas Gabor [1 ,2 ]
Al-Radhi, Mohammed Salah [1 ]
Nemeth, Geza [1 ]
Gosztolya, Gabor [3 ,4 ]
Grosz, Tamas [4 ]
Toth, Laszlo [4 ]
Marko, Alexandra [2 ,5 ]
Affiliations:
[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary
[2] MTA ELTE Lendulet Lingual Articulat Res Grp, Budapest, Hungary
[3] MTA SZTE Res Grp Artificial Intelligence, Szeged, Hungary
[4] Univ Szeged, Inst Informat, Szeged, Hungary
[5] Eotvos Lorand Univ, Dept Phonet, Budapest, Hungary
Keywords:
Silent speech interface; articulatory-to-acoustic mapping; F0 prediction; CNN; HMM
DOI:
10.21437/Interspeech.2019-2046
Chinese Library Classification:
R36 (Pathology); R76 (Otorhinolaryngology)
Subject Classification Codes:
100104; 100213
Abstract:
Recently it was shown that within the Silent Speech Interface (SSI) field, the prediction of F0 is possible from Ultrasound Tongue Images (UTI) as the articulatory input, using Deep Neural Networks for articulatory-to-acoustic mapping. Moreover, text-to-speech synthesizers were shown to produce higher quality speech when using a continuous pitch estimate, which takes non-zero pitch values even when voicing is not present. Therefore, in this paper on UTI-based SSI, we use a simple continuous F0 tracker which does not apply a strict voiced / unvoiced decision. Continuous vocoder parameters (ContF0, Maximum Voiced Frequency and Mel-Generalized Cepstrum) are predicted using a convolutional neural network, with UTI as input. The results demonstrate that during the articulatory-to-acoustic mapping experiments, the continuous F0 is predicted with lower error, and the continuous vocoder produces slightly more natural synthesized speech than the baseline vocoder using standard discontinuous F0.
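The abstract describes a convolutional neural network that maps each ultrasound tongue image to a frame of continuous vocoder parameters (ContF0, Maximum Voiced Frequency, and Mel-Generalized Cepstrum). As a rough illustration of that mapping setup (not the authors' code; the input resolution, layer sizes, and a 26-dimensional target of 1 ContF0 + 1 MVF + 24 MGC coefficients are all assumptions for the sketch):

```python
# Hedged sketch of a UTI -> continuous-vocoder-parameter regression CNN.
# All sizes below are illustrative assumptions, not values from the paper.
import torch
import torch.nn as nn

class UTIToVocoderCNN(nn.Module):
    def __init__(self, n_targets: int = 26):  # 1 ContF0 + 1 MVF + 24 MGC (assumed)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # single-channel grayscale UTI frame
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 64x128 -> 32x64
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 32x64 -> 16x32
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 32, 128),
            nn.ReLU(),
            nn.Linear(128, n_targets),  # linear output: per-frame regression targets
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.regressor(self.features(x))

model = UTIToVocoderCNN()
frames = torch.randn(8, 1, 64, 128)  # batch of 8 hypothetical UTI frames
params = model(frames)               # one vocoder-parameter vector per frame
```

Training such a model with a mean-squared-error loss against vocoder parameters extracted from parallel audio would reflect the general articulatory-to-acoustic regression setup; the paper's specific architecture and preprocessing may differ.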
Pages: 894-898 (5 pages)
Related Papers (50 total):
  • [1] Eigentongue feature extraction for an ultrasound-based silent speech interface
    Hueber, T.
    Aversano, G.
    Chollet, G.
    Denby, B.
    Dreyfus, G.
    Oussar, Y.
    Roussel, P.
    Stone, M.
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PTS 1-3, PROCEEDINGS, 2007, : 1245 - +
  • [2] Silent vs Vocalized Articulation for a Portable Ultrasound-Based Silent Speech Interface
    Florescu, Victoria-M
    Crevier-Buchman, Lise
    Denby, Bruce
    Hueber, Thomas
    Colazo-Simon, Antonia
    Pillot-Loiseau, Claire
    Roussel, Pierre
    Gendrot, Cedric
    Quattrocchi, Sophie
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 450 - +
  • [3] Ultrasound-Based Silent Speech Interface Using Convolutional and Recurrent Neural Networks
    Moliner Juanpere, Eloi
    Csapo, Tamas Gabor
    [J]. ACTA ACUSTICA UNITED WITH ACUSTICA, 2019, 105 (04) : 587 - 590
  • [4] Ultrasound-Based Silent Speech Interface using Sequential Convolutional Auto-encoder
    Xu, Kele
    Wu, Yuxiang
    Gao, Zhifeng
    [J]. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 2194 - 2195
  • [5] Statistical Mapping between Articulatory and Acoustic Data for an Ultrasound-based Silent Speech Interface
    Hueber, Thomas
    Benaroya, Elie-Laurent
    Denby, Bruce
    Chollet, Gerard
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 600 - +
  • [6] Neural Speaker Embeddings for Ultrasound-based Silent Speech Interfaces
    Shandiz, Amin Honarmandi
    Toth, Laszlo
    Gosztolya, Gabor
    Marko, Alexandra
    Csapo, Tamas Gabor
    [J]. INTERSPEECH 2021, 2021, : 1932 - 1936
  • [7] Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces
    Toth, Laszlo
    Gosztolya, Gabor
    Grosz, Tamas
    Marko, Alexandra
    Csapo, Tamas Gabor
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3172 - 3176
  • [8] DNN-based Ultrasound-to-Speech Conversion for a Silent Speech Interface
    Csapo, Tamas Gabor
    Grosz, Tamas
    Gosztolya, Gabor
    Toth, Laszlo
    Marko, Alexandra
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3672 - 3676
  • [9] Vocoder-Based Speech Synthesis from Silent Videos
    Michelsanti, Daniel
    Slizovskaia, Olga
    Haro, Gloria
    Gomez, Emilia
    Tan, Zheng-Hua
    Jensen, Jesper
    [J]. INTERSPEECH 2020, 2020, : 3530 - 3534
  • [10] Visuo-Phonetic Decoding using Multi-Stream and Context-Dependent Models for an Ultrasound-based Silent Speech Interface
    Hueber, Thomas
    Benaroya, Elie-Laurent
    Chollet, Gerard
    Denby, Bruce
    Dreyfus, Gerard
    Stone, Maureen
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 628 - +