Ultrasound-Based Silent Speech Interface Using Convolutional and Recurrent Neural Networks

Cited by: 8
Authors
Moliner Juanpere, Eloi [1]
Csapo, Tamas Gabor [2, 3]
Affiliations
[1] UPC Barcelona Sch Telecommun Engn ETSETB, Barcelona, Spain
[2] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary
[3] MTA ELTE Lendulet Lingual Articulat Res Grp, Budapest, Hungary
Keywords
TONGUE;
DOI
10.3813/AAA.919339
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
A Silent Speech Interface (SSI) is a technology that aims to synthesize speech from articulatory motion. A Deep Neural Network based SSI is proposed that uses ultrasound images of the tongue as input signals and the spectral coefficients of a vocoder as target parameters. Several deep learning models are presented and discussed, including a baseline feed-forward network and a combination of Convolutional and Recurrent Neural Networks. A pre-processing step using a Deep Convolutional AutoEncoder was also studied. According to the experimental results, an architecture based on a CNN and bidirectional LSTM layers achieved the best objective and subjective results. (C) 2019 The Author(s). Published by S. Hirzel Verlag. EAA.
Pages: 587-590
Number of pages: 4