EPG2S: Speech Generation and Speech Enhancement Based on Electropalatography and Audio Signals Using Multimodal Learning

Cited by: 5
Authors
Chen, Li-Chin [1]
Chen, Po-Hsun [2]
Tsai, Richard Tzong-Han [2, 3]
Tsao, Yu [1]
Affiliations
[1] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei 115, Taiwan
[2] Natl Cent Univ, Dept Comp Sci & Informat Engn, Taoyuan, Taiwan
[3] Acad Sinica, Ctr GIS, Res Ctr Humanities and Social Sci, Taipei 115, Taiwan
Keywords
Speech synthesis; model fusion; electropalatography; speech signal; speech generation; NOISE; ALGORITHM; QUALITY; SYSTEM
DOI
10.1109/LSP.2022.3184636
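Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract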
Speech generation and enhancement based on articulatory movements facilitate communication when the scope of verbal communication is absent, e.g., in patients who have lost the ability to speak. Although various techniques have been proposed to this end, electropalatography (EPG), which is a monitoring technique that records contact between the tongue and hard palate during speech, has not been adequately explored. Herein, we propose a novel multimodal EPG-to-speech (EPG2S) system that utilizes EPG and speech signals for speech generation and enhancement. Different fusion strategies of combining EPG and noisy speech signals are examined, and the viability of the proposed method is investigated. Experimental results indicate that EPG2S achieves desirable speech generation outcomes based solely on EPG signals. Further, the addition of noisy speech signals is observed to improve quality and intelligibility of the generated speech signals. Additionally, EPG2S is observed to achieve high-quality speech enhancement based solely on audio signals, with the addition of EPG signals further improving the performance. Finally, the late fusion strategy is deemed to be more effective for both speech generation and enhancement.
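The early and late fusion strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not describe the EPG2S architecture, so everything here is an assumption made for illustration: the frame count, the feature dimensions, the hidden size, and the untrained random layers are hypothetical, and the code only shows where the two modalities are combined, not how the authors built their model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the paper.
T = 100          # number of time frames
EPG_DIM = 62     # assumed number of EPG contact points per frame
AUDIO_DIM = 257  # assumed number of spectrogram bins per frame
HIDDEN = 64      # assumed hidden width


def make_layer(in_dim, out_dim):
    """Random (untrained) linear layer standing in for a learned one."""
    return rng.standard_normal((in_dim, out_dim)) * 0.1, np.zeros(out_dim)


def dense(x, layer):
    """Apply a linear layer followed by a tanh nonlinearity."""
    w, b = layer
    return np.tanh(x @ w + b)


def early_fusion(epg, audio, encoder, decoder):
    """Concatenate raw inputs first, then run a single shared encoder."""
    fused_input = np.concatenate([epg, audio], axis=1)
    return dense(dense(fused_input, encoder), decoder)


def late_fusion(epg, audio, epg_encoder, audio_encoder, decoder):
    """Encode each modality separately, then concatenate the embeddings."""
    fused_hidden = np.concatenate(
        [dense(epg, epg_encoder), dense(audio, audio_encoder)], axis=1
    )
    return dense(fused_hidden, decoder)


# Synthetic stand-in data: binary contact patterns and a noisy magnitude spectrogram.
epg = rng.integers(0, 2, size=(T, EPG_DIM)).astype(float)
noisy_spec = np.abs(rng.standard_normal((T, AUDIO_DIM)))

early_out = early_fusion(
    epg, noisy_spec,
    make_layer(EPG_DIM + AUDIO_DIM, HIDDEN),
    make_layer(HIDDEN, AUDIO_DIM),
)
late_out = late_fusion(
    epg, noisy_spec,
    make_layer(EPG_DIM, HIDDEN),
    make_layer(AUDIO_DIM, HIDDEN),
    make_layer(2 * HIDDEN, AUDIO_DIM),
)
print(early_out.shape, late_out.shape)
```

Both paths map the same pair of inputs to one output frame per input frame; the only difference is whether the modalities meet before or after their first encoding step.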
Pages: 2582-2586
Page count: 5