EPG2S: Speech Generation and Speech Enhancement Based on Electropalatography and Audio Signals Using Multimodal Learning

Cited by: 5
Authors
Chen, Li-Chin [1]
Chen, Po-Hsun [2]
Tsai, Richard Tzong-Han [2, 3]
Tsao, Yu [1]
Affiliations
[1] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei 115, Taiwan
[2] Natl Cent Univ, Dept Comp Sci & Informat Engn, Taoyuan, Taiwan
[3] Acad Sinica, Ctr GIS, Res Ctr Humanities and Social Sci, Taipei 115, Taiwan
Keywords
Speech synthesis; model fusion; electropalatography; speech signal; speech generation; NOISE; ALGORITHM; QUALITY; SYSTEM;
DOI
10.1109/LSP.2022.3184636
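Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract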
Speech generation and enhancement based on articulatory movements facilitate communication when the scope of verbal communication is absent, e.g., in patients who have lost the ability to speak. Although various techniques have been proposed to this end, electropalatography (EPG), which is a monitoring technique that records contact between the tongue and hard palate during speech, has not been adequately explored. Herein, we propose a novel multimodal EPG-to-speech (EPG2S) system that utilizes EPG and speech signals for speech generation and enhancement. Different fusion strategies of combining EPG and noisy speech signals are examined, and the viability of the proposed method is investigated. Experimental results indicate that EPG2S achieves desirable speech generation outcomes based solely on EPG signals. Further, the addition of noisy speech signals is observed to improve quality and intelligibility of the generated speech signals. Additionally, EPG2S is observed to achieve high-quality speech enhancement based solely on audio signals, with the addition of EPG signals further improving the performance. Finally, the late fusion strategy is deemed to be more effective for both speech generation and enhancement.
Pages: 2582-2586
Number of pages: 5