EPG2S: Speech Generation and Speech Enhancement Based on Electropalatography and Audio Signals Using Multimodal Learning

Cited by: 5

Authors:

Chen, Li-Chin [1]

Chen, Po-Hsun [2]

Tsai, Richard Tzong-Han [2, 3]

Tsao, Yu [1]

Affiliations:

[1] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei 115, Taiwan

[2] Natl Cent Univ, Dept Comp Sci & Informat Engn, Taoyuan, Taiwan

[3] Acad Sinica, Ctr GIS, Res Ctr Humanities and Social Sci, Taipei 115, Taiwan

Source:

IEEE SIGNAL PROCESSING LETTERS | 2022 / Vol. 29

Keywords:

Speech synthesis; model fusion; electropalatography; speech signal; speech generation; NOISE; ALGORITHM; QUALITY; SYSTEM;

DOI:

10.1109/LSP.2022.3184636

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification code:

0808; 0809

Abstract:

Speech generation and enhancement based on articulatory movements can facilitate communication when verbal communication is not possible, e.g., for patients who have lost the ability to speak. Although various techniques have been proposed to this end, electropalatography (EPG), a monitoring technique that records the contact between the tongue and the hard palate during speech, has not been adequately explored. Herein, we propose a novel multimodal EPG-to-speech (EPG2S) system that uses EPG and speech signals for speech generation and enhancement. Different strategies for fusing EPG and noisy speech signals are examined, and the viability of the proposed method is investigated. Experimental results indicate that EPG2S achieves desirable speech generation outcomes based solely on EPG signals. Further, adding noisy speech signals improves the quality and intelligibility of the generated speech. Additionally, EPG2S achieves high-quality speech enhancement based solely on audio signals, and adding EPG signals further improves performance. Finally, the late fusion strategy is found to be more effective for both speech generation and enhancement.
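The abstract compares fusion strategies without defining them. The following is a minimal sketch of the generic distinction, assuming per-frame feature vectors for each modality: in early fusion, EPG and audio features are concatenated before a single shared encoder; in late fusion, each modality has its own encoder and the resulting embeddings are concatenated. This is not the EPG2S architecture. The sizes (62 EPG electrodes, 257 spectral bins, 16 hidden units), the helper names `dense_relu`, `placeholder_weights`, `early_fusion` and `late_fusion`, and the fixed placeholder weights are all hypothetical, chosen only so the example runs in plain Python.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dense_relu(x: Vector, weights: Matrix) -> Vector:
    """Toy dense layer (no bias) followed by ReLU; one output per weight row."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def placeholder_weights(n_out: int, n_in: int, scale: float = 0.1) -> Matrix:
    """Hypothetical deterministic stand-in weights so the sketch runs untrained."""
    return [[scale * ((i + j) % 3 - 1) for j in range(n_in)] for i in range(n_out)]


def early_fusion(epg: Vector, audio: Vector, w_joint: Matrix) -> Vector:
    """Early fusion: concatenate raw per-frame inputs, then apply one shared encoder."""
    return dense_relu(epg + audio, w_joint)


def late_fusion(epg: Vector, audio: Vector, w_epg: Matrix, w_audio: Matrix) -> Vector:
    """Late fusion: encode each modality separately, then concatenate the embeddings."""
    return dense_relu(epg, w_epg) + dense_relu(audio, w_audio)


if __name__ == "__main__":
    N_EPG, N_AUDIO, N_HIDDEN = 62, 257, 16  # assumed sizes, not taken from the paper
    epg_frame = [float(k % 2) for k in range(N_EPG)]  # fake binary contact pattern
    audio_frame = [0.01 * k for k in range(N_AUDIO)]  # fake spectral magnitudes

    fused_early = early_fusion(epg_frame, audio_frame,
                               placeholder_weights(N_HIDDEN, N_EPG + N_AUDIO))
    fused_late = late_fusion(epg_frame, audio_frame,
                             placeholder_weights(N_HIDDEN // 2, N_EPG),
                             placeholder_weights(N_HIDDEN // 2, N_AUDIO))
    print(len(fused_early), len(fused_late))  # both 16-dimensional
```

One structural consequence of late fusion is that each modality keeps its own encoder, so the EPG branch still yields a representation when the audio input is absent or heavily corrupted. This sketch does not reproduce the paper's models or its reported results.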


Pages: 2582-2586

Number of pages: 5
