Speech Enhancement for Noise-Robust Speech Synthesis using Wasserstein GAN

Cited by: 11

Authors:

Adiga, Nagaraj [1]

Pantazis, Yannis [2]

Tsiaras, Vassilis [1]

Stylianou, Yannis [1]

Affiliations:

[1] Univ Crete, Dept Comp Sci, Iraklion, Greece

[2] FORTH, Inst Appl & Computat Math, Iraklion, Greece

Source:

INTERSPEECH 2019 | 2019

Keywords:

Wasserstein GAN; Speech Enhancement; Gated activation; WaveNet Vocoder; Speech Synthesis;

DOI:

10.21437/Interspeech.2019-2648

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification code:

100104 ; 100213 ;

Abstract:

The quality of speech synthesis systems can be significantly deteriorated by the presence of background noise in the recordings. Although speech enhancement techniques exist that effectively suppress additive noise under low signal-to-noise ratio (SNR) conditions, they have been neither designed for nor tested in speech synthesis tasks, where background noise has relatively lower energy. In this paper, we propose a speech enhancement technique based on generative adversarial networks (GANs) that acts as a preprocessing step for speech synthesis. Motivated by the speech enhancement generative adversarial network (SEGAN) approach and recent advances in deep learning, we propose applying Wasserstein GAN (WGAN) with gradient penalty and gated activation functions to the autoencoder network of SEGAN. We studied the impact of the proposed method on a data set consisting of 28 speakers and different noise types at three different SNR levels. The effectiveness of the proposed method in the context of speech synthesis is demonstrated through the training of a WaveNet vocoder. We compare our method against SEGAN. Both subjective and objective metrics confirm that the proposed speech enhancement approach outperforms SEGAN in terms of speech synthesis quality.

Citation

Pages: 1821-1825

Page count: 5
