Speech Enhancement for Noise-Robust Speech Synthesis using Wasserstein GAN

Cited by: 11
Authors
Adiga, Nagaraj [1]
Pantazis, Yannis [2]
Tsiaras, Vassilis [1]
Stylianou, Yannis [1]
Affiliations
[1] Univ Crete, Dept Comp Sci, Iraklion, Greece
[2] FORTH, Inst Appl & Computat Math, Iraklion, Greece
Source
Keywords
Wasserstein GAN; Speech Enhancement; Gated activation; WaveNet Vocoder; Speech Synthesis
DOI
10.21437/Interspeech.2019-2648
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification code
100104; 100213
Abstract
The quality of speech synthesis systems can deteriorate significantly when the recordings contain background noise. Although speech enhancement techniques effectively suppress additive noise under low signal-to-noise ratio (SNR) conditions, they have been neither designed for nor tested in speech synthesis tasks, where background noise has relatively lower energy. In this paper, we propose a speech enhancement technique based on generative adversarial networks (GANs) that acts as a preprocessing step for speech synthesis. Motivated by the speech enhancement generative adversarial network (SEGAN) approach and recent advances in deep learning, we propose applying Wasserstein GAN (WGAN) with gradient penalty and gated activation functions to the autoencoder network of SEGAN. We studied the impact of the proposed method on a data set consisting of 28 speakers and different noise types at three different SNR levels. The effectiveness of the proposed method in the context of speech synthesis is demonstrated through the training of a WaveNet vocoder. We compare our method against SEGAN. Both subjective and objective metrics confirm that the proposed speech enhancement approach outperforms SEGAN in terms of speech synthesis quality.
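The abstract names two ingredients, gated activation functions and a WGAN gradient penalty, without giving their form. The sketch below illustrates the standard definitions only: a gated unit computes tanh(filter) * sigmoid(gate) element-wise, and the gradient penalty is lam * (||grad D(x_hat)|| - 1)^2 evaluated at a random interpolation x_hat between a real and a generated sample. It is not the authors' implementation. The function names, the toy linear critic, the finite-difference gradient (a stand-in for automatic differentiation), and lam = 10 are all assumptions made for illustration.

```python
import math
import random


def gated_activation(filter_pre, gate_pre):
    """Element-wise gated unit: tanh(filter) * sigmoid(gate).

    sigmoid(g) is written as 0.5 * (1 + tanh(g / 2)) so that large
    negative inputs do not overflow.
    """
    return [
        math.tanh(f) * 0.5 * (1.0 + math.tanh(g / 2.0))
        for f, g in zip(filter_pre, gate_pre)
    ]


def numeric_gradient(critic, x, h=1e-5):
    """Central finite-difference gradient of a scalar critic at x.

    A real training loop would use automatic differentiation instead.
    """
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((critic(up) - critic(down)) / (2.0 * h))
    return grad


def gradient_penalty(critic, real, fake, lam=10.0, eps=None):
    """WGAN-GP style penalty for one (real, fake) pair.

    x_hat is a random point on the line between real and fake; the
    penalty pushes the critic's gradient norm at x_hat toward 1.
    """
    if eps is None:
        eps = random.random()
    x_hat = [eps * r + (1.0 - eps) * f for r, f in zip(real, fake)]
    grad = numeric_gradient(critic, x_hat)
    norm = math.sqrt(sum(g * g for g in grad))
    return lam * (norm - 1.0) ** 2


def toy_critic(x):
    """Hypothetical linear critic whose gradient is (3, 4), norm 5."""
    return 3.0 * x[0] + 4.0 * x[1]


if __name__ == "__main__":
    print(gated_activation([0.5, -1.0], [2.0, -2.0]))
    print(gradient_penalty(toy_critic, [1.0, 2.0], [0.0, -1.0]))
```

For the toy linear critic the gradient norm is 5 at every point, so the penalty is close to 10 * (5 - 1)^2 regardless of the interpolation point; a critic with unit gradient norm would incur a penalty near zero.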
Pages: 1821-1825
Number of pages: 5
Related papers
50 in total
  • [1] Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System using Deep Recurrent Neural Networks
    Valentini-Botinhao, Cassia
    Wang, Xin
    Takaki, Shinji
    Yamagishi, Junichi
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 352 - 356
  • [2] FLEXIBLE MULTICHANNEL SPEECH ENHANCEMENT FOR NOISE-ROBUST FRONTEND
    Jukic, Ante
    Balam, Jagadeesh
    Ginsburg, Boris
    [J]. 2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA, 2023,
  • [3] Noise-robust speech triage
    Bartos, Anthony L.
    Cipr, Tomas
    Nelson, Douglas J.
    Schwarz, Petr
    Banowetz, John
    Jerabek, Ladislav
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2018, 143 (04): : 2313 - 2320
  • [4] Noise-Robust speech recognition of Conversational Telephone Speech
    Chen, Gang
    Tolba, Hesham
    O'Shaughnessy, Douglas
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1101 - 1104
  • [5] A speech emphasis method for noise-robust speech recognition by using repetitive phrase
    Hirai, Takanori
    Kuroiwa, Shingo
    Tsuge, Satoru
    Ren, Fuji
    Fattah, Mohamed Abdel
    [J]. 2006 10TH INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY, VOLS 1 AND 2, PROCEEDINGS, 2006, : 1269 - +
  • [6] Speech Enhancement Based on Teacher-Student Deep Learning Using Improved Speech Presence Probability for Noise-Robust Speech Recognition
    Tu, Yan-Hui
    Du, Jun
    Lee, Chin-Hui
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (12) : 2080 - 2091
  • [7] Knowledge Distillation-Based Training of Speech Enhancement for Noise-Robust Automatic Speech Recognition
    Woo Lee, Geon
    Kook Kim, Hong
    Kong, Duk-Jo
    [J]. IEEE ACCESS, 2024, 12 : 72707 - 72720
  • [8] A neural network approach for speech enhancement and noise-robust bandwidth extension
    Hao, Xiang
    Xu, Chenglin
    Zhang, Chen
    Xie, Lei
    [J]. COMPUTER SPEECH AND LANGUAGE, 2025, 89
  • [9] V-Speech: NOISE-ROBUST SPEECH CAPTURING GLASSES USING VIBRATION SENSORS
    Maruri, Hector A. Cordourier
    Lopez-Meyer, Paulo
    Huang, Jonathan
    Beltman, Willem
    Nachman, Lama
    Lu, Hong
    [J]. GETMOBILE-MOBILE COMPUTING & COMMUNICATIONS REVIEW, 2020, 24 (02) : 18 - 24
  • [10] A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition
    Zhu, Qiu-Shi
    Zhang, Jie
    Zhang, Zi-Qiang
    Dai, Li-Rong
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 1927 - 1939