Enhancing Automatic Speech Recognition Quality with a Second-Stage Speech Enhancement Generative Adversarial Network

Authors
Nossier, Soha A. [1]
Wall, Julie [1]
Moniri, Mansour [1]
Glackin, Cornelius [2]
Cannings, Nigel [2]
Affiliations
[1] Univ East London, Dept Comp Sci & Digital Technol, London, England
[2] Intelligent Voice Ltd, London, England
Keywords
Automatic speech recognition; deep learning; generative adversarial network; speech distortion; speech enhancement
DOI
10.1109/ICTAI59109.2023.00087
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech enhancement is an essential preprocessing stage for automatic speech recognition in noisy conditions; however, the distortion introduced by the denoising process can degrade automatic speech recognition performance. This paper presents a deep learning-based speech enhancement architecture that addresses this issue by applying a second-stage network to deal with the distortion noise. In addition, a signal-to-noise ratio binary classifier activates the speech enhancement network only in intrusive noise environments, which improves overall performance. The proposed architecture outperforms powerful models in the literature, improving the quality and intelligibility scores on a challenging noisy speech test set by 0.8 and 5.9%, respectively. Furthermore, the architecture improves automatic speech recognition performance, with a 13.8% reduction in the word error rate at 0 dB signal-to-noise ratio. Finally, the second-stage network is shown to improve the performance of first-stage speech enhancement models that were not seen during training.
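The control flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `gated_two_stage_enhance`, the callable parameters, and the toy stand-ins are all hypothetical, and the sketch assumes that the classifier gates both stages together, which the abstract does not state explicitly. In the paper the classifier and both stages are trained networks; here they are plain Python functions.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def gated_two_stage_enhance(
    noisy: Sequence[float],
    is_intrusive_noise: Callable[[Sequence[float]], bool],
    first_stage: Callable[[Sequence[float]], Signal],
    second_stage: Callable[[Sequence[float]], Signal],
) -> Signal:
    """Apply two-stage enhancement only when the classifier flags intrusive noise.

    All four arguments are hypothetical placeholders for the components
    named in the abstract (SNR binary classifier, first-stage denoiser,
    second-stage distortion-handling network).
    """
    if not is_intrusive_noise(noisy):
        # Assumption: when noise is judged non-intrusive, the input is
        # passed through unchanged so no enhancement distortion is added.
        return list(noisy)
    denoised = first_stage(noisy)   # stage 1: remove the noise
    return second_stage(denoised)   # stage 2: treat distortion left by stage 1


# Toy stand-ins, used only to show the control flow (not real models).
def toy_first_stage(x: Sequence[float]) -> Signal:
    return [0.5 * v for v in x]


def toy_second_stage(x: Sequence[float]) -> Signal:
    return [v + 1.0 for v in x]


enhanced = gated_two_stage_enhance(
    [1.0, 2.0], lambda x: True, toy_first_stage, toy_second_stage
)
bypassed = gated_two_stage_enhance(
    [1.0, 2.0], lambda x: False, toy_first_stage, toy_second_stage
)
```

With the classifier returning `False`, the input is returned untouched, which reflects the abstract's point that enhancement is activated only in intrusive noise environments.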
Pages: 546-552
Page count: 7