Enhancing Automatic Speech Recognition Quality with a Second-Stage Speech Enhancement Generative Adversarial Network

Cited by: 0

Authors:

Nossier, Soha A. [1]

Wall, Julie [1]

Moniri, Mansour [1]

Glackin, Cornelius [2]

Cannings, Nigel [2]

Affiliations:

[1] Univ East London, Dept Comp Sci & Digital Technol, London, England

[2] Intelligent Voice Ltd, London, England

Source:

2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI | 2023

Keywords:

Automatic speech recognition; deep learning; generative adversarial network; speech distortion; speech enhancement;

DOI:

10.1109/ICTAI59109.2023.00087

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech enhancement is an essential preprocessing stage for automatic speech recognition in noisy conditions; however, the distortion introduced by the denoising process can degrade automatic speech recognition performance. This paper presents a deep learning-based speech enhancement architecture that addresses this issue by applying a second-stage network that deals with distortion noise. Moreover, a signal-to-noise ratio binary classifier activates the speech enhancement network only in intrusive noise environments, which improves overall performance. The proposed architecture outperforms powerful models in the literature: on a challenging noisy speech test set, it improves the quality and intelligibility scores by 0.8 and 5.9%, respectively. Furthermore, the architecture improves automatic speech recognition performance, with a 13.8% reduction in the word error rate at 0 dB signal-to-noise ratio. Finally, the second-stage network is shown to improve the performance of first-stage speech enhancement models that were not seen during training.
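
The gating logic described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `enhance_if_noisy`, its three callable parameters, and the toy stand-ins below are hypothetical placeholders chosen only to show the control flow. In the paper, the classifier and both enhancement stages are neural networks whose details are not given in this record.

```python
from typing import Callable, List

Signal = List[float]  # assumption: a waveform represented as a list of samples


def enhance_if_noisy(
    noisy: Signal,
    is_low_snr: Callable[[Signal], bool],
    first_stage: Callable[[Signal], Signal],
    second_stage: Callable[[Signal], Signal],
) -> Signal:
    """Run the two-stage enhancer only when the classifier flags intrusive noise."""
    if not is_low_snr(noisy):
        # Input judged clean enough: bypass enhancement so no distortion is added.
        return list(noisy)
    denoised = first_stage(noisy)   # stage 1: suppress background noise
    return second_stage(denoised)   # stage 2: reduce distortion left by stage 1


# Arbitrary toy stand-ins (hypothetical) that only exercise the control flow.
toy_classifier = lambda x: any(abs(v) > 1.0 for v in x)
toy_stage1 = lambda x: [max(-1.0, min(1.0, v)) for v in x]  # clip to [-1, 1]
toy_stage2 = lambda x: [round(v, 1) for v in x]             # coarse smoothing

bypassed = enhance_if_noisy([0.1, 0.2], toy_classifier, toy_stage1, toy_stage2)
enhanced = enhance_if_noisy([2.0, 0.26], toy_classifier, toy_stage1, toy_stage2)
```

Passing the stages as callables keeps the sketch independent of any particular model; the abstract's final claim, that the second stage also helps unseen first-stage models, corresponds to swapping `first_stage` while keeping `second_stage` fixed.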

Citation

Pages: 546-552

Page count: 7

