Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition

Cited by: 19

Authors:

Kim, Geonmin ^{[1]}

Lee, Hwaran ^{[1]}

Kim, Bo-Kyeong ^{[1]}

Oh, Sang-Hoon ^{[2]}

Lee, Soo-Young ^{[3]}

Affiliations:

[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Daejeon 305701, South Korea

[2] Mokwon Univ, Div Informat & Commun Convergence Engn, Daejeon 302318, South Korea

[3] Korea Adv Inst Sci & Technol, Inst Artificial Intelligence, Daejeon 305701, South Korea

Source:

IEEE SIGNAL PROCESSING LETTERS | 2019 / Vol. 26 / No. 01

Keywords:

Speech enhancement; room simulator; connectionist temporal classification; generative adversarial network;

DOI:

10.1109/LSP.2018.2880285

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808; 0809

Abstract:

Many speech enhancement methods try to learn the relationship between noisy and clean speech, obtained using an acoustic room simulator. We point out several limitations of enhancement methods relying on clean speech targets; the goal of this letter is to propose an alternative learning algorithm, called acoustic and adversarial supervision (AAS). AAS trains the enhanced output both to maximize the likelihood of the transcription under a pre-trained acoustic model and to have the general characteristics of clean speech, which improves generalization to unseen noisy speech. We employ connectionist temporal classification and an unpaired conditional boundary equilibrium generative adversarial network as the loss functions of AAS. AAS is tested on two datasets containing additive noise, without and with reverberation: Librispeech + DEMAND and CHiME-4. By visualizing the enhanced speech with different loss combinations, we demonstrate the role of each supervision. AAS achieves a lower word error rate than other state-of-the-art methods that use the clean speech target on both datasets.

Pages: 159-163

Page count: 5

50 items in total

[1] EXPLORING SPEECH ENHANCEMENT WITH GENERATIVE ADVERSARIAL NETWORKS FOR ROBUST SPEECH RECOGNITION
Donahue, Chris
Li, Bo
Prabhavalkar, Rohit
[J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5024 - 5028
[2] Double Adversarial Network based Monaural Speech Enhancement for Robust Speech Recognition
Du, Zhihao
Han, Jiqing
Zhang, Xueliang
[J]. INTERSPEECH 2020, 2020, : 309 - 313
[3] Selective Acoustic Feature Enhancement for Speech Emotion Recognition With Noisy Speech
Leem, Seong-Gyun
Fulford, Daniel
Onnela, Jukka-Pekka
Gard, David
Busso, Carlos
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 917 - 929
[4] Speech Enhancement Based on Masking Approach Considering Speech Quality and Acoustic Confidence for Noisy Speech Recognition
Chu, Shih-Chuan
Wu, Chung-Hsien
Lin, Yun-Wen
[J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 536 - 540
[5] Enhancing Automatic Speech Recognition Quality with a Second-Stage Speech Enhancement Generative Adversarial Network
Nossier, Soha A.
Wall, Julie
Moniri, Mansour
Glackin, Cornelius
Cannings, Nigel
[J]. 2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 546 - 552
[6] Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech
Chen, Li-Wei
Lee, Hung-Yi
Tsao, Yu
[J]. INTERSPEECH 2019, 2019, : 719 - 723
[7] NETWORKS FOR SPEECH ENHANCEMENT AND AUTOMATIC SPEECH RECOGNITION
Vu, Thanh T.
Bigot, Benjamin
Chng, Eng Siong
[J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 499 - 503
[8] SPEECH ENHANCEMENT FOR TELEPHONY NAME SPEECH RECOGNITION
You, Chang Huai
Rahardja, Susanto
Li, Haizhou
[J]. 2008 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-4, 2008, : 973 - 976
[9] β-Masking MMSE Speech Enhancement for Speech Recognition
You, Chang Huai
Ma, Bin
[J]. 2017 IEEE 2ND INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP), 2017, : 341 - 345
[10] Noisy speech recognition based on speech enhancement
Wang, Xia
Tang, Hongmei
Zhao, Xiaoqun
[J]. SNPD 2007: EIGHTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING, AND PARALLEL/DISTRIBUTED COMPUTING, VOL 3, PROCEEDINGS, 2007, : 713 - +