Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition

Cited by: 19
Authors
Kim, Geonmin [1]
Lee, Hwaran [1]
Kim, Bo-Kyeong [1]
Oh, Sang-Hoon [2]
Lee, Soo-Young [3]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Daejeon 305701, South Korea
[2] Mokwon Univ, Div Informat & Commun Convergence Engn, Daejeon 302318, South Korea
[3] Korea Adv Inst Sci & Technol, Inst Artificial Intelligence, Daejeon 305701, South Korea
Keywords
Speech enhancement; room simulator; connectionist temporal classification; generative adversarial network
DOI
10.1109/LSP.2018.2880285
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Many speech enhancement methods try to learn the relationship between noisy and clean speech, with the paired data obtained using an acoustic room simulator. We point out several limitations of enhancement methods that rely on clean speech targets; the goal of this letter is to propose an alternative learning algorithm, called acoustic and adversarial supervision (AAS). AAS trains the enhancement model so that its output both maximizes the likelihood of the transcription under a pre-trained acoustic model and has the general characteristics of clean speech, which improves generalization to unseen noisy speech. We employ connectionist temporal classification and the unpaired conditional boundary equilibrium generative adversarial network as the loss functions of AAS. AAS is tested on two datasets containing additive noise without and with reverberation, respectively: Librispeech + DEMAND and CHiME-4. By visualizing the enhanced speech obtained with different loss combinations, we demonstrate the role of each supervision. AAS achieves a lower word error rate than other state-of-the-art methods that use the clean speech target on both datasets.
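The abstract reports results as word error rate (WER). As a minimal sketch of how that metric is commonly computed, the function below takes the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and a recognizer hypothesis and divides it by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the letter does not specify its scoring tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization is plain
    whitespace splitting, which is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because the edit count is normalized by the reference length only, WER can exceed 1.0 when the hypothesis contains many inserted words.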
Pages: 159-163
Number of pages: 5