A Speech Enhancement Neural Network Architecture with SNR-Progressive Multi-Target Learning for Robust Speech Recognition

Cited by: 0
Authors
Zhou, Nan [1]
Du, Jun [1]
Tu, Yan-Hui [1]
Gao, Tian [2]
Lee, Chin-Hui [3]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] iFlytek Res, Hefei, Anhui, Peoples R China
[3] Georgia Inst Technol, Atlanta, GA 30332 USA
Keywords
progressive ratio mask; progressively enhanced log-power spectra; progressive multi-targets; deep learning based speech enhancement; robust speech recognition; FRONT-END;
DOI
10.1109/apsipaasc47483.2019.9023157
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
We present a pre-processing speech enhancement network architecture for noise-robust speech recognition by learning progressive multiple targets (PMTs). PMTs are represented by a series of progressive ratio masks (PRMs) and progressively enhanced log-power spectra (PELPS) targets at various layers based on different signal-to-noise ratios (SNRs), attempting to make a tradeoff between reduced background noise and increased speech distortion. As a PMT implementation, long short-term memory (LSTM) is adopted at each network layer to progressively learn intermediate dual targets of both PRM and PELPS. Experiments on the CHiME-4 automatic speech recognition (ASR) task, when compared to unprocessed speech using multi-condition trained LSTM-based acoustic models without retraining, show that PRM-only as the learning target can achieve a relative word error rate (WER) reduction of 6.32% (from 27.68% to 25.93%) averaged over the RealData evaluation set, while conventional ideal ratio masks severely degrade the ASR performance. Moreover, the proposed LSTM-based PMT network, with the best configuration, outperforms the PRM-only model, with a relative WER reduction of 13.31% (further down to 22.48%) averaged over the same test set.
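The relative WER reductions quoted in the abstract can be checked with a short calculation. The sketch below assumes the standard definition, (baseline - new) / baseline, expressed in percent; the function and variable names are illustrative and do not come from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both inputs are absolute WERs in percent. The formula
    (baseline - new) / baseline is the usual definition and is
    assumed here; the abstract does not spell it out.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Absolute WERs (percent) reported in the abstract for the RealData evaluation set.
unprocessed = 27.68  # unprocessed speech
prm_only = 25.93     # PRM-only learning target
pmt_best = 22.48     # best LSTM-based PMT configuration

prm_vs_unprocessed = relative_wer_reduction(unprocessed, prm_only)
pmt_vs_prm = relative_wer_reduction(prm_only, pmt_best)

print(f"PRM-only vs. unprocessed: {prm_vs_unprocessed:.2f}%")  # 6.32%
print(f"PMT vs. PRM-only: {pmt_vs_prm:.2f}%")                  # 13.31%
```

Note that the 13.31% figure is measured against the PRM-only model, not against unprocessed speech.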
Pages: 873-877
Page count: 5
Related papers
50 in total
  • [1] A Multi-Target SNR-Progressive Learning Approach to Regression Based Speech Enhancement
    Tu, Yan-Hui
    Du, Jun
    Gao, Tian
    Lee, Chin-Hui
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1608 - 1619
  • [2] PROGRESSIVE MULTI-TARGET NETWORK BASED SPEECH ENHANCEMENT WITH SNR-PRESELECTION FOR ROBUST SPEAKER DIARIZATION
    Sun, Lei
    Du, Jun
    Zhang, Xueyang
    Gao, Tian
    Fang, Xin
    Lee, Chin-Hui
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7099 - 7103
  • [3] SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement
    Gao, Tian
    Du, Jun
    Dai, Li-Rong
    Lee, Chin-Hui
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3713 - 3717
  • [4] A Priori SNR Estimation Based on a Recurrent Neural Network for Robust Speech Enhancement
    Xia, Yangyang
    Stern, Richard M.
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3274 - 3278
  • [5] LOCAL TRAJECTORY BASED SPEECH ENHANCEMENT FOR ROBUST SPEECH RECOGNITION WITH DEEP NEURAL NETWORK
    You, Yongbin
    Qian, Yanmin
    Yu, Kai
    [J]. 2015 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING, 2015, : 5 - 9
  • [6] A Maximum Likelihood Approach to SNR-Progressive Learning Using Generalized Gaussian Distribution for LSTM-Based Speech Enhancement
    Zhang, Xiao-Qi
    Du, Jun
    Chai, Li
    Lee, Chin-Hui
    [J]. INTERSPEECH 2021, 2021, : 2701 - 2705
  • [7] REINFORCEMENT LEARNING BASED SPEECH ENHANCEMENT FOR ROBUST SPEECH RECOGNITION
    Shen, Yih-Liang
    Huang, Chao-Yuan
    Wang, Syu-Siang
    Tsao, Yu
    Wang, Hsin-Min
    Chi, Tai-Shih
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6750 - 6754
  • [8] Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning
    Abdullah, Salinna
    Zamani, Majid
    Demosthenous, Andreas
    [J]. IEEE OPEN JOURNAL OF CIRCUITS AND SYSTEMS, 2024, 5 : 141 - 152
  • [9] Multi-target ensemble learning based speech enhancement with temporal-spectral structured target
    Wang, Wenbo
    Guo, Weiwei
    Liu, Houguang
    Yang, Jianhua
    Liu, Songyong
    [J]. APPLIED ACOUSTICS, 2023, 205
  • [10] Deep Neural Network Based Speech Separation for Robust Speech Recognition
    Tu, Yanhui
    Du, Jun
    Xu, Yong
    Dai, Lirong
    Lee, Chin-Hui
    [J]. 2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014, : 532 - 536