A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks

Times cited: 37
Authors
Li, Bo [1]
Sim, Khe Chai [1]
Affiliation
[1] Natl Univ Singapore, Sch Comp, Singapore 117417, Singapore
Keywords
Deep neural network; noise robustness; spectral masking
DOI
10.1109/TASLP.2014.2329237
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Improving the noise robustness of automatic speech recognition systems has been a challenging task for many years. Recently, Deep Neural Networks (DNNs) have been found to yield large performance gains over conventional GMM-HMM systems when used in both hybrid and tandem configurations. However, their performance still falls short of human expectations, especially in adverse environments. Motivated by the separation-prior-to-recognition process of the human auditory system, we propose a robust spectral masking system in which power spectral domain masks are predicted by a DNN trained on the same filter-bank features used for acoustic modeling. To further improve performance, Linear Input Network (LIN) adaptation is applied to both the mask estimator and the acoustic model DNNs. Since estimating LINs for the mask estimator requires stereo data, which is not available during testing, we propose using the LINs estimated for the acoustic model DNNs to adapt the mask estimators. Furthermore, we use the same set of weights obtained from pre-training for the input layers of both the mask estimator and the acoustic model DNNs to ensure better consistency when sharing LINs. Experimental results on the benchmark Aurora2 and Aurora4 tasks demonstrate the effectiveness of our system, which yields Word Error Rates (WERs) of 4.6% and 11.8%, respectively. In addition, simply averaging the posteriors from systems with and without spectral masking further reduces the WERs to 4.3% on Aurora2 and 11.4% on Aurora4.
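The masking step described in the abstract can be illustrated with a minimal Python sketch. It assumes that a DNN (not shown) has already predicted one mask value per power-spectral bin for a frame, that the mask is applied by element-wise multiplication, and that the masked spectrum is then passed through an ordinary log filter-bank front end. The function names `apply_spectral_mask` and `log_filterbank` and the toy rectangular filters are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List


def apply_spectral_mask(noisy_power: List[float], mask: List[float]) -> List[float]:
    """Element-wise masking of one frame in the power spectral domain.

    Assumption: the mask (predicted elsewhere, e.g. by a DNN) has one value
    per power-spectral bin. Values are clamped to [0, 1] so that a masked bin
    never exceeds the noisy power; this clamping is a choice of this sketch.
    """
    if len(noisy_power) != len(mask):
        raise ValueError("mask and spectrum must have the same number of bins")
    return [p * min(max(m, 0.0), 1.0) for p, m in zip(noisy_power, mask)]


def log_filterbank(power: List[float], filters: List[List[float]],
                   floor: float = 1e-10) -> List[float]:
    """Log filter-bank energies: weighted sum of power bins per filter, then log.

    The floor avoids log(0) when a filter covers only fully masked bins.
    """
    feats = []
    for weights in filters:
        if len(weights) != len(power):
            raise ValueError("each filter needs one weight per power bin")
        energy = sum(w * p for w, p in zip(weights, power))
        feats.append(math.log(max(energy, floor)))
    return feats
```

For example, `apply_spectral_mask([4.0, 2.0, 1.0, 8.0], [0.5, 1.0, 0.0, 0.25])` gives `[2.0, 2.0, 0.0, 2.0]`, and passing that through two hypothetical rectangular filters `[[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]` gives `[log 4, log 2]`. A real front end would use triangular mel-spaced filters.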
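LIN adaptation can be sketched as an extra linear layer y = Wx + b placed in front of an otherwise frozen DNN. The sketch below assumes identity/zero initialization (so an un-adapted LIN leaves the input unchanged) and plain gradient descent in which only W and b are updated; the abstract does not specify the training recipe, and the names `init_lin`, `apply_lin` and `lin_gradient_step` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def init_lin(dim: int) -> Tuple[Matrix, Vector]:
    """Identity weights and zero bias: the LIN starts as a no-op transform."""
    weights = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    bias = [0.0] * dim
    return weights, bias


def apply_lin(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """y = W x + b for one input feature vector, before the first hidden layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def lin_gradient_step(x: Vector, grad_out: Vector, weights: Matrix, bias: Vector,
                      lr: float) -> Tuple[Matrix, Vector]:
    """One gradient-descent update of the LIN parameters only.

    grad_out is dL/dy at the LIN output; it is assumed to come from
    back-propagation through the frozen DNN (not shown). Since y = W x + b,
    dL/dW[i][j] = grad_out[i] * x[j] and dL/db[i] = grad_out[i].
    Returns new parameters instead of modifying the inputs in place.
    """
    new_weights = [[w - lr * g * xj for w, xj in zip(row, x)]
                   for row, g in zip(weights, grad_out)]
    new_bias = [bi - lr * g for bi, g in zip(bias, grad_out)]
    return new_weights, new_bias
```

Because the mask estimator and the acoustic model take the same kind of filter-bank features and, per the abstract, share pre-trained input-layer weights, a (W, b) pair estimated on the acoustic model can be passed unchanged to `apply_lin` for the mask estimator's input. This is how the abstract's approach sidesteps the need for stereo data at test time.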
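The final system combination described in the abstract is a simple average of posteriors. A minimal sketch, assuming both systems (with and without spectral masking) produce posteriors over the same set of output states for the same frames and that the average is an unweighted frame-wise arithmetic mean; `average_posteriors` is a hypothetical name.

```python
from typing import List


def average_posteriors(post_a: List[List[float]],
                       post_b: List[List[float]]) -> List[List[float]]:
    """Frame-wise arithmetic mean of two systems' state posteriors.

    Assumption: both inputs are frames x states, aligned frame by frame and
    state by state (same output-state inventory in both DNNs).
    """
    if len(post_a) != len(post_b):
        raise ValueError("both systems must produce the same number of frames")
    averaged = []
    for frame_a, frame_b in zip(post_a, post_b):
        if len(frame_a) != len(frame_b):
            raise ValueError("both systems must share the same output states")
        averaged.append([(pa + pb) / 2.0 for pa, pb in zip(frame_a, frame_b)])
    return averaged
```

For instance, averaging a frame `[0.7, 0.2, 0.1]` with `[0.5, 0.4, 0.1]` gives approximately `[0.6, 0.3, 0.1]`. If each input frame sums to one, so does each averaged frame, so the result can be used wherever a single system's posteriors were used.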
Pages: 1296–1305
Number of pages: 10