Robust automatic speech recognition based on neural network in reverberant environments

Cited by: 0
Authors
Bai, L. [1]
Li, H. L. [1]
He, Y. Y. [1]
Affiliation
[1] Coordinat Ctr China, Natl Comp Network Emergency Response Tech Team, Beijing, Peoples R China
Keywords
BOTTLE-NECK FEATURES;
DOI
Not available
Chinese Library Classification number
X [Environmental Science, Safety Science];
Discipline classification code
08; 0830;
Abstract
Reverberant environments remain a major challenge for speech recognition. This paper presents a method for Automatic Speech Recognition (ASR) in reverberant conditions that combines front-end processing with enhanced Voice Activity Detection (VAD). A 2-channel dereverberation method is adopted to achieve robust dereverberation under different reverberant conditions. A 2-channel spectral enhancement method is also used, in which the gain of each frequency bin is controlled by the acoustic scene, which is detected by analyzing the full-band coherence property. A Deep Neural Network (DNN) serves as a feature extractor, and a DNN-based VAD is used to further improve ASR performance. The DNN-based front-end allows very flexible integration of meta-information. Bottleneck features are extracted in place of the MFCC features used in the HMM-GMM system. Finally, the methods are evaluated on the data provided by the REVERB challenge. On simulated data, the system achieves a relative reduction in Word Error Rate (WER) of more than 33%.
Pages: 1319-1324
Number of pages: 6
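Note on the metric quoted in the abstract: the record does not give the baseline or final WER values, only the relative reduction. As a minimal sketch of how such a figure is conventionally computed, the Python below implements word-level WER by edit distance and the relative reduction between two systems. The function names and the example numbers are hypothetical illustrations, not values or code from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which improved_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical numbers chosen only to illustrate the arithmetic.
    baseline, improved = 0.30, 0.20
    print(f"relative reduction: {relative_wer_reduction(baseline, improved):.1%}")
```

With these assumed inputs, a drop from 30% to 20% absolute WER corresponds to a relative reduction of one third, which is the kind of comparison the abstract's "more than 33%" refers to.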