Binaural Deep Neural Network for Noise Robust Automatic Speech Recognition

Cited by: 0
Authors
Jiang, Yi [1 ]
Zu, Yuan-Yuan [1 ]
Affiliations
[1] Quartermaster Equipment Res Inst, Beijing, Peoples R China
Keywords
Deep Neural Network (DNN); Computational Auditory Scene Analysis (CASA); Automatic Speech Recognition (ASR); Ideal Parameter Mask;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Robust automatic speech recognition (ASR) is a challenging task, especially in noisy environments. The mismatch between the speech model trained on clean speech and the noisy speech model is a main factor that reduces the performance of ASR systems. The goal of a robust ASR system is to obtain the target speech energy distribution, which provides the discriminative information for the acoustic model. We use a binaural deep neural network (DNN) to estimate the energy of the target speech in the mixture through signal-to-noise ratio (SNR) estimation. The estimated target speech is then used as the input to a conventional ASR system to improve recognition accuracy. We use the ideal parameter mask as the DNN training target and cross entropy as the training cost function. Experiments show the robust ASR performance of the proposed algorithm under various SNR conditions.
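The abstract does not define the ideal parameter mask, so the following is only a minimal sketch of the general idea of mask-based training with a cross-entropy cost. It assumes a soft ratio mask of the form S / (S + N) per time-frequency unit, which is equivalent to SNR / (1 + SNR); this form, the function names, and the random toy data are all assumptions for illustration, not the authors' method.

```python
import numpy as np

def ratio_mask(target_energy, noise_energy, eps=1e-12):
    """Assumed soft mask in [0, 1]: share of each unit's energy owed to the target."""
    return target_energy / (target_energy + noise_energy + eps)

def cross_entropy(mask_true, mask_pred, eps=1e-7):
    """Mean binary cross entropy between a soft target mask and a predicted mask."""
    p = np.clip(mask_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(mask_true * np.log(p) + (1.0 - mask_true) * np.log(1.0 - p))))

def apply_mask(mixture_energy, mask):
    """Estimate the target speech energy by scaling the mixture energy with the mask."""
    return mask * mixture_energy

# Toy energies on a (frames x frequency channels) grid; real features would
# come from a binaural front end, which is not modelled here.
rng = np.random.default_rng(0)
target = rng.random((4, 8))
noise = rng.random((4, 8))

mask = ratio_mask(target, noise)
estimate = apply_mask(target + noise, mask)
loss_perfect = cross_entropy(mask, mask)
loss_flat = cross_entropy(mask, np.full_like(mask, 0.5))
```

In this sketch a network would be trained to predict `mask` from features of the noisy mixture, and `estimate` would be passed to the recognizer in place of the noisy input. With soft targets the cross entropy is minimized when the prediction equals the target, so `loss_perfect` is no larger than `loss_flat`.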
Pages: 512-517
Page count: 6
Related papers
50 in total
  • [41] Noise-robust speech recognition in mobile network based on convolution neural networks
    Bouchakour, Lallouani
    Debyeche, Mohamed
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2022, 25 (01) : 269 - 277
  • [42] Issues with uncertainty decoding for noise robust automatic speech recognition
    Liao, H.
    Gales, M. J. F.
    SPEECH COMMUNICATION, 2008, 50 (04) : 265 - 277
  • [43] JOINT NOISE ADAPTIVE TRAINING FOR ROBUST AUTOMATIC SPEECH RECOGNITION
    Narayanan, Arun
    Wang, DeLiang
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [44] Deep Neural Network Driven Binaural Audio Visual Speech Separation
    Gogate, Mandar
    Dashtipour, Kia
    Bell, Peter
    Hussain, Amir
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [45] BINAURAL PROCESSING FOR ROBUST RECOGNITION OF DEGRADED SPEECH
    Menon, Anjali
    Kim, Chanwoo
    Kurokawa, Umpei
    Stern, Richard M.
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 24 - 31
  • [46] A REGRESSION APPROACH TO BINAURAL SPEECH SEGREGATION VIA DEEP NEURAL NETWORK
    Fan, Nana
    Du, Jun
    Dai, Li-Rong
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [47] Stimulated Deep Neural Network for Speech Recognition
    Wu, Chunyang
    Karanasou, Penny
    Gales, Mark J. F.
    Sim, Khe Chai
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 400 - 404
  • [48] RECURRENT DEEP NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Weng, Chao
    Yu, Dong
    Watanabe, Shinji
    Juang, Biing-Hwang
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [49] Noise Robust Exemplar Matching for Speech Enhancement: Applications to Automatic Speech Recognition
    Yilmaz, Emre
    Baby, Deepak
    Van Hamme, Hugo
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 688 - 692
  • [50] Factorial Speech Processing Models for Noise-Robust Automatic Speech Recognition
    Khademian, Mahdi
    Homayounpour, Mohammad Mehdi
    2015 23RD IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2015, : 637 - 642