Deep Neural Network Based Speech Separation for Robust Speech Recognition

Cited by: 0
|
Authors
Tu, Yanhui [1]
Du, Jun [1]
Xu, Yong [1]
Dai, Lirong [1]
Lee, Chin-Hui [2]
Affiliations
[1] Univ Sci & Technol China, Shanghai, Peoples R China
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Keywords
single-channel speech separation; robust speech recognition; deep neural networks; semi-supervised mode; ALGORITHM;
DOI
Not available
CLC Number
TM [Electrical Engineering]; TN [Electronic & Communication Technology];
Subject Classification Code
0808 ; 0809 ;
Abstract
In this paper, a novel deep neural network (DNN) architecture is proposed to generate the speech features of both the target speaker and the interferer for speech separation, without using any prior information about the interfering speaker. A DNN is adopted to directly model the highly nonlinear relationship between the speech features of the mixed signal and those of the two competing speakers. Experimental results on a monaural speech separation and recognition challenge task show that the proposed DNN framework improves separation performance on several objective measures in the semi-supervised mode, where training data for the target speaker is available and the unseen interferer at separation time is handled by mixing multiple interfering speakers with the target speaker during training. Furthermore, used as a preprocessing step in the testing stage for robust speech recognition, the proposed separation approach achieves significant improvements in recognition accuracy over a baseline system with no source separation.
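The abstract describes a DNN that maps features of the mixed signal to the features of both the target speaker and the interferer in a single forward pass. A minimal sketch of that input/output structure, using NumPy and assumed dimensions (257 spectral bins, 1024 hidden units, ReLU activation, randomly initialized weights standing in for trained parameters; none of these specifics come from the paper itself):

```python
import numpy as np

rng = np.random.default_rng(0)

DIM = 257      # e.g. log-power spectrum bins per frame (assumed)
HIDDEN = 1024  # hidden-layer width (assumed)

# Randomly initialized weights stand in for trained parameters.
W1 = rng.standard_normal((DIM, HIDDEN)) * 0.01
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, 2 * DIM)) * 0.01  # two output streams
b2 = np.zeros(2 * DIM)

def separate(mixed_features: np.ndarray):
    """Forward pass: mixed-signal features -> (target, interferer) features."""
    h = np.maximum(0.0, mixed_features @ W1 + b1)  # ReLU hidden layer
    out = h @ W2 + b2                              # linear output layer
    # The single output vector is split into the two speakers' feature streams.
    return out[..., :DIM], out[..., DIM:]

frames = rng.standard_normal((10, DIM))            # 10 frames of a mixture
target, interferer = separate(frames)
print(target.shape, interferer.shape)              # (10, 257) (10, 257)
```

The key architectural point the abstract makes is that one network jointly predicts both sources, so the output layer is twice the feature dimension and is split per speaker.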
Pages: 532-536
Page count: 5
Related Papers
50 records in total
  • [41] Robust speech recognition by integrating speech separation and hypothesis testing
    Srinivasan, S
    Wang, DL
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 89 - 92
  • [42] Robust speech recognition by integrating speech separation and hypothesis testing
    Srinivasan, Soundararajan
    Wang, DeLiang
    [J]. SPEECH COMMUNICATION, 2010, 52 (01) : 72 - 81
  • [43] Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-channel Speech Recognition
    Li, Guanjun
    Liang, Shan
    Nie, Shuai
    Liu, Wenju
    Yang, Zhanlei
    Xiao, Longshuai
    [J]. INTERSPEECH 2020, 2020, : 51 - 55
  • [44] Double Adversarial Network based Monaural Speech Enhancement for Robust Speech Recognition
    Du, Zhihao
    Han, Jiqing
    Zhang, Xueliang
    [J]. INTERSPEECH 2020, 2020, : 309 - 313
  • [45] Research on Speech Emotion Recognition Technology based on Deep and Shallow Neural Network
    Wang, Jian
    Han, Zhiyan
    [J]. PROCEEDINGS OF THE 38TH CHINESE CONTROL CONFERENCE (CCC), 2019, : 3555 - 3558
  • [46] Deep Neural Network Frontend for Continuous EMG-based Speech Recognition
    Wand, Michael
    Schmidhuber, Jurgen
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3032 - 3036
  • [47] An Improved Tibetan Lhasa Speech Recognition Method Based on Deep Neural Network
    Ruan, Wenbin
    Gan, Zhenye
    Liu, Bin
    Guo, Yin
    [J]. 2017 10TH INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTATION TECHNOLOGY AND AUTOMATION (ICICTA 2017), 2017, : 303 - 306
  • [48] Fast Adaptation of Deep Neural Network Based on Discriminant Codes for Speech Recognition
    Xue, Shaofei
    Abdel-Hamid, Ossama
    Jiang, Hui
    Dai, Lirong
    Liu, Qingfeng
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (12) : 1713 - 1725
  • [49] Employing Robust Principal Component Analysis for Noise-Robust Speech Feature Extraction in Automatic Speech Recognition with the Structure of a Deep Neural Network
    Hung, Jeih-weih
    Lin, Jung-Shan
    Wu, Po-Jen
    [J]. APPLIED SYSTEM INNOVATION, 2018, 1 (03) : 1 - 14
  • [50] Deep Neural Network Based Speech Recognition Systems Under Noise Perturbations
    An, Qiyuan
    Bai, Kangjun
    Zhang, Moqi
    Yi, Yang
    Liu, Yifang
    [J]. PROCEEDINGS OF THE TWENTYFIRST INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED 2020), 2020, : 377 - 382