CASA Based Speech Separation for Robust Speech Recognition

Authors
Han Runqiang [1]
Zhao Pei [1]
Gao Qin [1]
Zhang Zhiping [1]
Wu Hao [1]
Wu Xihong [1]
Affiliation
[1] Peking Univ, Natl Lab Machine Percept, Speech & Hearing Res Ctr, Beijing 100871, Peoples R China
Keywords
speech separation; CASA; speaker recognition; pitch tracking; units grouping; reconstruction
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a speech separation system that serves as a front-end processing step for automatic speech recognition (ASR). The system uses computational auditory scene analysis (CASA) to separate the target speech from interfering speech. First, the mixed speech is preprocessed with an auditory peripheral model. Pitch tracking is then performed, and the dominant pitch serves as the main cue for locating the target speech. Next, the time-frequency (TF) units are merged into segments, which CASA initial grouping combines into streams. A regrouping strategy refines these streams using amplitude modulation (AM) cues, and speaker recognition techniques then assign the streams to their corresponding speakers. Finally, cluster-based feature reconstruction is applied to the output streams to compensate for data lost in the preceding steps. ASR experiments show that at low target-to-masker ratios (TMR below -6 dB), the proposed method yields significantly higher recognition accuracy.
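The abstract names dominant-pitch tracking as the main cue but does not say how the pitch is computed. As an illustration only, the sketch below estimates the dominant pitch of a single frame from the peak of its autocorrelation. This is a common textbook technique, not necessarily the one used in the paper. The function name `dominant_pitch_hz`, the search range, the frame length, and the synthetic sine-wave "speakers" are all assumptions made for this example.

```python
import numpy as np

def dominant_pitch_hz(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate the dominant pitch of one frame from its autocorrelation peak.

    Hypothetical helper for illustration; fmin and fmax bound the search range.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Full autocorrelation; keep only the non-negative lags.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(acf) - 1)
    # The strongest peak inside the allowed lag range gives the pitch period.
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag

# Synthetic mixture (assumed values): a 200 Hz "target" plus a weaker
# 130 Hz "interferer", analysed over one 40 ms frame at 16 kHz.
sample_rate = 16000
t = np.arange(int(0.04 * sample_rate)) / sample_rate
target = np.sin(2 * np.pi * 200.0 * t)
interferer = 0.5 * np.sin(2 * np.pi * 130.0 * t)

# The search range is narrowed to 110-400 Hz so that the sub-octave of the
# target (100 Hz) is excluded from the candidate lags.
pitch = dominant_pitch_hz(target + interferer, sample_rate, fmin=110.0)
print(f"dominant pitch estimate: {pitch:.1f} Hz")
```

On this toy mixture the estimate should land near the stronger 200 Hz component rather than the 130 Hz one, though the interferer biases it slightly. A real CASA front end would typically compute such correlations per auditory filter channel and track the pitch across frames, which this single-frame sketch does not attempt.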
Pages: 77-80
Page count: 4