Monaural speech separation based on MAXVQ and CASA for robust speech recognition

Cited by: 32
Authors
Li, Peng [1 ]
Guan, Yong [2 ]
Wang, Shijin [1 ]
Xu, Bo [1 ,2 ]
Liu, Wenju [2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Digital Content Technol Res Ctr, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Source
COMPUTER SPEECH AND LANGUAGE | 2010, Vol. 24, Issue 01
Keywords
Monaural speech separation; Computational auditory scene analysis (CASA); Factorial-max vector quantization (MAXVQ); Automatic speech recognition (ASR); MAXIMUM-LIKELIHOOD-ESTIMATION; AUDITORY SCENE ANALYSIS; BIAS REMOVAL; NOISE; ADAPTATION;
DOI
10.1016/j.csl.2008.05.005
Chinese Library Classification (CLC) Code
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Robustness is one of the most important topics for automatic speech recognition (ASR) in practical applications. Monaural speech separation based on computational auditory scene analysis (CASA) offers a solution to this problem. In this paper, a novel system is presented to separate the monaural speech of two talkers. Gaussian mixture models (GMMs) and vector quantizers (VQs) are used to learn the grouping cues from isolated clean data for each speaker. Given an utterance, speaker identification is first performed to identify the two speakers present in the utterance; the factorial-max vector quantization model (MAXVQ) is then used to infer the mask signals, and finally the utterance of the target speaker is resynthesized within the CASA framework. Recognition results on the 2006 speech separation challenge corpus show that the proposed system significantly improves the robustness of ASR. (C) 2008 Elsevier Ltd. All rights reserved.
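The abstract describes the pipeline only at a high level. As a rough illustration of the MAXVQ mask-inference step, the sketch below assumes per-speaker VQ codebooks of log-magnitude spectra and the log-max mixing approximation; it is not the authors' implementation (the paper additionally uses GMMs and CASA resynthesis), and all names (maxvq_mask, codebook_a, codebook_b) are hypothetical.

```python
import numpy as np

def maxvq_mask(mixture, codebook_a, codebook_b):
    """Illustrative factorial-max VQ (MAXVQ) mask inference sketch.

    mixture:    (T, F) log-magnitude spectrogram of the two-talker mixture
    codebook_a: (Ka, F) VQ codewords (log-magnitude spectra) for speaker A
    codebook_b: (Kb, F) VQ codewords for speaker B
    Returns a (T, F) binary mask that is 1 where speaker A is estimated
    to dominate the mixture.
    """
    T, F = mixture.shape
    mask = np.zeros((T, F))
    # Log-max approximation: each mixture bin is roughly the element-wise
    # maximum of the two underlying source codewords.
    approx = np.maximum(codebook_a[:, None, :], codebook_b[None, :, :])  # (Ka, Kb, F)
    for t in range(T):
        frame = mixture[t]                                   # (F,)
        err = np.sum((approx - frame) ** 2, axis=-1)         # (Ka, Kb)
        ia, ib = np.unravel_index(np.argmin(err), err.shape) # best codeword pair
        # Speaker A "wins" a time-frequency unit where its codeword is larger.
        mask[t] = (codebook_a[ia] >= codebook_b[ib]).astype(float)
    return mask

# Toy usage with random data (purely illustrative):
rng = np.random.default_rng(0)
mix = rng.normal(size=(100, 64))
cb_a = rng.normal(size=(32, 64))
cb_b = rng.normal(size=(32, 64))
m = maxvq_mask(mix, cb_a, cb_b)   # (100, 64) binary mask for "speaker A"
```

In the system described by the abstract, such a binary mask would then be used to resynthesize the target speaker's utterance within the CASA framework before recognition.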
Pages: 30-44 (15 pages)
Related Papers (50 total)
  • [1] CASA Based Speech Separation for Robust Speech Recognition
    Han, Runqiang
    Zhao, Pei
    Gao, Qin
    Zhang, Zhiping
    Wu, Hao
    Wu, Xihong
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 77 - 80
  • [2] Monaural speech separation and recognition challenge
    Cooke, Martin
    Hershey, John R.
    Rennie, Steven J.
    [J]. COMPUTER SPEECH AND LANGUAGE, 2010, 24 (01): : 1 - 15
  • [3] Double Adversarial Network based Monaural Speech Enhancement for Robust Speech Recognition
    Du, Zhihao
    Han, Jiqing
    Zhang, Xueliang
    [J]. INTERSPEECH 2020, 2020, : 309 - 313
  • [4] DEEP CASA FOR TALKER-INDEPENDENT MONAURAL SPEECH SEPARATION
    Liu, Yuzhou
    Delfarah, Masood
    Wang, DeLiang
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6354 - 6358
  • [5] Deep Neural Network Based Speech Separation for Robust Speech Recognition
    Tu, Yanhui
    Du, Jun
    Xu, Yong
    Dai, Lirong
    Lee, Chin-Hui
    [J]. 2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014, : 532 - 536
  • [6] SPEECH RECOGNITION ROBUST AGAINST SPEECH OVERLAPPING IN MONAURAL RECORDINGS OF TELEPHONE CONVERSATIONS
    Suzuki, Masayuki
    Kurata, Gakuto
    Nagano, Tohru
    Tachibana, Ryuki
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5685 - 5689
  • [7] Robust speech recognition by integrating speech separation and hypothesis testing
    Srinivasan, Soundararajan
    Wang, DeLiang
    [J]. SPEECH COMMUNICATION, 2010, 52 (01) : 72 - 81
  • [8] Robust speech recognition by integrating speech separation and hypothesis testing
    Srinivasan, S
    Wang, DL
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 89 - 92
  • [9] Speech Separation and Recognition Using CASA Segmentation and Language-Based Grouping
    Karpukhin, Ivan
    Konushin, Anton
    [J]. ADVANCED SCIENCE LETTERS, 2018, 24 (10) : 7650 - 7654
  • [10] DEEP LEARNING FOR MONAURAL SPEECH SEPARATION
    Huang, Po-Sen
    Kim, Minje
    Hasegawa-Johnson, Mark
    Smaragdis, Paris
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,