Monaural speech separation based on MAXVQ and CASA for robust speech recognition

Cited by: 32
Authors
Li, Peng [1 ]
Guan, Yong [2 ]
Wang, Shijin [1 ]
Xu, Bo [1 ,2 ]
Liu, Wenju [2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Digital Content Technol Res Ctr, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Source
COMPUTER SPEECH AND LANGUAGE | 2010, Vol. 24, No. 1
Keywords
Monaural speech separation; Computational auditory scene analysis (CASA); Factorial-max vector quantization (MAXVQ); Automatic speech recognition (ASR); MAXIMUM-LIKELIHOOD-ESTIMATION; AUDITORY SCENE ANALYSIS; BIAS REMOVAL; NOISE; ADAPTATION;
DOI
10.1016/j.csl.2008.05.005
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Robustness is one of the most important topics for automatic speech recognition (ASR) in practical applications. Monaural speech separation based on computational auditory scene analysis (CASA) offers a solution to this problem. In this paper, a novel system is presented to separate the monaural speech of two talkers. Gaussian mixture models (GMMs) and vector quantizers (VQs) are used to learn the grouping cues from isolated clean data for each speaker. Given an utterance, speaker identification is first performed to identify the two speakers present in the utterance; the factorial-max vector quantization model (MAXVQ) is then used to infer the mask signals, and finally the utterance of the target speaker is resynthesized in the CASA framework. Recognition results on the 2006 speech separation challenge corpus show that the proposed system can significantly improve the robustness of ASR. (C) 2008 Elsevier Ltd. All rights reserved.
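The core idea behind the MAXVQ step in the abstract is that, in the log-spectral domain, a two-talker mixture is approximately the element-wise maximum of the two speakers' spectra, so a binary mask can be inferred by finding the best-fitting pair of codewords. The toy sketch below illustrates only that max-approximation idea; the random codebooks, sizes, and squared-error fit are illustrative assumptions (the paper learns codebooks from isolated clean data and performs likelihood-based inference under GMMs).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy codebooks of log-spectral codewords, one per speaker.
# In the paper these would be trained with VQ/GMMs on isolated clean data.
n_codes, n_channels = 8, 16
codebook_a = rng.normal(0.0, 1.0, (n_codes, n_channels))
codebook_b = rng.normal(0.0, 1.0, (n_codes, n_channels))

def maxvq_mask(frame, cb_a, cb_b):
    """Infer a binary time-frequency mask for speaker A in one frame.

    MAXVQ assumes the mixture log-spectrum is approximately the
    element-wise max of one codeword from each speaker.  We search all
    codeword pairs for the best fit (squared error here, as a stand-in
    for the paper's likelihood criterion), then mark the channels where
    speaker A's codeword dominates.
    """
    best_pair, best_err = None, np.inf
    for ca in cb_a:
        for cb in cb_b:
            approx = np.maximum(ca, cb)          # max-approximation of the mixture
            err = np.sum((frame - approx) ** 2)  # fit of this codeword pair
            if err < best_err:
                best_err, best_pair = err, (ca, cb)
    ca, cb = best_pair
    return (ca >= cb).astype(int)  # 1 where speaker A is the stronger source

# Toy mixture frame built from one codeword of each speaker.
mix = np.maximum(codebook_a[3], codebook_b[5])
mask = maxvq_mask(mix, codebook_a, codebook_b)
```

In the full system this per-frame binary mask would then gate the mixture's time-frequency units, and the target speaker's utterance would be resynthesized from the retained units in the CASA framework.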
Pages: 30-44 (15 pages)
Related papers
50 records in total
  • [41] A new Genetic Algorithm based fusion scheme in monaural CASA system to improve the performance of the speech
    Shoba, S.
    Rajavel, R.
    [J]. Journal of Ambient Intelligence and Humanized Computing, 2020, 11 (01) : 433 - 446
  • [43] Speech parameters for the robust emotional speech recognition
    Kim W.-G.
    [J]. Journal of Institute of Control, Robotics and Systems, 2010, 16 (12) : 1137 - 1142
  • [44] Japanese speech databases for robust speech recognition
    Nakamura, A
    Matsunaga, S
    Shimizu, T
    Tonomura, M
    Sagisaka, Y
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 2199 - 2202
  • [45] Robust speech detector for speech recognition applications
    Liang, WQ
    Chen, YN
    Shan, YX
    Liu, J
    Liu, RS
    [J]. 2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002, : 453 - 456
  • [46] Distilled Binary Neural Network for Monaural Speech Separation
    Chen, Xiuyi
    Liu, Guangcan
    Shi, Jing
    Xu, Jiaming
    Xu, Bo
    [J]. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [47] A Deep Ensemble Learning Method for Monaural Speech Separation
    Zhang, Xiao-Lei
    Wang, DeLiang
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (05) : 967 - 977
  • [48] On Synthesis for Supervised Monaural Speech Separation in Time Domain
    Chen, Jingjing
    Mao, Qirong
    Liu, Dong
    [J]. INTERSPEECH 2020, 2020, : 2627 - 2631
  • [49] Deep Attractor with Convolutional Network for Monaural Speech Separation
    Lan, Tian
    Qian, Yuxin
    Tai, Wenxin
    Chu, Boce
    Liu, Qiao
    [J]. 2020 11TH IEEE ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE (UEMCON), 2020, : 40 - 44
  • [50] Investigation of Cost Function for Supervised Monaural Speech Separation
    Liu, Yun
    Zhang, Hui
    Zhang, Xueliang
    Cao, Yuhang
    [J]. INTERSPEECH 2019, 2019, : 3178 - 3182